Original INVALID_COLUMN_NAME_AS_PATH

[INVALID_COLUMN_NAME_AS_PATH] The datasource HiveFileFormat cannot save the column min(birth_date) because its name contains some characters that are not allowed in file paths. Please, use an alias to rename i

2025-06-12 10:45:28 65

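Original Data loss when Spark executes Hive SQL

Summary: When the Hive query was run through spark-sql, the result had 2 fewer rows than the same query in hive-sql. The cause is that the user_id column in the table is of type CHAR: Spark automatically pads it with spaces, while Hive does not. The fix is to wrap user_id in TRIM(), for example: select date,user_id,pay from dim.isr_pay_failed where trim(user_id)='*******'. Other column types, such as TIMESTAMP, also need attention because Spark and Hive handle them differently.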

2025-06-06 16:41:12 846
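
The padding effect described in the summary above can be illustrated without a cluster. The following is a minimal pure-Python sketch, not Spark or Hive code: `char_pad`, the sample rows, and the `CHAR(10)` width are all hypothetical and only simulate how a space-padded fixed-width value fails an exact comparison until it is trimmed.

```python
def char_pad(value: str, length: int) -> str:
    """Simulate fixed-width CHAR(length) storage by right-padding with spaces."""
    return value.ljust(length)


# Hypothetical rows: user_id is assumed to be CHAR(10), so it arrives space-padded.
rows = [
    {"date": "2025-06-01", "user_id": char_pad("u123", 10), "pay": 30},
    {"date": "2025-06-02", "user_id": char_pad("u123", 10), "pay": 45},
    {"date": "2025-06-02", "user_id": char_pad("u999", 10), "pay": 12},
]


def filter_exact(rows, user_id):
    """Compare the stored value as-is, like `where user_id = '...'`."""
    return [r for r in rows if r["user_id"] == user_id]


def filter_trimmed(rows, user_id):
    """Strip spaces first, like `where trim(user_id) = '...'`."""
    return [r for r in rows if r["user_id"].strip(" ") == user_id]


if __name__ == "__main__":
    print(len(filter_exact(rows, "u123")))    # padded values do not match
    print(len(filter_trimmed(rows, "u123")))  # trimmed values match
```

Trimming on the column side keeps the filter literal unchanged, which mirrors the `trim(user_id)` fix in the summary.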

Original PySpark dependency environment setup

Cannot run program "python3": error=13, Permission denied, pyspark spark-submit pyspark_python export, spark hive

2025-04-21 12:03:27 404

Original Manually specifying Python dependencies and resolving dependency compatibility issues

pandas pyspark Required

2025-04-21 11:40:31 368

Original Hive explode function misbehaves: inaccurate (missing) data

hive explode lateral view can not find genColumnStats

2025-04-17 20:56:28 361

Original nullsCount must be less than or equal to rowCount. nullsCount: 2803359. rowCount:

presto nullcount rowcount hive metastore statistic

2025-03-21 23:37:39 282

Original spark: Required field 'filesAdded' is unset!

Required field 'filesAdded' is unset! Struct:InsertEventRequestData(files), hive , filesAdded, unset

2025-03-12 15:56:20 336

Original AssertionError: Invalid size estimation for T-Digest

presto -sql AssertionError T-Digest

2025-03-10 14:21:57 129

Original Date anomalies caused by yyyy vs. YYYY

yyyy, YYYY, date_format, hive, spark, java, jdk

2024-12-30 14:00:50 500
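
The keywords above point at the difference between a calendar-year pattern and a week-based-year pattern. In Java date patterns, `y` is the calendar year and `Y` is the week year. The sketch below is only an analogy in Python and assumes ISO 8601 week rules; Java's week year depends on the locale, so the exact boundary dates may differ. The function names are illustrative.

```python
from datetime import date


def calendar_year(d: date) -> int:
    """Calendar year, analogous to the `yyyy` pattern."""
    return d.year


def week_based_year(d: date) -> int:
    """ISO 8601 week-based year, used here as a stand-in for `YYYY`."""
    return d.isocalendar()[0]


if __name__ == "__main__":
    d = date(2024, 12, 30)  # a Monday that already belongs to ISO week 1 of 2025
    print(calendar_year(d))    # 2024
    print(week_based_year(d))  # 2025
```

Dates in the last days of December or the first days of January are where the two values can diverge, so a week-year pattern can silently shift a date into the wrong year.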

Original spark driver: Failed to allocate

spark driver failed allocate page broadcast

2024-12-19 19:36:53 678

Original Row counts differ between Parquet and textfile

spark, parquet, presto: inconsistent data, inconsistent results

2024-12-19 19:20:45 252

Original Missing parentheses in call to 'print'. Did you mean print(rack)

Missing parentheses in call to 'print'. print(rack) spark ambari

2024-11-02 18:27:19 228

Original ScriptBasedMapping: Script /etc/hadoop/conf/topology_script.py

ScriptBasedMapping: Script /etc/hadoop/conf/topology_script.py returned 13 values when 9 were expected.

2024-11-02 18:24:33 481

Original unable to inherit permissions for file hdfs://

unable to inherit permission for hdfs

2024-11-02 17:49:01 282

Original org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]

Attempting to overwrite nextKeyWritables[1] HiveException hive hadoop

2024-10-31 21:40:00 198

Original Falling back to /default-rack for all

ambari spark hadoop Falling back default-rack

2024-10-29 14:55:40 296 1

Original spark load data is not owned by xxx and also not ran as xxx

Caused by Hive metastore: load data is not owned by and not ran as xxx

2024-10-16 11:48:28 421

Original Output column number expected to be 0 when isRepeating

hive column 0 is Repeating COALESCE

2024-10-16 11:03:30 405

Original Ambari fails to start Hive

Hive fails to start

2024-09-20 14:20:56 425

Original Ambari management node HeartLost

ambari node heartlost

2024-09-20 11:25:58 449

Original java.util.concurrent.TimeoutException: Futures timed out after

java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]

2024-09-04 20:40:36 372

Original insufficient memory

IDEA compilation OOM, insufficient memory.

2024-08-23 16:43:01 243

Original Container launch failed for container, This token is expired.

container failed token expired System time out of time

2024-08-12 18:03:11 395

Original NameNode crashed for no apparent reason

namenode , hdfs

2024-08-01 20:37:03 359

Original Lock wait timeout exceeded

hive insert overwrite Error Lock wait time

2024-08-01 20:29:14 314

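Original HDP Hive create table fails: Duplicate entry 'tmp_xw_order_infos_channel-41' for key 'UNIQUETABLE

After a long investigation, the table still could not be found in Hive. The final suspicion is that the MySQL TBLS table was not fully cleaned up when this table was dropped. Because of administrative restrictions, this could not be investigated further.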

2024-07-26 15:52:04 524

Original Hive create table times out acquiring a MySQL lock

big data, hive, mysql

2024-07-26 14:54:32 512
