- Blog (27)
[Original] INVALID_COLUMN_NAME_AS_PATH
[INVALID_COLUMN_NAME_AS_PATH] The datasource HiveFileFormat cannot save the column min(birth_date) because its name contains some characters that are not allowed in file paths. Please, use an alias to rename it.
2025-06-12 10:45:28
65
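The error message in the entry above recommends aliasing the offending column, for example `min(birth_date) AS min_birth_date`. As a rough sketch of how such an alias could be derived automatically: `safe_alias` and `with_alias` below are hypothetical helpers written for illustration, not part of Spark or Hive.

```python
import re


def safe_alias(expr: str) -> str:
    """Turn a SQL expression such as 'min(birth_date)' into an identifier-safe alias."""
    # Collapse every run of characters other than letters, digits, and
    # underscores into a single underscore, then trim leftover underscores.
    alias = re.sub(r"[^0-9A-Za-z_]+", "_", expr).strip("_")
    # Fall back to a generic name if nothing usable is left (hypothetical default).
    return alias or "col"


def with_alias(expr: str) -> str:
    """Return the expression followed by an explicit alias."""
    return f"{expr} AS {safe_alias(expr)}"


print(with_alias("min(birth_date)"))
```

The aliased expression can then be used in the SELECT list so that the output column name no longer contains parentheses.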
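[Original] Spark loses rows when running Hive SQL
Summary: When running a Hive query through spark-sql, the result had 2 fewer rows than the same query run through hive-sql. The cause is that the user_id column in the table is of type CHAR: Spark automatically pads the value with spaces, while Hive does not. The fix is to wrap user_id in TRIM(), e.g. select date,user_id,pay from dim.isr_pay_failed where trim(user_id)='*******'. Columns of other types, such as TIMESTAMP, also need attention because Spark and Hive handle them differently.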
2025-06-06 16:41:12
846
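The CHAR padding effect described in the entry above can be modeled in plain Python. This is only an illustration of the comparison logic, not Spark code: `char_pad`, the column width of 10, and the sample user IDs are all assumptions made up for the sketch.

```python
def char_pad(value: str, length: int) -> str:
    """Model a CHAR(length) value that has been right-padded with spaces."""
    return value.ljust(length)


# Hypothetical user_id values as they would look after padding to CHAR(10).
rows = [char_pad("u1001", 10), char_pad("u1002", 10)]

# Plain equality against the unpadded literal finds nothing.
plain_matches = [r for r in rows if r == "u1001"]

# Trimming spaces first, as trim(user_id)='...' does, finds the row.
trimmed_matches = [r for r in rows if r.strip(" ") == "u1001"]

print(len(plain_matches), len(trimmed_matches))
```

The same reasoning explains why a filter on the raw column can silently drop rows while the trimmed filter returns them.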
[Original] Setting up the PySpark dependency environment
Cannot run program "python3": error=13, Permission denied, pyspark spark-submit pyspark_python export, spark hive
2025-04-21 12:03:27
404
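The keywords in the entry above point at exporting `PYSPARK_PYTHON` to work around the `error=13, Permission denied` failure. A minimal sketch of a pre-flight check, assuming the goal is to confirm that the chosen interpreter is executable by the current user before setting the variables; `configure_pyspark_python` is a hypothetical helper, and the check only covers the local machine, not the executor nodes.

```python
import os
import shutil
import sys


def configure_pyspark_python(candidate: str, env=None) -> str:
    """Point PySpark at `candidate` if it resolves to an executable file."""
    env = os.environ if env is None else env
    # shutil.which returns None when the file is missing or lacks execute permission.
    path = shutil.which(candidate)
    if path is None:
        raise RuntimeError(f"{candidate!r} is missing or not executable for this user")
    # PYSPARK_PYTHON is used for executors, PYSPARK_DRIVER_PYTHON for the driver.
    env["PYSPARK_PYTHON"] = path
    env["PYSPARK_DRIVER_PYTHON"] = path
    return path


if __name__ == "__main__":
    # Example: reuse the interpreter that is running this script.
    print(configure_pyspark_python(sys.executable, env={}))
```

Running a check like this under the same account that launches spark-submit helps separate a permission problem from a wrong path.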
[Original] Hive explode function misbehaves: inaccurate results (rows lost)
hive explode lateral view can not find genColumnStats
2025-04-17 20:56:28
361
[Original] nullsCount must be less than or equal to rowCount. nullsCount: 2803359. rowCount:
presto nullcount rowcount hive metastore statistic
2025-03-21 23:37:39
282
[Original] spark: Required field 'filesAdded' is unset!
Required field 'filesAdded' is unset! Struct:InsertEventRequestData(files), hive , filesAdded, unset
2025-03-12 15:56:20
336
[Original] AssertionError: Invalid size estimation for T-Digest
presto -sql AssertionError T-Digest
2025-03-10 14:21:57
129
[Original] spark driver: Failed to allocate
spark driver failed allocate page broadcast
2024-12-19 19:36:53
678
[Original] Missing parentheses in call to 'print'. Did you mean print(rack)
Missing parentheses in call to 'print'. print(rack) spark ambari
2024-11-02 18:27:19
228
[Original] ScriptBasedMapping: Script /etc/hadoop/conf/topology_script.py
ScriptBasedMapping: Script /etc/hadoop/conf/topology_script.py returned 13 values when 9 were expected.
2024-11-02 18:24:33
481
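The warning in the entry above appears when the topology script emits a different number of whitespace-separated values than the number of hosts it was given. A minimal sketch of the invariant a topology script needs to keep, namely exactly one rack per input host; the host names, rack names, and helper functions here are hypothetical, and this is not the script shipped by Ambari.

```python
DEFAULT_RACK = "/default-rack"


def resolve_racks(hosts, rack_by_host):
    """Return exactly one rack per host, using the default rack for unknown hosts."""
    return [rack_by_host.get(host, DEFAULT_RACK) for host in hosts]


def format_output(racks):
    """Join racks with single spaces; rack names must not contain whitespace."""
    return " ".join(racks)


# Hypothetical mapping and input, for illustration only.
rack_by_host = {"10.0.0.1": "/rack-a", "10.0.0.2": "/rack-b"}
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

racks = resolve_racks(hosts, rack_by_host)
print(format_output(racks))
```

Any extra output, such as debug prints or rack names containing spaces, changes the value count and would trigger the same mismatch.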
[Original] unable to inherit permissions for file hdfs://
unable to inherit permission for hdfs
2024-11-02 17:49:01
282
[Original] org.apache.hadoop.hive.ql.metadata.HiveException: Attempting to overwrite nextKeyWritables[1]
Attempting to overwrite nextKeyWritables[1] HiveException hive hadoop
2024-10-31 21:40:00
198
[Original] Falling back to /default-rack for all
ambari spark hadoop Falling back default-rack
2024-10-29 14:55:40
296
1
[Original] spark load data: file is not owned by xxx and load data is also not ran as xxx
caused by hive metastore: load data file is not owned by xxx and also not ran as xxx
2024-10-16 11:48:28
421
[Original] Output column number expected to be 0 when isRepeating
hive column 0 is Repeating COALESCE
2024-10-16 11:03:30
405
[Original] java.util.concurrent.TimeoutException: Futures timed out after
java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
2024-09-04 20:40:36
372
[Original] Container launch failed for container, This token is expired.
container failed token expired system time out of sync
2024-08-12 18:03:11
395
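[Original] HDP Hive table creation fails: Duplicate entry 'tmp_xw_order_infos_channel-41' for key 'UNIQUETABLE
After a long investigation, I still could not find this table in Hive. I eventually suspected that the row in the MySQL TBLS table was not fully cleaned up when the table was dropped. Because of administrative constraints, I could not investigate further.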
2024-07-26 15:52:04
524