20220714
Look up the original CREATE TABLE statement in git.
20220308
rebuilding Selector io.netty.channel.nio.SelectedSelectionKeySetSelector@2f312c56.
The number of operations may be too large, possibly over 100 million.
Failed to connect to master k8s04:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
The Hadoop and Spark services were not started.
20220214
SQL 错误 [1064] [42000]: there is no scanNode Backend
SQL 错误: Error executing query
The server could not be reached: either the bastion host had a problem or the big-data components failed to start.
If a component's web UI console won't open, it is most likely because the firewall was not turned off.
https://jingyan.baidu.com/article/ff42efa9fd8c1cc19e2202bb.html
18/11/20 16:44:44 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message. org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
Adding this to the resource configuration may solve the problem: --conf spark.dynamicAllocation.enabled=false
https://www.codercto.com/a/39980.html
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1091099277-192.168.100.251-
https://blog.youkuaiyun.com/weixin_44519124/article/details/107211857
Hive startup error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeExcept
https://blog.youkuaiyun.com/qq_34885598/article/details/84935218
20211229
https://blog.youkuaiyun.com/u010374412/article/details/103148374
Spark SparkSession.Builder source-code walkthrough
spark.master can be local, local[*], or local[N] (N = number of threads)
20211220
Doris does not yet support UPDATE; for a primary-key table you can re-INSERT the new data,
or DELETE first and then INSERT.
20211116
SQL 错误 [1]: Query failed (#20211116_082325_00132_qzuea): line 4:5: backquoted identifiers are not supported; use double quotes to quote identifiers
Remove the backquotes around `level`; this engine requires double quotes for quoting identifiers.
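A tiny stdlib sketch of the rewrite described above (the helper name is made up for illustration):

```python
import re

def backquotes_to_double_quotes(sql: str) -> str:
    """Hypothetical helper: rewrite MySQL/Hive-style `identifier` quoting
    into the ANSI double-quote form that engines like Presto/Trino expect."""
    return re.sub(r"`([^`]*)`", r'"\1"', sql)

print(backquotes_to_double_quotes("select `level` from t"))
# select "level" from t
```

This is a naive text rewrite; it does not parse the SQL, so backquotes inside string literals would also be replaced.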
SQL 错误 [84148226]: Query failed (#20211116_083721_00144_qzuea): Exceeded limit of 100 open writers for partitions
Happened while writing data.
Caused by mixing different data sources in one query; switch to a single data source.
Put the dt partition column last in the column list.
SQL 错误 [47]: Query failed (#20211116_090238_00166_qzuea): line 26:3: Column 'primary_key_id' cannot be resolved
The column name was written incorrectly.
20210831
Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
https://blog.youkuaiyun.com/weixin_45171416/article/details/107525222
Notes from tracking down a Spark shuffle job that kept failing with connection reset by peer
https://blog.youkuaiyun.com/zhuge134/article/details/86556319
“spark”+“java.lang.StackOverflowError”
https://blog.youkuaiyun.com/u010936936/article/details/88363449
java.io.IOException: insufficient disk space; Spark: java.io.IOException: No space left on device
https://blog.youkuaiyun.com/kwu_ganymede/article/details/49094881
https://blog.youkuaiyun.com/weixin_35399893/article/details/118842634
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
A Spark connection problem, probably related to the Spark UI.
20210826
Connecting to Hive from Python via PyHive:
from pyhive import hive

# HiveServer2 must be listening on port 10000
conn = hive.connect(host='192.168.1.73', port=10000, database='ztdata_hive', username='root')
cursor = conn.cursor()
cursor.execute("select * from tb_customer_sec_type")
for row in cursor:
    print(row)
cursor.close()
conn.close()
PyHive error: Could not start SASL: b‘Error in sasl_client_start (-4) SASL(-4)
https://blog.youkuaiyun.com/ssgmoshou/article/details/107767680
PySpark import order matters: set JAVA_HOME and call findspark.init() before importing pyspark:
import os
os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_301'
import findspark
findspark.init()
from pyspark.sql import SparkSession
PySpark also does not support installation paths containing Chinese characters.
20210823
1. The MySQL driver used must match the server's MySQL version.
2. Python must be the 64-bit build.
3. Update the driver path in the conf file under Spark's conf directory.
4. Java 1.8 has two jre directories.
5. Don't mix up the various environment-variable paths.
6. pyspark must also be installed inside PyCharm.
7. Garbled Chinese output in cmd usually means the Spark path is misconfigured.
Possible causes of PySpark environment-setup failures
pip安装locust时报错-ERROR: Could not build wheels for gevent which use PEP 517 and cannot be installed
https://blog.youkuaiyun.com/ly021499/article/details/103288570
The version being installed probably needs to be changed.
20210820
[Spark] DataFrame JSON-read exception: Since Spark 2.3, the queries from raw JSON/CSV files are disallowed…
Reading from raw files requires supplying an explicit schema.
pyspark program: CreateProcess error=5, access denied.
There is no so-called pyspark_home environment variable.
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils… does not exist in the JVM
https://www.javatt.com/p/46998
py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.
It may also be a database-driver problem.
dfs.namenode.rpc-address 172.16.1.102:8080 (XML configuration format)
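As a sketch, the XML configuration format referred to above looks like this (the address is taken from the note and may need adjusting for your cluster):

```xml
<!-- hdfs-site.xml style property entry -->
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>172.16.1.102:8080</value>
</property>
```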
If the database driver can't be found, copy the connector jar into the lib directory.
Don't change the database-driver path in hive-site.xml.
File “D:\dashuju\spark-3.0.0-bin-hadoop2.7\python\lib\py4j-0.10.9-src.zip\py4j\protocol.py”, line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o43.jdbc.
MySQL error: The server time zone value is unrecognized or represents more than one time zone
https://blog.youkuaiyun.com/zjccsg/article/details/69254134
https://blog.youkuaiyun.com/hy_coming/article/details/104128024 (focuses on changing my.ini)
https://blog.youkuaiyun.com/sxeric/article/details/113832302
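One common fix for the time-zone error is to state the zone in the JDBC URL itself; a hedged sketch (host and database names are made up):

```python
# Appending serverTimezone to the MySQL JDBC URL tells Connector/J which
# zone to use instead of asking the server for an ambiguous name.
jdbc_url = (
    "jdbc:mysql://localhost:3306/testdb"
    "?serverTimezone=UTC&useSSL=false"
)
print(jdbc_url)
```

The same URL string can be passed to spark.read.format("jdbc") as the url option.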
py4j.protocol.Py4JJavaError: An error occurred while calling o43.jdbc.
: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
pyspark: the MySQL service was not started.
Error: JAVA_HOME is incorrectly set. Please update $HADOOP_HOME\etc\hadoop\hadoop\hadoop-env.cmd
https://blog.youkuaiyun.com/weixin_39971186/article/details/88842359
This happens when the JDK install path contains a space; avoid installing under the Program Files folder.
Windows 10 Java environment variables, Hadoop environment variables
https://www.cnblogs.com/lijins/p/10091485.html
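A quick stdlib check for the space-in-path problem described above (the function name and sample paths are invented):

```python
import os

def java_home_has_space(env=os.environ):
    """Hypothetical check: Hadoop's hadoop-env.cmd breaks when JAVA_HOME
    contains a space (e.g. C:\\Program Files\\Java\\...), so flag it early."""
    return " " in env.get("JAVA_HOME", "")

print(java_home_has_space({"JAVA_HOME": r"C:\Program Files\Java\jdk1.8.0_301"}))
# True
```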
ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.ClassNotFoundException: main.s
python not found: also a version problem.
The Spark, Hadoop, and pyspark versions don't match.
Download the matching Spark and Hadoop versions per the official PySpark documentation.
Spark Python error "FileNotFoundError: [WinError 2] The system cannot find the file specified"
Restart PyCharm.
Fixing the pyspark error "py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled"
https://blog.youkuaiyun.com/Together_CZ/article/details/90402660
ERROR: Could not import pypandoc - required to package PySpark
Happened while installing pyspark; just pip install pypandoc.
Exception: Java gateway process exited before sending the driver its port number (solved)
It may also be a Java version problem; after switching via
os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_301'
the error went away.
https://blog.youkuaiyun.com/a2099948768/article/details/79580634
Could not reserve enough space for 2097152KB object heap (pyspark)
Switch to 64-bit Java.
Fixing the pyspark error java.util.NoSuchElementException: key not found: _PYSPARK_DRIVER_CALLBACK_HOST
https://blog.youkuaiyun.com/a200822146085/article/details/89467002
Handling the pyspark exception java.lang.OutOfMemoryError: Java heap space
https://blog.youkuaiyun.com/a5685263/article/details/102265838
20210819
Spark error: System memory 259522560 must be at least 471859200
https://blog.youkuaiyun.com/qq_30505673/article/details/84992068
https://www.cnblogs.com/drl-blogs/p/11086826.html
https://blog.youkuaiyun.com/aubekpan/article/details/85329768
Fix for "Cannot connect to compile server" when building a Scala project with IDEA + Maven
java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200
https://blog.youkuaiyun.com/weixin_43322685/article/details/82961748
scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps
https://blog.youkuaiyun.com/weixin_42129080/article/details/80961878
Package-import errors in the current directory: press Alt+Enter and choose the first suggestion.
IDEA + Spark error: object apache is not a member of package org
https://blog.youkuaiyun.com/xl132598798/article/details/105695593
Importing the jars from the Spark installation directory fixes it.
After installing the Scala plugin, IDEA's "Add Framework Support" can't find Scala
https://blog.youkuaiyun.com/weixin_43520450/article/details/108677784
https://blog.youkuaiyun.com/tanhaodi2012/article/details/100182735
Roundup of fixes for IDEA not offering the "Scala class" creation option
https://www.cnblogs.com/libaoquan/p/9004531.html
Configuring the Spark dependencies in the pom file
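A sketch of the pom.xml dependency block; the Scala (2.12) and Spark (3.0.0) versions are assumptions matching the spark-3.0.0-bin-hadoop2.7 install mentioned above and may need adjusting:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.0</version>
  </dependency>
</dependencies>
```

The artifactId suffix must match the Scala version the project compiles against.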