from pyspark.sql import SparkSession
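# Building a SparkSession; this call fails if the pyspark package version does not match the Spark runtime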
spark = SparkSession.builder.appName("spark_app").getOrCreate()
Check the version of the running Spark installation:
spark-submit --version
[main] WARN org.apache.spark.util.Utils - Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 10.255.132.9 instead (on interface ens3)
1 [main] WARN org.apache.spark.util.Utils - Set SPARK_LOCAL_IP if you need to bind to another address
391 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
1571 [Thread-5] WARN org.apache.spark.util.Utils - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
3.1.2
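Alongside the runtime version reported above, the version of the pyspark Python package used by the script can be checked from Python (a minimal check, run in the same environment as the failing script); if the two versions differ, creating a SparkSession fails with the Py4J error below:

import pyspark
print(pyspark.__version__)  # version of the pip-installed pyspark package; should match spark-submit --version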
Traceback (most recent call last):
  File "/home/byzerllm/softwares/test6.py", line 21, in <module>
    spark = SparkSession.builder.config(conf=conf).getOrCreate()
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/pyspark/sql/session.py", line 500, in getOrCreate
    session = SparkSession(sc, options=self._options)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/pyspark/sql/session.py", line 589, in __init__
    jsparkSession = self._jvm.SparkSession(self._jsc.sc(), options)
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/py4j/java_gateway.py", line 1587, in __call__
    return_value = get_return_value(
  File "/home/byzerllm/miniconda3/envs/byzerllm-dev/lib/python3.10/site-packages/py4j/protocol.py", line 330, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.sql.SparkSession. Trace:
py4j.Py4JException: Constructor org.apache.spark.sql.SparkSession([class org.apache.spark.SparkContext, class java.util.HashMap]) does not exist
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179)
        at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196)
        at py4j.Gateway.invoke(Gateway.java:237)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
Solution:
The Py4JError above is caused by a version mismatch: the pyspark Python package and the Spark runtime it launches are different versions, so the Python wrapper tries to call a SparkSession constructor (SparkContext, HashMap) that does not exist on the JVM side. Reinstall pyspark at a version that matches the local Spark installation, for example:
pip install pyspark==3.2.1
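After reinstalling, a quick sanity check confirms that the session can be created again (a minimal sketch; the app name and local master below are arbitrary):

import pyspark
from pyspark.sql import SparkSession

print(pyspark.__version__)   # Python-side package version after the reinstall

# Once the versions are aligned, this no longer raises Py4JError
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("version_check") \
    .getOrCreate()
print(spark.version)         # JVM-side Spark version
spark.stop()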