Spark Pitfalls

This post records four problems encountered while using Spark SQL with Elasticsearch as a data source, along with the cause and fix for each.


Problem log for accessing data through Spark SQL with Elasticsearch as the data source:

Problem 1:

java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.rpc.netty.RequestMessage; local class incompatible: stream classdesc serialVersionUID = -2221986757032131007, local class serialVersionUID = -5447855329526097695
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)

Solution: the Spark version on the client does not match the Spark version on the server. Aligning the client jars with the server's Spark version resolves it.
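
As a quick sanity check, here is a minimal sketch (class and app names are placeholders, not from the original code) that prints the Spark version bundled with the client jars; it should match the version shown in the cluster's web UI:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class VersionCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("version-check").setMaster("local");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Version of the Spark client libraries on the classpath.
            System.out.println("Client Spark version: " + sc.version());
        }
    }
}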

Problem 2:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:F:/WorkSpace/Work/AI+/AI+doc/spark/spark/spark-warehouse
	at org.apache.hadoop.fs.Path.initialize(Path.java:206)
	at org.apache.hadoop.fs.Path.<init>(Path.java:172)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
	at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
	at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
	at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
	at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
	at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)


Cause: the spark.sql.warehouse.dir path was not specified, so Spark derived a malformed file: URI from the Windows working directory.

Adding one line to the code fixes it:

conf.set("spark.sql.warehouse.dir", "F:/spark-warehouse");
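
Equivalently, when using the Spark 2.x SparkSession builder, the warehouse directory can be set there; an explicit file:/// URI is the safest form on Windows. A minimal sketch (app name and path are placeholders):

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("es-demo")
        .master("local[*]")
        // An explicit file URI avoids the "Relative path in absolute URI" parse error on Windows.
        .config("spark.sql.warehouse.dir", "file:///F:/spark-warehouse")
        .getOrCreate();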

Problem 3:

org.apache.spark.sql.AnalysisException: Table not found: emobilelog; line 1 pos 21  
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)  
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)  
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)  
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)  
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)  
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)  

Solution: use JavaEsSparkSQL to load the ES index as a DataFrame and register it as a temporary view, so the table name can be referenced in SQL.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

// Load the ES index and register it under the name used in the SQL statement.
Dataset<Row> log = JavaEsSparkSQL.esDF(sqlContext, "aipplog_2017_05");
log.createOrReplaceTempView("aipplog");
Dataset<Row> dataset = sqlContext.sql(sql);
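
Note that esDF assumes the elasticsearch-hadoop connection settings were already placed on the SparkConf; a minimal sketch, with placeholder host and port:

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf().setAppName("es-spark-sql").setMaster("local[*]");
// elasticsearch-hadoop connection settings (host and port are placeholders).
conf.set("es.nodes", "127.0.0.1");
conf.set("es.port", "9200");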

Problem 4:

17/09/22 09:43:53 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/09/22 09:44:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

For local testing there is no need to connect to a remote Spark master. The master URL is read from a properties file:

conf.setMaster(PropReader.getString(Constants.sparkUrl).trim());

and set to local mode in the properties file:

spark.url=local

The master URL passed to Spark can take one of the following forms (a short sketch follows the list):

local: run locally with a single thread
local[K]: run locally with K worker threads
local[*]: run locally with as many threads as there are available cores
spark://HOST:PORT: connect to the given Spark standalone cluster master; the port must be specified
mesos://HOST:PORT: connect to the given Mesos cluster; the port must be specified
yarn-client: connect to a YARN cluster in client mode; HADOOP_CONF_DIR must be configured
yarn-cluster: connect to a YARN cluster in cluster mode; HADOOP_CONF_DIR must be configured
(In Spark 2.x the last two are replaced by the master URL "yarn" plus a separate client or cluster deploy mode.)
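
A minimal sketch of switching among these master URLs in code (the non-local values are placeholders):

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf().setAppName("es-spark-sql");
conf.setMaster("local[*]");                    // local testing, all available cores
// conf.setMaster("spark://master-host:7077"); // standalone cluster (placeholder host)
// conf.setMaster("yarn");                     // YARN in Spark 2.x; deploy mode is set separately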


