Tuning common Spark configuration parameters

This article covers methods and worked examples of Spark parameter tuning, including fixes for out-of-memory and connection-reset problems, along with configuration advice for resource sizing, the adaptive execution framework, and dynamic resource allocation.


Spark parameter tuning

spark.sql.hive.metastore.version=1.2.1


III. Errors

Problem 1:

ERROR YarnScheduler: Lost executor 53 on node100p32: Container killed by YARN for exceeding memory limits.
10.0 GB of 10 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.

Solution:

Workaround: temporarily run this job on Hive instead, capping the number of reducers at 750. During the run, peak allocated memory was about 3.3 TB and the 20 jobs finished in exactly 8 hours.

-- Map/reduce container sizes (commented out; not changed for this run):
-- set mapreduce.map.memory.mb=3000;
-- set mapreduce.reduce.memory.mb=6000;

-- Combine small input splits so fewer map tasks are launched.
set hive.hadoop.supports.splittable.combineinputformat=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.max.split.size=512000000;
set mapred.min.split.size.per.node=128000000;
set mapred.min.split.size.per.rack=128000000;

-- Merge small output files and enable map-side aggregation.
set hive.merge.mapfiles=true;
set hive.map.aggr=true;
set hive.merge.smallfiles.avgsize=128000000;

-- Cap the number of reducers at 750 (matches the run described above).
set hive.exec.reducers.max=750;

-- Dynamic partitioning limits for the insert.
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1500;
set hive.exec.max.dynamic.partitions.pernode=1500;

Root-cause analysis (to be written). Parameters relevant to the analysis are listed below; a Spark-side sketch for the memoryOverhead setting follows the list.

set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=6000;
spark.yarn.executor.memoryOverhead          (off-heap headroom per Spark executor)
yarn.nodemanager.vmem-check-enabled         (NodeManager property in yarn-site.xml)
set hive.groupby.skewindata=true;
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=5000000;
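
As a hedged illustration of the Spark-side knobs above (not the Hive workaround actually used for this job), the Scala sketch below shows where those settings would go when building a session on YARN. The application name and the concrete values are assumptions, not values taken from this job. spark.yarn.executor.memoryOverhead must be fixed before executors are requested, so it belongs on the builder or on the spark-submit command line, and yarn.nodemanager.vmem-check-enabled is a NodeManager property that cannot be changed from application code.

import org.apache.spark.sql.SparkSession

object MemoryOverheadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("memory-overhead-sketch")                      // hypothetical application name
      .config("spark.executor.memory", "6g")                   // executor heap
      .config("spark.yarn.executor.memoryOverhead", "2048")    // extra off-heap headroom per executor, in MB (assumed value)
      .enableHiveSupport()
      .getOrCreate()

    // yarn.nodemanager.vmem-check-enabled lives in yarn-site.xml on the NodeManagers;
    // it is not a Spark conf and cannot be set here.
    spark.sql("select 1").show()
    spark.stop()
  }
}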

I. Common Spark configuration

1.spark-sql:

spark-sql --name "$0" \
  --master yarn --deploy-mode client --queue deve \
  --driver-memory 4g --executor-memory 6g --num-executors 50 --executor-cores 3 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=20 \
  --conf spark.dynamicAllocation.maxExecutors=56 \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.adaptive.maxNumPostShufflePartitions=500 \
  --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=256000000 \
  --conf spark.yarn.executor.memoryOverhead=1200m \
  -i /opt/data/dev/util/spark_com.sql \
  --hiveconf hive.cli.print.header=true \
  --hiveconf hive.resultset.use.unique.column.names=false \
  --conf 'spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/data/dev/spark/log4j.properties' \
  -v -e " ${sql_query_insert} "

2.spark-submit:

spark-submit --master yarn --queue deve \
  --driver-memory 6G --executor-memory 7G --num-executors 32 --executor-cores 3 \
  --conf spark.yarn.executor.memoryOverhead=8096M \
  --conf spark.sql.shuffle.partitions=1000 \
  --conf spark.default.parallelism=150 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.shuffle.service.port= \
  --class com.ecnomic.test \
  /package/package.jar 2 2020-10-01 2020-10-01 > /log.log 2>&1

3. Loading UDFs (a Hive UDF does not have to be thread-safe, but a Spark UDF does; see the Scala sketch after this example):

Option 1, init file:  spark-sql -i /opt/data/dev/util/spark_com.sql
Option 2, source:     source /opt/data/dev/util/spark_com.sql;

Example:

add jar /opt/data/lib/udf.jar;
create temporary function udf_date_format as 'com.hive.udf.DateFormat';

spark-sql/hive -e "source /opt/data/dev/util/spark_com.sql; select * from table_test limit 5;"
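
To make the thread-safety remark concrete, here is a minimal Scala sketch; the function name and the demo value are hypothetical and unrelated to the com.hive.udf.DateFormat class above. A Spark UDF can be called concurrently by multiple tasks inside one executor JVM, so sharing a mutable java.text.SimpleDateFormat across calls is unsafe, while java.time.format.DateTimeFormatter is immutable and can be shared.

import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.SparkSession

object UdfThreadSafetySketch {
  // Unsafe pattern: one mutable SimpleDateFormat shared by every task in the executor JVM.
  // private val sdf = new java.text.SimpleDateFormat("yyyyMMdd")

  // Safe pattern: DateTimeFormatter is immutable and thread-safe, so a shared constant is fine.
  private val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udf-thread-safety-sketch").getOrCreate()

    // Registers a temporary function usable from SQL, e.g. "2020-10-01" -> "20201001".
    spark.udf.register("udf_date_format_demo", (s: String) => LocalDate.parse(s).format(fmt))

    spark.sql("select udf_date_format_demo('2020-10-01') as d").show()
    spark.stop()
  }
}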


II. Resource tuning

mapreduce.map.memory.mb=3000            container memory for each map task (MB)
mapreduce.reduce.memory.mb=6000         container memory for each reduce task (MB)
spark.yarn.executor.memoryOverhead=6000 mitigates the OOM above by enlarging the off-heap memory reserved for the JVM's own overhead
spark.shuffle.service.enabled=true      a long-running auxiliary service inside the NodeManager that improves shuffle performance; default false (disabled)
    (1) While an application with shuffle stages runs, an Executor not only executes tasks but also writes shuffle data and serves it to other Executors.
        If the Executor is overloaded and stuck in GC, it cannot serve that data and tasks on other Executors are held up.
    (2) The external shuffle service is a long-running auxiliary service inside the NodeManager process. Shuffle data is fetched through it instead,
        which relieves the Executors, and an Executor that is in GC no longer blocks tasks running on other Executors.

Reference: https://blog.youkuaiyun.com/zuodaoyong/article/details/107172810 (Spark shuffle parameter tuning)

1. Adaptive execution framework

spark.sql.adaptive.enabled                              switch for the adaptive execution framework; default false. Enabling it turns on Adaptive Execution, which automatically sets the number of shuffle reducers.
spark.sql.adaptive.minNumPostShufflePartitions          default 1; lower bound of the reducer-count range
spark.sql.adaptive.maxNumPostShufflePartitions          default 500; upper bound of the reducer-count range
spark.sql.adaptive.shuffle.targetPostShuffleInputSize   default 67108864 (64 MB); target input size per reducer used when adjusting the reducer count, i.e. each reduce task handles at least this much data; usually set to the cluster block size
spark.sql.adaptive.shuffle.targetPostShuffleRowCount    default 20000000; row-count counterpart of the above, i.e. each reduce task handles at least this many rows
Reference: https://blog.youkuaiyun.com/qq_14950717/article/details/105302842 (Spark SQL adaptive execution framework)
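
A minimal sketch of turning the adaptive settings above on from code, assuming a Spark 2.x build that recognises these keys (as this document does; Spark 3.x renamed most of them, e.g. spark.sql.adaptive.advisoryPartitionSizeInBytes). These are runtime SQL confs, so they can be set in spark-shell or in any code that already holds a SparkSession named spark; the values are illustrative.

// Runtime SQL confs; paste into spark-shell or run against an existing SparkSession.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.maxNumPostShufflePartitions", "500")
spark.conf.set("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "256000000")  // ~256 MB per reducer, matching the spark-sql command in section I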

2. Dynamic resource allocation:

spark.dynamicAllocation.enabled                          whether to enable dynamic allocation, which adds or removes executors based on the workload; default false
spark.shuffle.service.enabled=true                       must also be enabled for dynamic allocation to work
spark.dynamicAllocation.minExecutors                     lower bound on the number of executors; default 0
spark.dynamicAllocation.maxExecutors                     upper bound on the number of executors; default infinity (unbounded)
spark.dynamicAllocation.initialExecutors                 initial number of executors; defaults to spark.dynamicAllocation.minExecutors. If --num-executors is set to a larger value, that value is used as the initial count.
spark.dynamicAllocation.executorIdleTimeout              an executor idle for longer than this is removed; default 60s
spark.dynamicAllocation.cachedExecutorIdleTimeout        an executor holding cached data and idle for longer than this is removed; default infinity
spark.dynamicAllocation.schedulerBacklogTimeout          how long the task queue must be backlogged before additional executors are requested; default 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout like schedulerBacklogTimeout, but governs the interval between subsequent requests after the first; default = schedulerBacklogTimeout
Reference: https://blog.youkuaiyun.com/zyzzxycj/article/details/82256893
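
A minimal builder sketch for the dynamic-allocation settings above (the application name and values are illustrative; in practice they are usually passed as --conf flags, as in the spark-sql command in section I). Unlike the adaptive settings, these are application-startup settings, not runtime SQL confs, and spark.shuffle.service.enabled=true relies on the external shuffle service described at the top of this section being deployed on the NodeManagers.

import org.apache.spark.sql.SparkSession

object DynamicAllocationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-allocation-sketch")                 // hypothetical application name
      .config("spark.dynamicAllocation.enabled", "true")    // scale executors with the workload
      .config("spark.shuffle.service.enabled", "true")      // required so shuffle data from removed executors stays readable
      .config("spark.dynamicAllocation.minExecutors", "20")
      .config("spark.dynamicAllocation.maxExecutors", "56")
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
      .getOrCreate()

    spark.range(0, 1000000L).count()                        // trivial job just to exercise the session
    spark.stop()
  }
}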

3. Data skew

spark.sql.adaptive.enabled                              default false; switch for the adaptive execution framework
spark.sql.adaptive.skewedJoin.enabled                   default false; switch for skewed-join handling
spark.sql.adaptive.skewedPartitionFactor                default 10; a partition is treated as skewed only if its size exceeds this factor times the median partition size and also exceeds spark.sql.adaptive.skewedPartitionSizeThreshold, or its row count exceeds this factor times the median partition row count and also exceeds spark.sql.adaptive.skewedPartitionRowCountThreshold
spark.sql.adaptive.skewedPartitionSizeThreshold         default 67108864; minimum size for a partition to be considered skewed; take the HDFS compression codec and file format (ORC, Parquet, etc.) into account when choosing it
spark.sql.adaptive.skewedPartitionRowCountThreshold     default 10000000; minimum row count for a partition to be considered skewed
spark.shuffle.statistics.verbose                        default false; when enabled, MapStatus records per-partition row counts, which the skew handling needs

Reference: https://blog.youkuaiyun.com/qq_14950717/article/details/105302842 (Spark SQL adaptive execution framework)
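
The skew-detection rule above is easier to read as code, so here is a small Scala sketch of my reading of it (an illustration, not Spark source code): a partition counts as skewed if it beats both the factor-times-median test and the absolute threshold, on either bytes or rows. Paste into a Scala REPL to try it.

// Defaults mirror the configuration values listed above.
def isSkewedPartition(sizeBytes: Long, rowCount: Long,
                      medianSizeBytes: Long, medianRowCount: Long,
                      factor: Long = 10L,
                      sizeThreshold: Long = 67108864L,      // skewedPartitionSizeThreshold
                      rowCountThreshold: Long = 10000000L   // skewedPartitionRowCountThreshold
                     ): Boolean = {
  val skewedBySize = sizeBytes > factor * medianSizeBytes && sizeBytes > sizeThreshold
  val skewedByRows = rowCount > factor * medianRowCount && rowCount > rowCountThreshold
  skewedBySize || skewedByRows
}

// A 900 MB partition against a 60 MB median is flagged; a 100 MB one is not.
isSkewedPartition(900L << 20, 1000000L, 60L << 20, 900000L)   // true
isSkewedPartition(100L << 20, 1000000L, 60L << 20, 900000L)   // false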

4. Memory management

See:
https://www.iteblog.com/archives/2342.html
https://blog.youkuaiyun.com/zyzzxycj/article/details/81011540
https://my.oschina.net/freelili/blog/1853714
https://blog.yoodb.com/sugarliny/article/detail/1307

III. Errors (continued)

Problem 2:

WARN TaskSetManager: Lost task 90.0 in stage 17.0 (TID 8770, n20p191,
executor 136): FetchFailed(BlockManagerId(65, n20p193, 7337, None),
shuffleId=3, mapId=247, reduceId=90, message=
org.apache.spark.shuffle.FetchFailedException: Connection reset by
peer
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:554)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:485)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:64)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:…)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.sort_addToSorter_0(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:811)
at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextOuterJoinRows(SortMergeJoinExec.scala:770)
at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceStream(SortMergeJoinExec.scala:934)
at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceNext(SortMergeJoinExec.scala:970)
at org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.sort_addToSorter_0(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage6.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:80)
at org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$13.apply(RDD.scala:845)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$13.apply(RDD.scala:845)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
… 1 more

Solution:

deep sleep
