org.apache.hadoop.hive.ql.exec.mr.MapRedTask. GC overhead limit exceeded


一、Description

  A Spark job is scheduled through Airflow; the scheduling configuration is as follows:

week_stat_filed_oper = SparkSubmitOperator(
    pool='adx',
    priority_weight=11,
    task_id="week_stat_filed_%s" % project,
    conf={
        "spark.network.timeout": 3600,
        "spark.sql.shuffle.partitions": 1000,
        "spark.yarn.executor.memoryOverhead": 8192,
        "spark.default.parallelism": 2000,
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.memory.useLegacyMode": True,
        "spark.storage.memoryFraction": 0.6,
        "spark.shuffle.memoryFraction": 0.2,
        "spark.storage.unrollFraction": 0.2,
        "spark.shuffle.io.maxRetries": 60,
        "spark.shuffle.io.retryWait": 60,
        "spark.sql.windowExec.buffer.spill.threshold": 1000000,
        "spark.shuffle.sort.bypassMergeThreshold": 10000,
        "spark.memory.offHeap.enabled": True,
        "spark.memory.offHeap.size": "16G"},
    executor_cores=day_stat_resource["executor_cores"],
    num_executors=30,
    executor_memory='16G',
    driver_memory='4G',
    application="/home/sdev/liujichao/online/adx_stat_2.11-0.3-SNAPSHOT.jar",
    java_class=stat_class,
    name="day_stat_filed_%s_%s" % (project, week),
    application_args=[project, day, 'week', run_mode, '--job_names', 'statRequestFiled'],
    jars=jars,
    dag=dag,
    verbose=True)

  The Spark job reads ORC data from Hive and writes the results to TiDB. After it was scheduled, it failed with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:5005)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:5000)
at org.apache.hive.com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:14334)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:14281)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:14370)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:14365)
at org.apache.hive.com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:15008)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata.<init>(OrcProto.java:14955)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:15044)
at org.apache.hadoop.hive.ql.io.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:15039)
at org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)

二、Analysis

  1、This is caused by running out of memory. The default split strategy (the hybrid strategy) may read ORC file footers and stripe statistics when planning splits, which is exactly what the OrcProto frames in the stack trace above are doing and which needs far more memory; the BI strategy generates splits from file-system information without reading the footers. Set hive.exec.orc.split.strategy to BI, as follows:

week_stat_filed_oper = SparkSubmitOperator(
    pool='adx',
    priority_weight=11,
    task_id="week_stat_filed_%s" % project,
    conf={
        "spark.network.timeout": 3600,
        "spark.sql.shuffle.partitions": 1000,
        "spark.yarn.executor.memoryOverhead": 8192,
        "spark.default.parallelism": 2000,
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.memory.useLegacyMode": True,
        "spark.storage.memoryFraction": 0.6,
        "spark.shuffle.memoryFraction": 0.2,
        "spark.storage.unrollFraction": 0.2,
        "spark.shuffle.io.maxRetries": 60,
        "spark.shuffle.io.retryWait": 60,
        "spark.sql.windowExec.buffer.spill.threshold": 1000000,
        "spark.shuffle.sort.bypassMergeThreshold": 10000,
        "spark.memory.offHeap.enabled": True,
        "spark.memory.offHeap.size": "16G",
        "hive.exec.orc.split.strategy": "BI"},
    executor_cores=day_stat_resource["executor_cores"],
    num_executors=30,
    executor_memory='16G',
    driver_memory='4G',
    application="/home/sdev/liujichao/online/adx_stat_2.11-0.3-SNAPSHOT.jar",
    java_class=stat_class,
    name="day_stat_filed_%s_%s" % (project, week),
    application_args=[project, day, 'week', run_mode, '--job_names', 'statRequestFiled'],
    jars=jars,
    dag=dag,
    verbose=True)

  The parameter did not take effect; the following warning was printed:

[2019-11-11 08:00:44,035] {logging_mixin.py:95} INFO - [2019-11-11 08:00:44,035] {spark_submit_hook.py:415} INFO - Warning: Ignoring non-spark config property: --hiveconf hive.exec.orc.split.strategy=BI

  2、spark-submit only forwards configuration properties whose names start with spark. and warns about anything else, so prepend spark. to the property and try again:

week_stat_filed_oper = SparkSubmitOperator(
    pool='adx',
    priority_weight=11,
    task_id="week_stat_filed_%s" % project,
    conf={
        "spark.network.timeout": 3600,
        "spark.sql.shuffle.partitions": 1000,
        "spark.yarn.executor.memoryOverhead": 8192,
        "spark.default.parallelism": 2000,
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.memory.useLegacyMode": True,
        "spark.storage.memoryFraction": 0.6,
        "spark.shuffle.memoryFraction": 0.2,
        "spark.storage.unrollFraction": 0.2,
        "spark.shuffle.io.maxRetries": 60,
        "spark.shuffle.io.retryWait": 60,
        "spark.sql.windowExec.buffer.spill.threshold": 1000000,
        "spark.shuffle.sort.bypassMergeThreshold": 10000,
        "spark.memory.offHeap.enabled": True,
        "spark.memory.offHeap.size": "16G",
        "spark.hive.exec.orc.split.strategy": "BI"},
    executor_cores=day_stat_resource["executor_cores"],
    num_executors=30,
    executor_memory='16G',
    driver_memory='4G',
    application="/home/sdev/liujichao/online/adx_stat_2.11-0.3-SNAPSHOT.jar",
    java_class=stat_class,
    name="day_stat_filed_%s_%s" % (project, week),
    application_args=[project, day, 'week', run_mode, '--job_names', 'statRequestFiled'],
    jars=jars,
    dag=dag,
    verbose=True)

  Online sources also mention that such settings can be passed as JVM options with -D, e.g. "-Djavax.jdo.option.ConnectionURL=jdbc:mysql://testip/hive?createDatabaseIfNotExist=true -Dhive.metastore.uris=thrift://testip:9083", again with the spark. prefix prepended; if the setting still does not take effect, this is worth trying.
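
  Two other routes for pushing a Hive/Hadoop property down to the ORC reader are worth noting. The sketch below is illustrative only and is not the configuration that was actually used in this job; it assumes the Airflow conf dict shown above and, for the in-application variant, a SparkSession built with Hive support (the real job is a Scala jar, but the call has the same shape there).

# Hedged sketch: alternative ways to pass hive.exec.orc.split.strategy (not the author's final fix).

# (1) In the Airflow operator, prefix the key with "spark.hadoop." -- spark-submit forwards it
#     because it starts with "spark.", and Spark copies spark.hadoop.* entries into the Hadoop
#     Configuration used when reading the Hive ORC table.
conf = {
    # ... the existing spark.* settings from the operator above ...
    "spark.hadoop.hive.exec.orc.split.strategy": "BI",
}

# (2) Set it inside the application itself as a runtime SQL conf; it is then picked up by
#     subsequent table reads.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.sql("SET hive.exec.orc.split.strategy=BI")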

  3、The job failed again, this time with the following error:

19/11/11 14:50:53 ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /tmp/blockmgr-c11ea1b6-ba3c-4e67-8ff0-35bf0c2be6f5/0c/temp_shuffle_82aa614d-b304-4069-a985-ecec16496892
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:225)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:141)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.flush(UnsafeRowSerializer.scala:91)
at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:157)

  The 64 GB of memory was eaten up.
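
  Whatever the memory situation, the IOException above is about disk: the shuffle spill files under /tmp/blockmgr-* filled the device they live on. A minimal sketch for checking this on the node, assuming Python 3 is available there; the SPARK_LOCAL_DIRS fallback to /tmp is only an assumption, and on YARN the NodeManager's local-dir settings normally decide where spills go:

import os
import shutil

# Report free space on the (first) directory Spark spills shuffle data to.
spill_dir = os.environ.get("SPARK_LOCAL_DIRS", "/tmp").split(",")[0]
usage = shutil.disk_usage(spill_dir)
print("%s: %.1f GiB free of %.1f GiB" % (spill_dir, usage.free / 2**30, usage.total / 2**30))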

  4、Scheduling the driver onto YARN (cluster deploy mode) produced the same error.

  5、Add the --conf "spark.driver.extraJavaOptions=-XX:-UseGCOverheadLimit" --conf "spark.executor.extraJavaOptions=-XX:-UseGCOverheadLimit" properties to disable the GC overhead check. This does not actually fix the memory problem; it only pushes the error further out. The command:

spark-submit --master yarn --conf spark.shuffle.io.maxRetries=60 --conf spark.default.parallelism=2000 --conf spark.shuffle.memoryFraction=0.2 --conf spark.storage.unrollFraction=0.2 --conf spark.memory.offHeap.size=16G --conf spark.sql.windowExec.buffer.spill.threshold=1000000 --conf spark.sql.shuffle.partitions=1000 --conf spark.memory.useLegacyMode=True --conf spark.storage.memoryFraction=0.6 --conf spark.network.timeout=3600 --conf spark.memory.offHeap.enabled=True --conf spark.shuffle.io.retryWait=60 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.shuffle.sort.bypassMergeThreshold=10000 --conf spark.yarn.executor.memoryOverhead=16384  --conf "spark.driver.extraJavaOptions=-XX:-UseGCOverheadLimit" --conf "spark.executor.extraJavaOptions=-XX:-UseGCOverheadLimit" --conf spark.yarn.driver.memoryOverhead=16384  --conf spark.executor.heartbeatInterval=1000s --jars /home/sdev/adx_stat/lib/mysql-connector-java-5.1.8.jar,/home/sdev/adx_stat/lib/scopt_2.11-3.7.1.jar,/home/sdev/adx_stat/lib/play-functional_2.11-2.4.3.jar,/home/sdev/adx_stat/lib/twirl-api_2.11-1.1.1.jar,/home/sdev/adx_stat/lib/play-datacommons_2.11-2.4.3.jar,/home/sdev/adx_stat/lib/play-ws_2.11-2.4.3.jar,/home/sdev/adx_stat/lib/zero-allocation-hashing-0.8.jar,/home/sdev/adx_stat/lib/mariadb-java-client-2.4.0.jar,/home/sdev/adx_stat/lib/build-link-2.4.3.jar,/home/sdev/adx_stat/lib/geoip2-2.5.0.jar,/home/sdev/adx_stat/lib/spark-redis-0.3.2.jar,/home/sdev/adx_stat/lib/opera-s3o-2.8.2.jar,/home/sdev/adx_stat/lib/flink-connector-kafka-0.9_2.11-1.7.0.jar,/home/sdev/adx_stat/lib/maxmind-db-1.1.0.jar,/home/sdev/adx_stat/lib/flink-scala_2.11-1.7.0.jar,/home/sdev/adx_stat/lib/play-netty-utils-2.4.3.jar,/home/sdev/adx_stat/lib/play-exceptions-2.4.3.jar,/home/sdev/adx_stat/lib/fastjson-1.2.58.jar,/home/sdev/adx_stat/lib/flink-streaming-java_2.11-1.7.0.jar,/home/sdev/adx_stat/lib/commons-pool2-2.4.3.jar,/home/sdev/adx_stat/lib/play-json_2.11-2.4.3.jar,/home/sdev/adx_stat/lib/RoaringBitmap-0.8.6.jar,/home/sdev/adx_stat/lib/tispark-core-1.2-jar-with-dependencies.jar,/home/sdev/adx_stat/lib/flink-streaming-scala_2.11-1.7.0.jar,/home/sdev/adx_stat/lib/mariadb-java-client-2.2.3.jar,/home/sdev/adx_stat/lib/play_2.11-2.4.3.jar,/home/sdev/adx_stat/lib/play-iteratees_2.11-2.4.3.jar,/home/sdev/adx_stat/lib/jedis-3.0.0.jar,/home/sdev/adx_stat/lib/flink-core-1.7.0.jar,/home/sdev/adx_stat/lib/tikv-client-1.2-jar-with-dependencies.jar,/home/sdev/adx_stat/lib/flink-connector-kafka-base_2.11-1.7.0.jar,/home/sdev/adx_stat/lib/async-http-client-1.9.21.jar,/home/sdev/adx_stat/lib/kafka-clients-0.9.0.1.jar --num-executors 15 --executor-cores 1 --executor-memory 16G --driver-memory 4G --name day_stat_filed_adx_request_44 --class com.opera.adx.job.Stat --verbose --queue adx --deploy-mode cluster /home/sdev/liujichao/online/adx_stat_2.11-0.3-SNAPSHOT.jar  adx_request 20191110 week official --job_names statRequestFiled

  But it then failed with the following error:

19/11/14 11:08:46 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
	at io.netty.util.internal.MpscLinkedQueue.offer(MpscLinkedQueue.java:126)
	at io.netty.util.internal.MpscLinkedQueue.add(MpscLinkedQueue.java:221)
	at io.netty.util.concurrent.SingleThreadEventExecutor.fetchFromScheduledTaskQueue(SingleThreadEventExecutor.java:259)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:346)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
19/11/14 11:08:46 ERROR ApplicationMaster: User class threw exception: java.lang.OutOfMemoryError: Java heap space

  6、Find the applicationId in the logs and inspect the application with the YARN CLI, as follows:

sdev@n-adx-hadoop-client-3:~/liujichao$ yarn application -status application_1535783866149_350166
19/11/15 08:50:29 INFO impl.TimelineClientImpl: Timeline service address: http://n17-07-04:8188/ws/v1/timeline/
19/11/15 08:50:29 INFO client.RMProxy: Connecting to ResourceManager at n17-07-04/172.17.28.13:8050
19/11/15 08:50:29 INFO client.AHSProxy: Connecting to Application History server at n17-07-04/172.17.28.13:10200
Application Report : 
	Application-Id : application_1535783866149_350166
	Application-Name : day_stat_filed_adx_request_44
	Application-Type : SPARK
	User : sdev
	Queue : adx
	Application Priority : null
	Start-Time : 1573804134950
	Finish-Time : 1573804520228
	Progress : 100%
	State : FINISHED
	Final-State : FAILED
	Tracking-URL : n17-06-02:18081/history/application_1535783866149_350166/2
	RPC Port : 0
	AM Host : 172.17.30.205
	Aggregate Resource Allocation : 231288441 MB-seconds, 5895 vcore-seconds
	Log Aggregation Status : TIME_OUT
	Diagnostics : User class threw exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
	Unmanaged Application : false
	Application Node Label Expression : 
	AM container Node Label Expression : 

  Apart from the java.lang.OutOfMemoryError: GC overhead limit exceeded diagnostic, there is nothing useful here.

  7、List the application's attempts, as follows:

sdev@n-adx-hadoop-client-3:~/liujichao$ yarn applicationattempt -list application_1535783866149_350166
19/11/15 08:53:25 INFO impl.TimelineClientImpl: Timeline service address: http://n17-07-04:8188/ws/v1/timeline/
19/11/15 08:53:25 INFO client.RMProxy: Connecting to ResourceManager at n17-07-04/172.17.28.13:8050
19/11/15 08:53:25 INFO client.AHSProxy: Connecting to Application History server at n17-07-04/172.17.28.13:10200
Total number of application attempts :2
         ApplicationAttempt-Id	               State	                    AM-Container-Id	                       Tracking-URL
appattempt_1535783866149_350166_000001	              FAILED	container_e39_1535783866149_350166_01_000001	http://n17-07-04:8088/cluster/app/application_1535783866149_350166
appattempt_1535783866149_350166_000002	            FINISHED	container_e39_1535783866149_350166_02_000001	http://n17-07-04:8088/proxy/application_1535783866149_350166/

  This shows that appattempt_1535783866149_350166_000001 failed; check its status, as follows:

sdev@n-adx-hadoop-client-3:~/liujichao$ yarn applicationattempt -status appattempt_1535783866149_350166_000001
19/11/15 08:55:01 INFO impl.TimelineClientImpl: Timeline service address: http://n17-07-04:8188/ws/v1/timeline/
19/11/15 08:55:01 INFO client.RMProxy: Connecting to ResourceManager at n17-07-04/172.17.28.13:8050
19/11/15 08:55:01 INFO client.AHSProxy: Connecting to Application History server at n17-07-04/172.17.28.13:10200
Application Attempt Report : 
	ApplicationAttempt-Id : appattempt_1535783866149_350166_000001
	State : FAILED
	AMContainer : container_e39_1535783866149_350166_01_000001
	Tracking-URL : http://n17-07-04:8088/cluster/app/application_1535783866149_350166
	RPC Port : -1
	AM Host : N/A
	Diagnostics : AM Container for appattempt_1535783866149_350166_000001 exited with  exitCode: 15
For more detailed output, check the application tracking page: http://n17-07-04:8088/cluster/app/application_1535783866149_350166 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e39_1535783866149_350166_01_000001
Exit code: 15
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : run as user is nobody
main : requested yarn user is sdev
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /data01/hadoop/yarn/log/nmPrivate/application_1535783866149_350166/container_e39_1535783866149_350166_01_000001/container_e39_1535783866149_350166_01_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...


Container exited with a non-zero exit code 15
Failing this attempt

  This output does not say which node the log file /data01/hadoop/yarn/log/nmPrivate/application_1535783866149_350166/container_e39_1535783866149_350166_01_000001/container_e39_1535783866149_350166_01_000001.pid.tmp is on.

  8、List the containers belonging to appattempt_1535783866149_350166_000001:

sdev@n-adx-hadoop-client-3:~/liujichao$ yarn container -list appattempt_1535783866149_350166_000001
19/11/15 09:03:08 INFO impl.TimelineClientImpl: Timeline service address: http://n17-07-04:8188/ws/v1/timeline/
19/11/15 09:03:08 INFO client.RMProxy: Connecting to ResourceManager at n17-07-04/172.17.28.13:8050
19/11/15 09:03:08 INFO client.AHSProxy: Connecting to Application History server at n17-07-04/172.17.28.13:10200
Total number of containers :1
                  Container-Id	          Start Time	         Finish Time	               State	                Host	   Node Http Address	                            LOG-URL
container_e39_1535783866149_350166_01_000001	Fri Nov 15 07:48:55 +0000 2019	Fri Nov 15 07:52:51 +0000 2019	            COMPLETE	n35-02.fn.ams.osa:45454	http://n35-02.fn.ams.osa:8042	http://n17-07-04:8188/applicationhistory/logs/n35-02.fn.ams.osa:45454/container_e39_1535783866149_350166_01_000001/container_e39_1535783866149_350166_01_000001/sdev

  9、Check the status of container_e39_1535783866149_350166_01_000001, as follows:

sdev@n-adx-hadoop-client-3:~/liujichao$ yarn container -status  container_e39_1535783866149_350166_01_000001
19/11/15 09:04:39 INFO impl.TimelineClientImpl: Timeline service address: http://n17-07-04:8188/ws/v1/timeline/
19/11/15 09:04:39 INFO client.RMProxy: Connecting to ResourceManager at n17-07-04/172.17.28.13:8050
19/11/15 09:04:39 INFO client.AHSProxy: Connecting to Application History server at n17-07-04/172.17.28.13:10200
Container Report : 
	Container-Id : container_e39_1535783866149_350166_01_000001
	Start-Time : 1573804135113
	Finish-Time : 1573804371325
	State : COMPLETE
	LOG-URL : http://n17-07-04:8188/applicationhistory/logs/n35-02.fn.ams.osa:45454/container_e39_1535783866149_350166_01_000001/container_e39_1535783866149_350166_01_000001/sdev
	Host : n35-02.fn.ams.osa:45454
	NodeHttpAddress : http://n35-02.fn.ams.osa:8042
	Diagnostics : Exception from container-launch.
Container id: container_e39_1535783866149_350166_01_000001
Exit code: 15
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : run as user is nobody
main : requested yarn user is sdev
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /data01/hadoop/yarn/log/nmPrivate/application_1535783866149_350166/container_e39_1535783866149_350166_01_000001/container_e39_1535783866149_350166_01_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...


Container exited with a non-zero exit code 15

  From the above, /data01/hadoop/yarn/log/nmPrivate/application_1535783866149_350166/container_e39_1535783866149_350166_01_000001/container_e39_1535783866149_350166_01_000001.pid.tmp is on node n35-02.fn.ams.osa. Unfortunately, ssh-ing to that node to look at the file is not permitted. Frustrating!

sdev@n-adx-hadoop-client-3:~/liujichao$ ssh n35-02.fn.ams.osa
The authenticity of host 'n35-02.fn.ams.osa (172.17.30.14)' can't be established.
ECDSA key fingerprint is bb:ca:01:8d:1a:14:f2:51:06:58:e0:95:e3:cc:15:8d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'n35-02.fn.ams.osa,172.17.30.14' (ECDSA) to the list of known hosts.
Permission denied (publickey).

  10、The only option left is to pull the logs with yarn logs, as follows:

sdev@n-adx-hadoop-client-3:~/liujichao$ yarn logs -applicationId application_1535783866149_350166 -containerId container_e39_1535783866149_350166_01_000001 --nodeAddress  n35-02.fn.ams.osa:45454 -logFiles stdout > executor1stdout1.txt
19/11/15 09:10:37 INFO impl.TimelineClientImpl: Timeline service address: http://n17-07-04:8188/ws/v1/timeline/
19/11/15 09:10:37 INFO client.RMProxy: Connecting to ResourceManager at n17-07-04/172.17.28.13:8050
19/11/15 09:10:37 INFO client.AHSProxy: Connecting to Application History server at n17-07-04/172.17.28.13:10200
19/11/15 09:10:38 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
19/11/15 09:10:38 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
19/11/15 09:10:38 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
sdev@n-adx-hadoop-client-3:~/liujichao$ 

  Download the generated file and feed it to GCeasy; GCeasy reports an error when parsing it.

 

  https://anish749.github.io/spark/analyzing-java-garbage-collection-logs-debugging-optimizing-apache-spark/
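
  A likely reason GCeasy cannot parse the file is that the stdout pulled above contains no GC records at all, because GC logging was never enabled for this run. Following the approach in the linked article, verbose GC logging can be switched on through the extra Java options; without -Xloggc the records go to each JVM's stdout, which is exactly what yarn logs -logFiles stdout retrieves. This is a hedged sketch using standard HotSpot flags, not something that was applied in the job above:

# Hedged sketch: enable GC logging so the container stdout can be analysed with GCeasy.
# Standard HotSpot (JDK 8) flags, appended to the options already used above.
gc_flags = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps"
conf = {
    # ... existing spark.* settings ...
    "spark.driver.extraJavaOptions": "-XX:-UseGCOverheadLimit " + gc_flags,
    "spark.executor.extraJavaOptions": "-XX:-UseGCOverheadLimit " + gc_flags,
}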

  

三、Solution

四、References

  1、Fixing ORC metadata sections that exceed the protobuf message size limit
  2、Hive GC overhead limit exceeded
  3、Notes on the split algorithms of Hive's various InputFormats

