
spark
侠客刀
Simple, clear, efficient
Spark Streaming restart + deleting job logs
Scheduled log cleanup and restart for Spark Streaming.
Original · 2022-06-21 16:27:06 · 283 views · 1 comment
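The post body is not excerpted here, so what follows is only a minimal sketch of the log-cleanup half. It assumes the streaming job writes plain `.log` files somewhere under a single local directory and that a scheduler such as cron calls the function periodically. The `purge_old_logs` helper, the suffix filter and the 7-day default are hypothetical, and the restart half is not shown.

```python
import time
from pathlib import Path


def purge_old_logs(log_dir, max_age_days=7, suffix=".log", now=None):
    """Delete files ending in `suffix` anywhere under `log_dir` whose
    modification time is older than `max_age_days`. Returns deleted paths, sorted."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted = []
    for path in Path(log_dir).rglob(f"*{suffix}"):
        # Only regular files are removed; directories are left in place.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(str(path))
    return sorted(deleted)
```

A crontab entry that runs a small script calling, for example, `purge_old_logs("/path/to/job/logs", max_age_days=7)` once a day would cover the "scheduled" part; the path there is a placeholder.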
spark-env.sh configuration
vim /conf/spark-env.sh
#!/usr/bin/env bash
export JAVA_HOME=/opt/module/jdk1.8.0_221
export SCALA_HOME=/opt/module/scala-2.13.5
export HADOOP_HOME=/opt/module/hadoop-3.1.4
export HADOOP_CONF_DIR=/opt/module/hadoop-3.1.4/etc/hadoop
export SPARK_MASTER_IP=n…
Original · 2021-03-11 11:57:09 · 3950 views · 0 comments
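If the same paths are used on several machines, one option is to keep them in one place and generate `spark-env.sh` from them. This is a hypothetical sketch: `render_spark_env` and `SETTINGS` are made-up names. The paths are the ones in the excerpt, and `SPARK_MASTER_IP` is omitted because its value is truncated above. Newer Spark releases' `spark-env.sh.template` lists `SPARK_MASTER_HOST` for binding the master, which may be worth checking against the Spark version in use.

```python
import shlex


def render_spark_env(settings):
    """Return spark-env.sh text: a bash shebang, then one
    `export NAME=value` line per entry, in insertion order."""
    lines = ["#!/usr/bin/env bash"]
    for name, value in settings.items():
        # shlex.quote leaves plain paths untouched and quotes values with spaces etc.
        lines.append(f"export {name}={shlex.quote(value)}")
    return "\n".join(lines) + "\n"


# Paths taken from the spark-env.sh excerpt above.
SETTINGS = {
    "JAVA_HOME": "/opt/module/jdk1.8.0_221",
    "SCALA_HOME": "/opt/module/scala-2.13.5",
    "HADOOP_HOME": "/opt/module/hadoop-3.1.4",
    "HADOOP_CONF_DIR": "/opt/module/hadoop-3.1.4/etc/hadoop",
}
```

The output of `render_spark_env(SETTINGS)` can then be written to the Spark `conf/spark-env.sh` on each node.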
is running beyond virtual memory limits. [Virtual memory limit exceeded]
Virtual memory limit exceeded. Error message:
Container [pid=30866,containerID=container_1600927953860_0003_02_000001] is running beyond virtual memory limits. Current usage: 117.3 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
yarn Container…
Original · 2020-09-28 15:16:30 · 2955 views · 1 comment
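The "2.1 GB" in this message comes from YARN's virtual-memory check. The limit is the container's physical allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio`, whose default is 2.1. A 1 GB container may therefore use at most 2.1 GB of virtual memory, and 2.2 GB trips the check. The sketch below only reproduces that arithmetic; the function names are hypothetical. Commonly cited remedies, not necessarily the one this post uses, are raising that ratio or setting `yarn.nodemanager.vmem-check-enabled` to `false` in `yarn-site.xml`.

```python
# YARN's default for yarn.nodemanager.vmem-pmem-ratio.
DEFAULT_VMEM_PMEM_RATIO = 2.1


def vmem_limit_gb(physical_gb, ratio=DEFAULT_VMEM_PMEM_RATIO):
    """Virtual memory limit derived from a container's physical allocation."""
    return physical_gb * ratio


def exceeds_vmem(used_vmem_gb, physical_gb, ratio=DEFAULT_VMEM_PMEM_RATIO):
    """True when the NodeManager's virtual-memory check would kill the container
    (the check only runs while yarn.nodemanager.vmem-check-enabled is true)."""
    return used_vmem_gb > vmem_limit_gb(physical_gb, ratio)


# Values from the log above: 1 GB physical allocation, 2.2 GB virtual memory used.
print(f"limit = {vmem_limit_gb(1):.1f} GB")  # limit = 2.1 GB
print(exceeds_vmem(2.2, 1))                  # True
print(exceeds_vmem(2.2, 1, ratio=4))         # False
```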
Getting started with PySpark SQL
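A simple introduction to Spark SQL, developed in Python:
1. Read a local CSV file into a DataFrame.
2. Register the DataFrame as a Spark SQL temporary table (view).
3. Query it with the spark.sql() function, which returns a DataFrame, or get the data directly through the DataFrame API.
Original · 2020-09-28 11:01:14 · 1934 views · 0 comments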
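The three steps above can be sketched as follows, assuming PySpark is installed and run in local mode. The CSV contents (`name`, `age` columns), the view name `people`, and both helper functions are made up for illustration. `spark.read.csv`, `createOrReplaceTempView` and `spark.sql` are the standard PySpark calls for these steps.

```python
import csv


def write_sample_csv(path):
    """Write a tiny hypothetical CSV (header plus 3 rows) to experiment with."""
    rows = [("name", "age"), ("Alice", 34), ("Bob", 28), ("Cathy", 41)]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows) - 1  # number of data rows


def query_people(csv_path, min_age=30):
    """CSV -> DataFrame -> temp view -> SQL, plus the DataFrame API equivalent."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("pyspark-sql-intro").getOrCreate()
    try:
        # 1. Read the local CSV, using the header row and inferring column types.
        df = spark.read.csv(csv_path, header=True, inferSchema=True)
        # 2. Register the DataFrame as a temporary view visible to SQL.
        df.createOrReplaceTempView("people")
        # 3a. spark.sql() returns a DataFrame.
        by_sql = spark.sql(f"SELECT name, age FROM people WHERE age > {int(min_age)} ORDER BY name")
        # 3b. The same query through the DataFrame API.
        by_api = df.filter(df.age > min_age).select("name", "age").orderBy("name")
        return [tuple(r) for r in by_sql.collect()], [tuple(r) for r in by_api.collect()]
    finally:
        spark.stop()
```

With PySpark available, calling `query_people` on the sample file should return the Alice and Cathy rows from both queries, which shows that the SQL and DataFrame API routes are interchangeable.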
Submitting Python jobs with spark-submit
Submitting a Python job with spark-submit: run a pyspark script with spark-submit while specifying which Python environment to use:
spark-submit --conf "spark.pyspark.driver.python=/usr/bin/python3.5" --conf "spark.pyspark.python=/usr/bin/python3.5" main.py
Original · 2020-09-28 10:19:04 · 884 views · 0 comments
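When the job is launched from a script, the same command can be assembled in Python and passed to `subprocess.run`. This is a sketch: `build_submit_cmd` and `submit` are hypothetical helpers, and `/usr/bin/python3.5` and `main.py` are the values from the command above. `spark.pyspark.driver.python` and `spark.pyspark.python` are the Spark properties that choose the driver and executor interpreters.

```python
import subprocess


def build_submit_cmd(script, python="/usr/bin/python3.5", driver_python=None, extra_conf=None):
    """Return the spark-submit argv list that pins the Python interpreter.

    driver_python defaults to the executor interpreter, as in the command above."""
    confs = {
        "spark.pyspark.driver.python": driver_python or python,
        "spark.pyspark.python": python,
    }
    confs.update(extra_conf or {})
    cmd = ["spark-submit"]
    for key, value in confs.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)  # the application script must follow all options
    return cmd


def submit(script, **kwargs):
    """Run spark-submit; raises CalledProcessError if it exits non-zero."""
    return subprocess.run(build_submit_cmd(script, **kwargs), check=True)
```

Because the arguments are passed as a list with no shell in between, the double quotes around each `--conf` value in the original command line are not needed here.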
Spark and Hive integration configuration
Outline: 1. Hive environment setup (already done, skipped) 2. JAR files 3. hive-site.xml 4. Testing
1. The Hive environment is already configured; skipped here.
2. JAR files:
cp {HIVE_HOME}/lib/mysql-connector-java-5.1.44-bin.jar {SPARK_HOME}/jars/
cp {HIVE_HOME}/conf/hive-site.xml {SPARK_HOME}/conf
3. hive-site.xml: edit {SPAR…
Original · 2020-09-27 15:27:01 · 419 views · 0 comments
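The two `cp` commands above can also be scripted, which is handy when several Spark installations need the same files. A minimal sketch, assuming `HIVE_HOME` and `SPARK_HOME` are set as environment variables unless passed explicitly. The function name is hypothetical; the JAR name is the one from the post.

```python
import os
import shutil
from pathlib import Path

CONNECTOR_JAR = "mysql-connector-java-5.1.44-bin.jar"


def copy_hive_files_to_spark(hive_home=None, spark_home=None, jar_name=CONNECTOR_JAR):
    """Copy the MySQL connector JAR and hive-site.xml from Hive into Spark,
    mirroring the two cp commands above. Returns the destination paths."""
    hive = Path(hive_home or os.environ["HIVE_HOME"])
    spark = Path(spark_home or os.environ["SPARK_HOME"])
    copies = [
        (hive / "lib" / jar_name, spark / "jars" / jar_name),
        (hive / "conf" / "hive-site.xml", spark / "conf" / "hive-site.xml"),
    ]
    for src, dst in copies:
        if not src.is_file():
            raise FileNotFoundError(src)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
    return [dst for _, dst in copies]
```

For the testing step, a session built with `SparkSession.builder.enableHiveSupport().getOrCreate()` should then be able to run `spark.sql("show databases")` against the Hive metastore.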
Quick start to Spark programming in IDEA
A one-stop quick start to Spark programming in IDEA: download and install IDEA, then configure the JDK, Maven, Scala, and Spark.
Original · 2020-09-18 16:53:56 · 539 views · 0 comments
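Before setting up the project in IDEA, it can help to confirm that the command-line tools are installed. This is a minimal sketch that assumes the executables are named `java`, `mvn` and `scala` and are expected on `PATH`. IDEA can also use JDKs and SDKs that are not on `PATH`, so a missing entry is a hint rather than a hard failure. Both helper names are hypothetical.

```python
import shutil

# Assumed executable names for the JDK, Maven and Scala.
REQUIRED_TOOLS = ("java", "mvn", "scala")


def check_toolchain(tools=REQUIRED_TOOLS):
    """Map each tool to its resolved path on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools=REQUIRED_TOOLS):
    """Names of the tools that could not be found on PATH."""
    return [name for name, path in check_toolchain(tools).items() if path is None]
```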