Main Contents
(1) Installing and Configuring JDK + Hadoop (CentOS 7)
Tools: Xshell (to connect to the CentOS host), Xftp (file transfer between Windows and the CentOS host);
Tip: it is recommended to place all packages downloaded in this guide under /usr/local;
Pre-installation setup
- Disable the firewall;
- Disable SELinux;
- Download the JDK (version 1.8, the tar.gz package, is recommended) and upload it to the CentOS host;
- Extract the JDK package, configure the Java environment variables on CentOS, and make the configuration take effect;
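A minimal sketch of this step, assuming the archive is jdk-8u201-linux-x64.tar.gz (any 1.8 tar.gz release works; adjust the names to your version). The /usr/java symlink is an assumption chosen so that the JAVA_HOME=/usr/java value used later in hadoop-env.sh and hbase-env.sh resolves correctly.
cd /usr/local
tar -zxvf jdk-8u201-linux-x64.tar.gz
ln -s /usr/local/jdk1.8.0_201 /usr/java      # symlink assumed for consistency with later steps
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/java
export PATH=$PATH:$JAVA_HOME/bin
EOF
source /etc/profile      # make the configuration take effect
java -version            # verify the installation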
Installing Hadoop (pseudo-distributed mode)
- Download Hadoop (version 2.7.7 is used here) and upload it to the CentOS host;
- Extract the package;
- Configure the Hadoop environment variables and make the configuration take effect;
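A minimal sketch of the extract-and-configure step, assuming the archive hadoop-2.7.7.tar.gz sits in /usr/local (the resulting /usr/local/hadoop-2.7.7 path matches the one referenced in the format step further down).
cd /usr/local
tar -zxvf hadoop-2.7.7.tar.gz
cat >> /etc/profile <<'EOF'
export HADOOP_HOME=/usr/local/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source /etc/profile      # make the configuration take effect
hadoop version           # verify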
- Configure Hadoop's five configuration files (inside Hadoop's etc/hadoop directory) (not needed in standalone mode);
First: hadoop-env.sh;
[root@hadoop hadoop]# vi hadoop-env.sh    // add the following line
# in hadoop-2.7.7 this is line 25
# use :set number in vi to display line numbers
export JAVA_HOME=/usr/java
Second: core-site.xml (the Hadoop/HDFS core configuration file);
[root@hadoop hadoop]# vi core-site.xml    // add the following lines
<configuration>
<!-- The file system schema (URI) used by Hadoop, i.e. the address of the HDFS master (NameNode) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:9000</value> <!-- "hadoop" is the hostname -->
</property>
<!-- Directory where files generated by Hadoop at runtime are stored -->
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
Third: hdfs-site.xml;
[root@hadoop hadoop]# vi hdfs-site.xml    // add the following lines
<configuration>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Fourth: mapred-site.xml (Hadoop 2.7.7 ships only mapred-site.xml.template; copy it to mapred-site.xml first)
[root@hadoop hadoop]# vi mapred-site.xml    // add the following lines
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Fifth: yarn-site.xml;
[root@hadoop hadoop]# vi yarn-site.xml    // add the following lines
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Address of the YARN master (ResourceManager) -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop</value>
</property>
<!-- How reducers fetch data -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Set up passwordless SSH (configure ssh and generate a key pair so that ssh can connect to localhost without a password);
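A minimal sketch of the passwordless-SSH setup, assuming a single node and the root user (accept the defaults at every prompt):
ssh-keygen -t rsa                                   # generate the key pair
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys     # authorize the public key locally
chmod 600 ~/.ssh/authorized_keys
ssh localhost                                       # should now log in without a password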
- Start the Hadoop cluster;
First format the NameNode:
Note: if this is not the first format, delete the tmp and logs directories under /usr/local/hadoop-<version>/ before formatting;
[root@hadoop hadoop]# hdfs namenode -format    # if no errors are reported and the output ends with a "successfully formatted" message, the format succeeded
Start the Hadoop cluster: start-all.sh
Stop the Hadoop cluster: stop-all.sh
Check the Hadoop processes:
[root@hadoop hadoop]# jps    // if the following processes appear, the configuration succeeded
1776 SecondaryNameNode
2340 Jps
1478 NameNode
1609 DataNode
1930 ResourceManager
2219 NodeManager
- Web UI ports: 50070 (HDFS management UI, NameNode) and 8088 (MapReduce/YARN management UI);
- WordCount experiment
hdfs dfs -put in.txt /input    # upload in.txt from the local current directory to the /input directory on HDFS (create /input first with hdfs dfs -mkdir /input);
Run: hadoop jar hadoop-mapreduce-examples-2.7.7.jar wordcount /input/in.txt /output    (the examples jar is under $HADOOP_HOME/share/hadoop/mapreduce/);
View the word-frequency results in the /output/part-r-00000 file on the port 50070 web UI.
(2) Installing JDK + Eclipse + Maven (Windows 10)
JDK
- Download the JDK (again version 1.8, the .exe installer);
Official site: the JDK 1.8 download page;
- Configure the JDK environment variables;
Eclipse
- Download the Eclipse installer eclipse-inst-win64 from the official site;
- Install Eclipse;
- Configure the Java environment in Eclipse;
Maven
- Download Maven;
From the Maven official site, choose a nearby mirror and download the archive apache-maven-3.6.0-bin.tar.gz.
- Extract the Maven archive;
Extract apache-maven-3.6.0-bin.tar.gz and copy the resulting \apache-maven-3.6.0 folder to a path of your choice, e.g. C:\eclipse\apache-maven-3.6.0.
- Configure the Maven environment variables;
- Configure Maven in Eclipse;
① Modify settings.xml;
Under the installation folder \apache-maven-3.6.0, create a \repository folder to serve as the local Maven repository. In settings.xml, point the local repository (the <localRepository> element) to C:\eclipse\apache-maven-3.6.0\repository.
② Configure Maven's Installations and User Settings;
In [Preferences] → [Maven] → [Installations], set the Maven installation path; in [User Settings], set the path to settings.xml.
- Create a new Maven project in Eclipse
- Modify pom.xml in the Maven project:
The dependencies can be found in the Maven Repository (search for hadoop).
// Find the three dependencies for the matching version (shown below) and copy them between <project> and </project> in pom.xml; after saving, the Maven Dependencies are generated automatically
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.7</version>
</dependency>
</dependencies>
Java programs for HDFS
- HDFSMKdir.java creates the HDFS directory /aadir.
package hdfs.files;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HDFSMKdir {
public static void main(String[] args) throws IOException {
System.setProperty("HADOOP_USER_NAME", "root");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop:9000");
FileSystem client = FileSystem.get(conf);
client.mkdirs(new Path("/aadir"));
client.close();
System.out.println("successfully!");
}
}
- HDFSUpload.java writes/uploads the local file c:\hdfs\aa.txt to the /aadir directory on HDFS (the code below uses /usr/local/hdfs/aa.txt as the local path, since the jar is run on the CentOS host).
package hdfs.files;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HDFSUpload {
private static InputStream input;
private static OutputStream output;
public static void main(String[] args) throws IOException{
System.setProperty("HADOOP_USER_NAME", "root");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop:9000");
FileSystem client = FileSystem.get(conf);
input = new FileInputStream("/usr/local/hdfs/aa.txt");
output = client.create(new Path("/aadir/aaout.txt"));
byte[] buffer = new byte[1024];
int len = 0;
while ((len=input.read(buffer))!=-1){
output.write(buffer, 0, len);
}
output.flush();
input.close();
output.close();
}
}
- HDFSDownload.java reads/downloads the HDFS file /bb.txt from the root directory to the local file system (the code below saves it to /usr/local/hdfs/bbout.txt, since the jar is run on the CentOS host; /bb.txt must already exist on HDFS).
package hdfs.files;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HDFSDownload {
// declare the input and output streams
private static InputStream input;
private static OutputStream output;
public static void main(String[] args) throws IOException {
//run as the root user
System.setProperty("HADOOP_USER_NAME", "root");
//create the HDFS connection object client
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop:9000");
FileSystem client = FileSystem.get(conf);
//open an input stream on the HDFS file
input = client.open(new Path("/bb.txt"));
//create an output stream for the local file
output = new FileOutputStream("/usr/local/hdfs/bbout.txt");
//copy the HDFS file to the local file system
byte[] buffer = new byte[1024];
int len = 0;
while ((len=input.read(buffer))!=-1){
output.write(buffer, 0, len);
}
//flush so that no output is lost
output.flush();
//the IOUtils utility class can also be used to upload or download
//IOUtils.copy(input, output);
//close the input and output streams
input.close();
output.close();
}
}
- HDFSFileIfExist.java checks whether the HDFS file /bb.txt exists.
package hdfs.files;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class HDFSFileIfExist {
public static void main(String[] args) throws IOException{
System.setProperty("HADOOP_USER_NAME", "root");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop:9000");
FileSystem client = FileSystem.get(conf);
String fileName = "/bb.txt";
if (client.exists(new Path(fileName))) {
System.out.println("file exists!");
}else {
System.out.println("file does not exist!");
}
}
}
- Create four Maven projects and put one of the four Java programs into each;
- Package each of the four Maven projects into a jar;
Add the packaging plugin to each project's pom.xml:
// add inside <project> </project>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>hdfs.files.HDFSMKdir</mainClass> <!-- use the main class of the program in each project, e.g. hdfs.files.HDFSUpload -->
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
- Upload the generated jars to the CentOS host and run them;
Command: java -jar <jar name> (run from inside the directory that contains the jar); the result is printed to the console.
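A hypothetical example (the jar name depends on your Maven artifactId and version; with the assembly plugin above it typically ends in jar-with-dependencies.jar):
cd /usr/local                       # the directory the jar was uploaded to (assumption)
java -jar hdfsmkdir-0.0.1-SNAPSHOT-jar-with-dependencies.jar
hdfs dfs -ls /                      # check that /aadir was created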
WordCount Java program experiment
- Create a Maven project
- Add the following dependencies to pom.xml:
// **Note**: add them inside the <project></project> tags
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.0</version>
</dependency>
</dependencies>
- Create the following Java program (**Note:** the /usr/local/hdfs/input/cc.txt directory and file must be created on the CentOS host first)
WordCountDriver.java:
package hdfs.files;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountDriver {
public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
//break the input data into words
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
//input data, e.g. "hello world love work"
String line = value.toString();
//split the line
String[] words=line.split(" ");
//emit <hello, 1>
for(String w:words) {
//write out to the reducer side
context.write(new Text(w), new IntWritable(1));
}
}
}
public static class WordCountReducer extends Reducer <Text, IntWritable, Text, IntWritable>{
protected void reduce(Text Key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
//number of occurrences
int sum=0;
//accumulate the sum and write out
for(IntWritable v:values) {
sum +=v.get();
}
context.write(Key, new IntWritable(sum));
}
}
public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {
// run as the root user
System.setProperty("HADOOP_USER_NAME", "root");
//create the job
Configuration conf=new Configuration();
Job job=Job.getInstance(conf);
//locate the jar by the driver class
job.setJarByClass(WordCountDriver.class);
//the Mapper class to use
job.setMapperClass(WordCountMapper.class);
//the Reducer class to use
job.setReducerClass(WordCountReducer.class);
//output types of the Mapper stage
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
//output types of the Reducer stage
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//input path and file name
FileInputFormat.setInputPaths(job, new Path("/usr/local/hdfs/input/cc.txt"));
//output path
FileOutputFormat.setOutputPath(job, new Path("/usr/local/hdfs/output"));
//submit the job and wait for it to finish
Boolean rs=job.waitForCompletion(true);
//exit
System.exit(rs?0:1);
}
}
- Package this Maven project, upload it to the CentOS host, and run it; the run steps are the same as before;
(add the same packaging plugin as earlier; just copy it over);
(3) Installing and Configuring HBase
Installing HBase
- Download the HBase archive and transfer it to the CentOS host with Xftp;
Note: choose a release that is compatible with the Hadoop version you installed;
Download from the official HBase download page;
Choose the stable release hbase-1.4.9-bin.tar.gz and download it on Windows;
- Extract it into /usr/local;
- Configure the environment variables and make them take effect;
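A minimal sketch of this step, assuming the archive sits in /usr/local and is renamed to hbase (which matches the /usr/local/hbase/conf path used below):
cd /usr/local
tar -zxvf hbase-1.4.9-bin.tar.gz
mv hbase-1.4.9 hbase
cat >> /etc/profile <<'EOF'
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
EOF
source /etc/profile      # make the variables take effect
hbase version            # verify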
Configuring HBase (pseudo-distributed mode)
- Modify the configuration files (in /usr/local/hbase/conf):
① Configure hbase-env.sh
Set the Java installation path
[root@hadoop conf]# vi hbase-env.sh    // add the following lines
export JAVA_HOME=/usr/java
Set the HBase configuration file path
export HBASE_CLASSPATH=/usr/local/hbase/conf
Use HBase's built-in ZooKeeper by setting this parameter to true
export HBASE_MANAGES_ZK=true
② Configure hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop:9000/hbase</value>
</property>
<!-- Distributed mode; false (the default) means standalone mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- Address list of the ZooKeeper quorum; pseudo-distributed mode uses the default localhost -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
</configuration>
③ Start and run HBase (start Hadoop first); the web UI port is 16010
Start HBase (start-hbase.sh) and check with jps
Stop HBase (stop-hbase.sh)
Using the HBase database
- Enter the shell: hbase shell
- Create a table hbase_1102 with two column families, cf1 and cf2
hbase(main):041:0> create 'hbase_1102', {NAME=>'cf1'}, {NAME=>'cf2'}
- Add data to the table. When adding data to an HBase table, you can only add one column at a time, not several columns at once.
hbase(main):042:0> put 'hbase_1102', '001', 'cf1:name', 'Tom'
hbase(main):043:0> put 'hbase_1102', '001', 'cf1:gender', 'man'
hbase(main):044:0> put 'hbase_1102', '001', 'cf2:chinese', '90'
hbase(main):045:0> put 'hbase_1102', '001', 'cf2:math', '91'
- View all the data in the table
hbase(main):046:0> scan 'hbase_1102'
- View the data for a particular row key
hbase(main):048:0> get 'hbase_1102', '001'
- Delete a single cell
hbase(main):050:0> delete '<table name>', '<row key>', '<column family:column>'
- Delete a whole row
hbase(main):052:0> deleteall '<table name>', '<row key>'
- Delete a table (first disable '<table name>', then drop '<table name>')
HBase Java API
1. As in the previous experiments, first create a Maven project;
2. Add the relevant dependencies to pom.xml;
As follows:
// add inside the <project></project> tags
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.7</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-it</artifactId>
<version>1.4.9</version>
</dependency>
</dependencies>
- Write the Java program:
package hbase.tables;
import java.io.IOException;
import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
public class HbaseTables {
public static Configuration conf;
public static Connection con;
public static Admin adm;
@SuppressWarnings("all")
public static void init() throws IOException {
conf=HBaseConfiguration.create();
conf.set("hbase.rootdir", "hdfs://47.106.78.4:9000/hbase");
con= ConnectionFactory.createConnection(conf);
adm = con.getAdmin();
System.out.println(adm);
}
public static void createTable(String myTableName, String[] colFamily) throws IOException {
init();
TableName tableName = TableName .valueOf(myTableName);
if (adm.tableExists(tableName)) {
System.out.println("table is exists!"); }else {
HTableDescriptor htd=new HTableDescriptor(tableName);
for(String str:colFamily) {
HColumnDescriptor hcd =new HColumnDescriptor(str);
htd.addFamily(hcd);
}
adm.createTable(htd);
}
close();
}
public static void close() {
try {
if (adm != null) { adm.close();
}
if (con != null) { con.close();
}
}catch (IOException e) {
e.printStackTrace();}
}
public static void deleteTable(String myTableName) throws IOException {
init();
TableName tableName = TableName .valueOf(myTableName);
if (adm.tableExists(tableName)) {
adm.disableTable(tableName);
adm.deleteTable(tableName);
}
close();
}
public static void listTables() throws IOException {
init();
HTableDescriptor htds[] =adm.listTables();
for(HTableDescriptor htd : htds) {
System.out.println(htd.getNameAsString());
}
close();
}
public static void insertRow(String myTableName, String rowKey, String colFamily, String col, String val) throws IOException {
init();
TableName tableName = TableName .valueOf(myTableName);
@SuppressWarnings("deprecation")
HTable table = new HTable(conf,tableName);
Put put=new Put(rowKey.getBytes());
put.addColumn(colFamily.getBytes(), col.getBytes(), val.getBytes());
table.put(put);
table.close();
close();
}
private static void deleteRow(String myTableName, String rowKey, String colFamily, String col) throws IOException {
init();
TableName tableName =TableName .valueOf(myTableName);
@SuppressWarnings("deprecation")
HTable table = new HTable(conf, tableName);
Delete delete=new Delete(rowKey.getBytes());
delete.addFamily(Bytes.toBytes(colFamily));
delete.addColumn(Bytes.toBytes(colFamily), Bytes.toBytes(col));
table.delete(delete);
table.close();
close();
}
public static void getData(String myTableName, String rowKey, String colFamily, String col) throws IOException {
init();
TableName tableName = TableName .valueOf(myTableName);
@SuppressWarnings("deprecation")
HTable table = new HTable(conf, tableName);
Get get= new Get(rowKey.getBytes());
Result result = table.get(get);
showCell(result);
table.close();
close();
}
private static void showCell(Result result) {
Cell[] cells = result.rawCells();
for (Cell cell : cells) {
System.out.println("RowName:" + new String(CellUtil.cloneRow(cell)) + " ");
System.out.println("Timestamp:" + cell.getTimestamp() + " ");
System.out.println("column Family:" + new String(CellUtil.cloneFamily(cell)) + " ");
System.out.println("column Name:" + new String(CellUtil.cloneQualifier(cell)) + " ");
System.out.println("value:" + new String(CellUtil.cloneValue(cell)) + " ");
}
}
public static void main(String[] args) throws IOException {
System.out.println("*****Please enter the number:1.createtable/2.insertRow/3.getData/4.deleteRow/5.listTables/6.deleteTable*****");
for(int j=0;j<7;j++) {
int i = 0;
@SuppressWarnings("resource")
Scanner scan = new Scanner(System.in);
i = scan.nextInt();
switch (i) {
case 1:
System.out.println("please enter tablename:");
String tbn = scan.next();
String[] cf = {"cf1", "cf2"};
HbaseTables.createTable(tbn, cf);
System.out.println("createTable success!!!");
break;
case 2:
System.out.println("please enter tablename:");
String tbn1 = scan.next();
System.out.println("please enter rowkey:");
String rk1 = scan.next();
System.out.println("please enter column:");
String clm1 = scan.next();
System.out.println("please enter colname:");
String cn1 = scan.next();
System.out.println("please enter colvalue:");
String cv1 = scan.next();
HbaseTables.insertRow(tbn1, rk1, clm1, cn1, cv1);
System.out.println("insertRow success!!!");
break;
case 3:
System.out.println("please enter tablename:");
String tbn2 = scan.next();
System.out.println("please enter rowkey:");
String rk2 = scan.next();
System.out.println("please enter column family:");
String cf2 = scan.next();
System.out.println("please enter colname:");
String cn2 = scan.next();
HbaseTables.getData(tbn2, rk2, cf2, cn2);
System.out.println("getData success!!!");
break;
case 4:
System.out.println("please enter tablename:");
String tbn3 = scan.next();
System.out.println("please enter rowkey:");
String rk3 = scan.next();
System.out.println("please enter column:");
String clm3 = scan.next();
System.out.println("please enter colname:");
String cn3 = scan.next();
HbaseTables.deleteRow(tbn3, rk3, clm3, cn3);
System.out.println("deleteRow success!!!");
break;
case 5:
HbaseTables.listTables();
System.out.println("listTables success!!!");
break;
case 6:
System.out.println("please enter tablename:");
String tbn4 = scan.next();
HbaseTables.deleteTable(tbn4);
System.out.println("deleteTable success!!!");
break;
default:
System.out.println("input error!!!");
break;
}
}
}
}
- Package the project into a jar and upload it to CentOS to run (HBase should be started first)
(4) Using Redis and MongoDB
Installing and using Redis
- Download the package:
wget http://download.redis.io/releases/redis-4.0.2.tar.gz
- Extract and compile:
tar xzf redis-4.0.2.tar.gz
cd redis-4.0.2
make
make install
- Start Redis
[root@hadoop bin]# redis-server ./redis.conf
- Use the Redis shell
redis-cli    // enter the shell
quit         // exit the shell
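A few hypothetical commands to check that Redis is working (the key and value are only illustrative):
redis-cli ping                   # should print PONG
redis-cli set greeting "hello"
redis-cli get greeting           # prints "hello"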
Installing and using MongoDB
- Create the yum repository file:
vi /etc/yum.repos.d/mongodb-org-3.4.repo
Add the following configuration, then save and exit:
[mongodb-org-3.4]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.4/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.4.asc
- Install with yum
yum install -y mongodb-org
- Start
service mongod start
- Stop
service mongod stop
- Restart
service mongod restart
- Enter the database shell
mongo
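A hypothetical smoke test using the mongo shell's --eval option (the database and collection names are only illustrative):
mongo --eval 'db.runCommand({ ping: 1 })'                            # check that the server responds
mongo test --eval 'db.students.insertOne({name: "Tom", score: 90})'  # insert a document
mongo test --eval 'printjson(db.students.findOne())'                 # read it back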
(5) Installing, Configuring, and Using Hive
Installing MySQL
- Download mysql-server from the official site (yum installation)
wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
If wget is not available, install it first: yum -y install wget
- Install the repository rpm: rpm -ivh mysql-community-release-el7-5.noarch.rpm
- Install: yum install mysql-community-server
- Restart the MySQL service: service mysqld restart
- Enter MySQL: mysql -u root -p
- Add a hive user, set its password, and grant privileges
mysql> CREATE DATABASE hive;
mysql> USE hive;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
mysql> FLUSH PRIVILEGES;
mysql> quit;
Installing Hive
- Download the package: apache-hive-2.3.4-bin.tar.gz
- Extract the package (into /usr/local)
- Configure the Hive environment variables
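A minimal sketch, assuming the archive is extracted in /usr/local and renamed to hive (which matches the /usr/local/hive/conf path used in the next step):
cd /usr/local
tar -zxvf apache-hive-2.3.4-bin.tar.gz
mv apache-hive-2.3.4-bin hive
cat >> /etc/profile <<'EOF'
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
EOF
source /etc/profile      # make the variables take effect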
- Configure Hive
cd /usr/local/hive/conf
touch hive-site.xml    // then copy over the header portion of hive-default.xml.template
[root@hadoop conf]# vi hive-site.xml    // open it and copy in the following content
<configuration>
<!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
<!-- WARNING!!! Any changes you make to this file will be ignored by Hive. -->
<!-- WARNING!!! You must make your changes in hive-site.xml instead. -->
<!-- Hive Execution Parameters -->
<!-- HDFS scratch (root) directory for Hive jobs -->
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
</property>
<!-- Write permission used when creating the Hive scratch directory on HDFS -->
<property>
<name>hive.scratch.dir.permission</name>
<value>733</value>
</property>
<!-- Location of the Hive warehouse (table data) on HDFS -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<!-- Database connection URL and database name -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://47.106.78.4:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<!-- JDBC driver for the database connection -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- Database user name -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<!-- Database user password (must match the password set for the hive user above) -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<!-- Show column headers in the CLI query output -->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<!-- Show the current database name in the CLI -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<!-- HiveServer2 settings -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>192.168.1.14</value>
</property>
</configuration>
- Installing MySQL Connector/J
(1) Download the archive and upload it to the CentOS host;
官网下载地址:http://ftp.ntu.edu.tw/MySQL/Downloads/Connector-J/
mysql-connector-java-5.1.47.tar.gz
(2) Extract it into /usr/local;
(3) Copy the driver jar mysql-connector-java-5.1.47-bin.jar into /usr/local/hive/lib.
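A sketch of steps (2) and (3), assuming the archive was uploaded to /usr/local:
cd /usr/local
tar -zxvf mysql-connector-java-5.1.47.tar.gz
cp /usr/local/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47-bin.jar /usr/local/hive/lib/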
Starting and using Hive
- Start Hadoop
- Initialize the metastore schema: schematool -dbType mysql -initSchema
- Start Hive with the command hive (this enters the Hive shell)
- Hive example: wordcount
(1) Create a source data file and upload it to the /user/input directory on HDFS;
(2) Create the source table t1: create table t1 (line string);
(3) Load the data: load data inpath '/user/input' overwrite into table t1;
(4) Write a HiveQL statement that implements the wordcount algorithm and save the result in a table wct1:
create table wct1 as select word, count(1) as count from (select explode (split (line, ' ')) as word from t1) w group by word order by word;
(5) View the wordcount results: select * from wct1;
(6) Installing and Using Spark
Installing Scala
- Download Scala (scala-2.12.8.tgz) and upload it to /usr/local on CentOS
- Extract Scala: tar -zxvf scala-2.12.8.tgz
- Rename: mv scala-2.12.8 scala
- Test the installation: scala -version
- Start Scala: scala
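A minimal sketch of the PATH setup, assuming Scala was renamed to /usr/local/scala as above; adding its bin directory to the PATH lets the scala command be found:
cat >> /etc/profile <<'EOF'
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
EOF
source /etc/profile
scala -version      # verify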
Installing Spark
- Download spark-2.4.2-bin-hadoop2.7.tgz and upload it to the same directory as above;
- Extract it;
- Rename it;
- Start it:
(1) Start the Hadoop cluster first;
(2) Start Spark from Spark's sbin directory: ./start-all.sh
(3) jps will now show two extra processes, Worker and Master;
(4) The Spark web UI is on port 8080;
(5) Enter the Spark shell with the command spark-shell (from the bin directory)
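A consolidated sketch of the extract/rename/start steps above, assuming Spark is placed under /usr/local/spark (the path used by the later examples):
cd /usr/local
tar -zxvf spark-2.4.2-bin-hadoop2.7.tgz
mv spark-2.4.2-bin-hadoop2.7 spark
start-all.sh                           # start the Hadoop cluster first
/usr/local/spark/sbin/start-all.sh     # then start Spark; Master and Worker appear in jps
/usr/local/spark/bin/spark-shell       # enter the Spark shell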
Spark application: WordCount
Loading a local file
- Create the directories;
cd /usr/local/spark
mkdir mycode
cd mycode
mkdir wordcount
cd wordcount
- Create a file and write a few words into it, separated by spaces;
vi word.txt
- Start spark-shell;
- Load word.txt into the textFile variable and write its contents back out to the writeback directory:
val textFile = sc.textFile("file:///usr/local/spark/mycode/wordcount/word.txt")
textFile.saveAsTextFile("file:///usr/local/spark/mycode/wordcount/writeback")
- View the result:
cd /usr/local/spark/mycode/wordcount/writeback/
cat part-00000
Loading a file from HDFS
- Start Hadoop first
- Create a directory:
hdfs dfs -mkdir -p /user/hadoop
- Upload the local word.txt to HDFS:
hdfs dfs -put /usr/local/spark/mycode/wordcount/word.txt /user/hadoop
- Back in the spark-shell window, load the HDFS file into the textFile variable and write its contents back out to the writeback directory:
val textFile = sc.textFile("hdfs://hadoop:9000/user/hadoop/word.txt")
textFile.saveAsTextFile("hdfs://hadoop:9000/user/hadoop/writeback")
- View the result: hdfs dfs -cat /user/hadoop/writeback/part-00000
Word frequency count
Switch to spark-shell:
scala> val textFile = sc.textFile("file:///usr/local/spark/mycode/wordcount/word.txt")
scala> val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
scala> wordCount.collect()
Writing a Scala program for the word count
- Create the directories
cd /usr/local/spark/mycode/wordcount/
mkdir -p src/main/scala
- Create the Scala file
cd /usr/local/spark/mycode/wordcount/src/main/scala
vi test.scala    // write the following code into it, then save and exit;
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object WordCount {
def main(args: Array[String]) {
val inputFile = "file:///usr/local/spark/mycode/wordcount/word.txt"
val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
val sc = new SparkContext(conf)
val textFile = sc.textFile(inputFile)
val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCount.foreach(println)
}
}
- Install sbt
First download sbt (sbt-launch.jar) from the official site
mkdir /usr/local/sbt
cd /usr/local/sbt    // then transfer the downloaded sbt-launch.jar into this directory
vi ./sbt    // add the following content, then save and exit;
#!/bin/bash
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"
chmod u+x ./sbt
./sbt sbt-version    // this step takes quite a while; be patient
- Package the project
cd /usr/local/spark/mycode/wordcount/
vi simple.sbt    // add the following lines
Pay attention to the Scala and Spark versions
name := "Simple Project"
version := "1.0"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.2"
After saving and exiting, continue as follows
cd /usr/local/spark/mycode/wordcount/
find .
cd /usr/local/spark/mycode/wordcount/    // make sure this directory is the current directory
/usr/local/sbt/sbt package
- Run the jar
/usr/local/spark/bin/spark-submit --class "WordCount" /usr/local/spark/mycode/wordcount/target/scala-2.12/simple-project_2.12-1.0.jar
Writing the Spark WordCount program in Java
- Create a new Maven project in Eclipse
- Modify the generated pom.xml (it contains both the dependencies and the packaging plugin)
// add inside the <project></project> tags (**note**: do not copy this comment line itself)
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>2.4.2</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>spark.files.WordCountJava</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
- Write the Java program
package spark.files;
import java.util.Arrays;
import java.util.Iterator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;
public class WordCountJava {
public static void main(String[] args) {
// 1. Create the SparkConf
SparkConf sparkConf = new SparkConf()
.setAppName("wordCountLocal")
.setMaster("local");
// 2. Create the JavaSparkContext
// the SparkContext is the entry point of the program
JavaSparkContext sc = new JavaSparkContext(sparkConf);
// 3. Read the input file
JavaRDD<String> lines = sc.textFile("/user/hadoop/word.txt");
// 4. Split each line on spaces
JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
public Iterator<String> call(String t) throws Exception {
return Arrays.asList(t.split(" ")).iterator();
}
});
// 5. Map each word to a <word, 1> pair
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
public Tuple2<String, Integer> call(String t) throws Exception {
return new Tuple2<String, Integer>(t, 1);
}
});
// 6. Count the occurrences of each word
JavaPairRDD<String, Integer> wordCount = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
public Integer call(Integer v1, Integer v2) throws Exception {
return v1 + v2;
}
});
// 7. Execute an action and print the results
wordCount.foreach(new VoidFunction<Tuple2<String,Integer>>() {
public void call(Tuple2<String, Integer> t) throws Exception {
System.out.println(t._1()+" "+t._2());
}
});
// 8. Close the SparkContext explicitly
sc.close();
}
}
- Package this Maven project and transfer it to the CentOS host;
- Start Hadoop and Spark before running the jar;
- Run the jar to get the word-count results;
/usr/local/spark/bin/spark-submit followed by the directory and name of the jar
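A hypothetical example (the jar name depends on your Maven artifactId and version; the class matches the WordCountJava program above):
/usr/local/spark/bin/spark-submit --class spark.files.WordCountJava /usr/local/sparkwordcount-0.0.1-SNAPSHOT-jar-with-dependencies.jar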
The end.