1. Preface
This comprehensive exercise reinforces Scala usage, reviews big-data fundamentals, and introduces a new topic, the Actor model, so it is worth writing up. Because the code is presented in full rather than explained line by line, it may be hard to follow at first; this article is aimed at readers reviewing material they have already studied. Beginners should watch a similar video course first, then come back to this article for a better understanding. Before the main exercise we also cover RPC and Actors, with a few small code examples to build up your understanding step by step.
2. RPC Communication
RPC (Remote Procedure Call) is a protocol for invoking methods across processes: the client calls a method, but the method actually executes on the server. The "protocol" part is simply the interface (the contract) shared by both sides.
Preparation: create a Maven project with the following pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.sutdy</groupId>
    <artifactId>1709RPC</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.version>2.11.8</scala.version>
        <scala.compat.version>2.11</scala.compat.version>
        <akka.version>2.4.17</akka.version>
        <hadoop.version>2.6.5</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>com.typesafe.akka</groupId>
            <artifactId>akka-actor_2.11</artifactId>
            <version>${akka.version}</version>
        </dependency>
        <dependency>
            <groupId>com.typesafe.akka</groupId>
            <artifactId>akka-remote_2.11</artifactId>
            <version>${akka.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <!-- Note: create the src/main/scala directory by hand inside the Maven layout -->
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                    <args>
                        <arg>-target:jvm-1.8</arg>
                    </args>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-eclipse-plugin</artifactId>
                <configuration>
                    <downloadSources>true</downloadSources>
                    <buildcommands>
                        <buildcommand>ch.epfl.lamp.sdt.core.scalabuilder</buildcommand>
                    </buildcommands>
                    <additionalProjectnatures>
                        <projectnature>ch.epfl.lamp.sdt.core.scalanature</projectnature>
                    </additionalProjectnatures>
                    <classpathContainers>
                        <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
                        <classpathContainer>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER</classpathContainer>
                    </classpathContainers>
                </configuration>
            </plugin>
        </plugins>
    </build>

    <reporting>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                </configuration>
            </plugin>
        </plugins>
    </reporting>
</project>
Case 1: Hadoop (and HBase) source code is built on top of RPC. Here we simulate Hadoop's RPC in a simplified form.
/**
 * Define the interface, i.e. the protocol (contract).
 */
public interface BizProtocal {
    // a simple greeting method
    void sayHello(String name);
    // create a directory
    void mkdirs(String path);
    // a version ID is mandatory for Hadoop RPC
    long versionID = 12345L;
}
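A note on versionID: Hadoop's RPC layer uses this constant to check protocol compatibility. The client passes it to RPC.getProxy, and a mismatch with the server's version makes the call fail, so client and server must agree on the value.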
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

import java.io.IOException;
import java.net.InetSocketAddress;

public class MyClient {
    public static void main(String[] args) {
        /**
         * RPC.getProxy signature:
         *   Class<T> protocol,
         *   long clientVersion,
         *   InetSocketAddress addr,
         *   Configuration conf
         */
        try {
            final BizProtocal myNamenode = RPC.getProxy(BizProtocal.class,
                    BizProtocal.versionID,
                    new InetSocketAddress("localhost", 8888),
                    new Configuration());
            myNamenode.sayHello("spark");
            myNamenode.mkdirs("data/output");
            System.out.println("client call finished");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

import java.io.IOException;

// This is the Hadoop RPC server.
public class MyNameNode implements BizProtocal {
    @Override
    public void sayHello(String name) {
        System.out.println("hello " + name);
    }

    @Override
    public void mkdirs(String path) {
        System.out.println(path + " directory created!!");
    }

    public static void main(String[] args) {
        RPC.Server server = null;
        try {
            server = new RPC.Builder(new Configuration())
                    .setProtocol(BizProtocal.class)
                    .setInstance(new MyNameNode())
                    .setBindAddress("localhost")
                    .setPort(8888)
                    .build();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // start the server and wait for incoming calls
        server.start();
    }
}
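If you start MyNameNode first and then run MyClient, the server console should print something like "hello spark" and "data/output directory created!!", while the client console only reports that the call finished. The method bodies execute on the server side, which is the essence of RPC.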
The NameNode in Hadoop itself is wired up the same way at the RPC layer (sketched in simplified pseudocode):
public class NameNode {
    private NameNodeRpcServer rpcServer;
}
// Inside NameNodeRpcServer, the RPC server is built exactly like in our example:
// new RPC.Builder(conf).setProtocol(...).setInstance(...).setBindAddress(...).setPort(...).build()

// NameNodeRpcServer implements the NamenodeProtocols interface
class NameNodeRpcServer implements NamenodeProtocols {
}

// NamenodeProtocols aggregates the many protocols the NameNode exposes:
public interface NamenodeProtocols
    extends ClientProtocol,
            DatanodeProtocol,
            NamenodeProtocol,
            RefreshAuthorizationPolicyProtocol,
            RefreshUserMappingsProtocol,
            RefreshCallQueueProtocol,
            GenericRefreshProtocol,
            GetUserMappingsProtocol,
            HAServiceProtocol,
            TraceAdminProtocol {
}
HDFS API: the client calls a method, and the method executes on the server (the NameNode):
final FileSystem fileSystem = FileSystem.newInstance(new Configuration());
fileSystem.mkdirs(new Path(""));
// The FileSystem returned is a DistributedFileSystem, which delegates to a DFSClient.
// DFSClient holds a ClientProtocol proxy field named namenode, so mkdirs ultimately runs:
// return namenode.mkdirs(src, absPermission, createParent);
3. Actor
Before the comprehensive exercise, one more small warm-up to build intuition:
Case 2: the requirement is shown in the figure.
Straight to the code. This is mostly fixed boilerplate usage; you only need to understand which pieces are required.
Client:
package lesson01

import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

/**
 * Client
 */
class MyNodeManager extends Actor {
  // An Actor is similar to a servlet in spirit.
  // preStart is the lifecycle method called first when the Actor starts.
  override def preStart(): Unit = {
    // look up the remote server actor by its full path
    val myRMRef = context.actorSelection("akka.tcp://MyResourceManagerActorSystem@localhost:19888/user/MyResourceManagerActor")
    myRMRef ! "hello"
  }

  override def receive: Receive = {
    case "hi" => {
      println("I am the NodeManager; I received the 'hi' message sent back by the ResourceManager")
    }
  }
}

object MyNodeManager {
  def main(args: Array[String]): Unit = {
    val str =
      """
        |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
        |akka.remote.netty.tcp.hostname = localhost
      """.stripMargin
    val config = ConfigFactory.parseString(str)
    val actorSystem = ActorSystem("MyNodeManagerActorSystem", config)
    // create and start the Actor
    actorSystem.actorOf(Props(new MyNodeManager()), "MyNodeManagerActor")
  }
}
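A note on the selection string: remote actor paths in Akka follow the pattern akka.tcp://ActorSystemName@host:port/user/actorName. The system name, host, and port must match the server's configuration exactly, and /user/ is the parent of all actors created through actorOf.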
Server:
package lesson01

import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.{Config, ConfigFactory}

/**
 * Server
 */
class MyResourceManager extends Actor {
  /**
   * receive is a partial function used for pattern matching:
   *   type Receive = PartialFunction[Any, Unit]
   * Any  - the type of the incoming message (anything can be matched)
   * Unit - the return type; nothing is returned
   * @return
   */
  override def receive: Receive = {
    case "hello" => {
      println("I received the 'hello' message sent by the client")
      // sender() is a reference to whoever sent the message being processed
      sender() ! "hi"
    }
  }
}

object MyResourceManager {
  def main(args: Array[String]): Unit = {
    // def apply(name: String, config: Config): ActorSystem
    val str =
      """
        |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
        |akka.remote.netty.tcp.hostname = localhost
        |akka.remote.netty.tcp.port = 19888
      """.stripMargin
    val config: Config = ConfigFactory.parseString(str)
    // create the ActorSystem
    val actorSystem = ActorSystem("MyResourceManagerActorSystem", config)
    // def actorOf(props: Props, name: String): ActorRef
    // use the ActorSystem to create and start the Actor
    actorSystem.actorOf(Props(new MyResourceManager), "MyResourceManagerActor")
  }
}
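To run the example: start MyResourceManager first (it binds port 19888), then start MyNodeManager. You should see the server print that it received hello, and the client print that it received hi back. The hostname and port here are just the values hard-coded in this example.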
That was the warm-up; now for the comprehensive exercise itself.
4. Comprehensive Exercise
The requirement is shown in the figure.
Code:
Case classes: the case classes that will be passed around as messages.
package lesson02

/**
 * Marker trait for all messages
 */
trait Message extends Serializable

// NodeManager -> ResourceManager: registration
case class RegisterNodeManager(val NodeManagerId: String, val cpu: Int, val memory: Int) extends Message

// ResourceManager -> NodeManager: registration acknowledged
case class RegisteredNodeManager(val ResourceManagerURL: String) extends Message

// heartbeat message
case class Heartbeat(val NodeManagerId: String) extends Message

// The following were not part of the initial design; they emerged while writing
// the code, once the need for self-addressed scheduler messages became clear.
case object Sendheartbeat
case object CheckTimeOut
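One listing is missing from the original write-up: the server below uses a NodeManagerInfo class that never appears. A minimal sketch, reconstructed from how the server uses it (an id, cpu, and memory fixed at registration, plus a mutable last-heartbeat timestamp), might look like this:

package lesson02

// Hypothetical reconstruction: the fields are inferred from the server code below.
class NodeManagerInfo(val NodeManagerId: String, val cpu: Int, val memory: Int) {
  // updated by the server on every Heartbeat; initialized to the registration
  // time so a freshly registered node does not look timed out immediately
  var lastHeartBeatTime: Long = System.currentTimeMillis()
}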
Server:
package lesson02

import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.{Config, ConfigFactory}

import scala.collection.mutable
import scala.concurrent.duration._

/**
 * Server: simulates YARN's ResourceManager, tracking registered NodeManagers
 * and evicting the ones whose heartbeats time out.
 */
class MyResourceManager(var hostname: String, var port: Int) extends Actor {
  private val id2nodemanagerInfo = new mutable.HashMap[String, NodeManagerInfo]()
  // declared so that iterating over all NodeManagerInfo instances is convenient
  private val nodeManagerInfoes = new mutable.HashSet[NodeManagerInfo]()

  override def preStart(): Unit = {
    import context.dispatcher
    // every 10 s, check the heartbeats by sending CheckTimeOut to ourselves
    context.system.scheduler.schedule(0 millis, 10000 millis, self, CheckTimeOut)
  }

  /**
   * receive is a partial function used for pattern matching:
   *   type Receive = PartialFunction[Any, Unit]
   * Any  - the type of the incoming message (anything can be matched)
   * Unit - the return type; nothing is returned
   * @return
   */
  override def receive: Receive = {
    case RegisterNodeManager(nodemanagerid, cpu, memory) => {
      /**
       * The NodeManager has sent over its memory, CPU, and ID.
       * We store them in a Map:
       *   key:   id
       *   value: the node's info (id, cpu, memory)
       */
      val managerInfo = new NodeManagerInfo(nodemanagerid, cpu, memory)
      id2nodemanagerInfo(nodemanagerid) = managerInfo
      // id2nodemanagerInfo += (nodemanagerid -> managerInfo)
      // id2nodemanagerInfo.put(nodemanagerid, managerInfo)
      nodeManagerInfoes += managerInfo
      /**
       * Keeping this state only in memory means it is lost on failure.
       * Spark's design persists such state externally, e.g. to:
       *   1) HDFS
       *   2) ZooKeeper
       */
      // sender() is a reference to whoever sent the message being processed
      sender() ! RegisteredNodeManager(s"${hostname} ${port}")
    }
    case Heartbeat(nodemanagerid) => {
      val nodeManagerInfo = id2nodemanagerInfo(nodemanagerid)
      // refresh the heartbeat timestamp
      nodeManagerInfo.lastHeartBeatTime = System.currentTimeMillis()
      id2nodemanagerInfo(nodemanagerid) = nodeManagerInfo
      nodeManagerInfoes += nodeManagerInfo
    }
    case CheckTimeOut => {
      val currentTimeMillis = System.currentTimeMillis()
      // Imperative version, kept for reference:
      // var tmp = new mutable.HashSet[NodeManagerInfo]()
      // for (nm <- nodeManagerInfoes) {
      //   if (currentTimeMillis - nm.lastHeartBeatTime > 15000) {
      //     tmp += nm // timed out
      //   }
      // }
      // for (deadNodeManger <- tmp) {
      //   nodeManagerInfoes -= deadNodeManger
      //   id2nodemanagerInfo -= deadNodeManger.NodeManagerId
      // }

      // Functional version: filter returns a new set, so removing elements from the
      // originals while iterating it is safe. Nodes silent for over 15 s are evicted.
      nodeManagerInfoes.filter(nodemanager => currentTimeMillis - nodemanager.lastHeartBeatTime > 15000)
        .foreach(deadNodeManager => {
          nodeManagerInfoes -= deadNodeManager
          id2nodemanagerInfo -= deadNodeManager.NodeManagerId
        })
      println("Currently registered nodes: " + nodeManagerInfoes.size + " !!!")
    }
  }
}
object MyResourceManager {
  val RESOURCEMANAGER_ACOTRYSYSTEM_NAME = "MyResourceManagerActorSystem"
  val RESOURCEMANAGER_ACTORY_NAME = "MyResourceManagerActor"

  def main(args: Array[String]): Unit = {
    // def apply(name: String, config: Config): ActorSystem
    /**
     * In a real system we would read a config file (yarn-site.xml) here;
     * instead we take program arguments, since config parsing is not the
     * point of this exercise.
     */
    // hostname
    val hostname = args(0)
    val port = args(1).toInt
    val str =
      s"""
         |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
         |akka.remote.netty.tcp.hostname = ${hostname}
         |akka.remote.netty.tcp.port = ${port}
      """.stripMargin
    val config: Config = ConfigFactory.parseString(str)
    // create the ActorSystem
    val actorSystem = ActorSystem(RESOURCEMANAGER_ACOTRYSYSTEM_NAME, config)
    // def actorOf(props: Props, name: String): ActorRef
    // use the ActorSystem to create and start the Actor
    actorSystem.actorOf(Props(new MyResourceManager(hostname, port)), RESOURCEMANAGER_ACTORY_NAME)
  }
}
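A note on the timing: heartbeats are sent every 10 s and the eviction threshold is 15 s, so a node is removed after missing roughly two consecutive heartbeats. Also, the first CheckTimeOut fires immediately (0 ms initial delay), which is why the NodeManagerInfo sketch above initializes lastHeartBeatTime to the registration time.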
Client:
package lesson02

import java.util.UUID

import akka.actor.{Actor, ActorSelection, ActorSystem, Props}
import com.typesafe.config.ConfigFactory

import scala.concurrent.duration._

/**
 * Client
 */
class MyNodeManager(var RMhostname: String, var RMport: Int, var cpu: Int, var memory: Int) extends Actor {
  var myRMRef: ActorSelection = _
  val nodemanagerId = UUID.randomUUID().toString

  // An Actor is similar to a servlet in spirit.
  // preStart is the lifecycle method called first when the Actor starts.
  override def preStart(): Unit = {
    // obtain a reference to the server actor
    myRMRef = context.actorSelection(s"akka.tcp://${MyResourceManager.RESOURCEMANAGER_ACOTRYSYSTEM_NAME}@${RMhostname}:${RMport}/user/${MyResourceManager.RESOURCEMANAGER_ACTORY_NAME}")
    // send a RegisterNodeManager message to the server
    myRMRef ! RegisterNodeManager(nodemanagerId, cpu, memory)
  }

  override def receive: Receive = {
    case RegisteredNodeManager(url) => {
      /**
       * Works like a JavaScript timer. scheduler.schedule parameters:
       *   initialDelay: FiniteDuration - how long to wait before the first run
       *   interval: FiniteDuration     - how often to run after that
       *   receiver: ActorRef           - who to send the message to
       *   message: Any                 - the message content
       */
      import context.dispatcher
      // every 10 s, send Sendheartbeat to ourselves
      context.system.scheduler.schedule(0 millis, 10000 millis, self, Sendheartbeat)
    }
    case Sendheartbeat => {
      // any per-heartbeat preparation work could be done here
      myRMRef ! Heartbeat(nodemanagerId)
    }
  }
}

object MyNodeManager {
  val NODEMANAGER_ACTORYSYSTEM_NAME = "MyNodeManagerActorSystem"
  val NODEMAANGER_ACTOR_NAME = "MyNodeManagerActor"

  def main(args: Array[String]): Unit = {
    /**
     * In a real system we would read a config file (yarn-site.xml) here;
     * instead we take program arguments, since config parsing is not the
     * point of this exercise.
     */
    // hostname of this NodeManager
    val hostname = args(0)
    // hostname of the ResourceManager
    val resourceManagerHostName = args(1)
    val resourceManagerPort = args(2).toInt
    val cpu = args(3).toInt
    val memory = args(4).toInt
    // The port this NodeManager listens on. Since everything runs on one machine,
    // this argument must be different for every NodeManager you start, otherwise
    // the ports will clash.
    val port = args(5).toInt
    val str =
      s"""
         |akka.actor.provider = "akka.remote.RemoteActorRefProvider"
         |akka.remote.netty.tcp.hostname = ${hostname}
         |akka.remote.netty.tcp.port = ${port}
      """.stripMargin
    val config = ConfigFactory.parseString(str)
    val actorSystem = ActorSystem(NODEMANAGER_ACTORYSYSTEM_NAME, config)
    // create and start the Actor
    actorSystem.actorOf(Props(new MyNodeManager(resourceManagerHostName, resourceManagerPort, cpu, memory)), NODEMAANGER_ACTOR_NAME)
  }
}
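Before testing, here is a hypothetical set of program arguments consistent with the code (everything on localhost, the ResourceManager on port 19888, and a distinct port for each NodeManager; cpu and memory values are arbitrary):

ResourceManager: localhost 19888
NodeManager 1:   localhost localhost 19888 8 2048 6789
NodeManager 2:   localhost localhost 19888 8 2048 6790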
Testing:
Start the server:
Result: a status line appears every 10 s.
Start one NodeManager client:
Result:
Start another NodeManager client; note that each node's port must be different.
Shut one NodeManager client down and check that the server evicts that node.
And with that, we are done!!!