1. Startup begins in DefaultMQPushConsumerImpl.start();
This in turn starts the message pull service, this.pullMessageService.start();. The key question is when pull requests get put into its LinkedBlockingQueue.
while (!this.isStoped()) {
try {
PullRequest pullRequest = this.pullRequestQueue.take();
if (pullRequest != null) {
this.pullMessage(pullRequest);
}
} catch (InterruptedException e) {
} catch (Exception e) {
log.error("Pull Message Service Run Method exception", e);
}
}
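The loop above is a plain blocking-queue consumer: the thread sleeps inside take() until some other component puts a request in. A minimal sketch of that pattern follows. The class PullLoopSketch, the STOP marker and the String requests are hypothetical stand-ins for illustration, not RocketMQ code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for PullMessageService; requests are plain strings here.
final class PullLoopSketch {
    private static final String STOP = "__stop__"; // illustrative shutdown marker
    private final BlockingQueue<String> pullRequestQueue = new LinkedBlockingQueue<>();
    private final List<String> handled = new ArrayList<>();

    // Producer side: what the rebalance service does with each new pull request.
    void submit(String request) {
        try {
            pullRequestQueue.put(request);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Consumer side: blocks in take() until a request is available.
    void runLoop() {
        while (true) {
            try {
                String request = pullRequestQueue.take();
                if (STOP.equals(request)) {
                    return;
                }
                handled.add(request); // the real service calls pullMessage(request) here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Runs the loop on a worker thread, feeds it two requests, then stops it.
    static List<String> demo() {
        PullLoopSketch service = new PullLoopSketch();
        Thread worker = new Thread(service::runLoop);
        worker.start();
        service.submit("queue-0@offset-0");
        service.submit("queue-1@offset-0");
        service.submit(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return service.handled;
    }
}
```

Calling PullLoopSketch.demo() returns the two requests in the order they were submitted, because a LinkedBlockingQueue is FIFO.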
The rebalance service is started with this.rebalanceService.start(); and runs at a 20 s interval:
while (!this.isStoped()) {
this.waitForRunning(WaitInterval);
this.mqClientFactory.doRebalance();
}
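2. The very first pull requests are put in by the rebalance service. It iterates over the consumer instances and calls impl.doRebalance(); on each one, which delegates to this.rebalanceImpl.doRebalance(this.isConsumeOrderly());
For each topic in the subscription table it calls this.rebalanceByTopic(topic, isOrder);, which looks up the message queues of that topic from the synced topic data: Set<MessageQueue> mqSet = this.topicSubscribeInfoTable.get(topic);
The allocation strategy (averaging by default) then selects the queues this consumer should consume in this round: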
AllocateMessageQueueStrategy strategy = this.allocateMessageQueueStrategy;
List<MessageQueue> allocateResult = null;
try {
allocateResult = strategy.allocate(//
this.consumerGroup, //
this.mQClientFactory.getClientId(), //
mqAll, //
cidAll);
} catch (Throwable e) {
log.error("AllocateMessageQueueStrategy.allocate Exception. allocateMessageQueueStrategyName={}", strategy.getName(),
e);
return;
}
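To make "averaging" concrete, here is a minimal sketch of an even split of queues across consumers. It is not the code of the library's AllocateMessageQueueStrategy implementation; the class AverageAllocationSketch and its signature are hypothetical, queues are represented by their index, and cidAll is assumed to be sorted identically on every consumer so that all of them compute the same split.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative even split; queue i stands for the i-th entry of a sorted mqAll.
final class AverageAllocationSketch {
    static List<Integer> allocate(int queueCount, List<String> cidAll, String currentCid) {
        List<Integer> result = new ArrayList<>();
        int index = cidAll.indexOf(currentCid);
        if (index < 0 || queueCount <= 0) {
            return result; // unknown consumer or nothing to allocate
        }
        int consumers = cidAll.size();
        int base = queueCount / consumers;   // every consumer gets at least this many
        int extra = queueCount % consumers;  // the first 'extra' consumers get one more
        int size = base + (index < extra ? 1 : 0);
        int start = index * base + Math.min(index, extra);
        for (int i = 0; i < size; i++) {
            result.add(start + i);
        }
        return result;
    }
}
```

With 8 queues and 3 consumers this gives the ranges 0..2, 3..5 and 6..7. With more consumers than queues, the trailing consumers get an empty list and stay idle.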
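3. Update the process queue table and build the list of pull requests: this.updateProcessQueueTableInRebalance(topic, allocateResultSet, isOrder);
This compares the mqs stored in processQueueTable with the mqs that were just allocated. If the allocation contains a queue that is already being consumed, it checks whether that queue's process queue pq has been idle longer than PullMaxIdleTime=120s. If the allocation contains a new queue, it is added to processQueueTable: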
List<PullRequest> pullRequestList = new ArrayList<PullRequest>();
for (MessageQueue mq : mqSet) {
if (!this.processQueueTable.containsKey(mq)) {
if (isOrder && !this.lock(mq)) {
log.warn("doRebalance, {}, add a new mq failed, {}, because lock failed", consumerGroup, mq);
continue;
}
this.removeDirtyOffset(mq);
ProcessQueue pq = new ProcessQueue();
long nextOffset = this.computePullFromWhere(mq);
if (nextOffset >= 0) {
ProcessQueue pre = this.processQueueTable.putIfAbsent(mq, pq);
if (pre != null) {
log.info("doRebalance, {}, mq already exists, {}", consumerGroup, mq);
} else {
log.info("doRebalance, {}, add a new mq, {}", consumerGroup, mq);
PullRequest pullRequest = new PullRequest();
pullRequest.setConsumerGroup(consumerGroup);
pullRequest.setNextOffset(nextOffset);
pullRequest.setMessageQueue(mq);
pullRequest.setProcessQueue(pq);
pullRequestList.add(pullRequest);
changed = true;
}
} else {
log.warn("doRebalance, {}, add new mq failed, {}", consumerGroup, mq);
}
}
}
this.dispatchPullRequest(pullRequestList);
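The bookkeeping in this step can be sketched as a diff between the table and the new allocation. The class below is a hypothetical simplification: queues are strings, a process queue is reduced to its last pull timestamp, and the rule that an idle entry is dropped and then re-added in the same pass is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model of processQueueTable: queue name -> last pull timestamp (ms).
final class ProcessQueueTableSketch {
    static final long PULL_MAX_IDLE_MILLIS = 120_000L; // the 120 s mentioned above
    private final Map<String, Long> processQueueTable = new HashMap<>();

    // Returns the queues that need a fresh pull request after this rebalance.
    List<String> update(Set<String> allocated, long nowMillis) {
        // 1. Drop entries that are no longer allocated or have been idle too long.
        Iterator<Map.Entry<String, Long>> it = processQueueTable.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            boolean stillAllocated = allocated.contains(entry.getKey());
            boolean idleTooLong = nowMillis - entry.getValue() > PULL_MAX_IDLE_MILLIS;
            if (!stillAllocated || idleTooLong) {
                it.remove();
            }
        }
        // 2. Every allocated queue missing from the table is new: register it.
        List<String> newPullRequests = new ArrayList<>();
        for (String mq : new TreeSet<>(allocated)) { // sorted only for stable output
            if (processQueueTable.putIfAbsent(mq, nowMillis) == null) {
                newPullRequests.add(mq);
            }
        }
        return newPullRequests;
    }
}
```

Only the returned queues produce pull requests, which is why a stable allocation causes no extra requests on later rebalance rounds.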
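Note the step that computes the queue offset: long nextOffset = this.computePullFromWhere(mq);. This offset is related to the automatic offset commit for ordered consumption discussed earlier, and it is stored in the RemoteBrokerOffsetStore class.
The pull request list is then handed to this.dispatchPullRequest(pullRequestList);, which is implemented in RebalancePushImpl: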
@Override
public void dispatchPullRequest(List<PullRequest> pullRequestList) {
for (PullRequest pullRequest : pullRequestList) {
this.defaultMQPushConsumerImpl.executePullRequestImmediately(pullRequest);
log.info("doRebalance, {}, add a new pull request {}", consumerGroup, pullRequest);
}
}
In the end these requests arrive in the pullRequestQueue of PullMessageService:
public void executePullRequestImmediately(final PullRequest pullRequest) {
try {
this.pullRequestQueue.put(pullRequest);
} catch (InterruptedException e) {
log.error("executePullRequestImmediately pullRequestQueue.put", e);
}
}
4. This brings us back to the take() call at the start of the article. this.pullMessage(pullRequest); selects a consumer instance, a DefaultMQPushConsumerImpl, and calls pullMessage on it:
private void pullMessage(final PullRequest pullRequest) {
final MQConsumerInner consumer = this.mQClientFactory.selectConsumer(pullRequest.getConsumerGroup());
if (consumer != null) {
DefaultMQPushConsumerImpl impl = (DefaultMQPushConsumerImpl) consumer;
impl.pullMessage(pullRequest);
} else {
log.warn("No matched consumer for the PullRequest {}, drop it", pullRequest);
}
}
impl.pullMessage is a long method. Its core is the anonymous PullCallback pullCallback class, which is passed into the pull call below:
int sysFlag = PullSysFlag.buildSysFlag(//
commitOffsetEnable, // commitOffset
true, // suspend (true at this point)
subExpression != null, // subscription
classFilter // class filter
);
try {
this.pullAPIWrapper.pullKernelImpl(//
pullRequest.getMessageQueue(), // 1
subExpression, // 2
subscriptionData.getSubVersion(), // 3
pullRequest.getNextOffset(), // 4 queue offset to pull from
this.defaultMQPushConsumer.getPullBatchSize(), // 5
sysFlag, // 6
commitOffsetValue, // 7
BrokerSuspendMaxTimeMillis, // 8 15 seconds
ConsumerTimeoutMillisWhenSuspend, // 9 30 seconds
CommunicationMode.ASYNC, // 10
pullCallback// 11 the callback is important
);
} catch (Exception e) {
log.error("pullKernelImpl exception", e);
this.executePullRequestLater(pullRequest, PullTimeDelayMillsWhenException);
}
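Three spots, (true, // suspend), (BrokerSuspendMaxTimeMillis, 15 seconds) and (ConsumerTimeoutMillisWhenSuspend, 30 seconds), puzzled me for a long time: I searched all the logs and could not find the error message I expected. In the end I set breakpoints and confirmed that my understanding was right; the truth was only one line of code away.
Next, PullAPIWrapper assembles the request data: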
PullResult pullResult = this.mQClientFactory.getMQClientAPIImpl().pullMessage(//
brokerAddr,//
requestHeader,// custom request header data
timeoutMillis,// client timeout that leaves room for a suspended reply, 30 s
communicationMode,//
pullCallback);
MQClientAPIImpl then creates the request command:
RemotingCommand request = RemotingCommand.createRequestCommand(RequestCode.PULL_MESSAGE, requestHeader);
private void pullMessageAsync(//
final String addr, // 1
final RemotingCommand request, //
final long timeoutMillis, //
final PullCallback pullCallback//
) throws RemotingException, InterruptedException {
this.remotingClient.invokeAsync(addr, request, timeoutMillis, new InvokeCallback() {
@Override
public void operationComplete(ResponseFuture responseFuture) {
RemotingCommand response = responseFuture.getResponseCommand();
if (response != null) {
try {
PullResult pullResult = MQClientAPIImpl.this.processPullResponse(response);
assert pullResult != null;
pullCallback.onSuccess(pullResult);
} catch (Exception e) {
pullCallback.onException(e);
}
} else {
if (!responseFuture.isSendRequestOK()) {
pullCallback.onException(new MQClientException("send request failed", responseFuture.getCause()));
} else if (responseFuture.isTimeout()) {
pullCallback.onException(new MQClientException("wait response timeout " + responseFuture.getTimeoutMillis() + "ms",
responseFuture.getCause()));
} else {
pullCallback.onException(new MQClientException("unknow reseaon", responseFuture.getCause()));
}
}
}
});
}
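The error log I mentioned earlier is printed when response == null. Because of the gap between the two timeouts (the broker suspends for at most 15 s, the client waits 30 s), response is normally not null, which is why I could never find that log.
5. Communication goes through netty: NettyRemotingClient.invokeAsync, which calls NettyRemotingAbstract.invokeAsyncImpl: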
boolean acquired = this.semaphoreAsync.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
if (acquired) {
final SemaphoreReleaseOnlyOnce once = new SemaphoreReleaseOnlyOnce(this.semaphoreAsync);
final ResponseFuture responseFuture = new ResponseFuture(opaque, timeoutMillis, invokeCallback, once);
this.responseTable.put(opaque, responseFuture);
try {
channel.writeAndFlush(request).addListener(new ChannelFutureListener() {
@Override
public void operationComplete(ChannelFuture f) throws Exception {
if (f.isSuccess()) {
responseFuture.setSendRequestOK(true);
return;
} else {
responseFuture.setSendRequestOK(false);
}
responseFuture.putResponse(null);
responseTable.remove(opaque);
try {
responseFuture.executeInvokeCallback();
} catch (Throwable e) {
plog.warn("excute callback in writeAndFlush addListener, and callback throw", e);
} finally {
responseFuture.release();
}
plog.warn("send a request command to channel <{}> failed.", RemotingHelper.parseChannelRemoteAddr(channel));
}
});
}
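A ResponseFuture is assembled and put into responseTable; it carries the callback and the 30 s timeout. Once the send succeeds, responseFuture.setSendRequestOK(true); is called.
6. When the netty client starts, it also starts a timer task. Note the timing: the first run is 3 s after startup, and after that it runs every 1 s: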
this.timer.scheduleAtFixedRate(new TimerTask() {
@Override
public void run() {
try {
NettyRemotingClient.this.scanResponseTable();
} catch (Exception e) {
log.error("scanResponseTable exception", e);
}
}
}, 1000 * 3, 1000);
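The task periodically removes response futures that have timed out and runs their callback through executeInvokeCallback, which is the error-reporting code shown above. An error can therefore only be reported after the 30 s timeout has passed:
public void scanResponseTable() {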
final List<ResponseFuture> rfList = new LinkedList<ResponseFuture>();
Iterator<Entry<Integer, ResponseFuture>> it = this.responseTable.entrySet().iterator();
while (it.hasNext()) {
Entry<Integer, ResponseFuture> next = it.next();
ResponseFuture rep = next.getValue();
if ((rep.getBeginTimestamp() + rep.getTimeoutMillis() + 1000) <= System.currentTimeMillis()) {
rep.release();
it.remove();
rfList.add(rep);
plog.warn("remove timeout request, " + rep);
}
}
for (ResponseFuture rf : rfList) {
try {
rf.executeInvokeCallback();
} catch (Throwable e) {
plog.warn("scanResponseTable, operationComplete Exception", e);
}
}
}
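The interplay between the pending-response table and the periodic scan can be sketched as follows. ResponseTableSketch and its methods are hypothetical names; the clock is passed in as a parameter so the timing can be followed without waiting, and the 1000 ms margin mirrors the check in scanResponseTable above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative pending-response table keyed by request id (opaque).
final class ResponseTableSketch {
    private static final class Pending {
        final long beginMillis;
        final long timeoutMillis;
        final Runnable onTimeout;

        Pending(long beginMillis, long timeoutMillis, Runnable onTimeout) {
            this.beginMillis = beginMillis;
            this.timeoutMillis = timeoutMillis;
            this.onTimeout = onTimeout;
        }
    }

    private final Map<Integer, Pending> responseTable = new ConcurrentHashMap<>();

    // Called when a request is sent.
    void register(int opaque, long nowMillis, long timeoutMillis, Runnable onTimeout) {
        responseTable.put(opaque, new Pending(nowMillis, timeoutMillis, onTimeout));
    }

    // Called when the reply arrives; returns false if the entry already expired.
    boolean complete(int opaque) {
        return responseTable.remove(opaque) != null;
    }

    // Periodic scan: expire entries older than timeout + 1 s, then run their callbacks.
    int scan(long nowMillis) {
        List<Pending> expired = new ArrayList<>();
        Iterator<Map.Entry<Integer, Pending>> it = responseTable.entrySet().iterator();
        while (it.hasNext()) {
            Pending pending = it.next().getValue();
            if (pending.beginMillis + pending.timeoutMillis + 1000 <= nowMillis) {
                it.remove();
                expired.add(pending);
            }
        }
        for (Pending pending : expired) {
            pending.onTimeout.run();
        }
        return expired.size();
    }
}
```

A request registered at time 0 with a 30 000 ms timeout survives a scan at 16 000 ms, which is after a 15 s broker suspend, and only expires once the clock reaches 31 000 ms. A reply that arrives before then removes the entry, so the timeout callback never runs.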
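7. On the broker side, the communication server NettyRemotingServer receives the request and runs processMessageReceived(ctx, msg);. Because the RemotingCommand type is REQUEST_COMMAND, it enters processRequestCommand(ctx, cmd);. This method mainly looks up the processor registered for the request code, here PullMessageProcessor:
final Pair<NettyRequestProcessor, ExecutorService> matched = this.processorTable.get(cmd.getCode());
and runs the corresponding method on a thread of the paired executor: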
final RemotingCommand response = pair.getObject1().processRequest(ctx, cmd);
8. Pull the messages: this.processRequest(ctx.channel(), request, true);
PullMessageProcessor.processRequest(final Channel channel, RemotingCommand request, boolean brokerAllowSuspend)
The messages are fetched from the DefaultMessageStore class:
final GetMessageResult getMessageResult =
this.brokerController.getMessageStore().getMessage(requestHeader.getConsumerGroup(), requestHeader.getTopic(),
requestHeader.getQueueId(), requestHeader.getQueueOffset(), requestHeader.getMaxMsgNums(), subscriptionData);
When there are new messages, the status, the next offset and the messages themselves are all filled in:
SelectMapedBufferResult selectResult = this.commitLog.getMessage(offsetPy, sizePy);
if (selectResult != null) {
this.storeStatsService.getGetMessageTransferedMsgCount().incrementAndGet();
getResult.addMessage(selectResult);
status = GetMessageStatus.FOUND;
nextPhyFileStartOffset = Long.MIN_VALUE;
}
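The other case is when there are no new messages: the previous pull took everything and nothing new has arrived since, so the following branch is taken:
if (offset == maxOffset) {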
status = GetMessageStatus.OFFSET_OVERFLOW_ONE;
nextBeginOffset = nextOffsetCorrection(offset, offset);
}
The status is then translated into a response code. Only the two main statuses are shown here:
switch (getMessageResult.getStatus()) {
case FOUND:
response.setCode(ResponseCode.SUCCESS);
break;
case OFFSET_OVERFLOW_ONE:
response.setCode(ResponseCode.PULL_NOT_FOUND);
break;
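9. Next the response is assembled. The success path is unremarkable and just follows the normal flow, so the path to look at is the one with no new messages:
switch (response.getCode()) {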
case ResponseCode.SUCCESS:
break;
case ResponseCode.PULL_NOT_FOUND:// reached when there are no new messages
// both flags are true the first time; brokerAllowSuspend is false the second time
if (brokerAllowSuspend && hasSuspendFlag) {
long pollingTimeMills = suspendTimeoutMillisLong;
if (!this.brokerController.getBrokerConfig().isLongPollingEnable()) {
pollingTimeMills = this.brokerController.getBrokerConfig().getShortPollingTimeMills();
}
String topic = requestHeader.getTopic();
long offset = requestHeader.getQueueOffset();
int queueId = requestHeader.getQueueId();
PullRequest pullRequest = new PullRequest(request, channel, pollingTimeMills,
this.brokerController.getMessageStore().now(), offset, subscriptionData);
this.brokerController.getPullRequestHoldService().suspendPullRequest(topic, queueId, pullRequest);
response = null;// when null, nothing is sent back to the client; the client scans its pending responses every second, but its timeout is 30 s, so no error is raised or logged
break;
}
10. response is set to null here and control returns to the end of the task, which therefore does not reply to the client's request:
if (!cmd.isOnewayRPC()) {
if (response != null) {
response.setOpaque(opaque);
response.markResponseType();
try {
ctx.writeAndFlush(response);
} catch (Throwable e) {
plog.error("process request over, but response failed", e);
plog.error(cmd.toString());
plog.error(response.toString());
}
} else {
// reaching here means no reply is sent to the client; the first attempt lands here when there are no new messages
}
}
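11. A new PullRequest is then assembled with the 15 s timeout set earlier and put into pullRequestTable, the table of held pull requests in PullRequestHoldService. This service is started together with the broker in its start():
if (this.brokerController.getBrokerConfig().isLongPollingEnable()) {
this.waitForRunning(10 * 1000);// the wait must be shorter than the 30 s set on the client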
} else {
this.waitForRunning(this.brokerController.getBrokerConfig().getShortPollingTimeMills());
}
this.checkHoldRequest();
After the wait, the pull requests currently being held are checked:
private void checkHoldRequest() {
for (String key : this.pullRequestTable.keySet()) {
String[] kArray = key.split(TOPIC_QUEUEID_SEPARATOR);
if (kArray != null && 2 == kArray.length) {
String topic = kArray[0];
int queueId = Integer.parseInt(kArray[1]);
final long offset = this.brokerController.getMessageStore().getMaxOffsetInQuque(topic, queueId);
this.notifyMessageArriving(topic, queueId, offset);
}
}
}
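Each held request is then checked for whether there is data for it, that is, whether the newest offset is greater than the requested offset:
newestOffset > request.getPullFromThisOffset()
or whether its suspend time has run out (the broker suspend time is 15 s):
System.currentTimeMillis() >= (request.getSuspendTimestamp() + request.getTimeoutMillis())
In either case the request is woken up, which means it is processed again:
this.brokerController.getPullMessageProcessor().excuteRequestWhenWakeup(request.getClientChannel(), request.getRequestCommand());
This time the request is not allowed to suspend, because brokerAllowSuspend is false:
RemotingCommand response = PullMessageProcessor.this.processRequest(channel, request, false);
If there is still no data this time, the suspend branch in step 9 is not entered.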
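The hold-and-wake logic of step 11 can be sketched with a table of held requests and the two wake-up conditions. HoldServiceSketch, Held and the "@" key separator are hypothetical names chosen for illustration, and time is passed in explicitly so that the 15 s suspend can be followed without waiting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of a broker-side table of suspended pull requests.
final class HoldServiceSketch {
    static final class Held {
        final String client;
        final long pullFromThisOffset;
        final long suspendTimestamp;
        final long timeoutMillis;

        Held(String client, long pullFromThisOffset, long suspendTimestamp, long timeoutMillis) {
            this.client = client;
            this.pullFromThisOffset = pullFromThisOffset;
            this.suspendTimestamp = suspendTimestamp;
            this.timeoutMillis = timeoutMillis;
        }
    }

    private final Map<String, List<Held>> pullRequestTable = new HashMap<>();

    private static String key(String topic, int queueId) {
        return topic + "@" + queueId; // separator chosen for this sketch
    }

    void suspend(String topic, int queueId, Held request) {
        pullRequestTable.computeIfAbsent(key(topic, queueId), k -> new ArrayList<>()).add(request);
    }

    // Wakes requests that have data available or have been held long enough.
    List<String> check(String topic, int queueId, long newestOffset, long nowMillis) {
        List<String> woken = new ArrayList<>();
        List<Held> held = pullRequestTable.get(key(topic, queueId));
        if (held == null) {
            return woken;
        }
        List<Held> stillHeld = new ArrayList<>();
        for (Held request : held) {
            boolean hasData = newestOffset > request.pullFromThisOffset;
            boolean timedOut = nowMillis >= request.suspendTimestamp + request.timeoutMillis;
            if (hasData || timedOut) {
                woken.add(request.client); // the broker would re-run the pull without suspend
            } else {
                stillHeld.add(request);
            }
        }
        pullRequestTable.put(key(topic, queueId), stillHeld);
        return woken;
    }
}
```

A request held at offset 100 stays parked while the newest offset is still 100 and less than 15 s have passed. It is released as soon as the newest offset reaches 101, or once 15 s have elapsed even with no new data.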
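12. response is therefore not null and its code is ResponseCode.PULL_NOT_FOUND. It goes through step 10 again, and this time a reply is sent to the client. Under normal conditions this happens well within the client's 30 s, so no error is raised and no error log is written.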
responseHeader.setNextBeginOffset(getMessageResult.getNextBeginOffset());
responseHeader.setMinOffset(getMessageResult.getMinOffset());
responseHeader.setMaxOffset(getMessageResult.getMaxOffset());
In this case the next request offset and the max offset are equal.
13. The consumer receives the message through NettyRemotingClient. Because it is a reply, the flag bits route it to case RESPONSE_COMMAND: and on to NettyRemotingAbstract:
public void processResponseCommand(ChannelHandlerContext ctx, RemotingCommand cmd) {
final int opaque = cmd.getOpaque();
final ResponseFuture responseFuture = responseTable.get(opaque);
if (responseFuture != null) {
responseFuture.setResponseCommand(cmd);
responseFuture.release();
responseTable.remove(opaque);
if (responseFuture.getInvokeCallback() != null) {
boolean runInThisThread = false;
ExecutorService executor = this.getCallbackExecutor();
if (executor != null) {
try {
executor.submit(new Runnable() {
@Override
public void run() {
try {
responseFuture.executeInvokeCallback();
} catch (Throwable e) {
plog.warn("execute callback in executor exception, and callback throw", e);
}
}
});
} catch (Exception e) {
runInThisThread = true;
plog.warn("execute callback in executor exception, maybe executor busy", e);
}
} else {
runInThisThread = true;
}
if (runInThisThread) {
try {
responseFuture.executeInvokeCallback();
} catch (Throwable e) {
plog.warn("executeInvokeCallback Exception", e);
}
}
} else {
responseFuture.putResponse(cmd);
}
} else {
plog.warn("receive response, but not matched any request, " + RemotingHelper.parseChannelRemoteAddr(ctx.channel()));
plog.warn(cmd.toString());
}
}
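The ResponseFuture stored before the request was sent is looked up by request id (opaque), and the code branches on whether it has a callback.
invokeCallback.operationComplete(this); then runs the callback. At first I assumed that a null response here would raise the error and produce the log: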
@Override
public void operationComplete(ResponseFuture responseFuture) {
RemotingCommand response = responseFuture.getResponseCommand();
if (response != null) {
try {
PullResult pullResult = MQClientAPIImpl.this.processPullResponse(response);
assert pullResult != null;
pullCallback.onSuccess(pullResult);
} catch (Exception e) {
pullCallback.onException(e);
}
} else {
if (!responseFuture.isSendRequestOK()) {
pullCallback.onException(new MQClientException("send request failed", responseFuture.getCause()));
} else if (responseFuture.isTimeout()) {
pullCallback.onException(new MQClientException("wait response timeout " + responseFuture.getTimeoutMillis() + "ms",
responseFuture.getCause()));
} else {
pullCallback.onException(new MQClientException("unknow reseaon", responseFuture.getCause()));
}
}
}
The server's reply is then assembled. The pull status is derived from the response code, which in this case gives NO_NEW_MSG:
private PullResult processPullResponse(final RemotingCommand response) throws MQBrokerException, RemotingCommandException {
PullStatus pullStatus = PullStatus.NO_NEW_MSG;
switch (response.getCode()) {
case ResponseCode.SUCCESS:
pullStatus = PullStatus.FOUND;
break;
case ResponseCode.PULL_NOT_FOUND:
pullStatus = PullStatus.NO_NEW_MSG;
break;
case ResponseCode.PULL_RETRY_IMMEDIATELY:
pullStatus = PullStatus.NO_MATCHED_MSG;
break;
case ResponseCode.PULL_OFFSET_MOVED:
pullStatus = PullStatus.OFFSET_ILLEGAL;
break;
default:
throw new MQBrokerException(response.getCode(), response.getRemark());
}
PullMessageResponseHeader responseHeader =
(PullMessageResponseHeader) response.decodeCommandCustomHeader(PullMessageResponseHeader.class);
return new PullResultExt(pullStatus, responseHeader.getNextBeginOffset(), responseHeader.getMinOffset(),
responseHeader.getMaxOffset(), null, responseHeader.getSuggestWhichBrokerId(), response.getBody());
}
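The callback pullCallback.onSuccess(pullResult); is invoked next. This is the PullCallback pullCallback that DefaultMQPushConsumerImpl passed in at the very beginning.
14. The code branches on pullResult.getPullStatus(). On success it first sets the offset for the next request, pullRequest.setNextOffset(pullResult.getNextBeginOffset());, and then submits the messages: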
DefaultMQPushConsumerImpl.this.consumeMessageService.submitConsumeRequest(//
pullResult.getMsgFoundList(), //
processQueue, //
pullRequest.getMessageQueue(), //
dispathToConsume);
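This submits the consume request to ConsumeMessageConcurrentlyService, and the next pull request is issued right after it:
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
The concurrent consume service is started together with the consumer. It runs the request with this.consumeExecutor.submit(consumeRequest); and the messages are consumed in:
status = listener.consumeMessage(Collections.unmodifiableList(msgs), context);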
15. If there are no new messages, the next pull request is issued immediately:
case NO_NEW_MSG:
pullRequest.setNextOffset(pullResult.getNextBeginOffset());
DefaultMQPushConsumerImpl.this.correctTagsOffset(pullRequest);
DefaultMQPushConsumerImpl.this.executePullRequestImmediately(pullRequest);
break;
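Steps 14 and 15 together form a cycle: every reply, whether it carries messages or not, updates the offset and puts the same pull request straight back into the queue. The sketch below models that cycle with a fake broker. PullCycleSketch, fakeBrokerPull and the batch parameter are hypothetical, and the broker-side hold that delays a real NO_NEW_MSG reply is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative pull cycle: each reply re-enqueues the request with the next offset.
final class PullCycleSketch {
    enum Status { FOUND, NO_NEW_MSG }

    static final class Result {
        final Status status;
        final long nextBeginOffset;

        Result(Status status, long nextBeginOffset) {
            this.status = status;
            this.nextBeginOffset = nextBeginOffset;
        }
    }

    // Fake broker: serves up to 'batch' messages, or reports that nothing is new.
    static Result fakeBrokerPull(long offset, long maxOffset, int batch) {
        if (offset >= maxOffset) {
            return new Result(Status.NO_NEW_MSG, maxOffset);
        }
        return new Result(Status.FOUND, Math.min(offset + batch, maxOffset));
    }

    // Runs a fixed number of pull rounds and records the status of each one.
    static List<Status> run(long startOffset, long maxOffset, int batch, int rounds) {
        Deque<Long> pullRequestQueue = new ArrayDeque<>();
        pullRequestQueue.add(startOffset);
        List<Status> seen = new ArrayList<>();
        for (int i = 0; i < rounds; i++) {
            long offset = pullRequestQueue.remove();
            Result result = fakeBrokerPull(offset, maxOffset, batch);
            seen.add(result.status);
            // Both FOUND and NO_NEW_MSG set the next offset and re-enqueue at once.
            pullRequestQueue.add(result.nextBeginOffset);
        }
        return seen;
    }
}
```

With 5 messages and a batch size of 2, five rounds starting at offset 0 yield FOUND three times and then NO_NEW_MSG twice. In the real flow those NO_NEW_MSG replies only come back after the broker has held the request, which is what keeps the immediate re-enqueue from turning into a busy loop.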