Introduction
Spring Cloud Sleuth is the distributed tracing component of the microservice stack, and out of the box it instruments calls made through RestTemplate.
Our legacy project, however, calls external service APIs mostly through third-party HTTP client libraries such as Apache HttpClient or Asynchronous Http Client.
To add tracing with only a small amount of code change, and without putting the stability of existing business interfaces at risk, this article walks through the retrofit.
How It Works
For the research behind distributed tracing, see the Google Dapper paper. Zipkin is a distributed tracing system based on the Dapper design: it collects the timing data needed to troubleshoot latency problems in microservice architectures, and it manages both the collection and the lookup of that data. Its architecture consists of a collector, storage, a query API and a web UI; see the official Zipkin documentation for the architecture diagram and further details, which will not be repeated here.
Spring-Cloud-Sleuth-Zipkin is the distributed tracing integration suite provided by the Spring Cloud ecosystem. It combines Zipkin's model with Spring's conventions, which makes it flexible and convenient to adopt.
A Span is the basic unit of work. For example, sending an RPC is a new span, as is sending a response to an RPC. A span is identified by its own unique 64-bit ID plus another 64-bit ID for the trace it belongs to. Spans carry other data as well: a description, key-value annotations, the ID of the span that caused them, and a process ID (usually an IP address). Spans are started and stopped, and they keep track of their timing information; once you create a span, you must stop it at some point in the future. A set of spans forms a tree-like structure called a Trace. For example, if you are running a distributed big-data store, a trace might be formed by a PUT request.
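To make the model concrete, the trace for the PUT request above might look like this (all IDs are illustrative):
Trace (traceId 86fa41...):
- Span A (spanId 1): the root span, covering the client's PUT request
- Span B (spanId 2, parentId 1): the write on storage node 1
- Span C (spanId 3, parentId 1): the write on storage node 2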
The main features of Spring Cloud Sleuth include:
- Adds trace and span IDs to the Slf4J MDC, so you can extract all the logs for a given trace or span from your log aggregator (see the sample log line after this list).
- Provides an abstraction over common distributed tracing data models: traces, spans (forming a DAG), annotations, and key-value annotations. Loosely based on HTrace, but compatible with Zipkin (Dapper).
- If spring-cloud-sleuth-zipkin is available, the application generates and reports Zipkin-compatible traces over HTTP. By default they are sent to a Zipkin collector on localhost, port 9411; configure the collector's location with spring.zipkin.baseUrl.
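As an example of the first point, with Sleuth on the classpath the default Spring Boot log pattern gains a [service name, trace ID, span ID, exportable] section; a log line then looks roughly like this (IDs illustrative):
2019-03-01 10:21:07.241  INFO [demo,2485ec27856c56f4,2485ec27856c56f4,true] 9730 --- [nio-8080-exec-1] c.example.DemoController : Handling home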
As these features suggest, integration is straightforward: add the relevant Maven dependencies plus a few properties to the application's main configuration file, application.properties. A typical configuration looks like this:
# Sampling rate; 1.0 is the maximum (sample everything)
spring.sleuth.sampler.percentage=0.1
## Transport used to send spans; web (HTTP), RabbitMQ and Kafka are supported
spring.zipkin.sender.type=kafka
## Kafka topic for span data
spring.zipkin.kafka.topic=zipkin
## Kafka cluster nodes
spring.kafka.bootstrapServers=192.168.9.16:9092,192.168.9.17:9092,192.168.9.18:9092
The corresponding Maven dependencies are as follows:
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Edgware.SR5</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
With the principles and configuration covered, how does an application actually opt in?
It could hardly be simpler: as long as Spring Cloud Sleuth is on the classpath, any Spring Boot application will generate trace data. A minimal example:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class Application {

    private static final Logger log = LoggerFactory.getLogger(Application.class);

    @RequestMapping("/")
    public String home() {
        log.info("Handling home");
        return "Hello World";
    }

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
Run this application and then hit the home page. You will see traceId and spanId populated in the logs. If this application calls out to another one (e.g. with RestTemplate), Sleuth sends the trace data in request headers, and if the receiver is another Sleuth application you will see the whole trace end to end.
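Concretely, the trace context is propagated as B3 headers on the outgoing request, along these lines (values are illustrative):
X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
X-B3-SpanId: e457b5a2e4d86bd1
X-B3-ParentSpanId: 05e3ac9a4f6e3b90
X-B3-Sampled: 1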
Business Requirements
Everything above assumes that calls go through RestTemplate. Returning to the problem posed at the start: the legacy project invokes external services by other means, and reimplementing every business interface on top of RestTemplate is not realistic. Developers, testers and product owners would all object, and rightly so: such a change would not only cost engineering effort but also put interface stability at risk, and QA would have to re-run regression tests against interfaces that were already running reliably.
We therefore need a compromise: a way to generate and store trace data by adding a small amount of code, without modifying the existing business interface code.
Solution
Reading the Spring-Cloud-Sleuth-Zipkin source code shows that ZipkinSpanReporter implements the SpanReporter interface. Its job is to listen for Sleuth events; the report method converts a Sleuth span into a zipkin2 Span and passes it to the Zipkin collector. The implementation:
public class ZipkinSpanReporter implements SpanReporter {

    private static final org.apache.commons.logging.Log log = org.apache.commons.logging.LogFactory
            .getLog(ZipkinSpanReporter.class);

    private final Reporter<zipkin2.Span> reporter;
    private final Environment environment;
    private final List<SpanAdjuster> spanAdjusters;

    /**
     * Endpoint is the visible IP address of this service, the port it is listening on and
     * the service name from discovery.
     */
    // Visible for testing
    final EndpointLocator endpointLocator;

    public ZipkinSpanReporter(Reporter<zipkin2.Span> reporter, EndpointLocator endpointLocator,
            Environment environment, List<SpanAdjuster> spanAdjusters) {
        this.reporter = reporter;
        this.endpointLocator = endpointLocator;
        this.environment = environment;
        this.spanAdjusters = spanAdjusters;
    }

    /**
     * Converts a given Sleuth span to a Zipkin Span.
     * <ul>
     * <li>Set ids, etc
     * <li>Create timeline annotations based on data from Span object.
     * <li>Create tags based on data from Span object.
     * </ul>
     */
    // Visible for testing
    zipkin2.Span convert(Span span) {
        //TODO: Consider adding support for the debug flag (related to #496)
        Span convertedSpan = span;
        for (SpanAdjuster adjuster : this.spanAdjusters) {
            convertedSpan = adjuster.adjust(convertedSpan);
        }
        zipkin2.Span.Builder zipkinSpan = zipkin2.Span.newBuilder();
        zipkinSpan.localEndpoint(this.endpointLocator.local());
        processLogs(convertedSpan, zipkinSpan);
        addZipkinTags(zipkinSpan, convertedSpan);
        if (zipkinSpan.kind() != null && this.environment != null) {
            setInstanceIdIfPresent(zipkinSpan, Span.INSTANCEID);
        }
        zipkinSpan.shared(convertedSpan.isShared());
        zipkinSpan.timestamp(convertedSpan.getBegin() * 1000L);
        if (!convertedSpan.isRunning()) { // duration is authoritative, only write when the span stopped
            zipkinSpan.duration(calculateDurationInMicros(convertedSpan));
        }
        zipkinSpan.traceId(convertedSpan.traceIdString());
        if (convertedSpan.getParents().size() > 0) {
            if (convertedSpan.getParents().size() > 1) {
                log.error("Zipkin doesn't support spans with multiple parents. Omitting "
                        + "other parents for " + convertedSpan);
            }
            zipkinSpan.parentId(Span.idToHex(convertedSpan.getParents().get(0)));
        }
        zipkinSpan.id(Span.idToHex(convertedSpan.getSpanId()));
        if (StringUtils.hasText(convertedSpan.getName())) {
            zipkinSpan.name(convertedSpan.getName());
        }
        return zipkinSpan.build();
    }

    // Instead of going through the list of logs multiple times we're doing it only once
    void processLogs(Span span, zipkin2.Span.Builder zipkinSpan) {
        for (Log log : span.logs()) {
            String event = log.getEvent();
            long micros = log.getTimestamp() * 1000L;
            // don't add redundant annotations to the output
            if (event.length() == 2) {
                if (event.equals("cs")) {
                    zipkinSpan.kind(zipkin2.Span.Kind.CLIENT);
                } else if (event.equals("sr")) {
                    zipkinSpan.kind(zipkin2.Span.Kind.SERVER);
                } else if (event.equals("ss")) {
                    zipkinSpan.kind(zipkin2.Span.Kind.SERVER);
                } else if (event.equals("cr")) {
                    zipkinSpan.kind(zipkin2.Span.Kind.CLIENT);
                } else if (event.equals("ms")) {
                    zipkinSpan.kind(zipkin2.Span.Kind.PRODUCER);
                } else if (event.equals("mr")) {
                    zipkinSpan.kind(zipkin2.Span.Kind.CONSUMER);
                } else {
                    zipkinSpan.addAnnotation(micros, event);
                }
            } else {
                zipkinSpan.addAnnotation(micros, event);
            }
        }
    }

    private void setInstanceIdIfPresent(zipkin2.Span.Builder zipkinSpan, String key) {
        String property = defaultInstanceId();
        if (StringUtils.hasText(property)) {
            zipkinSpan.putTag(key, property);
        }
    }

    String defaultInstanceId() {
        return IdUtils.getDefaultInstanceId(this.environment);
    }

    /**
     * Adds tags from the sleuth Span
     */
    private void addZipkinTags(zipkin2.Span.Builder zipkinSpan, Span span) {
        Endpoint.Builder remoteEndpoint = Endpoint.newBuilder();
        boolean shouldAddRemote = false;
        // don't add redundant tags to the output
        for (Map.Entry<String, String> e : span.tags().entrySet()) {
            String key = e.getKey();
            if (key.equals("peer.service")) {
                shouldAddRemote = true;
                remoteEndpoint.serviceName(e.getValue());
            } else if (key.equals("peer.ipv4") || key.equals("peer.ipv6")) {
                shouldAddRemote = true;
                remoteEndpoint.ip(e.getValue());
            } else if (key.equals("peer.port")) {
                shouldAddRemote = true;
                try {
                    remoteEndpoint.port(Integer.parseInt(e.getValue()));
                } catch (NumberFormatException ignored) {
                }
            } else {
                zipkinSpan.putTag(e.getKey(), e.getValue());
            }
        }
        if (shouldAddRemote) {
            zipkinSpan.remoteEndpoint(remoteEndpoint.build());
        }
    }

    /**
     * There could be instrumentation delay between span creation and the
     * semantic start of the span (client send). When there's a difference,
     * spans look confusing. Ex users expect duration to be client
     * receive - send, but it is a little more than that. Rather than have
     * to teach each user about the possibility of instrumentation overhead,
     * we truncate absolute duration (span finish - create) to semantic
     * duration (client receive - send)
     */
    private long calculateDurationInMicros(Span span) {
        Log clientSend = hasLog(Span.CLIENT_SEND, span);
        Log clientReceived = hasLog(Span.CLIENT_RECV, span);
        if (clientSend != null && clientReceived != null) {
            return (clientReceived.getTimestamp() - clientSend.getTimestamp()) * 1000;
        }
        return span.getAccumulatedMicros();
    }

    private Log hasLog(String logName, Span span) {
        for (Log log : span.logs()) {
            if (logName.equals(log.getEvent())) {
                return log;
            }
        }
        return null;
    }

    @Override
    public void report(Span span) {
        if (span.isExportable()) {
            this.reporter.report(convert(span));
        } else {
            if (log.isDebugEnabled()) {
                log.debug("The span " + span + " will not be sent to Zipkin due to sampling");
            }
        }
    }

    @Override
    public String toString() {
        return "ZipkinSpanReporter(" + this.reporter + ")";
    }
}
When report is called, it first checks the span's exportable flag (spans dropped by sampling are only logged at debug level), then calls the internal convert method to build a data structure conforming to the zipkin2 Span model; the collected span data is then handed to the storage component.
With the principle and workflow understood, we can inject SpanReporter into our business code and call its report method to create trace data ourselves. The implementation:
@Autowired
private SpanReporter reporter;

public User getUserBySsoid(Integer ssoid) {
    String requestUrl = reconstructURL(setting.API_User_Profile);
    User user = new User();
    String url = MessageFormat.format(requestUrl, String.valueOf(ssoid));
    log.info(MessageFormat.format("Fetching user profile for ssoid [{0}], url: [{1}]", ssoid, url));
    try {
        Response response = httpComponent.syncHttpRequest(url, null, RequestMethod.GET);
        if (response.getStatusCode() != 200) {
            throw new Exception("User profile API returned an error, code: " + response.getStatusCode());
        }
        String data = response.getResponseBody();
        JSONObject object = JSONObject.parseObject(data);
        JSONObject userObject = object.getJSONObject("data");
        user = JSON.parseObject(userObject.toString(), User.class);
    } catch (Exception e) {
        log.error(MessageFormat.format("Failed to fetch user profile for ssoid [{0}], url: [{1}], error: {2}", ssoid, url, e.toString()));
    }
    // Report a span named after the request URL so the call shows up in Zipkin
    reporter.report(Span.builder().name(url).build());
    return user;
}
The line reporter.report(Span.builder().name(url).build()) builds a Spring Cloud Sleuth Span named after the request URL, then reports it through the SpanReporter interface of Spring-Cloud-Sleuth-Zipkin, which turns it into trace data.
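Note that a span built with nothing but a name carries no IDs or timing, so Zipkin cannot render a duration for it. Below is a minimal sketch of a fuller variant, assuming the Sleuth 1.x (Edgware) Span.SpanBuilder API; the Random-generated IDs are stand-ins for IDs you would normally derive from the active trace context, so verify the builder methods against your Sleuth version:

long begin = System.currentTimeMillis();
Response response = httpComponent.syncHttpRequest(url, null, RequestMethod.GET); // the existing call
long end = System.currentTimeMillis();

// Hypothetical ID generation; real code should reuse the current trace's IDs
java.util.Random random = new java.util.Random();
Span span = Span.builder()
        .traceId(random.nextLong())
        .spanId(random.nextLong())
        .name(url)
        .begin(begin)
        .end(end)
        .exportable(true)         // report() drops non-exportable spans (see source above)
        .tag("http.url", url)
        .build();
reporter.report(span);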
Since the project already uses Kafka as its span collection and storage transport, add the following to the application's application.properties:
# Sampling rate; 1.0 is the maximum (sample everything)
spring.sleuth.sampler.percentage=0.1
## Transport used to send spans; web (HTTP), RabbitMQ and Kafka are supported
spring.zipkin.sender.type=kafka
## Kafka topic for span data
spring.zipkin.kafka.topic=zipkin
## Kafka cluster nodes
spring.kafka.bootstrapServers=192.168.9.16:9092,192.168.9.17:9092,192.168.9.18:9092
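To verify that spans are actually flowing, one option (assuming a standard Kafka installation with its command-line tools on the PATH) is to watch the zipkin topic directly:

kafka-console-consumer.sh --bootstrap-server 192.168.9.16:9092 --topic zipkin

Each message should contain an encoded zipkin2 span; once the Zipkin collector consumes the topic, the spans become searchable in the Zipkin UI.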