Core Ideas of Distributed Tracing
In essence, distributed tracing is about recording logs. As a complete technology, however, it also has its own theory and concepts.
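In a microservice architecture, the call chain that handles a request can be represented as a tree. A typical example:
A request is routed by the gateway service to the downstream microservice-1. Microservice-1 calls microservice-2 and, after getting the result, calls microservice-3. Finally it combines the results of microservice-2 and microservice-3 and returns them to the user through the gateway.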
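To trace the whole call chain we must record logs. Logging is the foundation, and on top of it sit a few theoretical concepts. The ideas behind today's mainstream distributed tracing technologies/systems all come from a Google paper, "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure". What are its core ideas?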
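Identifying a request chain: each chain is uniquely identified by a TraceId, a span identifies the information of one request it initiates, and the spans are linked together through a parentId.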
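Trace: the unit of tracing. It covers the process from the moment a client's request arrives at the boundary of the traced system until the traced system returns the response to the client.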
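Trace ID: to track a request, when the request reaches the entry endpoint of the distributed system, the tracing framework only needs to create a unique trace identifier, the Trace ID, for that request. The framework always keeps this identifier as the request flows through the system, until the response is returned to the caller.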
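The propagation rule above can be sketched in plain Java. This is a minimal illustration, not Sleuth's code: it assumes B3-style HTTP headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, the header names used by Zipkin and Sleuth), 64-bit ids printed as 16 hex characters like the ids Sleuth prints in its logs, and, as a simplification, a brand-new span for every hop. The class `TracePropagation` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: starts a trace at the entry point and carries the TraceId across hops.
// Header names follow B3 propagation; one new span per hop is a simplification.
class TracePropagation {
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_ID = "X-B3-ParentSpanId";

    // Random 64-bit id rendered as 16 lowercase hex chars (negative longs print as unsigned).
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Entry point (e.g. the gateway): there is no incoming context, so a new TraceId is created.
    static Map<String, String> startTrace() {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_ID, newId());
        headers.put(SPAN_ID, newId());
        return headers;
    }

    // Outgoing call: same TraceId, the current span becomes the parent, fresh SpanId for the callee.
    static Map<String, String> childHeaders(Map<String, String> incoming) {
        Map<String, String> out = new HashMap<>();
        out.put(TRACE_ID, incoming.get(TRACE_ID));
        out.put(PARENT_ID, incoming.get(SPAN_ID));
        out.put(SPAN_ID, newId());
        return out;
    }
}
```

Calling `childHeaders` twice (gateway, then microservice-1, then microservice-2) yields header maps that all carry the same `X-B3-TraceId`, each pointing at its caller's span through `X-B3-ParentSpanId`.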
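A Trace consists of one or more Spans. Each Span has a SpanId and records the TraceId; it also has a ParentId that points to another Span's SpanId. This expresses a parent-child relationship, which is essentially a dependency relationship.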
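Span ID: to measure the latency of each processing unit, when a request reaches each service component it is likewise marked by a unique identifier, the Span ID, which covers its start, its processing and its end. Every Span must have two points, a start and an end; by recording the timestamps at which the Span starts and ends, the Span's latency can be computed. Besides timestamps, a Span can carry other metadata, such as the event name and request information.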
Every Span has a unique tracing identifier, the Span ID, and a number of ordered spans make up a trace.
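To make the TraceId/SpanId/ParentId relationship concrete, here is a minimal sketch that groups spans by `parentId` and prints the call tree of the gateway example (gateway, microservice-1, then microservice-2 and microservice-3). The `SpanTree` class and its simplified span fields are hypothetical, and timestamps are assumed to be in microseconds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: spans sharing one traceId, linked by parentId into a call tree.
class SpanTree {
    // Minimal span; real tracers also store a name, tags, annotations, endpoints...
    record Span(String traceId, String spanId, String parentId, String service,
                long startMicros, long endMicros) {
        long durationMicros() { return endMicros - startMicros; }
    }

    // Index children under their parent's spanId; the root span has no parentId.
    static Map<String, List<Span>> childrenByParent(List<Span> spans) {
        Map<String, List<Span>> children = new HashMap<>();
        for (Span s : spans) {
            if (s.parentId() != null) {
                children.computeIfAbsent(s.parentId(), k -> new ArrayList<>()).add(s);
            }
        }
        return children;
    }

    // Depth-first walk: indentation mirrors the parent-child (dependency) relationship.
    static void print(Span span, Map<String, List<Span>> children, String indent, StringBuilder out) {
        out.append(indent).append(span.service())
           .append(" (").append(span.durationMicros()).append("us)\n");
        for (Span child : children.getOrDefault(span.spanId(), List.of())) {
            print(child, children, indent + "  ", out);
        }
    }
}
```

For four spans with ids `a` (gateway), `b` (parent `a`), `c` and `d` (both parent `b`), printing from `a` produces an indented tree in which microservice-2 and microservice-3 appear as siblings under microservice-1.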
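A Span can be thought of as a log data structure that records log information at certain specific moments, such as the timestamp, spanId, TraceId and parentId. Spans also abstract another concept, the event. The core events are: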
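CS: client send/start. The client/consumer sends a request; this marks the start of a span.
SR: server received/start. The server/producer receives the request. SR - CS is the network latency of sending the request.
SS: server send/finish. The server/producer sends the response. SS - SR is the time spent on the server.
CR: client received/finished. The client/consumer receives the response. CR - SS is the time the reply needs (the network latency of the response).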
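The three intervals can be checked with a small calculation. This sketch (hypothetical `CoreEvents` class; timestamps assumed to be epoch microseconds, which is Zipkin's unit) derives them from the four event timestamps. One caveat: CS and CR are taken on the client's clock and SR and SS on the server's, so SR - CS and CR - SS are only as accurate as the clock synchronization between the two hosts, while CR - CS and SS - SR each come from a single clock.

```java
// Hypothetical sketch: the intervals defined by the four core events.
class CoreEvents {
    record Timings(long cs, long sr, long ss, long cr) {
        long requestNetwork()  { return sr - cs; } // SR - CS: request travels over the network
        long serverTime()      { return ss - sr; } // SS - SR: time spent inside the server
        long responseNetwork() { return cr - ss; } // CR - SS: response travels back
        long total()           { return cr - cs; } // CR - CS: latency observed by the client
    }
}
```

With CS=1000, SR=1050, SS=1850 and CR=1904, the request spends 50 µs on the wire, 800 µs in the server and 54 µs coming back, 904 µs in total as seen by the client.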
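Spring Cloud Sleuth (a tracing framework) can trace the calls between services. Sleuth records which services a request passes through, how long each service takes to process it, and so on. With this information we can work out the call relationships among microservices and trace and analyze problems.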
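Latency analysis: use Sleuth to see how long sampled requests take and analyze service performance problems (which service calls are slow).
Link optimization: find frequently called services and optimize them specifically.
Sleuth records trace data by writing logs.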
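Finding frequently called services amounts to counting calls per caller/callee pair across collected spans. A minimal sketch, assuming each call has already been reduced to a hypothetical `Call(parent, child, error)` record; it produces a call count and an error count per pair, the same two figures Zipkin's `zipkin_dependencies` table stores.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: count calls (and failed calls) per "parent->child" service pair.
class CallCounter {
    record Call(String parent, String child, boolean error) {}

    // Value array: [0] = call count, [1] = error count. TreeMap keeps keys sorted for display.
    static Map<String, long[]> aggregate(List<Call> calls) {
        Map<String, long[]> stats = new TreeMap<>();
        for (Call c : calls) {
            long[] s = stats.computeIfAbsent(c.parent() + "->" + c.child(), k -> new long[2]);
            s[0]++;
            if (c.error()) {
                s[1]++;
            }
        }
        return stats;
    }
}
```

Sorting the resulting entries by call count points at the hot edges of the call graph, which are the first candidates for caching or batching.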
Note: Spring Cloud Sleuth is usually used together with Zipkin. Sleuth's data is sent to Zipkin for aggregation, and Zipkin stores and displays it.
Sleuth + Zipkin
1) Add the following dependency to every microservice project that needs to be traced:
<!-- distributed tracing -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
2) In every microservice, edit application.yml and set the log levels:
logging:
  level:
    org.springframework.web.servlet.DispatcherServlet: debug
    org.springframework.cloud.sleuth: debug
- Call the microservice; the log output looks like this:
2020-08-27 14:03:34.820 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] o.s.c.s.i.a.ContextRefreshedListener : Context successfully refreshed
2020-08-27 14:03:34.829 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] o.s.c.s.i.w.c.f.LazyTracingFeignClient : Sending a request via tracing feign client [org.springframework.cloud.sleuth.instrument.web.client.feign.TracingFeignClient@71d5f0b7] and the delegate [feign.Client$Default@68c343df]
2020-08-27 14:03:34.829 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] o.s.c.s.i.w.c.feign.TracingFeignClient : Handled send of NoopSpan(c4e61a3c6ecc8362/971cd7cb83cb1eed)
2020-08-27 14:03:35.084 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] o.s.c.s.i.w.c.feign.TracingFeignClient : Handled receive of NoopSpan(c4e61a3c6ecc8362/971cd7cb83cb1eed)
2020-08-27 14:03:35.087 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] c.s.i.w.c.f.TraceLoadBalancerFeignClient : After receive
2020-08-27 14:03:35.087 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] com.liu.service.feign.ResumeFeignClient : [ResumeFeignClient#findDefaultResumeState] <--- HTTP/1.1 200 (904ms)
2020-08-27 14:03:35.087 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] com.liu.service.feign.ResumeFeignClient : [ResumeFeignClient#findDefaultResumeState] content-type: application/json;charset=UTF-8
2020-08-27 14:03:35.087 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] com.liu.service.feign.ResumeFeignClient : [ResumeFeignClient#findDefaultResumeState] date: Thu, 27 Aug 2020 06:03:35 GMT
2020-08-27 14:03:35.087 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] com.liu.service.feign.ResumeFeignClient : [ResumeFeignClient#findDefaultResumeState] transfer-encoding: chunked
2020-08-27 14:03:35.087 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] com.liu.service.feign.ResumeFeignClient : [ResumeFeignClient#findDefaultResumeState]
2020-08-27 14:03:35.096 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] com.liu.service.feign.ResumeFeignClient : [ResumeFeignClient#findDefaultResumeState] 8080
2020-08-27 14:03:35.096 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false] 20492 --- [ervice-resume-1] com.liu.service.feign.ResumeFeignClient : [ResumeFeignClient#findDefaultResumeState] <--- END HTTP (4-byte body)
2020-08-27 14:03:35.132 DEBUG [edu-service-autodeliver,c4e61a3c6ecc8362,d09c413e5a923842,false] 20492 --- [nio-8090-exec-2] o.s.web.servlet.DispatcherServlet : Completed 200 OK
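Each log line above carries a bracket such as `[edu-service-autodeliver,c4e61a3c6ecc8362,4d31cf44366337e8,false]`: the application name, the traceId, the spanId, and whether the span is exported to Zipkin (`false` here). All lines share traceId `c4e61a3c6ecc8362`, while the final DispatcherServlet line has a different spanId. The sketch below extracts these fields; the `SleuthLogParser` class is hypothetical and assumes the Sleuth 2.x default `[app,traceId,spanId,exportable]` layout shown in this output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: pull tracing fields out of a Sleuth-formatted log line.
// Assumes the [appName,traceId,spanId,exportable] bracket seen in the output above.
class SleuthLogParser {
    record TraceFields(String app, String traceId, String spanId, boolean exportable) {}

    // Brackets without commas (thread names, Feign method names) do not match.
    private static final Pattern PREFIX =
        Pattern.compile("\\[([^,\\]]*),([0-9a-f]*),([0-9a-f]*),(true|false)\\]");

    static TraceFields parse(String line) {
        Matcher m = PREFIX.matcher(line);
        if (!m.find()) {
            return null; // line was logged outside any trace context
        }
        return new TraceFields(m.group(1), m.group(2), m.group(3), Boolean.parseBoolean(m.group(4)));
    }
}
```

Grepping logs from several services for the same traceId is the manual version of what Zipkin automates: it collects every span of one request in one place.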
Building the Zipkin server
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
    </dependency>
    <!-- distributed tracing -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
    <dependency>
        <groupId>io.zipkin.java</groupId>
        <artifactId>zipkin-server</artifactId>
        <version>2.12.3</version>
        <exclusions>
            <!-- exclude the transitive log4j2 dependency to avoid conflicts with Spring Boot's logging components -->
            <exclusion>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-log4j2</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- Zipkin server UI -->
    <dependency>
        <groupId>io.zipkin.java</groupId>
        <artifactId>zipkin-autoconfigure-ui</artifactId>
        <version>2.12.3</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>
Startup class:
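import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import zipkin2.server.internal.EnableZipkinServer; // provided by zipkin-server 2.x

@EnableZipkinServer
@SpringBootApplication
public class ZipkinServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ZipkinServerApplication.class, args);
    }
}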
- Configuration file (application.yml):
server:
  port: 9441
management:
  metrics:
    web:
      server:
        auto-time-requests: false # disable automatic timing of web requests
Start the server and open http://localhost:9441/zipkin/
On the client side, add:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
In the client's configuration file, add support for the Zipkin server:
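server:
  port: 8090
spring:
  zipkin:
    base-url: http://127.0.0.1:9441 # request address of the Zipkin server
    sender:
      # web: the client sends trace data to the server over HTTP requests
      # kafka/rabbit: the client sends trace data to an MQ, which relays it
      type: web
  sleuth:
    sampler:
      # Sampling probability: 1 means 100% of requests are collected; the default 0.1 means 10%.
      # In production the request volume is very large and collecting every trace puts heavy load on the
      # network and on the server, so collecting a sample of the requests for analysis is enough.
      probability: 1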
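To see what a probability such as 0.1 means in practice, here is a sketch of a probability sampler. It is not Sleuth's implementation: it assumes, for illustration, that the decision is derived from the traceId so that the same trace always gets the same answer. In Sleuth the decision is normally taken once at the root span and then propagated downstream (the B3 sampled flag), so only the root decision is modeled here. The `TraceIdSampler` class is hypothetical.

```java
// Hypothetical sketch of probability-based sampling keyed on the traceId.
class TraceIdSampler {
    private final long threshold; // number of buckets out of 10_000 that are sampled

    TraceIdSampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.threshold = Math.round(probability * 10_000);
    }

    // floorMod keeps the bucket non-negative even for negative (high-bit-set) ids.
    boolean isSampled(long traceId) {
        return Math.floorMod(traceId, 10_000L) < threshold;
    }
}
```

With `probability: 1` every trace is kept, which is convenient while testing; with 0.1 roughly one trace in ten reaches Zipkin.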
Send a request.
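With `sender.type: web`, finished spans are reported to the Zipkin server over HTTP. The sketch below shows roughly what such a report contains, assuming Zipkin's v2 HTTP API (`POST {base-url}/api/v2/spans` with a JSON array of spans, ids in lowercase hex, `timestamp` and `duration` in microseconds). The `ZipkinHttpReporter` class is hypothetical; it does no JSON escaping, so names and ids are assumed to be plain strings.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch of an HTTP span report in Zipkin v2 JSON format (no escaping).
class ZipkinHttpReporter {
    static String toJson(String traceId, String id, String parentId, String name, String kind,
                         String serviceName, long timestampMicros, long durationMicros) {
        StringBuilder sb = new StringBuilder("[{");
        sb.append("\"traceId\":\"").append(traceId).append("\",");
        if (parentId != null) {
            sb.append("\"parentId\":\"").append(parentId).append("\","); // omitted for root spans
        }
        sb.append("\"id\":\"").append(id).append("\",");
        sb.append("\"kind\":\"").append(kind).append("\",");   // e.g. CLIENT or SERVER
        sb.append("\"name\":\"").append(name).append("\",");
        sb.append("\"timestamp\":").append(timestampMicros).append(',');
        sb.append("\"duration\":").append(durationMicros).append(',');
        sb.append("\"localEndpoint\":{\"serviceName\":\"").append(serviceName).append("\"}");
        return sb.append("}]").toString();
    }

    // baseUrl corresponds to spring.zipkin.base-url, e.g. http://127.0.0.1:9441
    static int send(String baseUrl, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/v2/spans"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.discarding());
        return response.statusCode(); // Zipkin replies 202 Accepted when it takes the spans
    }
}
```

Sleuth builds and batches these reports itself; the sketch only makes visible what travels over the wire between the client and the Zipkin server.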
Persisting Zipkin data to MySQL
Create the tables (SQL schema):
CREATE TABLE IF NOT EXISTS zipkin_spans (
`trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
`trace_id` BIGINT NOT NULL,
`id` BIGINT NOT NULL,
`name` VARCHAR(255) NOT NULL,
`remote_service_name` VARCHAR(255),
`parent_id` BIGINT,
`debug` BIT(1),
`start_ts` BIGINT COMMENT 'Span.timestamp(): epoch micros used for endTs query and to implement TTL',
`duration` BIGINT COMMENT 'Span.duration(): micros used for minDuration and maxDuration query',
PRIMARY KEY (`trace_id_high`, `trace_id`, `id`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;
ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTracesByIds';
ALTER TABLE zipkin_spans ADD INDEX(`name`) COMMENT 'for getTraces and getSpanNames';
ALTER TABLE zipkin_spans ADD INDEX(`remote_service_name`) COMMENT 'for getTraces and getRemoteServiceNames';
ALTER TABLE zipkin_spans ADD INDEX(`start_ts`) COMMENT 'for getTraces ordering and range';
CREATE TABLE IF NOT EXISTS zipkin_annotations (
`trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
`trace_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
`span_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.id',
`a_key` VARCHAR(255) NOT NULL COMMENT 'BinaryAnnotation.key or Annotation.value if type == -1',
`a_value` BLOB COMMENT 'BinaryAnnotation.value(), which must be smaller than 64KB',
`a_type` INT NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
`a_timestamp` BIGINT COMMENT 'Used to implement TTL; Annotation.timestamp or zipkin_spans.timestamp',
`endpoint_ipv4` INT COMMENT 'Null when Binary/Annotation.endpoint is null',
`endpoint_ipv6` BINARY(16) COMMENT 'Null when Binary/Annotation.endpoint is null, or no IPv6 address',
`endpoint_port` SMALLINT COMMENT 'Null when Binary/Annotation.endpoint is null',
`endpoint_service_name` VARCHAR(255) COMMENT 'Null when Binary/Annotation.endpoint is null'
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;
ALTER TABLE zipkin_annotations ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) COMMENT 'Ignore insert on duplicate';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`, `span_id`) COMMENT 'for joining with zipkin_spans';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTraces/ByIds';
ALTER TABLE zipkin_annotations ADD INDEX(`endpoint_service_name`) COMMENT 'for getTraces and getServiceNames';
ALTER TABLE zipkin_annotations ADD INDEX(`a_type`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`a_key`) COMMENT 'for getTraces and autocomplete values';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id`, `span_id`, `a_key`) COMMENT 'for dependencies job';
CREATE TABLE IF NOT EXISTS zipkin_dependencies (
`day` DATE NOT NULL,
`parent` VARCHAR(255) NOT NULL,
`child` VARCHAR(255) NOT NULL,
`call_count` BIGINT,
`error_count` BIGINT,
PRIMARY KEY (`day`, `parent`, `child`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;
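The schema stores ids as BIGINT, while Sleuth logs and the Zipkin UI show them as hex strings. This sketch converts between the two, assuming the usual encoding of 16 hex characters (64-bit, `trace_id_high` = 0, as the column comment says) or 32 hex characters (128-bit, high half in `trace_id_high`). The `ZipkinIdColumns` class is hypothetical. Ids whose top bit is set, such as `c4e61a3c6ecc8362`, come out as negative signed BIGINT values.

```java
// Hypothetical sketch: hex trace id <-> (trace_id_high, trace_id) BIGINT columns.
class ZipkinIdColumns {
    // Returns {trace_id_high, trace_id}.
    static long[] toColumns(String hexTraceId) {
        if (hexTraceId.length() == 16) {
            return new long[] {0L, Long.parseUnsignedLong(hexTraceId, 16)};
        }
        if (hexTraceId.length() == 32) {
            return new long[] {
                Long.parseUnsignedLong(hexTraceId.substring(0, 16), 16),
                Long.parseUnsignedLong(hexTraceId.substring(16), 16)};
        }
        throw new IllegalArgumentException("expected 16 or 32 hex chars: " + hexTraceId);
    }

    // Back to hex, e.g. to search the UI with a value read from zipkin_spans.
    static String toHex(long traceIdHigh, long traceId) {
        String low = String.format("%016x", traceId);
        return traceIdHigh == 0 ? low : String.format("%016x", traceIdHigh) + low;
    }
}
```

For example, the value from `toColumns("c4e61a3c6ecc8362")[1]` is what you would bind in `SELECT * FROM zipkin_spans WHERE trace_id = ?` to find the trace from the log output.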
Add the dependencies to the server (complete list):
<dependencies>
    <!-- <dependency>-->
    <!--     <groupId>org.springframework.cloud</groupId>-->
    <!--     <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>-->
    <!-- </dependency>-->
    <!-- distributed tracing -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
    <dependency>
        <groupId>io.zipkin.java</groupId>
        <artifactId>zipkin-server</artifactId>
        <version>2.12.3</version>
        <exclusions>
            <!-- exclude the transitive log4j2 dependency to avoid conflicts with Spring Boot's logging components -->
            <exclusion>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-log4j2</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!-- Zipkin server UI -->
    <!-- https://mvnrepository.com/artifact/io.zipkin.java/zipkin-autoconfigure-ui -->
    <dependency>
        <groupId>io.zipkin.java</groupId>
        <artifactId>zipkin-autoconfigure-ui</artifactId>
        <version>2.12.3</version>
    </dependency>
    <!-- Zipkin MySQL storage -->
    <dependency>
        <groupId>io.zipkin.java</groupId>
        <artifactId>zipkin-autoconfigure-storage-mysql</artifactId>
        <version>2.12.3</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>druid-spring-boot-starter</artifactId>
        <version>1.1.10</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-tx</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring-jdbc</artifactId>
    </dependency>
</dependencies>
- Configuration file (application.yml):
server:
  port: 9441
management:
  metrics:
    web:
      server:
        auto-time-requests: false # disable automatic timing of web requests
spring:
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://localhost:3306/zipkin?useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull&allowMultiQueries=true&useSSL=false&serverTimezone=UTC
    username: root
    password: root
    druid:
      initial-size: 10
      min-idle: 5
      max-active: 50
      max-wait: 50000
zipkin:
  storage:
    type: mysql
Register a transaction manager bean:
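// Declare in ZipkinServerApplication or another @Configuration class. Imports: javax.sql.DataSource,
// org.springframework.jdbc.datasource.DataSourceTransactionManager, org.springframework.transaction.PlatformTransactionManager
@Bean
public PlatformTransactionManager txManager(DataSource dataSource) {
    return new DataSourceTransactionManager(dataSource);
}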
Open the Zipkin UI and send requests again.
Check the database.