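A native Cangjie APM SDK: application performance monitoring software implemented against the OpenTelemetry standard, with no third-party dependencies.
Features
The SDK provides the following application performance monitoring features:
- Metric data collection with the following instruments:
  - Counter/UpDownCounter
  - Gauge
  - Histogram
- Trace data collection
- Request link monitoring across threads, processes, and services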
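The instrument kinds listed above differ mainly in how they aggregate measurements. The following is a minimal sketch in Python, used only so it can be run standalone; it is not the SDK's Cangjie API. All class names are hypothetical, and the bucket rule (each boundary is an inclusive upper bound) is an assumption taken from general OpenTelemetry semantics rather than from this SDK's source.

```python
from bisect import bisect_left


class Counter:
    """Monotonic sum: negative increments are rejected."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        if amount < 0:
            raise ValueError("Counter only accepts non-negative increments")
        self.value += amount


class UpDownCounter:
    """Non-monotonic sum: increments may be positive or negative."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount


class Gauge:
    """Reports the current value returned by a callback at collection time."""

    def __init__(self, callback):
        self._callback = callback

    def collect(self):
        return self._callback()


class Histogram:
    """Counts measurements per bucket; the last bucket is the overflow bucket."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)

    def record(self, value):
        # bisect_left puts a value equal to a boundary into that boundary's bucket
        self.counts[bisect_left(self.bounds, value)] += 1
```

A Counter suits values that only grow (requests served), an UpDownCounter suits values that rise and fall (active requests), a Gauge suits sampled state (a collection's size), and a Histogram suits distributions (response times).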
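Development plan
Date | Milestone |
---|---|
2024.3.10 | Core metric collection logic completed; samples delivered |
2024.3.20 | Core trace collection logic completed; samples delivered |
2024.3.30 | Metric and trace support OpenTelemetry output of collected data; trial version provided |
2024.4.15 | Metric and trace support OpenTelemetry output of collected data; unit test cases delivered |
2024.4.30 | Metric and trace support OpenTelemetry output of collected data; release version provided |
Core metric collection logic
- Core metric data computation: Aggregator
- Metric data processing: MetricReader
- Metric data reporting: Exporter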
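The three metric stages above can be pictured as a small pipeline: an aggregator accumulates measurements, a reader collects snapshots, and an exporter ships them. The sketch below is in Python purely so it can be run standalone; the class and method names (`SumAggregator`, `collect_and_export`, `InMemoryExporter`) are hypothetical and do not describe the SDK's actual Cangjie types.

```python
class SumAggregator:
    """Keeps one running sum per attribute set (hypothetical name)."""

    def __init__(self):
        self._sums = {}

    def record(self, value, attributes=None):
        # Attribute sets are normalized to a sorted tuple so they can be dict keys
        key = tuple(sorted((attributes or {}).items()))
        self._sums[key] = self._sums.get(key, 0) + value

    def collect(self, reset=False):
        snapshot = dict(self._sums)
        if reset:
            self._sums.clear()
        return snapshot


class InMemoryExporter:
    """Stands in for a real exporter; just remembers every batch it receives."""

    def __init__(self):
        self.batches = []

    def export(self, batch):
        self.batches.append(batch)


class MetricReader:
    """Collects a snapshot from every registered aggregator and exports it."""

    def __init__(self, exporter):
        self._exporter = exporter
        self._aggregators = {}

    def register(self, name, aggregator):
        self._aggregators[name] = aggregator

    def collect_and_export(self):
        batch = {
            name: aggregator.collect(reset=True)
            for name, aggregator in self._aggregators.items()
        }
        self._exporter.export(batch)
        return batch
```

Keeping the three roles separate means the output target can change (file, network) without touching how measurements are aggregated.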
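Core trace collection logic
- Sampler (head-based sampling)
- Context propagation: ContextPropagators
- Trace data processing: Processor
- Trace data reporting: Exporter
1. Build and test
Project directory structure
|---samples                 APM SDK usage examples
|   |---basic_example       Basic usage example
|   |---silo_example        Example using the silo framework
|---src                     APM SDK source code
|   |---api                 Client-facing API
|   |---exporter            Data reporting
|   |---sdk                 Core implementation of the API
|---test                    APM SDK tests
|   |---UT                  SDK unit tests
|---module.json
|---README.md
1.1. Build steps
- Clean the project by running the following in the project root: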
cjpm clean / savant clean
- Build the project by running the following in the project root:
cjpm build / savant install
- The main compiled static libraries are located at:
build/release/apm_sdk/api.cjo
build/release/apm_sdk/sdk.cjo
build/release/apm_sdk/exporter.cjo
1.2. Unit tests
Run the following in the project's test/UT directory:
cjpm test
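1.3. Deliverables and scope
Deliverable | Description | Scope |
---|---|---|
Static libraries | The static libraries produced by building apm_sdk | Referenced directly by the application |
samples | apm_sdk usage demos in two variants, basic and silo; pick the one that matches your setup | Shows how to use the SDK before you reference it |
test/UT | Unit tests | Unit test cases for the core functionality; see 1.2. Unit tests for how to run them |
1.4. SDK resource usage
The larger objects are listed in the table below.
Resource | Bound | Description |
---|---|---|
Exporter reporting | thread=2 | Trace/metric reporting threads |
Trace sampler Map | size=2048 | Groups samplers by trace operator |
Trace batch reporting Queue | size=2048 | Queue for batched trace reporting |
Metric instrument Map | size=10000 | Maximum number of metric instruments tracked |
Metric instrument reporting Map, grouped by attribute | size=2000 | Each attribute caches at most 2000 handlers |
Metric instrument reporting non-blocking Queue, grouped by attribute | size=unbounded | Handlers beyond the attribute-group Map limit are removed manually. Purpose: cache handler objects to avoid repeatedly creating large numbers of objects |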
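The fixed bounds in the table keep the SDK's memory use predictable. As an illustration of how a bounded batch queue behaves, here is a minimal Python sketch (Python is used only so it can be run standalone). The overflow policy shown (reject the newest item and count the drop) and the names `BoundedBatchQueue`, `offer`, and `drain_batch` are assumptions; the source states only the size limit, not what happens when it is reached.

```python
from collections import deque


class BoundedBatchQueue:
    """Queue with a hard size limit that hands out items in batches."""

    def __init__(self, max_size=2048, batch_size=512):
        self._items = deque()
        self._max_size = max_size
        self._batch_size = batch_size
        self.dropped = 0  # number of items rejected because the queue was full

    def offer(self, item):
        """Enqueue without blocking; return False if the queue is full."""
        if len(self._items) >= self._max_size:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    def drain_batch(self):
        """Remove and return up to batch_size items in insertion order."""
        count = min(self._batch_size, len(self._items))
        return [self._items.popleft() for _ in range(count)]
```

Because `offer` never blocks, a slow reporting thread cannot stall the application thread that produces spans.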
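1.5. Safeguards against impact on the application
- All additionally started threads run independently and do not affect the application
- Large objects consume some resources; each has a bound configured
- Where the trace flow executes synchronously, exceptions are handled, for example: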
public func fetchSequence(segmentId: String): Int32 {
    if (let Some(v) <- SEQUENCE.get(segmentId)) {
        return v.fetchAdd(1)
    }
    throw UnsupportedException("not supported multiple span [same name] is root.")
}
public static func getIndent(segmentId: String): Int32 {
    if (let Some(value) <- INDENT.get(segmentId)) {
        value.compareAndSwap(-1, 0)
        return value.fetchSub(1)
    }
    throw UnsupportedException("not supported multiple span is root.")
}
- The metric aggregator computation runs on an asynchronous thread and does not affect other threads, for example:
public class LongLastValueHandler <: AggregatorHandle<CounterPointData<Int64>> {
    private let current = AtomicReference<Long>(Long.instance())
    protected override func doAggregateThenMaybeReset(
        startEpochNanos: Int64,
        epochNanos: Int64,
        attributes: Attributes,
        reset: Bool
    ): CounterPointData<Int64> {
        // Asynchronous callback lambda runs before the value is read
        let value: Int64
        if (reset) {
            // Swap with the default memory ordering: writes the given value into the atomic and returns the previous value.
            value = current.swap(Long.instance()).getValue()
        } else {
            value = current.load().getValue()
        }
        return CounterPointData<Int64>(startEpochNanos, epochNanos, attributes, value)
    }
    protected override func doRecordLong(value: Int64): Unit {
        current.store(Long(value))
    }
}
1.6. File retention (phase one writes monitoring data to files)
- A folder is created per day for written files, for example:
./apm/2024-04-16/...
./apm/2024-04-17/...
./apm/2024-04-18/...
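- The rotation size, the file storage path, and the maximum retention in days are configurable (the oldest folder is deleted only once the limit is exceeded). Compression is not supported yet, and there are currently no other write I/O requirements.
// Rotate files at 10 MB, retain 30 days, store files under ./apm in the current directory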
let config = CommonConfig.builder().rate(true).maxFileLength(10 * 1024 * 1024).maxDirectory(30).filePrefix("./apm").build()
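The rotation and retention rules above can be sketched as two small functions. This is a Python illustration, used only so it can be run standalone, and not the SDK's implementation. The function names are hypothetical, the `<kind>-<index>.json` naming follows the file examples in section 2.4, and treating the retention limit as a maximum number of dated folders is an assumption based on the `maxDirectory` setting.

```python
import os
import shutil
from datetime import date


def current_file(root, kind, day, max_bytes):
    """Return the file to append to for the given day, rotating by size.

    Files are named <kind>-<index>.json inside a folder named after the day.
    The first file that is missing or still below max_bytes is chosen.
    """
    folder = os.path.join(root, day.isoformat())
    os.makedirs(folder, exist_ok=True)
    index = 0
    while True:
        path = os.path.join(folder, f"{kind}-{index}.json")
        if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
            return path
        index += 1


def prune(root, max_directories):
    """Delete the oldest dated folders so that at most max_directories remain.

    ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    """
    days = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    removed = days[:max(0, len(days) - max_directories)]
    for name in removed:
        shutil.rmtree(os.path.join(root, name))
    return removed
```

With a 10 MB limit, `current_file("./apm", "trace", date.today(), 10 * 1024 * 1024)` would keep returning `trace-0.json` for the day until that file reaches the limit, then move on to `trace-1.json`.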
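2. Using the APM SDK in a project
2.1. Importing the static libraries of the APM SDK Cangjie client
Add the static libraries of the APM SDK Cangjie client to the project's module.json:
"package_requires": {
    "path_option": [
        "../build/release/apm_sdk"
    ],
    "package_option": {}
}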
2.2. Creating the OpenTelemetry monitoring configuration
The Sampler used here is described in detail in 2.2.1.
public class TelemetryConfig {
    private static let OPEN_TELEMETRY: OpenTelemetry
    static init() {
        // Declare the resource
        let resource = SdkResource.create()
        // Declare the metric exporter
        let metricExporter = MetricExporter.create()
        let config = CommonConfig.builder().rate(true).build()
        // Declare the meter provider
        let meterProvider = MeterProvider.builder().setResource(resource).setReader(
            MetricReader.builder(metricExporter).build()).build()
        // Declare the tracer exporter
        let tracerExporter = FileExporter.builder().build()
        // Ratio-based sampling rate
        let rate = 0.6
        // Declare the tracer provider
        let tracerProvider = TracerProvider.builder().setResource(resource).addProcessor(
            BatchProcessor.builder(tracerExporter, meterProvider).build()).setConfig(config).setSampler(
            // Custom samplers are supported; see 2.2.1
            GloabalRatioBasedSampler.create(rate)).build()
        OPEN_TELEMETRY = OpenTelemetry.builder().meterProvider(meterProvider).tracerProvider(tracerProvider).build()
    }
    public static prop openTelemetry: OpenTelemetry {
        get() {
            return OPEN_TELEMETRY
        }
    }
}
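2.2.1. Built-in trace sampling algorithms
- Three head-based samplers are built in by default, and custom samplers can be added:
  - Sample everything
  - Sample nothing
  - Sample a percentage of requests (the core sampler)
- A custom sampler only needs to implement the following interface:
import sdk.trace.samplers.ISampler
public enum CustomSampler <: ISampler {
    INSTANCE
    public override func shouldSample(): SamplingResult {
        // Sampling logic
        SamplingResult.drop()
    }
    public override func getDescription(): String {
        // Sampler description
        "CustomSampler"
    }
}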
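To make the percentage-based head sampler concrete, here is a minimal Python sketch, used only so it can be run standalone. It uses the common trace-ID-ratio technique: the decision is derived from the trace ID, so every service that sees the same trace makes the same choice. Whether the SDK's own ratio sampler works this way is not stated in this document; the class name `RatioSampler` and the use of the low 64 bits of the ID are assumptions.

```python
class RatioSampler:
    """Samples a fixed fraction of traces, decided deterministically per trace ID."""

    def __init__(self, ratio):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0.0 and 1.0")
        # Trace IDs whose low 64 bits fall below this bound are sampled
        self._bound = int(ratio * (1 << 64))

    def should_sample(self, trace_id_hex):
        """trace_id_hex is a 32-character lowercase hex trace ID."""
        low_64_bits = int(trace_id_hex[-16:], 16)
        return low_64_bits < self._bound
```

A ratio of 1.0 reproduces the sample-everything sampler and 0.0 the sample-nothing sampler, so the three built-in behaviours can be seen as points on the same scale.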
2.2.2. Creating a silo interceptor (if the silo framework is used)
@reflection
public class InterceptorApm <: Interceptor {
    // Collect some custom metrics in the interceptor
    private let responseTimeAvg = MetricHolder<IDoubleCounter>.get(MetricConst.RESPONSE_TIME_AVG.key)
    private let requestActiveMaxTime = MetricHolder<IDoubleCounter>.get(MetricConst.REQUEST_ACTIVE_MAX_TIME.key)
    private let requestActiveTotalCount = MetricHolder<ILongCounter>.get(MetricConst.REQUEST_ACTIVE_TOTAL_COUNT.key)
    private let requestFailed = MetricHolder<ILongCounter>.get(MetricConst.REQUEST_FAILED.key)
    /*
     * Handle the cross-process context propagation object
     */
    public override func preHandle(request: RestRequest, response: RestResponse): Bool {
        requestActiveTotalCount.add(1, Attributes.of("request", "pre"))
        LogFactory.getInstance().debug("InterceptorApm preHandle")
        let openTelemetry = TelemetryConfig.openTelemetry
        // The interceptor extracts the context object automatically
        openTelemetry.getPropagators().getTextMapPropagator().extract(
            request.header,
            {
                carrier, key => (carrier as HttpHeaders).getOrThrow().getFirst(key)
            }
        )
        request.setPathParams(Array<(String, String)>([("apm_start", CommonUtils.timestamp().toString())]))
        TraceHolder.set(openTelemetry.getTracer())
        return true
    }
    /*
     * Post-handle interception
     */
    public override func postHandle(request: RestRequest, response: RestResponse): Unit {
        if (let Some(v) <- request.getPathParam("apm_start")) {
            LogFactory.getInstance().debug("get path param value ${v}")
            let time = Float64(CommonUtils.timestamp() - Int64.parse(v))
            requestActiveMaxTime.calculate(time, CalculateType.MAX)
            responseTimeAvg.calculate(time, CalculateType.AVG)
        }
        LogFactory.getInstance().debug("InterceptorApm postHandle")
    }
    /*
     * After-completion interception
     */
    public override func afterCompletion(request: RestRequest, response: RestResponse, exception: Option<Exception>): Unit {
        LogFactory.getInstance().debug("InterceptorApm afterCompletion")
        if (let Some(e) <- exception) {
            requestFailed.add(1, Attributes.of("requestFail", true))
            response.internalServerError(e.toString().toArray())
        }
        // Reset the tracer
        TraceHolder.removeTrace()
    }
}
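The interceptor feeds each request's elapsed time into a maximum and an average. How `CalculateType.MAX` and `CalculateType.AVG` are computed inside the SDK is not shown in this document; the Python sketch below (Python only so it can be run standalone, with the hypothetical name `RunningStats`) shows one common way to maintain both values incrementally without storing every sample.

```python
class RunningStats:
    """Tracks the maximum and the running mean of a stream of values."""

    def __init__(self):
        self.count = 0
        self.maximum = None
        self.average = 0.0

    def update(self, value):
        self.count += 1
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        # Incremental mean: shift the old average toward the new value by 1/count
        self.average += (value - self.average) / self.count
```

Each update is constant time and constant memory, which matters when the values arrive once per request.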
2.2.3. Working with metrics
public class Metric {
    public var telemetry: OpenTelemetry
    public init() {
        telemetry = TelemetryConfig.openTelemetry
    }
    public func start() {
        let array = ArrayList<Int64>()
        let meter = telemetry.getMeter("io.open.oelemetry")
        let histogram = meter.histogramBuilder("testHistogram").ofLongs().setUnit("4").setDescription("histogram test").
            build()
        let upDownCounter = meter.upDownCounterBuilder("testUpDown").ofDoubles().setUnit("3").setDescription(
            "up down test").build()
        let processCounter = meter.counterBuilder("process").setUnit("2").setDescription("process test").build()
        meter.gaugeBuilder("arraySize").ofLongs().setUnit("1").setDescription("array size").callback(
            {
                measurement => measurement.record(array.size, Attributes.of("array", "apm"))
            }
        )
        upDownCounter.add(100.1, Attributes.of("updown", true))
        processCounter.add(1, Attributes.of("tttttt", "11111"))
        array.append(1)
        let random = Random()
        for (x in 0..10) {
            histogram.record(x * random.nextInt64(100), Attributes.of("random", true))
            array.append(x)
            sleep(500 * Duration.millisecond)
        }
        // Operate on metrics from an asynchronous thread
        spawn { =>
            processCounter.add(1, Attributes.of("aaaa", "11111"))
            processCounter.add(2, Attributes.of("bbbb", "11111"))
            sleep(1 * Duration.second)
            array.remove(0)
            array.remove(1)
            upDownCounter.add(-88.2, Attributes.of("updown", true))
            processCounter.add(2, Attributes.of("cccc", "11111"))
            processCounter.add(2, Attributes.of("dddd", "11111"))
            sleep(1 * Duration.second)
            processCounter.add(3, Attributes.of("eeee", "11111"))
            processCounter.add(2, Attributes.of("ffff", "11111"))
            histogram.record(10 * random.nextInt64(100), Attributes.of("random", false))
            sleep(1 * Duration.second)
            processCounter.add(4, Attributes.of("gggg", "11111"))
            processCounter.add(2, Attributes.of("hhhh", "11111"))
            sleep(1 * Duration.second)
            processCounter.add(5, Attributes.of("iiii", "11111"))
            processCounter.add(2, Attributes.of("kkkk", "11111"))
            sleep(1 * Duration.second)
            processCounter.add(2, Attributes.of("vvvvv", "33333"))
            upDownCounter.add(20.2, Attributes.of("updown", true))
        }
        sleep(10 * Duration.second)
    }
}
2.3. Working with traces (creating a trace inside silo)
Two modes are available for working with traces:
- Native API
- Built-in macros
2.3.1. Native API
@Get["/sendGetMsg"]
public func sendGetMsg(): String {
    // Set up the header information
    var headers = HttpHeaders()
    let path = "/rest/rest_demo/demo/test/gettest"
    let outGoing = TraceHolder.tracer.spanBuilder(path).startSpan()
    try (scope = outGoing.makeCurrent()) {
        TelemetryConfig.openTelemetry.getPropagators().getTextMapPropagator().inject(
            TraceContext.current(),
            headers,
            {
                carrier, key, value => if (let Some(v) <- carrier as HttpHeaders) {
                    v.add(key, value)
                }
            }
        )
        outGoing.addAttribute(SemanticAttributes.HTTP_METHOD, "GET");
        outGoing.addAttribute(SemanticAttributes.HTTP_URL, path);
        var configuredOptions = RestfulOptions()
        configuredOptions.setHost("127.0.0.1")
        configuredOptions.setPort("8080")
        var client = HttpRest(configuredOptions)
        var restParam = RestfulParameters()
        restParam.setHeaderMap(headers)
        var queryParameter = Form("id=test1&count=100000&totalMoney=0.1&isValid=true")
        restParam.setParamForm(queryParameter)
        var res = client.get(path, restParam)
        var result = RestfulResponse(res).getResponseContent()
        LogFactory.getInstance().info("response info: ${result}")
        return result
    } catch (ex: Exception) {
        LogFactory.getInstance().error("send get message error,${ex.toString()}")
        outGoing.recordException(ex)
    } finally {
        outGoing.end()
    }
    return "ok"
}
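The inject call in the native API example writes the current trace context into the outgoing headers, and the extract call in the interceptor reads it back on the receiving side. This document does not say which header format the SDK uses; the Python sketch below (Python only so it can be run standalone) assumes the W3C Trace Context `traceparent` header, which is the usual default in OpenTelemetry implementations. The function names and the returned dictionary shape are hypothetical.

```python
import re

# version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled):
    """Write the caller's trace context into an outgoing header map."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers):
    """Read the remote parent context from incoming headers, or None if absent or invalid."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid in W3C Trace Context
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": int(flags, 16) & 1 == 1,
    }
```

Carrying the sampled flag in the header is what lets a head-based sampling decision made at the first service apply to the whole request chain.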
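2.3.2. Using macros (recommended)
2.3.2.1. Creating import.cj
Create a file named import.cj in each package that uses the macros. For example, to use the macros in the dish.controller package (the imports only need to be declared once per package path; every other file in the package can then use them), the file contents are:
package dish.controller
import config.TelemetryConfig
from apm_sdk import sdk.OpenTelemetry
from apm_sdk import sdk.trace.*
from apm_sdk import api.trace.enums.*
from apm_sdk import api.trace.IScope
from apm_sdk import macros.*
// Aliased to avoid a conflict with the @Context macro
from apm_sdk import api.trace.{TraceHolder, ITracer, Context as TraceContext}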
2.3.2.2. Working with traces
- Example 1
Create a span with context propagation enabled inside a silo controller
@Get["/sendGetMsg"]
// First argument: the span's operator path; second argument: enables trace context propagation (for details, see the native API usage in 2.3.1)
@ApmSpan["/rest/rest_demo/demo/test/gettest", true]
public func sendGetMsg(): String {
    // macrosSpan is generated by the macro and can be referenced directly
    macrosSpan.addAttribute(SemanticAttributes.HTTP_METHOD, "GET");
    macrosSpan.addAttribute(SemanticAttributes.HTTP_URL, path);
    var configuredOptions = RestfulOptions()
    configuredOptions.setHost("127.0.0.1")
    configuredOptions.setPort("8080")
    var client = HttpRest(configuredOptions)
    var restParam = RestfulParameters()
    // headers is generated by the macro and can be referenced directly
    restParam.setHeaderMap(headers)
    var queryParameter = Form("id=test1&count=100000&totalMoney=0.1&isValid=true")
    restParam.setParamForm(queryParameter)
    // path is generated by the macro and can be referenced directly
    var res = client.get(path, restParam)
    var result = RestfulResponse(res).getResponseContent()
    LogFactory.getInstance().info("response info: ${result}")
    return result
}
- Example 2
Create a span inside an ordinary method
@ApmSpan["/addOrder"]
private func addOrder(orderId: Int64) {
    macrosSpan.addAttribute("orderId", orderId)
    let orderInfo = OrderInfo()
    orderInfo.setOrderId(orderId)
    orderInfo.setUserId(userId)
    orderDao.addOrder(orderInfo)
    println("Order created successfully")
}
- Example 3
Create a span on an asynchronous thread (the parent span must be passed in)
@ApmSpanAsync["/three"]
public func asyncSpan(asyncSpan: ISpan) {
    // Create span3: cross-thread, nested scenario
    // This asynchronous thread is a new segment; span numbering starts from 0
    logger.info("3 span start.*****************")
    for (x in (0..2)) {
        // macrosTrace is generated by the macro and can be referenced directly
        child3_child(macrosTrace, x)
    }
}
- Example 4
Create a span and return the span generated by the macro
@ApmSpan["/parent"]
public func runParent(): ISpan {
    // macrosSpan is generated by the macro and can be referenced directly
    macrosSpan.addAttribute(SemanticAttributes.HTTP_METHOD, "GET").addAttribute(
        SemanticAttributes.HTTP_URL,
        "/parent"
    )
    macrosSpan.addEvent("init");
    for (x in (0..10)) {
        // Start child spans in a loop
        child(x)
    }
    macrosSpan.addEvent("process", Attributes.of("key", "aaa").addInt("value", 1))
    macrosSpan.addEvent("end")
    return macrosSpan
}
- Example 5
Create a span and add link/kind
@ApmSpan["/tow"]
public func apmSpan(link: SpanContext, kind: SpanKind) {
    // Create span2: single-span execution scenario
    logger.info("2 span start.");
    logger.info(
        "####################current span ${ISpan.current().getSpanContext()},${ISpan.current().getParentId()}####################"
    )
    child2(100)
}
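2.4. Data reporting
In phase one, trace/metric data can only be reported by writing to files. Files rotate at a configured size, and one folder is created per day. The data directory defaults to the application root and can also be set through configuration: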
let config = CommonConfig.builder().rate(true).maxDirectory(30).filePrefix("./test").build()
By default, the generated files look like:
apm/2024-02-22/trace-0.json
apm/2024-02-22/metric-0.json
3. Overall architecture
3.1. SDK architecture
The overall architecture is divided into three layers:
- api
- exporter
- sdk