APM SDK for the Cangjie Language

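A native Cangjie APM SDK: application performance monitoring software implemented against the OpenTelemetry standard, with no third-party library dependencies.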

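Features

The SDK monitors application performance through the following features:

  1. Metric data collection with the following instruments
    • Counter/UpDownCounter
    • Gauge
    • Histogram
  2. Trace data collection
    • Request link monitoring across threads, processes, and services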
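
The instrument kinds listed above differ in how recorded values combine. The following sketch models those semantics in Python rather than Cangjie, purely as a language-neutral illustration. The class names mirror the OpenTelemetry instrument kinds, but none of these classes or methods are this SDK's API, and the histogram bucket bounds are an arbitrary assumption.

```python
import bisect

class Counter:
    """Monotonic sum: only non-negative increments are accepted."""
    def __init__(self):
        self.value = 0

    def add(self, amount):
        if amount < 0:
            raise ValueError("Counter only accepts non-negative increments")
        self.value += amount

class UpDownCounter:
    """Sum that may rise or fall."""
    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount

class Gauge:
    """Keeps only the most recently observed value."""
    def __init__(self):
        self.value = None

    def record(self, value):
        self.value = value

class Histogram:
    """Counts observations per bucket; bounds are inclusive upper limits."""
    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is overflow
        self.total = 0
        self.count = 0

    def record(self, value):
        # bisect_left puts a value equal to a bound into that bound's bucket
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value
        self.count += 1

requests = Counter()
requests.add(1)
requests.add(2)

in_flight = UpDownCounter()
in_flight.add(5)
in_flight.add(-3)

queue_size = Gauge()
queue_size.record(7)
queue_size.record(4)

latency_ms = Histogram([10, 100])
for sample in (3, 10, 50, 250):
    latency_ms.record(sample)
```

After this run the counter holds the sum of its increments, the up-down counter holds the net change, the gauge holds only the last recorded value, and the histogram holds per-bucket counts.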

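Development plan

| Date | Key milestone |
|---|---|
| 2024.3.10 | Metric collection core logic complete; samples delivered |
| 2024.3.20 | Trace collection core logic complete; samples delivered |
| 2024.3.30 | Metric and trace data export in OpenTelemetry format complete; trial version provided |
| 2024.4.15 | Metric and trace data export in OpenTelemetry format complete; unit test cases delivered |
| 2024.4.30 | Metric and trace data export in OpenTelemetry format complete; release version provided |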

Metric collection core logic

  • Core metric data computation / Aggregator
  • Metric data processing / MetricReader
  • Metric data reporting / Exporter
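
One way to picture how the three stages above hand data to each other is the Python sketch below. It is an assumption-laden model, not the SDK's implementation: `SumAggregator`, `ListExporter`, `register`, and `collect_and_export` are hypothetical names, and the reset-on-collect behavior is chosen only for illustration.

```python
from collections import defaultdict

class SumAggregator:
    """Aggregation stage: folds raw measurements into one sum per attribute set."""
    def __init__(self):
        self._sums = defaultdict(int)

    def record(self, value, attributes):
        # Attributes are stored as a sorted tuple so they can serve as a dict key.
        self._sums[tuple(sorted(attributes.items()))] += value

    def collect(self):
        # Hand over the current points and start a fresh interval.
        points, self._sums = dict(self._sums), defaultdict(int)
        return points

class ListExporter:
    """Reporting stage: here it only keeps what it was given."""
    def __init__(self):
        self.exported = []

    def export(self, points):
        self.exported.append(points)

class MetricReader:
    """Processing stage: pulls points from aggregators and pushes them to the exporter."""
    def __init__(self, exporter):
        self._exporter = exporter
        self._aggregators = {}

    def register(self, name, aggregator):
        self._aggregators[name] = aggregator

    def collect_and_export(self):
        batch = {name: agg.collect() for name, agg in self._aggregators.items()}
        self._exporter.export(batch)

exporter = ListExporter()
reader = MetricReader(exporter)
process_counter = SumAggregator()
reader.register("process", process_counter)

process_counter.record(1, {"route": "/a"})
process_counter.record(2, {"route": "/a"})
process_counter.record(5, {"route": "/b"})
reader.collect_and_export()
```

Measurements that share an attribute set collapse into a single point, so the exporter receives one value per attribute set instead of one per call.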

Trace collection core logic

  • Sampler (head-based sampling)
  • Context propagators / ContextPropagators
  • Trace data processing / Processor
  • Trace data reporting / Exporter

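1. Build and test

Project directory structure

|---samples  APM SDK usage examples
|     |---basic_example  Basic usage example
|     |---silo_example  Example of use with the silo framework
|---src      APM SDK source code
|     |---api  Client-facing API
|     |---exporter  Data reporting
|     |---sdk  Core implementation of the API
|---test     APM SDK unit tests
|     |---UT  SDK unit tests
|---module.json
|---README.md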

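1.1. Build steps

  • Clean the project. From the project root, run either command:
 cjpm clean / savant clean
  • Build the project. From the project root, run either command:
 cjpm build / savant install
  • The main static libraries produced by the build are located at:
build/release/apm_sdk/api.cjo
build/release/apm_sdk/sdk.cjo
build/release/apm_sdk/exporter.cjo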

1.2. Unit tests

From the project's test/UT directory, run:

 cjpm test

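1.3. Deliverables and functional scope

| Deliverable | Description | Functional scope |
|---|---|---|
| Static libraries | Static libraries produced by building apm_sdk | Referenced directly by the application |
| samples | Demos of apm_sdk usage in two variants, basic and silo; use whichever fits your situation | Usage reference to consult before integrating the SDK |
| test/UT | Unit tests | Unit test cases for core functionality; see 1.2 Unit tests for how to run them |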

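1.4. SDK resource usage

The larger objects are described in the table below.

| Resource | Bound | Description |
|---|---|---|
| Exporter reporting | thread=2 | Threads that report trace/metric data |
| Trace sampler Map | size=2048 | Groups samplers by trace operator |
| Trace batch reporting Queue | size=2048 | Queue for batched trace reporting |
| Metric instrument Map | size=10000 | Maximum number of metric instruments tracked |
| Metric instrument reporting, per-attribute group Map | size=2000 | Each attribute caches at most 2000 handlers |
| Metric instrument reporting, per-attribute group non-blocking Queue | size=unbounded | Used once the per-attribute group Map is exceeded; entries are removed manually. Purpose: cache handler objects to avoid repeatedly creating large numbers of objects |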
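
To illustrate why a size bound such as the 2048-entry trace queue keeps memory use predictable, here is a Python sketch of a bounded batch queue. The source states only the sizes, so the drop-when-full policy, the `batch_size` parameter, and every name here are assumptions. The sketch is single-threaded for brevity; a real queue shared with a reporting thread would need synchronization.

```python
from collections import deque

class BoundedBatchQueue:
    """Fixed-capacity buffer between span producers and the reporting thread."""
    def __init__(self, capacity=2048, batch_size=512):
        self._items = deque()
        self._capacity = capacity
        self._batch_size = batch_size
        self.dropped = 0

    def offer(self, item):
        # Assumed policy: reject new items when full, so memory stays bounded
        # and the caller never blocks.
        if len(self._items) >= self._capacity:
            self.dropped += 1
            return False
        self._items.append(item)
        return True

    def drain_batch(self):
        # Remove and return up to batch_size of the oldest items.
        batch = []
        while self._items and len(batch) < self._batch_size:
            batch.append(self._items.popleft())
        return batch

queue = BoundedBatchQueue(capacity=4, batch_size=3)
accepted = [queue.offer(n) for n in range(6)]
first_batch = queue.drain_batch()
second_batch = queue.drain_batch()
```

With a capacity of 4, the last two of six offers are rejected and counted, and the accepted items leave in arrival order across two batches.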

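1.5. Safeguards against impact on the application

  • Additional threads started by the SDK run independently and do not affect the application.
  • Large objects consume some resources; each one has a bound.
  • Where trace flow executes synchronously, exceptions are handled, for example: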
    public func fetchSequence(segmentId: String): Int32 {
        if (let Some(v) <- SEQUENCE.get(segmentId)) {
            return v.fetchAdd(1)
        }
        throw UnsupportedException("not supported multiple span [same name] is root.")
    }
    public static func getIndent(segmentId: String): Int32 {
        if (let Some(value) <- INDENT.get(segmentId)) {
            value.compareAndSwap(-1, 0)
            return value.fetchSub(1)
        }
        throw UnsupportedException("not supported multiple span is root.")
    }
  • The metric aggregator computes on an asynchronous thread and does not affect other threads, for example:
public class LongLastValueHandler <: AggregatorHandle<CounterPointData<Int64>> {
    private let current = AtomicReference<Long>(Long.instance())
    protected override func doAggregateThenMaybeReset(
        startEpochNanos: Int64,
        epochNanos: Int64,
        attributes: Attributes,
        reset: Bool
    ): CounterPointData<Int64> {
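        // The asynchronous callback lambda runs before the value is read
        let value: Int64
        if (reset) {
            // Swap: using the default memory ordering, write the given value into the atomic and return the value held before the write.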
            value = current.swap(Long.instance()).getValue()
        } else {
            value = current.load().getValue()
        }
        return CounterPointData<Int64>(startEpochNanos, epochNanos, attributes, value)
    }
    protected override func doRecordLong(value: Int64): Unit {
        current.store(Long(value))
    }
}

1.6. File retention (phase one writes monitoring data to files)

  • Files are written into one folder per day, for example:
./apm/2024-04-16/...
./apm/2024-04-17/...
./apm/2024-04-18/...
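  • The rotation size, the file storage path, and the maximum retention in days can be configured (the oldest folder is deleted only once the limit is exceeded). Compression is not yet supported, and there are currently no other write I/O requirements.
    // Rotate files at 10 MB, keep 30 days, and store files under ./apm in the current directory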
    let config = CommonConfig.builder().rate(true).maxFileLength(10 * 1024 * 1024).maxDirectory(30).filePrefix("./apm").build()
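
The rotation and retention rules described above can be sketched as follows. This is a Python model under stated assumptions, not the SDK's code: `target_file` and `prune_old_days` are hypothetical helpers, the `trace-N.json` naming is borrowed from section 2.4, and retention is interpreted as "keep at most N day folders".

```python
import datetime
import os
import shutil
import tempfile

def target_file(root, kind, day, max_file_length):
    """Return the path to append to: root/<YYYY-MM-DD>/<kind>-<n>.json."""
    folder = os.path.join(root, day.isoformat())
    os.makedirs(folder, exist_ok=True)
    index = 0
    while True:
        path = os.path.join(folder, f"{kind}-{index}.json")
        # Move on to the next index once the current file has reached the limit.
        if not os.path.exists(path) or os.path.getsize(path) < max_file_length:
            return path
        index += 1

def prune_old_days(root, max_directory):
    """Delete the oldest day folders until at most max_directory remain."""
    # ISO dates sort lexicographically in chronological order.
    days = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    excess = max(len(days) - max_directory, 0)
    removed = days[:excess]
    for name in removed:
        shutil.rmtree(os.path.join(root, name))
    return removed

root = tempfile.mkdtemp()
day = datetime.date(2024, 4, 16)

first = target_file(root, "trace", day, max_file_length=10)
with open(first, "w") as handle:
    handle.write("x" * 10)  # fill the file up to the rotation size
second = target_file(root, "trace", day, max_file_length=10)

for offset in (1, 2):
    target_file(root, "trace", day + datetime.timedelta(days=offset), 10)
removed = prune_old_days(root, max_directory=2)
```

Naming folders by ISO date means a plain sort finds the oldest folder, so pruning needs no timestamps beyond the folder names.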

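2. Using the APM SDK in a project

2.1. Import the static libraries of the APM SDK Cangjie client

In the project's module.json, reference the static libraries of the APM SDK Cangjie client: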

"package_requires": {
	"path_option": [
		"../build/release/apm_sdk"
	],
	"package_option": {}
}

2.2. Create the OpenTelemetry monitoring configuration

The Sampler used here is described in detail in 2.2.1.

public class TelemetryConfig {
    private static let OPEN_TELEMETRY: OpenTelemetry
    static init() {
        // Declare the resource
        let resource = SdkResource.create()
        // Declare the metric exporter
        let metricExporter = MetricExporter.create()

        let config = CommonConfig.builder().rate(true).build()
        // Declare the meter provider
        let meterProvider = MeterProvider.builder().setResource(resource).setReader(
            MetricReader.builder(metricExporter).build()).build()

        // Declare the tracer exporter
        let tracerExporter = FileExporter.builder().build()
        // Sampling ratio (fraction of requests to sample)
        let rate = 0.6
        // Declare the tracer provider
        let tracerProvider = TracerProvider.builder().setResource(resource).addProcessor(
            BatchProcessor.builder(tracerExporter, meterProvider).build()).setConfig(config).setSampler(
            // Custom samplers are supported, see 2.2.1
            GloabalRatioBasedSampler.create(rate)).build()

        OPEN_TELEMETRY = OpenTelemetry.builder().meterProvider(meterProvider).tracerProvider(tracerProvider).build()
    }
    public static prop openTelemetry: OpenTelemetry {
        get() {
            return OPEN_TELEMETRY
        }
    }
}
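2.2.1. Built-in trace sampling algorithms
  • Three head-based samplers are built in by default, and custom samplers can be added as extensions:
  1. Sample everything
  2. Sample nothing
  3. Sample a percentage of requests (the core sampler)
  • A custom sampler only needs to implement the following interface: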
import sdk.trace.samplers.ISampler
public enum CustomSampler <: ISampler {
    INSTANCE

    public override func shouldSample(): SamplingResult {
        // Sampling logic
        SamplingResult.drop()
    }
    public override func getDescription(): String {
        // Sampler description
        "CustomSampler"
    }
}
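
For percentage-based head sampling, a common approach (used by OpenTelemetry's trace-ID-ratio samplers in general) is to derive the decision from the trace ID, so that every span of a trace gets the same answer. The Python sketch below shows that idea; whether this SDK's ratio sampler works the same way is not stated in the source, and `should_sample` is a hypothetical function.

```python
import random

def should_sample(trace_id_hex, ratio):
    """Head-based decision: taken once from the trace ID, before the request runs."""
    if ratio >= 1.0:
        return True
    if ratio <= 0.0:
        return False
    # Interpret the low 64 bits of the trace ID as a number in [0, 2**64)
    # and keep the trace if it falls below ratio * 2**64.
    low_bits = int(trace_id_hex[-16:], 16)
    return low_bits < int(ratio * 2**64)

rng = random.Random(42)  # fixed seed so the run is repeatable
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]
kept = sum(should_sample(trace_id, 0.6) for trace_id in trace_ids)
share = kept / len(trace_ids)
```

Because the decision depends only on the trace ID and the ratio, repeating it for the same trace in another thread or service yields the same result, and over many random IDs the kept share approaches the configured ratio.
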
2.2.2. Create a silo interceptor (if using the silo framework)
@reflection
public class InterceptorApm <: Interceptor {
    // Some custom metrics collected by the interceptor
    private let responseTimeAvg = MetricHolder<IDoubleCounter>.get(MetricConst.RESPONSE_TIME_AVG.key)
    private let requestActiveMaxTime = MetricHolder<IDoubleCounter>.get(MetricConst.REQUEST_ACTIVE_MAX_TIME.key)
    private let requestActiveTotalCount = MetricHolder<ILongCounter>.get(MetricConst.REQUEST_ACTIVE_TOTAL_COUNT.key)
    private let requestFailed = MetricHolder<ILongCounter>.get(MetricConst.REQUEST_FAILED.key)
    /*
     * Handle the cross-process context propagation object
     */
    public override func preHandle(request: RestRequest, response: RestResponse): Bool {
        requestActiveTotalCount.add(1, Attributes.of("request", "pre"))
        LogFactory.getInstance().debug("InterceptorApm preHandle")
        let openTelemetry = TelemetryConfig.openTelemetry
        // The interceptor extracts the context object automatically
        openTelemetry.getPropagators().getTextMapPropagator().extract(
            request.header,
            {
                carrier, key => (carrier as HttpHeaders).getOrThrow().getFirst(key)
            }
        )
        request.setPathParams(Array<(String, String)>([("apm_start", CommonUtils.timestamp().toString())]))
        TraceHolder.set(openTelemetry.getTracer())
        return true
    }

    /*
     * Post-handle interception
     */
    public override func postHandle(request: RestRequest, response: RestResponse): Unit {
        if (let Some(v) <- request.getPathParam("apm_start")) {
            LogFactory.getInstance().debug("get path param value ${v}")
            let time = Float64(CommonUtils.timestamp() - Int64.parse(v))
            requestActiveMaxTime.calculate(time, CalculateType.MAX)
            responseTimeAvg.calculate(time, CalculateType.AVG)
        }
        LogFactory.getInstance().debug("InterceptorApm postHandle")
    }

    /*
     * Interception after the request completes
     */
    public override func afterCompletion(request: RestRequest, response: RestResponse, exception: Option<Exception>): Unit {
        LogFactory.getInstance().debug("InterceptorApm afterCompletion")

        if (let Some(e) <- exception) {
            requestFailed.add(1, Attributes.of("requestFail", true))
            response.internalServerError(e.toString().toArray())
        }
        // Reset the Tracer
        TraceHolder.removeTrace()
    }
}
2.2.3. Working with metrics
public class Metric {
    public var telemetry: OpenTelemetry
    public init() {
        telemetry = TelemetryConfig.openTelemetry
    }

    public func start() {
        let array = ArrayList<Int64>()
        let meter = telemetry.getMeter("io.open.oelemetry")
        let histogram = meter.histogramBuilder("testHistogram").ofLongs().setUnit("4").setDescription("histogram test").
            build()
        let upDownCounter = meter.upDownCounterBuilder("testUpDown").ofDoubles().setUnit("3").setDescription(
            "up down test").build()
        let processCounter = meter.counterBuilder("process").setUnit("2").setDescription("process test").build()
        meter.gaugeBuilder("arraySize").ofLongs().setUnit("1").setDescription("array size").callback(
            {
                measurement => measurement.record(array.size, Attributes.of("array", "apm"))
            }
        )
        upDownCounter.add(100.1, Attributes.of("updown", true))
        processCounter.add(1, Attributes.of("tttttt", "11111"))
        array.append(1)
        let random = Random()
        for (x in 0..10) {
            histogram.record(x * random.nextInt64(100), Attributes.of("random", true))
            array.append(x)
            sleep(500 * Duration.millisecond)
        }
        // Operate on the metrics from an asynchronous thread
        spawn {
             =>
            processCounter.add(1, Attributes.of("aaaa", "11111"))
            processCounter.add(2, Attributes.of("bbbb", "11111"))
            sleep(1 * Duration.second)
            array.remove(0)
            array.remove(1)
            upDownCounter.add(-88.2, Attributes.of("updown", true))
            processCounter.add(2, Attributes.of("cccc", "11111"))
            processCounter.add(2, Attributes.of("dddd", "11111"))
            sleep(1 * Duration.second)
            processCounter.add(3, Attributes.of("eeee", "11111"))
            processCounter.add(2, Attributes.of("ffff", "11111"))
            histogram.record(10 * random.nextInt64(100), Attributes.of("random", false))
            sleep(1 * Duration.second)
            processCounter.add(4, Attributes.of("gggg", "11111"))
            processCounter.add(2, Attributes.of("hhhh", "11111"))
            sleep(1 * Duration.second)
            processCounter.add(5, Attributes.of("iiii", "11111"))
            processCounter.add(2, Attributes.of("kkkk", "11111"))
            sleep(1 * Duration.second)

            processCounter.add(2, Attributes.of("vvvvv", "33333"))
            upDownCounter.add(20.2, Attributes.of("updown", true))
        }
        sleep(10 * Duration.second)
    }
}

2.3. Working with traces (creating a trace inside silo)

Two modes are available for working with traces:

  • Native API
  • Built-in macros
2.3.1. Native API
    @Get["/sendGetMsg"]
    public func sendGetMsg(): String {
        // Set the header information
        var headers = HttpHeaders()
        let path = "/rest/rest_demo/demo/test/gettest"
        let outGoing = TraceHolder.tracer.spanBuilder(path).startSpan()
        try (scope = outGoing.makeCurrent()) {
            TelemetryConfig.openTelemetry.getPropagators().getTextMapPropagator().inject(
                TraceContext.current(),
                headers,
                {
                    carrier, key, value => if (let Some(v) <- carrier as HttpHeaders) {
                        v.add(key, value)
                    }
                }
            )
            outGoing.addAttribute(SemanticAttributes.HTTP_METHOD, "GET");
            outGoing.addAttribute(SemanticAttributes.HTTP_URL, path);
            var configuredOptions = RestfulOptions()
            configuredOptions.setHost("127.0.0.1")
            configuredOptions.setPort("8080")

            var client = HttpRest(configuredOptions)
            var restParam = RestfulParameters()
            restParam.setHeaderMap(headers)
            var queryParameter = Form("id=test1&count=100000&totalMoney=0.1&isValid=true")
            restParam.setParamForm(queryParameter)
            var res = client.get(path, restParam)

            var result = RestfulResponse(res).getResponseContent()
            LogFactory.getInstance().info("response info: ${result}")
            return result
        } catch (ex: Exception) {
            LogFactory.getInstance().error("send get message error,${ex.toString()}")
            outGoing.recordException(ex)
        } finally {
            outGoing.end()
        }
        return "ok"
    }
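
The inject and extract calls used with the propagator follow a carrier-plus-callback pattern: the caller supplies the header container and a small function that knows how to write to or read from it. The Python sketch below shows that pattern end to end. It assumes the W3C Trace Context `traceparent` header layout, which is OpenTelemetry's usual default; the source does not say which header format this SDK writes, and `inject` and `extract` here are hypothetical stand-ins, not the SDK's functions.

```python
import re

TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def inject(context, carrier, setter):
    """Write the current trace context into an outgoing carrier via setter."""
    flags = "01" if context["sampled"] else "00"
    setter(carrier, "traceparent",
           f"00-{context['trace_id']}-{context['span_id']}-{flags}")

def extract(carrier, getter):
    """Read a trace context from an incoming carrier, or None if absent or invalid."""
    value = getter(carrier, "traceparent")
    match = TRACEPARENT.match(value) if value else None
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }

# Client side: copy the context into plain dict "headers".
outgoing = {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
    "sampled": True,
}
headers = {}
inject(outgoing, headers, lambda carrier, key, value: carrier.__setitem__(key, value))

# Server side: rebuild the context from the received headers.
incoming = extract(headers, lambda carrier, key: carrier.get(key))
```

Keeping the setter and getter outside the propagator lets the same propagation logic work with any header container, which is why both the interceptor and the client code pass a lambda.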

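2.3.2. Using macros (recommended)
2.3.2.1. Create import.cj

In each package that needs the macros, create a file named import.cj. For example, to use the macros in the dish.controller package (the imports are needed only once per package path; every other file in the package can then use them), the file contents are: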

package dish.controller

import config.TelemetryConfig
from apm_sdk import sdk.OpenTelemetry
from apm_sdk import sdk.trace.*
from apm_sdk import api.trace.enums.*
from apm_sdk import api.trace.IScope
from apm_sdk import macros.*
// Avoid a conflict with the @Context macro
from apm_sdk import api.trace.{TraceHolder, ITracer, Context as TraceContext}
2.3.2.2. Working with traces
  • Example 1

Create a span with context propagation enabled inside a silo controller

    @Get["/sendGetMsg"]
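    // First argument: the span's operator path. Second argument: enables trace context propagation (see the manual, native-API usage of the SDK for what this means)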
    @ApmSpan["/rest/rest_demo/demo/test/gettest", true]
    public func sendGetMsg(): String {
        // macrosSpan is generated by the macro and can be referenced directly
        macrosSpan.addAttribute(SemanticAttributes.HTTP_METHOD, "GET");
        macrosSpan.addAttribute(SemanticAttributes.HTTP_URL, path);
        var configuredOptions = RestfulOptions()
        configuredOptions.setHost("127.0.0.1")
        configuredOptions.setPort("8080")

        var client = HttpRest(configuredOptions)
        var restParam = RestfulParameters()
        // headers is generated by the macro and can be referenced directly
        restParam.setHeaderMap(headers)
        var queryParameter = Form("id=test1&count=100000&totalMoney=0.1&isValid=true")
        restParam.setParamForm(queryParameter)
        // path is generated by the macro and can be referenced directly
        var res = client.get(path, restParam)

        var result = RestfulResponse(res).getResponseContent()
        LogFactory.getInstance().info("response info: ${result}")
        return result
    }
  • Example 2

Create a span inside an ordinary method

    @ApmSpan["/addOrder"]
    private func addOrder(orderId: Int64) {
        macrosSpan.addAttribute("orderId", orderId)
        let orderInfo = OrderInfo()
        orderInfo.setOrderId(orderId)
        orderInfo.setUserId(userId)
        orderDao.addOrder(orderInfo)
        println("Order created successfully")
    }
  • Example 3

Create a span on an asynchronous thread (the parent span must be passed in)

    @ApmSpanAsync["/three"]
    public func asyncSpan(asyncSpan: ISpan) {
        // Create span3: cross-thread, nested scenario
        // This asynchronous thread is a new segment; span numbering starts from 0
        logger.info("3 span start.*****************")
        for (x in (0..2)) {
            // macrosTrace is generated by the macro and can be referenced directly
            child3_child(macrosTrace, x)
        }
    }
  • Example 4

Create a span and return the span generated by the macro

    @ApmSpan["/parent"]
    public func runParent(): ISpan {
        // macrosSpan is generated by the macro and can be referenced directly
        macrosSpan.addAttribute(SemanticAttributes.HTTP_METHOD, "GET").addAttribute(
            SemanticAttributes.HTTP_URL,
            "/parent"
        )
        macrosSpan.addEvent("init");
        for (x in (0..10)) {
            // Start child spans in a loop
            child(x)
        }
        macrosSpan.addEvent("process", Attributes.of("key", "aaa").addInt("value", 1))
        macrosSpan.addEvent("end")
        return macrosSpan
    }

  • Example 5

Create a span and add a link/kind

    @ApmSpan["/tow"]
    public func apmSpan(link: SpanContext,kind: SpanKind) {
        // Create span2: single-span execution scenario
        logger.info("2 span start.");
        logger.info(
            "####################current span ${ISpan.current().getSpanContext()},${ISpan.current().getParentId()}####################"
        )
        child2(100)
    }

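2.4. Data reporting

In phase one, trace/metric data can only be reported by writing to files. Files rotate at a configured size, and one folder is generated per day. The data directory defaults to the application root directory and can also be set through configuration: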

  let config = CommonConfig.builder().rate(true).maxDirectory(30).filePrefix("./test").build()

By default, the generated files are named as follows:

apm/2024-02-22/trace-0.json
apm/2024-02-22/metric-0.json

3. Overall architecture

3.1. SDK architecture

The architecture is divided into three layers:

  • api
  • exporter
  • sdk