Table of Contents
- 1. Introducing the Collector
- 2. Installing the Collector
- 2.1 Docker
- 2.2 Installing on Windows
- 3. Configuring the Collector
- 4. Exporter configuration
- 4.1 Exporting to SkyWalking
- 4.1.1 Exporting metrics and logs
- 4.1.2 Exporting traces to SkyWalking via zipkin
- 4.2 Exporting to Jaeger
- 4.3 Exporting to zipkin
- 4.4 Exporting to Prometheus
- 4.4.1 Prometheus scraping (PULL)
- 4.4.1.1 Starting Prometheus
- 4.4.1.2 OTel Collector configuration
- 4.4.1.3 Querying metrics in Prometheus
- 4.4.2 Pushing to Prometheus (PUSH)
- 4.5 Exporting to debug (console logs)
- 4.6 Exporting to a file
1. Introducing the Collector
The OpenTelemetry Collector is a vendor-agnostic agent that can receive, process, and export telemetry data.
- It accepts telemetry data in multiple formats (for example OTLP, Jaeger, Prometheus, as well as many commercial/proprietary tools)
- and sends the data to one or more backends.
- It also supports processing and filtering telemetry data before exporting it.
The Collector consists of four kinds of components:
- Receivers - A receiver, which can be push- or pull-based, is how data gets into the Collector. A receiver may support one or more data sources.
- Processors - Processors operate on data between reception and export. Processors are optional, though some are recommended.
- Exporters - An exporter, which can be push- or pull-based, is how data is sent to one or more backends/destinations. An exporter may support one or more data sources.
- Connectors - A connector is both an exporter and a receiver. As the name suggests, a connector joins two pipelines: it consumes data as an exporter at the end of one pipeline and emits data as a receiver at the start of another. It may consume and emit data of the same type or of different types. A connector may generate and emit data that summarizes the consumed data, or it may simply replicate or route it.
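As a minimal sketch of the connector idea, the contrib distribution's `spanmetrics` connector can sit at the end of a traces pipeline and feed a metrics pipeline. The surrounding receivers and exporters here are illustrative assumptions, not part of this article's test setup:

```yaml
# Sketch only: spanmetrics acts as an exporter in the traces pipeline
# (consuming spans) and as a receiver in the metrics pipeline
# (emitting request metrics derived from those spans).
connectors:
  spanmetrics:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [debug]
```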
2. Installing the Collector
2.1 Docker
Start with a Docker command:
docker run \
  -v $(pwd)/config.yaml:/etc/otelcol-contrib/config.yaml \
  otel/opentelemetry-collector-contrib:0.89.0
Start with Docker Compose:
otel-collector:
  image: otel/opentelemetry-collector-contrib
  volumes:
    - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
  ports:
    - 1888:1888   # pprof extension
    - 8888:8888   # Prometheus metrics exposed by the Collector
    - 8889:8889   # Prometheus exporter metrics
    - 13133:13133 # health_check extension
    - 4317:4317   # OTLP gRPC receiver
    - 4318:4318   # OTLP http receiver
    - 55679:55679 # zpages extension
2.2 Installing on Windows
Download the latest Windows release package:
https://github.com/open-telemetry/opentelemetry-collector-releases/releases
After downloading, unpack the archive.
Add a custom configuration file otelcol-config.yaml (any name works, as long as it matches the --config parameter of the startup command below).
The following is a basic configuration that only exports traces, metrics, and logs to the console:
# Declare receivers
receivers:
  # OTLP receiver
  otlp:
    protocols:
      grpc:
      http:

# Declare processors
processors:
  batch:

# Declare exporters
exporters:
  # Export to console logs
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

# Declare extensions
extensions:
  health_check:
  pprof:
  zpages:

# Assemble the effective configuration
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    # traces pipeline
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    # metrics pipeline
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    # logs pipeline
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
Open a cmd window and run:
otelcol-contrib.exe --config otelcol-config.yaml
3. Configuring the Collector
The configurations supported by each Collector component are listed below:
- receivers
- otlp
- jaeger
- zipkin
- kafka
- prometheus
- opencensus
- fluentforward
- hostmetrics
- …
- processors
- batch, attributes, filter, resource, memory_limiter, probabilistic_sampler, span, …
- exporters
- file
- kafka
- debug
- opencensus
- otlp
- otlphttp
- prometheus
- prometheusremotewrite
- zipkin
- …
- extensions
- health_check
- pprof
- zpages
- memory_ballast
- …
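The pipelines in this article only use `batch:` with its defaults. As a hedged sketch, a commonly recommended processor setup pairs `memory_limiter` with `batch`; the specific values below are illustrative assumptions, not tuned recommendations:

```yaml
processors:
  # Drop data under memory pressure instead of crashing the Collector;
  # typically placed first in the processor chain.
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100
  # Batch telemetry before export to reduce outgoing requests.
  batch:
    timeout: 5s
    send_batch_size: 512
```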
The configuration I used for local testing is as follows:
- receivers: OTLP
- exporters:
- traces: [debug, file, otlp/jaeger, zipkin/skywalking]
- metrics: [debug, file, skywalking, prometheus]
- logs: [debug, file, skywalking]
# Declare receivers
receivers:
  # OTLP receiver
  otlp:
    protocols:
      grpc:
      http:

# Declare processors
processors:
  batch:

# Declare exporters
exporters:
  # Export to console logs
  debug:
    verbosity: basic
    sampling_initial: 5
    sampling_thereafter: 200
  # Export to a file
  file:
    path: ./file_export/file.log
    rotation:
      max_megabytes: 10
      max_days: 3
      max_backups: 3
      localtime: true
    format: json
    # With compression enabled, the raw text is no longer directly readable
    #compression: zstd
  # Export traces to Jaeger (via the OTLP gRPC receiver embedded in Jaeger on port 4317)
  otlp/jaeger:
    endpoint: http://10.170.xx.xxx:4317
    tls:
      insecure: true
  # Export to SkyWalking (metrics, logs)
  skywalking:
    endpoint: http://localhost:11800
    tls:
      insecure: true
  # Export traces to SkyWalking via zipkin
  zipkin/skywalking:
    # Matches SkyWalking's receiver-zipkin configuration
    endpoint: http://localhost:9411/api/v2/spans
  # Prometheus exporter (scraped by Prometheus)
  prometheus:
    endpoint: "127.0.0.1:1234"

# Declare extensions
extensions:
  health_check:
  pprof:
  zpages:

# Assemble the effective configuration
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    # traces pipeline
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, file, otlp/jaeger, zipkin/skywalking]
    # metrics pipeline
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, file, skywalking, prometheus]
    # logs pipeline
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, file, skywalking]
More Collector configuration options can be found in the table below:
Component | Reference |
---|---|
Official Collector repository | https://github.com/open-telemetry/opentelemetry-collector |
Official receivers | https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/README.md |
Official exporters | https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/README.md |
Community Collector contrib repository | https://github.com/open-telemetry/opentelemetry-collector-contrib |
Community receivers | https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver |
Community exporters | https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter |
4. Exporter configuration
4.1 Exporting to SkyWalking
See the detailed configuration at:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/skywalkingexporter/README.md
This exporter sends data via gRPC using the skywalking-data-collect-protocol format. By default it requires TLS and offers queued retry capabilities.
4.1.1 Exporting metrics and logs
Currently the skywalking exporter only supports exporting metrics and logs, not traces. If it is used in a traces pipeline, the following error is reported:
2023/11/21 14:36:40 collector server run finished with error: failed to build pipelines:
failed to create "skywalking" exporter for data type "traces": telemetry type is not supported
The following settings are required:
- endpoint (no default): the host:port to which the exporter sends SkyWalking log data over gRPC. The valid syntax is described here. If the https scheme is used, client transport security is enabled and overrides the insecure setting.
- num_streams (default = 2): the number of gRPC streams used to send gRPC requests.
By default TLS is enabled and must be configured under tls::
- insecure (default = false): whether to enable client transport security for the exporter's connection.
As a result, the following parameters are also required under tls::
- cert_file (no default): path to the TLS certificate to use for TLS-required connections. Should only be used when insecure is set to false.
- key_file (no default): path to the TLS key to use for TLS-required connections. Should only be used when insecure is set to false.
Note:
Default SkyWalking backend ports:
- 11800 - gRPC API
- 12800 - HTTP REST API
- 8080 - UI listen port; the UI runs GraphQL queries against 127.0.0.1:12800
exporters:
  skywalking:
    endpoint: "192.168.1.5:11800"
    tls:
      insecure: true
    num_streams: 5
  skywalking/2:
    endpoint: "10.18.7.4:11800"
    compression: "gzip"
    tls:
      cert_file: file.cert
      key_file: file.key
    timeout: 10s
4.1.2 Exporting traces to SkyWalking via zipkin
https://skywalking.apache.org/docs/main/v9.6.0/en/setup/backend/otlp-trace/
Modify the SkyWalking configuration config/application.yaml:
receiver-zipkin:
  # change the selector to default
  selector: ${SW_RECEIVER_ZIPKIN:default}
  default:
    # Defines a set of span tag keys which are searchable.
    # The max length of key=value should be less than 256 or will be dropped.
    searchableTracesTags: ${SW_ZIPKIN_SEARCHABLE_TAG_KEYS:http.method}
    # The sample rate precision is 1/10000, should be between 0 and 10000
    sampleRate: ${SW_ZIPKIN_SAMPLE_RATE:10000}
    ## The below configs are for OAP collect zipkin trace from HTTP
    enableHttpCollector: ${SW_ZIPKIN_HTTP_COLLECTOR_ENABLED:true}
    restHost: ${SW_RECEIVER_ZIPKIN_REST_HOST:0.0.0.0}
    restPort: ${SW_RECEIVER_ZIPKIN_REST_PORT:9411}
    restContextPath: ${SW_RECEIVER_ZIPKIN_REST_CONTEXT_PATH:/}
    restMaxThreads: ${SW_RECEIVER_ZIPKIN_REST_MAX_THREADS:200}
    restIdleTimeOut: ${SW_RECEIVER_ZIPKIN_REST_IDLE_TIMEOUT:30000}
    restAcceptQueueSize: ${SW_RECEIVER_ZIPKIN_REST_QUEUE_SIZE:0}
    ## The below configs are for OAP collect zipkin trace from kafka
    enableKafkaCollector: ${SW_ZIPKIN_KAFKA_COLLECTOR_ENABLED:false}
    kafkaBootstrapServers: ${SW_ZIPKIN_KAFKA_SERVERS:localhost:9092}
    kafkaGroupId: ${SW_ZIPKIN_KAFKA_GROUP_ID:zipkin}
    kafkaTopic: ${SW_ZIPKIN_KAFKA_TOPIC:zipkin}
    # Kafka consumer config, JSON format as Properties. If it contains the same key with above, would override.
    kafkaConsumerConfig: ${SW_ZIPKIN_KAFKA_CONSUMER_CONFIG:"{\"auto.offset.reset\":\"earliest\",\"enable.auto.commit\":true}"}
    # The Count of the topic consumers
    kafkaConsumers: ${SW_ZIPKIN_KAFKA_CONSUMERS:1}
    kafkaHandlerThreadPoolSize: ${SW_ZIPKIN_KAFKA_HANDLER_THREAD_POOL_SIZE:-1}
    kafkaHandlerThreadPoolQueueSize: ${SW_ZIPKIN_KAFKA_HANDLER_THREAD_POOL_QUEUE_SIZE:-1}

# This module is for Zipkin query API and support zipkin-lens UI
query-zipkin:
  selector: ${SW_QUERY_ZIPKIN:default}
  default:
    # For HTTP server
    restHost: ${SW_QUERY_ZIPKIN_REST_HOST:0.0.0.0}
    restPort: ${SW_QUERY_ZIPKIN_REST_PORT:9412}
    restContextPath: ${SW_QUERY_ZIPKIN_REST_CONTEXT_PATH:/zipkin}
    restMaxThreads: ${SW_QUERY_ZIPKIN_REST_MAX_THREADS:200}
    restIdleTimeOut: ${SW_QUERY_ZIPKIN_REST_IDLE_TIMEOUT:30000}
    restAcceptQueueSize: ${SW_QUERY_ZIPKIN_REST_QUEUE_SIZE:0}
    # Default look back for traces and autocompleteTags, 1 day in millis
    lookback: ${SW_QUERY_ZIPKIN_LOOKBACK:86400000}
    # The Cache-Control max-age (seconds) for serviceNames, remoteServiceNames and spanNames
    namesMaxAge: ${SW_QUERY_ZIPKIN_NAMES_MAX_AGE:300}
    ## The below config are OAP support for zipkin-lens UI
    # Default traces query max size
    uiQueryLimit: ${SW_QUERY_ZIPKIN_UI_QUERY_LIMIT:10}
    # Default look back on the UI for search traces, 15 minutes in millis
    uiDefaultLookback: ${SW_QUERY_ZIPKIN_UI_DEFAULT_LOOKBACK:900000}
OTel Collector configuration:
# Declare receivers
receivers:
  # OTLP receiver
  otlp:
    protocols:
      grpc:
      http:

# Declare processors
processors:
  batch:

# Declare exporters
exporters:
  # Export to SkyWalking (metrics, logs)
  skywalking:
    endpoint: http://localhost:11800
    tls:
      insecure: true
  # Export traces to SkyWalking via zipkin
  zipkin/skywalking:
    # Matches SkyWalking's receiver-zipkin configuration
    endpoint: http://localhost:9411/api/v2/spans

# Declare extensions
extensions:
  health_check:
  pprof:
  zpages:

# Assemble the effective configuration
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    # traces pipeline
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [zipkin/skywalking]
    # metrics pipeline
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [skywalking]
    # logs pipeline
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [skywalking]
View the data through SkyWalking's zipkin UI endpoint: http://localhost:8080/zipkin/
Using the Trace ID shown above, you can query the logs associated with that trace in the SkyWalking UI: http://localhost:8080/General-Service/Services
4.2 Exporting to Jaeger
Recent versions of the OTel Collector no longer ship a Jaeger exporter (Stackoverflow/#77475771); to export data from the Collector to Jaeger you must use the otlp exporter instead. The latest Jaeger versions embed an OTLP collector, exposing the same ports: 4317 (gRPC) and 4318 (HTTP).
docker run --rm --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.51
Port | Protocol | Component | Function |
---|---|---|---|
6831 | UDP | agent | accept jaeger.thrift over Thrift-compact protocol (used by most SDKs) |
6832 | UDP | agent | accept jaeger.thrift over Thrift-binary protocol (used by Node.js SDK) |
5775 | UDP | agent | (deprecated) accept zipkin.thrift over compact Thrift protocol (used by legacy clients only) |
5778 | HTTP | agent | serve configs (sampling, etc.) |
16686 | HTTP | query | serve frontend |
4317 | gRPC | collector | accept OpenTelemetry Protocol (OTLP) over gRPC |
4318 | HTTP | collector | accept OpenTelemetry Protocol (OTLP) over HTTP |
14268 | HTTP | collector | accept jaeger.thrift directly from clients |
14250 | gRPC | collector | accept model.proto |
9411 | HTTP | collector | Zipkin compatible endpoint (optional) |
https://opentelemetry.io/docs/collector/configuration/#exporters
exporters:
  # Data sources: traces
  otlp/jaeger:
    endpoint: jaeger-all-in-one:4317
    tls:
      cert_file: cert.pem
      key_file: cert-key.pem
  # Data sources: traces
  otlp/jaeger2:
    endpoint: jaeger-all-in-one:14250
    tls:
      insecure: true
Traces displayed in Jaeger:
4.3 Exporting to zipkin
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/zipkinexporter/README.md
exporters:
  zipkin/nontls:
    endpoint: "http://some.url:9411/api/v2/spans"
    format: proto
    default_service_name: unknown-service
  zipkin/withtls:
    endpoint: "https://some.url:9411/api/v2/spans"
  zipkin/tlsnoverify:
    endpoint: "https://some.url:9411/api/v2/spans"
    tls:
      insecure_skip_verify: true
Traces displayed in zipkin:
4.4 Exporting to Prometheus
4.4.1 Prometheus scraping (PULL)
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusexporter
exporters:
  prometheus:
    endpoint: "1.2.3.4:1234"
    tls:
      ca_file: "/path/to/ca.pem"
      cert_file: "/path/to/cert.pem"
      key_file: "/path/to/key.pem"
    namespace: test-space
    const_labels:
      label1: value1
      "another label": spaced value
    send_timestamps: true
    metric_expiration: 180m
    enable_open_metrics: true
    add_metric_suffixes: false
    resource_to_telemetry_conversion:
      enabled: true
Minimal configuration:
exporters:
  prometheus:
    endpoint: "127.0.0.1:1234"
4.4.1.1 Starting Prometheus
# Change the default port 9090 (to avoid a local port conflict)
prometheus.exe --web.listen-address=:8001 --config.file=prometheus.yml
prometheus.yml:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      # The original default was localhost:9090
      - targets: ["localhost:8001"]
  # OTel Collector metrics exporter
  - job_name: "otelcol"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:1234"]
4.4.1.2 OTel Collector configuration
# Declare receivers
receivers:
  # OTLP receiver
  otlp:
    protocols:
      grpc:
      http:

# Declare processors
processors:
  batch:

# Declare exporters
exporters:
  # Export to console logs
  debug:
    verbosity: basic
    sampling_initial: 5
    sampling_thereafter: 200
  # Prometheus exporter (scraped by Prometheus)
  prometheus:
    endpoint: "127.0.0.1:1234"

# Declare extensions
extensions:
  health_check:
  pprof:
  zpages:

# Assemble the effective configuration
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    # traces pipeline
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    # metrics pipeline
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, prometheus]
    # logs pipeline
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
4.4.1.3 Querying metrics in Prometheus
Query through the Prometheus UI:
Query through a (custom) Grafana dashboard:
The PromQL behind the Grafana dashboard is as follows:
# Metric - findGoodsPage_count
rate(findGoodsPage_count_total[5m])

# JVM Memory Used
sum(jvm_memory_used_bytes) by (exported_job) / 1000000

# HTTP Instance QPS
sum(rate(http_server_requests_seconds_count[5m])) by (exported_job)

# HTTP URI QPS
rate(http_server_requests_seconds_count[5m])
4.4.2 Pushing to Prometheus (PUSH)
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusremotewriteexporter
The Prometheus Remote Write Exporter sends OpenTelemetry metrics to Prometheus remote-write-compatible backends such as Cortex, Mimir, and Thanos. By default this exporter requires TLS and offers queued retry capabilities.
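A minimal sketch of such a configuration follows. It assumes a local Prometheus started with remote-write receiving enabled (--web.enable-remote-write-receiver); the endpoint and retry values are illustrative assumptions:

```yaml
exporters:
  prometheusremotewrite:
    # Prometheus's remote-write receiver endpoint (assumed local setup)
    endpoint: "http://localhost:9090/api/v1/write"
    tls:
      insecure: true
    # Queued-retry behavior provided by the exporter helper
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
```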
4.5 Exporting to debug (console logs)
https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/debugexporter/README.md
The following settings are optional:
- verbosity (default = basic): the verbosity of the exported logging (detailed | normal | basic). When set to detailed, pipeline data is logged in detail.
- sampling_initial (default = 2): the number of messages initially logged each second.
- sampling_thereafter (default = 500): the sampling rate after the initial messages are logged (every Mth message is logged). Refer to the Zap documentation for details on how the sampling parameters affect the number of messages.
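To make the sampling semantics concrete, here is a small sketch that models the documented behavior (the first sampling_initial messages in a second pass, then every sampling_thereafter-th message after that); this is an illustrative model, and the exact Zap implementation may differ in edge cases:

```python
def sampled_indices(total, initial=2, thereafter=500):
    """Return which of `total` messages arriving within one second
    would be logged under sampling_initial / sampling_thereafter."""
    kept = []
    for i in range(1, total + 1):
        if i <= initial:
            # The first `initial` messages always pass
            kept.append(i)
        elif (i - initial) % thereafter == 0:
            # After that, only every `thereafter`-th message passes
            kept.append(i)
    return kept

print(sampled_indices(12, initial=2, thereafter=5))  # -> [1, 2, 7, 12]
```

With the exporter defaults (initial = 2, thereafter = 500), a burst of 1000 messages in one second would produce only 3 log lines.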
exporters:
  debug:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200
4.6 Exporting to a file
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/fileexporter
exporters:
  file/no_rotation:
    path: ./foo
  file/rotation_with_default_settings:
    path: ./foo
    rotation:
  file/rotation_with_custom_settings:
    path: ./foo
    rotation:
      max_megabytes: 10
      max_days: 3
      max_backups: 3
      localtime: true
    format: proto
    compression: zstd
  file/flush_every_5_seconds:
    path: ./foo
    flush_interval: 5
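With format: json and no compression, the file exporter writes one OTLP-JSON object per line, which is easy to post-process. The sketch below parses such a line; the sample payload is a hand-written assumption shaped after the OTLP JSON encoding (resourceSpans → scopeSpans → spans), not captured output:

```python
import json

# Hand-written sample line in the assumed shape of file-exporter JSON output
sample_line = json.dumps({
    "resourceSpans": [{
        "resource": {"attributes": [{"key": "service.name",
                                     "value": {"stringValue": "demo-service"}}]},
        "scopeSpans": [{"spans": [{"name": "GET /goods", "traceId": "abc123"}]}]
    }]
})

def span_names(line: str):
    """Extract all span names from one file-exporter JSON line."""
    doc = json.loads(line)
    names = []
    for rs in doc.get("resourceSpans", []):
        for ss in rs.get("scopeSpans", []):
            for span in ss.get("spans", []):
                names.append(span["name"])
    return names

print(span_names(sample_line))  # -> ['GET /goods']
```

The same pattern applies to resourceMetrics / resourceLogs lines when metrics or logs pipelines also write to the file exporter.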