Capture Kubernetes logs with OpenTelemetry Collector

It’s a best practice to send OpenTelemetry logs using the OTLP protocol. However, some use cases prevent this pattern and require writing logs to files or stdout:

  1. Lack of OTLP support for logs in the OpenTelemetry SDKs. The upstream OpenTelemetry SDKs for Go, Python, Ruby, JavaScript, and PHP don’t provide a stable implementation of OTLP for logs.
  2. Organizational constraints, often related to reliability practices, that require the use of files for logs.

You can still collect file-based logs with the OpenTelemetry Collector. This documentation walks through an example of how to capture logs emitted to stdout in Kubernetes; you can apply the same pattern to logs emitted to files.
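
As a starting point, the following is a minimal sketch of a Collector configuration that tails Kubernetes container log files with the filelog receiver. It assumes the contrib distribution of the Collector running as a DaemonSet with the node’s /var/log/pods directory mounted; the OTLP endpoint is a placeholder.

```yaml
receivers:
  filelog:
    # Kubernetes writes one log file per container under /var/log/pods.
    include:
      - /var/log/pods/*/*/*.log
    start_at: beginning

exporters:
  otlphttp:
    # Placeholder: replace with your Grafana Cloud OTLP endpoint and credentials.
    endpoint: https://otlp-gateway-example.grafana.net/otlp

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlphttp]
```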

Architecture

To correlate logs with traces and metrics, you need to contextualize logs with the same resource attributes and with trace and span IDs.

First, enrich logs with the same identifying resource attributes, for example service.name, service.namespace, service.instance.id, and deployment.environment, as well as with trace_id and span_id.

Then, use the same metadata enrichment pipeline in the OpenTelemetry Collector, for example the Kubernetes Attributes Processor or the Resource Detection Processor.
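
As an illustration, here is a minimal sketch of such an enrichment fragment, assuming the contrib distribution of the Collector; the extracted metadata keys are examples, not an exhaustive list.

```yaml
processors:
  # Match log entries to pods and attach Kubernetes metadata
  # as resource attributes.
  k8sattributes:
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.deployment.name
  # Detect additional resource attributes, for example from the
  # OTEL_RESOURCE_ATTRIBUTES environment variable and the host system.
  resourcedetection:
    detectors: [env, system]
```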

This enrichment happens automatically when you export logs through OTLP. When you collect logs through files or stdout, you must add these attributes to the log lines yourself.

Kubernetes architecture diagram: containerized application logs emitted through stdout, collected with the OpenTelemetry Collector, and sent to Grafana Cloud.

To carry over the resource attributes in the log lines, you need to use one of the following export patterns.

Export unstructured logs:

Export the logs as unstructured text and parse them with regular expressions, for example:

```text
2024-09-17T11:29:54  INFO [nio-8080-exec-1] c.e.OrderController  : Order completed - service.name=order-processor, service.instance.id=i-123456, span_id=1d5f8ca3f9366fac...
```
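
As a sketch, the filelog receiver’s regex_parser operator could parse this specific format; the regular expression and field names below are illustrative assumptions and must be adapted to your own log layout.

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    operators:
      # Illustrative pattern for the example line above. Monitor for
      # parsing failures, as discussed later in this page.
      - type: regex_parser
        regex: '^(?P<timestamp>\S+)\s+(?P<level>\w+)\s+\[(?P<thread>[^\]]+)\]\s+(?P<logger>\S+)\s+:\s+(?P<message>.*)$'
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S'
        severity:
          parse_from: attributes.level
```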

Export structured logs:

Export the logs in a structured format like JSON and parse them with a native parser for the chosen format, for example:

```json
{"timestamp": "2024-09-17T11:29:54", "level": "INFO", "body": "Order completed", "logger": "c.e.OrderController", "service_name": "order-processor", "service_instance_id": "i-123456", "span_id": "1d5f8ca3f9366fac"...}
```

Both export patterns have advantages and disadvantages:

|                            | JSON logs | Unstructured logs |
|----------------------------|-----------|-------------------|
| Correlation                | +++       | +++               |
| Human readability          | The verbosity of JSON can seriously erode readability | Contextualization attributes can be appended at the end of the log line, preserving readability |
| Reliability of the parsing | It’s simple to define robust JSON parsing rules | Parsing unstructured text with regular expressions is fragile, particularly due to multi-line log messages like stack traces, to the point where it requires monitoring parsing failures |
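
If you choose unstructured logs, multi-line messages such as stack traces typically need to be recombined into a single log entry before parsing. Here is a minimal sketch using the filelog receiver’s recombine operator, assuming every new log entry starts with a date:

```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    operators:
      # Merge continuation lines (for example stack trace frames) into
      # the entry that precedes them. A new entry starts when the line
      # begins with a date such as 2024-09-17.
      - type: recombine
        combine_field: body
        is_first_entry: body matches "^\\d{4}-\\d{2}-\\d{2}"
```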