
How to observe AWS Lambda functions using the OpenTelemetry Collector and Grafana Cloud

2025-02-18 6 min

Note: A version of this post originally appeared on the OpenTelemetry blog. 

Getting telemetry data out of modern applications is very straightforward—or at least it should be. You set up a collector that either receives data from your application or asks it to provide an up-to-date state of various counters. This happens every minute or so, and if it’s a second late or early, no one really bats an eye. 

But what if the application isn’t around for long? What if every second spent waiting for the data to be collected is billed? Then you’re most likely thinking of function-as-a-service (FaaS) environments, the best known being AWS Lambda.

Collecting telemetry data from Lambda functions presents some unique challenges, but there are tools that can help. In this blog, I’ll show you how to use the opentelemetry-lambda extension layer to gather your data efficiently and cost-effectively—and how you can pair it with Grafana Cloud without touching your code at all!

What are the challenges with observing Lambda functions?

In Lambda’s execution model, functions are called directly, and the environment is frozen afterward. You’re only billed for actual execution time and no longer need a server to wait for incoming requests. (This is also where the term serverless comes from.) 

Keeping the function alive until metrics can be collected isn’t really an option. And even if you’re willing to pay for that, different invocations have completely separate contexts and don’t necessarily know about all the other executions happening simultaneously. 

Now, you might be saying: “I’ll just push all the data at the end of my execution; no issues here!” But that doesn’t solve the issue either. You still have to pay for the time it takes to send the data—and with many invocations, this adds up.

But there is another way! Lambda extension layers allow you to run any process alongside your code, sharing the execution runtime and providing additional services. With the opentelemetry-lambda extension layer, you get a local endpoint to send data to while it keeps track of the Lambda lifecycle and ensures your telemetry gets to the storage layer.

How does the opentelemetry-lambda extension layer work?

When your function is called for the first time, the extension layer starts an instance of the OpenTelemetry Collector. The collector build is stripped down, providing only the components necessary in the context of Lambda. It registers with the Lambda extension API and telemetry API, so it receives notifications whenever your function is invoked, whenever it emits a log line, and when the execution context is about to be shut down.

This is where the magic happens

Up until now, this just seems like extra work for nothing. You’ll still have to wait for the collector to ship the data, right?

This is where the special decouple processor comes in. It separates the receiving and exporting components while interfacing with the Lambda lifecycle. This allows the Lambda function to return even if not all data has been sent. At the next invocation (or on shutdown), the collector continues shipping the data while your function does its thing.

Timing diagram showcasing the advantage of using a collector with regards to execution time

Compared to sending the data directly from the application, this reduces the billed time significantly on repeated requests.

How can I use the extension layer?

The opentelemetry-lambda project publishes releases of the collector extension layer. It can be configured through a configuration file hosted either in an Amazon S3 bucket or on an arbitrary HTTP server. It is also possible to bundle the configuration file with your Lambda code. 

In both cases, there are tradeoffs to consider. A remote configuration file delivered through S3 or HTTP adds to the cold start duration, since an additional request needs to be made, while bundling the configuration increases the management overhead when you want to control the configuration for multiple Lambda functions. At scale, you can also provide your users with a custom version of the extension layer that ships with a specific configuration file.
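To make the tradeoff concrete, here is a sketch of what the OPENTELEMETRY_COLLECTOR_CONFIG_URI environment variable could look like for each delivery option; the bucket, region, and host names are placeholders:

```shell
# Bundled with the function code (fastest cold start, per-function management):
OPENTELEMETRY_COLLECTOR_CONFIG_URI=/var/task/collector.yaml

# Hosted in S3 (central management, one extra request on cold start;
# the function role also needs read access to the object):
OPENTELEMETRY_COLLECTOR_CONFIG_URI=s3://my-config-bucket.s3.us-east-1.amazonaws.com/collector.yaml

# Served from an arbitrary HTTP(S) server:
OPENTELEMETRY_COLLECTOR_CONFIG_URI=https://config.example.com/collector.yaml
```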

Getting started with the OpenTelemetry Collector 

The simplest way to get started is with an embedded configuration. For this, add a file called collector.yaml to your function. This is a regular OpenTelemetry Collector configuration file. To take advantage of the Lambda-specific components, you need to configure them explicitly. As an example, the following configuration receives traces and logs from the telemetry API and sends them to another endpoint:

receivers:
  telemetryapi:
exporters:
  otlphttp/external:
    endpoint: "external-collector:4318"
processors:
  batch:
  decouple:
service:
  pipelines:
    traces:
      receivers: [telemetryapi]
      processors: [batch, decouple]
      exporters: [otlphttp/external]
    logs:
      receivers: [telemetryapi]
      processors: [batch, decouple]
      exporters: [otlphttp/external]

Afterward, set the OPENTELEMETRY_COLLECTOR_CONFIG_URI environment variable to /var/task/collector.yaml. Once the function is redeployed, you’ll see your function logs appear!
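If you manage the function with the AWS CLI, setting the variable could look like the following sketch. The function name is a placeholder, and note that this command replaces the function’s entire environment, so include any existing variables as well:

```shell
aws lambda update-function-configuration \
  --function-name my-function \
  --environment "Variables={OPENTELEMETRY_COLLECTOR_CONFIG_URI=/var/task/collector.yaml}"
```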

Every log line your Lambda function produces will be sent to the specified external-collector endpoint. You don’t need to modify the code at all! From there, telemetry data flows to your backend as usual. Since the transmission of telemetry data might be frozen while the Lambda function is not active, logs can arrive delayed. They’ll either arrive during the next execution or during the shutdown interval.

Getting started with Grafana Cloud

To make it easier to start observing Lambda with Grafana Cloud, we built the Grafana distribution of the extension. It bundles a simple configuration file that receives data from the telemetry API and sends it to the Grafana Cloud OpenTelemetry endpoint. Everything is configurable through environment variables, offering simple configuration while keeping startup times to a minimum.

You can try it out by adding the extension layer ARN to your Lambda function and configuring the following environment variables:

  • GRAFANA_CLOUD_INSTANCE_ID
  • GRAFANA_CLOUD_OTLP_ENDPOINT
  • GRAFANA_CLOUD_API_KEY_ARN

GRAFANA_CLOUD_API_KEY_ARN needs to reference an existing secret in AWS Secrets Manager. You also need to add the secretsmanager:GetSecretValue permission to the Lambda execution role.
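As a sketch, a minimal IAM policy statement for the execution role could look like this; the region, account ID, and secret name are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:grafana-cloud-api-key-*"
    }
  ]
}
```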

After your Lambda function is redeployed, you’ll see the logs in Grafana Cloud using the {service_name="<your-lambda-name>"} query:

Observing Lambda with Grafana Cloud

Every log line your Lambda function produces will be sent to Grafana Cloud. You don’t need to modify the code at all! Since the transmission of telemetry data might be frozen while the Lambda function is not active, logs can arrive delayed. They’ll either arrive during the next execution or during the shutdown interval.

This extension is still in its early days, so let us know about any issues or feature requests through the GitHub issue tracker.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!