Retry on RESOURCE_EXHAUSTED failure
Grafana Cloud Traces returns RetryInfo to correctly indicate retryable errors. This change aligns with the OpenTelemetry specification.
According to the OTel specification, “Retryable errors indicate that telemetry data processing failed, and the client SHOULD record the error and may retry exporting the same data. For example, this can happen when the server is temporarily unable to process the data.” If an error is retryable, the collector keeps the data and attempts to send it again after the interval returned by the server.
Currently, Grafana Cloud Traces returns RESOURCE_EXHAUSTED as a non-retryable error.
Starting on July 1, 2024, RESOURCE_EXHAUSTED will be returned as a retryable error.
Note
This behavior change will take effect on July 1, 2024.
Impact
If configured to retry, telemetry collectors (OTel Collector, Grafana Alloy, Grafana Agent) correctly retry for retryable errors.
Incorrectly configured collectors might hold too much data in memory, run out of memory, and crash.
The amount of data a collector holds in memory to retry can be controlled using the sending_queue and retry_on_failure configuration options.
For Grafana Alloy, refer to sending_queue and retry_on_failure in the Grafana Alloy documentation.
For OpenTelemetry Collector, refer to sending_queue and retry_on_failure in the Configuration section of the README.
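
As a rough illustration of how these two options fit together, the following is a minimal sketch of an OpenTelemetry Collector otlp exporter block. The endpoint is a placeholder, and the numeric values are illustrative assumptions rather than recommendations. Check the README for the current defaults and size the queue for your own workload.

```yaml
exporters:
  otlp:
    # Placeholder endpoint (assumption); replace with your own traces endpoint.
    endpoint: tempo.example.com:443
    retry_on_failure:
      # Retry when the server reports a retryable error.
      enabled: true
      # Wait before the first retry, then back off up to max_interval.
      initial_interval: 5s
      max_interval: 30s
      # Stop retrying a batch after this much total time; the batch is then dropped.
      max_elapsed_time: 300s
    sending_queue:
      # Buffer batches in memory while they wait to be sent or retried.
      enabled: true
      # Number of workers that take batches off the queue and send them.
      num_consumers: 10
      # Upper bound on queued batches; batches arriving at a full queue are dropped.
      queue_size: 1000
```

In this sketch, queue_size and max_elapsed_time are the two settings that bound how much data waits in memory. A smaller queue or a shorter retry window lowers memory use at the cost of dropping data sooner during a sustained RESOURCE_EXHAUSTED period.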