I am getting the error message ‘Too many jobs in the queue’
The error message might also be `queue doesn't have room for 100 jobs` or `failed to add a job to work queue`.
You may see this error if the compactor isn’t running and the blocklist size has exploded. Possible reasons why the compactor may not be running are:
- Insufficient permissions.
- Compactor sitting idle because no block is hashing to it.
- Incorrect configuration settings.
Diagnosing the issue
- Check the metric `tempodb_compaction_bytes_written_total`. If it is greater than zero (0), the compactor is running and writing to the backend.
- Check the metric `tempodb_compaction_errors_total`. If it is greater than zero (0), check the logs of the compactor for an error message.
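If you scrape these metrics with Prometheus, the two checks can also be codified as alerting rules. The following is a minimal sketch; the alert names, time windows, and severities are illustrative assumptions, not part of the Tempo documentation.

```yaml
# Illustrative Prometheus alerting rules; names, windows, and thresholds
# are assumptions, not taken from the Tempo docs.
groups:
  - name: tempo-compactor
    rules:
      - alert: TempoCompactorNotWriting
        # No bytes written for 15 minutes suggests the compactor is not running.
        expr: sum(rate(tempodb_compaction_bytes_written_total[15m])) == 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Tempo compactor is not writing to the backend.
      - alert: TempoCompactionErrors
        # Any compaction error in the last 15 minutes warrants checking the compactor logs.
        expr: sum(increase(tempodb_compaction_errors_total[15m])) > 0
        labels:
          severity: warning
        annotations:
          summary: Tempo compactor is reporting compaction errors.
```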
Solutions
- Verify that the compactor has the LIST, GET, PUT, and DELETE permissions on the bucket objects.
  - If these permissions are missing, assign them to the compactor container.
  - For detailed information, see https://grafana.com/docs/tempo/latest/configuration/s3/#permissions. A sketch of a matching IAM policy appears after this list.
- If there’s a compactor sitting idle while others are running, port-forward to the compactor’s http endpoint. Then go to /compactor/ring and click "Forget" on the inactive compactor.
- Check the following configuration parameters to ensure they have correct settings (see the configuration sketch after this list):
  - `max_block_bytes` determines when the ingester cuts blocks. A good number is anywhere from 100MB to 2GB, depending on the workload.
  - `max_compaction_objects` determines the maximum number of objects in a compacted block. This should be relatively high, generally in the millions.
  - `retention_duration` determines how long traces should be retained in the backend.
- Check the storage section of the config and increase `queue_depth`. Do bear in mind that a deeper queue could mean longer waiting times for query responses. Adjust `max_workers` accordingly, which configures the number of parallel workers that query backend blocks.
```yaml
storage:
  trace:
    pool:
      max_workers: 100    # worker pool determines the number of parallel requests to the object store backend
      queue_depth: 10000
```
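For the permissions solution above, an IAM policy granting the four required permissions might look like the following. This is a sketch in CloudFormation-style YAML; the bucket name is a placeholder and the statement layout is an assumption, so adapt it to however you manage IAM.

```yaml
# Illustrative IAM policy document (CloudFormation-style YAML).
# "my-tempo-bucket" is a placeholder; point the Resource ARNs at your bucket.
PolicyDocument:
  Version: "2012-10-17"
  Statement:
    - Effect: Allow
      Action:
        - s3:ListBucket         # LIST
      Resource: arn:aws:s3:::my-tempo-bucket
    - Effect: Allow
      Action:
        - s3:GetObject          # GET
        - s3:PutObject          # PUT
        - s3:DeleteObject       # DELETE
      Resource: arn:aws:s3:::my-tempo-bucket/*
```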
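Similarly, for the configuration parameters above, here is a minimal sketch of where they might sit in the Tempo config. The section placement and the values are assumptions that can vary by Tempo version, so check the configuration reference for your release before applying.

```yaml
# Illustrative placement and values; verify section names against your
# version's configuration reference.
ingester:
  max_block_bytes: 500000000      # cut blocks at ~500MB; anywhere from 100MB to 2GB depending on workload

compactor:
  compaction:
    max_compaction_objects: 6000000   # max objects in a compacted block; generally in the millions
    retention_duration: 336h          # how long traces are retained in the backend
```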