
What is structured metadata

Warning

Structured metadata was added to chunk format V4, which is used if the schema version is greater than or equal to 13. See Schema Config for more details about schema versions.

Selecting proper, low-cardinality labels is critical to operating and querying Loki effectively. Some metadata, especially infrastructure-related metadata, can be difficult to embed in log lines, and is too high cardinality to store effectively as indexed labels (doing so would reduce the performance of the index).

Structured metadata is a way to attach metadata to logs without indexing them or including them in the log line content itself. Examples of useful metadata are Kubernetes pod names, process IDs, or any other label that is often used in queries but has high cardinality and is expensive to extract at query time.

Structured metadata can also be used to query commonly needed metadata from log lines without needing to apply a parser at query time. Large JSON blobs or a poorly written query using complex regex patterns, for example, come with a high performance cost. Examples of useful metadata include container IDs or user IDs.
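
As a sketch, suppose a user ID appears both in a JSON log body and as structured metadata (the label name user_id here is illustrative). Filtering on the structured metadata label avoids running a parser over every matching log line:

logql
# Parses every matching log line as JSON before filtering (expensive):
{job="example"} | json | user_id="u-12345"

# Filters directly on the structured metadata label, no parser needed:
{job="example"} | user_id="u-12345"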

When to use structured metadata

You should only use structured metadata in the following situations:

  • If you are ingesting data in OpenTelemetry format, using Grafana Alloy or an OpenTelemetry Collector. Structured metadata was designed to support native ingestion of OpenTelemetry data.
  • If you have high cardinality metadata that should not be used as a label and does not exist in the log line. Some examples might include process_id or thread_id or Kubernetes pod names.
  • If you are using Explore Logs to visualize and explore your Loki logs.
  • If you are a large-scale customer who ingests more than 75 TB of logs a month and uses Bloom filters.

We do not recommend extracting information that already exists in your log lines and putting it into structured metadata.

Attaching structured metadata to log lines

You have the option to attach structured metadata to log lines in the push payload along with each log line and the timestamp. For more information on how to push logs to Loki via the HTTP endpoint, refer to the HTTP API documentation.
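
As a minimal sketch, a JSON payload pushed to /loki/api/v1/push carries structured metadata as an optional third element of each entry in values (the pod and trace_id label names are illustrative):

json
{
  "streams": [
    {
      "stream": { "job": "example" },
      "values": [
        [
          "1715000000000000000",
          "log line contents",
          { "pod": "myservice-abc1234-56789", "trace_id": "0242ac120002" }
        ]
      ]
    }
  ]
}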

Alternatively, you can use Grafana Alloy or Promtail to extract and attach structured metadata to your log lines. See the Promtail: Structured metadata stage for more information.
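
For instance, a minimal Promtail pipeline might parse a field out of a JSON log body and promote it to structured metadata instead of a label (the trace_id field name is illustrative):

yaml
pipeline_stages:
  # Extract trace_id from the JSON log body into the extracted map.
  - json:
      expressions:
        trace_id: trace_id
  # Attach the extracted value as structured metadata rather than an indexed label.
  - structured_metadata:
      trace_id: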

Support for structured metadata was added in version 1.2.0 of the Logstash output plugin. For more information, see logstash.

Warning

Structured metadata size is taken into account when enforcing ingestion rate limits. In addition, there are separate limits on how much structured metadata can be attached per log line.

yaml
# Maximum size accepted for structured metadata per log line.
# CLI flag: -limits.max-structured-metadata-size
[max_structured_metadata_size: <int> | default = 64KB]

# Maximum number of structured metadata entries per log line.
# CLI flag: -limits.max-structured-metadata-entries-count
[max_structured_metadata_entries_count: <int> | default = 128]
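
As a sketch, these limits could be raised in the limits_config block of your Loki configuration (the values shown are illustrative, not recommendations):

yaml
limits_config:
  # Allow up to 128KB of structured metadata per log line.
  max_structured_metadata_size: 128KB
  # Allow up to 256 structured metadata entries per log line.
  max_structured_metadata_entries_count: 256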

Querying structured metadata

Structured metadata is extracted automatically for each returned log line and added to the labels returned for the query. You can use labels of structured metadata to filter log lines using a label filter expression.

For example, if you have a label pod attached to some of your log lines as structured metadata, you can filter log lines using:

logql
{job="example"} | pod="myservice-abc1234-56789"

Of course, you can filter by multiple labels of structured metadata at the same time:

logql
{job="example"} | pod="myservice-abc1234-56789" | trace_id="0242ac120002"

Note that since structured metadata is extracted automatically into the result labels, some metric queries might return an error like maximum of series (50000) reached for a single query. You can use the Keep and Drop stages to filter out labels that you don’t need. For example:

logql
count_over_time({job="example"} | trace_id="0242ac120002" | keep job [5m])
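
Equivalently, a drop stage can remove just the high-cardinality labels (here pod and trace_id, assuming those are the structured metadata labels attached to these log lines):

logql
count_over_time({job="example"} | trace_id="0242ac120002" | drop pod, trace_id [5m])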