Graphite 1.1: Teaching an Old Dog New Tricks
The Road to Graphite 1.1
I started working on Graphite just over a year ago, when @obfuscurity asked me to help out with some issues blocking the Graphite 1.0 release. Little did I know that a year later, that would have resulted in 262 commits (and counting), and that with the help of the other Graphite maintainers (especially @deniszh, @iksaif & @cbowman0) we would have added a huge amount of new functionality to Graphite.
There are a huge number of new additions and updates in this release, in this post I’ll give a tour of some of the highlights including tag support, syntax and function updates, custom function plugins, and python 3.x support.
Tagging!
The single biggest feature in this release is the addition of tag support, which brings the ability to describe metrics in a much richer way and to write more flexible and expressive queries.
Traditionally series in Graphite are identified using a hierarchical naming scheme based on dot-separated segments called nodes. This works very well and is simple to map into a hierarchical structure like the whisper filesystem tree, but it means that the user has to know what each segment represents, and makes it very difficult to modify or extend the naming scheme since everything is based on the positions of the segments within the hierarchy.
The tagging system gives users the ability to encode information about the series in a collection of tag=value
pairs which are used together with the series name to uniquely identify each series, and the ability to query series by specifying tag-based matching expressions rather than constructing glob-style selectors based on the positions of specific segments within the hierarchy. This is broadly similar to the system used by Prometheus and makes it possible to use Graphite as a long-term storage backend for metrics gathered by Prometheus with full tag support.
When using tags, series names are specified using the new tagged carbon format: name;tag1=value1;tag2=value2
. This format is backward compatible with most existing carbon tooling, and makes it easy to adapt existing tools to produce tagged metrics simply by changing the metric names. The OpenMetrics format is also supported for ingestion, and is normalized into the standard Graphite format internally.
At its core, the tagging system is implemented as a tag database (TagDB) alongside the metrics that allows them to be efficiently queried by individual tag values rather than having to traverse the metrics tree looking for series that match the specified query. Internally the tag index is stored in one of a number of pluggable tag databases, currently supported options are the internal graphite-web database, redis, or an external system that implements the Graphite tagging HTTP API. Carbon automatically keeps the index up to date with any tagged series seen.
The new seriesByTag
function is used to query the TagDB and will return a list of all the series that match the expressions passed to it. seriesByTag
supports both exact and regular expression matches, and can be used anywhere you would previously have specified a metric name or glob expression.
There are new dedicated functions for grouping and aliasing series by tag (groupByTags
and aliasByTags
), and you can also use tags interchangeably with node numbers in the standard Graphite functions like aliasByNode
, groupByNodes
, asPercent
, mapSeries
, etc.
Piping Syntax & Function Updates
One of the huge strengths of the Graphite render API is the ability to chain together multiple functions to process data, but until now (unless you were using a tool like Grafana) writing chained queries could be painful as each function had to be wrapped around the previous one. With this release it is now possible to “pipe” the output of one processing function into the next, and to combine piped and nested functions.
For example:
alias(movingAverage(scaleToSeconds(sumSeries(stats_global.production.counters.api.requests.*.count),60),30),'api.avg')
Can now be written as:
sumSeries(stats_global.production.counters.api.requests.*.count)|scaleToSeconds(60)|movingAverage(30)|alias('api.avg')
OR
stats_global.production.counters.api.requests.*.count|sumSeries()|scaleToSeconds(60)|movingAverage(30)|alias('api.avg')
Another source of frustration with the old function API was the inconsistent implementation of aggregations, with different functions being used in different parts of the API, and some functions simply not being available. In 1.1 all functions that perform aggregation (whether across series or across time intervals) now support a consistent set of aggregations; average
, median
, sum
, min
, max
, diff
, stddev
, count
, range
, multiply
and last
. This is part of a new approach to implementing functions that emphasises using shared building blocks to ensure consistency across the API and solve the problem of a particular function not working with the aggregation needed for a given task.
To that end a number of new functions have been added that each provide the same functionality as an entire family of “old” functions; aggregate
, aggregateWithWildcards
, movingWindow
, filterSeries
, highest
, lowest
and sortBy
.
Each of these functions accepts an aggregation method parameter, for example aggregate(some.metric.*, 'sum')
implements the same functionality as sumSeries(some.metric.*)
.
It can also be used with different aggregation methods to replace averageSeries
, stddevSeries
, multiplySeries
, diffSeries
, rangeOfSeries
, minSeries
, maxSeries
and countSeries
. All those functions are now implemented as aliases for aggregate
, and it supports the previously-missing median
and last
aggregations.
The same is true for the other functions, and the summarize
, smartSummarize
, groupByNode
, groupByNodes
and the new groupByTags
functions now all support the standard set of aggregations. Gone are the days of wishing that sortByMedian
or highestRange
were available!
For more information on the functions available check the function documentation.
Custom Functions
No matter how many functions are available there are always going to be specific use-cases where a custom function can perform analysis that wouldn’t otherwise be possible, or provide a convenient alias for a complicated function chain or specific set of parameters.
In Graphite 1.1 we added support for easily adding one-off custom functions, as well as for creating and sharing plugins that can provide one or more functions.
Each function plugin is packaged as a simple python module, and will be automatically loaded by Graphite when placed into the functions/custom folder.
An example of a simple function plugin that translates the name of every series passed to it into UPPERCASE:
from graphite.functions.params import Param, ParamTypes
def toUpperCase(requestContext, seriesList):
"""Custom function that changes series names to UPPERCASE"""
for series in seriesList:
series.name = series.name.upper()
return seriesList
toUpperCase.group = 'Custom'
toUpperCase.params = [
Param('seriesList', ParamTypes.seriesList, required=True),
]
SeriesFunctions = {
'upper': toUpperCase,
}
Once installed the function is not only available for use within Graphite, but is also exposed via the new Function API which allows the function definition and documentation to be automatically loaded by tools like Grafana. This means that users will be able to select and use the new function in exactly the same way as the internal functions.
More information on writing and using custom functions is available in the documentation.
Clustering Updates
One of the biggest changes from the 0.9 to 1.0 releases was the overhaul of the clustering code, and with 1.1.1 that process has been taken even further to optimize performance when using Graphite in a clustered deployment. In the past it was common for a request to require the frontend node to make multiple requests to the backend nodes to identify matching series and to fetch data, and the code for handling remote vs local series was overly complicated. In 1.1.1 we took a new approach where all render data requests pass through the same path internally, and multiple backend nodes are handled individually rather than grouped together into a single finder. This has greatly simplified the codebase, making it much easier to understand and reason about, while allowing much more flexibility in design of the finders. After these changes, render requests can now be answered with a single internal request to each backend node, and all requests for both remote and local data are executed in parallel.
To maintain the ability of graphite to scale out horizontally, the tagging system works seamlessly within a clustered environment, with each node responsible for the series stored on that node. Calls to load tagged series via seriesByTag
are fanned out to the backend nodes and results are merged on the query node just like they are for non-tagged series.
Python 3 & Django 1.11 Support
Graphite 1.1 finally brings support for Python 3.x, both graphite-web and carbon are now tested against Python 2.7, 3.4, 3.5, 3.6 and PyPy. Django releases 1.8 through 1.11 are also supported. The work involved in sorting out the compatibility issues between Python 2.x and 3.x was quite involved, but it is a huge step forward for the long term support of the project! With the new Django 2.x series supporting only Python 3.x we will need to evaluate our long-term support for Python 2.x, but the Django 1.11 series is supported through 2020 so there is time to consider the options there.
Watch This Space
Efforts are underway to add support for the new functionality across the ecosystem of tools that work with Graphite, adding collectd tagging support, prometheus remote read & write with tags (and native Prometheus remote read/write support in Graphite) and last but not least Graphite tag support in Grafana.
We’re excited about the possibilities that the new capabilities in 1.1.x open up, and can’t wait to see how the community puts them to work.
Download the 1.1.1 release and check out the release notes here.