Collectors

Collectors are low-level components that collect metrics for domains. See metrics/status.global/global.go for a reference example with extensive code comments.

Domain Names

Blip metric domain names have three requirements:

  1. Always lowercase
  2. One word: [a-z]+
  3. Singular noun: “size” not “sizes”; “query” not “queries”

Common abbreviation and acronyms are preferred, especially when they match MySQL usage: “thd” not “thread”; “pfs” not “performanceschema”; and so on.

Currently, domain names fit this convention, but if a need arises to allow hyphenation (“domain-name”), it might be allowed. Snake case (“domain_name”) and camel case (“domainName”) are not allowed: the former is used by metrics, and the latter is not Blip style.

Subdomains

Blip domains are dot-separated into subdomains, like status.global: domain is status, subdomain is global. Blip uses subdomains for metric grouping and, in rare cases, Blip organization. See the Domain Quick Reference for a full list of domains and subdomains.

Metric Grouping

Many MySQL metrics are grouped, and Blip mirrors the same grouping in its domain names. This has changed over many years, so if it seems new to you, then consider this:

This is why Blip uses domain status.global and not just “status” because the latter is ambiguous.

For brevity, all Blip domains and subdomains are called “domains”. repl is a domain, and status.global is a domain. The subdomain distinction is made only when discussing and naming them.

Global groups, like status.global and var.global, do not have a metric group because what would the group key and value be? During Blip developed, we considered a global group key-value like all="*" or all="", but these are useless magical values that serve no purpose, so we omitted them. Groups are only used when there are meaningful non-global group keys and values, like size.table: these metrics must be grouped by db and tbl to make sense.

Blip Organization

In rare cases, we subdomain for greater clarity and organization in Blip, especially with respect to level collection. For example, with repl and repl.lag a user might want to collect repl.lag frequently (every second) but collect repl info more slowly. If these two were one domain, it would be less clear in the plan and more difficult to code in the collector:

rep_lag:
  freq: 1s
  collect:
    repl:
      options:
        source-role: "east"
      metrics:
        - lag

repl_info:
  freq: 30s
  collect:
    repl:
      metrics:
        - running

That plan is valid, it’s more difficult to code in a single collector given totally different work (and options) at different levels. While collectors must handle different metrics at different levels, it’s usually the same work, which makes coding the collector easier. Also, the plan is less clear since it’s the same domain but configured differently.

By contrast, separate domains makes it easier to develop, test, explain, and use each:

We subdomain for Blip organization judiciously. If you’re making a custom collector that you want to merge upstream (into public Blip), be sure to file an issue and discuss with us first.

Metric Names

MySQL Metrics

Blip reports MySQL metric names as-is (no renaming) so that what you see in MySQL is what you get in Blip. The only modification Blip makes is lowercasing MySQL metric names for consistency because in MySQL they’re inconsistent:

  • Foo_bar (most common)
  • Foo_Bar (replica status)
  • foo_bar (InnoDB metrics)

If your collector only reports MySQL metrics, then just strings.ToLower() the name as-is from MySQL.

Derived Metrics

If your collector creates and reports derived metrics, then there are three requirements for naming:

  1. Only snake_case
  2. Always lowercase
  3. No prefixes or suffixes

A fully-qualified metric name includes a domain: status.global.threads_running. The metric name is always the last field (split on .).

Code

Example https://github.com/cashapp/blip/tree/main/examples/integrate shows how to create a custom metrics collector for domain “foo”. The high-level work is:

  1. Implement blip.Collector and blip.CollectorFactory
  2. Register the domain/collector by calling metrics.Register(myFactory, "foo")
  3. Use the domain in a plan:
level:
  collect:
    foo: # Custom domain/collector
      metrics:
        - whatever