haystack-metrics
This haystack-metrics module contains code needed by most or all of the other modules in the Haystack code base.
Metrics
The Haystack system is deployed by Kubernetes and comes with an InfluxDb database for time series data (TSD). Other modules then use Netflix Servo metrics objects to create two types of metrics:
- Counter monitors to track how often an event of interest is occurring, and
- Timer monitors to track how much time an event of interest is taking.
Glue code in the metrics package of this module makes it easy to create Counters and Timers.
Usage
Dependencies
In the <properties> section of pom.xml put:
<haystack-metrics-version>...</haystack-metrics-version>
You should of course use the correct version of the haystack-metrics dependency in place of ... above.
In the <dependencies> section of pom.xml put:
<dependency>
    <groupId>com.expedia.www</groupId>
    <artifactId>haystack-metrics</artifactId>
    <version>${haystack-metrics-version}</version>
</dependency>
How to create objects
In the examples below, the values of SUBSYSTEM, APPLICATION, and CLASS_NAME should not contain spaces or periods (each period or space will be changed to a hyphen).
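The substitution rule can be pictured with a small helper. This is only a sketch: NameSanitizer and sanitize are hypothetical names that are not part of haystack-metrics, and the library's own implementation may differ in detail.

```java
/** Hypothetical illustration of the naming rule; not part of haystack-metrics. */
final class NameSanitizer {
    private NameSanitizer() {
    }

    /** Replaces each period or space in the given name with a hyphen. */
    static String sanitize(String name) {
        return name.replace('.', '-').replace(' ', '-');
    }
}
```

With this sketch, NameSanitizer.sanitize("my app.v2") yields "my-app-v2", while a name such as "pipes" is returned unchanged.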
Subsystem
As you will see, creating a Servo object in Haystack requires a "subsystem" String, whose value will be something like "pipes" or "trends"; the SUBSYSTEM constant below should be defined at a high level in your subsystem code base.
public static final String SUBSYSTEM = "subsystemName"; // e.g. "pipes" or "trends"
Application
Applications are in the subsystem's git repository, and good practice is to store the application name at a high level in the application's code hierarchy.
public static final String APPLICATION = "applicationName";
Class
Creating a Servo object also requires a "class" String, which is often the Java class or Scala object containing the object:
private static final String CLASS_NAME = ClassContainingTheCounter.class.getSimpleName();
Refactoring or renaming may well change the name of the Java class or Scala object in which the Servo object resides, so it is also acceptable to choose a "class" String that will never change (and that may not even correspond to an actual class in your code base):
private static final String CLASS_NAME = "JsonSerialization";
Singleton
Your Servo objects should be singletons, either as static (Java) or object (Scala) variables. The MetricObjects variable with which you create them can be managed by a Dependency Injection (DI) framework or not, as you see fit; if it is managed by a DI framework like Spring, you can let Spring manage the singleton and inject the same Servo object into each object that Spring instantiates. (The examples below show the creation of a new MetricObjects with the creation of each Servo object.)
Servo objects are specified with Identifiers:
- Subsystem
- Application
- Class Name
- Metric Name
If a Counter or Timer is created twice (that is, its Identifiers match those of an already registered Counter or Timer), then a warning is logged and the createAndRegister call returns the existing object. A Timer and a Counter with matching Identifiers are allowed, but this is best avoided.
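The "return the existing object" behavior can be sketched with a map keyed by the four Identifiers. CounterRegistry below is a hypothetical stand-in written only for illustration: it is not the code in MetricObjects, it uses an AtomicLong in place of a Servo Counter, and it assumes the Identifiers have already had periods and spaces replaced.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of identifier-based de-duplication; not the haystack-metrics code. */
final class CounterRegistry {
    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Returns the counter registered under the four Identifiers, creating it on first use. */
    AtomicLong createAndRegister(String subsystem, String application,
                                 String className, String metricName) {
        // Assumes sanitized Identifiers (no periods), so '.' is a safe separator.
        final String key = String.join(".", subsystem, application, className, metricName);
        final AtomicLong created = new AtomicLong();
        final AtomicLong existing = counters.putIfAbsent(key, created);
        if (existing != null) {
            // Stand-in for the warning that the text above says is logged.
            System.err.println("Counter already registered: " + key);
            return existing;
        }
        return created;
    }
}
```

Calling createAndRegister twice with the same four Identifiers returns the same instance, which is why holding the result in a static (Java) or object (Scala) variable is the simpler pattern.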
Counter
Creation
The code below is a Java snippet that shows the right way to create a Counter:
static final Counter REQUEST = (new MetricObjects()).createAndRegisterCounter(
        SUBSYSTEM, APPLICATION, CLASS_NAME, "REQUEST");
Because the Servo Counter generates a RATE metric, using upper case for the variable name REQUEST and the counter name "REQUEST" is recommended: doing so results in a sensibly named complete metric name of REQUEST_RATE in InfluxDb, as explained in the "Graphite Bridge" section of this document. The sendasrate configuration controls whether rates or counts are sent by the Counters (simple counts are easier to understand and often just as useful as rates).
Usage
Simply increment the Counter to count:
REQUEST.increment();
It will be reset when its value is reported to InfluxDb.
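The reset-on-report behavior, and the difference between sending a count and sending a rate, can be sketched as follows. ResettingCounter is a hypothetical class, not Servo's Counter; the sketch assumes that a rate is the count divided by the poll interval in seconds (the pollintervalseconds configuration), which may not match Servo's exact computation.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of reset-on-report; not the Servo implementation. */
final class ResettingCounter {
    private final AtomicLong count = new AtomicLong();

    /** Counts one occurrence of the event of interest. */
    void increment() {
        count.incrementAndGet();
    }

    /** Returns the count accumulated since the last report and resets it to zero. */
    long reportCount() {
        return count.getAndSet(0L);
    }

    /** Like reportCount(), but expressed as events per second over the poll interval (assumed formula). */
    double reportRate(long pollIntervalSeconds) {
        return (double) reportCount() / pollIntervalSeconds;
    }
}
```

Under these assumptions, 120 increments in a 60 second poll interval are reported either as a count of 120 or as a rate of 2.0 per second, and the next report starts again from zero.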
BasicTimer
Creation
The code below is a Java snippet that shows the right way to create a BasicTimer:
static final Timer JSON_SERIALIZATION = (new MetricObjects()).createAndRegisterTimer(
        SUBSYSTEM, APPLICATION, CLASS_NAME, "JSON_SERIALIZATION", TimeUnit.MICROSECONDS);
The Servo Timer generates four metrics (GAUGE max, NORMALIZED count, NORMALIZED totalOfSquares, and NORMALIZED totalTime), and while using upper case is again suggested (see the Counter section above), the complete metric names (JSON_SERIALIZATION_GAUGE_max, JSON_SERIALIZATION_NORMALIZED_count, JSON_SERIALIZATION_NORMALIZED_totalOfSquares, and JSON_SERIALIZATION_NORMALIZED_totalTime) are mixed case.
Choose the appropriate time unit as the last argument:
- For on-host code, TimeUnit.MICROSECONDS is probably appropriate.
- For network calls, TimeUnit.MILLISECONDS may be sufficient.
The coarser TimeUnit.MILLISECONDS has less performance impact than the finer TimeUnit.MICROSECONDS and TimeUnit.NANOSECONDS; you can read more about this issue here and here.
Usage
Follow the pattern below (this is for Java; the Scala implementation is similar):
final Stopwatch stopwatch = JSON_SERIALIZATION.start();
try {
    // Do the work being timed
} finally {
    stopwatch.stop();
}
You can also do your own timing without using a Stopwatch:
JSON_SERIALIZATION.record(timeItTookInMs, TimeUnit.MILLISECONDS);
Again, the Timer will be reset when its values are reported to InfluxDb.
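The four timer statistics are enough to derive a mean and a variance downstream. The sketch below shows the arithmetic with a hypothetical TimerStats accumulator; it is not Servo's BasicTimer, it fixes the reporting unit to milliseconds for simplicity, and it omits the reset that happens on each report.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical accumulator mirroring the four timer statistics; not Servo's BasicTimer. */
final class TimerStats {
    private long count;
    private double totalTime;
    private double totalOfSquares;
    private double max;

    /** Records one duration, converted to milliseconds for this sketch. */
    void record(long duration, TimeUnit unit) {
        final double millis = unit.toNanos(duration) / 1_000_000.0;
        count++;
        totalTime += millis;
        totalOfSquares += millis * millis;
        max = Math.max(max, millis);
    }

    /** Mean duration: totalTime / count. */
    double mean() {
        return count == 0 ? 0.0 : totalTime / count;
    }

    /** Population variance: totalOfSquares / count minus the square of the mean. */
    double variance() {
        if (count == 0) {
            return 0.0;
        }
        final double mean = mean();
        return totalOfSquares / count - mean * mean;
    }

    /** Largest duration recorded so far, in milliseconds. */
    double max() {
        return max;
    }
}
```

For example, recording 2 ms and 4 ms gives a count of 2, a totalTime of 6, and a totalOfSquares of 20, so the mean is 3 ms and the variance is 20 / 2 - 9 = 1.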
BucketTimer and StatsTimer
Servo provides timers more complicated than BasicTimer:
- BucketTimer and
- StatsTimer
They can be created with code very similar to what was given in the BasicTimer section above.
The Main Method
To initialize the metrics system, the first line of your main() method should be something like:
(new MetricPublishing()).start(graphiteConfig);
where graphiteConfig is an implementation of the GraphiteConfig interface declared in this module.
Configuration
You will typically have a base.yaml in your resources directory whose contents will include something like:
haystack:
  graphite:
    prefix: "haystack" # using something other than "haystack" will require a change in the InfluxDb template
    host: "haystack.local" # set in /etc/hosts per instructions in haystack/deployment module
    port: 2003 # Graphite port; typically 2003
    pollintervalseconds: 60
    queuesize: 10
    sendasrate: false
Graphite Bridge
The "Graphite Bridge" connects Servo metrics from the application to the Haystack InfluxDb via Graphite plaintext protocol messages. Such a message consists of three space-delimited Strings terminated by a newline:
<metric path> <metric value> <metric timestamp>\n
The <metric value> is a number, and the pieces of <metric path> are traditionally separated by a period. Note that the period-delimited pieces contain no metadata; that is, the meaning of each piece is not specified in the message. This lack of metadata is addressed in OpenTSDB, but code to connect Servo metrics to InfluxDb via the OpenTSDB protocol does not currently exist in Servo; instead, the bridge uses the Graphite plaintext protocol, and an InfluxDb template (read about them in this README file) parses the Graphite plaintext message into tags. (You can read about metrics tags here.)
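The message format can be sketched in a few lines of Java. GraphiteLine is a hypothetical helper, not code from this module or from Servo, and the metric path and values in the usage note are made up; the timestamp is in seconds since the Unix epoch, as the Graphite plaintext protocol expects.

```java
/** Hypothetical sketch of the Graphite plaintext line format; not code from this module. */
final class GraphiteLine {
    private GraphiteLine() {
    }

    /** Builds "<metric path> <metric value> <metric timestamp>\n" with a Unix epoch-seconds timestamp. */
    static String format(String metricPath, double value, long epochSeconds) {
        return metricPath + " " + value + " " + epochSeconds + "\n";
    }
}
```

For instance, GraphiteLine.format("haystack.host-1.pipes.app.Cls.REQUEST_RATE", 42.0, 1500000000L) produces the single line "haystack.host-1.pipes.app.Cls.REQUEST_RATE 42.0 1500000000" followed by a newline.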
This Graphite bridge therefore requires a convention to map each metric piece to a tag; the convention appears in three places that must agree:
- The template configuration (see the templates value in influxdb.yaml)
- The code that builds client-side tags (see getTags() in MetricObjects.java)
- The code that creates the Graphite plaintext message from the metric and its client-side tags (see getName() in ServoToInfluxDbViaGraphiteNamingConvention.java)
As a result, the graphite message has the following meaning:
<system>.<server>.<subsystem>.<application>.<class>.<VARIABLE_NAME>_<METRIC_NAME> (for Counter)
<system>.<server>.<subsystem>.<application>.<class>.<VARIABLE_NAME>_<METRIC_NAME>_<timerStatName> (for Timer)
where:
- <system> is typically "haystack" (this value is controlled by the haystack.graphite.prefix configuration)
- <server> is the host name
- <subsystem> is the value discussed in the "Subsystem" section above
- <application> is the value discussed in the "Application" section above
- <class> is the value discussed in the "Class" section above
- <VARIABLE_NAME>_<METRIC_NAME> or <VARIABLE_NAME>_<METRIC_NAME>_<timerStatName> is the complete metric name; see the "Counter" and "BasicTimer" sections above.
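The positional convention above can be sketched as a pair of functions: one that assembles the metric path and one that maps each position back to a tag, roughly what the InfluxDb template does on the receiving side. MetricPath, its tag names, and the "metric" key are hypothetical choices for this illustration; they are not taken from MetricObjects.java, ServoToInfluxDbViaGraphiteNamingConvention.java, or influxdb.yaml, and the sketch assumes that no piece contains a period.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the positional path convention; not the haystack-metrics code. */
final class MetricPath {
    // Tag names for the first five positions of the path (illustrative names).
    private static final String[] TAG_NAMES = {
        "system", "server", "subsystem", "application", "class"};

    private MetricPath() {
    }

    /** Joins the pieces with periods, in the order given by the convention. */
    static String build(String system, String server, String subsystem,
                        String application, String className, String completeMetricName) {
        return String.join(".", system, server, subsystem, application, className,
                completeMetricName);
    }

    /** Splits a path on periods and maps each position to its tag; the last piece is the metric name. */
    static Map<String, String> parse(String path) {
        final String[] pieces = path.split("\\.");
        if (pieces.length != TAG_NAMES.length + 1) {
            throw new IllegalArgumentException("Unexpected number of pieces in: " + path);
        }
        final Map<String, String> tags = new LinkedHashMap<>();
        for (int i = 0; i < TAG_NAMES.length; i++) {
            tags.put(TAG_NAMES[i], pieces[i]);
        }
        tags.put("metric", pieces[TAG_NAMES.length]);
        return tags;
    }
}
```

Because the mapping is purely positional, a stray period in any piece would shift every later tag, which is one reason the Identifiers must not contain periods.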
Releases
- Decide what kind of version bump is necessary, based on Semantic Versioning conventions. In the items below, the version number you select will be referred to as x.y.z.
- Update the pom.xml, changing the version element to <version>x.y.z-SNAPSHOT</version>. Note the -SNAPSHOT suffix.
- Make your code changes, including unit tests. This package requires 100% unit test code coverage for the build to succeed.
- Update the ReleaseNotes.md file with details of your changes.
- Create a pull request with your changes.
- Ask for a review of the pull request; when it is approved, the Travis CI build will upload the resulting jar file to the SonaType Staging Repository.
- Tag the build with the version number by running the following from a command line in the root directory of the project:
git tag x.y.z
git push --tags
This will cause the jar file to be released to the SonaType Release Repository.