NETWAYS
Graphite + Grafana
Time Series Metrics
Version: 2.1.1

~~~SECTION:MAJOR~~~ Performance Graphing

NETWAYS

1 Performance Graphing

~~~SECTION:MAJOR~~~ Performance Graphing

NETWAYS

What it is

  • All about numbers and time
  • Collect and store data
  • View the past
  • Predict trends
  • View the state of something

~~~SECTION:MAJOR~~~ Performance Graphing

NETWAYS

What it is not

  • Alerting in any way
  • Telling you what the problem is
  • Telling you how to solve it
  • You are the interpreter

~~~SECTION:MAJOR~~~ Why we need graphs

NETWAYS

2 Why we need graphs

~~~SECTION:MAJOR~~~ Why we need graphs

NETWAYS

Why Operators need it

  • State of single server or application
  • Correlate the state of multiple servers
  • Network usage
  • Trace problems
  • Transparency of the data center
    • Load
    • Disk
    • Processes
    • Memory
    • ...

~~~SECTION:MAJOR~~~ Why we need graphs

NETWAYS

Why Devs need it

  • Debugging
  • Application profiling
  • Trace performance issues
  • Follow impacts of application changes
  • Follow impacts of growing usage

~~~SECTION:MAJOR~~~ Why we need graphs

NETWAYS

Why Managers need it

  • Service Level Agreements (SLAs)

~~~SECTION:MAJOR~~~ Why we need graphs

NETWAYS

Why all of us need it

  • Capacity Management
    • Storage
    • Network
    • Temperature
    • Consumption of electricity
    • Rack space
    • ...

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3 Graphite

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite Overview

  • Store numeric time-series data
  • Render graphs of this data on demand
  • No collection of data
  • Written in Python

Graphite is not just a single product; it consists of multiple components which work together to form a complete performance monitoring solution.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.1 Basic Components

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Components of Graphite

Graphite is a scalable system which provides real-time graphing. Graphite was originally developed by Chris Davis from orbitz.com, where it was used to visualize business-critical data. Graphite is not a single application; it consists of multiple components which together provide a fully functional performance monitoring solution.

Parts of Graphite:

  • Carbon
  • Whisper
  • Graphite-Web

Graphite was published in 2008 under the "Apache 2.0" license.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Cache

  • Accepts metrics over TCP or UDP
  • Various protocols
    • Plaintext
    • Python pickle
    • AMQP
  • Caches metrics and writes data to disk
  • Provides query port "Carbonlink" (in-memory metrics)

Carbon Cache accepts metrics and provides a mechanism to cache them for a defined amount of time. It uses the underlying Whisper libraries to store data permanently on disk. In a growing environment with more I/O, a single carbon-cache process may not be enough. To scale, you can simply spawn multiple Carbon Caches.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite-Web

  • Web interface (Django)
  • Render graphs
  • Save graphs in dashboards
  • Share graphs

Graphite-Web is the visualizing component. To create graphs, it obtains the data simultaneously from the related Whisper files and the Carbon Cache. Graphite-Web combines datapoints from both sources and returns a single image. This ensures that data can always be shown in real time, even if some datapoints have not yet been written into Whisper files and are therefore not yet on disk.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.2 Single-Node Setup

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Single-Node Setup

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.3 Virtual Machines

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Training VMs

  • VirtualBox
  • Host-only network (192.168.56.0/24)
  • "graphing1.localdomain" is the primary training VM
  • "graphing2.localdomain" is used for Cluster setup
Instance               IP              Login
graphing1.localdomain  192.168.56.101  training/netways (sudo su)
graphing2.localdomain  192.168.56.102  training/netways (sudo su)

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Base Linux Installation

  • CentOS 7
  • Systemd as init system
  • Firewalld (Stopped)
  • SELinux (Permissive)
  • EPEL repository for additional packages

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.4 Graphite Installation

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Installation methods

  • Installation from source
  • Installation from PyPI (Python Package Index) via pip
  • Installation via binary packages
    • Most common operating systems
  • Installation in isolated Python environment with Virtualenv
  • Script based installation with Synthesize (Vagrant)
  • Installation with configuration management tools
    • Puppet
    • Ansible
    • Chef
    • Salt

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Installation via Puppet

  • Configuration management solution
  • Supports multiple operating systems
    • RedHat, CentOS, Debian, Ubuntu, SLES, Oracle Linux, ...
  • Covers all common Graphite options
  • Fully automated installation of all components
  • Optional: Installation of webserver

Puppet module: https://forge.puppetlabs.com/dwerder/graphite

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

PyPI vs. Binary Package Installation

PyPI:

  • All versions and features available
  • More installation and configuration effort
  • More flexibility
  • Harder to debug

Binary package:

  • Availability depends on the operating system
  • Older and more stable versions
  • Easier to install

In this course we install from PyPI via pip, so attendees get a better understanding of the Graphite components and how they work together.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Prerequisites

To prepare the Graphite installation we need to install some required packages first:

# yum -y install python2-pip gcc
# yum -y install python-devel cairo-devel libffi-devel

Required packages for Graphite-Web:

# yum -y install python-scandir mod_wsgi httpd
# yum -y install dejavu-sans-fonts dejavu-serif-fonts

An exported shell variable will simplify the navigation when copying or moving files around:

# export GRAPHITE=/opt/graphite

Note: All package requirements for Graphite and Graphite-Web are already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Installation via PyPI

After all requirements are fulfilled, the installation of the Graphite components via PyPI is pretty simple:

# pip install carbon==1.1.3
# pip install whisper==1.1.3
# pip install graphite-web==1.1.3

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Python Packages

Due to a bug in Carbon and Graphite-Web >= 1.0.0, Python package metadata is not stored in the correct location, so we create symlinks as a workaround:

# ln -s $GRAPHITE/lib/carbon-1.1.3-py2.7.egg-info/ \
/usr/lib/python2.7/site-packages/
# ln -s /opt/graphite/webapp/graphite_web-1.1.3\
-py2.7.egg-info/ /usr/lib/python2.7/site-packages/

Finally pip should list the installed Graphite packages:

# pip list
...
carbon (1.1.3)
graphite-web (1.1.3)
whisper (1.1.3)

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Cache Configuration

All variations of the Carbon daemon (Cache, Aggregator and Relay) require some basic configuration. The PyPI packages provide at least one example for each configuration file. To be able to start Carbon Cache we need to copy at least two config files.

# cp $GRAPHITE/conf/carbon.conf.example \
$GRAPHITE/conf/carbon.conf

carbon.conf includes the basic configuration for all Carbon daemons (Cache, Aggregator and Relay). Options like the IPs and ports to bind to and some other settings are located here.

# cp $GRAPHITE/conf/storage-schemas.conf.example \
$GRAPHITE/conf/storage-schemas.conf

storage-schemas.conf includes configuration about the storage retention of metrics. More details about this will follow.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite Tags

Since 1.1.x Graphite supports storing data using tags to identify each series. This tag information is stored in the tag database (TagDB). Tag support is enabled by default and uses the Graphite-Web database backend, but it can also be configured to use an external Redis server or a custom plugin.

We disable tag support right now by changing ENABLE_TAGS to False.

File: /opt/graphite/conf/carbon.conf

ENABLE_TAGS = False

Note: When using tag support it is mandatory to set the correct GRAPHITE_URL for Graphite-Web. It is also recommended not to use the default SQLite backend of the Webapp.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Cache Daemon

It's time to start the Carbon Cache daemon for the first time:

# $GRAPHITE/bin/carbon-cache.py status
carbon-cache (instance a) is not running

# $GRAPHITE/bin/carbon-cache.py start
Starting carbon-cache (instance a)

# $GRAPHITE/bin/carbon-cache.py status
carbon-cache (instance a) is running with pid 2344

Note: With this method Carbon Cache daemon will not be started after a reboot of the system.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Cache Service Unit

File: /etc/systemd/system/carbon-cache-a.service

[Unit]
Description=Graphite Carbon Cache (instance a)
After=network.target

[Service]
Type=forking
StandardOutput=syslog
StandardError=syslog
ExecStart=/opt/graphite/bin/carbon-cache.py \
  --instance=a \
  --config=/opt/graphite/conf/carbon.conf \
  --pidfile=/var/run/carbon-cache-a.pid \
  --logdir=/var/log/carbon/ start
ExecReload=/bin/kill -USR1 $MAINPID
PIDFile=/var/run/carbon-cache-a.pid

[Install]
WantedBy=multi-user.target

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Start Carbon Cache Daemon

Start Carbon Cache daemon with systemd:

# $GRAPHITE/bin/carbon-cache.py stop

# systemctl daemon-reload
# systemctl start carbon-cache-a.service
# systemctl enable carbon-cache-a.service

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite-Web Configuration

Graphite-Web needs some more configuration. In addition to a working Apache 2 virtual host, the WSGI script and a basic configuration are required.

# cp $GRAPHITE/examples/example-graphite-vhost.conf \
/etc/httpd/conf.d/graphite-web.conf

# cp $GRAPHITE/conf/graphite.wsgi.example \
$GRAPHITE/conf/graphite.wsgi

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite-Web Configuration

local_settings.py includes all configuration for the webapp. Here you can enable a memcache daemon and configure multiple Carbon Cache backends. For now we only need to set a secret key, which is needed for the database initialization, and adjust the time zone.

# cp $GRAPHITE/webapp/graphite/local_settings.py.example \
$GRAPHITE/webapp/graphite/local_settings.py

File: /opt/graphite/webapp/graphite/local_settings.py

SECRET_KEY = 'random-string'

TIME_ZONE = 'Europe/Berlin'

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite-Web MySQL Database

Graphite-Web uses a SQLite database by default, but it can be changed to PostgreSQL, MySQL or Oracle. Here's an example for MySQL:

CREATE DATABASE graphite;

GRANT ALL PRIVILEGES ON graphite.* TO 'graphite'\
@'localhost' IDENTIFIED BY 'graphite';

File: /opt/graphite/webapp/graphite/local_settings.py

DATABASES = {
    'default': {
        'NAME': 'graphite',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'graphite',
        'PASSWORD': 'graphite',
        'HOST': 'localhost',
        'PORT': '3306'
    }
}

Note: Package MySQL-python is already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite-Web Database Initialization

The database needs to be initialized:

# PYTHONPATH=$GRAPHITE/webapp django-admin.py \
migrate --settings=graphite.settings --run-syncdb

After the initialization we should create a user with admin rights. This user can later be used to log in to Graphite-Web and store graphs and dashboards.

# PYTHONPATH=$GRAPHITE/webapp django-admin.py \
createsuperuser --settings=graphite.settings

Username (leave blank to use 'root'):
Email address:
Password: ******
Password (again): ******
Superuser created successfully.

Note: A user's password can be changed with the subcommand changepassword.


Show all Django subcommands:

# PYTHONPATH=$GRAPHITE/webapp django-admin.py help \
--settings=graphite.settings

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite-Web Verification

Django's command-line utility also provides subcommands to check and test the installed components:

# PYTHONPATH=$GRAPHITE/webapp django-admin.py check \
--settings=graphite.settings

# PYTHONPATH=$GRAPHITE/webapp django-admin.py test \
--settings=graphite.settings

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Graphite-Web Permissions

The SQLite database and the webapp logs are located in the storage directory, therefore we change the owner to apache:

# chown apache:root $GRAPHITE/storage
# chown apache:apache $GRAPHITE/storage/graphite.db

It's also important to change the permissions for the log directory, otherwise you will get the error "populate() isn't reentrant" later:

# chown -Rf apache:root $GRAPHITE/storage/log

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Apache Configuration

The default HTTP name for Graphite-Web is graphite, as configured in graphite-web.conf. In order to access Graphite-Web we have to create the static files and allow web access to the directory:

# PYTHONPATH=$GRAPHITE/webapp django-admin.py \
collectstatic --noinput --settings=graphite.settings

File: /etc/httpd/conf.d/graphite-web.conf

<Directory /opt/graphite/static>
    Require all granted
</Directory>

Finally we can restart the pre-installed Apache webserver:

# systemctl restart httpd.service

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Installation Verification

To verify that everything works, open the default URL of Graphite-Web in your browser: http://graphite

However, it's always recommended to have a look at the logs!

  • Graphite-Web: /opt/graphite/storage/log/webapp/
  • Carbon Cache: /var/log/carbon/
  • Apache: /var/log/httpd/
  • Whisper Files: /opt/graphite/storage/whisper/

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Alternative EPEL Package Installation (1/2)

Graphite 0.9.16 on CentOS 7:

# yum -y install python-carbon python-whisper
# yum -y install graphite-web

# systemctl enable carbon-cache.service --now

File: /etc/graphite-web/local_settings.py

SECRET_KEY = 'random-string'

TIME_ZONE = 'Europe/Berlin'

# python /usr/lib/python2.7/site-packages/\
graphite/manage.py syncdb

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Alternative EPEL Package Installation (2/2)

File: /etc/httpd/conf.d/graphite-web.conf

<Directory "/usr/share/graphite">
  <IfModule mod_authz_core.c>
    # Apache 2.4
    # Require local
    Require all granted
  </IfModule>
  ...
</Directory>

# chown apache:apache /var/lib/graphite-web/graphite.db 
# systemctl enable httpd.service --now

Graphite-Web: http://graphite-web

Config Files: /etc/carbon/ and /etc/graphite-web/

Log Files: /var/log/carbon/ and /var/log/httpd/

Whisper Files: /var/lib/carbon/whisper/

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.5 Data Flow

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Receive Data

To receive metrics, Graphite provides two interfaces by default. On port 2003 Carbon listens with a plain text protocol, on port 2004 with the so-called "Pickle protocol".

While the plain text protocol is pretty simple ("<metric.path> <value> <timestamp>"), the Pickle protocol is more complex and looks more like a multidimensional array. The advantage of the plain text protocol is its simplicity; the Pickle protocol in turn is more efficient, and multiple metrics can be transferred in bulk.

# echo "localhost.tmp.files `ls /tmp | wc -l` `date +%s`"
localhost.tmp.files 9 1522237082

# echo "localhost.tmp.files `ls /tmp | wc -l` `date +%s`" \
| nc localhost 2003

Tags must be appended to the metric path with semicolons: ";<tag-key>=<tag-value>"

# echo "localhost.tmp.files;os=linux;dist=centos \
`ls /tmp | wc -l` `date +%s`"
localhost.tmp.files;os=linux;dist=centos 9 1522237082
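The Pickle protocol can be sketched in Python: a pickled list of (path, (timestamp, value)) tuples, prefixed with a 4-byte big-endian length header, sent to port 2004. This is a minimal sketch; host and port are the defaults from above, and sending is only possible when a Carbon daemon is actually listening.

```python
import pickle
import socket
import struct
import time

def make_pickle_payload(metrics):
    """Encode [(path, (timestamp, value)), ...] for Carbon's pickle port.

    The payload is the pickled list prefixed with a 4-byte big-endian
    length header, as expected on port 2004."""
    data = pickle.dumps(metrics, protocol=2)
    return struct.pack("!L", len(data)) + data

def send_bulk(metrics, host="localhost", port=2004):
    # Opens a TCP connection to the Carbon pickle listener and sends
    # all metrics in one bulk -- more efficient than plaintext.
    payload = make_pickle_payload(metrics)
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)

# Build a bulk payload with two metrics sharing one timestamp
now = int(time.time())
metrics = [
    ("localhost.tmp.files", (now, 9)),
    ("localhost.load.shortterm", (now, 0.42)),
]
payload = make_pickle_payload(metrics)
```

Compared to one `nc` call per metric on port 2003, a single pickle payload carries arbitrarily many datapoints in one TCP round trip.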

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Cache Datapoint Flow

  • Datapoint arrives at Carbon Cache
  • A queue is created according to its metric path
    • If the queue already exists, datapoint is added to it
    • One queue represents one metric path, hence one Whisper file
  • A writer thread periodically writes down all the datapoints
    • There are 4 different algorithms for writing down datapoints
    • All datapoints of one queue are written at once to the corresponding Whisper file: update_many()
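The queue-per-metric flow above can be modeled as a toy in-memory cache. This is an illustrative sketch, not Carbon's actual code: one queue per metric path, drained in bulk like `update_many()` would be called per Whisper file.

```python
from collections import defaultdict

class CarbonCacheSketch:
    """Toy model of the Carbon Cache datapoint flow: one queue per
    metric path, flushed in bulk (one write per Whisper file)."""

    def __init__(self):
        self.queues = defaultdict(list)   # metric path -> datapoints
        self.flushed = []                 # records (path, points) writes

    def receive(self, path, timestamp, value):
        # A queue is created on first sight of a metric path;
        # later datapoints are appended to the existing queue.
        self.queues[path].append((timestamp, value))

    def flush(self):
        # The writer thread drains each queue and writes all of its
        # datapoints at once to the corresponding Whisper file.
        for path, points in list(self.queues.items()):
            self.flushed.append((path, points))
        self.queues.clear()

cache = CarbonCacheSketch()
cache.receive("load.shortterm", 1522237082, 0.42)
cache.receive("load.shortterm", 1522237092, 0.44)
cache.flush()  # one bulk write covering both datapoints
```

The real writer thread additionally chooses *which* queue to drain next, using one of the strategies described on the following slides.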

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Cache Write Algorithms (1/2)

The thread that writes metrics to disk can use one of the following strategies to determine the order in which metrics are removed from the cache and flushed to disk. This setting can be adjusted in carbon.conf with CACHE_WRITE_STRATEGY in the [cache] section.

Algorithm Description
sorted (default) All metrics in the cache are counted and sorted into an ordered list according to the number of datapoints in the cache at the moment of the list's creation. Metrics are then flushed from the cache to disk in that order.
timesorted All metrics in the list are sorted according to the timestamp of their datapoints. The metrics that were least recently written are written first. This is a hybrid strategy between max and sorted which is particularly suited to sets of metrics with non-uniform resolutions.
max The writer thread always pops and flushes the metric from the cache that has the most datapoints. This gives a strong flush preference to frequently updated metrics and also reduces random file I/O. Infrequently updated metrics may only ever be persisted to disk at daemon shutdown if there are a large number of metrics which receive very frequent updates OR if disk I/O is very slow.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Cache Write Algorithms (2/2)

Algorithm Description
naive Metrics are flushed from the cache to disk in an unordered fashion. This strategy may be desirable in situations where the storage for Whisper files is solid state, CPU resources are very limited or deference to the OS's I/O scheduler is expected to compensate for the random write pattern.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Internal Statistics (1/2)

Metric Meaning
activeConnections Number of active connections. (>=1.0.0)
avgUpdateTime The average amount of time spent per Whisper update operation.
blacklistMatches Number of blacklist matches.
cache.bulk_queries Number of bulk queries to the carbon-cache instance.
cache.overflow Number of datapoints received while the cache was full.
cache.queries Number of all cache queries received by the cache from the webapp.
cache.queues Number of metric keys (metric name) in the cache at the time of recording.
cache.size Number of metric datapoints in the cache at the time of recording.
committedPoints Number of metric datapoints flushed to disk.
cpuUsage CPU usage of the carbon-cache instance.
creates Number of Whisper files successfully created.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Internal Statistics (2/2)

Metric Meaning
droppedCreates Number of failed Whisper create operations. (>=1.0.0)
errors Number of failed Whisper update operations.
memUsage Memory usage of the carbon-cache daemon.
metricsReceived Number of datapoints received by the carbon-cache listener.
pointsPerUpdate Average number of datapoints written per Whisper update operation. The higher the value, the more effective the cache is performing at batch writes (fewer I/O operations).
updateOperations Number of successful Whisper update operations.
whitelistRejects Number of whitelist rejects.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.6 Data Storage

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Whisper

  • File based time-series database
  • Fixed size (like RRD)
  • Values paired with timestamps
  • Rollup aggregation
  • Multiple archives

Carbon Cache uses Whisper to store the received datapoints. Each datapoint gets paired with its linked timestamp. Data can be stored into multiple archives, where one archive describes the precision and retention of this data.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Whisper - Anatomy

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Whisper vs. RRD

Unlike RRD, Whisper supports updates to time slots with timestamps prior to its most recent one. With RRD, by contrast, there is no way to back-fill data properly.

Whisper is slower than RRD, but fast enough for most purposes. This is a consequence of Whisper being written in Python, while RRD is written in C. In theory, depending on the operation, it is 2 to 5 times slower.

In practice the difference is measured in hundreds of microseconds, which amounts to less than a millisecond for most operations. Either way, storing time-series data always causes high I/O on your disks. Using Carbon Relay you can distribute this load across multiple servers.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Metric Path

The metric path determines the hierarchy in which data is held and can be seen as the address of your data. Each element of the metric path also describes a directory on the filesystem where Whisper files are stored; the last element is the filename.

berlin.dc1.r12.server1.load.longterm
├── berlin
│   └── dc1
│       └── r12
│           ├── server1
│           │   └── load
│           │       ├── longterm.wsp
│           │       ├── midterm.wsp
│           │       └── shortterm.wsp
│           ├── server2

With Graphite-Web metrics can be accessed by using globs (wildcards or character lists) in the metric path. Graphite-Web will then return the datapoints of all matching metrics.

berlin.*.r12.server1.load.longterm
berlin.dc1.r12.server{1,2,3}.load.longterm
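The glob forms above can be illustrated with a small translator from metric globs to regular expressions. This is a toy covering only the two forms shown ('*' within one path element and '{a,b,c}' lists); Graphite-Web's actual glob handling supports more.

```python
import re

def glob_to_regex(pattern):
    """Translate a Graphite-style metric glob into a regular expression.

    Illustrative sketch: '*' matches any characters within one path
    element, '{a,b,c}' matches one entry of the list."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append("[^.]*")           # stay within one path element
        elif ch == "{":
            end = pattern.index("}", i)   # collect the character list
            choices = pattern[i + 1:end].split(",")
            out.append("(" + "|".join(map(re.escape, choices)) + ")")
            i = end
        else:
            out.append(re.escape(ch))     # '.' and text match literally
        i += 1
    return "^" + "".join(out) + "$"

paths = ["berlin.dc1.r12.server1.load.longterm",
         "berlin.dc1.r12.server2.load.longterm"]
rx = re.compile(glob_to_regex("berlin.*.r12.server{1,2,3}.load.longterm"))
matching = [p for p in paths if rx.match(p)]  # both paths match
```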

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Storage Schema

storage-schemas.conf stores the configuration about the retention and frequency in which datapoints should be stored. The config file includes multiple sections which are applied from top to bottom. Based on patterns it matches metric paths and tells Whisper how to store the data. The first pattern that matches is applied to the metric path; further sections are ignored.

  • Config file is read every 60 seconds, no need for reload
  • Patterns are regex
  • First pattern that matches is used
  • Patterns from top to bottom
  • Whisper file is created on first metric received

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Storage Schema Configuration

Each section has 3 lines:

  • name
  • pattern
  • frequency/retention (multiple)

Example storage schemas:

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[default]
pattern = .*
retentions = 1s:30m,1m:1d,5m:7d

By aggregating data you can save I/O on your disks. Already created Whisper files will not be affected by configuration changes!
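A retentions line like the one above can be parsed into (seconds per point, retention seconds) pairs. This is a sketch of what Whisper does when a file is created; the unit table is illustrative.

```python
# Time units accepted in retention specs (illustrative subset)
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def parse_time(token):
    """'30m' -> 1800 seconds; a bare number is taken as seconds."""
    if token[-1].isdigit():
        return int(token)
    return int(token[:-1]) * UNITS[token[-1]]

def parse_retentions(spec):
    """Parse a retentions line into (seconds_per_point, retention_seconds)
    pairs, e.g. '1s:30m' -> (1, 1800)."""
    pairs = []
    for archive in spec.split(","):
        precision, retention = archive.split(":")
        pairs.append((parse_time(precision), parse_time(retention)))
    return pairs

archives = parse_retentions("1s:30m,1m:1d,5m:7d")
# -> [(1, 1800), (60, 86400), (300, 604800)]
```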

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Storage Aggregation

When downsampling data, Whisper will average over a set of datapoints. This behaviour can be changed in storage-aggregation.conf. Like in other configuration files, the entries are scanned from top to bottom and the first match applies to the Whisper files.

  • Pattern
    • Regex pattern to match metric paths
  • xFilesFactor
    • Ratio of datapoints that are required to do an aggregation to the next archive (float between 0 and 1)
  • aggregationMethod
    • average, sum, min, max, or last

Default aggregation entry:

[default_average]
pattern = .*
xFilesFactor = 0.5 # 50 percent
aggregationMethod = average

This configuration doesn't affect the first archive and already created Whisper files will not be affected by configuration changes!
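The xFilesFactor and aggregationMethod behaviour can be sketched as a small function: downsample one set of higher-precision datapoints into a single value for the next archive, or return nothing when too few points are known. This is an illustration of the rule, not Whisper's actual implementation.

```python
def aggregate(points, xfiles_factor=0.5, method="average"):
    """Downsample one slot's datapoints for the next archive.

    Returns None when the ratio of known (non-None) datapoints is
    below xfiles_factor, i.e. no value is written downstream."""
    known = [p for p in points if p is not None]
    if not points or len(known) / len(points) < xfiles_factor:
        return None  # not enough known datapoints to aggregate
    if method == "average":
        return sum(known) / len(known)
    if method == "sum":
        return sum(known)
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    if method == "last":
        return known[-1]
    raise ValueError("unknown aggregation method: %s" % method)

# 3 of 5 datapoints known -> 60% >= 50%, so the average is written
print(aggregate([1.0, None, 2.0, None, 3.0]))    # 2.0
# only 1 of 5 known -> below the 0.5 factor, nothing is written
print(aggregate([1.0, None, None, None, None]))  # None
```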

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Whisper File Size

Example storage schema:

1s:30m,1m:1d,5m:7d

Convert into seconds:

1s:30m=1s:(30*60)=1s:1800s
1m:1d=60s:(24*60*60)=60s:86400s
5m:7d=(5*60):(7*24*60*60)=300s:604800s

12 bytes for each datapoint and 12 bytes of archive information for every retention period:

1800s/1s*12b+12b=21612b
86400s/60s*12b+12b=17292b
604800s/300s*12b+12b=24204b

21612b+17292b+24204b=63108b

16 bytes of database metadata:

63108b+16b=63124b (≈61.64kb)
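The arithmetic above can be wrapped in a short function, using the same constants: 12 bytes per datapoint, 12 bytes of archive information per retention period and 16 bytes of database metadata.

```python
POINT_SIZE = 12       # bytes per datapoint (timestamp + value)
ARCHIVE_HEADER = 12   # bytes of archive information per retention period
METADATA = 16         # bytes of database metadata per file

def whisper_file_size(archives):
    """Size in bytes of a Whisper file for a list of
    (seconds_per_point, retention_seconds) archives."""
    size = METADATA
    for seconds_per_point, retention in archives:
        points = retention // seconds_per_point
        size += points * POINT_SIZE + ARCHIVE_HEADER
    return size

# 1s:30m,1m:1d,5m:7d as on this slide
size = whisper_file_size([(1, 1800), (60, 86400), (300, 604800)])
print(size)  # 63124
```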

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Whisper File Size Calculator

Online calculator: http://m30m.github.io/whisper-calculator/


There's also a script for the calculation of the Whisper file size available: https://gist.github.com/jjmaestro/5774063

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Whisper Tools (1/2)

Whisper comes with some default tools. With those it is possible to create, edit and view Whisper files.

Tool Description
rrd2whisper.py  Convert an RRD file into a Whisper (.wsp) file
whisper-auto-resize.py Resize archives with default settings (>=1.0.0)
whisper-auto-update.py  Update values (>=1.0.0)
whisper-create.py Create a new Whisper file
whisper-diff.py Differences between two Whisper files
whisper-dump.py Dump raw Whisper data
whisper-fetch.py Dump readable (timestamp + value) data from Whisper files
whisper-fill.py Backfill datapoints from one whisper file into another

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Whisper Tools (2/2)

Tool Description
whisper-info.py Get metadata
whisper-merge.py Merge two Whisper files
whisper-resize.py Resize archives with individual settings
whisper-set-aggregation-method.py Change aggregation method
whisper-set-xfilesfactor.py  Change xFilesFactor (>=1.0.0)
whisper-update.py Update value

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Common Whisper Tools Usage (1/2)

# whisper-fetch.py --pretty metricsReceived.wsp
...
Fri Jun  2 08:53:00 2017    976.000000
Fri Jun  2 08:54:00 2017    1001.000000
Fri Jun  2 08:55:00 2017    1011.000000

# whisper-info.py metricsReceived.wsp 
maxRetention: 7776000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 1555228

Archive 0
retention: 7776000
secondsPerPoint: 60
points: 129600
size: 1555200
offset: 28

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Common Whisper Tools Usage (2/2)

# whisper-resize.py --nobackup metricsReceived.wsp 60:120d
Retrieving all data from the archives
Creating new whisper database: metricsReceived.wsp.tmp
Created: metricsReceived.wsp.tmp (1555228 bytes)
Migrating data without aggregation...
Renaming old database to: metricsReceived.wsp.bak
Renaming new database to: metricsReceived.wsp

# whisper-create.py test.wsp 1s:30m 1m:1d 5m:7d
Created: test.wsp (63124 bytes)

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

White- and Blacklisting

When this feature is enabled, each Carbon daemon will only accept metrics that are whitelisted and reject those which are blacklisted.

Each file, whitelist.conf and blacklist.conf, takes one regular expression per line. If whitelist.conf does not exist, everything is whitelisted by default. The configuration is reloaded automatically; no restart of the daemon is necessary.

To enable the functionality, edit carbon.conf:

USE_WHITELIST = True
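The filtering rule can be sketched in a few lines of Python: a metric is accepted when it matches some whitelist entry (or the whitelist is missing entirely) and matches no blacklist entry. The patterns below are hypothetical examples, not shipped defaults.

```python
import re

def build_filter(whitelist=None, blacklist=None):
    """Sketch of Carbon's white-/blacklisting: one regex per line.

    A missing whitelist means everything is whitelisted; blacklist
    entries always reject."""
    wl = [re.compile(p) for p in (whitelist or [])]
    bl = [re.compile(p) for p in (blacklist or [])]

    def accept(metric):
        if wl and not any(rx.search(metric) for rx in wl):
            return False  # whitelist exists but nothing matched
        return not any(rx.search(metric) for rx in bl)
    return accept

accept = build_filter(whitelist=[r"^collectd\."],
                      blacklist=[r"\.tmp\."])
print(accept("collectd.web01.load.shortterm"))  # True
print(accept("collectd.web01.tmp.files"))       # False (blacklisted)
print(accept("icinga2.web01.load"))             # False (not whitelisted)
```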

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.7 Advanced Components

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Relay

  • Forward incoming metrics to another Carbon daemon
  • Replicate metrics to one or more destinations
  • Sharding
  • Forward based on consistent hash (default) or
  • Rule based routing

Carbon Relay is a kind of "load balancer" for Carbon Cache and/or Aggregator. It can forward metrics based on consistent hashes or with defined regex rules.
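The consistent-hashing idea can be illustrated with a minimal hash ring: each metric path hashes to a position on the ring and is pinned to the next destination clockwise. Note this is a generic sketch; Carbon's real implementation (carbon.hashing) differs in detail, e.g. in how replica points are placed.

```python
import bisect
import hashlib

class HashRingSketch:
    """Minimal consistent-hash ring pinning metric paths to destinations."""

    def __init__(self, destinations, replicas=100):
        self.ring = []  # sorted list of (hash position, destination)
        for dest in destinations:
            for i in range(replicas):
                key = hashlib.md5(("%s:%d" % (dest, i)).encode()).hexdigest()
                self.ring.append((int(key, 16), dest))
        self.ring.sort()

    def get_destination(self, metric):
        # The metric hashes to a position; the next node clockwise wins,
        # wrapping around at the end of the ring.
        pos = int(hashlib.md5(metric.encode()).hexdigest(), 16)
        index = bisect.bisect(self.ring, (pos,)) % len(self.ring)
        return self.ring[index][1]

ring = HashRingSketch(["127.0.0.1:2004:a", "127.0.0.1:2104:b"])
dest = ring.get_destination("berlin.dc1.r12.server1.load.longterm")
# the same metric always maps to the same destination
assert dest == ring.get_destination("berlin.dc1.r12.server1.load.longterm")
```

Adding or removing a destination only remaps the metrics whose ring segment changed, which is why consistent hashing scales better than a plain modulo scheme.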

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Aggregator

  • In front of Carbon Cache
  • Buffers metrics
  • Aggregation of metrics by rules
  • Reduce I/O by aggregating data

Carbon Aggregator sits in front of Carbon Cache and receives metrics. The function of this daemon is to aggregate the data it receives and forward it to Carbon Cache for permanent storage. For instance, it can sum statistics of multiple servers into one metric.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

3.8 Full Single-Node Setup

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Full Single-Node Setup

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Daemon Configuration

All Carbon daemons are configured in carbon.conf where each daemon gets its own section.

File: /opt/graphite/conf/carbon.conf

[cache]
LINE_RECEIVER_PORT = 2023         # default: 2003
PICKLE_RECEIVER_PORT = 2024       # default: 2004
CACHE_QUERY_PORT = 7002

[relay]
LINE_RECEIVER_PORT = 2003         # default: 2013
PICKLE_RECEIVER_PORT = 2004       # default: 2014

RELAY_METHOD = consistent-hashing # default: rules
DESTINATIONS = 127.0.0.1:2014     # default: :2004

[aggregator]
LINE_RECEIVER_PORT = 2013         # default: 2023
PICKLE_RECEIVER_PORT = 2014       # default: 2024

DESTINATIONS = 127.0.0.1:2024     # default: :2004

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Relay Configuration

Prepare relay-rules.conf:

# cp $GRAPHITE/conf/relay-rules.conf.example \
$GRAPHITE/conf/relay-rules.conf

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Relay Methods

The RELAY_METHOD defines how metrics are distributed:

  • consistent-hashing
    • Even distribution of metrics between destinations
  • aggregated-consistent-hashing
    • Send to a group of Carbon Aggregators
  • rules
    • Route metrics to destinations based on pattern rules
    • Requires relay-rules.conf

Example rules in /opt/graphite/conf/relay-rules.conf with RELAY_METHOD = rules:

[collectd_dc1]
pattern = ^collectd\.dc1\.
destinations = 127.0.0.1:2104:a

[collectd_dc2]
pattern = ^collectd\.dc2\.
destinations = 127.0.0.1:2204:b
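Rule-based routing boils down to: walk the rules in order and take the destinations of the first matching pattern. A sketch with the two rules above (a real relay-rules.conf additionally needs a default rule to catch everything else):

```python
import re

# The two relay-rules.conf sections above as (pattern, destinations) pairs
RULES = [
    (re.compile(r"^collectd\.dc1\."), ["127.0.0.1:2104:a"]),
    (re.compile(r"^collectd\.dc2\."), ["127.0.0.1:2204:b"]),
]

def route(metric, rules=RULES):
    """Return the destinations of the first rule whose pattern matches."""
    for pattern, destinations in rules:
        if pattern.search(metric):
            return destinations
    return []  # no rule matched (a default rule would catch this)

print(route("collectd.dc1.web01.load.shortterm"))  # ['127.0.0.1:2104:a']
print(route("collectd.dc2.db01.memory.used"))      # ['127.0.0.1:2204:b']
```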

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Relay Service Unit

File: /etc/systemd/system/carbon-relay.service

[Unit]
Description=Graphite Carbon Relay
After=network.target

[Service]
Type=forking
StandardOutput=syslog
StandardError=syslog
ExecStart=/opt/graphite/bin/carbon-relay.py \
  --instance=a \
  --config=/opt/graphite/conf/carbon.conf \
  --pidfile=/var/run/carbon-relay.pid \
  --logdir=/var/log/carbon/ start
ExecReload=/bin/kill -USR1 $MAINPID
PIDFile=/var/run/carbon-relay.pid

[Install]
WantedBy=multi-user.target

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Start Carbon Relay Daemon

Start Carbon Relay daemon with systemd:

# systemctl restart carbon-cache-a.service

# systemctl daemon-reload
# systemctl start carbon-relay.service
# systemctl enable carbon-relay.service

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Aggregator Configuration

Prepare aggregation-rules.conf:

# cp $GRAPHITE/conf/aggregation-rules.conf.example \
$GRAPHITE/conf/aggregation-rules.conf

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Aggregation Rules

When using Carbon Aggregator, the most important configuration is made in aggregation-rules.conf. The file accepts input patterns for metrics and is able to merge multiple incoming metrics into one final metric, which is then written to a single Whisper file.

Each line of the configuration should look like this:

output_template (frequency) = method input_pattern

This will capture any metric that matches input_pattern. Every frequency seconds it will calculate the destination metric using the specified method, which can be sum or avg. The config also accepts placeholders.

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Aggregation Rules Example

Here is an example for an Apache environment. The goal is to track the requests from all application servers and store the sum in a single metric:

<env>.applications.<app>.all.requests (60) 
  = sum <env>.applications.<app>.*.requests

The result is that metrics matching the pattern get summed every 60 seconds and written to one single destination:

# input
prod.applications.apache.www01.requests
prod.applications.apache.www02.requests
prod.applications.apache.www03.requests

# output
prod.applications.apache.all.requests
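The effect of that rule over one 60-second interval can be simulated in a few lines. The regex below is a hand-written stand-in for the `<env>`/`<app>` placeholders; it is not how carbon-aggregator compiles rules internally.

```python
import re
from collections import defaultdict

# Stand-in for: <env>.applications.<app>.all.requests (60)
#                 = sum <env>.applications.<app>.*.requests
PATTERN = re.compile(
    r"^(?P<env>[^.]+)\.applications\.(?P<app>[^.]+)\.[^.]+\.requests$")

def aggregate_interval(datapoints):
    """Sum the datapoints of one 60s interval into the '...all.requests'
    output metric, per environment and application."""
    buckets = defaultdict(float)
    for metric, value in datapoints:
        match = PATTERN.match(metric)
        if match:
            output = "%s.applications.%s.all.requests" % (
                match.group("env"), match.group("app"))
            buckets[output] += value
    return dict(buckets)

received = [
    ("prod.applications.apache.www01.requests", 120),
    ("prod.applications.apache.www02.requests", 80),
    ("prod.applications.apache.www03.requests", 100),
]
print(aggregate_interval(received))
# {'prod.applications.apache.all.requests': 300.0}
```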

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Carbon Aggregator Service Unit

File: /etc/systemd/system/carbon-aggregator.service

[Unit]
Description=Graphite Carbon Aggregator
After=network.target

[Service]
Type=forking
StandardOutput=syslog
StandardError=syslog
ExecStart=/opt/graphite/bin/carbon-aggregator.py \
  --instance=a \
  --config=/opt/graphite/conf/carbon.conf \
  --pidfile=/var/run/carbon-aggregator.pid \
  --logdir=/var/log/carbon/ start
ExecReload=/bin/kill -USR1 $MAINPID
PIDFile=/var/run/carbon-aggregator.pid

[Install]
WantedBy=multi-user.target

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Start Carbon Aggregator Daemon

Start Carbon Aggregator daemon with systemd:

# systemctl daemon-reload
# systemctl start carbon-aggregator.service
# systemctl enable carbon-aggregator.service

~~~SECTION:MAJOR~~~ Graphite

NETWAYS

Rewrite Rules

With rewrite rules, metric paths can be rewritten before Whisper files are created. This is handy when your collector sends metric paths which are not accurate. The configuration for rewrite rules takes place in rewrite-rules.conf. This functionality is currently only available for Carbon Aggregator.

A rewrite rule looks as follows:

regex-pattern = replacement-text

These are some rewrite rules commonly used when collectd sends metrics:

[pre]
\.load\.load\. = .load.
\.memory\.memory\. = .memory.
\.mysql\.stats\. = .mysql.
_TCP80 = .TCP80

Rewrite rules are grouped into two sections, [pre] and [post]. Rules in the [pre] section are applied before aggregation, rules in the [post] section after aggregation has taken place.
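Each rule is simply a regular expression and a replacement applied to the metric path. The [pre] rules above can be reproduced with Python's re module to check what a given path will be rewritten to (a sketch, not Carbon's actual implementation):

```python
import re

# The [pre] rules from above, in file order: regex pattern -> replacement.
PRE_RULES = [
    (r"\.load\.load\.", ".load."),
    (r"\.memory\.memory\.", ".memory."),
    (r"\.mysql\.stats\.", ".mysql."),
    (r"_TCP80", ".TCP80"),
]

def rewrite(path, rules=PRE_RULES):
    """Apply every rewrite rule in order to a metric path."""
    for pattern, replacement in rules:
        path = re.sub(pattern, replacement, path)
    return path

rewritten = rewrite("collectd.graphing1.load.load.shortterm")
```

This is handy for verifying a new rule against sample paths before restarting the aggregator.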

~~~SECTION:MAJOR~~~ collectd

NETWAYS

4 collectd

~~~SECTION:MAJOR~~~ collectd

NETWAYS

collectd

  • Collects data
  • Variety of ways to store data
    • RRD
    • OpenTSDB
    • HTTP
    • Redis
    • Graphite (InfluxDB)
    • ...
  • Wide choice of plugins
    • CPU, Memory, Load, Network, ...
    • MySQL, PostgreSQL, Apache, Bind, ...
    • cURL, Exec, Perl, Python, Java, ...

~~~SECTION:MAJOR~~~ collectd

NETWAYS

collectd

collectd is a daemon that runs on your server and periodically collects performance data about several parts of the system. One of its many plugins is "write_graphite", which lets you send the collected data to a Graphite server.

collectd provides a lot of plugins by default. Each plugin serves a specific set of data and most can be configured to fit your needs. Some plugins differ from others: for example, there are plugins whose only job is to forward the statistics to receivers like Graphite, and there are plugins which enable you to write your own plugins in languages like Perl, Python or Java.

With the included SNMP support, collectd is not limited to the host it's running on. You can get performance counters of the network activity from switches, routers or other devices that are capable of SNMP.

~~~SECTION:MAJOR~~~ collectd

NETWAYS

collectd Installation

collectd can be installed via packages. Take care to install a version >= 5.0, because the Graphite plugin in earlier versions had to be configured differently than described here.

# yum -y install collectd

After the installation you should edit /etc/collectd.conf. It should include the following content:

Hostname            "graphing1" # default: FQDN
FQDNLookup          false       # default: true
Interval            10
MaxReadInterval     86400
Timeout             2
ReadThreads         5
WriteThreads        5
Include             "/etc/collectd.d"

Start collectd with systemd:

# systemctl enable collectd.service --now

Note: collectd is already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ collectd

NETWAYS

collectd Plugins

Plugins are enabled with the directive LoadPlugin <plugin>. You can find a list of all collectd plugins at https://collectd.org/wiki/index.php/Table_of_Plugins. This page also includes the documentation of each plugin.

By creating config files inside /etc/collectd.d/ the configuration of plugins becomes much tidier than configuring them all in one single file.

~~~SECTION:MAJOR~~~ collectd

NETWAYS

collectd DF Plugin

This example shows the content of the "DF" plugin, all other plugins are configured in the same manner.

File: /etc/collectd.d/df.conf

LoadPlugin df

<Plugin df>
  FSType "rootfs"
  FSType "xfs"
  IgnoreSelected false
  ReportByDevice false
  ReportInodes true
  ValuesAbsolute true
  ValuesPercentage false
</Plugin>

The configuration of plugins is in many cases optional. In some cases it's sufficient to just load the plugin without any configuration. Examples for this are "CPU", "Load" or "Memory".

~~~SECTION:MAJOR~~~ collectd

NETWAYS

Storage Schema for collectd

To store the data correctly, you need to configure a proper Carbon storage schema for data coming from collectd. The precision configured here must match the Interval setting in collectd.conf.

File: /opt/graphite/conf/storage-schemas.conf

[...]

[collectd]
pattern = ^collectd\.
retentions = 10s:5d

[...]

When LOG_CREATES in carbon.conf is set to True, you can follow the logging of Carbon Cache in /var/log/carbon/creates.log to track if your Whisper files are being generated with the correct storage schema.

new metric collectd.graphing1.interface-enp0s3.\
  if_dropped.tx matched schema collectd
new metric collectd.graphing1.interface-enp0s3.\
  if_dropped.tx matched schema default
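Carbon walks storage-schemas.conf from top to bottom and uses the first pattern that matches the metric path. That lookup can be sketched in Python; the "default" entry and its retention below are assumptions standing in for whatever catch-all rule your own file ends with:

```python
import re

# Schema entries in file order; Carbon applies the first matching pattern.
SCHEMAS = [
    ("collectd", r"^collectd\.", "10s:5d"),
    ("default", r".*", "60s:1d"),  # assumed catch-all for this sketch
]

def match_schema(metric):
    """Return (name, retentions) of the first schema matching the path."""
    for name, pattern, retentions in SCHEMAS:
        if re.search(pattern, metric):
            return name, retentions
    return None

schema = match_schema("collectd.graphing1.interface-enp0s3.if_dropped.tx")
```

Because the first match wins, specific patterns must be listed above the catch-all.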

~~~SECTION:MAJOR~~~ collectd

NETWAYS

collectd Graphite Plugin

Sending the collected data to Graphite requires nothing more than enabling and configuring the appropriate plugin.

File: /etc/collectd.d/write_graphite.conf

LoadPlugin write_graphite

<Plugin write_graphite>
  <Node "graphing1">
    Host "localhost"
    Port "2003"
    Protocol "tcp"
    ReconnectInterval 0
    LogSendErrors true
    Prefix "collectd."
    Postfix ""
    StoreRates true
    AlwaysAppendDS false
    EscapeCharacter "_"
    SeparateInstances false
    PreserveSeparator false
    DropDuplicateFields false
  </Node>
</Plugin>

~~~SECTION:MAJOR~~~ collectd

NETWAYS

Restart collectd Daemon

After configuration changes collectd needs to be restarted:

# systemctl restart collectd.service

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

5 Webapp

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Graphite-Web

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Graphite-Web

Graphite-Web is the visualization component of the Graphite stack. It consists mainly of 3 parts:

Composer

The composer is the first thing you see when opening Graphite's web application. It enables you to browse through your metrics and display them. Multiple graph options change the look and feel of graphs. For permanent usage, templates can be stored (graphTemplates.conf) and applied to graphs. With plenty of functions you can combine, transform, calculate or filter datapoints. This functionality, which is also available via the API, makes Graphite-Web one of the most powerful webinterfaces for graph visualization.

Dashboard

The dashboard brings the same functionality as the composer, but combines multiple graphs into one overview. When logged in, users can save dashboards for later usage. The behaviour of the dashboard can be adjusted in dashboard.conf.

API

Using the API you can retrieve data in different formats. This can be used for integration into third party tools or for monitoring purposes. The API provides almost the same functionality as the composer.

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Indexing

For faster access to metrics, Graphite supports an index. This index is actually just a text file that lists all available metrics. It can be regenerated periodically by a cronjob.

Content of /opt/graphite/storage/index:

...
collectd.graphing1.cpu-0.cpu-wait
collectd.graphing1.cpu-0.cpu-user
collectd.graphing1.interface-lo.if_packets.rx
collectd.graphing1.interface-lo.if_packets.tx
collectd.graphing1.interface-lo.if_octets.tx
collectd.graphing1.interface-lo.if_octets.rx
...

An optional cronjob that creates the index each hour may look like this:

0 * * * * /opt/graphite/bin/build-index.sh > /dev/null

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

5.1 Graphite-Web API

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

HTTP API

The interactive webinterface of Graphite-Web is called Composer. Besides this, Graphite-Web also has an HTTP API which consists of two parts:

  • Render API
  • Metrics API

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Render API

The Render API can be used to retrieve and visualize datapoints, for example to embed graphs in your own applications or webinterfaces. Most commonly the API is used by 3rd party dashboard tools like Grafana.

  • Different formats:
    • png, raw, csv, json, svg, pdf, dygraph, rickshaw and pickle
  • Paths with Wildcards
  • Relative or absolute time periods
  • Template function with variables
  • Almost full functionality of Composer

Parameters: http://graphite.readthedocs.org/en/latest/render_api.html#graph-parameters

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Render API - Sample 1

Create a simple graph of the load of one of your servers in the last hour. Output as PNG with the resolution of 800x600.

http://graphite/render
?target=collectd.graphing1.load.load.*
&from=-1h
&width=800
&height=600

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Render API - Sample 2

Create JSON output of the free disk space of one of your servers. Output only the last 5 minutes.

http://graphite/render
?target=collectd.graphing1.df-root.df_complex-free
&from=-5min
&format=json
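A render URL like this can be assembled programmatically before fetching it. A small sketch with the standard library; the base URL "http://graphite" is the placeholder host from the samples above, not a real endpoint:

```python
from urllib.parse import urlencode

BASE = "http://graphite/render"  # placeholder host from the samples above

def render_url(target, params):
    """Build a Render API URL from a target and extra graph parameters."""
    query = urlencode([("target", target)] + list(params.items()))
    return f"{BASE}?{query}"

url = render_url("collectd.graphing1.df-root.df_complex-free",
                 {"from": "-5min", "format": "json"})
```

The resulting URL can then be fetched with urllib.request.urlopen() and the JSON body parsed with the json module.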

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Render API - Sample 3

Build an average of the CPU system time of two of your servers. Add a title to the graph and change the background color. Set a minimum of 0 and create an alias for the legend called "CPU system".

http://graphite/render
?target=
  alias
    (
      averageSeries
      (
        collectd.graphing[1-2].cpu-*.cpu-system
      ), 
    "CPU system"
    )
&title=CPU of all servers
&bgcolor=red
&yMin=0

~~~SECTION:MAJOR~~~ Webapp

NETWAYS

Metrics API

Graphite-Web supports some functionality to browse through metrics.

Function Description
/metrics/index.json Walks the metrics tree and returns every metric found as a sorted JSON array.
/metrics/find?query=a.b.c.d  Finds metrics under a given path.
/metrics/expand?query=a.b.c.d  Expands the given query with matching paths.

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

6 Grafana

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

Overview of Grafana

Grafana is an Open Source webinterface that lets you visualize data from a lot of different data sources. Currently officially supported data sources are Graphite, Elasticsearch, CloudWatch, InfluxDB, OpenTSDB, Prometheus, MySQL and Postgres.

Since version 2.0 Grafana ships with its own backend server, and since 3.0 additional data sources can be installed as plugins and mixed in the same chart. A core feature introduced with Grafana 4.0 was alerting.

  • Visualize graphs
  • Create dashboards
  • Share
  • Annotations
  • Templates
  • Multiple backends
  • Alerts

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

Grafana 5

Grafana v5.0 was released on March 1st 2018 and includes a lot of new features:

  • New dashboard layout engine
  • UI improvements in look and function
  • New light theme
  • Dashboard folders
  • Permissions
  • Group users into teams and use them for permissions
  • Setup data sources and dashboards via config files
  • Persistent dashboard URLs
  • Graphite Tags
  • Integrated function docs

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

Grafana Installation

On RPM-based Linux systems like CentOS, Grafana can be installed from source, via a YUM repository or by installing the RPM package directly.

File: /etc/yum.repos.d/grafana.repo

[grafana]
name=grafana
baseurl=https://packagecloud.io/grafana/stable/el/7/\
  $basearch
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packagecloud.io/gpg.key \
  https://grafanarel.s3.amazonaws.com/RPM-GPG-KEY-grafana
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Install Grafana:

# yum -y install grafana

Note: Grafana is already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

Grafana Server

Start and enable the Grafana server:

# systemctl start grafana-server.service
# systemctl enable grafana-server.service

After that Grafana should be accessible via HTTP on port 3000 with the administrative user admin and password admin.

Basic settings like the database, authentication or logging can be changed in /etc/grafana/grafana.ini. Grafana is even capable of sending its internal metrics to Graphite.

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

Grafana Graphite Data Source

The first step after installation is to add the Graphite data source:

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

Grafana Dashboards

You can either create dashboards by yourself or import them from https://grafana.com/dashboards.
A good start with Graphite as backend is a dashboard called "Graphite Server (Carbon Metrics)" with ID "947".

~~~SECTION:MAJOR~~~ Grafana

NETWAYS

Custom Grafana Dashboards

To build your own dashboards with Grafana you should start with the "Getting started" chapter of the official documentation, available at http://docs.grafana.org/guides/getting_started/.

There's also a 10-minute beginner's guide video from the main developer of Grafana, Torkel Ödegaard, as well as many other resources.

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

7 Graphite Cluster

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

7.1 Single-Node Cluster

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Graphite Single-Node Cluster

In this scenario Carbon Relay distributes metrics over 2 Carbon Caches. Since all Carbon daemons communicate over the network, they can also be located on different hosts.

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Multiple Carbon Caches

All Carbon daemons are configured in carbon.conf where each daemon gets its own section.

Configure multiple caches in /opt/graphite/conf/carbon.conf:

[cache]
LINE_RECEIVER_PORT = 2023 
PICKLE_RECEIVER_PORT = 2024
CACHE_QUERY_PORT = 7002

[cache:b]
LINE_RECEIVER_PORT = 2123   # default: 2103
PICKLE_RECEIVER_PORT = 2124 # default: 2104
CACHE_QUERY_PORT = 7102

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Carbon Cache Service Unit for additional instance

File: /etc/systemd/system/carbon-cache-b.service

[Unit]
Description=Graphite Carbon Cache (instance b)
After=network.target

[Service]
Type=forking
StandardOutput=syslog
StandardError=syslog
ExecStart=/opt/graphite/bin/carbon-cache.py \
  --instance=b \
  --config=/opt/graphite/conf/carbon.conf \
  --pidfile=/var/run/carbon-cache-b.pid \
  --logdir=/var/log/carbon/ start
ExecReload=/bin/kill -USR1 $MAINPID
PIDFile=/var/run/carbon-cache-b.pid

[Install]
WantedBy=multi-user.target

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Start Carbon Cache Daemon for additional instance

Start Carbon Cache daemon for instance "b" with systemd:

# systemctl daemon-reload
# systemctl start carbon-cache-b.service
# systemctl enable carbon-cache-b.service

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Carbon Relay with Multiple Caches

Add Carbon Cache instances to Relay configuration in carbon.conf:

[relay]
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004

RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1

DESTINATIONS = 127.0.0.1:2024:a, 127.0.0.1:2124:b
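With consistent-hashing, the relay hashes each metric path so that the same path always lands on the same cache. The sketch below illustrates the idea with a plain modulo pick over an md5 digest; Carbon's real implementation builds an md5-based hash ring with many replica points per destination, but the observable property is the same:

```python
import hashlib

DESTINATIONS = ["127.0.0.1:2024:a", "127.0.0.1:2124:b"]

def pick_destination(metric, destinations=DESTINATIONS):
    """Stable destination choice: equal metric paths always map to the
    same Carbon Cache. (Simplified stand-in, not Carbon's actual ring.)"""
    digest = hashlib.md5(metric.encode("utf-8")).hexdigest()
    return destinations[int(digest, 16) % len(destinations)]

dest_first = pick_destination("collectd.graphing1.load.load.shortterm")
dest_again = pick_destination("collectd.graphing1.load.load.shortterm")
```

This stability is what makes the scheme work: the Whisper file for a metric always lives behind the same cache instance.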

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Graphite Single-Node Cluster with Graphite-Web

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Graphite-Web with Multiple Caches

Graphite-Web needs to be configured to query both Carbon Caches.

File: /opt/graphite/webapp/graphite/local_settings.py:

CARBONLINK_HOSTS = ["127.0.0.1:7002:a", \
"127.0.0.1:7102:b"]

Restart Apache:

# systemctl restart httpd.service

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Graphite Single-Node Cluster with Aggregator

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Carbon Relay with Aggregator

Change Relay configuration in carbon.conf:

[relay]
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_PORT = 2004

RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1

DESTINATIONS = 127.0.0.1:2014

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Carbon Aggregator with Multiple Caches

Change Aggregator configuration in carbon.conf:

[aggregator]
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_PORT = 2014

FORWARD_ALL = True

DESTINATIONS = 127.0.0.1:2024:a, 127.0.0.1:2124:b

REPLICATION_FACTOR = 1

Restart Carbon Aggregator:

# systemctl restart carbon-aggregator.service

Note: The configuration for the Carbon Caches remains unchanged.

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Metric routing of Carbon Aggregator

The Aggregator does not have the same complement of routing methods as the Relay and supports only consistent-hashing. For rule-based routing an additional Relay between Aggregator and Cache(s) is needed.

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Queues and Caches

Each Carbon daemon will start dropping or rejecting metrics if its queue or cache is full. The parameters MAX_QUEUE_SIZE and MAX_CACHE_SIZE are configured in the appropriate section of carbon.conf.

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Carbonate

Carbonate provides some useful tools for handling different tasks in Graphite clusters, for example redistributing datapoints manually when new nodes are introduced.

# pip install carbonate

The configuration of Carbonate is done in: /opt/graphite/conf/carbonate.conf

[main]
DESTINATIONS = 127.0.0.1:2024:a, 127.0.0.1:2124:b
REPLICATION_FACTOR = 1
SSH_USER = carbonate # optional

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Carbonate Tools

Tool Description
carbon-hosts Return the addresses for all nodes in a cluster
carbon-list List the metrics this carbon node contains
carbon-lookup Lookup where a metric lives in a carbon cluster
carbon-path Transform metric paths to (or from) filesystem paths
carbon-sieve Given a list of metrics, output those that belong to a node
carbon-stale Find and list potentially stale metrics
carbon-sync Sync local metrics using remote nodes in the cluster
whisper-aggregate Set aggregation for whisper-backed metrics this carbon instance contains
whisper-fill Backfill datapoints from one whisper file into another

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

7.2 Multi-Node Cluster

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Graphite Multi-Node Cluster

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Multiple Graphite-Web Instances

For failover purposes of Graphite-Web we have to edit CLUSTER_SERVERS.

File: /opt/graphite/webapp/graphite/local_settings.py

CLUSTER_SERVERS = ["192.168.56.101", "192.168.56.102"]

And reload Apache:

# systemctl reload httpd.service

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Multiple Whisper Directories

It's possible to use different storage directories for each Carbon Cache. We have to adjust the configuration in /opt/graphite/conf/carbon.conf:

[cache]
LOCAL_DATA_DIR = /opt/graphite/storage/whisper1/

[cache:b]
LOCAL_DATA_DIR = /opt/graphite/storage/whisper2/

Restart the Cache daemons:

# systemctl restart carbon-cache-a.service
# systemctl restart carbon-cache-b.service

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Multiple Whisper Directories with Graphite-Web

The configuration for Graphite-Web has to be changed in order to support different storage directories.

File: /opt/graphite/webapp/graphite/local_settings.py

STANDARD_DIRS = ["/opt/graphite/storage/whisper1", \
"/opt/graphite/storage/whisper2"]

Afterwards Apache requires a reload:

# systemctl reload httpd.service

~~~SECTION:MAJOR~~~ Graphite Cluster

NETWAYS

Graphite Multi-Node Cluster with collectd

The write_graphite plugin of collectd is able to send metrics to all relays, but we have to edit the configuration for that and restart collectd afterwards.

File: /etc/collectd.d/write_graphite.conf

<Plugin write_graphite>
  <Node "graphing1">
    ...
  </Node>
  <Node "graphing2">
    Host "192.168.56.102"
    Port "2003"
    Protocol "tcp"
    ReconnectInterval 0
    LogSendErrors true
    Prefix "collectd."
    Postfix ""
    StoreRates true
    AlwaysAppendDS false
    EscapeCharacter "_"
    SeparateInstances false
    PreserveSeparator false
    DropDuplicateFields false
  </Node>
</Plugin>

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

8 StatsD

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Overview of StatsD

  • Receive metrics via network (UDP/TCP)
  • Aggregate metrics
  • Transport metrics to one or multiple backends
  • No rendering of data
  • No collection of data
  • Runs on Node.js platform

StatsD is most interesting for developers since it is very easy to embed code which sends metrics to a central StatsD.

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

StatsD and Graphite

StatsD acts like a front-end proxy to Graphite. The major task of StatsD is to listen for metrics and periodically flush results to Graphite. Values of metrics sent to StatsD may change between being received and being forwarded to Graphite.

Key concept:

  • Buckets
    • This represents one metric.
  • Values
    • The value of a metric. This may change during the duration of the metric in StatsD.
  • Flush
    • Writing collected data to a backend.

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

StatsD and Graphite

  1. Receive metrics
  2. Collect for defined timeperiod
  3. Aggregate properly
  4. Flush to Graphite
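The receive/collect/aggregate cycle between two flushes can be sketched in a few lines of Python. This is an illustration of the concept, not StatsD's implementation; it handles counters and gauges only and omits timers, sets, and signed gauge deltas:

```python
from collections import defaultdict

def aggregate(packets):
    """Parse 'bucket:value|type' lines and aggregate them the way StatsD
    does between flushes: counters (c) are summed, gauges (g) keep the
    last value. (Timers, sets and signed gauge deltas are omitted.)"""
    counters = defaultdict(float)
    gauges = {}
    for packet in packets:
        bucket, rest = packet.split(":", 1)
        value, mtype = rest.split("|", 1)
        if mtype == "c":
            counters[bucket] += float(value)
        elif mtype == "g":
            gauges[bucket] = float(value)
    return dict(counters), gauges

counters, gauges = aggregate([
    "gitcommits:1|c", "gitcommits:1|c",
    "processes:250|g", "processes:240|g",
])
```

At flush time StatsD would write these aggregated values to Graphite and reset the counters to 0.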

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

StatsD Installation

StatsD requires Node.js to run; this requirement is automatically fulfilled during the package installation:

# yum -y install statsd

Note: StatsD is already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

StatsD Configuration

For a simple environment only a small piece of configuration is needed. For everything else the default settings are sufficient.

Make sure that /etc/statsd/config.js contains the following content:

{
  graphitePort: 2003
, graphiteHost: "localhost"
, port: 8125
, backends: [ "./backends/graphite" ]
, graphite: {
    legacyNamespace: false
  }
}

To see all available settings, take a look into exampleConfig.js.

Start StatsD with systemd:

# systemctl start statsd.service

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Metric Types (1/3)

StatsD supports multiple types of metrics. Depending on the metric type, a different aggregation method is applied to the datapoints.

Counting

A simple counter that adds a value to a bucket. On each flush interval the bucket gets written to Graphite and reset to 0.

gitcommits:1|c

Timing

The timing type comes in when measuring how long something took. For timers StatsD automatically calculates percentiles, average, standard deviation, sum, lower and upper bounds for each flush interval.

request:480|ms

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Metric Types (2/3)

Gauge

Arbitrary values that describe the current state of something. Gauge values are recorded as they are. By default, if a gauge metric does not change, StatsD will send the last value again to Graphite. This behaviour can be changed in the configuration. Gauge values can be increased or decreased instead of setting fixed values.

processes:250|g
processes:-20|g
processes:+10|g

Sets

When defining a set, StatsD counts all occurrences of unique values between flushes. For example one could count the number of unique users logged in.

userid:200|s

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Metric Types (3/3)

Multi-Metric Packets

Multiple metrics can be sent to StatsD at once by separating them with a newline. The size of one packet should not exceed the payload limit of your network's MTU.

gitcommits:1|c\nprocesses:250|g\nresponse:200|s
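Building and firing such a packet over UDP takes only a few lines; host and port are the StatsD defaults assumed throughout this section:

```python
import socket

def statsd_packet(metrics):
    """Join 'bucket:value|type' strings with newlines into one packet."""
    return "\n".join(metrics)

def send_udp(metrics, host="localhost", port=8125):
    """Fire-and-forget the packet at StatsD over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(metrics).encode("ascii"), (host, port))
    finally:
        sock.close()

packet = statsd_packet(["gitcommits:1|c", "processes:250|g", "response:200|s"])
```

Since UDP is connectionless, send_udp() returns immediately whether or not a StatsD server is listening; the application never blocks on metrics delivery.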

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Storage Schema for StatsD

The default flush interval of StatsD is 10 seconds. A proper storage schema needs to be created to handle this.

File: /opt/graphite/conf/storage-schemas.conf

[...]

[statsd]
pattern = ^stats
retentions = 10s:1d,1m:7d

[...]
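It is worth knowing how many datapoints such a retention definition produces, since each stored datapoint costs 12 bytes in a Whisper file (plus small headers). A sketch that parses a retentions string into per-archive point counts:

```python
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def parse_duration(text):
    """Turn a duration like '10s' or '7d' into seconds."""
    return int(text[:-1]) * UNITS[text[-1]]

def retention_points(retentions):
    """For each 'precision:duration' archive, compute how many datapoints
    Whisper has to store (duration divided by precision)."""
    points = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        points.append(parse_duration(duration) // parse_duration(precision))
    return points

points = retention_points("10s:1d,1m:7d")
```

For the schema above this yields 8640 + 10080 datapoints per metric, so each StatsD metric costs roughly 220 KB of disk.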

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Storage Aggregation (1/2)

Since StatsD supports several aggregation methods, those need to be handled by Graphite too.

Add the following to /opt/graphite/conf/storage-aggregation.conf. Keep in mind that patterns are applied from top to bottom and the first match wins, so keep the following rules above more general ones.

[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

...

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Storage Aggregation (2/2)

Second part of /opt/graphite/conf/storage-aggregation.conf:

...

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Feed in your data

Everything should be ready now to accept your data. Before starting, you should delete Whisper files that may have been created automatically with wrong storage schemas:

# rm -rf /opt/graphite/storage/whisper[1-2]/stats*

Feeding data through StatsD to Graphite is as simple as writing directly to Graphite:

# echo "mycounter:5|c" | nc -u -w1 localhost 8125
# echo "mygauge:230|g" | nc -u -w1 localhost 8125

Repeating the counter within the flush interval increases the resulting count. The gauge value will be forwarded as is.

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Admin Interface

StatsD provides a simple TCP admin interface. It can be used to monitor a running StatsD server.

To interact with it, first connect:

# nc localhost 8126

Available commands:

Command Description
stats Some stats about the running server.
counters A dump of all the current counters.
gauges A dump of all the current gauges.
timers A dump of the current timers.
delcounters Delete a counter or folder of counters.
delgauges Delete a gauge or folder of gauges.
deltimers Delete a timer or folder of timers.
health A way to set the health status of statsd.

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Client Libraries

For all common programming languages, libraries exist that enable you to talk to StatsD and send datapoints. In addition, some applications support StatsD natively or via a plugin.

   
Node Java
Python Ruby
Perl PHP
Clojure Io
C CPP
.NET Go
Apache Varnish
PowerShell Browser
Objective-C ActionScript
Wordpress Drupal
Haskell

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Logstash Output Plugin

Logstash brings a built-in output plugin for StatsD. Here is an example of how to use it for the Apache logs we covered in the Logstash section.

File: /etc/logstash/conf.d/filter.conf

output {
  statsd {
    host => "127.0.0.1"
    port => 8125
    increment  => ["apache.response.%{response}"]
    count => { "apache.bytes" => "%{bytes}" }
  } 
}

~~~SECTION:MAJOR~~~ StatsD

NETWAYS

Third Party Backends

Besides the default Graphite backend, there are also several other backends which are not maintained by the StatsD project.

   
amqp librato
atsd mongo
aws-cloudwatch monitis
node-bell netuitive
couchdb opentsdb
datadog socket.io
elasticsearch stackdriver
ganglia statsd
hosted graphite statsd http
influxdb statsd aggregation
instrumental warp10
jut zabbix
leftronic

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

9 InfluxDB

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Overview of InfluxDB

  • Time series data storage
  • Part of TICK Stack
  • Stores event data (exceptions, deploys, logins, ...)
  • SQL like query language
  • HTTP(S) API
  • No rendering or collection of data
  • Written in Go

Note: Unfortunately High-Availability and Clustering are features of InfluxEnterprise only.

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

TICK Stack

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxData Installation

InfluxData provides packages for the most commonly used Unix operating systems.

File: /etc/yum.repos.d/influxdb.repo

[influxdb]
name = InfluxDB Repository - RHEL \$releasever
baseurl = https://repos.influxdata.com/rhel/\$releasever/\
  $basearch/stable
enabled = 1
gpgcheck = 1
gpgkey = https://repos.influxdata.com/influxdb.key

# yum -y install influxdb chronograf telegraf

Note: InfluxDB, Chronograf and Telegraf are already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxData Services

We have to start InfluxDB, Chronograf and Telegraf:

# systemctl start influxdb.service
# systemctl start chronograf.service
# systemctl start telegraf.service

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxDB Ports

InfluxDB opens a bunch of ports. Not all of them are used by default, but it's good to know what they are useful for.

  • Default ports
    • 8086: HTTP(S) API for client-server communication
    • 8088: RPC service for backup and restore
    • 8888: Chronograf webinterface

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Chronograf Setup

Chronograf is available on "graphing1.localdomain" on port "8888". We have to set up a new connection to InfluxDB and the name of the Telegraf database.

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Chronograf Host List

Chronograf's Host List Menu shows metrics collected from Telegraf:

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Telegraf Plugins

  • Default data collection interval is 10s
  • > 70 input plugins
    • Docker, Kubernetes, MongoDB, Prometheus, Puppet Agent, StatsD, etc.
  • > 20 output plugins
    • Elasticsearch, Graphite, Graylog, Kafka, OpenTSDB, Prometheus, etc.

Telegraf provides a built-in output for Graphite.

File: /etc/telegraf/telegraf.conf

[[outputs.graphite]]
    servers = ["localhost:2003"]
    prefix = "telegraf"

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Telegraf Graphite Output Plugin

Take care of a proper storage schema:

File: /opt/graphite/conf/storage-schemas.conf

[...]

[telegraf]
pattern = ^telegraf\.
retentions = 10s:5d

[...]

Restart Telegraf with systemd:

# systemctl restart telegraf.service

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxDB Data Model

name: cpu
time                   usage_system   host
----                   ------------   ----
04/12/18 02:11:30PM    0.35087        graphing1
04/12/18 02:11:40PM    0.30075        graphing2
04/12/18 02:11:50PM    0.25075        graphing1
  • Measurement: cpu
  • Field key: usage_system
  • Field values: 0.35087, 0.30075, 0.25075
  • Tag key: host
  • Tag values: graphing1, graphing2

Line protocol:

cpu,host=graphing1 usage_system=0.35087 1523539329
measurement,tag(s) field(s) timestamp
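A line protocol point can be assembled from a measurement, tag set, field set and timestamp. A minimal Python formatter as a sketch; it deliberately omits the escaping of special characters (spaces, commas) that a real client library performs:

```python
def line_protocol(measurement, tags, fields, timestamp=None):
    """Render one point as InfluxDB line protocol:
    measurement,tag=... field=... timestamp
    (escaping of special characters omitted for brevity)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    line = f"{measurement},{tag_part} {field_part}"
    if timestamp is not None:
        line += f" {timestamp}"
    return line

line = line_protocol("cpu", {"host": "graphing1"},
                     {"usage_system": 0.35087}, 1523539329)
```

Note the roles in the output: everything before the first space identifies the series (measurement plus tags), everything after it carries the values.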

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Write Data Using HTTP API

The easiest method for adding datapoints to an InfluxDB database is using the HTTP API. Most client libraries use this and it's easy to build into custom applications. Here is an example of how you could use InfluxDB as a metric database.

# curl -i -XPOST http://localhost:8086/query \
--data-urlencode "q=CREATE DATABASE metrics"

# curl -i -XPOST 'http://localhost:8086/write?db=metrics' \
--data-binary 'cpu,host=graphing1,region=europe \
value=0.64 1523540001'

Multiple Points:

# curl -i -XPOST 'http://localhost:8086/write?db=metrics' \
--data-binary 'cpu,host=graphing2 value=0.67 \
cpu,host=graphing1,region=europe value=2.0 \
cpu,host=graphing2 value=1.20 1422568543702900257'

Note: If points are provided without timestamp, the server's local timestamp is used.
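The same write can be prepared from Python with the standard library. The sketch below only builds the request object so it can be inspected offline; pass it to urllib.request.urlopen() to actually send it to a running InfluxDB:

```python
from urllib.request import Request

def write_request(db, lines, base="http://localhost:8086"):
    """Prepare a POST to the /write endpoint; the body is
    newline-separated line protocol."""
    body = "\n".join(lines).encode("utf-8")
    return Request(f"{base}/write?db={db}", data=body, method="POST")

req = write_request("metrics", [
    "cpu,host=graphing1,region=europe value=0.64 1523540001",
])
```

On success the server answers with HTTP 204 No Content, as shown on the next slide.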

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

HTTP Responses

  • 2xx: If your write request received HTTP 204 No Content, it was a success!
  • 4xx: InfluxDB could not understand the request.
  • 5xx: The system is overloaded or significantly impaired.

Example output after adding points:

HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: e28fee07-4cde-11e7-8083-000000000000
X-Influxdb-Version: 1.2.4
Date: Fri, 09 Jun 2017 06:43:02 GMT

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Query Data Using HTTP API

The output of queries is returned in JSON:

# curl -G 'http://localhost:8086/query?pretty=true' \
--data-urlencode "db=metrics" --data-urlencode \
"q=SELECT \"value\" FROM \"cpu\" WHERE \
\"region\"='us-west'"

Multiple queries can be stacked together with semicolon as delimiter:

# curl -G 'http://localhost:8086/query?pretty=true' \
--data-urlencode "db=metrics" --data-urlencode \
"q=SELECT \"value\" FROM \"cpu\" WHERE \
\"region\"='us-west'; SELECT count(\"value\") \
FROM \"cpu\" WHERE \"region\"='us-west'"

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxDB Shell

The InfluxDB Shell is part of every InfluxDB installation.

# influx
Connected to http://localhost:8086 version 1.5.1
InfluxDB shell version: 1.5.1

> CREATE DATABASE metrics

> USE metrics
Using database metrics

> INSERT cpu,host=graphing2,region=us-west value=0.64
> INSERT cpu,host=graphing1,region=europe value=0.23

> SELECT * FROM "cpu"
name: cpu
time       host      region  value
----       ----      ------  -----
1523540001 graphing2 us-west 0.64
1523540064 graphing1 europe  0.23

> quit

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxDB Listeners

  • HTTP API
    • Send a POST request with a proper body to InfluxDB to add datapoints to a database.
    • Default port: 8086
  • Graphite
    • Support for Carbon protocol, which Graphite uses for receiving metrics.
  • collectd
    • collectd has its own binary protocol to communicate with other collectd nodes.
    • Default port: 25826
  • OpenTSDB
    • InfluxDB supports both the telnet and HTTP OpenTSDB protocol.
    • Default port: 4242
  • UDP
    • Sending data in line protocol format via UDP is supported.
    • Default port: 8089
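The Graphite listener accepts the same Carbon plaintext protocol that Graphite itself uses: one `metric.path value timestamp` line per datapoint. A minimal sketch of formatting and sending such lines; the helper names and the example metric path are illustrative:

```python
import socket
import time

def carbon_line(path, value, timestamp=None):
    """Format one datapoint in the Carbon plaintext protocol."""
    ts = int(timestamp if timestamp is not None else time.time())
    return "%s %s %d\n" % (path, value, ts)

def send_plaintext(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a Carbon-compatible listener
    (for InfluxDB: the port configured for the graphite service)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))

print(carbon_line("collectd.graphing1.load.load.shortterm", 0.64, 1523540001),
      end="")
# collectd.graphing1.load.load.shortterm 0.64 1523540001
```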

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Prepare InfluxDB for collectd

Both the "collectd" and the "graphite" service plugin can be used to connect collectd to InfluxDB. In this case we use the "collectd" service plugin and enable it in /etc/influxdb/influxdb.conf:

[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd"

After a restart of InfluxDB the collectd database must be created manually:

# systemctl restart influxdb.service

# curl -i -XPOST http://localhost:8086/query \
--data-urlencode "q=CREATE DATABASE collectd"

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Connect collectd to InfluxDB

InfluxDB opens port 25826 after the restart, so we can configure collectd to send its data to InfluxDB.

File: /etc/collectd.d/network.conf

LoadPlugin network

<Plugin network>
  Server "localhost" "25826"
</Plugin> 

And reload the collectd daemon afterwards:

# systemctl restart collectd.service

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Chronograf Data Explorer

With the Data Explorer included in Chronograf you can create, edit and delete databases. Data can also be explored using the built-in query language.

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Influx Query Language

Influx Query Language (InfluxQL) is InfluxDB’s SQL-like query language for interacting with data in InfluxDB:

  • Data exploration: SELECT, WHERE, GROUP BY, INTO
  • Schema exploration: SHOW
  • Data management: CREATE, DROP, DELETE, ALTER
  • InfluxQL continuous queries
  • InfluxQL functions: Aggregations, Selectors, Transformations, Predictors
  • InfluxQL mathematical operators

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxQL Examples (1/2)

Return a list of series for the specified database:

SHOW SERIES ON "collectd"

Return a list of measurements:

SHOW MEASUREMENTS ON "collectd"

Return a list of tag keys:

SHOW TAG KEYS ON "collectd"

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxQL Examples (2/2)

Load of one server:

SELECT "time", "host", "value" FROM "collectd".\
  "autogen"."load_shortterm"

SELECT * FROM "collectd"."autogen"."load_shortterm"
  WHERE "host" = 'graphing1.localdomain'

Select timeframe:

SELECT * FROM "collectd"."autogen"."load_shortterm"
  WHERE time > now() - 1h

SELECT * FROM load_shortterm
  WHERE time > '2018-03-18 12:50'
  AND time < '2018-04-18 12:50'

Delete data:

DELETE FROM "collectd"."autogen"."load_shortterm"
  WHERE time > now() - 1h

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Grafana InfluxDB Data Source

Add the InfluxDB data source with the "collectd" database to Grafana:

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

InfluxGraph

InfluxGraph (formerly: Graphite-InfluxDB) is a storage plugin that allows InfluxDB to be used as a drop-in replacement data store behind the Graphite query API.

~~~SECTION:MAJOR~~~ InfluxDB

NETWAYS

Stop Services

We have to stop several services due to limited system resources in our training environment:

# systemctl stop statsd.service
# systemctl stop telegraf.service
# systemctl stop chronograf.service
# systemctl stop influxdb.service

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

10 OpenTSDB

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

OpenTSDB

  • Time series database
  • Scalable
  • Distributed
  • HTTP API
  • No rendering of data
  • No collection of data
  • Based on HBase
  • Language: Java

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

OpenTSDB

OpenTSDB is a time series database based on Apache HBase. This underlying technology makes it possible to distribute and scale data across a large number of servers.

  • Time Series Daemon (TSD)
    • One or multiple daemons that talk to HBase
    • TSDs are independent from each other
  • HTTP API
    • Users do not need to talk to HBase directly
  • Telnet interface
  • Simple built-in GUI
  • Commandline Tools

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

OpenTSDB Concept

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

OpenTSDB Metrics

OpenTSDB identifies datapoints by a metric name similar to Graphite's metric path, combined with tags:

<metric> <timestamp> <value> <tagk1=tagv1 ... tagkN=tagvN>
sys.cpu.user 1356998400 42.5 host=webserver01 cpu=0
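The same datapoint can be submitted through the telnet interface with a `put` command, or as JSON to the HTTP API (`/api/put`). A sketch of both representations; the helper names are illustrative:

```python
import json

def opentsdb_put(metric, timestamp, value, **tags):
    """Format a datapoint for OpenTSDB's telnet interface."""
    tag_part = " ".join("%s=%s" % kv for kv in sorted(tags.items()))
    return "put %s %d %s %s" % (metric, timestamp, value, tag_part)

def opentsdb_json(metric, timestamp, value, **tags):
    """The same datapoint as a request body for POST /api/put."""
    return json.dumps({"metric": metric, "timestamp": timestamp,
                       "value": value, "tags": tags})

print(opentsdb_put("sys.cpu.user", 1356998400, 42.5,
                   host="webserver01", cpu=0))
# put sys.cpu.user 1356998400 42.5 cpu=0 host=webserver01
```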

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

OpenTSDB Collectors

To collect data several clients are available:

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

OpenTSDB Client Libraries

For communication with OpenTSDB multiple client libraries exist. Some of them can only read data while others can read and write metrics.

  • R (read)
  • Erlang (write)
  • Ruby (read/write)
  • Go (read/write)
  • Python (write)
  • vert.x (write)

~~~SECTION:MAJOR~~~ OpenTSDB

NETWAYS

OpenTSDB Frontends

Besides the simple built-in GUI there are other web interfaces for OpenTSDB dashboards:

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

11 Icinga

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Icinga Overview

  • Active and passive monitoring
  • Wide choice of plugins
    • CPU, Load, Memory, Disk, ...
    • Dell, Hewlett Packard, Microsoft, VMware, Juniper, ...
  • Performance data included
  • Scalable
  • Alerting
  • Native Graphite support (since Icinga 2)

Icinga is primarily a tool for availability monitoring, which tells you if your hosts or services have any problems. As part of this monitoring, most of the plugins also deliver performance data.

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Icinga Checks

Icinga executes checks and evaluates the results to trigger notifications or events. A by-product of these checks is performance data.

  • Checks run periodically
  • Checks can be
    • disk, load, memory, cpu, processes, ...
    • VMWare, Microsoft, Dell, HP, ...
    • MySQL, PostgreSQL, Apache2, NginX, ...
    • Many more
  • Full compatibility to Nagios plugins

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Forwarding Data with Icinga 2

One of Icinga 2's key features is data forwarding: nearly any data can be forwarded to one or multiple backends.

  • Forward

    • Performance Data
    • Check results
    • Thresholds
    • Logs
  • Forward to

    • Graphite
    • InfluxDB/OpenTSDB
    • Graylog/Logstash
    • PNP4Nagios
    • Elasticsearch

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Training Environment

One of your training virtual machines already includes the following installation:

  • Icinga 2 Core
    • Monitoring plugins
    • IDO support (MariaDB Database)
  • Icinga Web 2

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Icinga Web 2

Icinga Web 2 is a web interface for Icinga. You can view the current state of your hosts and services and check out historical data.

  • Create dashboards
  • Current and historical states
  • Groups
  • Contacts
  • Acknowledge problems
  • Set downtimes
  • Comments
  • Reschedule checks
  • Reporting

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Icinga Web 2

Icinga Web 2 is available on "graphing1.localdomain" under "/icingaweb2" with user icingaadmin and password icinga.

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Storage Schema for Icinga 2

Icinga 2 provides a storage schema for use with Graphite. Keep in mind that storage schemas are applied from top to bottom and the first match wins. What your storage schema looks like depends on your check intervals. By default the check interval is set to 1 minute.

File: /opt/graphite/conf/storage-schemas.conf

[...]

[icinga2_metadata]
pattern = ^icinga2\..*\.metadata\.
retentions = 1m:7d

[icinga2_perfdata]
pattern = ^icinga2\..*\.perfdata\.
retentions = 1m:2d,5m:10d,30m:90d,360m:4y

[...]
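The disk cost of such a retention policy can be estimated directly from the Whisper file format: a 16-byte file header, 12 bytes per archive header and 12 bytes per datapoint. The following sketch parses the retention string of the icinga2_perfdata rule above (a year is counted as 365 days, so the numbers are approximations):

```python
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def points(retention):
    """Number of datapoints an archive like '1m:2d' holds."""
    precision, duration = retention.split(":")
    seconds_per_point = int(precision[:-1]) * UNITS[precision[-1]]
    total_seconds = int(duration[:-1]) * UNITS[duration[-1]]
    return total_seconds // seconds_per_point

def whisper_size(retentions):
    """Approximate .wsp file size in bytes: 16-byte file header,
    12 bytes per archive header, 12 bytes per datapoint."""
    archives = retentions.split(",")
    return 16 + 12 * len(archives) + 12 * sum(points(r) for r in archives)

print(points("1m:2d"))                               # 2880
print(whisper_size("1m:2d,5m:10d,30m:90d,360m:4y"))  # 191104
```

So every metric matching the perfdata rule costs roughly 187 KiB on disk, regardless of how many datapoints have actually arrived.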

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Threshold Metrics

Many of the Icinga plugins provide configured thresholds in addition to the performance counters. This is very useful when creating graphs using this data.

With graphs that show thresholds, problems can be spotted at first glance.

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Icinga 2 Graphite Feature

To add Graphite support to Icinga 2 you need to enable and configure the proper feature.

# icinga2 feature enable graphite

Edit the configuration of the Graphite feature and enable at least the sending of thresholds.

File: /etc/icinga2/features-enabled/graphite.conf

object GraphiteWriter "graphite" {
  host = "127.0.0.1"            // default
  port = 2003                   // default

  enable_send_thresholds = true // required
  enable_send_metadata = true   // optional
}

Validate the configuration and restart Icinga 2:

# icinga2 daemon -C
# systemctl restart icinga2.service
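The GraphiteWriter maps every check result onto a metric path of the form icinga2.<host>.services.<service>.<check_command>... and escapes special characters such as dots in object names, which is why a host named training.localdomain shows up as training_localdomain in Graphite. A simplified, illustrative sketch of that mapping; the exact template and escaping rules are defined by Icinga 2:

```python
import re

def _escape(part):
    """Replace characters that are special in Graphite paths."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", part)

def perfdata_path(host, service, check_command, label):
    """Sketch of a GraphiteWriter service perfdata metric path."""
    return "icinga2.%s.services.%s.%s.perfdata.%s.value" % (
        _escape(host), _escape(service), _escape(check_command),
        _escape(label))

print(perfdata_path("training.localdomain", "random", "random", "rta"))
# icinga2.training_localdomain.services.random.random.perfdata.rta.value
```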

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Icinga 2 InfluxDB Feature

Enabling the Icinga 2 InfluxDB feature is similar to enabling the Graphite feature:

# icinga2 feature enable influxdb

File: /etc/icinga2/features-enabled/influxdb.conf

object InfluxdbWriter "influxdb" {
  host = "127.0.0.1"            // default
  port = 8086                   // default
  database = "icinga2"          // default
  ...

  enable_send_thresholds = true // optional
  enable_send_metadata = true   // optional
}

Validate the configuration and restart Icinga 2:

# icinga2 daemon -C
# systemctl restart icinga2.service

~~~SECTION:MAJOR~~~ Icinga

NETWAYS

Grafana Dashboard

With Icinga 2 as collector you can import the dashboard "Icinga2 with Graphite" (ID 56) provided by the Icinga project.

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

12 Integrations

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

12.1 Graph Monitoring

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Graph Monitoring

Icinga can be used to monitor nearly anything, so of course it can also monitor your graphs.

  • check_graphite plugin to check graphs:
    • Set any metric path
    • Set thresholds
    • Set timeframe
    • Apply function on target metric

Note: The check_graphite plugin and its dependency rubygem-rest-client are already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Graph Monitoring

Add a new service to one of your hosts to check metrics from Graphite. Ideally the datapoints you are checking do not come from Icinga itself. The CheckCommand for the check_graphite plugin is provided via the Icinga Template Library (ITL).

File: /etc/icinga2/training/services.conf

apply Service "graphite-load" {
  import "generic-service"

  check_command = "graphite"

  vars.graphite_url = "http://graphite"
  vars.graphite_metric = "collectd.graphing1.load.\
    load.shortterm"
  vars.graphite_warning = 1
  vars.graphite_critical = 2
  vars.graphite_duration = 5

  assign where host.name == "graphing1.localdomain"
}

Validate the configuration and reload Icinga 2:

# icinga2 daemon -C
# systemctl reload icinga2.service
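Conceptually the plugin fetches the datapoints of the configured timeframe from the Render API and compares an aggregate against the thresholds. A simplified sketch of that evaluation step (not the actual plugin code; the averaging and state mapping are assumptions for illustration):

```python
def evaluate(datapoints, warning, critical):
    """Map the average of the non-null datapoints to a Nagios-style
    state: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 3  # no data received from Graphite
    average = sum(values) / len(values)
    if average >= critical:
        return 2
    if average >= warning:
        return 1
    return 0

print(evaluate([0.4, 0.6, None, 0.5], warning=1, critical=2))  # 0
print(evaluate([2.5, 3.0, 2.8], warning=1, critical=2))        # 2
```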

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

12.2 Icingabeat

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Icingabeat

Icingabeat is an Elastic Beat that fetches data from the Icinga 2 API and sends it either directly to Elasticsearch or Logstash. There are also example dashboards for Kibana available.

Note: Icingabeat, Elasticsearch and Kibana are already pre-installed on "graphing1.localdomain".

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Event Streams

Icingabeat is pre-configured to receive check results (CheckResult) and state changes (StateChange) from Icinga's event stream and to send them periodically (every 10s) to Elasticsearch.

Notifications, acknowledgements, comments and downtimes are also available as event streams. In our training environment we add notifications (Notification) as an additional event stream, and we have to turn off SSL verification:

File: /etc/icingabeat/icingabeat.yml

...

ssl.verify: false

eventstream.types:
 - CheckResult
 - StateChange
 - Notification

...

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Start Elasticsearch and Icingabeat

Finally we can start Elasticsearch and Icingabeat:

# systemctl start elasticsearch.service
# systemctl start icingabeat.service

Note: If you're interested you can also start Kibana, it will be available at: http://192.168.56.101:5601

# systemctl start kibana.service

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Grafana Elasticsearch Data Source

Add the Elasticsearch data source with the Icingabeat index "[icingabeat-*]" to Grafana:

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Notification Annotations

Create a new graph with the icinga2.training_localdomain.services.random.random.metadata.state metric from Graphite and annotation settings from Elasticsearch:

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Graph with Annotations

The graph should look similar to this one:

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

12.3 Web Modules

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Modules for Icinga Web 2

Icinga Web 2 can be extended with so-called modules. There are modules available for integrating Graphite and Grafana.

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Graphite Module

The Graphite module provided by the Icinga project integrates graphs directly into the Icinga Web 2 interface:

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Graphite Module Installation

The Graphite module can be installed via git:

# git clone https://github.com/Icinga/\
icingaweb2-module-graphite.git \
/usr/share/icingaweb2/modules/graphite

# cd /usr/share/icingaweb2/modules/graphite/
# git checkout v1.0.1

After that the module must be enabled and adapted to the environment:

# icingacli module enable graphite

File: /etc/icingaweb2/modules/graphite/config.ini

[graphite]
url = "http://graphite"
insecure = "1"

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Grafana Module

An Icinga Web 2 module that retrieves graphs from Grafana is provided by GitHub user "mikesch-mp":

~~~SECTION:MAJOR~~~ Integrations

NETWAYS

Grafana Module Configuration

The Grafana module provides two default dashboards, base-metrics.json and icinga2-default.json, which have to be imported into Grafana.

After that Icinga Web 2 needs a configuration in order to use Grafana as the backend for the module in /etc/icingaweb2/modules/grafana/config.ini:

[grafana]
host = 192.168.56.101:3000
datasource = "graphite"
username = "admin"
password = "admin"
accessmode = "proxy"

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

13 Alternatives

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

13.1 Transports

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Transport Alternatives

There are a few alternatives to Carbon Relay and/or Carbon Aggregator.

They are written in languages other than Python, aim to be faster and provide additional features.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

carbon-c-relay

  • Written in C
  • Cleansing of received metrics
  • Support for multiple clusters
  • Lots of relay methods available
    • forward (all destinations)
    • any_of (one destination)
    • failover (usually first destination)
    • carbon_ch (distribute metrics)
  • Rewrite rules
  • Aggregation functionality
  • Plaintext input

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

carbon-relay-ng

  • Written in Go
  • Validation on all incoming metrics
  • Adjust the routing table in runtime
  • Web and telnet interface
  • Advanced routing functions (e.g. queue data to disk)
    • sendAllMatch (all destinations)
    • sendFirstMatch (first destination)
    • consistentHashing (distribute metrics)
  • Rewrite rules
  • Aggregation functionality
  • Plaintext, pickle and AMQP input
  • Plaintext, pickle, metrics 2.0 and kafka output
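Both carbon_ch and consistentHashing place the destinations on a hash ring and route each metric path to the next destination clockwise from its hash, so adding or removing a destination only remaps a small share of the metrics. A minimal illustration of the idea (neither relay's actual ring implementation):

```python
import bisect
import hashlib

def build_ring(destinations, replicas=100):
    """Place every destination at several hashed positions on a ring."""
    ring = []
    for dest in destinations:
        for i in range(replicas):
            digest = hashlib.md5(("%s:%d" % (dest, i)).encode()).hexdigest()
            ring.append((int(digest[:8], 16), dest))
    ring.sort()
    return ring

def route(ring, metric_path):
    """Pick the first destination clockwise from the metric's hash."""
    point = int(hashlib.md5(metric_path.encode()).hexdigest()[:8], 16)
    index = bisect.bisect(ring, (point,)) % len(ring)
    return ring[index][1]

ring = build_ring(["cache1:2004", "cache2:2004", "cache3:2004"])
print(route(ring, "collectd.graphing1.load.load.shortterm"))
```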

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

graphite-relay

  • Written with Netty
  • Different backend strategies
  • Overflow handler
  • Plaintext input
  • No active development since 2012

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

13.2 Cache

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

go-carbon Overview

go-carbon is a re-implementation of Carbon Cache written in Go.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

go-carbon Features

  • Receive metrics from TCP and UDP (plaintext protocol)
  • Receive metrics with the Pickle protocol (TCP only)
  • Receive metrics from HTTP and Apache Kafka
  • Uses same configuration files as Carbon Cache
  • Automatic config reload for many configuration sections
  • Acts as carbonlink for Graphite-Web (port 7002)
  • Writes statistics about itself
  • Supports carbonapi, which is useful in supported setups

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

go-carbon Pros/Cons

  • Pros:

    • Requests to carbonlink are faster than with Carbon Cache
    • Configuration and deployment is easy
    • Daemon spawns workers automatically, no need for separate configuration
    • Constant development since 2015
  • Cons:

    • Current Graphite-Web implementation is experimental
    • Still needs Graphite-Web to access metrics

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

13.3 Storage Backends

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Whisper Alternatives

  • File Based
    • Ceres
    • Round-Robin Database
    • Go Whisper
  • Based on LevelDB
    • InfluxDB (already seen)
  • Based on HBase
    • OpenTSDB (already seen)
  • Based on Cassandra
    • Cyanite
    • KairosDB
    • BigGraphite

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Ceres

Ceres is an alternative time-series database to Whisper. It is intended to replace Whisper as the default storage for Graphite.

  • Not fixed-sized
  • Calculate timestamps instead of storing them for each datapoint
  • Store datapoints of one metric across multiple servers

Unfortunately, Ceres is not yet intended for production use. It is not actively maintained, so there is no roadmap for if and when a final release will happen.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Round-Robin Database

Round-Robin Database (RRD) is well-known by many Open Source tools like Cacti, MRTG, Munin or PNP4Nagios. Usually they use RRDtool to store their data into RRD-files.

Graphite-Web has included support for RRD since the very beginning. RRD is a fixed-size database, similar in design and purpose to Whisper.

Differences compared to Whisper are:

  • Can not take updates to a time-slot prior to its most recent update
  • RRD was not designed with irregular updates in mind
  • Updates are staged first for aggregation and written later
  • More disk space efficient than Whisper
  • A little faster than Whisper since it is written in C

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Go Whisper

Go Whisper is a Go implementation of Whisper. To create a new Whisper database you must define its retention level(s), aggregation method and xFilesFactor.

  • Not thread safe on concurrent writes
  • Still in development
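Whatever the implementation language, the aggregation behaviour is the same: when datapoints are propagated into a lower-precision archive, the xFilesFactor determines which fraction of the higher-precision slots must be non-null for the aggregate to be written at all. A sketch of that rule:

```python
def aggregate(values, xfilesfactor=0.5, method="average"):
    """Aggregate higher-precision points into one lower-precision
    point, or return None if too few points are known."""
    known = [v for v in values if v is not None]
    if not values or len(known) / len(values) < xfilesfactor:
        return None  # below the xFilesFactor: nothing is stored
    if method == "average":
        return sum(known) / len(known)
    if method == "max":
        return max(known)
    raise ValueError("unsupported aggregation method: %s" % method)

print(aggregate([1.0, None, 3.0, None]))   # 2.0
print(aggregate([1.0, None, None, None]))  # None
```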

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Cyanite

Cyanite is an Apache Cassandra based time-series database designed to be API-compatible with the Graphite eco-system and easy to scale.

Graphite-Cyanite is the so-called storage finder, the component between the Cyanite backend and the Graphite Render API. It prefers Graphite-API over Graphite-Web.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

KairosDB

KairosDB is a TSDB similar to OpenTSDB but built on top of Cassandra.

cairos-carbon is a re-implementation of Carbon in Java that feeds KairosDB and supports the plaintext protocol.

A storage finder for Graphite-Web called graphite-kairosdb is provided from raintank.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

BigGraphite

BigGraphite is a storage layer for time series data. It integrates with Graphite as a plugin and was developed at Criteo.

The only database backend supported by now is Apache Cassandra.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

13.4 API

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Graphite-API

  • Only functionality of Graphite-Web's Render API
  • No serverside rendering
  • No database
  • Easy to install and configure
  • Originally developed for Cyanite
  • Last release in 2016

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

13.5 Stacks

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Zipper

  • Written in Go
  • Originally developed at booking.com
  • Can query store servers in parallel
  • Can "zip" the data
  • Consists of:
    • carbonzipper
    • carbonapi
    • carbonserver (since December 2016 part of go-carbon)
    • carbonsearch

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Zipper Stack Architecture

Example architecture of the Zipper Stack:

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

ClickHouse

ClickHouse is an Open Source column-oriented database management system developed by Yandex that allows generating analytical data reports in real time.

carbon-clickhouse is a metrics receiver that uses ClickHouse as storage. The connection from Graphite-Web to the cluster backend graphite-clickhouse (including ClickHouse support) is configured with CLUSTER_SERVERS in local_settings.py.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

ClickHouse Architecture

Work scheme for Graphite with ClickHouse support:

Note: The gray components are optional or alternative.

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

13.6 Dashboards

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Dashboard Alternatives

  • Grafana (already seen)
  • Tasseo
  • Dusk
  • Dashing

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Tasseo

  • Live dashboard
  • Supports Graphite, InfluxDB, Librato Metrics and Amazon CloudWatch as backend sources

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Dusk

  • Based on Cubism.js
  • Find hotspots across a range of values in the same metric domain
  • Beta stage

~~~SECTION:MAJOR~~~ Alternatives

NETWAYS

Dashing

  • Sinatra-based framework
  • Pushes data to widgets
  • Currently unmaintained

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

14 Optimization

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

14.1 Storage

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Whisper Limitations

  • Performance is slower than RRD, since Whisper is written in Python
  • Whisper is disk space inefficient
  • Updates end up involving a lot of IO calls

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Optimize Storage

  • Reduce incoming datapoints
  • Use blacklisting
  • Better aggregation
  • Reduce retention times
  • Less retentions
  • Keep a lower granularity
  • Optimize Cache
  • Replace with alternative Storage

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

14.2 Cache

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Carbon Cache Limitations

  • Cache is not able to keep up with incoming datapoints
  • Incoming datapoints are dropped or queued up
  • System runs out of memory when MAX_CACHE_SIZE is set too high

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Optimize Cache (1/2)

  • Adjust MAX_CREATES_PER_MINUTE (Defaults to 50)
  • Adjust MAX_CACHE_SIZE (Defaults to inf)
  • Adjust CACHE_WRITE_STRATEGY (Defaults to sorted)
    • max reduces random I/O
    • naive only with fast I/O and limited CPU resources
  • Set WHISPER_AUTOFLUSH when writing to slow disks (Defaults to False)
  • Set WHISPER_SPARSE_CREATE for faster creates (Defaults to False)
  • Set WHISPER_LOCK_WRITES when multiple Carbon Caches write to the same files (Defaults to False)

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Optimize Cache (2/2)

  • Increase page cache (disk cache) for fewer read IOPS
  • Spawn more Cache instances on the same system (vertical scaling)
  • Distribute Caches over more systems (horizontal scaling)
  • Replace slow disks with faster ones (SSDs) for better write IOPS
  • Replace with an alternative Cache (go-carbon)

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

14.3 Forwarding

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Carbon Relay Limitations

  • Queue is growing continuously
  • Incoming datapoints are dropped or queued up
  • System runs out of memory when MAX_QUEUE_SIZE is set too high
  • Only consistent-hashing or regex based routing supported
  • No aggregation functionality

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Optimize Relay

  • Adjust MAX_QUEUE_SIZE (Defaults to 10000)
  • Set USE_FLOW_CONTROL (Defaults to True)
  • Adjust QUEUE_LOW_WATERMARK_PCT (Defaults to 0.08)
  • Adjust MAX_DATAPOINTS_PER_MESSAGE (Defaults to 500)
  • Put Aggregator afterwards
  • Optimize Cache or Aggregator afterwards
  • Replace with alternative Relay

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Carbon Aggregator Limitations

  • Queue is growing continuously
  • Incoming datapoints are dropped or queued up
  • System runs out of memory when MAX_QUEUE_SIZE is set too high
  • Only consistent-hashing supported

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Optimize Aggregator

  • Adjust MAX_QUEUE_SIZE (Defaults to 10000)
  • Set USE_FLOW_CONTROL (Defaults to True)
  • Adjust QUEUE_LOW_WATERMARK_PCT (Defaults to 0.08)
  • Adjust MAX_DATAPOINTS_PER_MESSAGE (Defaults to 500)
  • Put Relay afterwards
  • Optimize Cache or Relay afterwards
  • Replace with alternative Aggregator

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

14.4 Webapp

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Graphite-Web Limitations

  • Performance impacts
  • Awkward operation
  • Configuration of Django is very complex
  • Overhead when only API is needed

~~~SECTION:MAJOR~~~ Optimization

NETWAYS

Optimize Webapp

  • Generate index
  • Use memcached
  • Use dashboard alternatives
  • Use alternatives that provide only API functionality

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

15 Best Practice

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

15.1 Considerations

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

Setup

  • Use fast disks (SSDs) and as many CPU cores as possible
  • Use a tool stack that fits the environment
  • Prefer the Pickle protocol over Plaintext
  • Care about Storage Schemas and Retentions
  • Always keep scaling in mind
  • Benchmark the whole installation
  • Tune the installation
  • Benchmark the whole installation again
  • Know the limits of the installation

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

Operating

  • Keep an eye on internal statistics (Monitoring!)
    • Queues
    • Caches
  • Keep an eye on system statistics (Monitoring!)
    • IO reads and writes (iostat, iotop, collectl, dstat, sar, ...)
    • CPU utilization (top/htop, iostat, collectl, dstat, sar, ...)
    • Memory usage (top/htop, collectl, dstat, sar, ...)
    • Network performance (iperf, iftop, tcpdump, nload, dstat, sar, ...)
    • Disk usage (df, ...)

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

Maintenance

  • Think about backups (Validation!)
  • Think about disaster recovery (Tests!)
  • Consider updates
  • Cleanup orphaned or corrupt Whisper files
  • Validate Storage Schemas and Retentions
  • Resize Whisper files

Example command to cleanup orphaned Whisper files:

# find /opt/graphite/storage/whisper/ -type f \
-name "*.wsp" -mtime +60 -delete

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

15.2 Benchmarking

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

Haggar - Carbon Load Generation Tool

Install and run on CentOS:

# yum -y install golang
# go get github.com/gorsuch/haggar

# /root/go/bin/haggar -agents=5 -metrics=10 \
-carbon="127.0.0.1:2003"

2018/03/01 11:15:28 master: pid 12863
2018/03/01 11:15:28 agent 0: launched
2018/03/01 11:15:38 agent 0: flushed 10 metrics
2018/03/01 11:15:40 agent 1: launched
2018/03/01 11:15:42 agent 2: launched
2018/03/01 11:15:48 agent 0: flushed 10 metrics
2018/03/01 11:15:50 agent 1: flushed 10 metrics
2018/03/01 11:15:52 agent 2: flushed 10 metrics
2018/03/01 11:15:58 agent 0: flushed 10 metrics
2018/03/01 11:15:58 agent 3: launched
2018/03/01 11:16:00 agent 1: flushed 10 metrics
2018/03/01 11:16:00 agent 4: launched
2018/03/01 11:16:02 agent 2: flushed 10 metrics
...

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

bonnie++ - File System Benchmark

Benchmark file system performance:

  • Data read and write speed
  • Number of seeks that can be performed per second
  • Number of file metadata operations that can be performed per second

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

15.3 Information Resources

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

Information Resources

Keep informed!

  • Documentation
  • GitHub projects
  • Social Media
  • StackExchange
  • IRC channel
  • Meetups and Conferences (e.g. GrafanaCon)

~~~SECTION:MAJOR~~~ Best Practice

NETWAYS

Graphite Book