Created: Dec 05, 2019 | Updated: Jan 12, 2020 | Document Version: 0.3
Table of Contents
- Table of Contents
- Important Note on Python Version
- Netdata Configuration
- General Configuration
- Collectors (Data Collection)
- Health & Alarms
- Backends (Long-term Storage)
A while back I discovered Netdata and I was truly fascinated by it’s take on system monitoring for two main reasons. It was extremely lightweight, something of great usefulness to me, an IoT enthusiast, as it proved to be a very useful tool to debug IoT devices, such as the Raspberry pi.
On a Raspberry pi zero W, it consumes as little as 10% of the cpu, without any optimization whatsoever and update frequency of 1 second. Crazy huh?
Another aspect that I enjoyed as a newcomer to the world of Devops was the intrinsic easiness of both the installation and customization of the platform. It’s truly a fire and forget application that is configured and programmed in such as way that the default settings will be adequate for a large number of users, myself included.
This fascination and the fact that I had just finished my Integrated Master’s diploma ,(thus looking to break into the startup world), led me to contribute a bit of docs and code, as a good open source citizen.
Wanting to dig further, I was extremely fortunate to be introduced to a sys-admin at the University of Patras, named Spiros. Spiros and I, installed and configured Netdata on a production server for the University of Patras. This post is an attempt to distill what we learned.
This post describes how to install and configure Netdata for a production server for a typical PHP webserver (Nginx, php, MariaDB).
It was originally written back in January, but it is published now for the first time.
This post is intended for the purpose of onboarding a production web-server to the Netdata platform, introducing a new era to metrics and performance monitoring. Netdata is built around 4 main principles:
- Per second data collection for all metrics.
- Collection & Visualisation of metrics from all possible sources
- Meaningful presentation of metrics, optimised for visual anomaly detection
- Preconfigured, fire (install) and forget!
Netdata decentralizes monitoring completely. Each Netdata node is autonomous. It collects metrics locally, it stores them locally, it runs checks against them to trigger alarms locally, and provides an API for the dashboards to visualize them. Horizontal Scale
So what Netdata does?
- Metrics are auto-detected, so for 99% of the cases data collection works out of the box.
- Metrics are converted to human readable units, right after data collection, before storing them into the database.
- Metrics are structured, organized in charts, families and applications, so that they can be browsed.
- Dashboards are automatically generated, so all metrics are available for exploration immediately after installation.
- Dashboards are not just visualizing metrics; they are a tool, optimized for visual anomaly detection.
- Hundreds of pre-configured alarm templates are automatically attached to collected metrics.
How netdata collectors work:
Netdata uses internal and external plugins to collect data. Internal plugins run within the Netdata daemon, while external plugins are independent processes that send metrics to Netdata over pipes. There are also plugin orchestrators, which are external plugins with one or more data collection modules.
Requirements for the use-case
- Period: 1 year
- Granularity: (average)5min
- Data points: cpu, net traffic, ram (systemwide, per application)
- System: Ubuntu 18.04.3 LTS
- VM: Yes
- mariaDB (mysql, innodb style)
- Critical: downtime (uptime monitor)
- Normal: uptime monitor, unusual senarios (e.g 70% cpu. 99% i/o filesystem)
- Medium: email
- Local registry
- 2 linux VMs webserver, one for upatras.gr, one for various minor sites
- situated in the same secure network, no need for vpn or authentication
In this section we go through the installation of the netdata platform and all the required libaries, packages and third party programs that will enable the monitoring according to the requirements.
This installation will be replicated to both the machines that will host a web-server, one will work as a slave that will forward metrics to the main machine where all the processing and storage will be conducted. It is explained in the Streaming section.
There are multiple ways to install netdata, but the user is advised to use the one-line script that is provided by netdata. According to the requirements, this is the optimal route.
With the one-liner below, you install netdata and all the dependencies. You also agree that Netdata will aggregate anonymous statistics (such as OS version) that is necessary for rapid product development and discovery (prioritise OS binaries, decide features etc.)
bash <(curl -Ss [https://my-netdata.io/kickstart.sh](https://my-netdata.io/kickstart.sh)) --no-updates --stable-channel
no-updates: Prevent automatic updates of any kind.
stable-channel: Automatically update only on the release of new major versions (thus install latest stable release)
To inspect the installation url, the use can download it from here.
Now that Netdata is (hopefully) installed, please head over to http://localhost:19999 to verify correct installation.
If you can’t connect, it’s possible that you have a firewall installed and that you have to open the port
19999. Belowdav you can find relevant information and commands for Ubuntu & CentOS.
sudo ufw allow 19999/tcp
firewall-cmd --permanent --add-port 19999/tcp
Don’t forget to enable the netdata service using
sudo systemctl enable netdata
All Installation methods: https://docs.netdata.cloud/packaging/installer/
Installation using Package Manager
In a production environment, it is a good practice to use the Package Manager to install Netdata. Please do so by using the Package Manager of the distribution.
Before Installing Netdata from the Package Manager, we need to add the Netdata Repository from PackageCloud into the system package sources.
Run the following code to add the Repository into your system’s package sources:
curl -s https://packagecloud.io/install/repositories/netdata/netdata/script.deb.sh | sudo bash
Now we can install netdata (e.g using
apt-get install in Ubuntu):
sudo apt-get install update sudo apt-get install upgrade sudo apt-get install netdata
Nginx Proxy - Security Alert
For increased security, it is advised that you configure Netdata to be only accessible from localhost and proxy the connection through an NginX Reverse Proxy server. Please follow the guide to set up the reverse proxy.
As per the uproduction web-server scenario requirements, we need to install some external libraries in order for the plugins that enable the data collection (collectors) can be used by netdata.
Although netdata supports plugins from 4 different ecosystems (Bash, Python, NodeJS, Golang), we will be using Python Plugins as they cover our needs and are the most battle-tested.
News Bullet: Netdata is gradually implementing all the python add-ons to Golang, as Golang offers better performance (which is critical to minimise netdata’s footprint). Although Python plugins will still function as expected, after the transition, new versions of the plugins (e.g with more metrics) will likely appear as Go plugins.
Update/Install either Python 2.7+ or Python 3.1+
To test currently installed python version:
#python 2 python -V
#python 3 python3 -V
Important Note on Python Version
Currently, Netdata uses the system’s default python version. Thus, for most systems (e.g Ubuntu), Netdata will use Python2.7. If python2.7 is the default one for your system, please install all the libraries for Python2.7.
All systems have a certain Python version already pre-installed. It is advised to use that one for the rest of the tutorial and not install a new one.
Install PiP (Python Package Manager)
Before proceeding, we will install PIP, a necessary Package Manager for all Python libraries.
Depending on the python version that we will be using, install PIP using the Package Manager of the distribution.
sudo apt install python-pip pip install --upgrade pip #update pip to latest version
sudo apt install python3-pip pip install --upgrade pip #update pip to latest version
Install Collector Libraries:
To monitor the MariaDB server, we will be needing the python library
python-mysqldb. Please follow the instructions bellow to install the package system-wide.
It is worth mentioning that depending on your setup, it is possible that you will be needing to use
sudo pip. In that case, please use the -H flag as follows:
sudo -H. It is needed to force pip to install the package to the system’s home directory, thus making the package available system-wide.
Depending on whether you use Python2 or Python3, the installation of the required packages and libraries is different. The packages are required so that the Python Library that interfaces with MariaDB can function properly.
sudo apt-get install default-libmysqlclient-dev sudo apt-get install python-dev pip install mysqlclient
sudo apt-get install default-libmysqlclient-dev sudo apt-get install python3-dev pip3 install mysqlclient
yum install mariadb-devel pip3 install mysqlclient
yum install mariadb-devel pip install mysqlclient
Github page: https://github.com/PyMySQL/mysqlclient-python
Now that everything is installed, you can restart netdata by running
sudo systemctl restart netdataand head over to the dashboard
http:<netdata_host_ip>:19999. Netdata automatically detects data sources and enable plugins.
The configuration of the netdata agent is conducted through a helper script which is found at:
/etc/netdata/edit-config. It is possible that
sudo is needed to use the script as the script fetches the default configuration, opens it for the user to edit and then saves it in
The main configuration file is named: netdata.conf.
netdata.conf file is broken up into various sections, such as [global], [web], [registry], and more. By default, most options are commented, so you’ll have to uncomment them (remove the #) and restart netdata for Netdata to recognise your change.
We will be leaving the configuration at default for an initial testing period, before we proceed to any modifications to address specific needs. Bellow, the user can find all relevant documentation for easy access.
One metrics we do need to change, is the storage capacity of the dbengine, which is the default saving mechanism and our choice as per the requirements.
Note that there are several options regarding the mechanism with which netdata will save data, including the dbengine and a Round-Robin in-memory only database. For more information: https://docs.netdata.cloud/database
Collectors (Data Collection)
Netdata supports internal and external data collection plugins:
- internal plugins are written in
Cand run as threads inside the
- external plugins may be written in any computer language and are spawn as independent long-running processes by the
netdatadaemon. They communicate with the
To minimize the number of processes spawn for data collection, Netdata also supports plugin orchestrators.
plugin orchestrators are external plugins that do not collect any data by themeselves. Instead they support data collection modules written in the language of the orchestrator. Usually the orchestrator provides a higher level abstraction, making it ideal for writing new data collection modules with the minimum of code.
We will be using the Python plugin orchestrator, each of the collectors states below is in essence a module of the Python.d plugin.
When Netdata starts, it auto-detects dozens of data sources, such as database servers, web servers, and more. To auto-detect and collect metrics from a service or application you just installed, you need to restart Netdata.
Download and install all the software that you want to monitor (in our case: nginx, PHP-fmp, MariaDB) prior to the netdata installation, to leverage automatic discovery)
We can configure both internal and external plugins, along with the individual modules.
[plugins]section: Enable or disable internal or external plugins with
[plugin:XXX]sections: Each plugin has a section for changing collection frequency or passing options to the plugin.
.conffiles for each external plugin: Enable/Disable specific modules or set plugin specific configurations. For example, at
.conffiles for each module. Set module specific configurations, such as the metrics endpoint. : For example, at
We will be leaving the configurations at default for the testing period. In case you have installed an application with non-standard settings (e.g different socket location) please do make the proper change in the appropriate
.conf file. Otherwise, no configuration is needed (unless explicitly stated in this guide).
#Example sudo /etc/netdata/edit-config python.d/nginx.conf
As per the requirements of production web-server, the following plugins were chosen:
- web_log: Parses the log files of known servers (e.g nginx) and detects irregularities
- mysql: connects to a mysql database and fetches various metrics
- php-FPM: Uses the /metrics endpoint of a PHP-FPM powered server and fetches metrics
- nginx: Uses the metrics endpoint( socket, HTTP endpoint) and fetches metrics
Notes on editing the configuration files:
Python plugins use YAML files to be configured, and one of the most important common configurable variable, is the resource address that each plugin will reach to gather statistics.
To facilitate configuration, most Netdata plugins have already pre-configured a number of possible endpoints for each plugin. For example,
mysql searches on multiple directories for the right
Unix socket It does so by defining different
job for one endpoint .
Note that if two or more jobs have the same
name, only one will run. Meaning that the plugin will use the endpoint for which it’s job was the first one to succeed in retrieving information.
Thus, when configuring the plugins, you can edit only one job to point it to the right direction (for example, if you have changed the default directory for the
It is also advised to change the name of all the jobs to the desired name (that will be used by the dashboard) so that you don’t have to find out specifically which job is the one that is activated by the plugin.
Parses the log file of various web servers (nginx, apache, gunicord, etc.) to detect the following scenarios:
- too many redirects
- too many bad requests
- too many internal server errors
unreasonably too many requests
- unreasonably few requests
- unreasonably slow responses
- too few successful responses
🔧 Edit the configuration file to point the collector to the correct weblog metrics endpoint ( the logs of your web server e.g apache, nginx, etc. ).
sudo /etc/netdata/edit-config python.d/web_log.conf
Make sure that web_server logs and the relevant directory is accessible by the user
It monitors one or many mysql servers. To function as intended it requires a python library (as mentioned above) and a netdata user to access the database and gather metrics.
To create the
netdata user, execute the following in the MySQL shell:
create user 'netdata'@'localhost'; grant usage on *.* to 'netdata'@'localhost'; flush privileges;
🔧 Edit the configuration file to point the collector to the correct mysql metrics endpoint.
📍Please note that the default settings probably already cover your use-case.
sudo /etc/netdata/edit-config python.d/mysql.conf
This module will monitor one or more php-fpm instances depending on configuration.
🔧 Edit the configuration file to point the collector to the correct PHP-FPM metrics endpoint.
sudo /etc/netdata/edit-config python.d/phpfpm.conf
This module will monitor one or more nginx servers depending on configuration. Servers can be either local or remote.
🔧 Edit the configuration file to point the collector to the correct nginx metrics endpoint.
sudo /etc/netdata/edit-config python.d/nginx.conf
Module monitors remote http server for availability and response time.
The user must edit the
.conf file to point it to the web server’s location. You can use regex expression, as stated in the configuration file, in order for the plugin to search a specific string in the website.
This is handful in case the PHP server has crashed, so nginx does not return the proper website, but rather the nginx home screen. HTTP (200) is returned either way, meaning that httpcheck without regex can’t tell the difference.
sudo /etc/netdata/edit-config python.d/httpcheck.conf
DB engine is the custom database implemented and used by netdata. It stores most recent data in-memory and “spills” historical data into the disk, compressing them to increase storage performance even further. Notably, it is optimised for HDDs, offering performance that is acceptable versus the more efficient SSDs.
The configuration variables that are of interest to us are the following:
[global] page cache size = 32 dbengine disk space = 256
page cache size sets the maximum amount of RAM (in MiB) the database engine will use for caching and indexing.
dbengine disk space sets the maximum disk space (again, in MiB) the database engine will use for storing compressed metrics.
These default settings will retain about a day’s worth of metrics when Netdata collects roughly 4,000 metrics every second. If you increase either
page cache size or
dbengine disk space, Netdata will retain even more historical metrics.
The use-case of production web-server involves roughly 2,000 metrics per second, thus the default settings will save 2 days worth of data. For a retention period of 1 week, the following settings are advised.
When using the dbengine, the
history configuration variable is irrelevant, please ignore it.
[global] page cache size = 32 dbengine disk space = 1024
Note that both fields affect the memory footprint of the
dbengine and as a result netdata in general. For the specific use-case tests were conducted and was decided that the change in memory consumption is not noteworthy. For more info: https://docs.netdata.cloud/database/engine/#memory-requirements
🙏 With the database engine active, you can back up your /var/cache/netdata/dbengine/ folder to another location for redundancy.
dbengine docs: https://docs.netdata.cloud/database/engine
dbengine & history retention period: https://docs.netdata.cloud/docs/tutorials/longer-metrics-storage/
Health & Alarms
Each Netdata node runs an independent thread evaluating health monitoring checks. This thread has lock free access to the database, so that it can operate as a watchdog.
Netdata also supports alarm templates, so that an alarm can be attached to all the charts of the same context (i.e. all network interfaces, or all disks, or all mysql servers, etc.).
Each alarm can execute a single query to the database using statistical algorithms against past data, but alarms can be combined. So, if you need 2 queries in the database, you can combine 2 alarms together (both will run a query to the database, and the results can be combined).
For the given use case, we will be using the stock alarms as they cover all the scenarios. The user is advised to adjust the alarms in the future for more fine-grained controls.
To edit the configuration files for the alarms of each plugin, we will use the
edit-config command, like this:
# Example for editing the alarms of the httpcheck plugin sudo /etc/netdata/edit-config health.d/httpcheck.conf
To get an easy overview over the active alarms, head over to the dashboard and click on the Alarms section at the top. After clicking the category all, you can overview the activated alarms, their configuration and an apt description.
An important aspect of the alarms component is the notification functionality. Netdata is able to inform the user on important events, via multiple mediums, from e-mail to discord integration.
To configure the alarming configuration (such as email recipient, etc.) we edit the following configuration file:
sudo /etc/netdata/edit-config health_alarm_notify.conf
Since we will be using
DEFAULT_RECIPIENT_EMAIL="YOUR_EMAIL" # to receive only critical alarms, set it to "YOUR_EMAIL|critical"
To configure email with netdata, the user needs to configure a terminal email client.
sendemail is advised as is one of the most common clients and also the default choice for netdata.
To configure please follow the instructions: https://dbaron.org/linux/sendmail
For more information on the syntax of the alarms (in order to fine tune the alarm criteria or add new alarms), please visit the health.d docs and the badges doc (they use same structure).
Each Netdata is able to replicate/mirror its database to another Netdata, by streaming collected metrics, in real-time to it. This is quite different to data archiving to third party time-series databases.
When Netdata streams metrics to another Netdata, the receiving one is able to perform everything a Netdata instance is capable of:
- visualize them with a dashboard
- run health checks that trigger alarms and send alarm notifications
- archive metrics to a backend time-series database
Per the requirements, we will be using a slave-master setup, where a node will behave as the master node and gateway to the system, while each slave will only serve as aggregator of data.
In essence the Master Netdata node will have 2 databases from which it will draw charts and raise alarms.
- The first database is from the machine where the master node is installed
- The second database is from the machine where the slave node is installed
When setting up the Master Node, we will be configuring settings for API_KEYS and not for distinct slaves, meaning that master node will treat each slave that is configured with an identical API_KEY, the same.
Depending on the use case, we might want to have different settings for different slave machines (e.g different history retention policy). We generate
N API_KEYS for
slave groups. All the
slaves that belong to the same
slave group will be treated the same by the
N API_KEYS, we run N times the command:
uuidgen. In our case, we generate only one (1)
Now it is time to setup the Master Node:
- Setup netdata (without installing any extra libraries, as Netdata Master will only monitor the system and itself, serving as the central hub of the metrics from all the slaves).
sudo /etc/netdata/edit-config stream.conf
API_KEYmentioned bellow is the
API_KEYwe generated using
[API_KEY] enabled = yes default history = 3600 default memory mode = dbengine health enabled by default = auto allow from = *
- For the initial testing period we wont’t be editing anything else. You can return later to customise it according to your specific needs.
- Since both the machines belong to the same secured network, we won’t be needing authentication and TLS support.
You can find in-detail explanation of all the available configuration options for the streaming functionality in the docs.
The setup of each slave Node is identical:
- Setup netdata and the required libraries according to the guide above
- When setting up the dbengine, use the default settings that will allow close to 2 days word of data to be stored in the slave for redundancy.
- Use the
API_KEYthat was generated in the Master Slave phase.
[stream] enabled = yes destination = IP:PORT #Master api key = API_KEY #generated in step (1)
Now each slave node is configured to stream the entirety of the metrics that it collects to the master node.
- Use the
Using Netdata, your monitoring infrastructure is embedded on each server, limiting significantly the need of additional resources. Netdata is blazingly fast, very resource efficient and utilizes server resources that already exist and are spare (on each server). This allows scaling out the monitoring infrastructure.
However, the Netdata approach introduces a few new issues that need to be addressed, one being the list of Netdata we have installed, i.e. the URLs our Netdata servers are listening.
To solve this, Netdata utilizes a central registry. This registry, together with certain browser features, allow Netdata to provide unified cross-server dashboards.
The registry keeps track of 4 entities:
machines: i.e. the Netdata installations (a random GUID generated by each Netdata the first time it starts; we call this machine_guid)
For each Netdata installation (each
machine_guid) the registry keeps track of the different URLs it is accessed.
persons: i.e. the web browsers accessing the Netdata installations (a random GUID generated by the registry the first time it sees a new web browser; we call this person_guid)
For each person, the registry keeps track of the Netdata installations it has accessed and their URLs.
URLs of Netdata installations (as seen by the web browsers)
For each URL, the registry keeps the URL and nothing more. Each URL is linked to persons and machines.
accounts: i.e. the information used to sign-in via one of the available sign-in methods. Depending on the method, this may include an email, an email and a profile picture. Only applicable via cloud log-in.
For security purposes and per the production web-server requirements, we will be establishing an agent node as a local registry and then configure all the netdata nodes to advertise themselves to that registry. This way, the user will be able to access the local registry and easily navigate towards all the available netdata instances.
🔧 To turn any Netdata into a registry, edit
/etc/netdata/netdata.conf and set:
[registry] enabled = yes registry to announce = http://<netdata_host_ip>:19999 registry hostname = Master node #hostname used only in the registry
And now set the other netdata instances to advertise to that registry, edit again
[registry] enabled = no registry to announce = http://<netdata_host_ip>:19999 registry hostname = Slave node #hostname used only in the registry
No additional configuration is required. Restart netdata and head over to the Dashboard to evaluate setup.
Backends (Long-term Storage)
Up until this point, netdata has been setup to save data for 1 week in the Master Node and for 1 day in the slave Node.
Remember that the slave’s database is been replicated to the Master Node, with retention period of 1 week.
To enable long-term historic data we will be using the backends functionality of netdata. According to the requirements, we will be using Prometheus to grab data from the Master Node and save it in a time-series long term database.
In essence, we will point Prometheus to netdata in order to fetch data and Grafana to Prometheus in order to visualise them.
We will set the Prometheus sampling at 3m, since for historic data 1s accuracy is not required.
We will be installing Prometheus on the machine where the Netdata Master Node runs.
To install Prometheus run the following commands:
cd /tmp && curl -s https://api.github.com/repos/prometheus/prometheus/releases/latest \ | grep "browser_download_url.*linux-amd64.tar.gz" \ | cut -d '"' -f 4 \ | wget -qi - # create Prometheus user sudo useradd -r prometheus # create prometheus directory sudo mkdir /opt/prometheus sudo chown prometheus:prometheus /opt/prometheus # Untar prometheus directory sudo tar -xvf /tmp/prometheus-*linux-amd64.tar.gz -C /opt/prometheus --strip=1
Prometheus is setup using
sudo nano prometheus.yaml
- Copy the following configuration, replace NETDATA_IP with localhost:
# my global config global: scrape_interval: 3m # Set the scrape interval to every 5 seconds. Default is every 1 minute. evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute. # scrape_timeout is set to the global default (10s). # Attach these labels to any time series or alerts when communicating with # external systems (federation, remote storage, Alertmanager). external_labels: monitor: 'codelab-monitor' # Load rules once and periodically evaluate them according to the global 'evaluation_interval'. rule_files: # - "first.rules" # - "second.rules" # A scrape configuration containing exactly one endpoint to scrape: # Here it's Prometheus itself. scrape_configs: # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config. - job_name: 'prometheus' # metrics_path defaults to '/metrics' # scheme defaults to 'http'. static_configs: - targets: ['0.0.0.0:9090'] - job_name: 'netdata-scrape' metrics_path: '/api/v1/allmetrics' params: # format: prometheus | prometheus_all_hosts # You can use `prometheus_all_hosts` if you want Prometheus to set the `instance` to your hostname instead of IP format: [prometheus_all_hosts] #stream both Master & Slave node # # sources: as-collected | raw | average | sum | volume # default is: average #source: [as-collected] # # server name for this prometheus - the default is the client IP # for Netdata to uniquely identify it #server: ['prometheus1'] honor_labels: true static_configs: - targets: ['<netdata_host_ip>:19999']
Some notes on the configuration :
- We set the source as
averagewhich is the default setting. It means that Prometheus will store the average value of each interval (3m). If we set
as-collected, we would be saving the exact value of each metric every 3 minutes.
- Prometheus_all_hosts: It facilitates storage & retrieval, since each metric is saved with a prefix depending on the host_machine from where it originated (remember, we have 2 machines).
- Prometheus needs only to point to the Master Node. Master Node will export both it’s databases.
- For all intents and purposes, we will be using the default settings for Prometheus
👉👉👉👉 Go to
[http://localhost:9090](http://localhost:9090) to verify that everything is working.
Please don’t forget to verify that the system’s firewall allows prometheus to be accessed from the web. For more information, please refer back to the Firewall Section
You can read more in detail on how to setup Prometheus for more customised options at the docs:
Finally, it’s needed to create and install prometheus as a service. Follow the commands:
sudo touch /etc/systemd/system/prometheus.service sudo nano /etc/systemd/system/prometheus.service
and paste the following text inside the service file:
[Unit] Description=Prometheus Server AssertPathExists=/opt/prometheus [Service] Type=simple WorkingDirectory=/opt/prometheus User=prometheus Group=prometheus ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --log.level=info ExecReload=/bin/kill -SIGHUP $MAINPID ExecStop=/bin/kill -SIGINT $MAINPID [Install] WantedBy=multi-user.target
sudo systemctl start prometheus sudo systemctl enable prometheus
Prometheus and High-Availability
At this point, it is important to make a succinct note on Prometheus as a production-ready Long-Term Storage solution. In reality Prometheus is not intended to be used for long-term archiving, despite it’s efficiency (1.3 bytes/sample) because it does not support High-Availability and Fault-Tolerance capabilities out of the box.
It is recommended for the user to research the following Open-Source projects that are placed on top of Prometheus and provide such functionality:
The installation of Grafana is more straight forward, per the use-case and for easy management, the use of docker containers is advised.
sudo apt install apt-transport-https ca-certificates curl software-properties-common curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable" sudo apt update sudo apt install docker-ce
Run Grafana container:
docker run -i -p 3000:3000 grafana/grafana
Note that using the above command, we run Grafana as a “program” and it’s more about testing our setup, rather than deploying in for production.
In order to ensure that Grafana will always run, we need to make docker start on reboot, following the Docker documentation and then run the container using the following argument:
docker run -i -p 3000:3000 --restart unless-stopped grafana/grafana
Please don’t forget to verify that the system’s firewall allows Grafana to be accessed from the web. For more information, please refer back to the Firewall Section
- Visit http://localhost:3000/
- Login with username: Admin, Password: Admin
- Click Add data source and point it to Prometheus (
- Finally, start by creating a new dashboard and adding graphs.
📃More information on starting with Grafana: https://grafana.com/docs/guides/getting_started/
I would like to thank Spyros Konstantopoulos, System Administrator at the University of Patras and Dimitrios Giannopoulos, PhD student at the Network Architecture & Management Group of the ECE department, for their feedback and insightful comments on this work.
If you enjoyed this blog post, leave a comment here. You can't imagine how much I appreciate your feedback.