Telemetry Collector and Graphical Front End on Junos Evolved

By Anton Elita posted 07-18-2022 08:15

Want to run a TIG (Telegraf, InfluxDB, Grafana) stack directly on your lab router?

Most network engineers have heard of streaming telemetry in modern networks. Trying it in the lab, however, usually takes some basic knowledge to get started, plus dedicated computing resources.

This article addresses both concerns. It shows the software installation step by step and uses Junos Evolved as the host for the third-party components that collect and visualize telemetry. A separate dedicated server is not required: all the needed software runs right on top of Junos Evolved. The figure below gives a high-level overview.

Figure 1: Software Stack on Junos Evolved

This technical post introduces telemetry collectors and gives an example of a graphical interface for the collected data. Use this guidance in lab environments only.

When the installation is finished, you will see an interface to the collected metrics similar to this one:

Introduction

Collecting and visualizing telemetry requires the following components: a network device that streams telemetry data, a collector, a database, and a graphical interface. While Junos devices can serve as streaming telemetry agents, Telegraf, InfluxDB, and Grafana (the TIG stack) are often used for the rest. Junos Evolved can run Docker containers, so the TIG stack runs on the router itself.

Junos Evolved is a network operating system running on many Juniper devices, such as the PTX10001-36MR or ACX7100.

A complete installation will require the following steps:

Configure Junos to accept gRPC remote procedure calls
Prepare configuration files for the Docker containers
Start the Docker containers
Access the graphical interface and create your own dashboards for the telemetry streams of interest

Prepare Junos Evolved for Streaming Telemetry

Junos Evolved 21.4R1 is used here as an example. Measurements can be streamed out of a network node in several ways. This guide uses the widely adopted OpenConfig gRPC Network Management Interface (gNMI) for data encoding and transport.

Basic configuration of gRPC:

system {
    management-instance;
    services {
        extension-service {
            request-response {
                grpc {
                    clear-text {
                        port 32767;
                    }
                    max-connections 30;
                    routing-instance mgmt_junos;
                }
            }
        }
    }
}
routing-instances {
    mgmt_junos {        
        description default;
    }
}

Configuration 1: Enable gRPC

Note that “clear-text port 32767” is a hidden command in Junos Evolved and should be used in a lab environment only. In production networks, gRPC should run only over secure transport; the Secure Sockets Layer (SSL) configuration is skipped here for brevity.

We also configure the mgmt_junos routing instance, which is often used to separate communication paths to the routing engine.
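For readers who prefer to paste set commands, the hierarchy in Configuration 1 can be flattened mechanically. The sketch below is a hypothetical shell helper (grpc_set_commands is an illustrative name, not a Junos tool) that prints the equivalent set statements for a given port and routing instance. Run it on any workstation and paste the output into configuration mode.

```shell
# Hypothetical helper: print Configuration 1 in "set" format.
# Arguments: clear-text gRPC port, management routing instance name.
grpc_set_commands() {
    port="${1:-32767}"
    instance="${2:-mgmt_junos}"
    base="set system services extension-service request-response grpc"
    printf '%s\n' \
        "set system management-instance" \
        "$base clear-text port $port" \
        "$base max-connections 30" \
        "$base routing-instance $instance" \
        "set routing-instances $instance description default"
}

# Example: grpc_set_commands 32767 mgmt_junos
```

Each printed line corresponds to one leaf of the configuration shown above; nothing beyond that hierarchy is added.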

With the Docker service enabled and set to start after each host reboot, pull the container images:

> start shell user root
# export DOCKER_HOST=unix:///run/docker-mgmt_junos.sock
# docker image pull grafana/grafana:5.4.5
# docker image pull influxdb:1.8.10
# docker image pull telegraf

CLI 2: Pull docker images
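The pull commands above need a Docker daemon that is already listening on /run/docker-mgmt_junos.sock. The following is a minimal sketch of enabling it, assuming the per-routing-instance systemd unit is named docker@<instance>.service. That unit name is inferred from the socket name and is not confirmed in this article, so check the Juniper documentation listed under Useful links for your release. The function names are hypothetical.

```shell
# Assumption: one Docker systemd unit per routing instance, named
# docker@<instance>.service, with its socket at /run/docker-<instance>.sock.
docker_unit_for() {
    printf 'docker@%s.service\n' "$1"
}

docker_host_for() {
    printf 'unix:///run/docker-%s.sock\n' "$1"
}

# Enable the unit at boot, start it now, and point the docker CLI at it.
enable_docker_in() {
    unit="$(docker_unit_for "$1")"
    systemctl enable "$unit" &&
    systemctl start "$unit" &&
    export DOCKER_HOST="$(docker_host_for "$1")"
}

# Example, as root on the router: enable_docker_in mgmt_junos
```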

If no internet connectivity is available, you can pull the images on a remote Docker-enabled host, save them into archives, copy the archives to Junos Evolved, and load them there.

On a remote host (use the IP address of the Junos Evolved node instead of 11.254.253.7):

docker image pull grafana/grafana:5.4.5
docker image pull influxdb:1.8.10
docker image pull telegraf
docker save grafana/grafana -o /var/tmp/grafana.tar
docker save influxdb:1.8.10 -o /var/tmp/influxdb.tar
docker save telegraf -o /var/tmp/telegraf.tar
scp /var/tmp/grafana.tar /var/tmp/influxdb.tar /var/tmp/telegraf.tar 11.254.253.7:/var/tmp/

CLI 3: Prepare images on a remote host

On Junos Evolved we just need to import the prepared images:

> start shell user root
# export DOCKER_HOST=unix:///run/docker-mgmt_junos.sock
# docker load < /var/tmp/telegraf.tar
# docker load < /var/tmp/grafana.tar
# docker load < /var/tmp/influxdb.tar

CLI 4: Import docker images
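Whichever way the images arrive, it can be worth confirming that all three are present before continuing. The sketch below is a hypothetical helper (missing_images and REQUIRED_IMAGES are illustrative names) that reads a list of repository:tag lines and prints whatever is absent. It assumes the untagged telegraf image is stored as telegraf:latest, which is Docker's default tag.

```shell
# Images this guide expects; telegraf is pulled without a tag, so Docker
# stores it as telegraf:latest.
REQUIRED_IMAGES="grafana/grafana:5.4.5 influxdb:1.8.10 telegraf:latest"

# Read "repository:tag" lines on stdin and print every required image
# that is absent. Prints nothing when all images are present.
missing_images() {
    present="$(cat)"
    for image in $REQUIRED_IMAGES; do
        printf '%s\n' "$present" | grep -qxF "$image" || printf '%s\n' "$image"
    done
}

# Example, on the router:
#   docker image ls --format '{{.Repository}}:{{.Tag}}' | missing_images
```

An empty result means the containers can be started from local images without further downloads.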

Data written inside a Docker container is lost when the container is removed and recreated. Using volumes (here, host directories mounted with -v) is the preferred way to preserve data such as Grafana dashboards, the Telegraf configuration, or InfluxDB storage. Use these steps to create the persistent directories and make the Grafana directory writable:

> start shell user root
# mkdir -p /var/home/root/grafana/dashboards /var/home/root/telegraf /var/home/root/influxdb
# chmod -R 777 /var/home/root/grafana/

CLI 5: Prepare persistent directories

Create Configuration Files

InfluxDB will read an environment file when started. Prepare this file:

> start shell user root
# cat <<EOF > /var/home/root/influxdb/env
INFLUXDB_DB=telegraf
INFLUXDB_USER=telegraf
INFLUXDB_ADMIN_ENABLED=true
INFLUXDB_ADMIN_USER=admin
INFLUXDB_ADMIN_PASSWORD=lab123
EOF

CLI 6: Create InfluxDB environment file

Telegraf needs a configuration file that describes where to connect for streaming telemetry, where to write the output data, and whether any in-line data normalization is required. Replace 10.100.3.120 with the IP address of the re0:mgmt-0 interface.

> start shell user root
# cat <<EOF > /var/home/root/telegraf/conf
[[outputs.influxdb]]
  urls = ["http://10.100.3.120:8086"]
  database = "telegraf"
  write_consistency = "any"
  timeout = "5s"
  username = "admin"
  password = "lab123"
[[inputs.jti_openconfig_telemetry]]
  servers = ["10.100.3.120:32767"]
  username = "user"
  password = "SECRET-PASSWORD"
  client_id = "telegraf"
  sample_frequency = "60000ms"
  sensors = [
    "60000ms bgp /network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state/session-state",
    "60000ms task /junos/task-memory-information/",
    "60000ms num_routes /bgp-rib/afi-safis/afi-safi/ipv4-unicast/neighbors/neighbor/adj-rib-out-pre/num-routes",
    "60000ms re0 /components/component[contains(name,'Engine0')]/properties/property[contains(name,'utilization')]/state/value/",
    "60000ms interfaces /interfaces/interface/"
  ]
  str_as_tags = false
[[processors.converter]]
  [processors.converter.fields]
    float = ["/components/component/properties/property/state/value"]
EOF

CLI 7: Create Telegraf configuration file
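The address substitution mentioned before the listing can be scripted instead of edited by hand. The sketch below is a hypothetical helper (replace_ip is an illustrative name) that swaps one IPv4 address for another throughout a file. It matches plain substrings, so it assumes the old address does not appear as part of a longer one. The example address 192.0.2.10 is a documentation placeholder.

```shell
# Hypothetical helper: replace every occurrence of one IPv4 address in a
# file with another. Dots are escaped so that sed matches them literally.
# The file is rewritten in place (same inode), which matters when the
# file is bind-mounted into a container.
replace_ip() {
    file="$1"; old_ip="$2"; new_ip="$3"
    old_re="$(printf '%s' "$old_ip" | sed 's/\./\\./g')"
    sed "s/$old_re/$new_ip/g" "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}

# Example: replace_ip /var/home/root/telegraf/conf 10.100.3.120 192.0.2.10
```

Telegraf reads its configuration at startup, so run this before the container is started, or restart the container afterwards.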

The inputs.jti_openconfig_telemetry plugin specifies a username and a password. Replace them with a valid user and password from your Junos Evolved configuration. Make sure that the outputs.influxdb plugin uses the username and password defined earlier in the InfluxDB environment file.

Each sensor can have its own reporting rate, specified in milliseconds, and its own measurement name in the database. The example above configures measurements called re0, bgp, num_routes, task, and interfaces.
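To see which measurement names a given configuration will produce, the sensor lines can be read back from the file. The sketch below is a hypothetical helper (sensor_names is an illustrative name) that handles only the three-field sensor form used in this guide: an interval ending in ms, a measurement name, and a path.

```shell
# Print the measurement name of every sensor line of the form
#   "<interval>ms <name> <path>"
# The first whitespace-separated field must be an opening quote followed
# by digits and "ms"; the second field is the measurement name.
sensor_names() {
    awk '$1 ~ /^"[0-9]+ms$/ { print $2 }' "$1"
}

# Example: sensor_names /var/home/root/telegraf/conf
# Once data is flowing, the result can be compared with:
#   docker exec influxdb influx -database telegraf -execute 'SHOW MEASUREMENTS'
```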

Some data might require post-processing. Routing Engine statistics need to be converted into floating-point numbers, which is what processors.converter does here.

Grafana does not require any specific configuration file and can be configured via the graphical user interface.

Start Docker Containers

It’s the right time to start the on-box TIG stack, as we’ve prepared all necessary configuration:

> start shell user root
# export DOCKER_HOST=unix:///run/docker-mgmt_junos.sock
 
# docker run \
-d --name grafana \
-v /var/home/root/grafana/:/var/lib/grafana/ \
-v /var/home/root/grafana/dashboards:/usr/share/grafana/conf/provisioning/dashboards/ \
-v /etc/localtime:/etc/localtime:ro \
--cap-add=NET_ADMIN \
--network=host \
--restart=always \
grafana/grafana:5.4.5

# docker run \
-d --name influxdb \
--env-file=/var/home/root/influxdb/env \
-v /etc/localtime:/etc/localtime:ro \
-v /var/home/root/influxdb:/var/lib/influxdb \
--cap-add=NET_ADMIN \
--network=host \
--restart=always \
influxdb:1.8.10

# docker run \
-d --name telegraf \
-v /var/home/root/telegraf/conf:/etc/telegraf/telegraf.conf:ro \
-v /etc/localtime:/etc/localtime:ro \
--cap-add=NET_ADMIN \
--network=host \
--restart=always \
telegraf

CLI 8: Docker start containers

To verify that every container runs:

# docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS               NAMES
8c105cb2a587        telegraf                "/entrypoint.sh tele…"   21 hours ago        Up 21 hours                             telegraf
332e790beb6c        influxdb:1.8.10         "/entrypoint.sh infl…"   9 days ago          Up 9 days                               influxdb
a40d11a0c2f0        grafana/grafana:5.4.5   "/run.sh"                9 days ago          Up 9 days                               grafana

CLI 9: Docker list containers

To see the resource consumption:

# docker stats --no-stream
CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
8c105cb2a587        telegraf            1.20%               30.96MiB / 15.39GiB   0.20%               0B / 0B             0B / 0B             0
332e790beb6c        influxdb            0.15%               265.3MiB / 15.39GiB   1.68%               0B / 0B             0B / 0B             0
a40d11a0c2f0        grafana             0.02%               20.56MiB / 15.39GiB   0.13%               0B / 0B             0B / 0B             0

CLI 10: Docker stats

Work with a Graphical Interface

Grafana will listen on port 3000, and you can access it via HTTP:

http://11.254.253.7:3000/

(replace 11.254.253.7 with the management IP address of your Junos Evolved node)

Use username admin and password admin to log in for the first time.

Click the Configuration/Settings icon and add the InfluxDB data source. In this case we use URL: http://localhost:8086, database: “telegraf”, username: “admin”, password: “lab123”, exactly as configured earlier in the InfluxDB environment file.

Click “Save & Test”.
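The same data source can probably also be created without the GUI, through Grafana's HTTP API (POST /api/datasources). The sketch below is a minimal illustration under several assumptions: curl is available where the script runs, the first-login credentials admin/admin are still valid, and the data source name "InfluxDB" is acceptable to the dashboard you import later. The function names are hypothetical.

```shell
# Print the JSON body for Grafana's "create data source" call.
# Arguments: database name, InfluxDB user, InfluxDB password.
influx_datasource_json() {
    printf '{"name":"InfluxDB","type":"influxdb","access":"proxy",'
    printf '"url":"http://localhost:8086","database":"%s",' "$1"
    printf '"user":"%s","password":"%s","isDefault":true}\n' "$2" "$3"
}

# POST the body to the Grafana HTTP API.
# Argument: management IP address of the Junos Evolved node.
create_datasource() {
    influx_datasource_json telegraf admin lab123 |
        curl -s -u admin:admin -H 'Content-Type: application/json' \
             -X POST --data-binary @- "http://$1:3000/api/datasources"
}

# Example: create_datasource 11.254.253.7
```

The values passed to influx_datasource_json are the ones entered in the GUI step above.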

From the Home page, hover over the + sign in the left panel, and then choose “Import”:

In the window that opens, click the “Or paste JSON” field and paste the JSON code from this page: https://raw.githubusercontent.com/a-elita/Grafana-dashboards/main/TIG-on-Evo-Simple-Metrics

Finally, you’ll be able to navigate to the “Metrics” dashboard and see similar graphs:

Figure 4: Metrics dashboard

Resource Management

To prevent oversubscription, resource limits for docker containers on Junos Evolved are configured in this file:

> start shell user root
# export DOCKER_HOST=unix:///run/docker-mgmt_junos.sock

# cat /etc/extensions/platform_attributes
## Edit to change upper cap of total resource limits for all containers.
## applies only to containers and does not apply to container runtimes.
## memory.memsw.limit_in_bytes = EXTENSIONS_MEMORY_MAX_MIB + EXTENSIONS_MEMORY_SWAP_MAX_MIB:-0
## check current defaults, after starting extensions-cglimits.service
## $ /usr/libexec/extensions/extensions-cglimits get
## please start extensions-cglimits.service to apply changes here
 
## device size limit will be ignored once extensionsfs device is created
#EXTENSIONS_FS_DEVICE_SIZE_MIB=
#EXTENSIONS_CPU_QUOTA_PERCENTAGE=
#EXTENSIONS_MEMORY_MAX_MIB=
#EXTENSIONS_MEMORY_SWAP_MAX_MIB=

CLI 11: Default docker resource limits

EXTENSIONS_FS_DEVICE_SIZE_MIB= is the maximum storage space, in MiB, that containers can use. The default value is 8 GB or 30% of the total size of /var, whichever is smaller.
EXTENSIONS_CPU_QUOTA_PERCENTAGE= is the maximum percentage of CPU that containers can use. The default value is 20% across all cores.
EXTENSIONS_MEMORY_MAX_MIB= is the maximum amount of physical memory, in MiB, that containers can use. The default value is 2 GB or 10% of total physical memory, whichever is smaller.
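The "whichever is smaller" rule is simple integer arithmetic. The sketch below is a hypothetical helper (default_limit_mib is an illustrative name) that works in whole MiB and treats "2 GB" as 2048 MiB, which is an assumption about rounding. For the router shown in CLI 10, 15.39 GiB is about 15759 MiB, so 10% of physical memory is below the fixed cap.

```shell
# "Fixed cap or a percentage of the total, whichever is smaller",
# in whole MiB using integer arithmetic.
# Arguments: total size in MiB, fixed cap in MiB, percentage.
default_limit_mib() {
    total_mib="$1"; cap_mib="$2"; percent="$3"
    share=$(( $total_mib * $percent / 100 ))
    if [ "$share" -lt "$cap_mib" ]; then
        echo "$share"
    else
        echo "$cap_mib"
    fi
}

default_limit_mib 15759 2048 10   # prints 1575
```

To read the limits actually in effect on a device, use the extensions-cglimits get command mentioned in the file's comments.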

If you need to change those values (do not increase them without a solid reason), edit the file: uncomment the required parameter, enter a new value, and save the file. Then apply the new settings on the fly:

> start shell user root
# export DOCKER_HOST=unix:///run/docker-mgmt_junos.sock

# systemctl restart extensions-cglimits.service

CLI 12: Apply new resource limits
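The file edit that precedes the restart can be scripted as well. The sketch below is a hypothetical helper (set_extension_limit is an illustrative name) that uncomments one parameter and assigns it a value. The value 1024 in the example is arbitrary and assumed to be in MiB, as the _MIB suffix suggests.

```shell
# Hypothetical helper: uncomment KEY in an attributes file and set it
# to VALUE. Target lines look like "#EXTENSIONS_MEMORY_MAX_MIB=".
# Descriptive "##" comment lines that merely mention KEY are untouched,
# because the pattern requires KEY= right after the leading "#" marks.
set_extension_limit() {
    file="$1"; key="$2"; value="$3"
    sed "s/^#*${key}=.*/${key}=${value}/" "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}

# Example, as root on the router:
#   set_extension_limit /etc/extensions/platform_attributes EXTENSIONS_MEMORY_MAX_MIB 1024
```

The change takes effect only after extensions-cglimits.service is restarted as shown in CLI 12.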

InfluxDB keeps received metrics indefinitely, because no duration was specified when the “telegraf” database was created.

It is advisable to shorten the retention policy to somewhere between a few hours and a few weeks, depending on the amount of data written into the database. The shell output below shows how to check the current retention policy and change it to a new value. Dropping the data that falls outside the new retention policy might take up to 30 minutes.

> start shell user root
# export DOCKER_HOST=unix:///run/docker-mgmt_junos.sock

# docker exec -it influxdb influx -database telegraf -execute "show retention policies"
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true

# docker exec -it influxdb influx -database telegraf -execute "ALTER RETENTION POLICY autogen ON telegraf DURATION 168h DEFAULT"

# docker exec -it influxdb influx -database telegraf -execute "show retention policies"
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 168h0m0s 168h0m0s           1        true

CLI 13: Change InfluxDB retention policy
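For scripted checks, the duration of the default policy can be extracted from the table shown above. The sketch below is a hypothetical helper (default_retention is an illustrative name) that assumes the five-column layout printed in CLI 13. When piping, run docker exec without -it so that no terminal carriage returns end up in the last column.

```shell
# Read "show retention policies" output on stdin and print the duration
# of the policy whose "default" column is true. The first two lines
# (column names and dashes) are skipped.
default_retention() {
    awk 'NR > 2 && $5 == "true" { print $2 }'
}

# Example, on the router:
#   docker exec influxdb influx -database telegraf \
#       -execute 'show retention policies' | default_retention
```

A result of 0s means the data is still kept indefinitely.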

Useful links

Running third-party applications on Junos Evolved: https://www.juniper.net/documentation/us/en/software/junos/overview-evo/topics/task/third-party-applications-deploying.html
Telegraf: https://www.influxdata.com/time-series-platform/telegraf/
InfluxDB: https://www.influxdata.com/products/influxdb-overview/
Grafana: https://grafana.com/

Glossary

RPD: Routing Process Daemon
JSD: Juniper Extension Toolkit (JET) Service Process
TIG: Telegraf, InfluxDB, Grafana
gNMI: gRPC (Remote Procedure Call) Network Management Interface
DB: database
CPU: central processing unit

Comments

If you want to reach out for comments, feedback or questions, drop us a mail at:

Revision History

Version	Date	Author(s)	Comments
1	July 2022	Anton Elita	Initial Publication
