I checked the issue with Wayne and found that there was an API compatibility issue between the monitoring agent and Kafka: a "version" parameter is required in the kafka output config file so that the agent uses a protocol version the broker supports.
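For reference, a minimal sketch of what the kafka output file looks like with the version parameter added; the 0.10.1.0 value below is an assumption based on the spotify/kafka container used in the POC, so match it to whatever your broker actually runs:
[[outputs.kafka]]
  ## URLs of kafka brokers
  brokers = ["x.x.x.x:9092"]
  ## Kafka topic for producer messages
  topic = "telegraf"
  ## Kafka protocol version the client should speak; must be compatible with
  ## the broker (0.10.1.0 assumes the spotify/kafka POC container)
  version = "0.10.1.0"
  max_retry = 3
  data_format = "json"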
Original Message:
Sent: 07-03-2021 12:16
From: Wayne Lee
Subject: Difficulties standing up the 128T-monitoring POC
Hi,
Have you solved the problem?
I have the same problem after upgrading the router to 5.1.3.
ELK is able to receive logs from routers running version 4.5.5, but not 5.1.3.
Thanks,
Wayne
------------------------------
Wayne Lee
Network Engineer
Hong Kong
(852) 2138 9388
Original Message:
Sent: 04-30-2021 10:28
From: Ryan Sitzman
Subject: Difficulties standing up the 128T-monitoring POC
I agree, it looks like things on the router are configured correctly and you have connectivity to the kafka broker.
I'm not super familiar with kafka, but could you be reaching a connection limit? You could try disabling all but one of the monitoring agent inputs and see if that improves service stability.
You could also try increasing the push-interval in your config.yaml to a larger value, maybe 60 seconds. That should help keep the agents from hammering on it.
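For example, a minimal sketch of the relevant fields in /etc/128t-monitoring/config.yaml, leaving everything else as-is:
# /etc/128t-monitoring/config.yaml (excerpt)
sample-interval: 5   # how often the inputs collect data
push-interval: 60    # flush to the kafka output once a minute instead of every 10 seconds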
------------------------------
Ryan Sitzman
Systems Engineer
WA
Original Message:
Sent: 04-28-2021 14:32
From: Chris Delaney
Subject: Difficulties standing up the 128T-monitoring POC
Thanks! The journalctl output reports that it can't contact any Kafka brokers; however, I know the traffic is routing, and if I kill the kafka container to free up the port I can successfully netcat on port 9092 between the two hosts.
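(For reference, the connectivity check was along these lines; exact flags vary between netcat flavors, so treat this as a sketch:)
# on the docker host, with the kafka container stopped so port 9092 is free:
nc -l 9092
# on the router, connect to the docker host's private IP:
nc -v x.x.x.x 9092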
I'd say ~90% of my files are out-of-the-box from the POC repo at this point, but I believe the pertinent config files would be:
Docker host
[working dir]/kafka.env
ADVERTISED_HOST=x.x.x.x (private IP of host)
running containers:
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[redacted] spotify/kafka "supervisord -n" 44 hours ago Up 1 second kafka
[redacted] docker.elastic.co/kibana/kibana:7.5.2 "/usr/local/bin/dumb…" 3 weeks ago Up 30 minutes kibana
[redacted] docker.elastic.co/elasticsearch/elasticsearch:7.5.2 "/usr/local/bin/dock…" 3 weeks ago Up 30 minutes elasticsearch
[redacted] docker.elastic.co/logstash/logstash:7.5.2 "/usr/local/bin/dock…" 3 weeks ago Up 1 second kafka-logstash
listening ports:
# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1037/sshd
tcp 0 0 0.0.0.0:5601 0.0.0.0:* LISTEN 1375/node
tcp6 0 0 :::22 :::* LISTEN 1037/sshd
tcp6 0 0 :::9600 :::* LISTEN 3007/java
tcp6 0 0 :::9092 :::* LISTEN 2348/java
tcp6 0 0 :::2181 :::* LISTEN 2022/java
tcp6 0 0 :::41423 :::* LISTEN 2348/java
tcp6 0 0 :::9200 :::* LISTEN 1335/java
tcp6 0 0 :::9300 :::* LISTEN 1335/java
Router
/etc/128t-monitoring/config.yaml
name: router03_ZTP
enabled: true
tags:
  - key: router
    value: ${ROUTER}
sample-interval: 5
push-interval: 10
inputs:
  - name: t128_arp_state
  - name: t128_device_state
  - name: t128_events
  - name: t128_graphql
  - name: t128_lte_metric
  - name: t128_metrics
  - name: t128_peer_path
  - name: t128_top_analytics
outputs:
  - name: kafka
/var/lib/128t-monitoring/outputs/kafka.conf
[[outputs.kafka]]
## URLs of kafka brokers
brokers = ["x.x.x.x:9092"] (private IP of docker host; matches IP of Docker host kafka.env)
## Kafka topic for producer messages
topic = "telegraf"
max_retry = 3
data_format = "json"
The journal entries for the specified service (all the rest are the same gist):
Apr 28 17:56:49 router03 systemd[1]: Started 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Starting Telegraf 1.17.4
Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded inputs: t128_metrics
Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded aggregators:
Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded processors:
Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded outputs: kafka
Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Tags enabled: host=router03 router=Router03
Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"router03", Flush Interval:10s
Apr 28 17:56:50 router03 telegraf[16886]: 2021-04-28T17:56:50Z E! [telegraf] Error running agent: could not initialize output kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Apr 28 17:56:50 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service: main process exited, code=exited, status=1/FAILURE
Apr 28 17:56:50 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
Apr 28 17:56:50 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.
Apr 28 17:56:50 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service holdoff time over, scheduling restart.
Apr 28 17:56:50 router03 systemd[1]: Stopped 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:50 router03 systemd[1]: Started 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:50 router03 telegraf[17019]: 2021-04-28T17:56:50Z I! Starting Telegraf 1.17.4
Apr 28 17:56:50 router03 telegraf[17019]: 2021-04-28T17:56:50Z I! Loaded inputs: t128_metrics
Apr 28 17:56:50 router03 telegraf[17019]: 2021-04-28T17:56:50Z I! Loaded aggregators:
Apr 28 17:56:50 router03 telegraf[17019]: 2021-04-28T17:56:50Z I! Loaded processors:
Apr 28 17:56:50 router03 telegraf[17019]: 2021-04-28T17:56:50Z I! Loaded outputs: kafka
Apr 28 17:56:50 router03 telegraf[17019]: 2021-04-28T17:56:50Z I! Tags enabled: host=router03 router=Router03
Apr 28 17:56:50 router03 telegraf[17019]: 2021-04-28T17:56:50Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"router03", Flush Interval:10s
Apr 28 17:56:51 router03 telegraf[17019]: 2021-04-28T17:56:51Z E! [telegraf] Error running agent: could not initialize output kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Apr 28 17:56:51 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service: main process exited, code=exited, status=1/FAILURE
Apr 28 17:56:51 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
Apr 28 17:56:51 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.
Apr 28 17:56:51 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service holdoff time over, scheduling restart.
Apr 28 17:56:51 router03 systemd[1]: Stopped 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:51 router03 systemd[1]: Started 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:51 router03 telegraf[17087]: 2021-04-28T17:56:51Z I! Starting Telegraf 1.17.4
Apr 28 17:56:51 router03 telegraf[17087]: 2021-04-28T17:56:51Z I! Loaded inputs: t128_metrics
Apr 28 17:56:51 router03 telegraf[17087]: 2021-04-28T17:56:51Z I! Loaded aggregators:
Apr 28 17:56:51 router03 telegraf[17087]: 2021-04-28T17:56:51Z I! Loaded processors:
Apr 28 17:56:51 router03 telegraf[17087]: 2021-04-28T17:56:51Z I! Loaded outputs: kafka
Apr 28 17:56:51 router03 telegraf[17087]: 2021-04-28T17:56:51Z I! Tags enabled: host=router03 router=Router03
Apr 28 17:56:51 router03 telegraf[17087]: 2021-04-28T17:56:51Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"router03", Flush Interval:10s
Apr 28 17:56:52 router03 telegraf[17087]: 2021-04-28T17:56:52Z E! [telegraf] Error running agent: could not initialize output kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Apr 28 17:56:52 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service: main process exited, code=exited, status=1/FAILURE
Apr 28 17:56:52 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
Apr 28 17:56:52 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.
Apr 28 17:56:52 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service holdoff time over, scheduling restart.
Apr 28 17:56:52 router03 systemd[1]: Stopped 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:52 router03 systemd[1]: Started 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:52 router03 telegraf[17156]: 2021-04-28T17:56:52Z I! Starting Telegraf 1.17.4
Apr 28 17:56:52 router03 telegraf[17156]: 2021-04-28T17:56:52Z I! Loaded inputs: t128_metrics
Apr 28 17:56:52 router03 telegraf[17156]: 2021-04-28T17:56:52Z I! Loaded aggregators:
Apr 28 17:56:52 router03 telegraf[17156]: 2021-04-28T17:56:52Z I! Loaded processors:
Apr 28 17:56:52 router03 telegraf[17156]: 2021-04-28T17:56:52Z I! Loaded outputs: kafka
Apr 28 17:56:52 router03 telegraf[17156]: 2021-04-28T17:56:52Z I! Tags enabled: host=router03 router=Router03
Apr 28 17:56:52 router03 telegraf[17156]: 2021-04-28T17:56:52Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"router03", Flush Interval:10s
Apr 28 17:56:53 router03 telegraf[17156]: 2021-04-28T17:56:53Z E! [telegraf] Error running agent: could not initialize output kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Apr 28 17:56:53 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service: main process exited, code=exited, status=1/FAILURE
Apr 28 17:56:53 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
Apr 28 17:56:53 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.
Apr 28 17:56:53 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service holdoff time over, scheduling restart.
Apr 28 17:56:53 router03 systemd[1]: Stopped 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:53 router03 systemd[1]: Started 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:53 router03 telegraf[17228]: 2021-04-28T17:56:53Z I! Starting Telegraf 1.17.4
Apr 28 17:56:53 router03 telegraf[17228]: 2021-04-28T17:56:53Z I! Loaded inputs: t128_metrics
Apr 28 17:56:53 router03 telegraf[17228]: 2021-04-28T17:56:53Z I! Loaded aggregators:
Apr 28 17:56:53 router03 telegraf[17228]: 2021-04-28T17:56:53Z I! Loaded processors:
Apr 28 17:56:53 router03 telegraf[17228]: 2021-04-28T17:56:53Z I! Loaded outputs: kafka
Apr 28 17:56:53 router03 telegraf[17228]: 2021-04-28T17:56:53Z I! Tags enabled: host=router03 router=Router03
Apr 28 17:56:53 router03 telegraf[17228]: 2021-04-28T17:56:53Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"router03", Flush Interval:10s
Apr 28 17:56:54 router03 telegraf[17228]: 2021-04-28T17:56:54Z E! [telegraf] Error running agent: could not initialize output kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
Apr 28 17:56:54 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service: main process exited, code=exited, status=1/FAILURE
Apr 28 17:56:54 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
Apr 28 17:56:54 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.
Apr 28 17:56:54 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service holdoff time over, scheduling restart.
Apr 28 17:56:54 router03 systemd[1]: Stopped 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:54 router03 systemd[1]: start request repeated too quickly for 128T-telegraf@router03_ZTP-t128_metrics.service
Apr 28 17:56:54 router03 systemd[1]: Failed to start 128T telegraf service for router03_ZTP/t128_metrics.
Apr 28 17:56:54 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
Apr 28 17:56:54 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.
Kafka is running, based on the fact that I can exec into the container and get data back:
# /opt/kafka_2.11-0.10.1.0/bin/kafka-topics.sh --list --zookeeper localhost
__consumer_offsets
telegraf
And I can create a session manually:
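(For illustration, a manual producer session against the POC container would look roughly like the following; the script path mirrors the kafka-topics.sh example above:)
# /opt/kafka_2.11-0.10.1.0/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic telegraf
> {"test": "message"}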
I'm sure whatever I'm missing is something obvious, but I have no idea what it could be!!
------------------------------
Chris Delaney
Lynchburg VA
Original Message:
Sent: 04-28-2021 11:18
From: Ryan Sitzman
Subject: Difficulties standing up the 128T-monitoring POC
Hey Chris,
The monitoring agent config can definitely be tricky to get right, but it sounds like you're 90% there!
Try checking the journal for any clues as to why the service is failing. Something like:
journalctl -fu 128T-telegraf@router03_ZTP-t128_metrics.service
The contents of your config.yaml and any journal output from the services will be helpful for further troubleshooting.
-Ryan
------------------------------
Ryan Sitzman
Systems Engineer
WA
Original Message:
Sent: 04-26-2021 20:08
From: Chris Delaney
Subject: Difficulties standing up the 128T-monitoring POC
I want to get an idea of the components behind the monitoring agent and am attempting to set up the POC environment using a variety of sources such as the 128T GitHub repo, the 128T Monitoring Agent doc, and a 128T blog from last year, but am having difficulty getting the agents to export data to the monitoring stack.
Specifically, the plugin configuration section of the 128T Docs page indicates that I should be able to use the Conductor UI to select the desired inputs, but I don't have "monitoring" under "authority", nor do I have it listed as an available plugin to add. Additionally, the command "configure authority monitoring" returns "Command 'monitoring' not found". I'm running version 4.5, and I did see the note under the Installation section indicating that version 5.1.0 is required, so I realize this may not be applicable to me. The next line down indicates that it can be installed manually as long as I'm above 4.1, however, which is what I've done.
I've set up the agent to use the sample inputs (copied to the /var/lib/128t-monitoring/inputs directory) and ship them to my kafka output. The configs pass validation (monitoring-agent-cli validate), but when I start the daemons with monitoring-agent-cli configure they all fail within seconds:
$ systemctl list-units 128T-telegraf*
UNIT                                                    LOAD   ACTIVE SUB    DESCRIPTION
● 128T-telegraf@router03_ZTP-t128_arp_state.service     loaded failed failed 128T telegraf service for router03_ZTP/t128_arp_state
● 128T-telegraf@router03_ZTP-t128_device_state.service  loaded failed failed 128T telegraf service for router03_ZTP/t128_device_state
● 128T-telegraf@router03_ZTP-t128_events.service        loaded failed failed 128T telegraf service for router03_ZTP/t128_events
● 128T-telegraf@router03_ZTP-t128_graphql.service       loaded failed failed 128T telegraf service for router03_ZTP/t128_graphql
● 128T-telegraf@router03_ZTP-t128_lte_metric.service    loaded failed failed 128T telegraf service for router03_ZTP/t128_lte_metric
● 128T-telegraf@router03_ZTP-t128_metrics.service       loaded failed failed 128T telegraf service for router03_ZTP/t128_metrics
● 128T-telegraf@router03_ZTP-t128_peer_path.service     loaded failed failed 128T telegraf service for router03_ZTP/t128_peer_path
● 128T-telegraf@router03_ZTP-t128_top_analytics.service loaded failed failed 128T telegraf service for router03_ZTP/t128_top_analytics
I see traffic coming from the agent to my monitoring VM as the processes start up, so they seem to be communicating initially, but then they fail. I can manually connect between the agent hosts and my monitor host using netcat on port 9092, however, so I don't believe it to be a connectivity issue. The Kibana dashboard is up, and Kafka is listening on 9092.
I would very much welcome any suggestions from anyone familiar with this POC environment!
------------------------------
Chris Delaney
Lynchburg VA
------------------------------