SD-WAN


Ask questions and share experiences with SD-WAN and Session Smart Router (formerly 128T).
  • 1.  Difficulties standing up the 128T-monitoring POC

    Posted 04-26-2021 20:09

    I want to get an idea of the components behind the monitoring agent, so I'm attempting to set up the POC environment using a variety of sources (the 128T GitHub repo, the 128T Monitoring Agent doc, and a 128T blog from last year), but I'm having difficulty getting the agents to export data to the monitoring stack.

    Specifically, the plugin configuration section of the 128T Docs page indicates that I should be able to use the Conductor UI to select the desired inputs, but I don't have "monitoring" under "authority", nor is it listed as an available plugin to add. Additionally, the command "configure authority monitoring" returns "Command 'monitoring' not found". I'm running version 4.5 and did see the note under the Installation section saying that version 5.1.0 is required, so I realize that part may not apply to me. However, the next line down indicates the agent can be installed manually on anything higher than 4.1, which is what I've done.

    I've set up the agent to use the sample inputs (copied into the /var/lib/128t-monitoring/inputs directory) and ship them to my Kafka output. The configs pass validation (monitoring-agent-cli validate), but when I start the daemons with monitoring-agent-cli configure, they all fail within seconds:

    $ systemctl list-units 128T-telegraf*
    UNIT LOAD ACTIVE SUB DESCRIPTION
    128T-telegraf@router03_ZTP-t128_arp_state.service loaded failed failed 128T telegraf service for router03_ZTP/t128_arp_state
    128T-telegraf@router03_ZTP-t128_device_state.service loaded failed failed 128T telegraf service for router03_ZTP/t128_device_state
    128T-telegraf@router03_ZTP-t128_events.service loaded failed failed 128T telegraf service for router03_ZTP/t128_events
    128T-telegraf@router03_ZTP-t128_graphql.service loaded failed failed 128T telegraf service for router03_ZTP/t128_graphql
    128T-telegraf@router03_ZTP-t128_lte_metric.service loaded failed failed 128T telegraf service for router03_ZTP/t128_lte_metric
    128T-telegraf@router03_ZTP-t128_metrics.service loaded failed failed 128T telegraf service for router03_ZTP/t128_metrics
    128T-telegraf@router03_ZTP-t128_peer_path.service loaded failed failed 128T telegraf service for router03_ZTP/t128_peer_path
    128T-telegraf@router03_ZTP-t128_top_analytics.service loaded failed failed 128T telegraf service for router03_ZTP/t128_top_analytics

    I see traffic coming from the agent to my monitoring VM as the processes start up, so they seem to be communicating initially before they fail. I can also manually connect from the agent hosts to my monitoring host using netcat on port 9092, so I don't believe it's a connectivity issue. The Kibana dashboard is up, and Kafka is listening on 9092.
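
    For reference, the setup steps and checks described above look roughly like this (monitor IP redacted; the comments reflect my understanding of what each command does):

    # sample input configs from the monitoring-agent repo copied into place
    ls /var/lib/128t-monitoring/inputs/
    # validate the agent configuration
    monitoring-agent-cli validate
    # generate and start the per-input 128T-telegraf units
    monitoring-agent-cli configure
    # manual connectivity check from the router to the monitoring VM
    nc -vz <monitor-ip> 9092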

    I would welcome any suggestions from anyone familiar with this POC environment!



    ------------------------------
    Chris Delaney
    Lynchburg VA
    ------------------------------


  • 2.  RE: Difficulties standing up the 128T-monitoring POC

     
    Posted 04-28-2021 11:19

    Hey Chris,

    The monitoring agent config can definitely be tricky to get right, but it sounds like you're 90% there!
    Try checking the journal for any clues as to why the service is failing. Something like:

    journalctl -fu 128T-telegraf@router03_ZTP-t128_metrics.service
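
    If they're all dying with the same error, a unit glob should also let you tail everything at once (assuming your journalctl supports unit patterns):

    journalctl -fu '128T-telegraf@*'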

    The contents of your config.yaml and any journal output from the services will be helpful for further troubleshooting.

    -Ryan



    ------------------------------
    Ryan Sitzman
    Systems Engineer
    WA
    ------------------------------



  • 3.  RE: Difficulties standing up the 128T-monitoring POC

    Posted 10-26-2022 13:54

    Thanks! journalctl is reporting that it can't contact any Kafka brokers. However, I know the traffic is routing: if I kill the kafka container to free up the port, I can successfully netcat on port 9092 between the two hosts.

    I'd say ~90% of my files are out-of-the-box from the POC repo at this point, but I believe the pertinent config files would be:

    Docker host
    [working dir]/kafka.env

    ADVERTISED_HOST=x.x.x.x   (private IP of host)

    running containers:

    # docker ps
    CONTAINER ID   IMAGE                                                 COMMAND                  CREATED        STATUS          PORTS     NAMES
    [redacted]   spotify/kafka                                         "supervisord -n"         44 hours ago   Up 1 second               kafka
    [redacted]   docker.elastic.co/kibana/kibana:7.5.2                 "/usr/local/bin/dumb…"   3 weeks ago    Up 30 minutes             kibana
    [redacted]   docker.elastic.co/elasticsearch/elasticsearch:7.5.2   "/usr/local/bin/dock…"   3 weeks ago    Up 30 minutes             elasticsearch
    [redacted]   docker.elastic.co/logstash/logstash:7.5.2             "/usr/local/bin/dock…"   3 weeks ago    Up 1 second               kafka-logstash


    listening ports:

    # netstat -ntlp
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1037/sshd
    tcp        0      0 0.0.0.0:5601            0.0.0.0:*               LISTEN      1375/node
    tcp6       0      0 :::22                   :::*                    LISTEN      1037/sshd
    tcp6       0      0 :::9600                 :::*                    LISTEN      3007/java
    tcp6       0      0 :::9092                 :::*                    LISTEN      2348/java
    tcp6       0      0 :::2181                 :::*                    LISTEN      2022/java
    tcp6       0      0 :::41423                :::*                    LISTEN      2348/java
    tcp6       0      0 :::9200                 :::*                    LISTEN      1335/java
    tcp6       0      0 :::9300                 :::*                    LISTEN      1335/java


    Router
    /etc/128t-monitoring/config.yaml

    name: router03_ZTP
    enabled: true
    tags:
    - key: router
      value: ${ROUTER}
    sample-interval: 5
    push-interval: 10
    inputs:
    - name: t128_arp_state
    - name: t128_device_state
    - name: t128_events
    - name: t128_graphql
    - name: t128_lte_metric
    - name: t128_metrics
    - name: t128_peer_path
    - name: t128_top_analytics
    outputs:
    - name: kafka

    /var/lib/128t-monitoring/outputs/kafka.conf

    [[outputs.kafka]]
    ## URLs of kafka brokers
    brokers = ["x.x.x.x:9092"]  (private IP of docker host; matches IP of Docker host kafka.env)
    ## Kafka topic for producer messages
    topic = "telegraf"
    max_retry = 3
    data_format = "json"

    The journal entry for the service specified (all the rest are the same gist):

    Apr 28 17:56:49 router03 systemd[1]: Started 128T telegraf service for router03_ZTP/t128_metrics.
    Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Starting Telegraf 1.17.4
    Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded inputs: t128_metrics
    Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded aggregators:
    Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded processors:
    Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Loaded outputs: kafka
    Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! Tags enabled: host=router03 router=Router03
    Apr 28 17:56:49 router03 telegraf[16886]: 2021-04-28T17:56:49Z I! [agent] Config: Interval:5s, Quiet:false, Hostname:"router03", Flush Interval:10s
    Apr 28 17:56:50 router03 telegraf[16886]: 2021-04-28T17:56:50Z E! [telegraf] Error running agent: could not initialize output kafka: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
    Apr 28 17:56:50 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service: main process exited, code=exited, status=1/FAILURE
    Apr 28 17:56:50 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
    Apr 28 17:56:50 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.
    Apr 28 17:56:50 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service holdoff time over, scheduling restart.
    Apr 28 17:56:50 router03 systemd[1]: Stopped 128T telegraf service for router03_ZTP/t128_metrics.
    [ ...the same start/failure cycle repeats four more times, after which systemd gives up: ]
    Apr 28 17:56:54 router03 systemd[1]: start request repeated too quickly for 128T-telegraf@router03_ZTP-t128_metrics.service
    Apr 28 17:56:54 router03 systemd[1]: Failed to start 128T telegraf service for router03_ZTP/t128_metrics.
    Apr 28 17:56:54 router03 systemd[1]: Unit 128T-telegraf@router03_ZTP-t128_metrics.service entered failed state.
    Apr 28 17:56:54 router03 systemd[1]: 128T-telegraf@router03_ZTP-t128_metrics.service failed.


    Kafka is running, based on the fact that I can exec into the container and get data back:

    # /opt/kafka_2.11-0.10.1.0/bin/kafka-topics.sh --list --zookeeper localhost
    __consumer_offsets
    telegraf


    And I can create a session manually:
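
    Roughly along these lines, using the console producer/consumer scripts bundled with that Kafka build (exact flags from memory):

    # from inside the kafka container
    /opt/kafka_2.11-0.10.1.0/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic telegraf
    # and from a second shell, confirm the messages arrive
    /opt/kafka_2.11-0.10.1.0/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic telegraf --from-beginning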

    I'm sure whatever I'm missing is something obvious, but I have no idea what it could be!!



    ------------------------------
    Chris Delaney
    Lynchburg VA
    ------------------------------



  • 4.  RE: Difficulties standing up the 128T-monitoring POC

     
    Posted 04-30-2021 10:28
    I agree, it looks like things on the router are configured correctly and you have connectivity to the kafka broker.

    I'm not super familiar with kafka, but could you be reaching a connection limit? You could try disabling all but one of the monitoring agent inputs and see if that improves service stability.

    You could also try increasing the push-interval in your config.yaml to a larger value, maybe 60 seconds. That should help keep the agents from hammering the broker.
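
    For example, something like this in /etc/128t-monitoring/config.yaml as a test (keeping only the metrics input; untested on my end, just to isolate the behavior):

    name: router03_ZTP
    enabled: true
    sample-interval: 5
    push-interval: 60
    inputs:
    - name: t128_metrics
    outputs:
    - name: kafka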

    ------------------------------
    Ryan Sitzman
    Systems Engineer
    WA
    ------------------------------



  • 5.  RE: Difficulties standing up the 128T-monitoring POC

    Posted 07-03-2021 12:16
    Hi, 

    Have you solved the problem?

    I have the same problem after upgrading a router to 5.1.3.
    ELK is able to receive logs from routers running version 4.5.5, but not 5.1.3.

    thanks,
    Wayne



    ------------------------------
    Wayne Lee
    Network Engineer
    Hong Kong
    (852) 2138 9388
    ------------------------------



  • 6.  RE: Difficulties standing up the 128T-monitoring POC

     
    Posted 07-08-2021 08:00
    I checked the issue with Wayne and found that there was an API compatibility issue between the monitoring agent and Kafka.
    The parameter "version" is required in the Kafka output config file.
    The Kafka version on the monitoring server is kafka_2.11-0.10.1.0, so the parameter should be version = "0.10.1.0".
    I am not sure if this is the correct solution, but it works.
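
    For reference, the Kafka output config then looks something like this (broker IP is a placeholder, as in Chris's example above):

    [[outputs.kafka]]
    ## URLs of kafka brokers
    brokers = ["x.x.x.x:9092"]
    ## Kafka protocol version, matching the broker (kafka_2.11-0.10.1.0)
    version = "0.10.1.0"
    ## Kafka topic for producer messages
    topic = "telegraf"
    max_retry = 3
    data_format = "json"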

    Regards,
    Takuya


    ------------------------------
    Takuya Takahashi
    Systems Engineer
    (781) 328-0015
    ------------------------------



  • 7.  RE: Difficulties standing up the 128T-monitoring POC

     
    Posted 07-06-2021 10:19
    You said the plugin was not available to add? Are any plugins showing up for you under PLUGINS | AVAILABLE?

    ------------------------------
    Dustin Goss
    System Engineer Tech Lead
    CO
    7202438599
    ------------------------------



  • 8.  RE: Difficulties standing up the 128T-monitoring POC

    Posted 02-01-2023 10:07
    Hi Dustin

    Plugins aren't available if there isn't a path to the internet (e.g. air-gapped networks). Why the monitoring agent isn't included by default, I don't know... who doesn't want to log their data? It's free, so why make it harder to get?

    I too am trying to stand up the monitoring agent to send to ELK via Kafka, though my problem is most certainly in Logstash (getting the indexes configured correctly for the dashboard provided in the 128T GitHub).

    Patrick

    ------------------------------
    PATRICK KELLAHER
    ------------------------------