Streaming vEOS Telemetry

One thing I wanted to get my hands dirty with is telemetry. I found this blog post from Arista and I have tried to apply it to vEOS.

As per the blog, I had to install Go and Docker in my VM (Debian) on GCP.

Installed docker, docker-compose and telnet via aptitude:

# aptitude install docker.io
# aptitude install docker-compose
# aptitude install telnet

Installed golang via goenv, by updating .bashrc:

########################
# Go configuration
########################
#
# git clone -b v0.0.4 https://github.com/wfarr/goenv.git $HOME/.goenv
if [ ! -d "$HOME/.goenv" ]; then
    git clone https://github.com/syndbg/goenv.git $HOME/.goenv
fi
#export GOPATH="$HOME/storage/golang/go"
#export GOBIN="$HOME/storage/golang/go/bin"
#export PATH="$GOPATH/bin:$PATH"
if [ -d "$HOME/.goenv"   ]
then
    export GOENV_ROOT="$HOME/.goenv"
    export PATH="$GOENV_ROOT/bin:$PATH"
    if  type "goenv" &> /dev/null; then
        eval "$(goenv init -)"
        # Add the version to my prompt
        __goversion (){
            if  type "goenv" &> /dev/null; then
                goenv_go_version=$(goenv version | sed -e 's/ .*//')
                printf '%s' "$goenv_go_version"
            fi
        }
        export PS1="go:\$(__goversion)|$PS1"
        export PATH="$GOROOT/bin:$PATH"
        export PATH="$PATH:$GOPATH/bin"
    fi
fi

################## End GoLang #####################

Then I started a new bash session to trigger the installation of goenv, and then installed a Go version:

$ goenv install 1.14.6
$ goenv global 1.14.6
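A quick sanity check that the new shell picks everything up (goenv version is the same command the prompt helper runs):

$ goenv version   # should print 1.14.6
$ go version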

Now we can set up the docker containers for InfluxDB and Grafana:

mkdir telemetry
cd telemetry
mkdir influxdb_data
mkdir grafana_data

cat docker-compose.yaml 
version: "3"

services:
 influxdb:
   container_name: influxdb-tele
   environment:
     INFLUXDB_DB: grpc
     INFLUXDB_ADMIN_USER: "admin"
     INFLUXDB_ADMIN_PASSWORD: "arista"
     INFLUXDB_USER: tac
     INFLUXDB_USER_PASSWORD: arista
     INFLUXDB_RETENTION_ENABLED: "false"
     INFLUXDB_OPENTSDB_0_ENABLED: "true"
     INFLUXDB_OPENTSDB_BIND_ADDRESS: ":4242"
     INFLUXDB_OPENTSDB_DATABASE: "grpc"
   ports:
     - '8086:8086'
     - '4242:4242'
     - '8083:8083'
   networks:
     - monitoring
   volumes:
     - influxdb_data:/var/lib/influxdb
   command:
     - '-config'
     - '/etc/influxdb/influxdb.conf'
   image: influxdb:latest
   restart: always

 grafana:
   container_name: grafana-tele
   environment:
     GF_SECURITY_ADMIN_USER: admin
     GF_SECURITY_ADMIN_PASSWORD: arista
   ports:
     - '3000:3000'
   networks:
     - monitoring
   volumes:
     - grafana_data:/var/lib/grafana
   image: grafana/grafana
   restart: always

networks:
 monitoring:

volumes:
 influxdb_data: {}
 grafana_data: {}

Now we should be able to start the containers:

sudo docker-compose up -d    // start containers
sudo docker-compose down -v  // for stopping containers

sudo docker ps -a
CONTAINER ID   IMAGE             COMMAND                  CREATED       STATUS       PORTS                                                                     NAMES
cad339ead2ee   influxdb:latest   "/entrypoint.sh -con…"   5 hours ago   Up 5 hours   0.0.0.0:4242->4242/tcp, 0.0.0.0:8083->8083/tcp, 0.0.0.0:8086->8086/tcp   influxdb-tele
ef88acc47ee3   grafana/grafana   "/run.sh"                5 hours ago   Up 5 hours   0.0.0.0:3000->3000/tcp                                                    grafana-tele

sudo docker network ls
NETWORK ID     NAME                   DRIVER   SCOPE
fe19e7876636   bridge                 bridge   local
0a8770578f3f   host                   host     local
6e128a7682f1   none                   null     local
3d27d0ed3ab3   telemetry_monitoring   bridge   local

sudo docker network inspect telemetry_monitoring
[
    {
        "Name": "telemetry_monitoring",
        "Id": "3d27d0ed3ab3b0530441206a128d849434a540f8e5a2c109ee368b01052ed418",
        "Created": "2020-08-12T11:22:03.05783331Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "cad339ead2eeb0b479bd6aa024cb2150fb1643a0a4a59e7729bb5ddf088eba19": {
                "Name": "influxdb-tele",
                "EndpointID": "e3c7f853766ed8afe6617c8fac358b3302de41f8aeab53d429ffd1a28b6df668",
                "MacAddress": "02:42:ac:12:00:03",
                "IPv4Address": "172.18.0.3/16",
                "IPv6Address": ""
            },
            "ef88acc47ee30667768c5af9bbd70b95903d3690c4d80b83ba774b298665d15d": {
                "Name": "grafana-tele",
                "EndpointID": "3fe2b424cbb66a93e9e06f4bcc2e7353a0b40b2d56777c8fee8726c96c97229a",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "monitoring",
            "com.docker.compose.project": "telemetry"
        }
    }
]

Now we have to generate the octsdb binary and copy it to the switches, as per the instructions:

$ go get github.com/aristanetworks/goarista/cmd/octsdb
$ cd $GOPATH/src/github.com/aristanetworks/goarista/cmd/octsdb

$ GOOS=linux GOARCH=386 go build   // I used this one
$ GOOS=linux GOARCH=amd64 go build // if you use EOS 64 Bits

$ scp octsdb user@SWITCH_IP:/mnt/flash/

An important thing is the configuration file for octsdb. I struggled to get a config file that gave me CPU and interface counters: all the examples are based on hardware platforms, but I am using containers/VMs. Using this other blog post, I worked out the path for the interfaces in vEOS.

This is what I see in vEOS:

bash-4.2# curl localhost:6060/rest/Smash/counters/ethIntf/EtbaDut/current
{
    "counter": {
        "Ethernet1": {
            "counterRefreshTime": 0,
            "ethStatistics": {
...

bash-4.2# curl localhost:6060/rest/Kernel/proc/cpu/utilization/cpu/0
{
    "idle": 293338,
    "name": "0",
    "nice": 2965,
    "system": 1157399,
    "user": 353004,
    "util": 100
}

And this is what I see in cEOS. It seems this is not functional there:

bash-4.2# curl localhost:6060/rest/Smash/counters/ethIntf
curl: (7) Failed to connect to localhost port 6060: Connection refused
bash-4.2# curl localhost:6060/rest/Kernel/proc/cpu/utilization/cpu/0
curl: (7) Failed to connect to localhost port 6060: Connection refused
bash-4.2# 

This is the file I have used and pushed to the switches:

$ cat veos4.23.json 
{
    "comment": "This is a sample configuration for vEOS 4.23",
    "subscriptions": [
        "/Smash/counters/ethIntf",
        "/Kernel/proc/cpu/utilization"
    ],
    "metricPrefix": "eos",
    "metrics": {
        "intfCounter": {
            "path": "/Smash/counters/ethIntf/EtbaDut/current/(counter)/(?P<intf>.+)/statistics/(?P<direction>(?:in|out))(Octets|Errors|Discards)"
        },
        "intfPktCounter": {
            "path": "/Smash/counters/ethIntf/EtbaDut/current/(counter)/(?P<intf>.+)/statistics/(?P<direction>(?:in|out))(?P<type>(?:Ucast|Multicast|Broadcast))(Pkt)"
        },
        "totalCpu": {
            "path": "/Kernel/proc/(cpu)/(utilization)/(total)/(?P<type>.+)"
        },
        "coreCpu": {
            "path": "/Kernel/proc/(cpu)/(utilization)/(.+)/(?P<type>.+)"
        }
    }
}
$ scp veos4.23.json r1:/mnt/flash

Now you have to configure the switch to generate and send the data. In my case, I am using the MGMT vrf.

!
daemon TerminAttr
   exec /usr/bin/TerminAttr -disableaaa -grpcaddr MGMT/0.0.0.0:6042
   no shutdown
!
daemon octsdb
   exec /sbin/ip netns exec ns-MGMT /mnt/flash/octsdb -addr 192.168.249.4:6042 -config /mnt/flash/veos4.23.json -tsdb 10.128.0.4:4242
   no shutdown
!

TerminAttr is listening on 0.0.0.0:6042. “octsdb” is using the mgmt IP 192.168.249.4 (that is in the MGMT vrf) to connect to the influxdb container that is running on 10.128.0.4:4242.

Verify that things are running:

# From the switch
show agent octsdb log

# From InfluxDB container
$ sudo docker exec -it influxdb-tele bash
root@cad339ead2ee:/# influx -precision 'rfc3339'
Connected to http://localhost:8086 version 1.8.1
InfluxDB shell version: 1.8.1
> show databases
name: databases
name
----
grpc
_internal
> use grpc
Using database grpc
> show measurements
name: measurements
name
----
eos.corecpu.cpu.utilization._counts
eos.corecpu.cpu.utilization.cpu.0
eos.corecpu.cpu.utilization.total
eos.intfcounter.counter.discards
eos.intfcounter.counter.errors
eos.intfcounter.counter.octets
eos.intfpktcounter.counter.pkt
eos.totalcpu.cpu.utilization.total
> exit
root@cad339ead2ee:/# exit

Now we need to configure grafana. First we create a connection to influxdb. Again, I struggled with the URL. Influx and grafana are two containers running on the same host. Initially I was using localhost and it was failing. In the end I had to find out the IP assigned to the influxdb container and use it.
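One way to find that IP is asking docker directly (the same address also shows up in the docker network inspect output above):

$ sudo docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' influxdb-tele
172.18.0.3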

Now you can create a dashboard with panels: I built one panel for octet counters, one for packet types and one for CPU usage, and combined them into a single dashboard.

Keep in mind that I am not 100% sure my grafana panels are correct for CPU and counters (PktCounter makes sense).
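For reference, the octets panel boils down to a query along these lines (my reconstruction: the OpenTSDB listener stores each metric with a single "value" field, and the intf/direction tags come from the octsdb regexes above). You can test it in the influx shell before building the panel:

$ sudo docker exec -it influxdb-tele influx -database 'grpc'
> SELECT non_negative_derivative(mean("value"), 1s) * 8 FROM "eos.intfcounter.counter.octets" WHERE time > now() - 1h GROUP BY time(30s), "intf", "direction"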

At some point, I want to try telemetry for LANZ via this telerista plugin.

Multicast + 5G

I have been cleaning up my mailbox and found some interesting stuff. This is from APNIC regarding a new approach to deploying multicast. Slides are on the nanog page (check Tuesday). At my former employer, we suffered traffic congestion several times after some famous games got new updates. So it is interesting that Akamai is trying to deploy inter-domain multicast on the internet. They have a massive network and I guess they suffered with those updates, and this is an attempt to “digest” those spikes better. At minute 16 you can see the network changes required. It doesn't look like a quick/easy change but it would be a great thing to happen.

And reading a nanog conversation about 5G, I realised that this technology promises high bandwidth (which could remove the need for multicast). But shouldn't we still have a smarter way to deliver the same content to eyeball networks?

From the nanog thread, there are several links to videos about 5G, like this one from Verizon that gives the vision from a big provider and its suppliers (not technical). This one is more technical with 5G terms (I lost touch with telco terminology around early 4G). As well, I see Kubernetes mentioned in 5G deployments quite often. I guess something new to learn.

Vim + Golang

I am trying to learn a bit of golang (although sometimes I think I should try to master python and bash first..) with this udemy course.

The same way I use pyenv/virtualenv to create python environments, I want something similar for golang. For that we have goenv:

Based on the goenv install instructions and an ex-colleague's snippet, this is my goenv snippet in .bashrc:

########################
# Go configuration
########################
#
# git clone -b v0.0.4 https://github.com/wfarr/goenv.git $HOME/.goenv
if [ ! -d "$HOME/.goenv" ]; then
    git clone https://github.com/syndbg/goenv.git $HOME/.goenv
fi

if [ -d "$HOME/.goenv"   ]
then
    export GOENV_ROOT="$HOME/.goenv"
    export PATH="$GOENV_ROOT/bin:$PATH"
    if  type "goenv" &> /dev/null; then
        eval "$(goenv init -)"
        # Add the version to my prompt
        __goversion (){
            if  type "goenv" &> /dev/null; then
                goenv_go_version=$(goenv version | sed -e 's/ .*//')
                printf '%s' "$goenv_go_version"
            fi
        }
        #PS1_GO="go:\$(__goversion) "
        export PS1="go:\$(__goversion)|$PS1"
        export PATH="$GOROOT/bin:$PATH"
        export PATH="$PATH:$GOPATH/bin"
    fi
fi

################## End GoLang #####################

From time to time, remember to go to ~/.goenv and do a “git pull” to get the latest versions of golang.
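Something like this (assuming syndbg/goenv mirrors the pyenv behaviour here, which it seems to):

$ cd ~/.goenv && git pull
$ goenv install -l | tail -3   # newest Go versions now available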

Ok, once we can install any golang version, I was thinking about the equivalent of python's virtualenv, but it seems it is not really needed in golang. At the moment I am a complete beginner, so no rush on this.

And finally, as I try to use VIM for everything so I can keep learning, I want plugins for golang similar to the python ones I use. So I searched and this one looks quite good: vim-go.

So I updated vundle config in .vimrc:

Plugin 'fatih/vim-go', { 'do': ':GoUpdateBinaries' }

Then install the new plugin from within VIM:

:PluginInstall

or

:PluginUpdate

There is a good tutorial you can follow to learn the new commands.

I am happy enough with “GoRun”, “GoFmt”, “GoImports”, “GoTest”

Keep practising, Keep learning.

OOB

I was reading this blog and realised that OOB is something that is not talked about very often. Based on what I have seen in my career:

Design

You need to sell the idea that this is a must. Then you need to secure some budget. You don't need much:

1x switch

1x firewall

1x Internet access (if you have your own ASN and IP range, don't use them here)

Keep it simple..

Most network kit (firewalls, routers, switches, PDUs, console servers, etc.) has 1x mgmt port and 1x console port, so all those need to go to the console server. I guess most server vendors offer some OOB access (I only know Dell and HP), so all those go to the OOB switch.

If you have a massive network with hundreds of devices/servers, then you will need more OOB switches and console servers. You still need just one firewall and one internet connection. The blog comments on a spine-leaf OOB network; I guess this is the way for a massive network/DC.

Access to OOB

You need to be able to access it via your corporate network and from anywhere on the internet.

You need to be sure Linux/Windows/Mac clients can VPN in.

Use very strong passwords and keys.

You need to be sure the OOB firewall is quite tight on access. At the end of the day you only want to allow SSH to the console server and HTTPS to the iLO/iDRACs. Nothing initiated internally should be able to reach the internet.

Dependencies

Think of the worst-case scenario: your DNS server is down, your authentication is down.

You need to be sure you have local auth enabled in all devices for emergencies.

You need to work out some DNS service. Write the key IPs in the documentation?

Your IP transit has to be reliable. You don't need a massive pipe but you need to be sure it is up.

Monitoring

You don't want to be in the middle of an outage and realise that your OOB is not functional. You need to be sure the ISP for the OOB is up and the devices (OOB switch and OOB firewall) are functional all the time.

How do you check the serial connections? conserver.com

Documentation

Another point frequently lost. You need to be sure people can find info about the OOB: how it is built and how to access it.

Training

At the end of the day, if you have a super OOB network but nobody knows how to connect to it and use it, then it is useless. Schedule routine checkups with the team to be sure everybody can use the OOB. This pays off when you get a call at 3am.

Diagram

Update

Funny enough, I was watching NLNOG live today and there was a presentation about OOB with two different approaches: in-band out-of-band and pure out-of-band.

From the NTT side, I liked the comment about conserver.com to manage your serial connections. I will try to use it once I have access to a new network.

Forward TCPDump to Wireshark

Reading this blog entry I realised that I have very likely never tried forwarding tcpdump output to Wireshark. How many times have I taken a pcap on a switch and then had to download it to see the details in Wireshark…

I guess you can hit some blocking points in firewalls (at least for the two-step option).

So I tried the single command with a switch in my ceoslab and it works!

Why does it work?

ssh <username>@<switch>  "bash tcpdump -s 0 -U -n -w - -i <interface>" | wireshark -k -i -

The ssh command is actually executing the “bash tcpdump…” part remotely. But the key is the “-U” and “-w -” flags. “-U” in conjunction with “-w” sends each packet without waiting for the buffer to fill. Then “-w -” writes the output to stdout instead of a file. If you run the command without -U, it still works but updates a bit more slowly, as it waits for the buffers to fill.

From tcpdump manual:

       -U
       --packet-buffered
              If the -w option is not specified, make the printed packet output ``packet-buffered''; i.e., as the description of the contents of each packet is printed, it will be written to the standard  output,  rather  than, when not writing to a terminal, being written only when the output buffer fills.

              If  the  -w  option  is  specified, make the saved raw packet output ``packet-buffered''; i.e., as each packet is saved, it will be written to the output file, rather than being written only when the output buffer fills.

              The -U flag will not be supported if tcpdump was built with an older version of libpcap that lacks the pcap_dump_flush() function.

......
   -w file
          Write the raw packets to file rather than parsing and printing them out.  They can later be printed with the -r option.  Standard output is used if file is ``-''.

          This  output will be buffered if written to a file or pipe, so a program reading from the file or pipe may not see packets for an arbitrary amount of time after they are received.  Use the -U flag to cause packets to be written as soon as they are received.

And the stdout of that process is the output of the ssh command, so we redirect that output via a pipe “|” and it is sent as input to Wireshark thanks to “-i -”, which makes Wireshark read from stdin (which is the stdout from the tcpdump on the switch!).

The wireshark manual:

       -i|--interface  <capture interface>|-
           Set the name of the network interface or pipe to use for live packet capture.

           Network interface names should match one of the names listed in "wireshark -D" (described above); a number, as reported by "wireshark -D", can also be used.  If you're using UNIX, "netstat -i", "ifconfig -a" or "ip link" might also work to list interface names, although not all versions of UNIX support the -a flag to ifconfig.

           If no interface is specified, Wireshark searches the list of interfaces, choosing the first non-loopback interface if there are any non-loopback interfaces, and choosing the first loopback interface if there are no non-loopback interfaces.  If there are no interfaces at all, Wireshark reports an error and doesn't start the capture.

           Pipe names should be either the name of a FIFO (named pipe) or "-" to read data from the standard input.  On Windows systems, pipe names must be of the form "\\.\pipe\pipename".  Data read from pipes must be in standard pcapng or pcap format. Pcapng data must have the same endianness as the capturing host.

           This option can occur multiple times. When capturing from multiple interfaces, the capture file will be saved in pcapng format.

....

       -k  Start the capture session immediately.  If the -i flag was specified, the capture uses the specified interface.  Otherwise, Wireshark searches the list of interfaces, choosing the first non-loopback interface if
           there are any non-loopback interfaces, and choosing the first loopback interface if there are no non-loopback interfaces; if there are no interfaces, Wireshark reports an error and doesn't start the capture.

The two-step option relies on “nc” to send/receive the data, but it is the same idea regarding the tcpdump/wireshark flags using “-”:

On switch: tcpdump -s 0 -U -n -w - -i <interface> | nc <computer-ip> <port>
On PC: netcat -l -p <port> | wireshark -k -S -i -

Linux Networking – Bonding/Bridging/VxLAN

Bonding

$ sudo modprobe bonding                          // load the bonding driver
$ ip link help bond                              // list the available bond options
$ sudo ip link add bond0 type bond mode 802.3ad  // create an LACP bond
$ sudo ip link set eth0 master bond0             // enslave the member interfaces
$ sudo ip link set eth1 master bond0
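The bond still needs to be brought up, and you can check its state via /proc (a quick sketch, not part of the original recipe):

$ sudo ip link set up dev bond0
$ cat /proc/net/bonding/bond0   // shows bond mode, LACP info and slave status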

Bridging: vlans + trunks

ip neigh show // l2 table
ip route show // l3 table

ip route add default via 192.168.1.1 dev eth1

sudo modprobe 8021q

// create bridge and add links to bridge (switch)
sudo ip link add br0 type bridge vlan_filtering 1 // vlan-aware bridge, default native vlan = 1
sudo ip link set eth1 master br0
sudo ip link set eth2 master br0
sudo ip link set eth3 master br0

// make eth1 access port for v11
sudo bridge vlan add dev eth1 vid 11 pvid untagged

// make eth3 access port for v12
sudo bridge vlan add dev eth3 vid 12 pvid untagged

// make eth2 trunk port for v11 and v12
sudo bridge vlan add dev eth2 vid 11
sudo bridge vlan add dev eth2 vid 12

// enable bridge and links
sudo ip link set up dev br0
sudo ip link set up dev eth1
sudo ip link set up dev eth2
sudo ip link set up dev eth3

bridge link show
bridge vlan show
bridge fdb show

VxLAN

I haven't tried this yet:

Linux System 1
  sudo ip link add br0 type bridge vlan_filtering 1
  sudo ip link add vlan10 type vlan id 10 link bridge protocol none
  sudo ip addr add 10.0.0.1/24 dev vlan10
  sudo ip link add vtep10 type vxlan id 1010 local 10.1.0.1 remote 10.3.0.1 learning
  sudo ip link set eth1 master br0
  sudo bridge vlan add dev eth1 vid 10 pvid untagged

Linux System 2
  sudo ip link add br0 type bridge vlan_filtering 1
  sudo ip link add vlan10 type vlan id 10 link bridge protocol none
  sudo ip addr add 10.0.0.2/24 dev vlan10
  sudo ip link add vtep10 type vxlan id 1010 local 10.3.0.1 remote 10.1.0.1 learning
  sudo ip link set eth1 master br0
  sudo bridge vlan add dev eth1 vid 10 pvid untagged
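Note the snippet never attaches the VTEP to the bridge or brings the links up; my guess (equally untested) is that both systems still need something like:

  sudo ip link set vtep10 master br0   // attach the VTEP to the bridge
  sudo bridge vlan add dev vtep10 vid 10 pvid untagged
  sudo ip link set up dev br0
  sudo ip link set up dev eth1
  sudo ip link set up dev vtep10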

Monitoring Syslog: InfluxDB-Telegraf-Grafana via Ansible role

This is a continuation of the last blog entry. This time we are going to gather syslog messages from the monitoring containers, and it is all going to be deployed by ansible!

As usual, all this is based on Anton Karneliuk's blog post. All credits to him.

So initially we built a monitoring stack with InfluxDB, Telegraf and Grafana manually to gather and visualise SNMP info from the Arista cEOS switches.

This time, we are going to send SYSLOG from the monitoring stack containers to a new Telegraf instance.

Ideally, we would like to send syslog from the cEOS devices but, as Anton mentions, the RFC 3164 syslog that most network kit implements is not supported (yet) by telegraf, which supports RFC 5424.

You can read more info about this in all these links:

https://github.com/influxdata/telegraf/issues/4593

https://github.com/influxdata/go-syslog/pull/27

https://github.com/influxdata/telegraf/issues/7023

https://github.com/influxdata/telegraf/issues/4687

https://medium.com/@leodido/from-logs-to-metrics-f38854e3441a

https://itnext.io/metrics-from-kubernetes-logs-82cb1dcb3551
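For reference, the telegraf side of this is just the syslog input plugin listening on the port where the containers will send their logs. A minimal sketch of what the jinja2 template renders (the influxdb URL is my assumption; the 6514 port matches the syslog-address used further down):

[[inputs.syslog]]
  server = "udp://:6514"   # RFC 5424 listener

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]   # assumption: reachable by container name on the docker network
  database = "syslog"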

So the new ansible role for building influx-telegraf-grafana instances is “monitoring_stack”:

├── ansible.cfg
├── ansible-hosts
├── group_vars
│   ├── ceoslab.yaml
│   └── monitoring.yaml
└── playbooks
    ├── monitoring.yaml
    └── roles
        ├── monitoring_stack
        │   ├── tasks
        │   │   ├── container_grafana.yml
        │   │   ├── container_influxdb.yml
        │   │   ├── container_telegraf_snmp.yml
        │   │   ├── container_telegraf_syslog.yml
        │   │   └── main.yml
        │   └── templates
        │       ├── telegraf_snmp_template.j2
        │       └── telegraf_syslog_template.j2

We will have four monitoring containers:

  • influxdb: our time-series database with two databases: snmp and syslog
  • grafana: GUI to visualize influxdb contents; we will have panels for snmp and syslog queries. It will need to connect to influxdb
  • telegraf-snmp: collector of snmp info from the cEOS containers. The device list is introduced manually in the template. It will write to influxdb
  • telegraf-syslog: collector of syslog messages from the monitoring containers. It will write to influxdb

As the containers are running locally, we define them in the inventory like this:

$ cat ansible-hosts
....
[monitoring]
localhost

We also define some variables in group_vars for the monitoring containers; they will be used in the jinja2 templates and tasks:

$ cat group_vars/monitoring.yaml
# Defaults for Docker containers
docker_mon_net:
  name: monitoring
  subnet: 172.18.0.0/16
  gateway: 172.18.0.1

path_to_containers: /PICK_YOUR_PATH/monitoring-example

var_influxdb:
  username: xxx
  password: xxx123
  snmp_community: xxx123
  db_name:
    snmp: snmp
    syslog: syslog

var_grafana:
  username: admin
  password: xxx123

var_telegraf:
…

So we execute the playbook like this:

ansible master$ ansible-playbook playbooks/monitoring.yaml -vvv --ask-become-pass

The very first time, if you pay attention to the ansible logging, everything should succeed. If for any reason you have to make changes or troubleshoot and execute the full playbook again, some tasks will fail, but not the playbook (this is done with ignore_errors: yes inside a task). For example, the docker network creation will fail as it is already there. The same happens if you try to create the user and dbs in an already running influx instance.
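As a sketch of that pattern (not my exact task, but the shape of it, using the docker_network module and the group_vars defined above):

- name: 1- CREATE DOCKER NETWORK
  docker_network:
      name: "{{ docker_mon_net.name }}"
      ipam_config:
          - subnet: "{{ docker_mon_net.subnet }}"
            gateway: "{{ docker_mon_net.gateway }}"
  become: yes
  ignore_errors: yes   # re-runs fail here because the network already exists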

That playbook just calls the role “monitoring_stack“. The main task file in that role creates the docker network where all the containers will be attached, creates all the containers, and does something hacky with iptables.

As the cEOS lab is built (using docker-topo) independently of this playbook, there are already some iptables rules in place and somehow, when executing the role, the rules change and block the new network from any outbound connectivity.

Before the iptables change in the playbook:

# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#
# iptables -t filter -S DOCKER-ISOLATION-STAGE-2
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-2 -o br-4bd17cfa19a8 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-94c1e813ad6f -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-13ab2b6a0d1d -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-00db5844bbb0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-121978ca0282 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN

I want to avoid DOCKER-ISOLATION-STAGE-2, so I want the “-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT” at the top of that chain.
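The hacky part of the playbook boils down to something like this (my paraphrase of the fix, not the literal task):

sudo iptables -I DOCKER-ISOLATION-STAGE-1 1 -j ACCEPT   # insert ACCEPT at position 1, before the isolation jumps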

This is not the first (nor the last) time this issue bites me. I need to review the docker-topo file carefully and really get my head around the networking expectations of docker.

Another thing about docker networking bites me very often. In my head, each monitoring container has an IP. For example influxdb is 172.18.0.2 and telegraf-syslog is 172.18.0.4. We have configured influxdb to send syslog to the telegraf-syslog container, so I would expect the influxdb container to use its .2 address and everything to stay local (no forwarding, no firewall, etc.). But no, it uses the host IP, 172.18.0.1.

Apart from that, there are several things that I had to review while adapting the role to my environment regarding docker and ansible.

docker documentation:

how to create network: https://docs.docker.com/engine/reference/commandline/network_create/

how to configure container logs: https://docs.docker.com/engine/reference/commandline/container_logs/

how to configure the logging driver in a container: https://docs.docker.com/config/containers/logging/configure/

how to configure syslog in a container: https://docs.docker.com/config/containers/logging/syslog/

how to run commands from a running container: https://docs.docker.com/engine/reference/commandline/exec/

ansible documentation:

become – run commands with sudo in a playbook: https://docs.ansible.com/ansible/latest/user_guide/become.html (–ask-become-pass, -K)

docker container module: https://docs.ansible.com/ansible/latest/modules/docker_container_module.html

grafana data source module: https://docs.ansible.com/ansible/latest/modules/grafana_datasource_module.html

This is important because, via ansible, I had to work out the meaning of become, how to add the syslog config in the containers, and how to add grafana datasources via a module.

All my ansible code is here.

Another thing I had to hardcode is the IP of the telegraf-syslog container in each container playbook:

syslog-address: "udp://172.18.0.4:6514"

$ cat container_influxdb.yml 
---
...
- name: 4- CONTAINER INFLUXDB // LAUNCHING CONTAINER
  docker_container:
      name: influxdb
      image: influxdb
      state: started
      command: "-config /etc/influxdb/influxdb.conf"
      networks:
          - name: "{{ docker_mon_net.name }}"
      purge_networks: yes
      ports:
          - "8086:8086"
      volumes:
          - "{{ path_to_containers }}/influxdb/influxdb.conf:/etc/influxdb/influxdb.conf:ro"
          - "{{ path_to_containers }}/influxdb/data:/var/lib/influxdb"
      log_driver: syslog
      log_options:
        syslog-address: "udp://172.18.0.4:6514"
        tag: influxdb
        syslog-format: rfc5424
  become: yes
  tags:
      - tag_influx
...

Once you have all containers running:

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                  NAMES
dd519ff01d6e        telegraf            "/entrypoint.sh -con…"   4 hours ago         Up 4 hours          8092/udp, 0.0.0.0:161->161/udp, 8125/udp, 8094/tcp     telegraf_snmp
869f158046a6        grafana/grafana     "/run.sh"                4 hours ago         Up 4 hours          0.0.0.0:3000->3000/tcp                                 grafana
dc68f261746b        influxdb            "/entrypoint.sh -con…"   4 hours ago         Up 4 hours          0.0.0.0:8086->8086/tcp                                 influxdb
3662c3c69b21        telegraf            "/entrypoint.sh -con…"   6 hours ago         Up 6 hours          8092/udp, 0.0.0.0:6514->6514/udp, 8125/udp, 8094/tcp   telegraf_syslog
ada1f884f1b7        ceos-lab:4.23.3M    "/sbin/init systemd.…"   28 hours ago        Up 4 hours          0.0.0.0:2002->22/tcp, 0.0.0.0:9002->443/tcp            3node_r03
22d9c4ae9043        ceos-lab:4.23.3M    "/sbin/init systemd.…"   28 hours ago        Up 4 hours          0.0.0.0:2001->22/tcp, 0.0.0.0:9001->443/tcp            3node_r02
fe7046b1f425        ceos-lab:4.23.3M    "/sbin/init systemd.…"   28 hours ago        Up 4 hours          0.0.0.0:2000->22/tcp, 0.0.0.0:9000->443/tcp            3node_r01

You should verify that syslog messages are stored in influxdb:

$ curl -G 'https://localhost:8086/query?db=syslog&pretty=true&u=xxx&p=xxx123' --data-urlencode "q=SELECT * FROM syslog limit 2" --insecure
{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "syslog",
                    "columns": [
                        "time",
                        "appname",
                        "facility",
                        "facility_code",
                        "host",
                        "hostname",
                        "message",
                        "msgid",
                        "procid",
                        "severity",
                        "severity_code",
                        "timestamp",
                        "version"
                    ],
                    "values": [
                        [
                            "2020-07-21T12:08:16.216632823Z",
                            "influxdb",
                            "daemon",
                            3,
                            "3662c3c69b21",
                            "athens",
                            "ts=2020-07-21T12:08:16.169711Z lvl=info msg=\"InfluxDB starting\" log_id=0O8KE_AG000 version=1.8.1 branch=1.8 commit=af0237819ab9c5997c1c0144862dc762b9d8fc25",
                            "influxdb",
                            "11254",
                            "err",
                            3,
                            1595333296000000000,
                            1
                        ],

We can create the new queries in grafana for syslog. The datasources are already created by ansible so we don't have to worry about that.

For creating a query about the number of syslog messages we receive, this is what I did:

grafana – syslog rate query

Most of the entries come from “influxdb”.
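The rate query itself was something along these lines (my reconstruction; the syslog measurement, its message field and the appname tag are all visible in the curl output above):

SELECT count("message") FROM "syslog" WHERE $timeFilter GROUP BY time($__interval), "appname"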

For creating a query with the content of each syslog message:

grafana – syslog content

Here I struggled a bit. I can’t really change much in the table view.

And this is the dashboard with the syslog queries and snmp from the last blog entry:

grafana – dashboard – syslog and snmp

So in the end, I have an ansible role working!

I need to learn more about how to back up stuff from grafana. I have been playing with this:

https://github.com/ysde/grafana-backup-tool

Next thing I want to try is telemetry.

Monitoring: InfluxDB-Telegraf-Grafana

This is something I wanted to try for some time. Normally for network monitoring you use an NMS tool. They can be expensive, cheap or free. I have seen/used Observium and LibreNMS, and many years ago Cacti. There are other tools that can do the job like Zabbix/Nagios/Icinga.

But it seems time-series databases are the new standard. They give you more flexibility as you can create queries and graph them.

There are many tools out there that I don't really know, like Prometheus, the ELK stack (Elasticsearch, Logstash, and Kibana), InfluxDB, telegraf and grafana.

I decided on the InfluxDB-Telegraf-Grafana stack as I could quickly find info based on network scenarios.

What is the role of each one?

Telegraf: collect data
InfluxDB: store data
Grafana: visualize

My main source is again Anton’s blog. All credits to him.


Environment

My network is just 3 Arista cEOS containers via docker. All services will run as containers, so you need docker installed. Everything is IPv4.

InfluxDB

Installation:

// Create directories
mkdir telemetry-example/influxdb
cd telemetry-example/influxdb

// Get influxdb config
docker run --rm influxdb influxd config > influxdb.conf

// Create local data folder for influxdb that we will map
mkdir data
ls -ltr

// Check docker status
docker images
docker ps -a

// Create docker instance for influxdb. Keep in mind that I am giving a name to the instance

docker run -d -p 8086:8086 -p 8088:8088 --name influxdb \
-v $PWD/influxdb.conf:/etc/influxdb/influxdb.conf:ro \
-v $PWD/data:/var/lib/influxdb \
influxdb -config /etc/influxdb/influxdb.conf

// Verify connectivity
curl -i http://localhost:8086/ping

// Create database "test" using http-query (link below for more details)
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE test"
{"results":[{"statement_id":0}]} <-- command was ok!

// Create user/pass for your db. 
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE USER xxx WITH PASSWORD 'xxx123' WITH ALL PRIVILEGES"
{"results":[{"statement_id":0}]} <-- command was ok!

// Create SSL cert for influxdb
docker exec -it influxdb openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/influxdb-selfsigned.key -out /etc/ssl/influxdb-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=influxdb.domain.com"

// Update influxdb.conf for SSL
telemetry-example/influxdb$ vim influxdb.conf
…
https-enabled = true
https-certificate = "/etc/ssl/influxdb-selfsigned.crt"
https-private-key = "/etc/ssl/influxdb-selfsigned.key"
…

// Restart influxdb to take the changes
docker restart influxdb

// Get influxdb IP for using it later
docker container inspect influxdb --format='{{ .NetworkSettings.IPAddress }}'
172.17.0.2

// Verify connectivity via https
curl -i https://localhost:8086/ping --insecure

The verification for HTTPS was a bit more difficult because the result was always correct no matter what query I was running:

$ curl -G https://localhost:8086/query --data-urlencode "db=test" --data-urlencode "q=SELECT * FROM test" --insecure
{"results":[{"statement_id":0}]}

$ curl -XPOST 'https://localhost:8086/query?db=test&u=xxx&p=xxx123' --data-urlencode 'q=SELECT * FROM test' --insecure
{"results":[{"statement_id":0}]}

$ curl -XPOST 'https://localhost:8086/query?db=test&u=xxx&p=yyy1231' --data-urlencode 'q=SELECT * FROM test' --insecure
{"results":[{"statement_id":0}]}

So I decided to see if there was a CLI/shell for influxdb (like mysql has, etc.). And yes, there is one. Keep in mind that you have to use “-ssl -unsafeSsl” at the same time! That confused me a lot.

$ docker exec -it influxdb influx -ssl -unsafeSsl
Connected to https://localhost:8086 version 1.8.1
InfluxDB shell version: 1.8.1
> show databases
name: databases
name
_internal
test
> use test
Using database test
> show series
key
cpu,cpu=cpu-total,host=5f7aa2c5550e

Links about influxdb that are good for the docker creation and the http queries:

https://hub.docker.com/_/influxdb

https://docs.influxdata.com/influxdb/v1.7/tools/api/#query-http-endpoint

Telegraf

I struggled with the SNMP config needed in Telegraf. The installation was fine.

Links:

https://hub.docker.com/_/telegraf

https://docs.influxdata.com/telegraf/v1.14/introduction/getting-started/

Steps:

// Create dir
mkdir telemetry-example/telegraf
cd telemetry-example/telegraf

// Get config file to be modified
docker run --rm telegraf telegraf config > telegraf.conf

// Add the details of influxdb in telegraf.conf. As well, you need to add the devices you want to poll. In my case 172.23.0.2/3/4.

vim telegraf.conf
....
[[outputs.influxdb]]
urls = ["https://172.17.0.2:8086"]
database = "test"
skip_database_creation = false
## Timeout for HTTP messages.
timeout = "5s"
## HTTP Basic Auth
username = "xxx"
password = "xxx123"
## Use TLS but skip chain & host verification
insecure_skip_verify = true
# Retrieves SNMP values from remote agents
[[inputs.snmp]]
## Agent addresses to retrieve values from.
## example: agents = ["udp://127.0.0.1:161"]
## agents = ["tcp://127.0.0.1:161"]
agents = ["udp://172.23.0.2:161","udp://172.23.0.3:161","udp://172.23.0.4:161"]
#
## Timeout for each request.
timeout = "5s"
#
## SNMP version; can be 1, 2, or 3.
version = 2
#
## SNMP community string.
community = "tomas123"
#
## Number of retries to attempt.
retries = 3

This is the SNMP config I added below the SNMPv3 options in [[inputs.snmp]]

#   ## Add fields and tables defining the variables you wish to collect.  This
#   ## example collects the system uptime and interface variables.  Reference the
#   ## full plugin documentation for configuration details.

  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EVENT-MIB::sysUpTimeInstance"

  # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true

  # IF-MIB::ifXTable contains newer High Capacity (HC) counters that do not overflow as fast for a few of the ifTable counters
  [[inputs.snmp.table]]
    name = "interfaceX"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifXTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true

  # EtherLike-MIB::dot3StatsTable contains detailed ethernet-level information about what kind of errors have been logged on an interface (such as FCS error, frame too long, etc)
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "EtherLike-MIB::dot3StatsTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "name"
      oid = "IF-MIB::ifDescr"
      is_tag = true

For more info about the SNMP config in telegraf, these are good links. This is the official github page. And this is the page for the SNMP input plugin, which explains the differences between “field” and “table”.

As well, the link below is really good for explaining the SNMP config in telegraf: “Gathering Data via SNMP”

https://blog.networktocode.com/post/network_telemetry_for_snmp_devices/

Start the container:

docker run -d -p 8125:8125 -p 8092:8092 -p 8094:8094 --name telegraf \
-v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
telegraf -config /etc/telegraf/telegraf.conf

Check the logs:

docker logs telegraf -f
...
2020-07-17T12:45:10Z E! [inputs.snmp] Error in plugin: initializing table interface: translating: exit status 2: MIB search path: /root/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf:/usr/share/mibs/site:/usr/share/snmp/mibs:/usr/share/mibs/iana:/usr/share/mibs/ietf:/usr/share/mibs/netsnmp
Cannot find module (EtherLike-MIB): At line 0 in (none)
EtherLike-MIB::dot3StatsTable: Unknown Object Identifier
...

You will see errors about not being able to find the MIB files! So I used the LibreNMS MIBs. I downloaded the project and copied the MIBs I thought I needed (Arista and some others that don't belong to a vendor). As well, this is noted by Anton in this link:

https://github.com/librenms/librenms/tree/master/mibs

In my case:

/usr/share/snmp/mibs$ ls -ltr
total 4672
-rw-r--r-- 1 root root  52820 Feb  7  2019 UCD-SNMP-MIB.txt
-rw-r--r-- 1 root root  18274 Feb  7  2019 UCD-SNMP-MIB-OLD.txt
-rw-r--r-- 1 root root   8118 Feb  7  2019 UCD-IPFWACC-MIB.txt
-rw-r--r-- 1 root root   6476 Feb  7  2019 UCD-IPFILTER-MIB.txt
-rw-r--r-- 1 root root   3087 Feb  7  2019 UCD-DLMOD-MIB.txt
-rw-r--r-- 1 root root   4965 Feb  7  2019 UCD-DISKIO-MIB.txt
-rw-r--r-- 1 root root   2163 Feb  7  2019 UCD-DEMO-MIB.txt
-rw-r--r-- 1 root root   5039 Feb  7  2019 NET-SNMP-VACM-MIB.txt
-rw-r--r-- 1 root root   4814 Feb  7  2019 NET-SNMP-TC.txt
-rw-r--r-- 1 root root   1226 Feb  7  2019 NET-SNMP-SYSTEM-MIB.txt
-rw-r--r-- 1 root root   2504 Feb  7  2019 NET-SNMP-PERIODIC-NOTIFY-MIB.txt
-rw-r--r-- 1 root root   3730 Feb  7  2019 NET-SNMP-PASS-MIB.txt
-rw-r--r-- 1 root root   1215 Feb  7  2019 NET-SNMP-MONITOR-MIB.txt
-rw-r--r-- 1 root root   2036 Feb  7  2019 NET-SNMP-MIB.txt
-rw-r--r-- 1 root root   9326 Feb  7  2019 NET-SNMP-EXTEND-MIB.txt
-rw-r--r-- 1 root root   9160 Feb  7  2019 NET-SNMP-EXAMPLES-MIB.txt
-rw-r--r-- 1 root root  15901 Feb  7  2019 NET-SNMP-AGENT-MIB.txt
-rw-r--r-- 1 root root   5931 Feb  7  2019 LM-SENSORS-MIB.txt
-rw-r--r-- 1 root root   1913 Jul  2 04:38 GNOME-SMI.txt
-rw-r--r-- 1 root root   5775 Jul 17 13:14 SNMPv2-TM
-rw-r--r-- 1 root root   2501 Jul 17 13:14 SNMPv2-TC-v1
-rw-r--r-- 1 root root  38034 Jul 17 13:14 SNMPv2-TC
-rw-r--r-- 1 root root   1371 Jul 17 13:14 SNMPv2-SMI-v1
-rw-r--r-- 1 root root   8924 Jul 17 13:14 SNMPv2-SMI
-rw-r--r-- 1 root root  29305 Jul 17 13:14 SNMPv2-MIB
-rw-r--r-- 1 root root   8263 Jul 17 13:14 SNMPv2-CONF
-rw-r--r-- 1 root root  17177 Jul 17 13:14 INET-ADDRESS-MIB
-rw-r--r-- 1 root root  71691 Jul 17 13:14 IF-MIB
-rw-r--r-- 1 root root   3129 Jul 17 13:15 ARISTA-BGP4V2-TC-MIB
-rw-r--r-- 1 root root  64691 Jul 17 13:15 ARISTA-BGP4V2-MIB
-rw-r--r-- 1 root root   7155 Jul 17 13:15 ARISTA-VRF-MIB
-rw-r--r-- 1 root root   1964 Jul 17 13:15 ARISTA-SMI-MIB
-rw-r--r-- 1 root root  10901 Jul 17 13:15 ARISTA-NEXTHOP-GROUP-MIB
-rw-r--r-- 1 root root   5826 Jul 17 13:15 ARISTA-IF-MIB
-rw-r--r-- 1 root root   4547 Jul 17 13:15 ARISTA-GENERAL-MIB
-rw-r--r-- 1 root root   7014 Jul 17 13:15 ARISTA-ENTITY-SENSOR-MIB
-rw-r--r-- 1 root root  62277 Jul 17 13:21 IANA-PRINTER-MIB
-rw-r--r-- 1 root root  36816 Jul 17 13:21 IANA-MAU-MIB
-rw-r--r-- 1 root root   4299 Jul 17 13:21 IANA-LANGUAGE-MIB
-rw-r--r-- 1 root root  13954 Jul 17 13:21 IANA-ITU-ALARM-TC-MIB
-rw-r--r-- 1 root root  35628 Jul 17 13:21 IANAifType-MIB
-rw-r--r-- 1 root root  15150 Jul 17 13:21 IANA-GMPLS-TC-MIB
-rw-r--r-- 1 root root  10568 Jul 17 13:21 IANA-CHARSET-MIB
-rw-r--r-- 1 root root   4743 Jul 17 13:21 IANA-ADDRESS-FAMILY-NUMBERS-MIB
-rw-r--r-- 1 root root   3518 Jul 17 13:21 IANA-RTPROTO-MIB
-rw-r--r-- 1 root root  13100 Jul 17 13:31 ENTITY-STATE-MIB
-rw-r--r-- 1 root root  16248 Jul 17 13:31 ENTITY-SENSOR-MIB
-rw-r--r-- 1 root root  59499 Jul 17 13:31 ENTITY-MIB
-rw-r--r-- 1 root root   2114 Jul 17 13:31 BGP4V2-TC-MIB
-rw-r--r-- 1 root root  50513 Jul 17 13:31 BGP4-MIB
-rw-r--r-- 1 root root  96970 Jul 17 13:31 IEEE8021-CFMD8-MIB
-rw-r--r-- 1 root root  86455 Jul 17 13:31 IEEE8021-BRIDGE-MIB
-rw-r--r-- 1 root root 113507 Jul 17 13:31 IEEE802171-CFM-MIB
-rw-r--r-- 1 root root   6321 Jul 17 13:31 ENTITY-STATE-TC-MIB
-rw-r--r-- 1 root root 112096 Jul 17 13:31 IEEE802dot11-MIB
-rw-r--r-- 1 root root  44559 Jul 17 13:31 IEEE8023-LAG-MIB
-rw-r--r-- 1 root root  24536 Jul 17 13:31 IEEE8021-TC-MIB
-rw-r--r-- 1 root root  62182 Jul 17 13:31 IEEE8021-SECY-MIB
-rw-r--r-- 1 root root  96020 Jul 17 13:31 IEEE8021-Q-BRIDGE-MIB
-rw-r--r-- 1 root root  62591 Jul 17 13:31 IEEE8021-PAE-MIB
-rw-r--r-- 1 root root 135156 Jul 17 13:31 IEEE8021-CFM-MIB
-rw-r--r-- 1 root root  48703 Jul 17 13:31 IPV6-MIB
-rw-r--r-- 1 root root  15936 Jul 17 13:31 IPV6-ICMP-MIB
-rw-r--r-- 1 root root   3768 Jul 17 13:31 IPV6-FLOW-LABEL-MIB
-rw-r--r-- 1 root root  31323 Jul 17 13:31 IPMROUTE-STD-MIB
-rw-r--r-- 1 root root  33626 Jul 17 13:31 IPMROUTE-MIB
-rw-r--r-- 1 root root 186550 Jul 17 13:31 IP-MIB
-rw-r--r-- 1 root root  46366 Jul 17 13:31 IP-FORWARD-MIB
-rw-r--r-- 1 root root 240526 Jul 17 13:31 IEEE-802DOT17-RPR-MIB
-rw-r--r-- 1 root root   4400 Jul 17 13:31 IPV6-UDP-MIB
-rw-r--r-- 1 root root   7257 Jul 17 13:31 IPV6-TCP-MIB
-rw-r--r-- 1 root root   2367 Jul 17 13:31 IPV6-TC
-rw-r--r-- 1 root root  14758 Jul 17 13:31 IPV6-MLD-MIB
-rw-r--r-- 1 root root  69938 Jul 17 13:31 MPLS-LSR-MIB
-rw-r--r-- 1 root root  61017 Jul 17 13:31 MPLS-L3VPN-STD-MIB
-rw-r--r-- 1 root root  16414 Jul 17 13:31 LLDP-V2-TC-MIB.mib
-rw-r--r-- 1 root root  77651 Jul 17 13:31 LLDP-V2-MIB.mib
-rw-r--r-- 1 root root  76945 Jul 17 13:31 LLDP-MIB
-rw-r--r-- 1 root root  59110 Jul 17 13:31 LLDP-EXT-MED-MIB
-rw-r--r-- 1 root root  30192 Jul 17 13:31 LLDP-EXT-DOT3-MIB
-rw-r--r-- 1 root root  30182 Jul 17 13:31 LLDP-EXT-DOT1-MIB
-rw-r--r-- 1 root root  36147 Jul 17 13:31 LLDP-EXT-DCBX-MIB
-rw-r--r-- 1 root root 145796 Jul 17 13:31 ISIS-MIB
-rw-r--r-- 1 root root  28564 Jul 17 13:31 TCP-MIB
-rw-r--r-- 1 root root   1291 Jul 17 13:31 RFC-1215
-rw-r--r-- 1 root root  79667 Jul 17 13:31 RFC1213-MIB
-rw-r--r-- 1 root root   2866 Jul 17 13:31 RFC-1212
-rw-r--r-- 1 root root   3067 Jul 17 13:31 RFC1155-SMI
-rw-r--r-- 1 root root  60053 Jul 17 13:31 MPLS-VPN-MIB
-rw-r--r-- 1 root root  95418 Jul 17 13:31 MPLS-TE-STD-MIB
-rw-r--r-- 1 root root  55490 Jul 17 13:31 MPLS-TE-MIB
-rw-r--r-- 1 root root  26327 Jul 17 13:31 MPLS-TC-STD-MIB
-rw-r--r-- 1 root root  76361 Jul 17 13:31 MPLS-LSR-STD-MIB
-rw-r--r-- 1 root root  12931 Jul 17 13:31 RFC1389-MIB
-rw-r--r-- 1 root root  30091 Jul 17 13:31 RFC1284-MIB
-rw-r--r-- 1 root root 147614 Jul 17 13:31 RFC1271-MIB
-rw-r--r-- 1 root root  22342 Jul 17 13:33 SNMP-FRAMEWORK-MIB
-rw-r--r-- 1 root root 223833 Jul 17 13:33 RMON2-MIB
-rw-r--r-- 1 root root 127407 Jul 17 13:33 DIFFSERV-MIB
-rw-r--r-- 1 root root 101324 Jul 17 13:34 TOKEN-RING-RMON-MIB
-rw-r--r-- 1 root root 147822 Jul 17 13:34 RMON-MIB
-rw-r--r-- 1 root root  26750 Jul 17 13:34 INTEGRATED-SERVICES-MIB
-rw-r--r-- 1 root root   1863 Jul 17 13:34 DIFFSERV-DSCP-TC
-rw-r--r-- 1 root root  34162 Jul 17 13:34 SNMP-VIEW-BASED-ACM-MIB
-rw-r--r-- 1 root root  84133 Jul 17 13:34 Q-BRIDGE-MIB
-rw-r--r-- 1 root root  16414 Jul 17 13:34 TRANSPORT-ADDRESS-MIB
-rw-r--r-- 1 root root  39879 Jul 17 13:35 P-BRIDGE-MIB
-rw-r--r-- 1 root root   4660 Jul 17 13:35 HCNUM-TC
-rw-r--r-- 1 root root  54884 Jul 17 13:35 BRIDGE-MIB
-rw-r--r-- 1 root root   2628 Jul 17 13:35 VPN-TC-STD-MIB
-rw-r--r-- 1 root root  10575 Jul 17 13:35 HC-PerfHist-TC-MIB
-rw-r--r-- 1 root root  22769 Jul 17 13:50 SNMP-TARGET-MIB
-rw-r--r-- 1 root root  84492 Jul 17 13:54 EtherLike-MIB
-rw-r--r-- 1 root root  68177 Jul 17 13:56 DISMAN-EXPRESSION-MIB

Once you have the MIB files, you need to copy them across to the telegraf container:

docker cp /usr/share/snmp/mibs/. telegraf:/usr/share/snmp/mibs/
docker restart telegraf

This link helped me a lot to troubleshoot and understand snmpwalk:

https://www.dev-eth0.de/2016/12/06/grafana_snmp/

If you don't want to see the raw OIDs in the snmpwalk output, you need to load all your MIBs with -m ALL:

/usr/share/snmp/mibs$ snmpwalk -Os -v2c -c community -m ALL 172.23.0.2 | head -10
sysDescr.0 = STRING: Linux r01 5.7.0-1-amd64 #1 SMP Debian 5.7.6-1 (2020-06-24) x86_64
sysObjectID.0 = OID: aristaProducts.2600
sysUpTimeInstance = Timeticks: (2503676) 6:57:16.76
sysContact.0 = STRING:
sysName.0 = STRING: r01
sysLocation.0 = STRING: ceoslab
sysServices.0 = INTEGER: 14
sysORLastChange.0 = Timeticks: (35427) 0:05:54.27
sysORID.1 = OID: tcpMIB
sysORID.2 = OID: mib-2.50

$ snmpwalk -v2c -c community -m ALL 172.23.0.2 | head -10
SNMPv2-MIB::sysDescr.0 = STRING: Linux r01 5.7.0-1-amd64 #1 SMP Debian 5.7.6-1 (2020-06-24) x86_64
SNMPv2-MIB::sysObjectID.0 = OID: ARISTA-SMI-MIB::aristaProducts.2600
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2535736) 7:02:37.36
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: r01
SNMPv2-MIB::sysLocation.0 = STRING: ceoslab
SNMPv2-MIB::sysServices.0 = INTEGER: 14
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (35427) 0:05:54.27
SNMPv2-MIB::sysORID.1 = OID: TCP-MIB::tcpMIB
SNMPv2-MIB::sysORID.2 = OID: SNMPv2-SMI::mib-2.50

/usr/share/snmp/mibs$ snmpwalk -Os -v2c -c community 172.23.0.2 | head -10
iso.3.6.1.2.1.1.1.0 = STRING: "Linux r01 5.7.0-1-amd64 #1 SMP Debian 5.7.6-1 (2020-06-24) x86_64"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.30065.1.2600
iso.3.6.1.2.1.1.3.0 = Timeticks: (2505381) 6:57:33.81
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "r01"
iso.3.6.1.2.1.1.6.0 = STRING: "ceoslab"
iso.3.6.1.2.1.1.7.0 = INTEGER: 14
iso.3.6.1.2.1.1.8.0 = Timeticks: (35427) 0:05:54.27
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.2.1.49
iso.3.6.1.2.1.1.9.1.2.2 = OID: iso.3.6.1.2.1.50

And if you want to verify that telegraf is capable of using the MIBs:

docker exec -it telegraf snmpwalk -Os -v2c -c community -m ALL 172.23.0.2 | head -10

Now, you can check if telegraf is updating influxdb. If there is output, it is good!

$ curl -G 'https://localhost:8086/query?db=test&pretty=true&u=xxx&p=xxx123' --data-urlencode "q=SELECT * FROM interfaceX limit 2" --insecure
{
    "results": [
        {
            "statement_id": 0,
            "series": [
                {
                    "name": "interfaceX",
                    "columns": [
                        "time",
                        "agent_host",
                        "host",
                        "hostname",
                        "ifAlias",
                        "ifConnectorPresent",
                        "ifCounterDiscontinuityTime",
                        "ifDescr",
                        "ifHCInBroadcastPkts",
                        "ifHCInMulticastPkts",
                        "ifHCInOctets",
                        "ifHCInUcastPkts",
                        "ifHCOutBroadcastPkts",
                        "ifHCOutMulticastPkts",
                        "ifHCOutOctets",
                        "ifHCOutUcastPkts",
                        "ifHighSpeed",
                        "ifInBroadcastPkts",
                        "ifInMulticastPkts",
                        "ifLinkUpDownTrapEnable",
                        "ifName",
                        "ifOutBroadcastPkts",
                        "ifOutMulticastPkts",
                        "ifPromiscuousMode",
                        "name"
                    ],
                    "values": [
                        [
                            "2020-07-17T13:00:10Z",
                            "172.23.0.2",
                            "6778bdf4ea85",
                            "r01",
                            null,
                            1,
                            0,
                            "Ethernet1",
                            0,
                            2118,
                            2013125,
                            7824,
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            2118,
                            1,
                            "Ethernet1",
                            0,
                            0,
                            2,
                            null
                        ],
                        [
                            "2020-07-17T13:00:10Z",
                            "172.23.0.2",
                            "6778bdf4ea85",
                            "r01",
                            "CORE Loopback",
                            2,
                            0,
                            "Loopback1",
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            0,
                            1,
                            "Loopback1",
                            0,
                            0,
                            2,
                            null
                        ]
                    ]
                }
            ]
        }
    ]
}

Grafana

I had seen Grafana before but never used it, so configuring the queries was a bit of a challenge, but I was lucky and found very good blogs for that. The installation process is ok:

// Create folder for grafana and data
mkdir -p telemetry-example/grafana/data
cd telemetry-example/grafana

// Create docker instance
docker run -d -p 3000:3000 --name grafana \
--user root \
-v $PWD/data:/var/lib/grafana \
grafana/grafana

// Create SSL cert for grafana
docker exec -it grafana openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/grafana-selfsigned.key -out /etc/ssl/grafana-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=grafana.domain.com"

// Copy grafana config so we can update it
docker cp grafana:/etc/grafana/grafana.ini grafana.ini

// Update grafana config with SSL
vim grafana.ini
############################## Server
[server]
# Protocol (http, https, h2, socket)
protocol = https
…
# https certs & key file
cert_file = /etc/ssl/grafana-selfsigned.crt
cert_key = /etc/ssl/grafana-selfsigned.key

// Copy back the config to the container and restart
docker cp grafana.ini grafana:/etc/grafana/grafana.ini
docker container restart grafana

Now you can open grafana in your browser at “https://0.0.0.0:3000/” using admin/admin.

You need to add a data source pointing to our influxdb container: pick the “influxdb” type and fill in the values.
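The values follow from the setup above (reconstructed, not a literal copy):

URL: https://172.17.0.2:8086
Database: test
User: xxx / Password: xxx123
Skip TLS Verify: enabled (self-signed certificate)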

Now, you need to create a dashboard with panels.

Links that I reviewed for creating the dashboard:

For creating a panel, the link below was the best, especially the section “Interface Throughput”. Big thanks to the author.

https://lkhill.com/telegraf-influx-grafana-network-stats/

This is my query for checking all the details.
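It was along these lines (my reconstruction, following the pattern from the lkhill post and the interfaceX measurement shown earlier; the * 8 converts octets per second to bits per second):

SELECT non_negative_derivative(mean("ifHCInOctets"), 1s) * 8 FROM "interfaceX" WHERE "hostname" = 'r01' AND "ifName" = 'Ethernet1' AND $timeFilter GROUP BY time($__interval)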

And this is my final dashboard.

SNMP Config

BTW, you need to configure SNMP in the switches so telegraf can poll them:

snmp-server location ceoslab
snmp-server community xxx123 ro
snmp-server host 172.17.0.1 version 2c xxx123

In my case, the Influx-Telegraf-Grafana stack of containers is running on the default bridge. Each container has its own IP, but as the Arista containers are in a different docker network, traffic needs to “route”, so the IP of the telegraf container is NAT-ed to 172.17.0.1 from the switches' point of view.

Next

I would like to manage all this process via Ansible… something like this, but it will take me time:

https://github.com/akarneliuk/service_provider_fabric/tree/master/ansible/roles/cloud_monitoring/tasks

Notes

As usual, I have struggled, but I have learned a lot and in the end things are working. I am happy with that.

Debian – apt-file

In the past, I had to use CentOS systems a lot at work and there was something I really liked from rpm: “yum provides”, which tells you which package you need to install based on the command you need.

I always struggle to do that in Debian. I hope I remember it next time. Based on this link:

# aptitude install apt-file

# apt-file update

# apt-file search snmpwalk
libnet-snmp-perl: /usr/share/doc/libnet-snmp-perl/examples/snmpwalk.pl
libsnmp-session-perl: /usr/share/doc/libsnmp-session-perl/examples/snmpwalkh.pl
python3-pysnmp4-apps: /usr/bin/pysnmpwalk
python3-pysnmp4-apps: /usr/share/man/man1/pysnmpwalk.1.gz
snmp: /usr/bin/snmpwalk  <=== THIS IS WHAT I WANT !!!!
snmp: /usr/share/man/man1/snmpwalk.1.gz
snmpsim: /usr/share/doc/snmpsim/examples/data/foreignformats/linux.snmpwalk.gz
snmpsim: /usr/share/doc/snmpsim/examples/data/foreignformats/winxp1.snmpwalk.gz

# aptitude install snmp