Evolved-Indiana

This week I realised that Juniper JunOS is moving to Linux… it is called Evolved. I guess they will still support the FreeBSD version, but long term it will be Linux. I am quite surprised, as this was actually announced in early 2020; I am always late joining the party. So all the big boys are running Linux at some level: Cisco did it some time ago with NX-OS, Brocade/Extreme did it too with SLX (based on Ubuntu), and obviously Arista with EOS (based on Fedora). So the trend of more "open" network OSes is on the rise.

And as well, I finished the "Indiana Jones and the Temple of Doom" book. The Indiana Jones films are among my favourites… and although this one was always considered the "worst" (I have erased the "fourth" from my mind), I really enjoyed the book. It was like watching the movie at a slow pace, and I didn't care that I knew the plot. I will likely get the other books.

NTS

From a new Cloudflare post, I learned that NTS is now a standard. To be honest, I can't remember there being any effort to make NTP secure. In recent years I have seen development in PTP for time sync in financial systems, but nothing else. So it is nice to see this happening. We only need to encrypt BGP and we are done with the internet… oh wait. Dreaming is free.

So I am trying to install and configure NTS on my system following these links: link1 link2

I just installed ntpsec via the Debian package system and that's it, ntpsec is running:

# apt install ntpsec
...
# service ntpsec status
● ntpsec.service - Network Time Service
Loaded: loaded (/lib/systemd/system/ntpsec.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-10-04 20:35:58 BST; 6min ago
Docs: man:ntpd(8)
Main PID: 292116 (ntpd)
Tasks: 1 (limit: 9354)
Memory: 10.2M
CGroup: /system.slice/ntpsec.service
└─292116 /usr/sbin/ntpd -p /run/ntpd.pid -c /etc/ntpsec/ntp.conf -g -N -u ntpsec:ntpsec
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_check: processing 3.debian.pool.ntp.org, 8, 101
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_take_status: 3.debian.pool.ntp.org=>good, 8
#

Checking the default config, there is nothing set up to use NTS, so I made some changes based on the links above:

# vim /etc/ntpsec/ntp.conf
...


# Public NTP servers supporting Network Time Security:
server time.cloudflare.com:1234 nts

# Example 2: NTS-secured NTP (default NTS-KE port (123); using certificate pool of the operating system)
server ntp1.glypnod.com iburst minpoll 3 maxpoll 6 nts

#Via https://www.netnod.se/time-and-frequency/how-to-use-nts
server nts.ntp.se:3443 nts iburst
server nts.sth1.ntp.se:3443 nts iburst
server nts.sth2.ntp.se:3443 nts iburst

After restarting, I still don't see NTS in sync 🙁

# service ntpsec restart
...
# ntpq -puw
remote refid st t when poll reach delay offset jitter
time.cloudflare.com .NTS. 16 0 - 64 0 0ns 0ns 119ns
ntp1.glypnod.com .NTS. 16 5 - 32 0 0ns 0ns 119ns
2a01:3f7:2:202::202 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:52::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:62::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
0.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
1.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
2.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
3.debian.pool.ntp.org .POOL. 16 p - 64 0 0ns 0ns 119ns
-229.191.57.185.no-ptr.as201971.net .GPS. 1 u 25 64 177 65.754ms 26.539ms 7.7279ms
+ns3.turbodns.co.uk 85.199.214.99 2 u 23 64 177 12.200ms 2.5267ms 1.5544ms
+time.cloudflare.com 10.21.8.19 3 u 25 64 177 5.0848ms 2.6248ms 2.6293ms
-ntp1.wirehive.net 202.70.69.81 2 u 21 64 177 9.6036ms 2.3986ms 1.9814ms
+ns4.turbodns.co.uk 195.195.221.100 2 u 21 64 177 10.896ms 2.9528ms 1.5288ms
-lond-web-1.speedwelshpool.com 194.58.204.148 2 u 23 64 177 5.6202ms 5.8218ms 3.2582ms
-time.shf.uk.as44574.net 85.199.214.98 2 u 29 64 77 9.0190ms 4.9419ms 2.5810ms
lux.22pf.org .INIT. 16 u - 64 0 0ns 0ns 119ns
ns1.thorcom.net .INIT. 16 u - 64 0 0ns 0ns 119ns
time.cloudflare.com .INIT. 16 u - 64 0 0ns 0ns 119ns
time.rdg.uk.as44574.net .INIT. 16 u - 64 0 0ns 0ns 119ns
-herm4.doylem.co.uk 185.203.69.150 2 u 19 64 177 15.024ms 9.5098ms 3.2011ms
-213.251.53.217 193.62.22.74 2 u 17 64 177 5.7211ms 1.4122ms 2.1895ms
*babbage.betadome.net 85.199.214.99 2 u 20 64 177 4.8614ms 4.1187ms 2.5533ms
#
#
# ntpq -c nts
NTS client sends: 56
NTS client recvs good: 0
NTS client recvs w error: 0
NTS server recvs good: 0
NTS server recvs w error: 0
NTS server sends: 0
NTS make cookies: 0
NTS decode cookies: 0
NTS decode cookies old: 0
NTS decode cookies too old: 0
NTS decode cookies error: 0
NTS KE probes good: 8
NTS KE probes_bad: 0
NTS KE serves good: 0
NTS KE serves_bad: 0
#

I ran tcpdump filtering on TCP ports 1234 (Cloudflare) and 3443 (Netnod), and I can see my system trying to negotiate NTS with Cloudflare and Netnod, but both sessions get a TCP RST 🙁
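Something along these lines (a sketch; the interface name is just a placeholder, adjust as needed):

# NTS-KE runs over TCP/TLS, so filtering on the two key-exchange ports is enough to see the handshakes (and the resets)
sudo tcpdump -ni <interface> 'tcp port 1234 or tcp port 3443'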

Need to carry on researching…

BPF – Linux

The last time I tried BPF was via an Ubuntu VM prepared for BPF. But this week, checking another article, I realised that I can run BPF natively on my laptop!

So aptitude did the job of installing the package, and I didn't have to install a new kernel or any patches, so it was super easy, and I can see it is working, as per the article:

# apt depends bpftrace
bpftrace
Depends: libbpfcc (>= 0.12.0)
Depends: libc6 (>= 2.27)
Depends: libclang1-9 (>= 1:9~svn359771-1~)
Depends: libgcc-s1 (>= 3.0)
Depends: libllvm9 (>= 1:9~svn298832-1~)
Depends: libstdc++6 (>= 5.2)
#
#
# dpkg -l | grep bpftrace
ii bpftrace 0.11.0-1 amd64 high-level tracing language for Linux eBPF
#
# uname -a
Linux athens 5.8.0-1-amd64 #1 SMP Debian 5.8.7-1 (2020-09-05) x86_64 GNU/Linux
#
#
# bpftrace -e 'software:faults:1 { @[comm] = count(); }'
Attaching 1 probe...
^C
@[BatteryStatusNo]: 1
@[slack]: 52
@[Xorg]: 139
@[VizCompositorTh]: 455
@[Chrome_IOThread]: 463
@[ThreadPoolForeg]: 1305
@[CompositorTileW]: 2272
@[Compositor]: 3789
@[Chrome_ChildIOT]: 4610
@[chrome]: 8020
#

And I ran the same script:

# bpftrace bpftrace-example.bt
Attaching 2 probes...
Sampling CPU at 99hz... Hit Ctrl-C to end.
^C
@cpu:
[0, 1) 33 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1, 2) 23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[2, 3) 31 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[3, 4) 23 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
#
Now I really need to play with it on my own system, no excuse...
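For reference, a minimal sketch of what that example script probably looks like. This is my reconstruction from the output above (two probes, 99Hz sampling, a CPU histogram), not the exact file from the article:

# bpftrace-example.bt (my reconstruction, not the original file)
BEGIN
{
	printf("Sampling CPU at 99hz... Hit Ctrl-C to end.\n");
}

// fire 99 times per second on every CPU and build a linear histogram of which CPU ID was sampled
profile:hz:99
{
	@cpu = lhist(cpu, 0, 8, 1);
}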

Screen-Brightness

Another thing I realised lately was that my laptop screen was very dark, not bright at all like my external screen, so it was hard to use both. I use Debian Testing with LXDE as it is quite light and I don't need anything as heavy as Gnome/KDE. So I struggled to work out how to adjust the brightness, but finally got it.

I had to try different programs, but finally a blog showed all the possibilities and I found the one that works for me.

$ brightnessctl set 800 -d intel_backlight
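For reference, these brightnessctl commands should show where that 800 value sits (it is an absolute value against the device maximum, and percentages are accepted too):

# list the backlight/LED devices known to brightnessctl
$ brightnessctl -l
# show the maximum raw value for the device, to put "800" in context
$ brightnessctl -d intel_backlight max
# setting a percentage also works
$ brightnessctl -d intel_backlight set 50%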

The next thing was to make sure it stays effective after reboots… I am not sure this is a very clean solution, but I just added that command to my .bashrc. It works. Moving on.

VirtualBox-Python2-Debian-Dependencies

This week I realised that Debian was removing python2 support and, surprisingly… it was trying to remove VirtualBox from my system…

So it seems that VirtualBox still depends on python2. A bit disappointing.

I am not really keen on VirtualBox, but I have had to use it lately for my Kubernetes training and for testing OpenBSD. I prefer using KVM/QEMU. So I know I will have to work out how to do Kubernetes/BSD outside VirtualBox…

Something I learned along the way was how to check the dependencies of a package in Debian… I guess it is about time.

apt-cache depends package-name
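For example (package names just for illustration), and the reverse direction is handy too:

$ apt-cache depends virtualbox
# and to see what depends on a given package:
$ apt-cache rdepends virtualbox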

ZSWAP

Yesterday I read an article about zswap for the first time. I thought it was something new, but a bit of searching showed that it started around 2013, as per this article.

I have a 2015 Dell XPS13 with an i7, 128GB SSD and 8GB RAM. But sometimes my system struggles with memory, and when swapping kicks in, the system gets stuck. I have played with swappiness in the past without much improvement (or was that on my former company laptop?). Anyway, this is my current swappiness:

# cat /proc/sys/vm/swappiness
60
# sysctl vm.swappiness
vm.swappiness = 60

Even with "suspender" enabled in Chrome, I have just over 2GB of RAM free.

$ top
top - 10:23:14 up 6 days, 1:54, 1 user, load average: 0.23, 0.41, 0.44
Tasks: 333 total, 1 running, 332 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.2 sy, 0.0 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 56.6/7933.8 [||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ]
MiB Swap: 1.3/6964.0 [| ]

Ok, first check that you have zswap available in your kernel:

$ uname -a
Linux x 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux

$ cat /boot/config-$(uname -r) | grep -i zswap
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set

$ cat /sys/module/zswap/parameters/enabled
N

If you have "N" in the last command and "CONFIG_ZSWAP=y", then your system supports zswap but it is not enabled.

$ echo Y | sudo tee /sys/module/zswap/parameters/enabled

$ cat /sys/module/zswap/parameters/enabled
Y

Now you can tune some parameters to increase the compression ratio (3:1):

# list all parameters
grep . /sys/module/zswap/parameters/*

# change  compression params
echo z3fold | sudo tee /sys/module/zswap/parameters/zpool
echo lzo | sudo tee /sys/module/zswap/parameters/compressor

How do you check that zswap is working? I followed this email thread to find some clues:

# cd /sys/kernel/debug/zswap

/sys/kernel/debug/zswap# grep . *
duplicate_entry:0
pool_limit_hit:0
pool_total_size:237568
reject_alloc_fail:0
reject_compress_poor:0
reject_kmemcache_fail:0
reject_reclaim_fail:0
same_filled_pages:33
stored_pages:151
written_back_pages:0

###############
## 12h later ##
###############

/sys/kernel/debug/zswap# grep . *
duplicate_entry:0
pool_limit_hit:0
pool_total_size:17944576
reject_alloc_fail:0
reject_compress_poor:8
reject_kmemcache_fail:0
reject_reclaim_fail:0
same_filled_pages:4146
stored_pages:16927
written_back_pages:0

So it seems some values are increasing.
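A rough way to estimate the compression ratio from those counters (my own interpretation, so take it with a pinch of salt) is to compare the uncompressed size of the stored pages against the pool size. With the 12h numbers: 16927 pages x 4096 bytes ≈ 69.3MB of swapped data held in a pool_total_size of about 17.9MB, i.e. roughly 3.9:1.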

The difficult part seems to be making this change survive a reboot. The easy way is just to update "/etc/rc.local".

# cat /etc/rc.local
....
# enabling zswap - 2020-08
echo Y >/sys/module/zswap/parameters/enabled
echo z3fold >/sys/module/zswap/parameters/zpool
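An alternative I have seen mentioned (I have not tested it) is to enable zswap via kernel boot parameters instead of rc.local:

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet zswap.enabled=1 zswap.zpool=z3fold"
# then regenerate the grub config
sudo update-grub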

journalctl -fu

I rebooted my laptop today and realised that Docker wasn't running… It was running before the reboot, and I didn't upgrade anything related to Docker (or so I thought).

$ docker ps -a
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
$

Let's check the status and start it if needed:

root@athens:/var/log# service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:34:03 BST; 7min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 12015 (code=exited, status=1/FAILURE)
Aug 21 08:34:03 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:34:03 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:34:03 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:34:03 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:34:03 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:34:42 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:34:42 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:34:42 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#
root@athens:/var/log#
root@athens:/var/log# service docker start
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
root@athens:/var/log# systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:41:20 BST; 5s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 35305 ExecStart=/usr/sbin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 35305 (code=exited, status=1/FAILURE)
Aug 21 08:41:19 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:41:19 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:19 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:41:20 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#

Ok, so not much info… let's check the recommended details:

root@athens:/var/log# journalctl -xe
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit docker.socket has begun execution.
░░
░░ The job identifier is 4236.
Aug 21 08:41:20 athens systemd[1]: Listening on Docker Socket for the API.
░░ Subject: A start job for unit docker.socket has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit docker.socket has finished successfully.
░░
░░ The job identifier is 4236.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit docker.service has entered the 'failed' state with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
░░ Subject: A start job for unit docker.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit docker.service has finished with a failure.
░░
░░ The job identifier is 4113 and the job result is failed.
Aug 21 08:41:20 athens systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit docker.socket has entered the 'failed' state with result 'service-start-limit-hit'.
root@athens:/var/log# systemctl status docker.service log
Unit log.service could not be found.
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:41:20 BST; 1min 2s ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Process: 35305 ExecStart=/usr/sbin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 35305 (code=exited, status=1/FAILURE)
Aug 21 08:41:19 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:41:19 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:19 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:41:20 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#

So "journalctl -xe" and "systemctl status docker.service log" gave nothing useful…

So I searched for "docker.socket: Failed with result 'service-start-limit-hit'", as it was the message that looked the most suspicious. I landed here and tried a command I didn't know to get more logs: "journalctl -fu docker".

root@athens:/var/log# journalctl -fu docker
-- Logs begin at Sun 2020-02-02 21:12:23 GMT. --
Aug 21 08:42:41 athens dockerd[35469]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaStat
Aug 21 08:42:41 athens dockerd[35469]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaEntry
Aug 21 08:42:41 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:42:41 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:42:41 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:42:41 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:42:41 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:42:41 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:42:41 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:42:41 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:44:32 athens systemd[1]: Starting Docker Application Container Engine…
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.Metrics
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.HugetlbStat
Aug 21 08:44:32 athens dockerd[35538]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.PidsStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.CPUStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.CPUUsage
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.Throttle
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.MemoryStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.MemoryEntry
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.BlkIOStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.BlkIOEntry
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaEntry
Aug 21 08:44:32 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:44:32 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:44:32 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:44:32 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 1.
Aug 21 08:44:32 athens systemd[1]: Stopped Docker Application Container Engine.

And now, yes, I could see the Docker logs properly… and found the culprit (a broken "/etc/docker/daemon.json") and fixed it. I am pretty sure that the last time I played with "/etc/docker/daemon.json" I restarted Docker and it was fine…
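Since the error was complaining about JSON syntax in "/etc/docker/daemon.json", a quick sanity check before restarting Docker would be something like this (any JSON parser will do; jq is just an example if you have it installed):

# validate the JSON syntax of the daemon config
python3 -m json.tool /etc/docker/daemon.json
# or, with jq
jq . /etc/docker/daemon.json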

Anyway, I learned a new command, "journalctl -fu SERVICE" (follow the journal for a single unit), to troubleshoot services.

Forward TCPDump to Wireshark

Reading this blog entry, I realised that I have very likely never tried forwarding tcpdump output to Wireshark. How many times have I taken a pcap on a switch and then had to download it to see the details in Wireshark…

I guess you can hit some blocking points in firewalls (at least for the two-step option).

So I tried the single command with a switch in my ceoslab and it works!

Why does it work?

ssh <username>@<switch> "bash tcpdump -s 0 -U -n -w - -i <interface>" | wireshark -k -i -

The ssh command is actually executing the "bash tcpdump…" remotely. But the key is the "-U" and "-w -" flags. "-U" in conjunction with "-w" sends each packet without waiting for the buffer to fill. Then "-w -" says to write the output to stdout instead of a file. If you run the command without "-U" it still works, but it updates a bit more slowly as it needs to fill the buffers first.

From the tcpdump manual:

       -U
       --packet-buffered
              If the -w option is not specified, make the printed packet output ``packet-buffered''; i.e., as the description of the contents of each packet is printed, it will be written to the standard  output,  rather  than, when not writing to a terminal, being written only when the output buffer fills.

              If  the  -w  option  is  specified, make the saved raw packet output ``packet-buffered''; i.e., as each packet is saved, it will be written to the output file, rather than being written only when the output buffer fills.

              The -U flag will not be supported if tcpdump was built with an older version of libpcap that lacks the pcap_dump_flush() function.

......
   -w file
          Write the raw packets to file rather than parsing and printing them out.  They can later be printed with the -r option.  Standard output is used if file is ``-''.

          This  output will be buffered if written to a file or pipe, so a program reading from the file or pipe may not see packets for an arbitrary amount of time after they are received.  Use the -U flag to cause packets to be written as soon as they are received.

And the stdout of that remote process is the stdout of the ssh command, so we redirect that output via a pipe "|" and it is sent as input to Wireshark thanks to "-i -", which makes Wireshark read from stdin (which is the stdout from the tcpdump on the switch!).

From the Wireshark manual:

       -i|--interface  <capture interface>|-
           Set the name of the network interface or pipe to use for live packet capture.

           Network interface names should match one of the names listed in "wireshark -D" (described above); a number, as reported by "wireshark -D", can also be used.  If you're using UNIX, "netstat -i", "ifconfig -a" or "ip link" might also work to list interface names, although not all versions of UNIX support the -a flag to ifconfig.

           If no interface is specified, Wireshark searches the list of interfaces, choosing the first non-loopback interface if there are any non-loopback interfaces, and choosing the first loopback interface if there are no non-loopback interfaces.  If there are no interfaces at all, Wireshark reports an error and doesn't start the capture.

           Pipe names should be either the name of a FIFO (named pipe) or "-" to read data from the standard input.  On Windows systems, pipe names must be of the form "\\.\pipe\pipename".  Data read from pipes must be in standard pcapng or pcap format. Pcapng data must have the same endianness as the capturing host.

           This option can occur multiple times. When capturing from multiple interfaces, the capture file will be saved in pcapng format.

....

       -k  Start the capture session immediately.  If the -i flag was specified, the capture uses the specified interface.  Otherwise, Wireshark searches the list of interfaces, choosing the first non-loopback interface if
           there are any non-loopback interfaces, and choosing the first loopback interface if there are no non-loopback interfaces; if there are no interfaces, Wireshark reports an error and doesn't start the capture.

The two-step option relies on "nc" to send/receive the data, but it is the same idea regarding the tcpdump/wireshark flags using "-":

On switch: tcpdump -s 0 -U -n -w - -i <interface> | nc <computer-ip> <port>
On PC: netcat -l -p <port> | wireshark -k -S -i -

Linux Networking – Bonding/Bridging/VxLAN

Bonding

$ sudo modprobe bonding
$ ip link help bond
$ sudo ip link add bond0 type bond mode 802.3ad
$ sudo ip link set eth0 master bond0
$ sudo ip link set eth1 master bond0
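To round it off (these extra steps are my assumption, not part of my original notes): the member links usually need to be down before they can be enslaved, and then you bring the bond up and check the LACP state:

$ sudo ip link set up dev bond0
$ cat /proc/net/bonding/bond0    # shows the 802.3ad partner details and member link states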

Bridging: vlans + trunks

ip neigh show // neighbour (ARP) table
ip route show // routing (L3) table

ip route add default via 192.168.1.1 dev eth1

sudo modprobe 8021q

// create bridge and add links to bridge (switch)
sudo ip link add br0 type bridge vlan_filtering 1 // native vlan = 1
sudo ip link set eth1 master br0
sudo ip link set eth2 master br0
sudo ip link set eth3 master br0

// make eth1 access port for v11
sudo bridge vlan add dev eth1 vid 11 pvid untagged

// make eth3 access port for v12
sudo bridge vlan add dev eth3 vid 12 pvid untagged

// make eth2 trunk port for v11 and v12
sudo bridge vlan add dev eth2 vid 11
sudo bridge vlan add dev eth2 vid 12

// enable bridge and links
sudo ip link set up dev br0
sudo ip link set up dev eth1
sudo ip link set up dev eth2
sudo ip link set up dev eth3

bridge link show
bridge vlan show
bridge fdb show

VxLAN

I haven't tried this yet:

Linux System 1
  sudo ip link add br0 type bridge vlan_filtering 1
  sudo ip link add vlan10 type vlan id 10 link bridge protocol none
  sudo ip addr add 10.0.0.1/24 dev vlan10
  sudo ip link add vtep10 type vxlan id 1010 local 10.1.0.1 remote 10.3.0.1 learning
  sudo ip link set eth1 master br0
  sudo bridge vlan add dev eth1 vid 10 pvid untagged

Linux System 2
  sudo ip link add br0 type bridge vlan_filtering 1
  sudo ip link add vlan10 type vlan id 10 link bridge protocol none
  sudo ip addr add 10.0.0.2/24 dev vlan10
  sudo ip link add vtep10 type vxlan id 1010 local 10.3.0.1 remote 10.1.0.1 learning
  sudo ip link set eth1 master br0
  sudo bridge vlan add dev eth1 vid 10 pvid untagged
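Since I haven't tested it, I suspect a couple of steps are missing (my assumption): the VTEP still needs to join the bridge, be mapped to the VLAN, and everything brought up, something like this on each system:

  sudo ip link set vtep10 master br0
  sudo bridge vlan add dev vtep10 vid 10 pvid untagged
  sudo ip link set up dev br0
  sudo ip link set up dev eth1
  sudo ip link set up dev vtep10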

Monitoring: InfluxDB-Telegraf-Grafana

This is something I have wanted to try for some time. Normally for network monitoring you use an NMS tool. They can be expensive, free or cheap. I have seen/used Observium and LibreNMS, and many years ago, Cacti. There are other tools that can do the job, like Zabbix/Nagios/Icinga.

But it seems time-series databases are the new standard. They give you more flexibility, as you can create queries and graph them.

There are many tools out there that I don't really know, like Prometheus, the ELK stack (Elasticsearch, Logstash and Kibana), InfluxDB, Telegraf and Grafana.

I decided on the InfluxDB-Telegraf-Grafana stack, as I could quickly find info based on network scenarios.

What is the role of each one?

Telegraf: collect data
InfluxDB: store data
Grafana: visualize

My main source is again Anton’s blog. All credits to him.


Environment

My network is just three Arista cEOS containers running via Docker. All the services will run as containers too, so you need Docker installed. Everything is IPv4.

InfluxDB

Installation:

// Create directories
mkdir telemetry-example/influxdb
cd telemetry-example/influxdb

// Get influxdb config
docker run --rm influxdb influxd config > influxdb.conf

// Create local data folder for influxdb that we will map
mkdir data
ls -ltr

// Check docker status
docker images
docker ps -a

// Create docker instance for influxdb. Keep in  mind that I am giving a name to the instance

docker run -d -p 8086:8086 -p 8088:8088 --name influxdb \
-v $PWD/influxdb.conf:/etc/influxdb/influxdb.conf:ro \
-v $PWD/data:/var/lib/influxdb \
influxdb -config /etc/influxdb/influxdb.conf

// Verify connectivity
curl -i http://localhost:8086/ping

// Create database "test" using http-query (link below for more details)
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE test"
{"results":[{"statement_id":0}]} <-- command was ok!

// Create user/pass for your db. 
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE USER xxx WITH PASSWORD 'xxx123' WITH ALL PRIVILEGES"
{"results":[{"statement_id":0}]} <-- command was ok!

// Create SSL cert for influxdb
docker exec -it influxdb openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/influxdb-selfsigned.key -out /etc/ssl/influxdb-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=influxdb.domain.com"

// Update influxdb.conf for SSL
telemetry-example/influxdb$ vim influxdb.conf
…
https-enabled = true
https-certificate = "/etc/ssl/influxdb-selfsigned.crt"
https-private-key = "/etc/ssl/influxdb-selfsigned.key"
…

// Restart influxdb to take the changes
docker restart influxdb

// Get influxdb IP for using it later
docker container inspect influxdb --format='{{ .NetworkSettings.IPAddress }}'
172.17.0.2

// Verify connectivity via https
curl -i https://localhost:8086/ping --insecure

The verification over HTTPS was a bit more difficult, because the result always looked correct no matter what query I ran:

$ curl -G https://localhost:8086/query --data-urlencode "db=test" --data-urlencode "q=SELECT * FROM test" --insecure
{"results":[{"statement_id":0}]}

$ curl -XPOST 'https://localhost:8086/query?db=test&u=xxx&p=xxx123' --data-urlencode 'q=SELECT * FROM test' --insecure
{"results":[{"statement_id":0}]}

$ curl -XPOST 'https://localhost:8086/query?db=test&u=xxx&p=yyy1231' --data-urlencode 'q=SELECT * FROM test' --insecure
{"results":[{"statement_id":0}]}

So I decided to see if there was a CLI/shell for InfluxDB (like in MySQL, etc.). And yes, there is one. Keep in mind that you have to use "-ssl -unsafeSsl" at the same time! That confused me a lot.

$ docker exec -it influxdb influx -ssl -unsafeSsl
Connected to https://localhost:8086 version 1.8.1
InfluxDB shell version: 1.8.1
> show databases
name: databases
name
_internal
test
> use test
Using database test
> show series
key
cpu,cpu=cpu-total,host=5f7aa2c5550e

Links about InfluxDB that are good for the Docker creation and the HTTP queries:

https://hub.docker.com/_/influxdb

https://docs.influxdata.com/influxdb/v1.7/tools/api/#query-http-endpoin

Telegraf

I struggled with the SNMP config needed in Telegraf. The installation was fine.

Links:

https://hub.docker.com/_/telegraf

https://docs.influxdata.com/telegraf/v1.14/introduction/getting-started/

Steps:

// Create dir
mkdir telemetry-example/telegraf
cd telemetry-example/telegraf

// Get config file to be modified
docker run --rm telegraf telegraf config > telegraf.conf

// Add the details of influxdb in telegraf.conf. As well, you need to add the devices you want to poll. In my case 172.23.0.2/3/4.

vim telegraf.conf
....
[[outputs.influxdb]]
urls = ["https://172.17.0.2:8086"]
database = "test"
skip_database_creation = false
## Timeout for HTTP messages.
timeout = "5s"
## HTTP Basic Auth
username = "xxx"
password = "xxx123"
## Use TLS but skip chain & host verification
insecure_skip_verify = true
# Retrieves SNMP values from remote agents
[[inputs.snmp]]
## Agent addresses to retrieve values from.
## example: agents = ["udp://127.0.0.1:161"]
## agents = ["tcp://127.0.0.1:161"]
agents = ["udp://172.23.0.2:161","udp://172.23.0.3:161","udp://172.23.0.4:161"]
#
## Timeout for each request.
timeout = "5s"
#
## SNMP version; can be 1, 2, or 3.
version = 2
#
## SNMP community string.
community = "tomas123"
#
## Number of retries to attempt.
retries = 3

This is the SNMP config I added below the SNMPv3 options in [[inputs.snmp]]:

#   ## Add fields and tables defining the variables you wish to collect.  This
#   ## example collects the system uptime and interface variables.  Reference the
#   ## full plugin documentation for configuration details.

  [[inputs.snmp.field]]
    name = "hostname"
    oid = "RFC1213-MIB::sysName.0"
    is_tag = true

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "DISMAN-EVENT-MIB::sysUpTimeInstance"

  # IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true

  # IF-MIB::ifXTable contains newer High Capacity (HC) counters that do not overflow as fast for a few of the ifTable counters
  [[inputs.snmp.table]]
    name = "interfaceX"
    inherit_tags = [ "hostname" ]
    oid = "IF-MIB::ifXTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "ifDescr"
      oid = "IF-MIB::ifDescr"
      is_tag = true

  # EtherLike-MIB::dot3StatsTable contains detailed ethernet-level information about what kind of errors have been logged on an interface (such as FCS error, frame too long, etc)
  [[inputs.snmp.table]]
    name = "interface"
    inherit_tags = [ "hostname" ]
    oid = "EtherLike-MIB::dot3StatsTable"

    # Interface tag - used to identify interface in metrics database
    [[inputs.snmp.table.field]]
      name = "name"
      oid = "IF-MIB::ifDescr"
      is_tag = true

For more info about the SNMP config in Telegraf, these are good links: this is the official GitHub page, and this is the page for the SNMP input plugin that explains the differences between "field" and "table".

As well, the link below is really good for explaining the SNMP config in Telegraf: "Gathering Data via SNMP".

https://blog.networktocode.com/post/network_telemetry_for_snmp_devices/

Start the container:

docker run -d -p 8125:8125 -p 8092:8092 -p 8094:8094 --name telegraf \
-v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
telegraf -config /etc/telegraf/telegraf.conf

Check the logs:

docker logs telegraf -f
...
2020-07-17T12:45:10Z E! [inputs.snmp] Error in plugin: initializing table interface: translating: exit status 2: MIB search path: /root/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf:/usr/share/mibs/site:/usr/share/snmp/mibs:/usr/share/mibs/iana:/usr/share/mibs/ietf:/usr/share/mibs/netsnmp
Cannot find module (EtherLike-MIB): At line 0 in (none)
EtherLike-MIB::dot3StatsTable: Unknown Object Identifier
...

You will see errors about not being able to find the MIB files! So I used the LibreNMS MIBs: I downloaded the project and copied the MIBs I thought I needed (Arista plus some others that don't belong to a specific vendor). As well, this is noted by Anton in this link:

https://github.com/librenms/librenms/tree/master/mibs

In my case:

/usr/share/snmp/mibs$ ls -ltr
total 4672
-rw-r--r-- 1 root root  52820 Feb  7  2019 UCD-SNMP-MIB.txt
-rw-r--r-- 1 root root  18274 Feb  7  2019 UCD-SNMP-MIB-OLD.txt
-rw-r--r-- 1 root root   8118 Feb  7  2019 UCD-IPFWACC-MIB.txt
-rw-r--r-- 1 root root   6476 Feb  7  2019 UCD-IPFILTER-MIB.txt
-rw-r--r-- 1 root root   3087 Feb  7  2019 UCD-DLMOD-MIB.txt
-rw-r--r-- 1 root root   4965 Feb  7  2019 UCD-DISKIO-MIB.txt
-rw-r--r-- 1 root root   2163 Feb  7  2019 UCD-DEMO-MIB.txt
-rw-r--r-- 1 root root   5039 Feb  7  2019 NET-SNMP-VACM-MIB.txt
-rw-r--r-- 1 root root   4814 Feb  7  2019 NET-SNMP-TC.txt
-rw-r--r-- 1 root root   1226 Feb  7  2019 NET-SNMP-SYSTEM-MIB.txt
-rw-r--r-- 1 root root   2504 Feb  7  2019 NET-SNMP-PERIODIC-NOTIFY-MIB.txt
-rw-r--r-- 1 root root   3730 Feb  7  2019 NET-SNMP-PASS-MIB.txt
-rw-r--r-- 1 root root   1215 Feb  7  2019 NET-SNMP-MONITOR-MIB.txt
-rw-r--r-- 1 root root   2036 Feb  7  2019 NET-SNMP-MIB.txt
-rw-r--r-- 1 root root   9326 Feb  7  2019 NET-SNMP-EXTEND-MIB.txt
-rw-r--r-- 1 root root   9160 Feb  7  2019 NET-SNMP-EXAMPLES-MIB.txt
-rw-r--r-- 1 root root  15901 Feb  7  2019 NET-SNMP-AGENT-MIB.txt
-rw-r--r-- 1 root root   5931 Feb  7  2019 LM-SENSORS-MIB.txt
-rw-r--r-- 1 root root   1913 Jul  2 04:38 GNOME-SMI.txt
-rw-r--r-- 1 root root   5775 Jul 17 13:14 SNMPv2-TM
-rw-r--r-- 1 root root   2501 Jul 17 13:14 SNMPv2-TC-v1
-rw-r--r-- 1 root root  38034 Jul 17 13:14 SNMPv2-TC
-rw-r--r-- 1 root root   1371 Jul 17 13:14 SNMPv2-SMI-v1
-rw-r--r-- 1 root root   8924 Jul 17 13:14 SNMPv2-SMI
-rw-r--r-- 1 root root  29305 Jul 17 13:14 SNMPv2-MIB
-rw-r--r-- 1 root root   8263 Jul 17 13:14 SNMPv2-CONF
-rw-r--r-- 1 root root  17177 Jul 17 13:14 INET-ADDRESS-MIB
-rw-r--r-- 1 root root  71691 Jul 17 13:14 IF-MIB
-rw-r--r-- 1 root root   3129 Jul 17 13:15 ARISTA-BGP4V2-TC-MIB
-rw-r--r-- 1 root root  64691 Jul 17 13:15 ARISTA-BGP4V2-MIB
-rw-r--r-- 1 root root   7155 Jul 17 13:15 ARISTA-VRF-MIB
-rw-r--r-- 1 root root   1964 Jul 17 13:15 ARISTA-SMI-MIB
-rw-r--r-- 1 root root  10901 Jul 17 13:15 ARISTA-NEXTHOP-GROUP-MIB
-rw-r--r-- 1 root root   5826 Jul 17 13:15 ARISTA-IF-MIB
-rw-r--r-- 1 root root   4547 Jul 17 13:15 ARISTA-GENERAL-MIB
-rw-r--r-- 1 root root   7014 Jul 17 13:15 ARISTA-ENTITY-SENSOR-MIB
-rw-r--r-- 1 root root  62277 Jul 17 13:21 IANA-PRINTER-MIB
-rw-r--r-- 1 root root  36816 Jul 17 13:21 IANA-MAU-MIB
-rw-r--r-- 1 root root   4299 Jul 17 13:21 IANA-LANGUAGE-MIB
-rw-r--r-- 1 root root  13954 Jul 17 13:21 IANA-ITU-ALARM-TC-MIB
-rw-r--r-- 1 root root  35628 Jul 17 13:21 IANAifType-MIB
-rw-r--r-- 1 root root  15150 Jul 17 13:21 IANA-GMPLS-TC-MIB
-rw-r--r-- 1 root root  10568 Jul 17 13:21 IANA-CHARSET-MIB
-rw-r--r-- 1 root root   4743 Jul 17 13:21 IANA-ADDRESS-FAMILY-NUMBERS-MIB
-rw-r--r-- 1 root root   3518 Jul 17 13:21 IANA-RTPROTO-MIB
-rw-r--r-- 1 root root  13100 Jul 17 13:31 ENTITY-STATE-MIB
-rw-r--r-- 1 root root  16248 Jul 17 13:31 ENTITY-SENSOR-MIB
-rw-r--r-- 1 root root  59499 Jul 17 13:31 ENTITY-MIB
-rw-r--r-- 1 root root   2114 Jul 17 13:31 BGP4V2-TC-MIB
-rw-r--r-- 1 root root  50513 Jul 17 13:31 BGP4-MIB
-rw-r--r-- 1 root root  96970 Jul 17 13:31 IEEE8021-CFMD8-MIB
-rw-r--r-- 1 root root  86455 Jul 17 13:31 IEEE8021-BRIDGE-MIB
-rw-r--r-- 1 root root 113507 Jul 17 13:31 IEEE802171-CFM-MIB
-rw-r--r-- 1 root root   6321 Jul 17 13:31 ENTITY-STATE-TC-MIB
-rw-r--r-- 1 root root 112096 Jul 17 13:31 IEEE802dot11-MIB
-rw-r--r-- 1 root root  44559 Jul 17 13:31 IEEE8023-LAG-MIB
-rw-r--r-- 1 root root  24536 Jul 17 13:31 IEEE8021-TC-MIB
-rw-r--r-- 1 root root  62182 Jul 17 13:31 IEEE8021-SECY-MIB
-rw-r--r-- 1 root root  96020 Jul 17 13:31 IEEE8021-Q-BRIDGE-MIB
-rw-r--r-- 1 root root  62591 Jul 17 13:31 IEEE8021-PAE-MIB
-rw-r--r-- 1 root root 135156 Jul 17 13:31 IEEE8021-CFM-MIB
-rw-r--r-- 1 root root  48703 Jul 17 13:31 IPV6-MIB
-rw-r--r-- 1 root root  15936 Jul 17 13:31 IPV6-ICMP-MIB
-rw-r--r-- 1 root root   3768 Jul 17 13:31 IPV6-FLOW-LABEL-MIB
-rw-r--r-- 1 root root  31323 Jul 17 13:31 IPMROUTE-STD-MIB
-rw-r--r-- 1 root root  33626 Jul 17 13:31 IPMROUTE-MIB
-rw-r--r-- 1 root root 186550 Jul 17 13:31 IP-MIB
-rw-r--r-- 1 root root  46366 Jul 17 13:31 IP-FORWARD-MIB
-rw-r--r-- 1 root root 240526 Jul 17 13:31 IEEE-802DOT17-RPR-MIB
-rw-r--r-- 1 root root   4400 Jul 17 13:31 IPV6-UDP-MIB
-rw-r--r-- 1 root root   7257 Jul 17 13:31 IPV6-TCP-MIB
-rw-r--r-- 1 root root   2367 Jul 17 13:31 IPV6-TC
-rw-r--r-- 1 root root  14758 Jul 17 13:31 IPV6-MLD-MIB
-rw-r--r-- 1 root root  69938 Jul 17 13:31 MPLS-LSR-MIB
-rw-r--r-- 1 root root  61017 Jul 17 13:31 MPLS-L3VPN-STD-MIB
-rw-r--r-- 1 root root  16414 Jul 17 13:31 LLDP-V2-TC-MIB.mib
-rw-r--r-- 1 root root  77651 Jul 17 13:31 LLDP-V2-MIB.mib
-rw-r--r-- 1 root root  76945 Jul 17 13:31 LLDP-MIB
-rw-r--r-- 1 root root  59110 Jul 17 13:31 LLDP-EXT-MED-MIB
-rw-r--r-- 1 root root  30192 Jul 17 13:31 LLDP-EXT-DOT3-MIB
-rw-r--r-- 1 root root  30182 Jul 17 13:31 LLDP-EXT-DOT1-MIB
-rw-r--r-- 1 root root  36147 Jul 17 13:31 LLDP-EXT-DCBX-MIB
-rw-r--r-- 1 root root 145796 Jul 17 13:31 ISIS-MIB
-rw-r--r-- 1 root root  28564 Jul 17 13:31 TCP-MIB
-rw-r--r-- 1 root root   1291 Jul 17 13:31 RFC-1215
-rw-r--r-- 1 root root  79667 Jul 17 13:31 RFC1213-MIB
-rw-r--r-- 1 root root   2866 Jul 17 13:31 RFC-1212
-rw-r--r-- 1 root root   3067 Jul 17 13:31 RFC1155-SMI
-rw-r--r-- 1 root root  60053 Jul 17 13:31 MPLS-VPN-MIB
-rw-r--r-- 1 root root  95418 Jul 17 13:31 MPLS-TE-STD-MIB
-rw-r--r-- 1 root root  55490 Jul 17 13:31 MPLS-TE-MIB
-rw-r--r-- 1 root root  26327 Jul 17 13:31 MPLS-TC-STD-MIB
-rw-r--r-- 1 root root  76361 Jul 17 13:31 MPLS-LSR-STD-MIB
-rw-r--r-- 1 root root  12931 Jul 17 13:31 RFC1389-MIB
-rw-r--r-- 1 root root  30091 Jul 17 13:31 RFC1284-MIB
-rw-r--r-- 1 root root 147614 Jul 17 13:31 RFC1271-MIB
-rw-r--r-- 1 root root  22342 Jul 17 13:33 SNMP-FRAMEWORK-MIB
-rw-r--r-- 1 root root 223833 Jul 17 13:33 RMON2-MIB
-rw-r--r-- 1 root root 127407 Jul 17 13:33 DIFFSERV-MIB
-rw-r--r-- 1 root root 101324 Jul 17 13:34 TOKEN-RING-RMON-MIB
-rw-r--r-- 1 root root 147822 Jul 17 13:34 RMON-MIB
-rw-r--r-- 1 root root  26750 Jul 17 13:34 INTEGRATED-SERVICES-MIB
-rw-r--r-- 1 root root   1863 Jul 17 13:34 DIFFSERV-DSCP-TC
-rw-r--r-- 1 root root  34162 Jul 17 13:34 SNMP-VIEW-BASED-ACM-MIB
-rw-r--r-- 1 root root  84133 Jul 17 13:34 Q-BRIDGE-MIB
-rw-r--r-- 1 root root  16414 Jul 17 13:34 TRANSPORT-ADDRESS-MIB
-rw-r--r-- 1 root root  39879 Jul 17 13:35 P-BRIDGE-MIB
-rw-r--r-- 1 root root   4660 Jul 17 13:35 HCNUM-TC
-rw-r--r-- 1 root root  54884 Jul 17 13:35 BRIDGE-MIB
-rw-r--r-- 1 root root   2628 Jul 17 13:35 VPN-TC-STD-MIB
-rw-r--r-- 1 root root  10575 Jul 17 13:35 HC-PerfHist-TC-MIB
-rw-r--r-- 1 root root  22769 Jul 17 13:50 SNMP-TARGET-MIB
-rw-r--r-- 1 root root  84492 Jul 17 13:54 EtherLike-MIB
-rw-r--r-- 1 root root  68177 Jul 17 13:56 DISMAN-EXPRESSION-MIB

Once you have the MIB files, you need to copy them across to the telegraf container:

docker cp /usr/share/snmp/mibs/. telegraf:/usr/share/snmp/mibs/
docker restart telegraf

These two links helped me a lot to troubleshoot and understand snmpwalk:

https://www.dev-eth0.de/2016/12/06/grafana_snmp/

If you don't want to see raw OIDs in the snmpwalk output, you need to load all your MIBs with "-m ALL":

/usr/share/snmp/mibs$ snmpwalk -Os -v2c -c community -m ALL 172.23.0.2 | head -10
sysDescr.0 = STRING: Linux r01 5.7.0-1-amd64 #1 SMP Debian 5.7.6-1 (2020-06-24) x86_64
sysObjectID.0 = OID: aristaProducts.2600
sysUpTimeInstance = Timeticks: (2503676) 6:57:16.76
sysContact.0 = STRING:
sysName.0 = STRING: r01
sysLocation.0 = STRING: ceoslab
sysServices.0 = INTEGER: 14
sysORLastChange.0 = Timeticks: (35427) 0:05:54.27
sysORID.1 = OID: tcpMIB
sysORID.2 = OID: mib-2.50

$ snmpwalk -v2c -c community -m ALL 172.23.0.2 | head -10
SNMPv2-MIB::sysDescr.0 = STRING: Linux r01 5.7.0-1-amd64 #1 SMP Debian 5.7.6-1 (2020-06-24) x86_64
SNMPv2-MIB::sysObjectID.0 = OID: ARISTA-SMI-MIB::aristaProducts.2600
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (2535736) 7:02:37.36
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: r01
SNMPv2-MIB::sysLocation.0 = STRING: ceoslab
SNMPv2-MIB::sysServices.0 = INTEGER: 14
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (35427) 0:05:54.27
SNMPv2-MIB::sysORID.1 = OID: TCP-MIB::tcpMIB
SNMPv2-MIB::sysORID.2 = OID: SNMPv2-SMI::mib-2.50

/usr/share/snmp/mibs$ snmpwalk -Os -v2c -c community 172.23.0.2 | head -10
iso.3.6.1.2.1.1.1.0 = STRING: "Linux r01 5.7.0-1-amd64 #1 SMP Debian 5.7.6-1 (2020-06-24) x86_64"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.30065.1.2600
iso.3.6.1.2.1.1.3.0 = Timeticks: (2505381) 6:57:33.81
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "r01"
iso.3.6.1.2.1.1.6.0 = STRING: "ceoslab"
iso.3.6.1.2.1.1.7.0 = INTEGER: 14
iso.3.6.1.2.1.1.8.0 = Timeticks: (35427) 0:05:54.27
iso.3.6.1.2.1.1.9.1.2.1 = OID: iso.3.6.1.2.1.49
iso.3.6.1.2.1.1.9.1.2.2 = OID: iso.3.6.1.2.1.50

And if you want to verify that telegraf is capable of using the MIBs:

docker exec -it telegraf snmpwalk -Os -v2c -c community -m ALL 172.23.0.2 | head -10

Now you can check whether telegraf is updating InfluxDB. If there is output, it is good!

$ curl -G 'https://localhost:8086/query?db=test&pretty=true&u=xxx&p=xxx123' --data-urlencode "q=SELECT * FROM interfaceX limit 2" --insecure
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "interfaceX",
"columns": [
"time",
"agent_host",
"host",
"hostname",
"ifAlias",
"ifConnectorPresent",
"ifCounterDiscontinuityTime",
"ifDescr",
"ifHCInBroadcastPkts",
"ifHCInMulticastPkts",
"ifHCInOctets",
"ifHCInUcastPkts",
"ifHCOutBroadcastPkts",
"ifHCOutMulticastPkts",
"ifHCOutOctets",
"ifHCOutUcastPkts",
"ifHighSpeed",
"ifInBroadcastPkts",
"ifInMulticastPkts",
"ifLinkUpDownTrapEnable",
"ifName",
"ifOutBroadcastPkts",
"ifOutMulticastPkts",
"ifPromiscuousMode",
"name"
],
"values": [
[
"2020-07-17T13:00:10Z",
"172.23.0.2",
"6778bdf4ea85",
"r01",
null,
1,
0,
"Ethernet1",
0,
2118,
2013125,
7824,
0,
0,
0,
0,
0,
0,
2118,
1,
"Ethernet1",
0,
0,
2,
null
],
[
"2020-07-17T13:00:10Z",
"172.23.0.2",
"6778bdf4ea85",
"r01",
"CORE Loopback",
2,
0,
"Loopback1",
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
1,
"Loopback1",
0,
0,
2,
null
]
]
}
]
}
]
}

Grafana

I had seen Grafana before but never used it, so the query configuration was a bit of a challenge, but I was lucky and found very good blogs for that. The installation process is fine:

// Create folder for grafana and data
mkdir -p telemetry-example/grafana/data
cd telemetry-example/grafana

// Create docker instance
docker run -d -p 3000:3000 --name grafana \
--user root \
-v $PWD/data:/var/lib/grafana \
grafana/grafana

// Create SSL cert for grafana
docker exec -it grafana openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/grafana-selfsigned.key -out /etc/ssl/grafana-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=grafana.domain.com"

// Copy grafana config so we can update it
docker cp grafana:/etc/grafana/grafana.ini grafana.ini

// Update grafana config with SSL
vim grafana.ini
############################## Server
[server]
# Protocol (http, https, h2, socket)
protocol = https
…
# https certs & key file
cert_file = /etc/ssl/grafana-selfsigned.crt
cert_key = /etc/ssl/grafana-selfsigned.key

// Copy back the config to the container and restart
docker cp grafana.ini grafana:/etc/grafana/grafana.ini
docker container restart grafana

Now you can open Grafana in your browser at "https://0.0.0.0:3000/" using admin/admin.

You need to add a data source, which is our InfluxDB container. So you need to pick the "influxdb" type and fill in the values as per below.

Now, you need to create a dashboard with a panel.

Links that I reviewed for creating the dashboard:

For creating a panel, the link below was the best, especially the section "Interface Throughput". Big thanks to the author.

https://lkhill.com/telegraf-influx-grafana-network-stats/

This is my query for checking all details:
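It was something along these lines (a sketch built from the field and tag names in the curl output above; $timeFilter and $__interval are Grafana variables, and the exact panel query may differ):

SELECT non_negative_derivative(mean("ifHCInOctets"), 1s) * 8 AS "In bps"
FROM "interfaceX"
WHERE ("hostname" = 'r01' AND "ifDescr" = 'Ethernet1') AND $timeFilter
GROUP BY time($__interval), "ifDescr" fill(null)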

And this is my final dashboard.

SNMP Config

BTW, you need to configure SNMP in the switches so telegraf can poll them:

snmp-server location ceoslab
snmp-server community xxx123 ro
snmp-server host 172.17.0.1 version 2c xxx123

In my case, the Influx-Telegraf-Grafana container stack is running on the default bridge. Each container has its own IP, but as the Arista containers are in a different Docker network, the traffic needs to be "routed", so the IP of the telegraf container will be NAT-ed to 172.17.0.1 from the switches' point of view.
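A quick way to double-check which network each container sits on (and therefore why that NAT happens):

docker network ls
docker network inspect bridge     # lists the containers attached to the default bridge and their IPs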

Next

I would like to manage all this process via Ansible… something like this… but it will take me time:

https://github.com/akarneliuk/service_provider_fabric/tree/master/ansible/roles/cloud_monitoring/tasks

Notes

As usual, I have struggled, but I have learned a lot and in the end things are working. I am happy with that.