This week I realised that Juniper JunOS was moving to Linux…. called Evolved. I guess they will still be supporting FreeBSD version but long term will be Linux. I am quite surprised as this was really announced early 2020, always late joining the party. So all big boys are running linux at some level: Cisco has done it sometime ago with nx-os, Brocade/Extrene did it too with SLX (based on Ubuntu) and obviously Arista with EOS (based on Fedora). So the trend of more “open” network OS will be on the raise.
And as well, I finished “Indiana Jones and the Temple of Doom” book. Indiana Jones films are among my favourites… although this was was always considered the “worse” (I erased from my mind the “fourth”) I have really enjoyed the book. It was like watching the movie at slow pace and didnt care that I knew the plot. I will get the other books likely.
From a new Cloudflare post, I learned that NTS is a standard. To be honest, I can’t remember there was work for making NTP secure. In the last years I have seen development in PTP for time sync in financial systems but nothing else. So it is nice to see this happening. We only need to encrypt BGP and we are done in the internet.. oh wait. Dreaming is free.
So I am trying to install and configure NTS in my system following these links: link1link2
I have just installed ntpsec via debian packages system and that’s it, ntpsec is running…
# apt install ntpsec
...
# service ntpsec status
β ntpsec.service - Network Time Service
Loaded: loaded (/lib/systemd/system/ntpsec.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2020-10-04 20:35:58 BST; 6min ago
Docs: man:ntpd(8)
Main PID: 292116 (ntpd)
Tasks: 1 (limit: 9354)
Memory: 10.2M
CGroup: /system.slice/ntpsec.service
ββ292116 /usr/sbin/ntpd -p /run/ntpd.pid -c /etc/ntpsec/ntp.conf -g -N -u ntpsec:ntpsec
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_check: processing 3.debian.pool.ntp.org, 8, 101
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 81.128.218.110
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 139.162.219.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 62.3.77.2
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool taking: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: Pool poking hole in restrictions for: 213.130.44.252
Oct 04 20:36:02 athens ntpd[292116]: DNS: dns_take_status: 3.debian.pool.ntp.org=>good, 8
#
Checking the default config, there is nothing configured to use NTS so I made some changes based on the links above:
# vim /etc/ntpsec/ntp.conf
...
# Public NTP servers supporting Network Time Security:
server time.cloudflare.com:1234 nts
# Example 2: NTS-secured NTP (default NTS-KE port (123); using certificate pool of the operating system)
server ntp1.glypnod.com iburst minpoll 3 maxpoll 6 nts
#Via https://www.netnod.se/time-and-frequency/how-to-use-nts
server nts.ntp.se:3443 nts iburst
server nts.sth1.ntp.se:3443 nts iburst
server nts.sth2.ntp.se:3443 nts iburst
After restart, still not seeing NTS in sync π
# service ntpsec restart
...
# ntpq -puw
remote refid st t when poll reach delay offset jitter
time.cloudflare.com .NTS. 16 0 - 64 0 0ns 0ns 119ns
ntp1.glypnod.com .NTS. 16 5 - 32 0 0ns 0ns 119ns
2a01:3f7:2:202::202 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:52::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
2a01:3f7:2:62::11 .NTS. 16 1 - 64 0 0ns 0ns 119ns
0.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
1.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
2.debian.pool.ntp.org .POOL. 16 p - 256 0 0ns 0ns 119ns
3.debian.pool.ntp.org .POOL. 16 p - 64 0 0ns 0ns 119ns
-229.191.57.185.no-ptr.as201971.net .GPS. 1 u 25 64 177 65.754ms 26.539ms 7.7279ms
+ns3.turbodns.co.uk 85.199.214.99 2 u 23 64 177 12.200ms 2.5267ms 1.5544ms
+time.cloudflare.com 10.21.8.19 3 u 25 64 177 5.0848ms 2.6248ms 2.6293ms
-ntp1.wirehive.net 202.70.69.81 2 u 21 64 177 9.6036ms 2.3986ms 1.9814ms
+ns4.turbodns.co.uk 195.195.221.100 2 u 21 64 177 10.896ms 2.9528ms 1.5288ms
-lond-web-1.speedwelshpool.com 194.58.204.148 2 u 23 64 177 5.6202ms 5.8218ms 3.2582ms
-time.shf.uk.as44574.net 85.199.214.98 2 u 29 64 77 9.0190ms 4.9419ms 2.5810ms
lux.22pf.org .INIT. 16 u - 64 0 0ns 0ns 119ns
ns1.thorcom.net .INIT. 16 u - 64 0 0ns 0ns 119ns
time.cloudflare.com .INIT. 16 u - 64 0 0ns 0ns 119ns
time.rdg.uk.as44574.net .INIT. 16 u - 64 0 0ns 0ns 119ns
-herm4.doylem.co.uk 185.203.69.150 2 u 19 64 177 15.024ms 9.5098ms 3.2011ms
-213.251.53.217 193.62.22.74 2 u 17 64 177 5.7211ms 1.4122ms 2.1895ms
*babbage.betadome.net 85.199.214.99 2 u 20 64 177 4.8614ms 4.1187ms 2.5533ms
#
#
# ntpq -c nts
NTS client sends: 56
NTS client recvs good: 0
NTS client recvs w error: 0
NTS server recvs good: 0
NTS server recvs w error: 0
NTS server sends: 0
NTS make cookies: 0
NTS decode cookies: 0
NTS decode cookies old: 0
NTS decode cookies too old: 0
NTS decode cookies error: 0
NTS KE probes good: 8
NTS KE probes_bad: 0
NTS KE serves good: 0
NTS KE serves_bad: 0
#
I ran tcpdump filtering on TCP ports 1234 (cloudflare) and 3443 (netnod), and I can see my system trying to negotiate NTS with Cloudflare and NetNod but both sessions are TCP RST π
Last time I tried BPF was via an Ubuntu VM prepared for BPF. But this week checking another article, I realised that I can run BPF natively in my laptop!!!
So aptitude did the job installing the package, and didn’t have to install a new kernel or patch, so super easy and I can see it is working as based in the article:
Another thing I realized lately was that my laptop screen was very dark, not bright at all like my external screen so it was hard to use both. I use Debian Testing LXDE as it is quite light and I dont need anything as heavy as Gnome/KDE. So I struggle how to adjust the brightness but finally got it.
I had to try different programs but finally a blog showed all possibilities and found the one that works for me.
$ brightnessctl set 800 -d intel_backlight
The next thing, I had to be sure that was effective after reboots…. So not sure if this is very clean solution, but I just added that command to my .bashrc. It works. Moving on.
This week I realised that Debian was removing python2 support and surprisingly…. it was trying to remove VirtualBox from my system…
So it seems that VirtualBox is still depending on python2. A bit disappointing.
I am not really keen of VirtualBox but I have had to use it lately for my Kubernetes training and testing OpenBSD. I prefer using kvm/quemu. So I know I will have to workout how to do kubernetes/bsd outside VirtualBox….
Something I learned by the way was to check the dependencies of a package in Debian…. I guess it is about time.
Yesterday read for first time an article about zswap. I though it was something new, but a bit search showed that it started around 2013 as per this article.
I have a 2015 Dell XPS13 with i7, 128GB SSD and 8GB RAM. But some times my systems struggle with memory and when swapping kicks in, the system gets stuck. I have used swappiness in the past but not much improvement (or it was in my former company laptop??). Anyway this is my current swappiness:
Ok, check you have ZSWAP available in your kernel.
$ uname -a
Linux x 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux
$ cat /boot/config-uname -r | grep -i zswap
CONFIG_ZSWAP=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_DEFLATE is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZO=y
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_842 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4 is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_LZ4HC is not set
# CONFIG_ZSWAP_COMPRESSOR_DEFAULT_ZSTD is not set
CONFIG_ZSWAP_COMPRESSOR_DEFAULT="lzo"
CONFIG_ZSWAP_ZPOOL_DEFAULT_ZBUD=y
# CONFIG_ZSWAP_ZPOOL_DEFAULT_Z3FOLD is not set
# CONFIG_ZSWAP_ZPOOL_DEFAULT_ZSMALLOC is not set
CONFIG_ZSWAP_ZPOOL_DEFAULT="zbud"
# CONFIG_ZSWAP_DEFAULT_ON is not set
$ cat /sys/module/zswap/parameters/enabled
N
If you have “N”in the last command and “CONFIG_ZSWAP=y” then your systems supports zswap but it is not enabled.
$ echo Y | sudo tee /sys/module/zswap/parameters/enabled
$ cat /sys/module/zswap/parameters/enabled
Y
Now you can tune some parameters to increase compression rate (3:1)
# list all parameters
grep . /sys/module/zswap/parameters/*
# change compression params
echo z3fold | sudo tee /sys/module/zswap/parameters/zpool
echo lzo | sudo tee /sys/module/zswap/parameters/compressor
How to check zswap is working? I followed this email thread to find some clues:
I rebooted my laptop today and realised that docker wasnt running… It was running before the reboot and I didn’t upgrade anything related to docker (or I thought)
$ docker ps -a
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
$
Let’s check status and start if needed:
root@athens:/var/log# service docker status
β docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:34:03 BST; 7min ago
TriggeredBy: β docker.socket
Docs: https://docs.docker.com
Main PID: 12015 (code=exited, status=1/FAILURE)
Aug 21 08:34:03 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:34:03 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:34:03 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:34:03 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:34:03 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:34:42 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:34:42 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:34:42 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#
root@athens:/var/log#
root@athens:/var/log# service docker start
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
root@athens:/var/log# systemctl status docker.service
β docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:41:20 BST; 5s ago
TriggeredBy: β docker.socket
Docs: https://docs.docker.com
Process: 35305 ExecStart=/usr/sbin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 35305 (code=exited, status=1/FAILURE)
Aug 21 08:41:19 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:41:19 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:19 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:41:20 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#
Ok, so not much info… let check the recommend details:
root@athens:/var/log# journalctl -xe
ββ Support: https://www.debian.org/support
ββ
ββ A start job for unit docker.socket has begun execution.
ββ
ββ The job identifier is 4236.
Aug 21 08:41:20 athens systemd[1]: Listening on Docker Socket for the API.
ββ Subject: A start job for unit docker.socket has finished successfully
ββ Defined-By: systemd
ββ Support: https://www.debian.org/support
ββ
ββ A start job for unit docker.socket has finished successfully.
ββ
ββ The job identifier is 4236.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
ββ Subject: Unit failed
ββ Defined-By: systemd
ββ Support: https://www.debian.org/support
ββ
ββ The unit docker.service has entered the 'failed' state with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
ββ Subject: A start job for unit docker.service has failed
ββ Defined-By: systemd
ββ Support: https://www.debian.org/support
ββ
ββ A start job for unit docker.service has finished with a failure.
ββ
ββ The job identifier is 4113 and the job result is failed.
Aug 21 08:41:20 athens systemd[1]: docker.socket: Failed with result 'service-start-limit-hit'.
ββ Subject: Unit failed
ββ Defined-By: systemd
ββ Support: https://www.debian.org/support
ββ
ββ The unit docker.socket has entered the 'failed' state with result 'service-start-limit-hit'.
root@athens:/var/log# systemctl status docker.service log
Unit log.service could not be found.
β docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-21 08:41:20 BST; 1min 2s ago
TriggeredBy: β docker.socket
Docs: https://docs.docker.com
Process: 35305 ExecStart=/usr/sbin/dockerd -H fd:// $DOCKER_OPTS (code=exited, status=1/FAILURE)
Main PID: 35305 (code=exited, status=1/FAILURE)
Aug 21 08:41:19 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:41:19 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:19 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:41:20 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:41:20 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:41:20 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:41:20 athens systemd[1]: Failed to start Docker Application Container Engine.
root@athens:/var/log#
So “journalctl -xe” and “systemctl status docker.service log” gave nothing useful….
So I searched for “docker.socket: Failed with result ‘service-start-limit-hit'” as it was the message that looked more suspicious. I landed here and tried one command to get more logs that I didnt know: “journaltctl -fu docker”
root@athens:/var/log# journalctl -fu docker
-- Logs begin at Sun 2020-02-02 21:12:23 GMT. --
Aug 21 08:42:41 athens dockerd[35469]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaStat
Aug 21 08:42:41 athens dockerd[35469]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaEntry
Aug 21 08:42:41 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:42:41 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:42:41 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:42:41 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Aug 21 08:42:41 athens systemd[1]: Stopped Docker Application Container Engine.
Aug 21 08:42:41 athens systemd[1]: docker.service: Start request repeated too quickly.
Aug 21 08:42:41 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:42:41 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:44:32 athens systemd[1]: Starting Docker Application Container Engineβ¦
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.Metrics
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.HugetlbStat
Aug 21 08:44:32 athens dockerd[35538]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '"' after object key:value pair
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.PidsStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.CPUStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.CPUUsage
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.Throttle
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.MemoryStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.MemoryEntry
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.BlkIOStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.BlkIOEntry
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaStat
Aug 21 08:44:32 athens dockerd[35538]: proto: duplicate proto type registered: io.containerd.cgroups.v1.RdmaEntry
Aug 21 08:44:32 athens systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 08:44:32 athens systemd[1]: docker.service: Failed with result 'exit-code'.
Aug 21 08:44:32 athens systemd[1]: Failed to start Docker Application Container Engine.
Aug 21 08:44:32 athens systemd[1]: docker.service: Scheduled restart job, restart counter is at 1.
Aug 21 08:44:32 athens systemd[1]: Stopped Docker Application Container Engine.
And now, yes, I could see the docker logs properly… and found the culprit and fixed. I am pretty sure the last time I played with “/etc/docker/daemon.json” I restarted docker and it was fine…
Anyway, I learned a new command “journaltctl -fu SERVICE” to troubleshoot services.
Reading this blog entry I realised that very likely I have never tried forward tcpdump to a wireshark. How many times I have taken a pcap in a switch and then I need to download to see the details in wireshark…
I guess you can find some blocking points in firewalls (at least for 2-steps option)
So I tried the single command with a switch in my ceoslab and it works!
The ssh command is actually executing the “bash tcpdump…” remotely. But the key is the “-U” and “-w -” flags. “-U” in conjunction with “-w” sends the packet without waiting for the buffer to fill. Then “-w -” says that it writes the output to stdout instead of a file. If you run the command without -U, it would work but it will update a bit slower as it needs to fill the buffers.
From tcpdump manual:
-U
--packet-buffered
If the -w option is not specified, make the printed packet output ``packet-buffered''; i.e., as the description of the contents of each packet is printed, it will be written to the standard output, rather than, when not writing to a terminal, being written only when the output buffer fills.
If the -w option is specified, make the saved raw packet output ``packet-buffered''; i.e., as each packet is saved, it will be written to the output file, rather than being written only when the output buffer fills.
The -U flag will not be supported if tcpdump was built with an older version of libpcap that lacks the pcap_dump_flush() function.
......
-w file
Write the raw packets to file rather than parsing and printing them out. They can later be printed with the -r option. Standard output is used if file is ``-''.
This output will be buffered if written to a file or pipe, so a program reading from the file or pipe may not see packets for an arbitrary amount of time after they are received. Use the -U flag to cause packets to be written as soon as they are received.
And the stdout of that process is the ssh command so we redirect that outout via a pipe “|” and it is sent as input for wireshark thanks to “-i -” that makes wireshark to read from stdin (that is the stdout from the tcpdump in the switch!)
The wireshark manual:
-i|--interface <capture interface>|-
Set the name of the network interface or pipe to use for live packet capture.
Network interface names should match one of the names listed in "wireshark -D" (described above); a number, as reported by "wireshark -D", can also be used. If you're using UNIX, "netstat -i", "ifconfig -a" or "ip link" might also work to list interface names, although not all versions of UNIX support the -a flag to ifconfig.
If no interface is specified, Wireshark searches the list of interfaces, choosing the first non-loopback interface if there are any non-loopback interfaces, and choosing the first loopback interface if there are no non-loopback interfaces. If there are no interfaces at all, Wireshark reports an error and doesn't start the capture.
Pipe names should be either the name of a FIFO (named pipe) or "-" to read data from the standard input. On Windows systems, pipe names must be of the form "\\pipe\.\pipename". Data read from pipes must be in standard pcapng or pcap format. Pcapng data must have the same endianness as the capturing host.
This option can occur multiple times. When capturing from multiple interfaces, the capture file will be saved in pcapng format.
....
-k Start the capture session immediately. If the -i flag was specified, the capture uses the specified interface. Otherwise, Wireshark searches the list of interfaces, choosing the first non-loopback interface if
there are any non-loopback interfaces, and choosing the first loopback interface if there are no non-loopback interfaces; if there are no interfaces, Wireshark reports an error and doesn't start the capture.
The two-steps option relies on “nc” to send/receive the data, but it is the same idea regarding the tcpdump/wireshark flags using “-“
$ sudo modprobe bonding
$ ip link help bond
$ sudo ip link add bond0 type bond mode 802.3ad
$ sudo ip link set eth0 master bond0
$ sudo ip link set eth1 master bond0
Bridging: vlans + trunks
ip neigh show // l2 table
ip route show // l3 table
ip route add default via 192.168.1.1 dev eth1
sudo modprobe 8021q
// create bridge and add links to bridge (switch)
sudo ip link add br0 type bridge vlan_filtering 1 // native vlan = 1
sudo ip link set eth1 master br0
sudo ip link set eth2 master br0
sudo ip link set eth3 master br0
// make eth1 access port for v11
sudo bridge vlan add dev eth1 vid 11 pvid untagged
// make eth3 access port for v12
sudo bridge vlan add dev eth3 vid 12 pvid untagged
// make eth2 trunk port for v11 and v12
sudo bridge vlan add dev eth2 vid 11
sudo bridge vlan add dev eth2 vid 12
// enable bridge and links
sudo ip link set up dev br0
sudo ip link set up dev eth1
sudo ip link set up dev eth2
sudo ip link set up dev eth3
bridge link show
bridge vlan show
bridge fdb show
VxLAN
I havent tried this yet:
Linux System 1
sudo ip link add br0 type bridge vlan_filtering 1
sudo ip link add vlan10 type vlan id 10 link bridge protocol none
sudo ip addr add 10.0.0.1/24 dev vlan10
sudo ip link add vtep10 type vxlan id 1010 local 10.1.0.1 remote 10.3.0.1 learning
sudo ip link set eth1 master br0
sudo bridge vlan add dev eth1 vid 10 pvid untagged
Linux System 2
sudo ip link add br0 type bridge vlan_filtering 1
sudo ip link add vlan10 type vlan id 10 link bridge protocol none
sudo ip addr add 10.0.0.2/24 dev vlan10
sudo ip link add vtep10 type vxlan id 1010 local 10.3.0.1 remote 10.1.0.1 learning
sudo ip link set eth1 master br0
sudo bridge vlan add dev eth1 vid 10 pvid untagged
This is something I wanted to try for some time. Normally for networks monitoring you use a NMS tool. They can be expensive, free or cheap. I have seen/used Observium and LibreNMS. And many years ago Cacti. There are other tools that can do the job like Zabbix/Nagios/Icinga.
But it seems time-series-databases are the new standard. They give you more flexibility as you can create queries and graph them.
I decided for InfluxDB-Telegraf-Grafana stuck as I could find quickly info based on scenarios of networks.
What is the rule of eachc one:
Telegraf: collect data InfluxDB: store data Grafana: visualize
My main source is again Anton’s blog. All credits to him.
Environment
My network is just 3 Arista ceos containers via docker. All services will run as containers so you need docker installed. Everything is IPv4.
InfluxDB
Installation:
// Create directories
mkdir telemetry-example/influxdb
cd telemetry-example/influxdb
// Get influxdb config
docker run --rm influxdb influxd config > influxdb.conf
// Create local data folder for influxdb that we will map
mkdir data
ls -ltr
// Check docker status
docker images
docker ps -a
// Create docker instance for influxdb. Keep in mind that I am giving a name to the instance
docker run -d -p 8086:8086 -p 8088:8088 --name influxdb \
-v $PWD/influxdb.conf:/etc/influxdb/influxdb.conf:ro \
-v $PWD/data:/var/lib/influxdb \
influxdb -config /etc/influxdb/influxdb.conf
// Verify connectivity
curl -i http://localhost:8086/ping
// Create database "test" using http-query (link below for more details)
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE test"
{"results":[{"statement_id":0}]} <-- command was ok!
// Create user/pass for your db.
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE USER xxx WITH PASSWORD 'xxx123' WITH ALL PRIVILEGES"
{"results":[{"statement_id":0}]} <-- command was ok!
// Create SSL cert for influxdb
docker exec -it influxdb openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/influxdb-selfsigned.key -out /etc/ssl/influxdb-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=influxdb.domain.com"
// Update influxdb.conf for SSL
telemetry-example/influxdb$ vim influxdb.conf
β¦
https-enabled = true
https-certificate = "/etc/ssl/influxdb-selfsigned.crt"
https-private-key = "/etc/ssl/influxdb-selfsigned.key"
β¦
// Restart influxdb to take the changes
docker restart influxdb
// Get influxdb IP for using it later
docker container inspect influxdb --format='{{ .NetworkSettings.IPAddress }}'
172.17.0.2
// Verify connectivity via https
curl -i https://localhost:8086/ping --insecure
The verification for HTTPS was a bit more difficult because the result was always correct no matter what query I was running:
So I decided to see if there was cli/shell for the influxdb (like in mysql, etc). And yes, there is one. Keep in mind that you have to use “-ssl -unsafeSsl” at the same time! That confused me a lot.
$ docker exec -it influxdb influx -ssl -unsafeSsl
Connected to https://localhost:8086 version 1.8.1
InfluxDB shell version: 1.8.1
> show databases
name: databases
name
_internal
test
> use test
Using database test
> show series
key
cpu,cpu=cpu-total,host=5f7aa2c5550e
Links about influxdb that are good for the docker creation and the http queries:
// Create dir
mkdir telemetry-example/telegraf
cd telemetry-example/telegraf
// Get config file to be modified
docker run --rm telegraf telegraf config > telegraf.conf
// Add the details of influxdb in telegraf.conf. As well, you need to add the devices you want to poll. In my case 172.23.0.2/3/4.
vim telegraf.conf
....
[[outputs.influxdb]]
urls = ["https://172.17.0.2:8086"]
database = "test"
skip_database_creation = false
## Timeout for HTTP messages.
timeout = "5s"
## HTTP Basic Auth
username = "xxx"
password = "xxx123"
## Use TLS but skip chain & host verification
insecure_skip_verify = true
# Retrieves SNMP values from remote agents
[[inputs.snmp]]
## Agent addresses to retrieve values from.
## example: agents = ["udp://127.0.0.1:161"]
## agents = ["tcp://127.0.0.1:161"]
agents = ["udp://172.23.0.2:161","udp://172.23.0.3:161","udp://172.23.0.4:161"]
#
## Timeout for each request.
timeout = "5s"
#
## SNMP version; can be 1, 2, or 3.
version = 2
#
## SNMP community string.
community = "tomas123"
#
## Number of retries to attempt.
retries = 3
This is the SNMP config I added below the SNMPv3 options in [[inputs.snmp]]
# ## Add fields and tables defining the variables you wish to collect. This
# ## example collects the system uptime and interface variables. Reference the
# ## full plugin documentation for configuration details.
[[inputs.snmp.field]]
name = "hostname"
oid = "RFC1213-MIB::sysName.0"
is_tag = true
[[inputs.snmp.field]]
name = "uptime"
oid = "DISMAN-EVENT-MIB::sysUpTimeInstance"
# IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
[[inputs.snmp.table]]
name = "interface"
inherit_tags = [ "hostname" ]
oid = "IF-MIB::ifTable"
# Interface tag - used to identify interface in metrics database
[[inputs.snmp.table.field]]
name = "ifDescr"
oid = "IF-MIB::ifDescr"
is_tag = true
# IF-MIB::ifXTable contains newer High Capacity (HC) counters that do not overflow as fast for a few of the ifTable counters
[[inputs.snmp.table]]
name = "interfaceX"
inherit_tags = [ "hostname" ]
oid = "IF-MIB::ifXTable"
# Interface tag - used to identify interface in metrics database
[[inputs.snmp.table.field]]
name = "ifDescr"
oid = "IF-MIB::ifDescr"
is_tag = true
# EtherLike-MIB::dot3StatsTable contains detailed ethernet-level information about what kind of errors have been logged on an interface (such as FCS error, frame too long, etc)
[[inputs.snmp.table]]
name = "interface"
inherit_tags = [ "hostname" ]
oid = "EtherLike-MIB::dot3StatsTable"
# Interface tag - used to identify interface in metrics database
[[inputs.snmp.table.field]]
name = "name"
oid = "IF-MIB::ifDescr"
is_tag = true
For more info about the SNMP config in telegraf. These are good links. This is the official github page. And this is the page for SNMP input plugin that explain the differences between “field” and “table”.
As well, the link below is really good too for explaining the SNMP config in telegraf:”Gathering Data via SNMP”
docker logs telegraf -f
...
2020-07-17T12:45:10Z E! [inputs.snmp] Error in plugin: initializing table interface: translating: exit status 2: MIB search path: /root/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf:/usr/share/mibs/site:/usr/share/snmp/mibs:/usr/share/mibs/iana:/usr/share/mibs/ietf:/usr/share/mibs/netsnmp
Cannot find module (EtherLike-MIB): At line 0 in (none)
EtherLike-MIB::dot3StatsTable: Unknown Object Identifier
...
You will see errors about not able to find the MIB files! So I used Librenms mibs. I download the project and copied the MIBS I thought I needed (arista and some other that dont belong to a vendor). As well, this is noted by Anton’s in this link:
I have seen Grafana before but I have never used it so the configuration on queries was a bit of a challenge but I was lucky and I found very good blogs for that. The installation process is ok:
// Create folder for grafana and data
mkdir -p telemetry-example/grafana/data
cd telemetry-example/grafana
// Create docker instance
docker run -d -p 3000:3000 --name grafana \
--user root \
-v $PWD/data:/var/lib/grafana \
grafana/grafana
// Create SSL cert for grafana
docker exec -it grafana openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/grafana-selfsigned.key -out /etc/ssl/grafana-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=grafana.domain.com"
// Copy grafana config so we can update it
docker cp grafana:/etc/grafana/grafana.ini grafana.ini
// Update grafana config with SSL
vim grafana.ini
############################## Server
[server]
# Protocol (http, https, h2, socket)
protocol = https
β¦
# https certs & key file
cert_file = /etc/ssl/grafana-selfsigned.crt
cert_key = /etc/ssl/grafana-selfsigned.key
// Copy back the config to the container and restart
docker cp grafana.ini grafana:/etc/grafana/grafana.ini
docker container restart grafana
Now you can open in your browser to grafana “https://0.0.0.0:3000/ ” using admin/admin
You need to add a data source that is our influxdb container. So you need to pick up the “influxdb” type and fill the values as per below.
Now, you need to create a dashboard with panel.
Links that I reviewed for creating the dasbord
For creating a panel. The link below was the best on section “Interface Throughput”. Big thanks to the author.
BTW, you need to config SNMP in the switches so telegraf can poll it:
snmp-server location ceoslab
snmp-server community xxx123 ro
snmp-server host 172.17.0.1 version 2c xxx123
In my case, the stack of containers Influx-Telegraf-Grafana are running on the default bridge. Each container has its own IP but as the Arista containers are in the different docker network, it needs to “route” so the IP of telegraf container will be NAT-ed to 172.17.0.1 from the switches point of view.
Next
I would like to manage all this process via Ansible… Something like this.. but will take me time