Bonding
$ sudo modprobe bonding
$ ip link help bond
$ sudo ip link add bond0 type bond mode 802.3ad
$ sudo ip link set eth0 master bond0
$ sudo ip link set eth1 master bond0
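To verify the bond after enslaving the links, the kernel exposes its state under /proc (a quick check; remember to bring bond0 up):
$ sudo ip link set up dev bond0
$ cat /proc/net/bonding/bond0 // mode, LACP partner details and per-slave status
$ ip -d link show bond0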
Bridging: VLANs + trunks
ip neigh show // L2 neighbor (ARP) table
ip route show // L3 routing table
sudo ip route add default via 192.168.1.1 dev eth1
sudo modprobe 8021q
// create bridge and add links to bridge (switch)
sudo ip link add br0 type bridge vlan_filtering 1 // native vlan = 1
sudo ip link set eth1 master br0
sudo ip link set eth2 master br0
sudo ip link set eth3 master br0
// make eth1 access port for v11
sudo bridge vlan add dev eth1 vid 11 pvid untagged
// make eth3 access port for v12
sudo bridge vlan add dev eth3 vid 12 pvid untagged
// make eth2 trunk port for v11 and v12
sudo bridge vlan add dev eth2 vid 11
sudo bridge vlan add dev eth2 vid 12
// enable bridge and links
sudo ip link set up dev br0
sudo ip link set up dev eth1
sudo ip link set up dev eth2
sudo ip link set up dev eth3
bridge link show
bridge vlan show
bridge fdb show
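To test the trunk from another Linux host hanging off eth2, you can create a tagged subinterface there using the 8021q module loaded above (a sketch; the interface name and addressing are assumptions):
sudo ip link add link eth0 name eth0.11 type vlan id 11
sudo ip addr add 192.168.11.2/24 dev eth0.11
sudo ip link set up dev eth0.11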
VxLAN
I haven't tried this yet:
Linux System 1
sudo ip link add br0 type bridge vlan_filtering 1
sudo bridge vlan add dev br0 vid 10 self // allow vlan 10 on the bridge device itself
sudo ip link add link br0 name vlan10 type vlan id 10
sudo ip addr add 10.0.0.1/24 dev vlan10
sudo ip link add vtep10 type vxlan id 1010 local 10.1.0.1 remote 10.3.0.1 dstport 4789 learning
sudo ip link set vtep10 master br0 // attach the VTEP to the bridge
sudo bridge vlan add dev vtep10 vid 10 pvid untagged
sudo ip link set eth1 master br0
sudo bridge vlan add dev eth1 vid 10 pvid untagged
// remember to set br0, vlan10, vtep10 and eth1 up, as in the bridging section above
Linux System 2
sudo ip link add br0 type bridge vlan_filtering 1
sudo bridge vlan add dev br0 vid 10 self // allow vlan 10 on the bridge device itself
sudo ip link add link br0 name vlan10 type vlan id 10
sudo ip addr add 10.0.0.2/24 dev vlan10
sudo ip link add vtep10 type vxlan id 1010 local 10.3.0.1 remote 10.1.0.1 dstport 4789 learning
sudo ip link set vtep10 master br0 // attach the VTEP to the bridge
sudo bridge vlan add dev vtep10 vid 10 pvid untagged
sudo ip link set eth1 master br0
sudo bridge vlan add dev eth1 vid 10 pvid untagged
// remember to set br0, vlan10, vtep10 and eth1 up, as in the bridging section above
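Once tried, a quick way to verify the tunnel (assuming the underlay between 10.1.0.1 and 10.3.0.1 is reachable):
ping 10.0.0.2 // from system 1, across the VXLAN
bridge fdb show dev vtep10 // MACs learnt over the tunnel
ip -d link show vtep10 // shows VNI, local/remote and dstport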
This is a continuation of the last blog entry. This time we are going to gather syslog messages from the monitoring containers, and it is all going to be deployed by Ansible!
As usual, all this is based on Anton Karneliuk's blog post. All credits to him.
So initially we built a monitoring stack with InfluxDB, Telegraf and Grafana manually to gather and visualise SNMP info from the Arista cEOS switches.
This time, we are going to send syslog from the monitoring stack containers to a new Telegraf instance.
Ideally, we would like to send syslog from the cEOS devices but, as Anton mentions, the syslog RFC 3164 format that most network kit implements is not supported (yet) by Telegraf, which supports RFC 5424.
You can read more info about this in all these links:
The very first time, if you pay attention to the Ansible logging, everything should succeed. If for any reason you have to make changes or troubleshoot and execute the full playbook again, some tasks will fail, but not the playbook (this is done with ignore_errors: yes inside a task; see the sketch below). For example, the docker network creation will fail as the network is already there. The same happens if you try to create the user and DBs in an already running Influx instance.
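A minimal sketch of that pattern (the task body is hypothetical):
- name: create docker network for the monitoring stack
  command: docker network create monitoring
  ignore_errors: yes # fails on re-runs once the network exists; the playbook carries on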
That playbook just calls the role "monitoring_stack". The main playbook in that role will create the docker network where all containers will be attached, create all the containers, and do something hacky with iptables.
As the cEOS lab is built (using docker-topo) independently of this playbook, there are already some iptables rules in place, and somehow, when executing the role, the rules change and block the new network from any outbound connectivity.
Before the iptables change in the playbook:
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#
# iptables -t filter -S DOCKER-ISOLATION-STAGE-2
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-2 -o br-4bd17cfa19a8 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-94c1e813ad6f -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-13ab2b6a0d1d -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-00db5844bbb0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-121978ca0282 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
I want to avoid DOCKER-ISOLATION-STAGE-2, so I want "-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT" at the top of that chain.
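One way to do that by hand (a sketch; the role may do it differently): delete the ACCEPT rule wherever it sits and re-insert it at position 1:
sudo iptables -D DOCKER-ISOLATION-STAGE-1 -j ACCEPT
sudo iptables -I DOCKER-ISOLATION-STAGE-1 1 -j ACCEPT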
This is not the first (nor the last) time that this issue bites me. I need to review the docker-topo file carefully and really get my head around the networking expectations of docker.
Another thing about docker networking bites me very often. In my head, each monitoring container has an IP. For example, influxdb is 172.18.0.2 and telegraf-syslog is 172.18.0.4. We have configured influxdb to send syslog to the telegraf-syslog container, so I would expect the influxdb container to use its .2 address and everything to stay local (no forwarding, no firewall, etc.). But no, it uses the host IP, 172.18.0.1 (which makes sense in hindsight: the syslog logging driver runs in the docker daemon on the host, not inside the container).
Apart from that, there were several things I had to review while adapting the role to my environment, regarding docker and ansible.
docker documentation:
how to create network: https://docs.docker.com/engine/reference/commandline/network_create/
how to configure container logs: https://docs.docker.com/engine/reference/commandline/container_logs/
how to configure the logging driver in a container: https://docs.docker.com/config/containers/logging/configure/
how to configure syslog in a container: https://docs.docker.com/config/containers/logging/syslog/
how to run commands from a running container: https://docs.docker.com/engine/reference/commandline/exec/
ansible documentation:
become – run commands with sudo in a playbook: https://docs.ansible.com/ansible/latest/user_guide/become.html (--ask-become-pass, -K)
grafana data source module: https://docs.ansible.com/ansible/latest/modules/grafana_datasource_module.html
This is important because, via Ansible, I had to work out the meaning of become, how to add the syslog config in the containers, and how to add Grafana datasources via a module.
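For the datasource part, a minimal sketch with the grafana_datasource module (all values are assumptions based on this stack):
- name: add influxdb datasource to grafana
  grafana_datasource:
    name: influxdb
    grafana_url: https://localhost:3000
    grafana_user: admin
    grafana_password: admin
    ds_type: influxdb
    ds_url: https://172.18.0.2:8086
    database: test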
This is something I wanted to try for some time. Normally for network monitoring you use an NMS tool. They can be expensive, free or cheap. I have seen/used Observium and LibreNMS, and many years ago Cacti. There are other tools that can do the job, like Zabbix/Nagios/Icinga.
But it seems time-series databases are the new standard. They give you more flexibility, as you can create queries and graph them.
I decided on the InfluxDB-Telegraf-Grafana stack as I could quickly find info based on network scenarios.
What is the role of each one?
Telegraf: collect data
InfluxDB: store data
Grafana: visualize data
My main source is again Anton’s blog. All credits to him.
Environment
My network is just 3 Arista cEOS containers running via docker. All services will run as containers too, so you need docker installed. Everything is IPv4.
InfluxDB
Installation:
// Create directories
mkdir -p telemetry-example/influxdb
cd telemetry-example/influxdb
// Get influxdb config
docker run --rm influxdb influxd config > influxdb.conf
// Create local data folder for influxdb that we will map
mkdir data
ls -ltr
// Check docker status
docker images
docker ps -a
// Create docker instance for influxdb. Keep in mind that I am giving a name to the instance
docker run -d -p 8086:8086 -p 8088:8088 --name influxdb \
-v $PWD/influxdb.conf:/etc/influxdb/influxdb.conf:ro \
-v $PWD/data:/var/lib/influxdb \
influxdb -config /etc/influxdb/influxdb.conf
// Verify connectivity
curl -i http://localhost:8086/ping
// Create database "test" using http-query (link below for more details)
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE DATABASE test"
{"results":[{"statement_id":0}]} <-- command was ok!
// Create user/pass for your db.
curl -XPOST http://localhost:8086/query --data-urlencode "q=CREATE USER xxx WITH PASSWORD 'xxx123' WITH ALL PRIVILEGES"
{"results":[{"statement_id":0}]} <-- command was ok!
// Create SSL cert for influxdb
docker exec -it influxdb openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/influxdb-selfsigned.key -out /etc/ssl/influxdb-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=influxdb.domain.com"
// Update influxdb.conf for SSL
telemetry-example/influxdb$ vim influxdb.conf
…
https-enabled = true
https-certificate = "/etc/ssl/influxdb-selfsigned.crt"
https-private-key = "/etc/ssl/influxdb-selfsigned.key"
…
// Restart influxdb to take the changes
docker restart influxdb
// Get influxdb IP for using it later
docker container inspect influxdb --format='{{ .NetworkSettings.IPAddress }}'
172.17.0.2
// Verify connectivity via https
curl -i https://localhost:8086/ping --insecure
The verification for HTTPS was a bit more difficult because the result was always correct no matter what query I was running:
So I decided to see if there was a CLI/shell for influxdb (like in mysql, etc.). And yes, there is one. Keep in mind that you have to use "-ssl -unsafeSsl" at the same time! That confused me a lot.
$ docker exec -it influxdb influx -ssl -unsafeSsl
Connected to https://localhost:8086 version 1.8.1
InfluxDB shell version: 1.8.1
> show databases
name: databases
name
_internal
test
> use test
Using database test
> show series
key
cpu,cpu=cpu-total,host=5f7aa2c5550e
Links about influxdb that are good for the docker creation and the http queries:
Telegraf
// Create dir
mkdir telemetry-example/telegraf
cd telemetry-example/telegraf
// Get config file to be modified
docker run --rm telegraf telegraf config > telegraf.conf
// Add the details of influxdb in telegraf.conf. You also need to add the devices you want to poll; in my case 172.23.0.2/3/4.
vim telegraf.conf
....
[[outputs.influxdb]]
urls = ["https://172.17.0.2:8086"]
database = "test"
skip_database_creation = false
## Timeout for HTTP messages.
timeout = "5s"
## HTTP Basic Auth
username = "xxx"
password = "xxx123"
## Use TLS but skip chain & host verification
insecure_skip_verify = true
# Retrieves SNMP values from remote agents
[[inputs.snmp]]
## Agent addresses to retrieve values from.
## example: agents = ["udp://127.0.0.1:161"]
## agents = ["tcp://127.0.0.1:161"]
agents = ["udp://172.23.0.2:161","udp://172.23.0.3:161","udp://172.23.0.4:161"]
#
## Timeout for each request.
timeout = "5s"
#
## SNMP version; can be 1, 2, or 3.
version = 2
#
## SNMP community string.
community = "tomas123"
#
## Number of retries to attempt.
retries = 3
This is the SNMP config I added below the SNMPv3 options in [[inputs.snmp]]:
# ## Add fields and tables defining the variables you wish to collect. This
# ## example collects the system uptime and interface variables. Reference the
# ## full plugin documentation for configuration details.
[[inputs.snmp.field]]
name = "hostname"
oid = "RFC1213-MIB::sysName.0"
is_tag = true
[[inputs.snmp.field]]
name = "uptime"
oid = "DISMAN-EVENT-MIB::sysUpTimeInstance"
# IF-MIB::ifTable contains counters on input and output traffic as well as errors and discards.
[[inputs.snmp.table]]
name = "interface"
inherit_tags = [ "hostname" ]
oid = "IF-MIB::ifTable"
# Interface tag - used to identify interface in metrics database
[[inputs.snmp.table.field]]
name = "ifDescr"
oid = "IF-MIB::ifDescr"
is_tag = true
# IF-MIB::ifXTable contains newer High Capacity (HC) counters that do not overflow as fast for a few of the ifTable counters
[[inputs.snmp.table]]
name = "interfaceX"
inherit_tags = [ "hostname" ]
oid = "IF-MIB::ifXTable"
# Interface tag - used to identify interface in metrics database
[[inputs.snmp.table.field]]
name = "ifDescr"
oid = "IF-MIB::ifDescr"
is_tag = true
# EtherLike-MIB::dot3StatsTable contains detailed ethernet-level information about what kind of errors have been logged on an interface (such as FCS error, frame too long, etc)
[[inputs.snmp.table]]
name = "interface"
inherit_tags = [ "hostname" ]
oid = "EtherLike-MIB::dot3StatsTable"
# Interface tag - used to identify interface in metrics database
[[inputs.snmp.table.field]]
name = "name"
oid = "IF-MIB::ifDescr"
is_tag = true
For more info about the SNMP config in Telegraf, these are good links: this is the official GitHub page, and this is the page for the SNMP input plugin that explains the differences between "field" and "table".
Also, the link below is really good at explaining the SNMP config in Telegraf: "Gathering Data via SNMP"
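One step that is easy to miss: the telegraf container itself has to be started. A minimal sketch (flags are assumptions, mirroring the influxdb instance above):
docker run -d --name telegraf \
-v $PWD/telegraf.conf:/etc/telegraf/telegraf.conf:ro \
telegraf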
docker logs telegraf -f
...
2020-07-17T12:45:10Z E! [inputs.snmp] Error in plugin: initializing table interface: translating: exit status 2: MIB search path: /root/.snmp/mibs:/usr/share/snmp/mibs:/usr/share/snmp/mibs/iana:/usr/share/snmp/mibs/ietf:/usr/share/mibs/site:/usr/share/snmp/mibs:/usr/share/mibs/iana:/usr/share/mibs/ietf:/usr/share/mibs/netsnmp
Cannot find module (EtherLike-MIB): At line 0 in (none)
EtherLike-MIB::dot3StatsTable: Unknown Object Identifier
...
You will see errors about not being able to find the MIB files! So I used the LibreNMS MIBs. I downloaded the project and copied the MIBs I thought I needed (Arista and some others that don't belong to a vendor). This is also noted by Anton in this link:
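Roughly what that looks like (a sketch; the LibreNMS repo layout and the container MIB path are assumptions):
git clone --depth 1 https://github.com/librenms/librenms.git
mkdir mibs
cp librenms/mibs/EtherLike-MIB librenms/mibs/arista/* mibs/
// then recreate the telegraf container with an extra volume:
// -v $PWD/mibs:/usr/share/snmp/mibs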
Grafana
I have seen Grafana before but I had never used it, so the configuration of the queries was a bit of a challenge, but I was lucky and found very good blogs for that. The installation process is ok:
// Create folder for grafana and data
mkdir -p telemetry-example/grafana/data
cd telemetry-example/grafana
// Create docker instance
docker run -d -p 3000:3000 --name grafana \
--user root \
-v $PWD/data:/var/lib/grafana \
grafana/grafana
// Create SSL cert for grafana
docker exec -it grafana openssl req -x509 -nodes -newkey rsa:2048 -keyout /etc/ssl/grafana-selfsigned.key -out /etc/ssl/grafana-selfsigned.crt -days 365 -subj "/C=GB/ST=LDN/L=LDN/O=domain.com/CN=grafana.domain.com"
// Copy grafana config so we can update it
docker cp grafana:/etc/grafana/grafana.ini grafana.ini
// Update grafana config with SSL
vim grafana.ini
############################## Server
[server]
# Protocol (http, https, h2, socket)
protocol = https
…
# https certs & key file
cert_file = /etc/ssl/grafana-selfsigned.crt
cert_key = /etc/ssl/grafana-selfsigned.key
// Copy back the config to the container and restart
docker cp grafana.ini grafana:/etc/grafana/grafana.ini
docker container restart grafana
Now you can open Grafana in your browser at "https://0.0.0.0:3000/" using admin/admin.
You need to add a data source, which is our influxdb container. So you need to pick the "influxdb" type and fill in the values as per below.
Now you need to create a dashboard with a panel.
Links that I reviewed for creating the dashboard:
For creating a panel, the link below was the best, in the section "Interface Throughput". Big thanks to the author.
BTW, you need to configure SNMP in the switches so telegraf can poll them:
snmp-server location ceoslab
snmp-server community xxx123 ro
snmp-server host 172.17.0.1 version 2c xxx123
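From the docker host you can quickly check that a switch answers before blaming telegraf (the community string must match what the switch has configured, and the one in telegraf.conf must match too):
snmpwalk -v 2c -c xxx123 172.23.0.2 1.3.6.1.2.1.1.5.0 // sysName.0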
In my case, the Influx-Telegraf-Grafana container stack is running on the default bridge. Each container has its own IP, but as the Arista containers are in a different docker network, traffic needs to be "routed", so the IP of the telegraf container will be NAT-ed to 172.17.0.1 from the switches' point of view.
Next
I would like to manage all this process via Ansible… something like this… but it will take me time.
In the past I had to use CentOS systems a lot at work, and there was something I really liked from rpm: "yum provides", which tells you which package you need to install based on the command you need.
I always struggle to do that in Debian. I hope I remember it for next time. Based on this link:
# aptitude install apt-file
# apt-file update
# apt-file search snmpwalk
libnet-snmp-perl: /usr/share/doc/libnet-snmp-perl/examples/snmpwalk.pl
libsnmp-session-perl: /usr/share/doc/libsnmp-session-perl/examples/snmpwalkh.pl
python3-pysnmp4-apps: /usr/bin/pysnmpwalk
python3-pysnmp4-apps: /usr/share/man/man1/pysnmpwalk.1.gz
snmp: /usr/bin/snmpwalk <=== THIS IS WHAT I WANT !!!!
snmp: /usr/share/man/man1/snmpwalk.1.gz
snmpsim: /usr/share/doc/snmpsim/examples/data/foreignformats/linux.snmpwalk.gz
snmpsim: /usr/share/doc/snmpsim/examples/data/foreignformats/winxp1.snmpwalk.gz
# aptitude install snmp
Today I was trying to write a playbook to push config to Arista devices.
Initially I wanted to use the napalm module to push the config (as I have done with nornir), but it seems the napalm-ansible module requires napalm3 and netmiko3, and that breaks my nornir 2.4 (which requires napalm<3). So I uninstalled napalm-ansible and restored the other packages. Good thing I checked the versions beforehand.
So I had to check the eos_config module. I think the napalm-ansible module is more powerful, as it uses the diffs and config sessions provided by Arista. As far as I can see, there is no option to tell eos_config to just do a dry run.
At the end I managed to put everything together but the eos_config was failing:
So I had to find out where that task was looking for the file. It seems the "assemble", "template" and "file" tasks use as pwd the directory from which I call the script (xxx/testdir2/ceos-testing/ansible), but "eos_config" uses the directory where the playbook lives (xxx/testdir2/ceos-testing/ansible/playbook), based on my running command "…/ansible master$ ansible-playbook playbooks/gen-config.yaml".
So I was searching for some help and found the playbook path and Ansible search paths docs. Now I needed to verify that, and I found the Ansible debugger and examples that were really useful!
So I used "debugger: on_failed" for my task 11, and could see the path:
So it is clear it was looking at the playbook dir.
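For reference, this is roughly how the debugger attaches to a task (the task body here is hypothetical):
- name: 11- load config on device
  eos_config:
    src: "{{ inventory_hostname }}.conf"
  debugger: on_failed
  tags: push_config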
So after fixing the path, I realised that I didn't want to run everything, so I used tags so that only the last part was executed.
/ansible master$ cat playbooks/gen-config.yaml
...
- name: 12- display result
  debug:
    msg: "Backup file is {{ load_config.shortname }} and result is: {{ load_config }}"
  tags: push_config
...
/ansible master$ ansible-playbook playbooks/gen-config.yaml --limit="r1" -vvv --tags "push_config"
One more thing: Ansible's output when you have dictionaries is not great. I checked this link, and it is good for failures and with -vvvv, but for green (ok) outputs it is still not great:
A good refresher about traceroute. It is a very common tool for network troubleshooting, so it is important to use it wisely.
Important points
ICMP vs UDP: most implementations do UDP (it can be blocked…)
Every probe is an independent trial!
Try to identify the characteristics and location of each hop
If there is a congestion/delay issue at one hop, it should carry over to the following hops; if not, it is just that router/hop deprioritizing its ICMP generation.
You don't see the reverse path – ask the other end (if possible) to run the traceroute from its side.
Border routers between providers can be a hot spot for issues.
Asymmetric paths can bite you. Try to set the source address in your tests (from the provider IP, from your own space, etc.) – see the examples after this list.
Spot ECMP (in the same hop, you see several different IPs). Multiple unequal length paths can be painful.
MPLS: most of the time it is hidden (TTL propagation is disabled). It can be tricky to spot. But it can be funny when you do see the hops (with private IPs 🙂 )
And if you are more interested in the paths than the latency, this can be a good tool too:
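A couple of the points above mapped to actual flags of the standard Linux traceroute (addresses are placeholders):
traceroute -s 203.0.113.10 example.com // force a specific source address
traceroute -I example.com // ICMP echo probes instead of UDP
traceroute -T -p 443 example.com // TCP SYN probes, handy when UDP/ICMP are filtered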
I finished this book. It is quite good. I had read about Wim Hof through another book that mentioned the benefits of cold for physical recovery in sports, and after a bit more searching I was interested in reading more.
The book shows the benefits of breathing and how disconnected we are from our environment. And we have the tools to change that.
Other books about depression and trauma mention disconnection from our environment as one main cause, and getting back to it (nature) brings improvements.
I have never been a strong swimmer, nor do I have a big lung capacity, but I remember that when I was swimming as a teenager in my hometown, after doing the breathing exercises I could hold my breath much longer. And the second day I tried the breathing exercise from the book, I managed 4 minutes. I was quite surprised. So I am adding this to my meditation practice too.
Regarding the cold: on the days I go for a run, I finish my shower with cold water (based on the fittest book) so I can give better recovery to my legs (and knees). And to be honest, it feels very good afterwards. If you don't fight the cold (don't shiver), it is interesting how your body relaxes and your heartbeat slows down.
So I will carry on with the cold and breathing exercises.
Key presses for more visual people:
1- Enter Normal (command) mode:
Escape
2- Move around to the start of the area to indent:
hjkl↑↓←→
3- Start a visual selection:
v
4- Move around to the end of the area to indent:
hjkl↑↓←→
5- Type the number of indentation levels you want
0..9
6- Execute the indentation on the block:
>
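Putting it together, a worked example – indenting four lines by two levels:
Esc, v, 3j, 2>
(escape to Normal mode, start the selection, extend it down three lines, then shift the whole block twice.)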