Traceroute

A good refresh about traceroute. It is a very common tool for network troubleshooting so it is important to use it wisely

Important points

  • ICMP vs UDP: most implementations do UDP (it can be blocked…)
  • Every probe is an independent trial!
  • Try to identify the characteristics and location of each hop
  • If there is a congestion/delay issue in one hop, it has to be carried out to the next hops, if not, it is just prioritizing of the ICMP generation by that router/hop.
  • You dont see the reverse path – Ask the other end (if possible) to send the traceroute from its end.
  • Border routers between providers can be a hot spot for issues.
  • Asymmetric paths can bite you. Try to set the source address in your tests (from the provider IP, from your own space, etc)
  • Spot ECMP (in the same hop, you see several different IPs). Multiple unequal length paths can be painful.
  • MPLS: most times is hidden (TTL is removed). It can be tricky to spot. But it can be funny when you see the hops (with private IPs 🙂

And if you are more interested in the paths than latency, this can be a good too:

https://github.com/rucarrol/traceflow

Openconfig

I have struggled to get something working for learning a bit of openconfig.

At the end, all info my info comes from Anton’s Karneliuk blog series about Openconfig. So all credit to him. It is the best source about real testing of openconfig in different platforms.

I am not going to create the wheel and explain what openconfig is. In my head, it is attempt from several big vendors to standardise the network management (config) and monitoring (telemetry) via YANG (vendor-neutral) models. So OC uses YANG. And we interact with OC using a transport protocol like netconf, restcong and gNMI. So the network devices need to implement one of these protocols. Based on the blog Cisco, Nokia and Arista have netconf implementations and Ansible has a module for that!!! So the key words are openconfig, yang and netcong.

So in my case, based on my ceos lab, I have added a new playbook based on Anton’s to test openconfig/netcong with Arista cEOS:

https://github.com/thomarite/ceos-testing/blob/master/README.md#openconfig

This is quite basic as it only gets the interface config.

Following Anton’s Part3 Blog:

I tried to push config via openconfig to my ceos devices (all files are in github as per link above).

The blog is dense but it is good because there is a lot of info. In this case, you have to use an Ansible role so it is a new thing to learn. As well, I wanted to adapt that role to my env and found some ansible issues but managed to fix after reading ansible documentation and paying attention to the -vvv info.

From “oc-push-config.yaml” the first task “collect” it is fine. It just takes some 10 minutes or more to get all YANG modules from each device.

The issue is with task “configure”. It fails when trying to push the interface config. I have tried Anton’s config and the actual config generated from oc-get-interface-info.yaml but no joy.

Based on the blog, it seems Arista doesnt have much interest in Openconfig.

Anyway, there have been a couple of intense days looking at all this openconfig/netcong/yang thing. I have just touched the surface but I have learned some Ansible in the way too. So could be worse.

Will move on to something else.

Netbox – API Troubleshooting

Yesterday managed to get netbox and my lab connected. So today followed up with the original article, and found a new issue that took me several hours.

Initially I was seeing an error that I couldn’t undestand “

netbox.exceptions.CreateException: This field is required

From

(venv) /netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
Creating Netbox Interface for device r1, interface Loopback1
Traceback (most recent call last):
File "scripts/create_interfaces.py", line 42, in
task=create_netbox_interface,
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/init.py", line 146, in run
result = self._run_serial(task, run_on, **kwargs)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/init.py", line 72, in _run_serial
result[host.name] = task.copy().start(host, self)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/task.py", line 85, in start
r = self.task(self, **self.params)
File "scripts/create_interfaces.py", line 34, in create_netbox_interface
device_id=device_id,
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/dcim.py", line 431, in create_interface
return self.netbox_con.post('/dcim/interfaces/', required_fields, **kwargs)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py", line 124, in post
raise exceptions.CreateException(resp_data)
netbox.exceptions.CreateException: This field is required.

So I started to follow the trace, adding “print” and using “ipdb” to see what was going on:

....
/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py(71)__request()
70 finally:
---> 71 self.close()
72
ipdb> dir(response)
['attrs', 'bool', 'class', 'delattr', 'dict', 'dir', 'doc', 'enter', 'eq', 'exit', 'format', 'ge', 'getattribute', 'getstate', 'gt', 'hash', 'init', 'init_subclass', 'iter', 'le', 'lt', 'module', 'ne', 'new', 'nonzero', 'reduce', 'reduce_ex', 'repr', 'setattr', 'setstate', 'sizeof', 'str', 'subclasshook', 'weakref', '_content', '_content_consumed', '_next', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'is_permanent_redirect', 'is_redirect', 'iter_content', 'iter_lines', 'json', 'links', 'next', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
ipdb> response.url
'http://0.0.0.0:8080/api/dcim/interfaces/'
ipdb> response.text
'{"type":["This field is required."]}'
ipdb> response.status_code
400
ipdb> response.content
b'{"type":["This field is required."]}'
ipdb> response.reason
'Bad Request'
ipdb> response.request

ipdb> prepared_request

ipdb> prepared_request.url
'http://0.0.0.0:8080/api/dcim/interfaces/'
ipdb> dir(prepared_request)
['class', 'delattr', 'dict', 'dir', 'doc', 'eq', 'format', 'ge', 'getattribute', 'gt', 'hash', 'init', 'init_subclass', 'le', 'lt', 'module', 'ne', 'new', 'reduce', 'reduce_ex', 'repr', 'setattr', 'sizeof', 'str', 'subclasshook', 'weakref', '_body_position', '_cookies', '_encode_files', '_encode_params', '_get_idna_encoded_host', 'body', 'copy', 'deregister_hook', 'headers', 'hooks', 'method', 'path_url', 'prepare', 'prepare_auth', 'prepare_body', 'prepare_content_length', 'prepare_cookies', 'prepare_headers', 'prepare_hooks', 'prepare_method', 'prepare_url', 'register_hook', 'url']
ipdb> prepared_request.path_url
'/api/dcim/interfaces/'
ipdb> response.__content
*** AttributeError: 'Response' object has no attribute '__content'
ipdb> response._content
b'{"type":["This field is required."]}'
ipdb> response.content
b'{"type":["This field is required."]}'
ipdb> response.headers
{'Server': 'nginx', 'Date': 'Wed, 08 Jul 2020 12:36:35 GMT', 'Content-Type': 'application/json', 'Content-Length': '36', 'Connection': 'keep-alive', 'Vary': 'Accept, Cookie, Origin', 'Allow': 'GET, POST, HEAD, OPTIONS, TRACE', 'API-Version': '2.8', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN'}
ipdb> response.reason
'Bad Request'
ipdb> response.request

ipdb> response.test
*** AttributeError: 'Response' object has no attribute 'test'
ipdb> response.text
'{"type":["This field is required."]}'
ipdb> response.url
'http://0.0.0.0:8080/api/dcim/interfaces/'
ipdb> quit
Create Netbox Interfaces
r1 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv ERROR
---- napalm_get ** changed : False --------------------------------------------- INFO
(venv) go:1.12.5|py:3.7.3|tomas@athens:~/storage/technology/netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
url3=http://0.0.0.0:8080/api/dcim/interfaces?limit=0
Creating Netbox Interface for device r1, interface Loopback1
url3=http://0.0.0.0:8080/api/dcim/devices/?name=r1&limit=0
device_id = 1
url3=http://0.0.0.0:8080/api/dcim/interfaces/
resp_ok=False resp_status=400
body_data= {'name': 'Loopback1', 'form_factor': 1200, 'device': 1}
params= /dcim/interfaces/
resp_data= {'type': ['This field is required.']}
Traceback (most recent call last):
File "scripts/create_interfaces.py", line 43, in
task=create_netbox_interface,
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/init.py", line 146, in run
result = self._run_serial(task, run_on, **kwargs)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/init.py", line 72, in _run_serial
result[host.name] = task.copy().start(host, self)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/task.py", line 85, in start
r = self.task(self, **self.params)
File "scripts/create_interfaces.py", line 35, in create_netbox_interface
device_id=device_id,
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/dcim.py", line 431, in create_interface
return self.netbox_con.post('/dcim/interfaces/', required_fields, **kwargs)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py", line 130, in post
raise exceptions.CreateException(resp_data)
netbox.exceptions.CreateException: This field is required.

So it seems that at the end I realised that I was missing the parameter “type” !!!

I was checking the documentation from netbox in github but I couldnt see clearly what kind of config I had to provide…

I checked the “type” value for the only interfaces I already had in netbox: “http://0.0.0.0:8080/api/dcim/interfaces/

So I tried to pass exactly that but it was still failing…

(venv) go:1.12.5|py:3.7.3|tomas@athens:~/storage/technology/netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
url3=http://0.0.0.0:8080/api/dcim/interfaces?limit=0
Creating Netbox Interface for device r1, interface Loopback1
url3=http://0.0.0.0:8080/api/dcim/devices/?name=r1&limit=0
device_id = 1
url3=http://0.0.0.0:8080/api/dcim/interfaces/
resp_ok=False resp_status=400
body_data= {'name': 'Loopback1', 'form_factor': 1200, 'device': 1, 'type': {'value': '1000base-t', 'label': '1000BASE-T (1GE)', 'id': 1000}}
params= /dcim/interfaces/
resp_data= {'type': ['Value must be passed directly (e.g. "foo": 123); do not use a dictionary or list.']}
Traceback (most recent call last):
File "scripts/create_interfaces.py", line 50, in
task=create_netbox_interface,
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/init.py", line 146, in run
result = self._run_serial(task, run_on, **kwargs)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/init.py", line 72, in _run_serial
result[host.name] = task.copy().start(host, self)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/task.py", line 85, in start
r = self.task(self, **self.params)
File "scripts/create_interfaces.py", line 42, in create_netbox_interface
**interface_type,
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/dcim.py", line 431, in create_interface
return self.netbox_con.post('/dcim/interfaces/', required_fields, **kwargs)
File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py", line 130, in post
raise exceptions.CreateException(resp_data)
netbox.exceptions.CreateException: Value must be passed directly (e.g. "foo": 123); do not use a dictionary or list.
(venv) go:1.12.5|py:3.7.3|tomas@athens:~/storage/technology/netbox-example/nornir-napalm-netbox-demo master$

Somehow the API had to be documented… by chance, looking at the bottom of the netbox page, there was an”API” link….

So, now I needed to look up the correct API call. Based on the script and logs, it was a “POST” for “/dcim/interfaces/”. Here we go!

So finally, I had the info. I confirmed what fields were mandatory and the value they needed!

interface_type = {}
interface_type["type"] = "1000base-t"
for interface_name in interfaces.keys():
    if not is_interface_present(nb_interfaces, f"{task.host}", interface_name):
        print(
            f"* Creating Netbox Interface for device {task.host}, interface {interface_name}"
        )
        device_id = get_device_id(f"{task.host}", netbox)
        print("device_id = %s" % device_id)
        netbox.dcim.create_interface(
           name=f"{interface_name}",
           form_factor=1200,  # default
           device_id=device_id,
           **interface_type,
        )

So the script ran fine for all my devices:

netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
url3=http://0.0.0.0:8080/api/dcim/interfaces?limit=0
Create Netbox Interfaces
r1 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
^^^^ END Create Netbox Interfaces ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
r2 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
^^^^ END Create Netbox Interfaces ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
r3 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
^^^^ END Create Netbox Interfaces ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

And it is updated in GUI:

Netbox

Another thing I wanted to play with is Netbox and found a good article to follow. So credits to the authors.

Using my current ceos lab from https://github.com/thomarite/ceos-testing

I followed Rick’s article to install netbox-docker and his own repo with the nornir examples using netbox. In this case nornir is going to use netbox as inventory. Normally I use local files. I created a python venv for 3.7.3

mkdir netbox-example; cd netbox-example
pyenv local 3.7.3
python -m virtualenv venv
source venv/bin/activate
git clone https://github.com/netbox-community/netbox-docker.git
cd netbox-docker
vim docker-compose.yml  --> so it always expose 8080
      nginx:
      ...
        ports:
          - 8080:8080
docker-compose pull
docker-compose up

When installing the requirements for “nornir-napalm-netbox-demo” I had to modify the version of some packages. So I removed the required version and I left pip to install the latest. I didnt use the makefile.

git clone https://github.com/rickdonato/nornir-napalm-netbox-demo
cd nornir-napalm-netbox-demo
python -m pip install -r requirements.txt

I struggled quite a bit with the management IP in netbox and the meaning of “platform”

  • Create Manufacturers under Device Types: I created “Arista”
  • Create Device Types under Device Types: I created “ceos”
  • Create Platforms under Devices: This is VERY important as it has to be a supported NAPALM platform!!! So for Arista, I need “eos”.
  • Create Device Roles under Devices. I created “pe”
  • Create Devices under Devices.
  • Within each device: add a management interface. Here, I got confused as I was adding the interface in the inventory section. The inventory section is info to/from the device using NAPALM. So you need to go to the bottom of the page, add the interface
  • and then add an IP to that interface and mark it as primary.

Keep in mind that initially, I was using “0.0.0.0” for each device as that’s the IP I have be using for all my scripts lately.

Keep in mind (II) that we are using docker twice (from different commands…) one to get netbox and the other via docker(-topo) to get the Arista ceos containers…. and we have iptables rules under the hood created by both…

But, let’s go step by step. Now we need to confirm that our nornir scrip can connect to netbox. So follow “Nornir-to-Netbox Configuration” section. This is my file. I updated the nb_url and nb_token. Notice the usage of “transform_function“.

---
core:
num_workers: 20
inventory:
plugin: nornir.plugins.inventory.netbox.NBInventory
options:
nb_url: 'http://0.0.0.0:8080'
nb_token: '<NETBOX_API_TOKEN>'
ssl_verify: False
transform_function: "helpers.adapt_user_password"

You need to update “scripts/secrets.py” with the devices you have in your inventory and the user/pass:

creds = {
"r1": {"username": "user", "password": "pas123"},
"r2": {"username": "user", "password": "pas123"},
"r3": {"username": "user", "password": "pas123"},
}

So now we can test if nornir can connect to netbox:

/netbox-example/nornir-napalm-netbox-demo master$ python scripts/helpers.py --inventory
{'defaults': {'connection_options': {},
'data': {},
'hostname': None,
'password': None,
'platform': None,
'port': None,
'username': None},
'groups': {},
'hosts': {'r1': {'connection_options': {},
'data': {'asset_tag': 'r1',
'model': 'ceos',
'role': 'pe',
'serial': 'r1',
'site': 'lab1',
'vendor': 'Arista'},
'groups': [],
'hostname': '192.168.16.2',
'password': 'pas123',
'platform': 'eos',
'port': None,
'username': 'user'},
'r2': {'connection_options': {},
'data': {'asset_tag': 'r2',
'model': 'ceos',
'role': 'pe',
'serial': 'r2',
'site': 'lab1',
'vendor': 'Arista'},
'groups': [],
'hostname': '192.168.16.3',
'password': 'pas123',
'platform': 'eos',
'port': None,
'username': 'user'},
'r3': {'connection_options': {},
'data': {'asset_tag': 'r3',
'model': 'ceos',
'role': 'pe',
'serial': 'r3',
'site': 'lab1',
'vendor': 'Arista'},
'groups': [],
'hostname': '192.168.16.4',
'password': 'pas123',
'platform': 'eos',
'port': None,
'username': 'user'}}}

All good. Let’s see if we can get backups from the devices.

netbox-example/nornir-napalm-netbox-demo master$ python scripts/backup_configs.py
Backup Device configurations**
r1 ** changed : True *
vvvv Backup Device configurations ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
---- write_file ** changed : True ---------------------------------------------- INFO
^^^^ END Backup Device configurations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
r2 ** changed : True *
vvvv Backup Device configurations ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
---- write_file ** changed : True ---------------------------------------------- INFO
^^^^ END Backup Device configurations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
r3 ** changed : True *
vvvv Backup Device configurations ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
---- write_file ** changed : True ---------------------------------------------- INFO
^^^^ END Backup Device configurations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(venv) /netbox-example/nornir-napalm-netbox-demo master$

Great all good.

Now, let’s see netbox using NAPALM. If you click on “Status” for any device, netbox will use NAPALM to get the facts from the device. If netbox is not configured properly with NAPALM, it will fail. This is a working scenario:

The tabs “LLDP neighbors” and “Configuration” relay too in NAPALM.

So for configuring netbox with napalm you need to tell netbox the user/pass that NAPALM needs:

netbox-example$ vim netbox-docker/env/netbox.env
...
NAPALM_USERNAME=user
NAPALM_PASSWORD=pas123
NAPALM_TIMEOUT=10
...

Very likely you will have to restart netbox:

/netbox-docker release$ docker-compose down
/netbox-docker release$ docker-compose up

As mentioned before, I had an issue when I was using “0.0.0.0” as IP. By default (as It seems I can’t think) I was using the exposed IP/port from docker-topo to reach the ceos switches. I haven’t had an issue until using netbox.

I am using docker for netbox and docker(-topo) for my arista cEOS switches. So the connectivity between netbox and ceos is via the IPs/interfaces/bridges created by docker. And remember… you have iptables under the hood. My first mistake was telling netbox to use 0.0.0.0 as it is the one I am using to testing from my scripts when connecting to ceos. Netbox needs to point to the IP assigned by docker :facepalm: 192.16.16.x in my case. Second one, the port, same thing docker exposes the port 443 as 900x for external connections and I use 900x in my scripts. From netbox point of view, it is still 443 :facepalm: And finally, I am calling docker twice for building my lab, one for netbox-docker and the other for ceos. You need to keep an eye on iptables changes when restarting netbox via docker-compose because you can be in the situation that netbox traffic is dropped in DOCKER-ISOLATION-STAGE-1 :facepalm: (need to try to write a docker-compose to build everything in one go)

So when I was having errors from netbox that it was being rejected when connecting to ceos devices via NAPALM, I couldnt understand it. My scripts were fine using those details (0.0.0.0:900x)

I ran tcpdump on one ceos on the “ethernet0” interface and NOTHING was hitting the interface from netbox on port 900x but my scripts could…..

Somehow netbox wasnt able to reach ceos r1??? In my head, netbox and ceos devices were all in 0.0.0.0….. so no routing, no firewalls, they are connected in the same network 0.0.0.0…..

At the end I waked up and realised that the docker devices are using the IPs provided by docker so it is following normal routing… and firewalling by iptables. The same for ceos devices, they have IPs (different from 0.0.0.0)

So I updated netbox with the correct management IPs for r1, r2 and r3 ceos.

When I filtered by the real netbox IP in r1 tcpdump ethernet0, I was seeing traffic on 900x!!! Good. Then I realised that it has to be 443. So I removed my hack to update the port to 900x.

For a different reason I had to restart docker-topo (for ceos) and then docker netbox. And now, I coudnt see any traffic from netbox in r1….. I “didn’t” change anything. So the routing didnt change, there was something else “cutting” the connection: iptables

docker uses iptables very heavily. I realised that after restart docker-netbox, iptables changed…

before restart:

# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-94a8183a4fb1 ! -o br-94a8183a4fb1 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-0d4ec9aba9bd ! -o br-0d4ec9aba9bd -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-609619313dc8 ! -o br-609619313dc8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-61d32350cb58 ! -o br-61d32350cb58 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-384488acbc99 ! -o br-384488acbc99 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN

after restart:

# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-381cdff63d2f ! -o br-381cdff63d2f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-94a8183a4fb1 ! -o br-94a8183a4fb1 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-0d4ec9aba9bd ! -o br-0d4ec9aba9bd -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-609619313dc8 ! -o br-609619313dc8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-61d32350cb58 ! -o br-61d32350cb58 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN

With the restart, docker created a new bridge interface for netbox (old: br-384488acbc99, new: br-381cdff63d2f) and it wasnt hitting anymore the “DOCKER-ISOLATION-STAGE-1 -j ACCEPT”

So I had to make an iptables change:

# iptables -t filter -D DOCKER-ISOLATION-STAGE-1 -j ACCEPT
# iptables -t filter -I DOCKER-ISOLATION-STAGE-1 -j ACCEPT
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-381cdff63d2f ! -o br-381cdff63d2f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-94a8183a4fb1 ! -o br-94a8183a4fb1 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-0d4ec9aba9bd ! -o br-0d4ec9aba9bd -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-609619313dc8 ! -o br-609619313dc8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-61d32350cb58 ! -o br-61d32350cb58 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#

And finally, netbox could use napalm to contact the ceos devices…. Calling docker twice is not a great idea….

BTW, this is my docker ps with netbox and ceos devices:

(venv) /netbox-example$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c23be76ffd54 nginx:1.17-alpine "nginx -c /etc/netbo…" 4 hours ago Up 4 hours 80/tcp, 0.0.0.0:8080->8080/tcp netbox-docker_nginx_1
5a0b89f18578 netboxcommunity/netbox:latest "/opt/netbox/docker-…" 4 hours ago Up 4 hours netbox-docker_netbox_1
528948de329b netboxcommunity/netbox:latest "python3 /opt/netbox…" 4 hours ago Up 4 hours netbox-docker_netbox-worker_1
29529302ba1c redis:5-alpine "docker-entrypoint.s…" 4 hours ago Up 4 hours 6379/tcp netbox-docker_redis_1
5e975ec2aa70 redis:5-alpine "docker-entrypoint.s…" 4 hours ago Up 4 hours 6379/tcp netbox-docker_redis-cache_1
6158672a4ae6 postgres:11-alpine "docker-entrypoint.s…" 4 hours ago Up 4 hours 5432/tcp netbox-docker_postgres_1
34841aa098d4 ceos-lab:4.23.3M "/sbin/init systemd.…" 5 hours ago Up 5 hours 0.0.0.0:2002->22/tcp, 0.0.0.0:9002->443/tcp 3node_r03
4ca92c6a3b09 ceos-lab:4.23.3M "/sbin/init systemd.…" 5 hours ago Up 5 hours 0.0.0.0:2001->22/tcp, 0.0.0.0:9001->443/tcp 3node_r02
67e8b7ab84e0 ceos-lab:4.23.3M "/sbin/init systemd.…" 5 hours ago Up 5 hours 0.0.0.0:2000->22/tcp, 0.0.0.0:9000->443/tcp 3node_r01

I was painful but I learned a couple of things about netbox, nornir and docker/iptables!!!

Nornir

Nornir is a python framework mainly for network automation. Instead of using another tool like Ansible (that you need to learn), you can do the same just using pure python all the way. Ansible doesnt scale well and can be very slow, with nornir you have threading from day zero, so if you have to run tasks in 100 devices, you will feel and see the difference.

I learnt about nornir via Kirk Byers’ course. Unfortunately I didnt have the chance/time to use it in my former day job so now I have had time to review things and do a small project.

From https://github.com/thomarite/ceos-testing in the nornir section you can find the whole environment. I tested on the 3-node topology.

It is nothing special. The script builds the config for BGP or ISIS using jinj2 and yaml files. I have the feeling that my jinja2 is a bit difficult to follow. Then using napalm connects to the devices to push or check the config.

Just one issue, as it seems due to the nature of cEOS relaying on docker and my filesystem, if you decide to push the config (dry_run=False == commit=True) the task will fail (while trying to write startup config) but it is actually executed.

(testdir2) /testdir2/ceos-testing/nornir master$ python buid-config.py -b isis -c
hostname: r1
task: deploy_config for isis
failed: True
logs: Traceback (most recent call last):
...
File ".../testdir2/lib/python3.7/site-packages/pyeapi/eapilib.py", line 469, in send
raise CommandError(code, msg, command_error=err, output=out)
pyeapi.eapilib.CommandError: Error [1000]: CLI command 5 of 5 'write memory' failed: could not run command [Error copying system:/running-config to flash:/startup-config (Operation not permitted)]
changed: False
diff:

hostname: r2
task: deploy_config for isis
failed: False
logs: None
changed: False
diff:

hostname: r3
task: deploy_config for isis
failed: False
logs: None
changed: False
diff:

This shouldn’t happen on vEOS or the real hardware (if you have the correct aaa config of course)

Route Reflectors – Notes

Reflect:
client -> clients and non-clients
non-client -> clients
No Reflect:
non-client -> no-client (normal ibgp)

always advertises to eBGP peers (normal ebgp)
eBGP learned prefixes, advertised to client and non-clients (normal ebgp)

Full Mesh iBGP:

  • between RRs
  • RRs and non-clients
  • clients just need iBGP to RRs.

Reflects ONLY the best route

RRs dont modify on reflected routes: NEXT_HOP, AS_PATH, LP and MED.

Prevent routing information loops: ORIGINATOR_ID and CLUSTER_LIST

Clustering:

  • ORIGINATOR_ID: The first RR creates the Originator_ID and sets it to the BGP router ID of the router that originates the route. So when a client receives a route with its own Originator ID, it is dropped.
  • CLUSTER_LIST: if the local CLUSTER_ID is found in the list, the route is discarded.
    *This is done ONLY in RRs.
    This is ONLY created or updated on a RR during Reflection.
Hierarchical Route Reflection:
2 levels:
- level1 RRs are clients of level2 RR -> level1 RR dont need full mesh between them
- level2 RRs are full mesh between them.

Consider:
size of top-level mesh
number of alternative paths

Hierarchical Route Reflection:
2 levels:

– level1 RRs are clients of level2 RR -> level1 RR dont need full mesh between them
– level2 RRs are full mesh between them.

Consider:

  • size of top-level mesh
  • number of alternative paths

RR Design Priciples:

  • keep logical and physical topologies congruent to increase redundancy and path optimization and prevent loops
    — follow physical topology
    — change physical topology
    — modify logical topology
    — follow physical topology
    — session between RR and non-client shouldnt traverse a client
    — session between RR and client shouldnt traverse a non-client
  • use comparable metrics in route selection to avoid convergence oscillations
    (ie MED – only used between prefixes from same neighbor AS – not used for different AS)
    — full ibgp mess –> no
    — always-compare-med –> no
    — deterministic-med -> ok
    — med=0 (via RM in) -> ok
    — bgp communities ->
  • set proper intra and inter cluster IGP metrics to avoid convergence oscillations
    — multicluster RR architecture: intracluster metrics lower than intercluster.
  • modify next-hop with care. Do so only to bring the RRs into forwarding path.
  • use peer groups with RR to reduce convergence time.

GCP Networking 101 – IP Forwarding

I had my shiny and tiny GCP network for EVE-NG to test vEOS. I built a new VM (vm2) to be my center for automation so I can test stuff like ansible/napalm/nornir etc… But I couldn’t ping from vm2 to the vEOS instances in eve-ng (vm1). Those instances where in a different network attached to vm1 so it had to “route”.

As usual, I missed one step when I created the EVE-NG VM. The official documentation doesnt mention anything regarding enabling routing in the VM. As I am not used to Cloud environments, I assume that any simple Linux VM can forward traffic if configured.

Surprise Surprise. In GCP (not sure in other cloud providers), you need to enable “forwarding” during the VM creation and you can’t change that afterwards in any way.

After checking the second guide I followed, I realised that guide mentioned the point to enable forwarding to avoid the same problem I was facing…

So I had to gave up and had to build both VMs from scratch….

But at the end, I have routing enabled in both VMs and I can ping to the vEOS images.

And another annoying thing. I couldnt update the next hop in a static route defined in the VPC. So I had to delete it and create again pointing to the new VM with the vEOS.

And dealing with the internal IPs…

Moving on, quite frustrating day. But learned several things about GCP netwoking.

IPv6 EH

I was reading a chat today, and people were talking about issues with EH. As usual, I didn’t pay attention to the very beginning of the conversation. At the end, after reading the initial link from the conversation, this was all related to IPv6 Extended Headers. And it seems they can cause issues even showed in a rfc7872.

This is the agenda from “NPS/CAIDA 2020 Virtual IPv6 Workshop” last week. And this Geoff Huston’s presentation that started the conversation. And one more link from Geoff about measuring IPv6.

And this is an old issue about IPv6, fragmentation, load balancers, anycast networks that was very interesting to read. IPv6 MTU is 1280.

I don’t have production experience with IPv6 so I try to learn from others. At some point I need to create a proper IPv6 lab with IPv6 services (NTP, DNS, DHPC, HTTPs, etc)

EVE-NG: Arista Lab

As my last attempt to build a MPLS-SR Arista lab failed usin cEOS. I decided to try a different approach as I need more resources that my laptop has. For sometime, I wanted to use tesuto but I am not sure if it is still on business. From the main page, you can’t find any link to register (and pay) for the service. Although if you search for “pricing” you can find a link to that. That’s it.

The other option was to use EVE-NG. You can use it in your own bare-metal server or in the cloud.

So finally, I decided to spend some money. I signed up for GCP with a $300 free computing offer. So at least I dont pay for GCP yet and then I bought one year of EVE-NG professional. Let’s see how it goes.

Before buying the license, you need to install eve-ng. So I followed the official documentation to use it in GCP as it is quite up to date.

I consulted other links too just to compare other users experiences like these:

https://github.com/NetDevNotes/Eve-NG-in-Google-Cloud

I had an issue during the process. When I had to configure DHCP, the IP wizard was showing garbage in the script. Hopefully I didnt have to add anything just accept all default values.

So once it is done, you need to https to the VM…. it didnt work. Somehow “apache” was started. So after startup, got access. I can login and change the default password.

root@eveng01:/var/www/html# service apache2 start
root@eveng01:/var/www/html# service apache2 status

So far, I am not planning to give it a static IP to the VM and a FQDN from my domain. Maybe in the future if I use it often.

Now, I need to create the Arista lab. I followed one of the links earlier, it was quite handy.

I created my small 3 nodes lab, apply the config. All this with a couple of reboots in each device and you have the lab up and running!

It is nice to work in a system with plenty of RAM. The VM has 60GB of RAM and 16vCPU. So I should be able to create a lab with 14 vEOS (each one needs 4GB and 1CPU).

$ top
top - 13:00:27 up 1:33, 1 user, load average: 2.12, 1.37, 1.04
Tasks: 266 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.3 us, 5.9 sy, 0.0 ni, 83.4 id, 0.0 wa, 0.0 hi, 0.4 si, 0.0 st
KiB Mem : 10.2/61838576 [ ]
KiB Swap: 0.0/0 [ ]
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27623 root 20 0 3034100 1.992g 25696 S 100.4 3.4 11:21.40 qemu-system-x86
26120 root 20 0 3034100 1.951g 26068 S 100.0 3.3 8:54.66 qemu-system-x86
24536 root 20 0 3034100 1.915g 26072 S 43.3 3.2 9:16.11 qemu-system-x86
245 root 25 5 0 0 0 S 8.2 0.0 2:05.36 uksmd
7500 www-data 20 0 377908 30744 12732 S 4.5 0.0 0:17.27 apache2
4262 root 20 0 1138416 15732 13508 S 0.8 0.0 0:25.40 janus
5526 tomcat8 20 0 5925452 348168 17676 S 0.8 0.6 0:43.17 java
159 root 20 0 0 0 0 I 0.4 0.0 0:01.13 kworker/6:1-eve
4363 mysql 20 0 2493932 85712 20408 S 0.4 0.1 0:10.80 mysqld
7210 www-data 20 0 377900 31024 12724 S 0.4 0.1 0:07.08 apache2

Unfortunately, I am hitting the same problem, and this time, the MAC addresses are the ones you expect to see based on the interface outputs:

I have asked again Arista if this is expected…

In the main time, I need to learn how to map the devices in the VM to external ports so I can access directly from my laptop.

UPDATE

My Arista SE confirmed that cEOS doesnt support MPLS Data Plane. And this should work with vEOS. So I asked in Arista forum about this problem with vEOS and turns out that this works but you need to be sure that a “physical” interface is attached to the VRF, a Loopback or SVI is not enough.

This seems to be the original post about the problem:

https://eos.arista.com/forum/see-bgp-routes-unable-to-ping/

So I just added a VPC to et3 in each device in CUST-A VRF and I can ping across VRFs!!!

r4#ping vrf CUST-A 192.168.0.2 source 192.168.0.1
PING 192.168.0.2 (192.168.0.2) from 192.168.0.1 : 72(100) bytes of data.
80 bytes from 192.168.0.2: icmp_seq=1 ttl=65 time=70.9 ms
80 bytes from 192.168.0.2: icmp_seq=2 ttl=65 time=64.3 ms
80 bytes from 192.168.0.2: icmp_seq=3 ttl=65 time=58.2 ms
80 bytes from 192.168.0.2: icmp_seq=4 ttl=65 time=50.6 ms
80 bytes from 192.168.0.2: icmp_seq=5 ttl=65 time=58.6 ms
--- 192.168.0.2 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 47ms
rtt min/avg/max/mdev = 50.613/60.554/70.943/6.786 ms, pipe 5, ipg/ewma 11.817/65.414 ms
r4#

And the funny thing. I can’t see anymore the MPLS packets in the tcpdump 🙂

Anyway, good news, I can carry on creating more complex labs and test some scripting/automation stuff.

This is the latest diagram:

FTP Passive

I have a supplier at my employer that requires to use a FTP server to send big files when you open a support ticket. For a long time (a couple of years) whenever I had to upload big files, I had to use my personal VM because my ftp connections failed from the office. I always blamed the super-smart firewall.

One day, I decided to fix the issue and allow the connection in our corporate firewall. I failed. Still couldnt upload files from the office. So keep using my personal VM.

This week I had to upload again a big file. This time I am working from home, so pretty much it is going to work the upload. Wrong! It fails. Ok, I checked a bit and got to the conclusion that it is my ISP or modem at home that is blocking FTP. Most ISP use CGN to stretch as much as possible the limited IPv4. I have IPv6 at home and my VM has IPv6 too… but the ftp server doesnt.

I checked the internet if there was any know issue with my ISP and FTP connections. No luck. I connected to my modem, nothing obvious messing around with FTP.

I decided to give it a proper go to this issue. I knew that it worked from my VM and it didnt from home. I noticed that I was running the same ftp client version in the VM and at home. So let’s debug the ftp client and take a packet capture in both locations.

CLI from the VM:

$ ftp -vd b.b.b.b
ftp: setsockopt: Bad file descriptor
Name: ftp
---> USER ftp
331 Please specify the password.
Password:
---> PASS XXXX
230 Login successful.
---> SYST
215 UNIX Type: L8
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd support
---> CWD support
250 Directory successfully changed.
ftp> cd 211211
---> CWD 211211
250 Directory successfully changed.
ftp> put TEST.txt
local: TEST.txt remote: TEST.txt
---> TYPE I
200 Switching to Binary mode.
ftp: setsockopt (ignored): Permission denied
---> PORT a,a,a,a,162,57
200 PORT command successful. Consider using PASV.
---> STOR TEST.txt
150 Ok to send data.
226 Transfer complete.
28 bytes sent in 0.00 secs (854.4922 kB/s)
ftp> quit
---> QUIT

And this is the packet capture:

After typing “put” in packet 33, I see a “PASV” message from the server and a new connection (initiated by the server!) is established for the data transfer. All good.

So now, make the same from home and compare.

CLI from home without debug:

$ ftp b.b.b.b
Connected to b.b.b.b.
Name: ftp
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd support
250 Directory successfully changed.
ftp> cd 211211
250 Directory successfully changed.
ftp> put TEST.txt
local: TEST.txt remote: TEST.txt
500 Illegal PORT command.
ftp: bind: Address already in use
ftp> quit
221 Goodbye.

CLI from home with debug:

$ ftp -vd b.b.b.b
ftp: setsockopt: Bad file descriptor
Name: ftp
---> USER ftp
331 Please specify the password.
Password:
---> PASS XXXX
230 Login successful.
---> SYST
215 UNIX Type: L8
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd support
---> CWD support
250 Directory successfully changed.
ftp> cd 211211
---> CWD 211211
250 Directory successfully changed.
ftp> put TEST.txt
local: TEST.txt remote: TEST.txt
---> TYPE I
200 Switching to Binary mode.
ftp: setsockopt (ignored): Permission denied
---> PORT 192,168,1,158,202,145
500 Illegal PORT command.
ftp: bind: Address already in use
ftp> quit
---> QUIT
221 Goodbye.

So with and without debug I keep seeing “ftp: bind: Address already in use”…..

And this is the packet capture from home:

So after I type “put” in packet 32, the answer from the server is a “500”.

I wasnt clearly paying attention to the clues. I was still banging my head why the server was sending a “500 Ilegal PORT command”.

I was comparing both captures and both debug outputs… but still didnt it.

I thought I understood FTP. I knew that you use port TCP 21 to establish the control session and the data session / transfer is via new TCP session using a random port. That’s one of the reasons that using NAT or CGN can screw up your FTP sessions.

So I assumed that the issues wasnt my ISP. So it had to be my side (or me).

So finally, I decided to search for “ftp: bind: Address already in use” as it was the message that came up with and without debugging.

Oh boy, first entry in the face!

https://www.linuxquestions.org/questions/linux-distributions-5/problems-with-ftp-server-bind-address-allready-in-use-213509/

An entry from 2004…. it can’t fix my problem for sure…. keep reading and update from 2020… it says it works…. oh boy II

try using a passive connection with "ftp -p" instead, see if it helps...

There we go:

$ ftp -vdp b.b.b.b
ftp: setsockopt: Bad file descriptor
Name: ftp
---> USER ftp
331 Please specify the password.
Password:
---> PASS XXXX
230 Login successful.
---> SYST
215 UNIX Type: L8
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd support
---> CWD support
250 Directory successfully changed.
ftp> cd 211211
---> CWD 211211
250 Directory successfully changed.
ftp> put TEST.txt
local: TEST.txt remote: TEST.txt
---> TYPE I
200 Switching to Binary mode.
ftp: setsockopt (ignored): Permission denied
---> PASV
227 Entering Passive Mode (b,b,b,b,46,248).
---> STOR TEST.txt
150 Ok to send data.
226 Transfer complete.
26 bytes sent in 0.00 secs (12.5386 kB/s)
ftp> quit
---> QUIT
221 Goodbye.

it worked !!!

I felt embarrassed. Time to search for FTP passive vs active…

Really good explanation. I hope I will never forget it.

  • FTP Active: The client issues a PORT command to the server signalling that it will “actively” provide an IP and port number so the server opens the Data Connection back to the client.
  • FTP Passive: The client issues a PASV command to indicate that it will wait “passively” for the server to supply an IP and port number, after which the client opens a Data Connection to the server.

So it worked in my VM because somehow the ftp server sent a PASV command (maybe because it detects there is no NAT as I have a public IP???).

From home, it failed because, by default, the connection is ftp active, so when the server tried to open the new data connection to me(something I couldnt see in the packet capture…) it failed as my ADSL modem wouldnt allow inbound connections.

Once I enabled “-p” in my connection to the server, all worked because it was me who started the new data connection and my firewall allows everything outbound.

Happy to solve the problem after a couple of years, and after a couple of hours of “serious” troubleshooting. It was shocking how blind I was. I had the ftp error message and the PASV from the trace.

Anyway, I learned something new.