Very handy:
https://www.digitalocean.com/community/cheatsheets/how-to-use-ansible-cheat-sheet-guide
Today I Learned
This is a continuation of the last blog entry. This time we are going to gather syslog messages from the monitoring containers, and it is all going to be deployed by ansible!
As usual, all this is based on Anton Karneliuk’s blog post. All credits to him.
So initially we built a monitoring stack with InfluxDB, Telegraf and Grafana manually to gather and visualise SNMP info from the Arista cEOS switches.
This time, we are going to send SYSLOG from the monitoring stack containers to a new Telegraf instance.
Ideally, we would like to send syslog from the cEOS devices but, as Anton mentions, most network kit implements syslog as per RFC 3164, which is not supported (yet) by Telegraf; Telegraf supports RFC 5424.
You can read more info about this in all these links:
https://github.com/influxdata/telegraf/issues/4593
https://github.com/influxdata/go-syslog/pull/27
https://github.com/influxdata/telegraf/issues/7023
https://github.com/influxdata/telegraf/issues/4687
https://medium.com/@leodido/from-logs-to-metrics-f38854e3441a
https://itnext.io/metrics-from-kubernetes-logs-82cb1dcb3551
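To make the difference between the two formats concrete, here is a minimal Python sketch that renders the same event in both layouts. The function names are mine (hypothetical), the timestamp is assumed to be UTC already, and the field values are borrowed from the InfluxDB startup message that appears later in this post. It is only an illustration of the header layouts, not a full implementation of either RFC.

```python
from datetime import datetime, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def pri(facility: int, severity: int) -> int:
    """PRI is computed the same way in both RFCs: facility * 8 + severity."""
    return facility * 8 + severity


def rfc3164(facility, severity, ts, host, tag, msg):
    """BSD style: <PRI>Mmm dd hh:mm:ss HOST TAG: MSG (no year, no timezone)."""
    stamp = f"{MONTHS[ts.month - 1]} {ts.day:2d} {ts:%H:%M:%S}"
    return f"<{pri(facility, severity)}>{stamp} {host} {tag}: {msg}"


def rfc5424(facility, severity, ts, host, app, procid, msgid, msg):
    """IETF style: <PRI>1 TIMESTAMP HOST APP PROCID MSGID STRUCTURED-DATA MSG."""
    # ts is assumed to be UTC already; "-" is the nil value for structured data.
    stamp = f"{ts:%Y-%m-%dT%H:%M:%S}Z"
    return f"<{pri(facility, severity)}>1 {stamp} {host} {app} {procid} {msgid} - {msg}"


if __name__ == "__main__":
    when = datetime(2020, 7, 21, 12, 8, 16, tzinfo=timezone.utc)
    print(rfc3164(3, 3, when, "athens", "influxdb", "InfluxDB starting"))
    print(rfc5424(3, 3, when, "athens", "influxdb", 11254, "influxdb", "InfluxDB starting"))
```

The RFC 5424 header carries a version, a full timestamp and separate app/procid/msgid fields, which is why a parser written for one format does not simply accept the other.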
So the new ansible role for building influx-telegraf-grafana instances is “monitoring_stack”:
├── ansible.cfg
├── ansible-hosts
├── group_vars
│   ├── ceoslab.yaml
│   └── monitoring.yaml
└── playbooks
    ├── monitoring.yaml
    └── roles
        ├── monitoring_stack
        │   ├── tasks
        │   │   ├── container_grafana.yml
        │   │   ├── container_influxdb.yml
        │   │   ├── container_telegraf_snmp.yml
        │   │   ├── container_telegraf_syslog.yml
        │   │   └── main.yml
        │   └── templates
        │       ├── telegraf_snmp_template.j2
        │       └── telegraf_syslog_template.j2
We will have four monitoring containers: influxdb, grafana, telegraf_snmp and telegraf_syslog.
As the containers are running locally, we define them in the inventory like this:
$ cat ansible-hosts
....
[monitoring]
localhost
We define some variables too in group_vars for the monitoring containers that will be used in the jinja2 templates and tasks
$ cat group_vars/monitoring.yaml
# Defaults for Docker containers
docker_mon_net:
  name: monitoring
  subnet: 172.18.0.0/16
  gateway: 172.18.0.1
path_to_containers: /PICK_YOUR_PATH/monitoring-example
var_influxdb:
  username: xxx
  password: xxx123
  snmp_community: xxx123
  db_name:
    snmp: snmp
    syslog: syslog
var_grafana:
  username: admin
  password: xxx123
var_telegraf:
…
So we execute the playbook like this:
ansible master$ ansible-playbook playbooks/monitoring.yaml -vvv --ask-become-pass
The very first time, if you pay attention to the ansible logging, everything should succeed. If for any reason you have to make changes or troubleshoot, and you execute the full playbook again, some tasks will fail, but not the playbook (this is done with ignore_errors: yes inside a task). For example, the docker network creation will fail as the network is already there. The same happens if you try to create the user and dbs in an already running influx instance.
That playbook just calls the role “monitoring_stack”. The main task file in that role (main.yml) creates the docker network where all containers will be attached, launches all the containers and does something hacky with iptables.
As the cEOS lab is built (using docker-topo) independently of this playbook, there are already some iptables rules in place, and somehow, when executing the role, the rules change and it blocks the new network for any outbound connectivity.
Before the iptables change in the playbook:
# iptables -t filter -S DOCKER-ISOLATION-STAGE-1
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-1
-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i br-94c1e813ad6f ! -o br-94c1e813ad6f -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-13ab2b6a0d1d ! -o br-13ab2b6a0d1d -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-00db5844bbb0 ! -o br-00db5844bbb0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i br-121978ca0282 ! -o br-121978ca0282 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
#
# iptables -t filter -S DOCKER-ISOLATION-STAGE-2
Warning: iptables-legacy tables present, use iptables-legacy to see them
-N DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-2 -o br-4bd17cfa19a8 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-94c1e813ad6f -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-13ab2b6a0d1d -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-00db5844bbb0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o br-121978ca0282 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
I want to avoid DOCKER-ISOLATION-STAGE-2 so I want the “-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT” on top of that chain.
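As a quick way to check that condition, here is a small Python sketch that takes the lines printed by "iptables -S" and tells whether the unconditional ACCEPT is the first rule appended to the chain. The function name is hypothetical and it only inspects text; it does not call iptables or change any rule.

```python
def accept_is_first(rules, chain="DOCKER-ISOLATION-STAGE-1"):
    """True if the first rule appended to `chain` is an unconditional ACCEPT."""
    # Keep only "-A <chain> ..." lines, preserving the order iptables printed them in.
    appended = [r.split() for r in rules if r.split()[:2] == ["-A", chain]]
    return bool(appended) and appended[0] == ["-A", chain, "-j", "ACCEPT"]


if __name__ == "__main__":
    # First lines of the chain as shown above: ACCEPT is only the second rule.
    before = [
        "-N DOCKER-ISOLATION-STAGE-1",
        "-A DOCKER-ISOLATION-STAGE-1 -i br-4bd17cfa19a8 ! -o br-4bd17cfa19a8 -j DOCKER-ISOLATION-STAGE-2",
        "-A DOCKER-ISOLATION-STAGE-1 -j ACCEPT",
    ]
    print(accept_is_first(before))
```

With the chain as listed above, the bridge rule for br-4bd17cfa19a8 is evaluated before the ACCEPT, so the check reports False for that state.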
This is not the first (nor the last) time that this issue bites me. I need to carefully review the docker-topo file and really get my head around the networking expectations from docker.
Another thing about docker networking that bites me very often: in my head, each monitoring container has an IP. For example, influxdb is 172.18.0.2 and telegraf-syslog is 172.18.0.4. We have configured influxdb to send syslog to the telegraf-syslog container, so I would expect the influxdb container to use its 0.2 address and everything to be local (no forwarding, no firewall, etc.). But no, it uses the host IP, 172.18.0.1. Most likely this is because the docker syslog logging driver runs in the docker daemon on the host, not inside the container.
Apart from that, there are several things that I had to review while adapting the role to my environment regarding docker and ansible.
docker documentation:
how to create network: https://docs.docker.com/engine/reference/commandline/network_create/
how to configure container logs: https://docs.docker.com/engine/reference/commandline/container_logs/
how to configure the logging driver in a container: https://docs.docker.com/config/containers/logging/configure/
how to configure syslog in a container: https://docs.docker.com/config/containers/logging/syslog/
how to run commands from a running container: https://docs.docker.com/engine/reference/commandline/exec/
ansible documentation:
become – run commands with sudo in a playbook: https://docs.ansible.com/ansible/latest/user_guide/become.html (--ask-become-pass, -K)
docker container module: https://docs.ansible.com/ansible/latest/modules/docker_container_module.html
grafana data source module: https://docs.ansible.com/ansible/latest/modules/grafana_datasource_module.html
This is important because, via ansible, I had to work out the meaning of become, how to add the syslog config in the containers and how to add grafana datasources via a module.
All my ansible code is here.
Another thing I had to hardcode is the IP of the telegraf-syslog container in each container task file:
syslog-address: "udp://172.18.0.4:6514"
$ cat container_influxdb.yml
---
...
- name: 4- CONTAINER INFLUXDB // LAUNCHING CONTAINER
  docker_container:
    name: influxdb
    image: influxdb
    state: started
    command: "-config /etc/influxdb/influxdb.conf"
    networks:
      - name: "{{ docker_mon_net.name }}"
    purge_networks: yes
    ports:
      - "8086:8086"
    volumes:
      - "{{ path_to_containers }}/influxdb/influxdb.conf:/etc/influxdb/influxdb.conf:ro"
      - "{{ path_to_containers }}/influxdb/data:/var/lib/influxdb"
    log_driver: syslog
    log_options:
      syslog-address: "udp://172.18.0.4:6514"
      tag: influxdb
      syslog-format: rfc5424
  become: yes
  tags:
    - tag_influx
...
Once you have all containers running:
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
dd519ff01d6e telegraf "/entrypoint.sh -con…" 4 hours ago Up 4 hours 8092/udp, 0.0.0.0:161->161/udp, 8125/udp, 8094/tcp telegraf_snmp
869f158046a6 grafana/grafana "/run.sh" 4 hours ago Up 4 hours 0.0.0.0:3000->3000/tcp grafana
dc68f261746b influxdb "/entrypoint.sh -con…" 4 hours ago Up 4 hours 0.0.0.0:8086->8086/tcp influxdb
3662c3c69b21 telegraf "/entrypoint.sh -con…" 6 hours ago Up 6 hours 8092/udp, 0.0.0.0:6514->6514/udp, 8125/udp, 8094/tcp telegraf_syslog
ada1f884f1b7 ceos-lab:4.23.3M "/sbin/init systemd.…" 28 hours ago Up 4 hours 0.0.0.0:2002->22/tcp, 0.0.0.0:9002->443/tcp 3node_r03
22d9c4ae9043 ceos-lab:4.23.3M "/sbin/init systemd.…" 28 hours ago Up 4 hours 0.0.0.0:2001->22/tcp, 0.0.0.0:9001->443/tcp 3node_r02
fe7046b1f425 ceos-lab:4.23.3M "/sbin/init systemd.…" 28 hours ago Up 4 hours 0.0.0.0:2000->22/tcp, 0.0.0.0:9000->443/tcp 3node_r01
You should verify that syslog messages are stored in influxdb:
$ curl -G 'https://localhost:8086/query?db=syslog&pretty=true&u=xxx&p=xxx123' --data-urlencode "q=SELECT * FROM syslog limit 2" --insecure
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "syslog",
"columns": [
"time",
"appname",
"facility",
"facility_code",
"host",
"hostname",
"message",
"msgid",
"procid",
"severity",
"severity_code",
"timestamp",
"version"
],
"values": [
[
"2020-07-21T12:08:16.216632823Z",
"influxdb",
"daemon",
3,
"3662c3c69b21",
"athens",
"ts=2020-07-21T12:08:16.169711Z lvl=info msg=\"InfluxDB starting\" log_id=0O8KE_AG000 version=1.8.1 branch=1.8 commit=af0237819ab9c5997c1c0144862dc762b9d8fc25",
"influxdb",
"11254",
"err",
3,
1595333296000000000,
1
],
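The same check can be scripted. The sketch below is an assumption-laden illustration in Python: it builds the query URL used by the curl command above and turns the "columns"/"values" layout of the JSON reply into one dict per syslog entry. The helper names are hypothetical, and the actual HTTP call is left out so that the example does not depend on a running InfluxDB.

```python
import json
from urllib.parse import urlencode


def build_query_url(base, db, user, password, query):
    """Build the /query URL with the same parameters the curl command passes."""
    params = {"db": db, "pretty": "true", "u": user, "p": password, "q": query}
    return f"{base}/query?{urlencode(params)}"


def series_to_rows(payload):
    """Flatten an InfluxDB 1.x query reply into a list of {column: value} dicts."""
    rows = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            columns = series["columns"]
            rows.extend(dict(zip(columns, values)) for values in series["values"])
    return rows


if __name__ == "__main__":
    print(build_query_url("https://localhost:8086", "syslog", "xxx", "xxx123",
                          "SELECT * FROM syslog limit 2"))
    # A shortened reply with the same shape as the output above.
    reply = json.loads(
        '{"results": [{"statement_id": 0, "series": [{"name": "syslog",'
        ' "columns": ["time", "appname", "severity"],'
        ' "values": [["2020-07-21T12:08:16.216632823Z", "influxdb", "err"]]}]}]}'
    )
    for row in series_to_rows(reply):
        print(row["time"], row["appname"], row["severity"])
```

Pairing each value list with the column names makes it easier to spot fields such as appname or severity than reading the raw nested arrays.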
We can create the new queries in grafana for SYSLOG. The datasources are already created by ansible so we don’t have to worry about that.
To create a query about the number of syslog messages we receive, this is what I did:
Most of the entries come from “influxdb”.
For creating a query with the content of each syslog message:
Here I struggled a bit. I can’t really change much in the table view.
And this is the dashboard with the syslog queries and snmp from the last blog entry:
So at the end, I have an ansible role working!
Need to learn more about how to backup stuff from grafana. I have been playing with this:
https://github.com/ysde/grafana-backup-tool
Next thing I want to try is telemetry.
Today I was trying to write a playbook to push config to Arista devices.
Initially I wanted to use the napalm module to push the config (as I have done with nornir), but it seems the napalm-ansible module requires napalm3 and netmiko3, and that breaks my nornir 2.4 (which requires napalm<3). So I uninstalled napalm-ansible and restored the other packages. Good thing I checked the versions beforehand.
$ python -m pip list | grep -E 'nornir|napalm|netmiko|ansible'
ansible 2.9.10
napalm 2.5.0
netmiko 2.4.2
nornir 2.4.0
So I had to check the eos_config module. I think the napalm-ansible module is more powerful as it uses diff and sessions provided by Arista. As far as I can see, there is no option to say to the module to just make a dry run.
At the end I managed to put everything together but the eos_config was failing:
TASK [11- push config]
task path: xxx/testdir2/ceos-testing/ansible/playbooks/gen-config.yaml:60
fatal: [r1]: FAILED! => {
    "changed": false,
    "msg": "path specified in src not found"
}
The funny thing is all other tasks that needed to use templates were using the same path and were fine:
- name: 10- merge all configs in one file
  assemble:
    src: "CFGS/{{ inventory_hostname }}/"
    dest: "CFGS/{{ inventory_hostname }}-full.txt"
- name: 11- push config
  debugger: on_failed
  eos_config:
    #src: "{{playbook_dir}}/../CFGS/{{ inventory_hostname }}-full.txt"
    src: "../CFGS/{{ inventory_hostname }}-full.txt"
    backup: yes
So I had to find out where that task was looking for the file. It seems the “assemble”, “template” and “file” tasks use as working directory the place where I am calling the playbook from (xxx/testdir2/ceos-testing/ansible). But “eos_config” uses the directory where the playbook is (xxx/testdir2/ceos-testing/ansible/playbooks), based on my running command “…/ansible master$ ansible-playbook playbooks/gen-config.yaml”.
So I was searching for some help and I found the playbook path and ansible search paths. So now I needed to verify that. I found some ansible debugger and examples that were really useful!
So I used “debugger: on_failed” for my task 11. And could see the path:
TASK [11- push config]
task path: /home/tomas/storage/technology/arista/testdir2/ceos-testing/ansible/playbooks/gen-config.yaml:60
fatal: [r1]: FAILED! => {
    "changed": false,
    "msg": "path specified in src not found"
}
[r1] TASK: 11- push config (debug)> p task.args
{'backup': True, 'src': '/home/tomas/storage/technology/arista/testdir2/ceos-testing/ansible/playbooks/CFGS/r1-full.txt'}
[r1] TASK: 11- push config (debug)> quit
User interrupted execution
So it is clear it was looking at the playbook dir.
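To see why the same relative path lands in two different places, here is a tiny Python sketch that resolves a relative "src" against a base directory. It only models what I observed (working directory for some modules, playbook directory for eos_config); it is not how ansible actually implements its search paths, and the helper name is hypothetical.

```python
import posixpath


def resolve_src(base_dir, src):
    """Join a relative src onto base_dir and collapse any '..' components."""
    return posixpath.normpath(posixpath.join(base_dir, src))


if __name__ == "__main__":
    cwd = "/home/tomas/storage/technology/arista/testdir2/ceos-testing/ansible"
    playbook_dir = cwd + "/playbooks"

    # What the debugger showed: the path was resolved from the playbook directory.
    print(resolve_src(playbook_dir, "CFGS/r1-full.txt"))
    # The fix: step up one level so it points to the same file the other tasks use.
    print(resolve_src(playbook_dir, "../CFGS/r1-full.txt"))
    print(resolve_src(cwd, "CFGS/r1-full.txt"))
```

The last two results are the same file, which is why adding "../" in front of the path made eos_config find the assembled config.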
So after fixing the path, I realised that I didn’t want to run everything and wanted to use tags so only the last part was executed.
/ansible master$ cat playbooks/gen-config.yaml
...
- name: 12- display result
  debug:
    msg: "Backup file is {{ load_config.shortname }} and result is: {{ load_config }}"
  tags: push_config
...
/ansible master$ ansible-playbook playbooks/gen-config.yaml --limit="r1" -vvv --tags "push_config"
One more thing: the output of ansible when you have dictionaries is not great. I checked this link and it is good for failures and with -vvvv. But for green outputs it is still not great:
TASK [12- - display result] *
task path: /home/tomas/storage/technology/arista/testdir2/ceos-testing/ansible/playbooks/gen-config.yaml:61
ok: [r1] =>
  msg: 'Backup file is /home/tomas/storage/technology/arista/testdir2/ceos-testing/ansible/playbooks/backup/r1_config and result is: {''changed'': True, ''commands'': [''interface Ethernet1'', ''no shutdown'', ''interface Ethernet2'', ''no shutdown'', ''router bgp 100'', ''neighbor AS100-CORE password mpls-sr''], ''updates'': [''interface Ethernet1'', ''no shutdown'', ''interface Ethernet2'', ''no shutdown'', ''router bgp 100'', ''neighbor AS100-CORE password mpls-sr''], ''session'': ''ansible_1594920727'', ''backup_path'': ''/home/tomas/storage/technology/arista/testdir2/ceos-testing/ansible/playbooks/backup/r1_config.2020-07-16@18:32:07'', ''date'': ''2020-07-16'', ''time'': ''18:32:07'', ''shortname'': ''/home/tomas/storage/technology/arista/testdir2/ceos-testing/ansible/playbooks/backup/r1_config'', ''filename'': ''r1_config.2020-07-16@18:32:07'', ''failed'': False}'
META: ran handlers
META: ran handlers
PLAY RECAP
r1 : ok=3 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
(testdir2) go:1.12.5|py:3.7.3|tomas@athens:~/storage/technology/arista/testdir2/ceos-testing/ansible master$
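If you just want to read such a result after the fact, a few lines of Python can do the job outside ansible. This is only a sketch under assumptions: it expects the text after "result is: " to be a Python-style dict, and it undoes the doubled single quotes that the YAML output adds. The helper name is hypothetical, and values containing a real apostrophe would need more care.

```python
import ast
import json


def pretty_result(msg, marker="result is: "):
    """Extract the dict printed after `marker` and re-render it as indented JSON."""
    raw = msg.split(marker, 1)[1]
    # The YAML single-quoted style doubles every single quote; collapse them back.
    data = ast.literal_eval(raw.replace("''", "'"))
    return json.dumps(data, indent=2, sort_keys=True)


if __name__ == "__main__":
    # A shortened version of the message shown above.
    msg = ("Backup file is r1_config and result is: {''changed'': True, "
           "''commands'': [''interface Ethernet1'', ''no shutdown''], "
           "''session'': ''ansible_1594920727'', ''failed'': False}")
    print(pretty_result(msg))
```

Using ast.literal_eval keeps this safe for pasted output, since it only accepts literals and never executes code.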
Nice Jinja2 parser online! Great job for the author!!!
From this link:
https://ttl255.com/jinja2-tutorial-part-3-whitespace-control/
A couple of years ago I wrote a playbook with ansible to use napalm for configuring some switches.
I wanted to test again ansible as I am quite rusty and there is always demand for that.
I started with just something basic with my ceos lab.
All my code is here:
https://github.com/thomarite/ceos-testing/tree/master/ansible
Initially I was following the official documentation for Arista EOS Ansible modules:
https://ansible-arista-howto.readthedocs.io/en/latest/COLLECTING_STATUS.html
https://github.com/titom73/ansible-arista-module-howto
Installing ansible was fine with pip in my venv.
But I hit the wall with just the first example using “eos_facts”. Initially I wasn’t adding debugging flags, so it was even worse. Fortunately I remembered “-vvv”. I was seeing this:
The full traceback is:
Traceback (most recent call last):
  File "/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py", line 102, in <module>
    _ansiballz_main()
  File "/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py", line 94, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py", line 40, in invoke_module
    runpy.run_module(mod_name='ansible.modules.network.eos.eos_facts', init_globals=None, run_name='__main__', alter_sys=True)
  File "/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py", line 206, in <module>
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py", line 197, in main
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/facts/facts.py", line 23, in __init__
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py", line 213, in get_resource_connection
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py", line 229, in get_capabilities
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/connection.py", line 121, in __init__
AssertionError: socket_path must be a value
fatal: [r3]: FAILED! => {
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n File \"/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py\", line 102, in <module>\n _ansiballz_main()\n File \"/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py\", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File \"/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py\", line 40, in invoke_module\n runpy.run_module(mod_name='ansible.modules.network.eos.eos_facts', init_globals=None, run_name='__main__', alter_sys=True)\n File \"/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py\", line 205, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py\", line 96, in _run_module_code\n mod_name, mod_spec, pkg_name, script_name)\n File \"/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py\", line 85, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py\", line 206, in <module>\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py\", line 197, in main\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/facts/facts.py\", line 23, in __init__\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py\", line 213, in get_resource_connection\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py\", line 229, in get_capabilities\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/connection.py\", line 121, in __init__\nAssertionError: socket_path must be a value\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1
}
So “socket_path” was not defined. I checked all the python files mentioned in the stack but couldn’t find anything.
It was clear that I wasn’t providing enough info to ansible to establish the socket for the connection to the devices (ip:port).
And the example from the documentation didn’t work either:
https://docs.ansible.com/ansible/latest/network/user_guide/platform_eos.html#using-eapi-in-ansible
I knew my old ansible script was working before I left my job. But I knew as well that I was using the latest version of ansible so very likely things have changed since then.
$ ansible --version
ansible 2.9.10
So I had to read about the “eos_facts” and “eos_config” modules, searching here:
https://docs.ansible.com/ansible/latest/modules/list_of_network_modules.html
https://docs.ansible.com/ansible/latest/modules/eos_facts_module.html#eos-facts-module
https://docs.ansible.com/ansible/latest/modules/eos_config_module.html
After some time, I managed to fix the playbook and my environment and I could run the playbook using the ssh connector (but I was ignoring a warning about “provider” not needed…)
(testdir2)/ansible master$ cat ansible-hosts
[all:vars]
ansible_python_interpreter=/home/tomas/storage/technology/arista/testdir2/bin/python
ansible_user='tomas'
ansible_password='tomas123'
[ceoslab]
r1 ansible_host=127.0.0.1 ansible_port=2000
r2 ansible_host=127.0.0.1 ansible_port=2001
r3 ansible_host=127.0.0.1 ansible_port=2002
(testdir2)/ansible master$ cat group_vars/ceoslab.yaml
ansible_network_os: eos
The output:
/ansible master$ ansible-playbook playbooks/collect-facts-cli.yaml
PLAY [Run commands on ceos lab]
TASK [Collect all facts from device] ***
[WARNING]: provider is unnecessary when using network_cli and will be ignored
[WARNING]: default value for `gather_subset` will be changed to `min` from `!config` v2.11 onwards
ok: [r1]
ok: [r3]
ok: [r2]
TASK [Display result] ****
ok: [r2] => {
    "msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r1] => {
    "msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r3] => {
    "msg": "Model is cEOSLab and it is running 4.23.3M"
}
PLAY RECAP *
r1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Ok, so getting the playbook working using the API shouldn’t be that difficult? It was.
The full traceback is:
  File "/tmp/ansible_eos_facts_payload_vz7c7ipu/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py", line 229, in get_capabilities
    capabilities = Connection(module._socket_path).get_capabilities()
  File "/tmp/ansible_eos_facts_payload_vz7c7ipu/ansible_eos_facts_payload.zip/ansible/module_utils/connection.py", line 185, in __rpc__
    raise ConnectionError(to_text(msg, errors='surrogate_then_replace'), code=code)
fatal: [r1]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "auth_pass": null,
            "authorize": null,
            "gather_network_resources": null,
            "gather_subset": [
                "all"
            ],
            "host": null,
            "password": null,
            "port": null,
            "provider": null,
            "ssh_keyfile": null,
            "timeout": null,
            "transport": null,
            "use_ssl": null,
            "username": null,
            "validate_certs": null
        }
    },
    "msg": "Could not connect to http://127.0.0.1:80/command-api: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1056)"
}
I was surprised that it was using port 80. I was pretty sure I was providing the correct port (900x), so somehow my data wasn’t being processed.
Clearly, I wasn’t paying attention to the documentation:
https://docs.ansible.com/ansible/latest/modules/eos_facts_module.html#eos-facts-module
It says clearly that “provider” is deprecated since 2.5! I am using 2.9
As well, I have a very poor knowledge of ansible and I didn’t understand the concept of “connection”. SSH was using “network_cli” and the API was using “httpapi”.
I was very close to giving up on the API connection when somehow I searched for “ansible network_cli” and found the documentation for that plugin. Then I searched for “httpapi” and it was gold!
https://docs.ansible.com/ansible/latest/plugins/connection/network_cli.html
https://docs.ansible.com/ansible/latest/plugins/connection/httpapi.html
I realised that I needed to pass specific vars to get the HTTPS connection working. So in the end, I managed to get the config right for both SSH and HTTPS:
/ansible master$ cat ansible-hosts
[all:vars]
ansible_python_interpreter=/home/tomas/storage/technology/arista/testdir2/bin/python
ansible_user='tomas'
ansible_password='tomas123'
[ceoslab]
r1 ansible_host=127.0.0.1 ansible_port=2000 ansible_httpapi_port=9000
r2 ansible_host=127.0.0.1 ansible_port=2001 ansible_httpapi_port=9001
r3 ansible_host=127.0.0.1 ansible_port=2002 ansible_httpapi_port=9002
/ansible master$ cat group_vars/ceoslab.yaml
ansible_network_os: eos
#start - eapi config
ansible_httpapi_use_ssl: 'yes'
ansible_httpapi_validate_certs: 'no'
ansible_httpapi_password: "{{ ansible_password }}"
#end - eapi config
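The inventory follows a simple pattern: each cEOS container maps SSH to 2000+n and eAPI to 9000+n on localhost. If the lab grows, the [ceoslab] group could be generated instead of typed. The Python sketch below is just an illustration of that pattern; the function name and its defaults are my own assumptions, not part of the repo.

```python
def ceoslab_inventory(count, host="127.0.0.1", ssh_base=2000, eapi_base=9000):
    """Return the [ceoslab] inventory group for `count` routers named r1..rN."""
    lines = ["[ceoslab]"]
    for i in range(count):
        # ansible_port is used by network_cli, ansible_httpapi_port by httpapi.
        lines.append(
            f"r{i + 1} ansible_host={host} ansible_port={ssh_base + i} "
            f"ansible_httpapi_port={eapi_base + i}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(ceoslab_inventory(3))
```

Keeping both ports on the same host line is what lets the two playbooks share one inventory and differ only in the connection type.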
The output:
ansible master$ ansible-playbook playbooks/collect-facts-eapi.yaml
PLAY [Run commands on remote ceos lab] *
TASK [Collect all facts from device] ***
[WARNING]: default value for `gather_subset` will be changed to `min` from `!config` v2.11 onwards
ok: [r3]
ok: [r1]
ok: [r2]
TASK [Display result] ****
ok: [r2] => {
    "msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r1] => {
    "msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r3] => {
    "msg": "Model is cEOSLab and it is running 4.23.3M"
}
PLAY RECAP *
r1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
At the end of the day, the scripts are identical apart from the “connection” var:
/ansible/playbooks master$ diff collect-facts-cli.yaml collect-facts-eapi.yaml
4c4
< connection: network_cli
---
> connection: httpapi
I think you can pass a var to the playbook via the CLI so I will try later.
My recommendation is always to use “-vvv”.
BTW, A good ansible summary I found:
https://gist.github.com/andreicristianpetcu/b892338de279af9dac067891579cad7d
In summary, I found ansible more difficult to troubleshoot than nornir. Nornir is pure python, so I can run ipdb wherever I want.
But anyway, I learned things. I will try to write some more complex playbooks.
Yesterday I managed to get netbox and my lab connected. So today I followed up with the original article, and found a new issue that took me several hours.
Initially I was seeing an error that I couldn’t understand, “netbox.exceptions.CreateException: This field is required”, from:
(venv) /netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
Creating Netbox Interface for device r1, interface Loopback1
Traceback (most recent call last):
  File "scripts/create_interfaces.py", line 42, in <module>
    task=create_netbox_interface,
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/__init__.py", line 146, in run
    result = self._run_serial(task, run_on, **kwargs)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/__init__.py", line 72, in _run_serial
    result[host.name] = task.copy().start(host, self)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/task.py", line 85, in start
    r = self.task(self, **self.params)
  File "scripts/create_interfaces.py", line 34, in create_netbox_interface
    device_id=device_id,
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/dcim.py", line 431, in create_interface
    return self.netbox_con.post('/dcim/interfaces/', required_fields, **kwargs)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py", line 124, in post
    raise exceptions.CreateException(resp_data)
netbox.exceptions.CreateException: This field is required.
So I started to follow the trace, adding “print” and using “ipdb” to see what was going on:
....
/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py(71)__request()
     70 finally:
---> 71 self.close()
     72
ipdb> dir(response)
['__attrs__', '__bool__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__nonzero__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_content', '_content_consumed', '_next', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'is_permanent_redirect', 'is_redirect', 'iter_content', 'iter_lines', 'json', 'links', 'next', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']
ipdb> response.url
'http://0.0.0.0:8080/api/dcim/interfaces/'
ipdb> response.text
'{"type":["This field is required."]}'
ipdb> response.status_code
400
ipdb> response.content
b'{"type":["This field is required."]}'
ipdb> response.reason
'Bad Request'
ipdb> response.request
ipdb> prepared_request
ipdb> prepared_request.url
'http://0.0.0.0:8080/api/dcim/interfaces/'
ipdb> dir(prepared_request)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_body_position', '_cookies', '_encode_files', '_encode_params', '_get_idna_encoded_host', 'body', 'copy', 'deregister_hook', 'headers', 'hooks', 'method', 'path_url', 'prepare', 'prepare_auth', 'prepare_body', 'prepare_content_length', 'prepare_cookies', 'prepare_headers', 'prepare_hooks', 'prepare_method', 'prepare_url', 'register_hook', 'url']
ipdb> prepared_request.path_url
'/api/dcim/interfaces/'
ipdb> response.__content
*** AttributeError: 'Response' object has no attribute '__content'
ipdb> response._content
b'{"type":["This field is required."]}'
ipdb> response.content
b'{"type":["This field is required."]}'
ipdb> response.headers
{'Server': 'nginx', 'Date': 'Wed, 08 Jul 2020 12:36:35 GMT', 'Content-Type': 'application/json', 'Content-Length': '36', 'Connection': 'keep-alive', 'Vary': 'Accept, Cookie, Origin', 'Allow': 'GET, POST, HEAD, OPTIONS, TRACE', 'API-Version': '2.8', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN'}
ipdb> response.reason
'Bad Request'
ipdb> response.request
ipdb> response.test
*** AttributeError: 'Response' object has no attribute 'test'
ipdb> response.text
'{"type":["This field is required."]}'
ipdb> response.url
'http://0.0.0.0:8080/api/dcim/interfaces/'
ipdb> quit
Create Netbox Interfaces
r1 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv ERROR
---- napalm_get ** changed : False --------------------------------------------- INFO
(venv) go:1.12.5|py:3.7.3|tomas@athens:~/storage/technology/netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
url3=http://0.0.0.0:8080/api/dcim/interfaces?limit=0
Creating Netbox Interface for device r1, interface Loopback1
url3=http://0.0.0.0:8080/api/dcim/devices/?name=r1&limit=0
device_id = 1
url3=http://0.0.0.0:8080/api/dcim/interfaces/
resp_ok=False
resp_status=400
body_data= {'name': 'Loopback1', 'form_factor': 1200, 'device': 1}
params= /dcim/interfaces/
resp_data= {'type': ['This field is required.']}
Traceback (most recent call last):
  File "scripts/create_interfaces.py", line 43, in <module>
    task=create_netbox_interface,
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/__init__.py", line 146, in run
    result = self._run_serial(task, run_on, **kwargs)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/__init__.py", line 72, in _run_serial
    result[host.name] = task.copy().start(host, self)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/task.py", line 85, in start
    r = self.task(self, **self.params)
  File "scripts/create_interfaces.py", line 35, in create_netbox_interface
    device_id=device_id,
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/dcim.py", line 431, in create_interface
    return self.netbox_con.post('/dcim/interfaces/', required_fields, **kwargs)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py", line 130, in post
    raise exceptions.CreateException(resp_data)
netbox.exceptions.CreateException: This field is required.
So in the end I realised that I was missing the “type” parameter!
I was checking the netbox documentation on GitHub but I couldn't see clearly what kind of value I had to provide…
I checked the “type” value of the only interfaces I already had in netbox: “http://0.0.0.0:8080/api/dcim/interfaces/”
So I tried to pass exactly that but it was still failing…
(venv) go:1.12.5|py:3.7.3|tomas@athens:~/storage/technology/netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
url3=http://0.0.0.0:8080/api/dcim/interfaces?limit=0
Creating Netbox Interface for device r1, interface Loopback1
url3=http://0.0.0.0:8080/api/dcim/devices/?name=r1&limit=0
device_id = 1
url3=http://0.0.0.0:8080/api/dcim/interfaces/
resp_ok=False resp_status=400
body_data= {'name': 'Loopback1', 'form_factor': 1200, 'device': 1, 'type': {'value': '1000base-t', 'label': '1000BASE-T (1GE)', 'id': 1000}}
params= /dcim/interfaces/
resp_data= {'type': ['Value must be passed directly (e.g. "foo": 123); do not use a dictionary or list.']}
Traceback (most recent call last):
  File "scripts/create_interfaces.py", line 50, in <module>
    task=create_netbox_interface,
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/__init__.py", line 146, in run
    result = self._run_serial(task, run_on, **kwargs)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/__init__.py", line 72, in _run_serial
    result[host.name] = task.copy().start(host, self)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/nornir/core/task.py", line 85, in start
    r = self.task(self, **self.params)
  File "scripts/create_interfaces.py", line 42, in create_netbox_interface
    **interface_type,
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/dcim.py", line 431, in create_interface
    return self.netbox_con.post('/dcim/interfaces/', required_fields, **kwargs)
  File "/home/tomas/storage/technology/netbox-example/venv/lib/python3.7/site-packages/netbox/connection.py", line 130, in post
    raise exceptions.CreateException(resp_data)
netbox.exceptions.CreateException: Value must be passed directly (e.g. "foo": 123); do not use a dictionary or list.
(venv) go:1.12.5|py:3.7.3|tomas@athens:~/storage/technology/netbox-example/nornir-napalm-netbox-demo master$
The API had to be documented somewhere… By chance, looking at the bottom of the netbox page, I found an “API” link.
So, now I needed to look up the correct API call. Based on the script and logs, it was a “POST” for “/dcim/interfaces/”. Here we go!
So finally, I had the info. I confirmed which fields were mandatory and the values they needed!
interface_type = {}
interface_type["type"] = "1000base-t"
for interface_name in interfaces.keys():
    if not is_interface_present(nb_interfaces, f"{task.host}", interface_name):
        print(
            f"* Creating Netbox Interface for device {task.host}, interface {interface_name}"
        )
        device_id = get_device_id(f"{task.host}", netbox)
        print("device_id = %s" % device_id)
        netbox.dcim.create_interface(
            name=f"{interface_name}",
            form_factor=1200,  # default
            device_id=device_id,
            **interface_type,
        )
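The two failed attempts above come down to one rule: when reading, the API returns “type” as a dictionary (value, label, id), but when writing it wants only the plain value. A minimal sketch of that conversion, using only the standard library; the helper names choice_value and build_interface_payload are hypothetical and not part of the netbox client library:

```python
def choice_value(field):
    """Return the plain value the API accepts on write.

    On read, the API shows a choice field as a dictionary such as
    {'value': '1000base-t', 'label': '1000BASE-T (1GE)', 'id': 1000}.
    On write, only the 'value' part is accepted.
    """
    if isinstance(field, dict):
        return field["value"]
    return field


def build_interface_payload(name, device_id, interface_type):
    """Build the body for a POST to /dcim/interfaces/ (hypothetical helper)."""
    interface_type = choice_value(interface_type)
    if isinstance(interface_type, (dict, list)):
        # Mirrors the error the API returned in the second attempt.
        raise ValueError("type must be passed directly, not as a dict or list")
    return {"name": name, "device": device_id, "type": interface_type}


# What a GET on an existing interface showed for "type":
existing_type = {"value": "1000base-t", "label": "1000BASE-T (1GE)", "id": 1000}
payload = build_interface_payload("Loopback1", 1, existing_type)
```

With this, copying the “type” of an existing interface would have produced the same "1000base-t" string that made the script work.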
So the script ran fine for all my devices:
netbox-example/nornir-napalm-netbox-demo master$ python scripts/create_interfaces.py
nb_url = http://0.0.0.0:8080
url3=http://0.0.0.0:8080/api/dcim/interfaces?limit=0
Create Netbox Interfaces
r1 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
^^^^ END Create Netbox Interfaces ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
r2 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
^^^^ END Create Netbox Interfaces ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
r3 ** changed : False
vvvv Create Netbox Interfaces ** changed : False vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv INFO
---- napalm_get ** changed : False --------------------------------------------- INFO
^^^^ END Create Netbox Interfaces ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
And it is updated in the GUI:
Nornir is a Python framework mainly for network automation. Instead of using another tool like Ansible (which you need to learn), you can do the same in pure Python all the way. Ansible doesn't scale well and can be very slow; with nornir you have threading from day zero, so if you have to run tasks on 100 devices, you will feel and see the difference.
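To make the threading point concrete, here is a minimal standard-library sketch. It is not nornir code and does not show how nornir works internally; fake_task is a made-up stand-in for a task that waits on a device, and the host names are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_task(host):
    """Stand-in for a per-device task; the sleep imitates network wait time."""
    time.sleep(0.05)
    return f"{host}: ok"


def run_serial(hosts):
    """One device after another: total time grows with the number of hosts."""
    return [fake_task(host) for host in hosts]


def run_threaded(hosts, max_workers=20):
    """Same task on a pool of threads: waits overlap, results keep host order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_task, hosts))


def timed(runner, hosts):
    """Return (results, elapsed seconds) for one of the runners above."""
    start = time.perf_counter()
    results = runner(hosts)
    return results, time.perf_counter() - start


# Example: compare timed(run_serial, hosts) with timed(run_threaded, hosts)
# for hosts = [f"r{i}" for i in range(1, 101)].
```

Both runners return the same results; only the elapsed time differs, because the threads spend their waiting time in parallel.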
I learnt about nornir via Kirk Byers’ course. Unfortunately I didn't have the chance/time to use it in my former day job, so now that I have had the time I have reviewed things and done a small project.
You can find the whole environment in the nornir section of https://github.com/thomarite/ceos-testing. I tested it on the 3-node topology.
It is nothing special. The script builds the config for BGP or ISIS using jinja2 and yaml files. I have the feeling that my jinja2 is a bit difficult to follow. Then it uses napalm to connect to the devices and push or check the config.
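As a rough idea of the template step, here is a small sketch that renders an ISIS snippet with jinja2 (it needs the jinja2 package installed). It is not the template from the repo: the template text, the variable names and the values (instance name, NET, interfaces) are all invented for illustration, and the dictionary stands in for data that would be loaded from a yaml file.

```python
from jinja2 import Environment, StrictUndefined

# Hypothetical template, written in an EOS-like style for illustration only.
ISIS_TEMPLATE = """\
router isis {{ isis.instance }}
   net {{ isis.net }}
   address-family ipv4 unicast
{% for intf in isis.interfaces %}
interface {{ intf }}
   isis enable {{ isis.instance }}
{% endfor %}
"""

# Stand-in for the per-device data that would come from a yaml file.
r1_data = {
    "instance": "CORE",
    "net": "49.0001.0000.0000.0001.00",
    "interfaces": ["Ethernet1", "Ethernet2"],
}


def render_isis(data):
    """Render the ISIS snippet; StrictUndefined makes a missing variable an error."""
    env = Environment(
        trim_blocks=True,      # drop the newline after {% ... %} tags
        lstrip_blocks=True,    # drop indentation before {% ... %} tags
        undefined=StrictUndefined,
    )
    return env.from_string(ISIS_TEMPLATE).render(isis=data)


config = render_isis(r1_data)
```

Setting trim_blocks and lstrip_blocks keeps the loop tags from leaving blank lines in the output, which helps a lot when a template gets hard to follow.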
Just one issue: it seems that, because cEOS relies on docker and my filesystem, if you decide to push the config (dry_run=False == commit=True) the task will fail (while trying to write the startup config) even though the config is actually applied.
(testdir2) /testdir2/ceos-testing/nornir master$ python buid-config.py -b isis -c
hostname: r1
task: deploy_config for isis
failed: True
logs: Traceback (most recent call last):
...
  File ".../testdir2/lib/python3.7/site-packages/pyeapi/eapilib.py", line 469, in send
    raise CommandError(code, msg, command_error=err, output=out)
pyeapi.eapilib.CommandError: Error [1000]: CLI command 5 of 5 'write memory' failed: could not run command [Error copying system:/running-config to flash:/startup-config (Operation not permitted)]
changed: False
diff:
hostname: r2
task: deploy_config for isis
failed: False
logs: None
changed: False
diff:
hostname: r3
task: deploy_config for isis
failed: False
logs: None
changed: False
diff:
This shouldn’t happen on vEOS or on real hardware (if you have the correct aaa config, of course).
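One way to tell this harmless failure apart from a real one is to look at the error text. The sketch below is a hypothetical helper (not part of napalm, pyeapi or nornir) that only inspects the message format seen in the log above, and it assumes, as observed here, that the earlier commands were applied when only the final 'write memory' fails.

```python
import re

# Matches the message format from the log above:
# "CLI command 5 of 5 'write memory' failed: ..."
WRITE_MEMORY_FAILED = re.compile(
    r"CLI command (\d+) of (\d+) 'write memory' failed"
)


def applied_but_not_saved(error_text):
    """True when the only failure reported is the final 'write memory'."""
    match = WRITE_MEMORY_FAILED.search(error_text)
    if match is None:
        return False
    # Only treat it as harmless when 'write memory' was the last command.
    return match.group(1) == match.group(2)


log_line = (
    "pyeapi.eapilib.CommandError: Error [1000]: CLI command 5 of 5 "
    "'write memory' failed: could not run command [Error copying "
    "system:/running-config to flash:/startup-config (Operation not permitted)]"
)
```

A task could use such a check to report "applied, startup config not saved" instead of a plain failure, but the running config should still be verified on the device.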
For some time I have wanted to learn a bit about CI/CD. Today I gave Travis a go.
All this is based on Kirk Byers’ Python course and his git repo.
So I just created an empty repo and started working on it:
$ git clone https://github.com/thomarite/test-ci.git
$ cd test-ci
$ pyenv local 3.7.3
$ python -m venv virt_env
$ source virt_env/bin/activate
$ python -m pip install pylama
$ python -m pip install black
$ python -m pip install pytest
$ python -m pip install tox
$ mkdir tests
$ vim tests/test_sample.py
def increment(x):
    return x + 1

def test_answer():
    assert increment(4) == 5
$ vim requirements.txt
pytest==5.4.3
pylama==7.7.1
black==19.10b0
$ vim .travis.yml
language: python
python: "3.7"
# command to install dependencies
install: pip install -r requirements.txt
# command to run tests
script:
  - pylama .
  - black --check .
  - py.test -s -v tests/
Then you create an account with Travis-ci.org, which is “free”, and you link it to your repo. As soon as you commit, you will see how the tests run and whether they are successful.
Now that I have a basic setup, I hope to carry on using it for any new Python stuff I try.
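Before pushing, the same checks can be run locally. This is a small sketch of a hypothetical helper script (not something from the course or from Travis) that runs the three commands of the “script” section in order and stops at the first failure. It assumes pylama, black and py.test are installed in the active virtualenv.

```python
import subprocess

# The same three commands as the "script" section of .travis.yml.
CHECKS = [
    ["pylama", "."],
    ["black", "--check", "."],
    ["py.test", "-s", "-v", "tests/"],
]


def run_checks(commands):
    """Run each command in order; return the first one that fails, else None.

    A command that is not installed raises FileNotFoundError, which is left
    to propagate so the missing tool is obvious.
    """
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return command
    return None


# Usage from the repo root:
#     failed = run_checks(CHECKS)
#     if failed is not None:
#         raise SystemExit("check failed: " + " ".join(failed))
```

Stopping at the first failure keeps the output short: fix that one and run it again.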