Ansible – Troubleshooting

A couple of years ago I wrote an ansible playbook that used napalm to configure some switches.

I wanted to try ansible again as I am quite rusty and there is always demand for it.

I started with something basic in my ceos lab.

All my code is here:

https://github.com/thomarite/ceos-testing/tree/master/ansible

Initially I was following the official documentation for Arista EOS Ansible modules:

https://ansible-arista-howto.readthedocs.io/en/latest/COLLECTING_STATUS.html

https://github.com/titom73/ansible-arista-module-howto

Installing ansible was fine with pip in my venv.

But I hit a wall with the very first example, which uses “eos_facts”. Initially I wasn't adding any debugging flags, which made it even worse. Fortunately I remembered “-vvv”. I was seeing this:

The full traceback is:
Traceback (most recent call last):
  File "/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py", line 102, in <module>
    _ansiballz_main()
  File "/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py", line 94, in _ansiballz_main
    invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)
  File "/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py", line 40, in invoke_module
    runpy.run_module(mod_name='ansible.modules.network.eos.eos_facts', init_globals=None, run_name='__main__', alter_sys=True)
  File "/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 205, in run_module
    return _run_module_code(code, init_globals, run_name, mod_spec)
  File "/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py", line 206, in <module>
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py", line 197, in main
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/facts/facts.py", line 23, in __init__
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py", line 213, in get_resource_connection
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py", line 229, in get_capabilities
  File "/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/connection.py", line 121, in __init__
AssertionError: socket_path must be a value
fatal: [r3]: FAILED! => {
"changed": false,
"module_stderr": "Traceback (most recent call last):\n File \"/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py\", line 102, in <module>\n _ansiballz_main()\n File \"/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py\", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File \"/home/tomas/.ansible/tmp/ansible-tmp-1594296522.1539829-295453-189146847007138/AnsiballZ_eos_facts.py\", line 40, in invoke_module\n runpy.run_module(mod_name='ansible.modules.network.eos.eos_facts', init_globals=None, run_name='__main__', alter_sys=True)\n File \"/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py\", line 205, in run_module\n return _run_module_code(code, init_globals, run_name, mod_spec)\n File \"/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py\", line 96, in _run_module_code\n mod_name, mod_spec, pkg_name, script_name)\n File \"/home/tomas/.pyenv/versions/3.7.3/lib/python3.7/runpy.py\", line 85, in _run_code\n exec(code, run_globals)\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py\", line 206, in <module>\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/modules/network/eos/eos_facts.py\", line 197, in main\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/facts/facts.py\", line 23, in __init__\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py\", line 213, in get_resource_connection\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py\", line 229, in get_capabilities\n File \"/tmp/ansible_eos_facts_payload_r5gz8rov/ansible_eos_facts_payload.zip/ansible/module_utils/connection.py\", line 121, in __init__\nAssertionError: socket_path must be a value\n",
"module_stdout": "",
"msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
"rc": 1
}

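So “socket_path” wasn't defined. I checked all the Python files mentioned in the stack trace but couldn't find anything.

It was clear that I wasn’t giving ansible enough information to establish the connection socket to the devices (ip:port). (In hindsight, “socket_path” is the local socket of ansible's persistent connection, which only exists with connection types such as “network_cli” or “httpapi”.)

And the example from the documentation didn't work either: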

https://docs.ansible.com/ansible/latest/network/user_guide/platform_eos.html#using-eapi-in-ansible

I knew my old ansible script was working before I left my job. But I also knew that I was now using the latest version of ansible, so very likely things had changed since then.

$ ansible --version
ansible 2.9.10

So I had to read about the “eos_facts” and “eos_config” modules, searching here:

https://docs.ansible.com/ansible/latest/modules/list_of_network_modules.html

https://docs.ansible.com/ansible/latest/modules/eos_facts_module.html#eos-facts-module

https://docs.ansible.com/ansible/latest/modules/eos_config_module.html

After some time, I managed to fix the playbook and my environment, and I could run the playbook over the SSH connection (although I was ignoring a warning about “provider” not being needed…)

(testdir2)/ansible master$ cat ansible-hosts
[all:vars]
ansible_python_interpreter=/home/tomas/storage/technology/arista/testdir2/bin/python
ansible_user='tomas'
ansible_password='tomas123'
[ceoslab]
r1 ansible_host=127.0.0.1 ansible_port=2000
r2 ansible_host=127.0.0.1 ansible_port=2001
r3 ansible_host=127.0.0.1 ansible_port=2002

(testdir2)/ansible master$ cat group_vars/ceoslab.yaml
ansible_network_os: eos
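
The playbook itself lives in the repository linked above and is not reproduced in this post. As a rough sketch only, a facts-collecting playbook that produces output of the shape shown in this post could look like the one below. The play and task names are taken from that output, the “ceoslab” group comes from the inventory, and “ansible_net_model” and “ansible_net_version” are facts returned by eos_facts. The exact layout is an assumption, not a copy of the file in the repository, and the deprecated “provider” argument is deliberately left out.

```yaml
---
- name: Run commands on ceos lab
  hosts: ceoslab
  connection: network_cli   # SSH to the device CLI; needs ansible_network_os
  gather_facts: no          # skip the default fact gathering, eos_facts does the work
  tasks:
    - name: Collect all facts from device
      eos_facts:
        gather_subset:
          - all

    - name: Display result
      debug:
        msg: "Model is {{ ansible_net_model }} and it is running {{ ansible_net_version }}"
```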

The output:

/ansible master$ ansible-playbook playbooks/collect-facts-cli.yaml
PLAY [Run commands on ceos lab]
TASK [Collect all facts from device] ***
[WARNING]: provider is unnecessary when using network_cli and will be ignored
[WARNING]: default value for gather_subset will be changed to min from !config v2.11 onwards
ok: [r1]
ok: [r3]
ok: [r2]
TASK [Display result] ****
ok: [r2] => {
"msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r1] => {
"msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r3] => {
"msg": "Model is cEOSLab and it is running 4.23.3M"
}
PLAY RECAP *
r1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

OK, so getting the playbook to work using the API shouldn't be that difficult? It was.

The full traceback is:
  File "/tmp/ansible_eos_facts_payload_vz7c7ipu/ansible_eos_facts_payload.zip/ansible/module_utils/network/common/network.py", line 229, in get_capabilities
    capabilities = Connection(module._socket_path).get_capabilities()
  File "/tmp/ansible_eos_facts_payload_vz7c7ipu/ansible_eos_facts_payload.zip/ansible/module_utils/connection.py", line 185, in rpc
    raise ConnectionError(to_text(msg, errors='surrogate_then_replace'), code=code)
fatal: [r1]: FAILED! => {
"changed": false,
"invocation": {
"module_args": {
"auth_pass": null,
"authorize": null,
"gather_network_resources": null,
"gather_subset": [
"all"
],
"host": null,
"password": null,
"port": null,
"provider": null,
"ssh_keyfile": null,
"timeout": null,
"transport": null,
"use_ssl": null,
"username": null,
"validate_certs": null
}
},
"msg": "Could not connect to http://127.0.0.1:80/command-api: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1056)"
}

I was surprised that it was using port 80. I was pretty sure I was providing the correct port (900x), so somehow my data wasn't being processed.
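
A quick way to see which values ansible actually resolves for each host is a small diagnostic play. The sketch below is hypothetical (the play name and layout are made up for illustration) and only prints variables from the inventory and group_vars. It assumes the “ceoslab” group from my inventory. According to the httpapi plugin documentation, when no port is set the plugin picks 80 or 443 depending on use_ssl, so an “unset” httpapi_port here would be consistent with the port 80 seen in the error.

```yaml
---
# Hypothetical diagnostic playbook, not part of the original repository.
# connection: local means no session to the switches is opened; the debug
# task only prints inventory and group_vars values on the control node.
- name: Show resolved connection variables
  hosts: ceoslab
  connection: local
  gather_facts: no
  tasks:
    - name: Print SSH and eAPI settings for each host
      debug:
        msg: >-
          host={{ ansible_host | default('unset') }}
          ssh_port={{ ansible_port | default('unset') }}
          httpapi_port={{ ansible_httpapi_port | default('unset') }}
          use_ssl={{ ansible_httpapi_use_ssl | default('unset') }}
```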

Clearly, I wasn’t paying attention to the documentation:

https://docs.ansible.com/ansible/latest/modules/eos_facts_module.html#eos-facts-module

It clearly says that “provider” is deprecated since 2.5! I am using 2.9.

Also, I have a very poor knowledge of ansible and I didn't understand the concept of “connection”. SSH was using “network_cli” and the API was using “httpapi”.

I was very close to giving up on the API connection when I searched for “ansible network_cli” and found the documentation for that plugin. Then I searched for “httpapi” and it was gold!

https://docs.ansible.com/ansible/latest/plugins/connection/network_cli.html

https://docs.ansible.com/ansible/latest/plugins/connection/httpapi.html

I realised that I needed to pass specific vars to get the HTTPS connection. So in the end, I managed to get the config right for both SSH and HTTPS:

/ansible master$ cat ansible-hosts
[all:vars]
ansible_python_interpreter=/home/tomas/storage/technology/arista/testdir2/bin/python
ansible_user='tomas'
ansible_password='tomas123'
[ceoslab]
r1 ansible_host=127.0.0.1 ansible_port=2000 ansible_httpapi_port=9000
r2 ansible_host=127.0.0.1 ansible_port=2001 ansible_httpapi_port=9001
r3 ansible_host=127.0.0.1 ansible_port=2002 ansible_httpapi_port=9002

/ansible master$ cat group_vars/ceoslab.yaml
ansible_network_os: eos
#start - eapi config
ansible_httpapi_use_ssl: 'yes'
ansible_httpapi_validate_certs: 'no'
ansible_httpapi_password: "{{ ansible_password }}"
#end - eapi config

The output:

ansible master$ ansible-playbook playbooks/collect-facts-eapi.yaml
PLAY [Run commands on remote ceos lab] *
TASK [Collect all facts from device] ***
[WARNING]: default value for gather_subset will be changed to min from !config v2.11 onwards
ok: [r3]
ok: [r1]
ok: [r2]
TASK [Display result] ****
ok: [r2] => {
"msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r1] => {
"msg": "Model is cEOSLab and it is running 4.23.3M"
}
ok: [r3] => {
"msg": "Model is cEOSLab and it is running 4.23.3M"
}
PLAY RECAP *
r1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
r3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0

At the end of the day, the scripts are identical apart from the “connection” var:

/ansible/playbooks master$ diff collect-facts-cli.yaml collect-facts-eapi.yaml
4c4
< connection: network_cli
---
> connection: httpapi

I think you can pass a var to the playbook via the CLI, so I will try that later.
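
As an untested sketch of that idea, the two playbooks could be merged into one whose connection type comes from an extra var given with “-e”. The variable name “conn” and the file name “collect-facts.yaml” are hypothetical, chosen only for this example; the tasks are assumed to match the output shown earlier.

```yaml
---
# Sketch of a single playbook replacing the two near-identical files.
# "conn" is a hypothetical variable name; collect-facts.yaml is a made-up file name.
#
#   ansible-playbook playbooks/collect-facts.yaml                  # SSH (network_cli)
#   ansible-playbook playbooks/collect-facts.yaml -e conn=httpapi  # eAPI (httpapi)
- name: Run commands on ceos lab
  hosts: ceoslab
  connection: "{{ conn | default('network_cli') }}"  # falls back to SSH when conn is not passed
  gather_facts: no
  tasks:
    - name: Collect all facts from device
      eos_facts:
        gather_subset:
          - all

    - name: Display result
      debug:
        msg: "Model is {{ ansible_net_model }} and it is running {{ ansible_net_version }}"
```

Since extra vars take precedence over playbook keywords, passing “-e ansible_connection=httpapi” to the existing CLI playbook may achieve the same without editing it, but I have not verified that here.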

My recommendation is always to use “-vvv”.

BTW, a good ansible summary I found:

https://gist.github.com/andreicristianpetcu/b892338de279af9dac067891579cad7d

In summary, I found ansible more difficult to troubleshoot than nornir. Nornir is pure Python, so I can run ipdb wherever I want.

But anyway, I learned things. I will try to write some slightly more complex playbooks.