BIND performance – LACP Troubleshooting – Chiplets – AI/HPC Networking – Spray ML/AI workloads

BIND: Interesting links about BIND performance and the lab setup. DNS is the typical technology that looks straightforward but as soon as you dig a bit, it is a world itself

LACP: Interesting blog about troubleshooting details. As above, this is the typical tech that you give for granted that works but then, you need to really understand how it works to troubleshoot it. So I learned a bit (although the blog is “old”)

Chiplets: Very good blog. Explaining the origin of getting to chiplets. Interesting evolution and good touch to mention the network industry, and not just CPU/GPU.

As the process node shrank, manufacturing became more complex and expensive, leading to a higher cost per square millimeter of silicon. Die cost does not scale linearly with die area. The cost of the die more than doubles with doubling the die area due to reduced yields (number of good dies in a wafer).

Instead of packing more cores inside a large die, it may be more economical to develop medium-sized CPU cores and connect them inside the package to get higher core density at the package level. These packages with more than one logic die inside are called multi-chip modules (MCMs). The dies inside the multi-chip modules are often referred to as chiplets.

AI/HPC Networking: Nice summery about AI vs HPC, and what each hyperscaler and vendors are doing. For me is quite interesting how to get proper loadbalacing of flows like AWS SDR. This should be an actual standard by any network vendor or software to aim to that goal. I guess it is not easy.

High performance requirements can create a vendor lock-in. Doesn’t matter if it is IB or Ethernet. So pick your evil.

Spray ML/AI worloads: Based on the above regarding the loadbalacing, this is an interesting article about how to generate loadbalancing in ML workloads when it is based in just one elephant flow. So you need Adaptive routing in your fabric/switches, NICs that support it and support from your code/library.

VimGPT – Maia AI – Mirai – Reptar – Mellanox Debian – RISC-V DC – Mojo – Moors Law

VimGTP: Very interesting project. I haven’t used it. But thinking aloud, you could use it to interact with sites that dont have API (couriers)? I think with Selenium you can do things like that?

Maia AI: CLoud providers like to be masters of their own destiny so try to build as many things by themselves as possible. So now MS has developed its GPU for AI. It is quite interesting the custom rack they had to built with the sidekick for cooling down the new chips. There are no many figures about the chip (5nm, 105b transistors) to compare with other things in the market.

Reptar: new Intel CPU vulnerability. It looks like is a feature from Ice Lake architecture. It looks like you can crash the cores but no yet take over. Still interesting.

I am not affected 馃檪

$ grep fsrm /proc/cpuinfo
$

Mellanox with Debian: Interesting how you can install a nearly standard Debian into a Mellanox SN2700 switch.

RISC-V into datacenter: Happy to see RISC-V chips in the datacenter. But not clear who is going to use them?

Mirai history: I think most of wired articles read like a holywood movie 馃檪 Although 2016 security issues are “old” school, still interesting how teenagers got that far.

Mojo: Interesting because of the people behind of it… really impressive.

Moor’s law analysis: I liked the part about networks, that is not very common mentioned in these type of analysis.

RFKill

Somehow my linux laptop sometimes disables WIFI when I upgrade it. It doesnt really bother me as I can enable it by an icon in the UI but one day my UI lost the panel with that icon after another upgrade. So I had to learn how to enable the wifi. Via this page, I learned about the different status and then checking the options of rkfill command got my WIFI enabled back again.

# rfkill list
0: phy0: Wireless LAN
	Soft blocked: yes
	Hard blocked: no
1: hci0: Bluetooth
	Soft blocked: yes
	Hard blocked: no
# 
# rfkill unblock wifi
# 
# rfkill list
0: phy0: Wireless LAN
	Soft blocked: no
	Hard blocked: no
1: hci0: Bluetooth
	Soft blocked: yes
	Hard blocked: no
# 

At some point, I would like to test bluetooth in my laptop.

Curl, Yaml, scalars, Elixir, git stash

I haven’t watched this video, but looks like the holly book of curl!!!

I'd recommend starting at ~34 minutes.

路You can specify multiple URLS with multiple output options in a single command. Doing this or using globbing (see below) to the same host will use persistent connections and greatly improve performance because the same L5 session is used

路trurl is also made by the project and allows you to programmatically manipulate URLs (change server, path, query parameters, etc.). Pretty neat: https://github.com/curl/trurl

路curl supports URL globbing: curl https://{ftp,www,test}.example.com/img[1-22].jpg -o "foo_#2_#1.jpg"

路By default, curl will resolve requests serially when multiple URLS or globbing is specified, but curl is capable of doing parallel transfers with the -Z or --parallel option. And can do anywhere from 2-300 transfers in parallel. This also has the potential to parallel-ize HTTP/3 transfers even from single URLs.

路You can do curl --help category to get a list of help categories for narrowing down options by categories like http or output

路 Long commands for curl can be specified in a file and given to curl either via stdin or -K / --config - These files are essentially just command lines in a file

路You can use the --trace option to provide tcpdump type output from curl. Saving the need to to start tcpdump in the background if you just want to see what's happening from curl

路You can use --connect-to to specify a different DNS name to go to (instead of the one specified in the URL) which is similar to the --resolve option, but doesn't require the user to lookup the IP address ahead of time

路You can override the DNS server that you use to resolve URLs via --dns-ipv4-addr 8.8.8.8 for example

路You can add --libcurl to any curl command and it will spit out C source-code that implements the same command line in C via the library libcurl

路You can set the environment variable SSLKEYLOGFILE to a file name and it will save the runtime TLS secrets to that file, and use that file in WireShark along with a dump of the traffic from tcpdump to see the contents of encrypted HTTP streams

路You can choose to only download files that have changed since the last time they were downloaded with curl via --etag-save <etag_file> and --etag-compare <etag_file>

路You can skip adding the extra -H "Content-Type: application/json" when getting or posting JSON data (with -d), by specifying --json instead of just -d

路You can create JSON easily from the command line with the tool jo: https://github.com/jpmens/jo (basically a reverse jq)

Rant about yaml. And something I learned about yaml some months ago and forgot about it: scalars for making multiline work in yaml.

Elixir: a programming language based on Erlang. Really impressive reports! But still I would like to learn golang (if I ever learn properly python 馃檪

git stash: I didnt know about this git command until last week, very handy.

Mobile Phone + SSH server

I have tried many times to connect my mobile phones to my laptop. It always looks easy if you use M$ but with Linux I always fail, I can’t get to work MTP. So now I really want to take mainly all my pictures from a phone and be able to back them up and transfer to a new one. I dont want to use cloud services or tools from the manufacturers. I want to use old school methods. So after struggling for some time, I somehow decided to use something as old school as SSH/SCP. Android is based in linux, isn’t it? So I searched for a free SSH server app, found this one. And it worked! I managed to understand it, created my user, my mounting points, enable it… and was able to SCP all my photos from my mobile to my laptop. It worked with Samsung and Huawei.

I am pretty sure that people know have better ways to do this… but that’s me.

ARP Storms – EVPN

We have had an issue with broadcast storms in our network. Checking the CoPP setup in the switches, we could see massive drops of ARP. This is a good link to know how to check CoPP drops in NXOS.

N9K:# show copp status
N9K# show policy-map interface control-plane | grep 'dropped [1-9]' | diff

Having so many ARP drops by CoPP is bad because very likely good ARP requests are going to be dropped.

Initially i thought it was related to ARP problems in EVPN like this link. But after taking a packet capture in a switch from an interface connected to a server, I could see that over 90% ARP traffic coming from the server was not getting a reply…. Checking in different switches, I could see the same pattern all over the place.

So why the server was making so many ARP requests?

After some time, managed to help help from a sysadmin with access to the servers so could troubleshoot the problem.

But, how do you find the process that is triggering the ARP requests? I didnt make the effort to think about it and started to search for an easy answer. This post gave me a clue.

ss does show you connections that have not yet been resolved by arp. They are in state SYN-SENT. The problem is that such a state is only held for a few seconds then the connection fails, so you may not see it. You could try rapid polling for it with

while ! ss -p state syn-sent | grep 1.1.1.100; do sleep .1; done

Somehow I couldnt see anything anything with “ss” so tried netstat as it shows you too the status of the TCP connection (I wonder what would happen is the connection was UDP instead???)

Initially I tried “netstat -a” and it was too slow to show me “SYN-SENT” status

Shame on me, I had to search how to get to show the ports quickly here:

watch netstat -ntup | grep -i syn_sent | awk '{print $4,$5,$6,$7}'

It was slow because it was trying to resolve all IPs to hostname…. :facepalm. Tha is fixed with “-n” (no-resolve)

Anyway, with the command above, finally managed to see the process that were in “SYN_SENT” state

This is not the real thing, just an example:

#  netstat -ntup | grep -i syn_sent 
tcp        0      1 192.168.1.203:35460     4.4.4.4:23              SYN_SENT    98690/telnet        
# 

We could see that the destination port was TCP 179, so something in the node was trying to talk BGP! They were “bird” processes. As the node belonged to a kubernetes cluster, we could see a calico container as CNI. Then we connected to the container and tried to check the bird config. We could see clearly the IPs that dont get ARP reply were configured there.

So in summary, basic TCP:

Very summarize, TCP is L4, then goes down to L3 IP. For getting to L2, you need to know the MAC of the IP, so that triggers the ARP request. Once the MAC is learned, it is cached for the next request. For that reason the first time you make a connection is slow (ping, traceroute, etc)

Now we need to workout why the calico/bird config is that way. Fix it to only use IPs of real BGP speakers and then verify the ARP storms stop.

Hopefully, I will learn a bit about calico.

Notes for UDP:

If I generate an UDP connection to a non-existing IP

$ nc -u 4.4.4.4 4000

netstat tells me the UDP connection is established and I can’t see anything in the ARP table for an external IP, for an internal IP (in my own network) I can see an incomplete entry. Why?

#  netstat -ntup | grep -i 4.4.4.4
udp        0      0 192.168.1.203:42653     4.4.4.4:4000            ESTABLISHED 102014/nc           
# 
#  netstat -ntup | grep -i '192.168.1.2:'
udp        0      0 192.168.1.203:44576     192.168.1.2:4000        ESTABLISHED 102369/nc           
# 
#
# arp -a
? (192.168.1.2) at <incomplete> on wlp2s0
something.mynet (192.168.1.1) at xx:xx:xx:yy:yy:zz [ether] on wlp2s0
# 

# tcpdump -i wlp2s0 host 4.4.4.4
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on wlp2s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
23:35:45.081819 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:45.081850 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:46.082075 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:47.082294 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
23:35:48.082504 IP 192.168.1.203.50186 > 4.4.4.4.4000: UDP, length 1
^C
5 packets captured
5 packets received by filter
0 packets dropped by kernel
# 
  • UDP is stateless so we can’t have states…. so it is always going to be “established”. Basic TCP/UDP
  • When trying to open an UDP connection to an external IP, you need to “route” so my laptop knows it needs to send the UDP connection to the default gateway, so when getting to L2, the destination MAC address is not 4.4.4.4 is the default gateway MAC. BASIC ROUTING !!!! For that reason you dont see 4.4.4.4 in ARP table
    • When trying to open an UDP connection to a local IP, my laptop knows it is in the same network so it should be able to find the destination MAC address using ARP.

Convert Images

I thought it would be easier to save a PNG file as JPG but I failed. I was pretty sure it should be a standard linux command for that. Naive.

Ok, so found something that does the job:

$ sudo aptitude install imagemagick
$ convert pic.png pic.jpg

apt-key deprecation

While updating Debian, I have seen this warning in the last days:

Fetched 11.4 kB in 3s (3,605 B/s)
W: http://www.deb-multimedia.org/dists/testing/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
W: http://deb.torproject.org/torproject.org/dists/testing/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details.
                          

I did read the apt-key manual but I wasn’t very clear how to proceed. So I searched for a bit and found this article. And it was exactly what I needed.

$ sudo apt-key list
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
/etc/apt/trusted.gpg
--------------------
pub   rsa4096 2014-03-05 [SC]
      A401 FF99 368F A1F9 8152  DE75 5C80 8C2B 6555 8117
uid           [ unknown] Christian Marillat <marillat@debian.org>
uid           [ unknown] Christian Marillat <marillat@free.fr>
uid           [ unknown] Christian Marillat <marillat@deb-multimedia>
uid           [ unknown] Christian Marillat <marillat@deb-multimedia.org>
sub   rsa4096 2014-03-05 [E]

pub   rsa2048 2009-09-04 [SC] [expires: 2024-11-17]
      A3C4 F0F9 79CA A22C DBA8  F512 EE8C BC9E 886D DD89
uid           [ unknown] deb.torproject.org archive signing key
sub   rsa2048 2009-09-04 [S] [expires: 2022-06-11]
...
...

Export the keys:

$ sudo apt-key export 65558117 | sudo gpg --dearmour -o /usr/share/keyrings/repo-debian-multimedia-testing.gpg 
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
$
 
 
$ sudo apt-key export 886DDD89 | sudo gpg --dearmour -o /usr/share/keyrings/repo-torproject-testing.gpg 
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
$ 

BTW, something I keep forgetting is what part of the pub key I needed. It is the last 8 digits (that you can see in the output of apt-key list). And that was mentioned in the article but I didnt pay attention…

Now update “/etc/apt/sources.list” adding “signed-by=/path to file created above” for each repo:

###Debian Multimedia
deb [arch=amd64 signed-by=/usr/share/keyrings/repo-debian-multimedia-testing.gpg] http://www.deb-multimedia.org testing main non-free

###TOR
deb [arch=amd64 signed-by=/usr/share/keyrings/repo-torproject-testing.gpg] http://deb.torproject.org/torproject.org testing main

Update and see if warning is gone:

# aptitude update 
Hit http://security.debian.org/debian-security testing-security InRelease
Hit http://deb.debian.org/debian testing InRelease                                                         
Ign https://apt.fury.io/netdevops  InRelease
Ign https://apt.fury.io/netdevops  Release
Hit http://www.deb-multimedia.org testing InRelease
Hit https://dl.google.com/linux/chrome/deb stable InRelease                                                                                       
Hit https://packages.cloud.google.com/apt cloud-sdk InRelease        
Hit http://deb.torproject.org/torproject.org testing InRelease
Get: 1 https://apt.fury.io/netdevops  Packages
Ign https://apt.fury.io/netdevops  Translation-en_GB
Ign https://apt.fury.io/netdevops  Translation-en
Ign https://apt.fury.io/netdevops  Contents (deb)
Ign https://apt.fury.io/netdevops  Contents (deb)
Fetched 11.4 kB in 3s (3,650 B/s)
                                         
# 

All good

And clean-up before finishing:

$ sudo apt-key del 65558117
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK
$ sudo apt-key del 886DDD89
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK
$ 

youtube-dl extract specific audio portion

I was watching a concert and I wanted to take just the audio of a song, no video. I knew you could download the full audio from videos pretty easily with youtube-dl but now just wanted an specific portion. Thanks to these links (link1 and link2) I managed to get what I wanted:

$ youtube-dl --youtube-skip-dash-manifest -g "VIDEO_URL"

# copy the second url (audio) from the above command output

$ audio_url="AUDIO_URL_FROM_ABOVE"

$ ffmpeg -i "$audio_url" -ss 00:00:30 -t 00:05:20.0 -q:a 0 -map a sample.mp3

# PLAY IT!

$ vlc sample.mp3