Meta GenAI Infra, Oracle RDMA, Cerebras, Co-packaged optics, devin, figure01, summarize youtube videos, pdf linux cli, levulinic acid

Meta GenAI infra: link. Interesting that they have built two clusters, one Ethernet and the other InfiniBand, both without bottlenecks. I don’t understand if Grand Teton is where they install the NVIDIA GPUs? And for storage, I would expect something based on ZFS or similar. For performance, “We also optimized our network routing strategy”. And “debuggability” is critical for a system of this size. How quickly can you detect a faulty cable, port, GPU, etc.?

Oracle RDMA: This is an Ethernet deployment with RDMA. The interesting part is their work on DC-QCN (an ECN-based congestion control enhancement).

Cerebras WSE-3: Looks like outside NVIDIA and AMD, this is the only other option. I wonder how much you need to change your code to work in this setup? They say it is easier… I like the pictures about the cooling and racks.

Co-packaged optics: Interesting to see if this becomes a new “normal”. No more flapping links, anybody? Either it is the fiber or you replace the whole switch….

I have been watching several videos lately and I would like to be able to get a tool to give a quick summary of the video so I can have notes (and check if the tool is good). Some tools: summarize.tech, sumtubeai

video1, video2, video3, video4, video5, video6, video7, video8, video9

Devin and Figure01: Looks amazing and scary. I will need one robot for my dream bakery.

I wanted to “extract” some pages from different PDFs into a single file. “qpdf” looks like the tool for it.

qpdf --empty --pages first.pdf 1-2 second.pdf 1 -- combined.pdf

levulinic acid: I learnt about it from this news.

Rebuild VPS/Blog

Quite overdue. I need to fill all the gaps properly

I went from Apache to NGINX to make it more challenging for me…

Make a proper backup of your VPS.

More details

Get latest Debian stable

Or try to upgrade…link:

SSL Cert

Be sure you have Let’s Encrypt set up to get your free certificate. The good thing is that the DNS side was already done, so it was just a matter of configuring NGINX.

WordPress

Link.

phpMyAdmin

Link. I struggled with this. I had to make a minimal config and then restore my backup. After that, my blog was fully recovered.

NGINX config

Link. I struggled here because I had to serve both my blog and phpMyAdmin from NGINX. I knew how to do it with Apache but was failing with NGINX. I asked ChatGPT and in the end it gave me the solution.

This is the final config:

server {
    server_name thomarite.uk blog.thomarite.uk;
    root /var/www/html/wordpress;
    index index.php;

    access_log /var/log/nginx/thomarite.uk.access.log;
    error_log /var/log/nginx/thomarite.uk.error.log;

    client_max_body_size 100M;

    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    location ~ \.php$ {
        include snippets/fastcgi-php.conf;
        fastcgi_pass unix:/var/run/php/php8.2-fpm.sock;
        include fastcgi_params;
        fastcgi_intercept_errors on;
    }

    location /phpmyadmin/ {
        alias /usr/share/phpmyadmin/;
        index index.php;

        location ~ \.php$ {
            include snippets/fastcgi-php.conf;
            fastcgi_pass unix:/var/run/php/php8.2-fpm.sock;
            include fastcgi_params;
            fastcgi_intercept_errors on;

            fastcgi_param SCRIPT_FILENAME $request_filename;
            fastcgi_param PATH_INFO $fastcgi_path_info;
        }
    }

    location ~* \.(?:svgz?|ttf|ttc|otf|eot|woff2?)$ {
        add_header Access-Control-Allow-Origin "*";
        expires 90d;
        access_log off;
    }

    location ~ /\.ht {
        access_log off;
        log_not_found off;
        deny all;
    }

    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/thomarite.uk-0001/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/thomarite.uk-0001/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {
    if ($host = blog.thomarite.uk) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    if ($host = thomarite.uk) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    listen [::]:80;
    server_name thomarite.uk blog.thomarite.uk;

    location / {
        return 404;
    }
}


Then

sudo nginx -t
sudo service nginx restart

IRCD

I need to check how this is installed properly. Check this.

Sales Psychology, BERT testing, EVPN asymmetric/symmetric, git sync fork

Sales Psychology: I have noticed lately that, since I subscribed to a YouTube channel, everything plays on “negativity bias”. I can’t find any video with a positive message. I subscribed because I want to learn and improve, but the way it is advertised is wrong.

BERT Testing: I wonder if there is anything opensource.

Git sync fork: This is something I have never tried before.

1- Add remote

0) check your remote
git remote -v
1) Add new remote
git remote add upstream URL
2) git fetch/pull from the upstream (name the branch to merge; "main" here is an assumption)
git pull upstream main

EVPN VXLAN Asymmetric/Symmetric routing: blog1

Asymmetric IRB
– Ingress VTEP does both L2 and L3 lookup
– Egress VTEP does L2 lookup only
– Bridge – Route – Bridge
– Pros: “easy” to configure – just copy/paste. Identical config with the only difference in SVI IP addresses.
– Cons: on the way back, the roles are reversed (the other VTEP routes the return traffic) => all VXLANs need to be configured on all VTEPs => increased ARP cache and CAM table sizes and control plane scaling issues => not very efficient.

Symmetric IRB
– Ingress VTEP does both L2 and L3 lookup
– Egress VTEP does both L3 and L2 lookup
– Bridge – Route – Route – Bridge
– L3 VNI should be configured on all VTEPs, L2 VNIs only where local ports exist
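
To make the scaling difference concrete, here is a minimal sketch that computes which VNIs each VTEP must carry under both models. The leaf names, the VNI numbers and the L3_VNI value are hypothetical, and a single tenant VRF is assumed.

```python
from typing import Dict, Set

L3_VNI = 50000  # hypothetical transit (L3) VNI for the single tenant VRF


def asymmetric_vnis(local: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Asymmetric IRB: every VTEP needs every L2 VNI of the tenant,
    because the ingress VTEP routes straight into the destination VNI."""
    all_l2 = set().union(*local.values())
    return {vtep: set(all_l2) for vtep in local}


def symmetric_vnis(local: Dict[str, Set[int]], l3_vni: int = L3_VNI) -> Dict[str, Set[int]]:
    """Symmetric IRB: each VTEP keeps only its local L2 VNIs plus the shared L3 VNI."""
    return {vtep: set(vnis) | {l3_vni} for vtep, vnis in local.items()}


if __name__ == "__main__":
    # L2 VNIs that have local ports on each leaf (made-up topology).
    local_ports = {"leaf1": {10, 20}, "leaf2": {30}, "leaf3": {10}}
    for name, fn in (("asymmetric", asymmetric_vnis), ("symmetric", symmetric_vnis)):
        for vtep, vnis in sorted(fn(local_ports).items()):
            print(name, vtep, sorted(vnis))
```

With many VNIs and many leaves, the asymmetric sets grow with the total number of VNIs in the fabric, while the symmetric sets only grow with what is locally attached.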

Other things about EVPN: link1 link2

Gaming Latency, LLM course, Anycast ipv6

Another LLM course: it looks quite good, but I don’t think I will have time to use it.

Nice video about Gaming Latency:

How to curl an IPv6 address:

$ curl -v -g -k -6 'https://[2603:1061:13f1:4c06::]:443/'
Trying [2603:1061:13f1:4c06::]:443...
Connected to 2603:1061:13f1:4c06:: (2603:1061:13f1:4c06::) port 443
ALPN: curl offers h2,http/1.1

The destination address is indeed IPv6 anycast: 2603:1061:13f1:4c06:: (notice the “::” at the end)

According to RFC4291 https://www.rfc-editor.org/rfc/rfc4291.html#section-2.6

So it is indeed an anycast address.
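
RFC 4291 section 2.6.1 defines the Subnet-Router anycast address as the subnet prefix followed by an all-zero interface identifier. A minimal sketch of that check with Python's ipaddress module follows; the /64 prefix length is my assumption, since the real on-link prefix of that address is not visible from the outside, and with a different prefix length the same address may be a perfectly normal host address.

```python
import ipaddress


def is_subnet_router_anycast(addr: str, prefix_len: int = 64) -> bool:
    """True if addr equals its own subnet prefix, i.e. all host bits are zero.

    prefix_len is an assumption: the check is only meaningful if it matches
    the prefix length actually configured on the link.
    """
    ip = ipaddress.IPv6Address(addr)
    net = ipaddress.IPv6Network(f"{addr}/{prefix_len}", strict=False)
    return ip == net.network_address


if __name__ == "__main__":
    for candidate in ("2603:1061:13f1:4c06::", "2603:1061:13f1:4c06::1"):
        print(candidate, is_subnet_router_anycast(candidate))
```

So the "::" at the end only proves anycast if the subnet really is a /64 (or anything longer that still leaves all host bits at zero).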

According to Cisco (haven’t been able to find the RFC, haven’t looked much), this shouldn’t happen:

https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/ipv6_basic/configuration/xe-3se/5700/ip6-anycast-add-xe.html

So how can I curl an IPv6 anycast address from MS as if it were a host??

Discord – 68 Bits – HC2023 Google – Huge – Terrapin

Discord Scale: I think I read something about Elixir (and BEAM), so it was nice to see a successful product built with it, and how Discord has managed to keep pushing the scale of their platform. Everything is high level but it gives you an idea.

68 Bits of advice: From Kevin Kelly

HotChips 2023: I received an email with all the presentations and videos. Some piqued my curiosity (although ALL of them are beyond my understanding):

  • Exciting Directions for ML Models and the Implications for Computing Hardware: video and pdf. A lot of focus on power consumption and reducing CO2. I am still struggling with the optical part. But it is interesting that they say they go for liquid cooling and beyond Ethernet for the supercomputer.
  • Inside the Cerebras Wafer-Scale Cluster: video and pdf. I have read about Cerebras before so it was nice to read/see something directly from them.

They made Google Huge: based on link. This comes from the Google presentation above, which has a lot of references about the authors at the end. I think I read about it in the past but it was nice to read it again.

Terrapin: SSH vulnerability. I need to patch 🙁

Bluetooth on Linux

I have never used bluetooth in Linux before. I have a bluetooth headphone from work that works fine with my phone and macos but I wanted to try it in Linux.

So I gave it a quick go last night. This was my initial link. I had already installed the driver:

# dpkg -l | grep bluez
ii bluez 5.70-1.1 amd64 Bluetooth tools and daemons
ii bluez-obexd 5.70-1.1 amd64 bluez obex daemon
#

I had to install “blueman”, the frontend used to manage your Bluetooth devices:

# dpkg -l | grep blueman
ii blueman 2.3.5-3 amd64 Graphical bluetooth manager
#

The service was already enabled:

root@athens:/boot# systemctl status bluetooth.service
● bluetooth.service - Bluetooth service
     Loaded: loaded (/usr/lib/systemd/system/bluetooth.service; enabled; preset: enabled)
     Active: active (running) since Sat 2024-01-06 11:58:33 GMT; 54min ago
       Docs: man:bluetoothd(8)
   Main PID: 1137 (bluetoothd)
     Status: "Running"
      Tasks: 1 (limit: 9334)
     Memory: 3.1M ()
     CGroup: /system.slice/bluetooth.service
             └─1137 /usr/libexec/bluetooth/bluetoothd

Jan 06 11:58:42 athens bluetoothd[1137]: Endpoint registered: sender=:1.43 path=/MediaEndpoint/A2DPSource/opus_05_duplex
Jan 06 11:58:42 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:42 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:44 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:44 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:44 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:44 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:44 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:44 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
Jan 06 11:58:44 athens bluetoothd[1137]: Failed to add UUID: Failed (0x03)
root@athens:/boot# 

Then I had to enable bluetooth:

# rfkill list
0: hci0: Bluetooth
Soft blocked: yes
Hard blocked: no
1: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
#
# rfkill unblock bluetooth
#
# rfkill list
0: hci0: Bluetooth
Soft blocked: no
Hard blocked: no
1: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
#

Then I can test it:

$ blueman-manager &

But once I paired the device…. I had an error:

br-connection-profile-unavailable

I found several links and these are the only ones that worked for me: link1 and link2. So I had to install “libspa-0.2-bluetooth” and reboot:

# dpkg -l | grep libspa-0.2-bluetooth
ii libspa-0.2-bluetooth:amd64 1.0.0-1 amd64 libraries for the PipeWire multimedia server - bluetooth plugins
#

So I managed to pair the headset without errors, but then a new issue… I lost my internet connection…. After checking several things, it turned out that just enabling Bluetooth caused the loss of internet access. There was nothing in the logs about my wifi disconnecting or anything similar….

If I disabled bluetooth, my connection was back…. so more work to do. So it seems there is some interference between the modules or the drivers? Searched things about it. I checked this. But no luck yet.

To be continued.

BIND performance – LACP Troubleshooting – Chiplets – AI/HPC Networking – Spray ML/AI workloads

BIND: Interesting links about BIND performance and the lab setup. DNS is the typical technology that looks straightforward but, as soon as you dig a bit, it is a world in itself.

LACP: Interesting blog about troubleshooting details. As above, this is the typical tech that you take for granted, but then you need to really understand how it works to troubleshoot it. So I learned a bit (although the blog is “old”).

Chiplets: Very good blog. Explaining the origin of getting to chiplets. Interesting evolution and good touch to mention the network industry, and not just CPU/GPU.

As the process node shrank, manufacturing became more complex and expensive, leading to a higher cost per square millimeter of silicon. Die cost does not scale linearly with die area. The cost of the die more than doubles with doubling the die area due to reduced yields (number of good dies in a wafer).

Instead of packing more cores inside a large die, it may be more economical to develop medium-sized CPU cores and connect them inside the package to get higher core density at the package level. These packages with more than one logic die inside are called multi-chip modules (MCMs). The dies inside the multi-chip modules are often referred to as chiplets.
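
To see why the cost more than doubles, here is a minimal sketch using the simple Poisson yield model, yield = exp(-area × defect density). This model is my choice for illustration (the blog may use a different one), and the die areas and the defect density are made-up numbers; wafer edge losses and packaging costs are ignored.

```python
import math


def die_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-area_cm2 * defects_per_cm2)


def cost_per_good_die(area_cm2: float, defects_per_cm2: float,
                      cost_per_cm2: float = 1.0) -> float:
    """Silicon cost of a die divided by the fraction of dies that work."""
    return area_cm2 * cost_per_cm2 / die_yield(area_cm2, defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.1  # hypothetical defect density, defects per cm^2
    small = cost_per_good_die(4.0, d0)
    large = cost_per_good_die(8.0, d0)
    print(f"8 cm^2 die costs {large / small:.2f}x a 4 cm^2 die")
    print(f"two 4 cm^2 chiplets cost {2 * small / large:.2f}x one 8 cm^2 die")
```

Under this model, doubling the area multiplies the cost per good die by 2 × exp(area × defect density), which is always more than 2; that gap is the budget the chiplet approach can spend on packaging.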

AI/HPC Networking: Nice summary of AI vs HPC, and of what each hyperscaler and vendor is doing. For me it is quite interesting how to get proper load balancing of flows, like AWS SRD. Any network vendor or software should aim for that goal as an actual standard. I guess it is not easy.

High performance requirements can create a vendor lock-in. Doesn’t matter if it is IB or Ethernet. So pick your evil.

Spray ML/AI workloads: Following on from the load balancing point above, this is an interesting article about how to get load balancing in ML workloads when the traffic is just one elephant flow. So you need adaptive routing in your fabric/switches, NICs that support it and support from your code/library.
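
A minimal sketch of why a single elephant flow defeats per-flow hashing. The 5-tuple, the CRC32 hash and the round-robin spraying are all stand-ins I picked for illustration; real switches use their own hash functions, and real adaptive routing picks links based on load and needs the receiver to cope with out-of-order packets.

```python
import zlib
from collections import Counter
from typing import Tuple

Flow = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol


def ecmp_link(flow: Flow, n_links: int) -> int:
    """Classic ECMP: hash the 5-tuple, so one flow always lands on one link."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_links


def per_flow_hash(flow: Flow, n_packets: int, n_links: int) -> Counter:
    """Packets per link when the whole flow follows its hash."""
    return Counter({ecmp_link(flow, n_links): n_packets})


def packet_spray(n_packets: int, n_links: int) -> Counter:
    """Packets per link when each packet takes the next link (round-robin)."""
    return Counter(i % n_links for i in range(n_packets))


if __name__ == "__main__":
    elephant = ("10.0.0.1", "10.0.0.2", 49152, 4791, "udp")  # hypothetical flow
    print("per-flow hash:", dict(per_flow_hash(elephant, 1000, 4)))
    print("packet spray :", dict(packet_spray(1000, 4)))
```

With hashing, three of the four links stay idle no matter how big the flow is; with spraying, every link carries a quarter of the packets.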

VimGPT – Maia AI – Mirai – Reptar – Mellanox Debian – RISC-V DC – Mojo – Moore’s Law

VimGPT: Very interesting project. I haven’t used it. But thinking aloud, you could use it to interact with sites that don’t have an API (couriers)? I think with Selenium you can do things like that?

Maia AI: Cloud providers like to be masters of their own destiny, so they try to build as many things by themselves as possible. So now MS has developed its own AI accelerator. The custom rack they had to build, with the sidekick for cooling down the new chips, is quite interesting. There are not many figures about the chip (5nm, 105b transistors) to compare it with other things in the market.

Reptar: new Intel CPU vulnerability. It looks like it comes from a feature of the Ice Lake architecture. It looks like you can crash the cores but not yet take over. Still interesting.

I am not affected 🙂

$ grep fsrm /proc/cpuinfo
$
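
The same check as a small Python sketch. It follows the test above, which treats a missing "fsrm" flag as "not affected"; that is this post's assumption, not an authoritative statement, so check Intel's advisory for your CPU model. Unlike a plain grep, it matches the flag as a whole word in the "flags" line.

```python
def cpu_has_flag(cpuinfo_text: str, flag: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo lists the given flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not Linux, or /proc is not available
    print("fsrm present:", cpu_has_flag(text, "fsrm"))
```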

Mellanox with Debian: Interesting how you can install a nearly standard Debian into a Mellanox SN2700 switch.

RISC-V into datacenter: Happy to see RISC-V chips in the datacenter. But not clear who is going to use them?

Mirai history: I think most Wired articles read like a Hollywood movie 🙂 Although the 2016 security issues are “old school”, it is still interesting how teenagers got that far.

Mojo: Interesting because of the people behind it… really impressive.

Moore’s law analysis: I liked the part about networks, which is not commonly mentioned in this type of analysis.

RFKill

Somehow my Linux laptop sometimes disables WIFI when I upgrade it. It doesn’t really bother me as I can enable it via an icon in the UI, but one day my UI lost the panel with that icon after another upgrade. So I had to learn how to enable the wifi. Via this page, I learned about the different states, and then, checking the options of the rfkill command, I got my WIFI enabled again.

# rfkill list
0: phy0: Wireless LAN
	Soft blocked: yes
	Hard blocked: no
1: hci0: Bluetooth
	Soft blocked: yes
	Hard blocked: no
# 
# rfkill unblock wifi
# 
# rfkill list
0: phy0: Wireless LAN
	Soft blocked: no
	Hard blocked: no
1: hci0: Bluetooth
	Soft blocked: yes
	Hard blocked: no
# 

At some point, I would like to test bluetooth in my laptop.

Curl, Yaml, scalars, Elixir, git stash

I haven’t watched this video, but it looks like the holy book of curl!!!

I'd recommend starting at ~34 minutes.

·You can specify multiple URLS with multiple output options in a single command. Doing this or using globbing (see below) to the same host will use persistent connections and greatly improve performance because the same L5 session is used

·trurl is also made by the project and allows you to programmatically manipulate URLs (change server, path, query parameters, etc.). Pretty neat: https://github.com/curl/trurl

·curl supports URL globbing: curl https://{ftp,www,test}.example.com/img[1-22].jpg -o "foo_#2_#1.jpg"

·By default, curl will resolve requests serially when multiple URLS or globbing is specified, but curl is capable of doing parallel transfers with the -Z or --parallel option. And can do anywhere from 2-300 transfers in parallel. This also has the potential to parallel-ize HTTP/3 transfers even from single URLs.

·You can do curl --help category to get a list of help categories for narrowing down options by categories like http or output

· Long commands for curl can be specified in a file and given to curl either via stdin or -K / --config - These files are essentially just command lines in a file

·You can use the --trace option to get tcpdump-type output from curl, saving the need to start tcpdump in the background if you just want to see what's happening from curl

·You can use --connect-to to specify a different DNS name to go to (instead of the one specified in the URL) which is similar to the --resolve option, but doesn't require the user to lookup the IP address ahead of time

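·You can override the DNS server that you use to resolve URLs via --dns-servers 8.8.8.8 for example (this needs a curl built with c-ares; --dns-ipv4-addr only sets the local source address used for DNS queries)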

·You can add --libcurl to any curl command and it will spit out C source-code that implements the same command line in C via the library libcurl

·You can set the environment variable SSLKEYLOGFILE to a file name and curl will save the runtime TLS secrets to that file; you can then use that file in Wireshark along with a dump of the traffic from tcpdump to see the contents of encrypted HTTP streams

·You can choose to only download files that have changed since the last time they were downloaded with curl via --etag-save <etag_file> and --etag-compare <etag_file>

·You can skip adding the extra -H "Content-Type: application/json" when getting or posting JSON data (with -d), by specifying --json instead of just -d

·You can create JSON easily from the command line with the tool jo: https://github.com/jpmens/jo (basically a reverse jq)
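
To picture what the URL globbing example above turns into, here is a rough sketch that expands the same pattern. It is my own illustration, not curl's implementation: it only handles {a,b,c} sets and plain [N-M] numeric ranges (no steps, zero padding or letter ranges), and it makes no claim about the order in which curl fetches the URLs.

```python
import itertools
import re
from typing import List, Tuple

# Matches either a {a,b,c} set or a plain [N-M] numeric range.
TOKEN = re.compile(r"\{([^{}]*)\}|\[(\d+)-(\d+)\]")


def expand_glob(url: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (expanded_url, glob_values) for every combination in the pattern."""
    literals, choices, pos = [], [], 0
    for match in TOKEN.finditer(url):
        literals.append(url[pos:match.start()])
        if match.group(1) is not None:
            choices.append(match.group(1).split(","))
        else:
            low, high = int(match.group(2)), int(match.group(3))
            choices.append([str(n) for n in range(low, high + 1)])
        pos = match.end()
    literals.append(url[pos:])

    results = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.extend((value, literal))
        results.append(("".join(pieces), combo))
    return results


def output_name(template: str, combo: Tuple[str, ...]) -> str:
    """Fill #1, #2, ... in an -o template with the matching glob values."""
    return re.sub(r"#(\d+)", lambda m: combo[int(m.group(1)) - 1], template)


if __name__ == "__main__":
    expanded = expand_glob("https://{ftp,www,test}.example.com/img[1-22].jpg")
    print(len(expanded), "URLs")
    for full_url, combo in expanded[:3]:
        print(full_url, "->", output_name("foo_#2_#1.jpg", combo))
```

So that one-liner means 3 hosts × 22 images, with #1 standing for the host part and #2 for the image number in the output file name.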

Rant about yaml. And something I learned about yaml some months ago and forgot about it: scalars for making multiline work in yaml.

Elixir: a programming language based on Erlang. Really impressive reports! But I would still like to learn golang (if I ever learn python properly 🙂)

git stash: I didn’t know about this git command until last week. Very handy.