TPUv6, Alphafold, OOB design, OpenInterpreter, Walkie-Talkies, Zero Trust SSH, Videos, Finger Strength

Google TPUv6 Analysis: “… cloud infrastructure and which also is being tuned up by Google and Nvidia to run Google’s preferred JAX framework (written in Python) and its XLA cross-platform compiler, which speaks both TPU and GPU fluently.” So I guess this is a cross-compiler for CUDA?

“The A3 Ultra instances will be coming out “later this year,” and they will include Google’s own “Titanium” offload engine paired with Nvidia ConnectX-7 SmartNICs, which will have 3.2 Tb/sec of bandwidth interconnecting GPUs in the cluster using Google’s switching tweaks to RoCE Ethernet.” So again custom ethernet tweaks for RoCE, I hope it makes to the UEC? Not sure I understand having a Titanium offload and a connectx-7, are they not the same?

Alphafold: It is open to be used. Haven’t read properly the license.

OOB Design:

Open Interpreter: The next step in LLMs is to control/interact with your system.

In my laptop fails because I have the free version 🙁 need to try a different one, but looks promising!

open-interpreter main$ interpreter --model gpt-3.5-turbo



Welcome to Open Interpreter.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

▌ OpenAI API key not found

To use gpt-4o (recommended) please provide an OpenAI API key.

To use another language model, run interpreter --local or consult the documentation at docs.openinterpreter.com.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

OpenAI API key: ********************************************************************************************************************************************************************


Tip: To save this key for later, run one of the following and then restart your terminal.
MacOS: echo 'export OPENAI_API_KEY=your_api_key' >> ~/.zshrc
Linux: echo 'export OPENAI_API_KEY=your_api_key' >> ~/.bashrc
Windows: setx OPENAI_API_KEY your_api_key

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

▌ Model set to gpt-3.5-turbo

Open Interpreter will require approval before running code.

Use interpreter -y to bypass this.

Press CTRL-C to exit.

> what is my os?
Traceback (most recent call last):

Walkie-Talkies: Out of James Bond world.

Zero Trust SSH. From Cloudflare. And this video I watched some months ago (and it is already 4y).

Finger Strength: I follow similar protocol, although not everyday, for warm up and I think it works. I am not getting that super results but at least my fingers are stronger…. and I am not getting injuries!!!! \o/

Cisco AI/ML DC Infra Challenges: I am not quiet fan of Cisco products but this is a good overview.

Key points:

  • Create different networks (inter-GPU, front-end, storage, mgmt),
  • Inter-GPU:
    • – non-blocking, rails-optimized (fig.3)
  • Inter-GPU challenges:
  • – Packet loss: Use PFC +ECN (flow aware)
  • – Network delay: “Rich” QoS – proprietary QoS to handle mice flows. Needs good telemetry
  • – Network congestion: Some kind of communication switch-NIC
  • – Non-uniform utilization: Most vendors have something proprietary here, some dynamic LB and static-pinning?
  • – Simultaneous Elephant flows with large bursts: dynamic buffer protection (proprietary)

Videos:

  • Raoul Pal: Crypto Investment. His company. Go long run, invest a bit you can lose
  • Scott Galloway: Interesting his political analysis. Trump won and it seems Latins voted massively for him.
  • Bruce Dickinson: I read Bruce’s books some years ago so I was surprised to see him in a podcast. Need to finish it.
  • Eric Schmidt: I read one of his books some time ago so again, surprised to find him in a podcast. Still think Google has become evil and most of the good things he says are gone.
  • Javier Milei: I am not economist but it “seems” things are improving in Argentina. He is a character nonetheless. Need to finish it.
  • Matthew McConaughey: His book was really refreshing, and seeing him talking is the same. Raw, real.
  • Alex Honnold: You have to try hard if you want to do hard things.

Huberman: Social Connection, Optimal protocols for studying and learning

Social Connection: Long video but worth it. Listen to it in several parts. Send a text to your close friends, connections… daily. Oh, so hard.

Some other notes (copied too)

Stress: boosts immune system short term, crashes brain+body long term.

Sleep. Water. Movement. Sun, Meditation. Rest. Rhythm. Love. Gratitude.

Porn and dopamine: scary

“If I go to my phone, I’m a consumer. If I go to my journal, I’m a creator.”

Optical Protocols for Studying and learning: This is a current video and some months ago I read something similar, but I dont mind to repeat it. It is very good. And a summary

Good sleep, dedicated time, focus, meditations to improve focus, similar time of day, teach it, test it frequently, create engagement. Reduce knowledge decay a.k.a = learning

Cloudflare backbone 2024, Cisco AI, Leetcode, Alibaba HPN, Altman UBI, xAI 100k GPU, Crowdstrike RCA, Github deleted data, DGX SuperPod, how ssh works, Grace Hooper Nvidia

Cloudflare backbone 2024: Everything very high level. 500% backbone capacity increase since 2021. Use of MPLS + SR-TE. Would be interesting to see how the operate/automate those many PoPs.

Cisco AI: “three of the top four hyperscalers deploying our Ethernet AI fabric” I assume it is Google, Microsoft and Meta? AWS is the forth and biggest.

Huawei Cloud Monitor: Haven’t read the paper RD-Probe. I would expect a git repo with the code 🙂 And refers to AWS pdf and video.

Automated Leetcode: One day, I should have time to use it a learn more programming, although AI can solve them quicker than me 🙂

Alibaba Cloud HPN: linkedin, paper, AIDC material

LLM Traffic Pattern: periodically burst flows, few flows (LB harder)

Sensitive to failures: GPU, link, switch, etc

Limitations of Traditional Clos: ECMP (hash polarization) and SPOF in TORs

HPN goals:

-Scalability: up to 100k GPU

-Performance: low latency (minimum amount of hops) and maximum network utilization

-Reliability: Use two TORs with LACP from the host.

Tier1

– Use single-chip switch 51.2Tbps. They are more reliable. Dual TOR

– 1k GPUs in a segment (like nv-link) Rail-optimized network

Tier2: Eliminating load imbalance: Using dual plane. It has oversubscription

Tier3: connects several pod. Can reach 100k GPUs. Independent front-end network

Altman Universal Base Income Study: It doesnt fixt all problems, but in my opinion, it helps, and it is a good direction.

xAI 100k GPU cluster: 100k liquid-cooled H100s on single RDMA fabric. Looks like Supermicro involved for servers and Juniper only front-end network. NVIDIA provides all ethernet switches with Spectrum-4. Very interesting. Confirmation from NVIDIA (Spectrum used = Ethernet). More details with a video.

Crowdstrike RCA:

Github access deleted data: Didn’t know about it. Interesting and scary.

Nvidia DGX SuperPod: reference architecture. video. 1 pod is 16 racks with 4 DGX each (128×8=1024 GPU per pod), 2xIB fabric: compute + storage, fat tree, rail-optimized, liquid cooling. 32k GPU fills a DC.

How SSH works: So powerful, and I am still so clueless about it

Chips and Cheese GH200: Nice analysis for Nvidia Grace CPU (ARM Neoverse) and Hopper H100 GPU

LLM n C, 1.6nm, xz vul, turing 2024, let’s encrypt, chatdev, Ethernet vs IB, Slingshot, Tailscale ssh, videos, 42 rules, CNI, Cilium

Origins of deep learning: interesting post. At the beginning all was Matlab and CPU bounded. repo

LLM in C: post and repo.

A16: 1.6nm process for 2026. More frequency, less power.

xz vulnerability repo: Something I need to check in the VP

Turing Award 2024: zero-knowledge-proof.

Cloudflare and Let’s Encrypt’s certificate change: I haven’t heard of this until recently. I use Let’s Encrypt so as far as I can read, makes sense what they are doing. But didnt know 2% Cloudflare customer were using the “cert”

ChatDev: Communicate agents for software development. I am a not a developer but I would use this just as a starting point If I have any idea for a project. I would remove the C-suite agents, at least for low level projects.

IB vs Ethernet: A bit of bias here (the author is from Broadcom -> Ethernet). I have no hands-on experience with IB, but I have read the cables are not cheap… Let’s see when UltraEthernet gets into the market. Another view.

Slingshot and Juniper: A bit of bias again as HP bought Juniper. So how will these interconnects fade inside the company? As far as I know, most supercomputers use some “special” interconnect so not much ethernet there. But the money nowadays is in AI infra… Paper for slingshot (haven’t read it)

Tailscale SSH, wireguard throughput: These are things I should a spend a bit of time one day and consider if I should use them (I dont like it is not opensource though). This netmaker?

Videos:

Jocko Willink: Discipline = Freedom. Remember but not dwell. Good leader, delegate. Be a man -> take action, bonding (pick your activity)

Jimmy Carr: Imposter syndrome each 18 months, so you have to stand-up. People crave the success not the journey. Teaching comedy good for communicating.

Sam Altman – Stanford 2024: First time I see him talking. It has some funny moments. More powerful computers. I missed a question about opensource LLM and closed ones.

Find a girlfriend: I know just a little bit about the person (I want to read one of his books) from other books and videos. I would think he would have already a girlfriend or family. From the three methods, definitely, the face to face approach in the street looks so much better (and that’s what I would like to do)

Jordan Peterson original 42 rules

CNI performance: I have used kubernetes since I studied for CKAD but still I am interested in the networks side. I didn’t know about Kube-router and it did great! I am bit surprised with Calico as I have read more and more about Cilium.

Cilium for network engineers. I have to read this fully (worried that Cisco bought it…)

rsync go, NASA SP287, git options, Undersea cable failures in Africa, Quotes, Log4j, done list, Dan Lynch, Systems-based Productivity, Run Africa

rsync go: Interesting talk about rsync, as it explains how it works and it is something I didnt know. But then, all other things/projects mentioned are cool and related. I need to try to install rsync go in my vm. ccc slides and repo

NASA to the moon: This is an engaging and provocative video regarding the Artemis III (project back to the moon II). He makes some hard questions to the people in charge (I have no clue about physics) and it seems he has a point. Not sure it this will get any effect but again, looks “smart”. When he mention the NASA SP287 (What made Apollo a success) document as the grial for going back to the moon, I wanted to get a copy (here) so I could read it one day.

Git options: Nice post about popular git config options. I am a very basic git user (and still sometimes I screw up) but the options to improve diff looks interesting so I will give it a go at work.

Undersea cable failures in Africa: It is clear that Africa relays heavily in submarine cables (it doesnt look like there are many cable systems intra continent). And the Red Sea is becoming a hot area due to different conflicts…

Quotes: I like the ones regarding simplicity:

A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system. (John Gall)

In programming, simplicity and clarity are a crucial matter that decides between success and failure. (Edsger Dijktra)

Log4j: This is old news but when it came out I tried to run the PoC but I failed 🙁 This is just a reminder. It was annoying because I manged to install all tools but never managed to exploit it.

Done List: I feel totally identified. The to-do list is never done and you feel guilty. Done-list, much healthier.

Dan Lynch: He passed away, and as usual on my ignorance, it seems he is one of the unsung heroes of Internet, migrating ARPANET to TCP/IP.

Systems-Based Productivity: TEMPO refers to five dimensions of productivity: T (Time Management), E (Energy Management), M (Mindset), P (Proficiency) and O (Organization).

Run Africa: very jealous.

Infraops challenge, Devika, Daytona, NTP 2038, Linux Crisis Tools, videos, Chocolonely, LLM, Transformers, Enforce-first

InfraOps challenge: A bit beyond me, but interesting If you could try without applying for the job.

Devika: Agent AI. Another thing I would like to have time to play with it. If you have API keys for some LLMs, looks like it shouldn’t be difficult to run and you dont need a powerful laptop (?)

Daytona: My development environment is a joke, just python envs. But I guess for more serious devs, could be interesting

NTP and year 2038: Agree, when it is not DNS, it is likely NTP (seen this with VPNs and SSL certs in boxes with NTP unsync), or something blocking UDP.

Linux crisis tools: I haven’t got my hands dirty with BPF but I am surprised with so many tools. I would add nc, netstat, lsof, traceroute, ping, vim, openssl etc but because I do pure networks.

Jim Kwik: How to improve your reading speed. One improvement is you use your finger or a ruler. Need to watch again.

Rich Roll: The guy is super chill. I would like to be able to do some ultra at some point in life… Very personal conversation.

Ferran Adria: I didnt know much about the person apart from being one of the best Chefs in history. I like how he starts the interview and take over for 15 minutes. Haven’t watched till the end. But just the beginning is priceless.

Mark Manson: I have read all his books and his emails. Interesting his story.

Chocolonely: I didnt know it was a dutch company and interesting history behind. I want to try one day, but I haven’t found a dark choco version.

LLM in 1000 lines puce C: I was always crap at C. But interesting this project as something educational and intro in LLM.

Visual intro to transformers: The easy joke, unfortunately, this is not about Optimus Prime.

Indonesia Heavy Metal Girls: Unexpected. Respect.

Enforce-first-as: I dint know about this until last week. Cisco defined by default. Juniper disabled by default. And this makes sense with Route Servers.

Love Languages, imposter syndrome, self-compasion, GTC-2024, Juniper Express 5

Love Languages: I read this book in 2018. The conclusion I took at that time (and a bit late…) it is that you have to F*! communicate…

Interesting story about imposter syndrome:

We’d like to believe that if we only had the adulation, market success, and fan support of superstars like these, then we’d finally be comfortable and able to do our best.

In fact, it seems the opposite is true. Imposter syndrome shows up because we are imposters, imposters acting ‘as if’ in search of making something better.

Perhaps the best plan is to show up and not walk out.

Self-compassion: Something I have learnt the hard way, and I think at the beginning works but long term doesn’t. I practice it often while climbing and honestly, I feel the difference, and sometimes is mindblowing. Nobody is going to cheer me up so I better off doing it myself.

GTC-2024: Like last year, I registered to watch some conferences. As a network engineer, I haven’t been able to see any (good) recording, just pdfs…. so quite disappointing. This is a summary from somebody that was on site and says it was great. And some other notes that they look interesting: keynote (nvlink and infiniband at 800G), nvdia dgx gb200 (indeed we need nuclear energy to feed all this…)

Juniper Express 5: Looks quite an interesting ASIC. But as far as I can see most ASICs for DC and AI/ML come from Broadcom and the main players are Cisco/Arista. I like the feature of deep buffers.. this is still a bit of a religious dilema… deep vs shallow buffers. And looks like it was announced in HotChips 2022.. so it is not very new? And only in PTX platform. What is the future of QFX?

Sales Psychology, BERT testing, EVPN asymmetric/symmetric, git sync fork

Sales Psychology: I have noticed with myself lately, since I subscribed to a youtube channel, everything is a “negativity bias”. I can’t see any video with a positive message. I subscribed because I want to learn and improve but the publicity is wrong.

BERT Testing: I wonder if there is anything opensource.

Git sync fork. This something I have never tried before

1- Add remote

0) check your remote
git remote -v
1) Add new remote
git remote add upstream URL
2) git fetch/pull from the upstream
git pull upstream

EVPN VXLAN Asymmetric/Symmetric routing: blog1

Asymmetric IRB
– Ingress VTEP does both L2 and L3 lookup
– Egress VTEP does L2 lookup only
– Bridge – Route – Bridge
– Pros: “easy” to configure – just copy/paste. Identical config with the only difference in SVI IP addresses.
– Cons: on the way back, traffic will be reversed => all VXLANs need to be configured on all VTEPs => increased ARP cache and CAM table sizes and control plane scaling issue => not very efficient.

Symmetric IRB
– Ingress VTEP does both L2 and L3 lookup
– Egress VTEp does both L3 and L2 lookup
– Bridge – Route – Route – Bridge
– L3 VNI should be configured on all VTEPS, L2 VNIs only where local ports exist

Other things about EVPN: link1 link2

Life, Love, Sex, Negative Beliefs, startup regrets, nanog90, Groq LPU, LLM from scratch, ssh3, eBFP BGP, RPKI, TIANHE-3

I hit rock bottom this week. I hope I finally closed one door in my life so I give myself the chance to open others. Made the wrong decision? It is easy when you look back. Do I regret it? The most annoying thing is these are failures so you can’t go back and recover. But I was so bloody newbie!!!…. At least after 5 years…

“For every reason it’s not possible, there are hundreds of people who have faced the same circumstances and succeeded.” Jack Canfield

Head down, crying, cursing, whatever, but forwards. As it has always been.

—-

Somehow managed to list to long videos, something I normally can’t manage (because lack of time, etc)

Negative Beliefs, avoid bitterness, aim for greatness (remarkable things), scape the darkness: Jordan B Peterson with Modern Wisdom: video, podcast.

Find and keep Love: video. 1st Get your shit together. Communication is critical. Be careful with your shopping list….

Good Sex: video. Communicate….

Orgasm: video. Haven’t seen it completely yet but very interesting. Use your tongue wisely.

— Other things:

Startup decisions and regrets: page. Interesting. I think most of things are very specific but still good to read.

Nanog90: agenda I didnt want the videos but I reviewed several pdfs and these ones look interesting:

Abstract Ponderings: A ten-year retrospective. Rob Shakir – Google: video

https://rob.sh/post/reimagining-network-devices/
https://rob.sh/post/coaching/
https://cdn.rob.sh/files/the-next-spring-forward_2018.pdf
https://research.google/research-areas/networking/

AI Data Center networks – Juniper – video

Using gNOI capabilities to simplify software upgrade use case: video – I had to idea about gNOI so looks interesting. It is crazy that still in XXI, automating a network device is so painful. Thanks to all vendors to make your life miserable.

Go lang for network engineers: video slides– I always thought that Golang had a massive potential for network automation but there was always lack of support and python is the king. So nice to see that Arista has things to offer.

PTP in Meta: video and blog.

There are more things, but havent had the chance to review them.

—-

It looks there is new chatbot that is not using the standard NVIDIA GPU. Groq uses LPU (Language Processing Unit). And they say it is better than a GPU. They have this paper but I can’t really see feature of that LPU.

Slurp’it: Show this blog, and the product looks interesting but although is free, it is not opensource and at the end of they you dont want a new vendor-lockin

Container lab in kubernetes: Clabernetes. I would like to play with this one day.

NetDev0x17: videos and sessions. link This is quite low details and most of the time beyond my knowledge. Again, something to take a look at some point.

LLM from scratch: repo. Looks very interesting. But the book it is going to take a long time to hit the market.

ssh3: repo. Interesting experiment.

eBFP and BGP: blog. Really interesting. Another thing that always wanted to play with.

Orange RPKI: old news but still interesting to see how much damaged can cause RPKI in the wrong hands…

China TIANHE-3 Supercomputer: Very interesting. Link.