Spectrum-X with Cisco Silicon: I don’t quite understand this move. You sell your Ethernet solution as the best for AI and then you bring in a different one?
Quantum Computing: Several news items lately, from MS Majorana (official) and AWS Ocelot. Still, is it being used on real problems? Or is it just PR?
Build your own DC: good intro. I don’t think you can find many books about this on Amazon?
AWS HPC: I didn’t know AWS offered HPC services (article from 2021). I liked finding more details about SRD: multipath load balancing, out-of-order delivery, congestion control similar to BBR. I wonder, isn’t this the same thing the Ultra Ethernet Consortium is trying to achieve?
Multipath-tcp: The above probably works in “closed” networks (managed by one entity) but maybe it is not going to work on the wild internet. Still, this looks quite far from production. I believe this is like QUIC: somebody like Google deploys it and the rest jump on the bandwagon (more or less).
OCSP death: “OCSP is not making anyone more secure. Browsers are either not checking it or are implementing it in a way that provides no security benefits.”
AlphaChip: As far as I have read, chip design is one of the most complex things there is, and getting help from AI can speed up advances in it. I read that NVIDIA had something similar. This should be applied to ASICs too, so networking benefits.
Crawl4AI: Interesting for ingesting your local knowledge base sites and using them with your local LLM….
Run your AI locally: I tried this on my work MacBook and it worked! I want to create an AI agent for a work project (actually I am dreaming of being able to achieve it….)
Open Web UI + Ollama: I tested this too on my MacBook and it works like magic! You can even use DeepSeek 🙂
Training your AI: My idea is to get an open-source LLM trained with my data so I can use it to do my “job”. But the video had too much advertising and I don’t have access to a GPU… but I don’t have much data either (or that’s what I think).
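Once Ollama is running locally, the same model that Open WebUI talks to can also be reached from a script. This is only a minimal sketch, assuming Ollama's default local address (http://localhost:11434) and its /api/generate endpoint; the helper names build_request and ask are mine, not something from the linked posts.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs Ollama running)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, something like ask("deepseek-r1", "Explain ECMP in one sentence.") should return the answer as a string. The model tag is an assumption; use whatever name `ollama list` shows on your machine.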
Bob Bowman (Michael Phelps coach): Show up, do the job.
AWS re:Invent 2024 – NET201: The only interesting thing is at minute 29, the use of hollow core fiber to improve latency. I assume it is used in very specific parts of the network, as it looks a bit fragile. Elastic Fabric Adapter: there is no really good explanation of what it is or where it runs: network, server, NIC? But it seems important. Looks like SIDR?
AWS re:Invent 2024 – NET403: I think 401 and 402 were more interesting. It repeated things from the two other talks. Still worth watching, and hopefully there is a new one in 2025.
GenCast: weather prediction by Google DeepMind. Not sure to what extent this can be used by anybody? And how much hardware do you need to run it?
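To get a feel for why hollow core fiber helps latency: light travels slower in glass than in air, so propagation delay scales with the refractive index of the core. A rough back-of-the-envelope sketch follows; the two index values are my own assumptions for typical silica fiber and for an air-filled core, not numbers from the talk.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def one_way_latency_us(distance_km: float, refractive_index: float) -> float:
    """One-way propagation delay in microseconds for a given index."""
    return distance_km * refractive_index / C_KM_PER_S * 1e6


SOLID_CORE_INDEX = 1.468    # assumed typical value for silica single-mode fiber
HOLLOW_CORE_INDEX = 1.0003  # assumed: light travels mostly through air

solid = one_way_latency_us(100, SOLID_CORE_INDEX)
hollow = one_way_latency_us(100, HOLLOW_CORE_INDEX)
print(f"100 km solid core:  {solid:.0f} us")
print(f"100 km hollow core: {hollow:.0f} us")
print(f"saving: {100 * (1 - hollow / solid):.0f}%")
```

This only covers propagation delay. Real links add transceiver, switching and queuing delay on top, so the end-to-end gain would be smaller.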
we’ve made GenCast an open model and released its code and weights, as we did for our deterministic medium-range global weather forecasting model.
Videos:
510km nonstop – Ross Edgley: I have read several of his books and it is the first time I have watched a full interview. Still, I am not clear what his dark side is.
Tesla TCP replacement: Instead of buying and spending a lot of money, build what you need. I assume there are very smart people around and real network engineering taking place. It is like a rewrite of TCP that doesn’t break it, so your switches can still work with it. It seems the videos are not available on the Hot Chips webpage yet. And this link looks even better; it even mentions Arista as the switching vendor. (video from hotchips24)
Cerebras Inference: From Hot Chips 2024. I am still blown away by the wafer-scale solution. Obviously, the presentation says its product is the best, but I wonder: can you install a “standard” Linux and run your LLM/inference that easily?
Leopold AGI race: Via LinkedIn, then the source. I read chapter 3, about the race to the trillion-dollar cluster. It all looks sci-fi, but I think it may not be that far from reality.
Cursor + Sonnet: Replacement for Copilot? (original) I haven’t used Copilot, but at some point I would like to jump on the bandwagon, try things and decide for myself.
AI AWS Engineering Infra: low-latency and large-scale networking (\o/), energy efficiency, security, AI chips.
NVLink HGX B200: To be honest, I always forget the concept of NVLink and I tell myself it is an “in-server” switch to connect all GPUs in a rack. Still, this can help:
At a high level, the consortium’s goal (UltraEthernet/UA) is to develop an open standard alternative to Nvidia’s NVLink that can be used for intra-server or inter-server high-speed connectivity between GPU/Accelerators to build scale-up AI/HPC systems. The plan is to use AMD’s interconnect (Infinity Fabric) as the baseline for this standard.
Netflix encoding challenges: From encoding per connection quality, to per-title, to per-shot. Still, there are challenges for live streaming. Amazon already does live streaming for sports; have they “solved” the problem? I don’t use Netflix or similar, but still, the challenges and the engineering behind them are quite interesting.
Some career advice from AWS: I “get” the point, but you still want to be up to speed (to a certain level) with new technologies; you don’t want to become a dinosaur (ATM, Frame Relay, Pascal, etc.).
Again, it’s not about how much you technically know but how you put into use what you know to generate amazing results for a value chain.
Get the data – be a data-driven nerd if you will – define a problem statement, demonstrate how your solution translates to real value, and fix it.
“Not taking things personally is a superpower.” –James Clear
Because “no” is normal.
Before John Paul DeJoria built his billion-dollar empire with Patrón and hair products, he hustled door-to-door selling encyclopedias. His wisdom shared at Stanford Business School on embracing rejection is pure gold (start clip at 5:06).
You see, life is a numbers game. Today’s winners often got rejected the most (but persevered). They kept taking smart shots on goal and, eventually, broke through.
Cloudflare backbone 2024: Everything is very high level. 500% backbone capacity increase since 2021. Use of MPLS + SR-TE. It would be interesting to see how they operate/automate that many PoPs.
Cisco AI: “three of the top four hyperscalers deploying our Ethernet AI fabric”. I assume it is Google, Microsoft and Meta? AWS is the fourth and biggest.
xAI 100k GPU cluster: 100k liquid-cooled H100s on a single RDMA fabric. Looks like Supermicro is involved for the servers and Juniper only for the front-end network. NVIDIA provides all the Ethernet switches with Spectrum-4. Very interesting. Confirmation from NVIDIA (Spectrum used = Ethernet). More details with a video.
LaVague: There are web services that don’t have an API, so this could help me automate the interaction with them? I need to test. Another question: I am not sure if LaVague has an API itself!
S3: I had this on my to-read list for a long time… and after reading it today I was a bit surprised because it wasn’t as technical as I expected. The takeaways are: durability reviews, lightweight formal verification and ownership.
Stratego: I have never played this game but I was surprised that it is more “complex” than chess and Go. And by how DeepNash can bluff and do unexpected things.
AWS re:Invent Intent-Driven Network Infra: Interesting video about intent-driven networking in AWS. This is the paper he shows in the presentation. Same note as last year: leaf-spine, pizza boxes, all home-made. The development of SIDR as the control plane for scale. And some talk about UltraCluster for AI (20k+ GPUs). Maybe that is related to this NVIDIA-AWS collaboration. Interesting that there is no mention of QoS; he said there is no oversubscription. In general, everything is high level and done in-house, and very likely they are facing problems that very few companies in the world are facing. Still, it would be nice if they opened up all that tech (like Google has done, though never for network infra). Also, I think he hits the nail on the head in how he redefines himself from Network Engineer to Technologist since, at the end of the day, you touch all topics.
Google view after 18 years: Very nice read about the culture shift in the company, from “don’t be evil” to making lots of money at any cost.
GPT-Crawler: The downside is that you need the paid version of ChatGPT. I wonder, if I crawled Cisco, Juniper and Arista, would that be nearly all the network knowledge on the planet? If that crawler can get ALL that data.
Vendor Support API: Interesting how Telstra uses the Juniper TAC API to handle power supply replacements. I was surprised that they are able to get the RMA and just try to replace the part. If they don’t need it, they send it back… That saves Telstra time for sure. The problem I can see here is when you need to open tickets for inbound/outbound deliveries in the datacenters, which don’t have any API at all. If datacenters and big courier companies had APIs as first-class citizens, incredible things could happen. Still, just being able to have zero-touch replacement for power supplies is a start.
No Packet Behind – AWS: I think until the first 30 minutes have passed, there is nothing new that hasn’t been published in other NOG meetings between 2022 and 2023. At least they mention the name of the latest fabric, Final Cat. They also mention issues with IPv6 deployment.
There are other interesting talks but without video, so the PDF alone doesn’t really give me much (like the AWS live premium talk).
From an email list, I read something about the Gmail migration to Spanner. I was a bit surprised because I use Gmail and didn’t know anything about it. That email sent me to this page. That migration had to be a monster one! More details here. From the first page, I got a bit more info about Falcon. In summary, it is part of a bigger picture about building the “AI-driven” future infrastructure.
I watched some very interesting videos about AWS networking. They are high level, so they don’t tell you the magic sauce you would like to know, but it is nice that this info is out in public.
DKNOG – How AWS is evolving its peering-edge in 2023 and onwards link + event:
— Evolution from buying chassis to building your own devices: consume -> create (NOC-less, auto-remediation, active telemetry, etc) -> innovate (freedom to examine trade-offs, 1U devices). Clear use of “Clos” networks and their Linux-based software.
— Delighted: low complexity + high innovation
— Simplicity Scales
— The view of a router/brick as a set of 1U devices is interesting (rack 102.8T – 200x400G ports for customers, non-blocking). And it is very good that they have pictures of the concept of “bricks” and “spines”.
— Challenges with cabling (SN connector — no patching rack needed) and 400G ZR+ (heating!)
AWS re:Invent 2022 – Dive deep on AWS networking infrastructure (NET402)– link
— summary: This is “similar” to the DKNOG talk but longer and with some other details like:
— “We don’t like chassis”. 1+ million devices
— SRD at NIC level, so one TCP flow is actually load-balanced across several paths
— Hybrid SDN approach: You have controllers to give you a big-picture view (I guess they provide the visibility to say “just send 70% traffic to this device”, but not sure how) plus the device’s own capability to deal with changes.
— Telemetry, continuous monitoring, triangulation: Being able to detect which port/device is causing the problem.
AWS re:Invent 2022 – Leaping ahead: The power of cloud network innovation (NET211-L) – link:
— AWS Global Infrastructure: Backbone capacity
— Customer SW/HW
— Everything fails all the time
— GPS locations in fibers! + inject light in fiber to double check fault -> intelligent optical routing/failover -> better than BGP….
— Termite sheet fibers for Australia 🙂
— Nitro card = NIC (offload card)
— SRD: no need for in-order packet delivery as required by TCP. 25Gbps flows allowed now.
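To make the "no in-order delivery needed" point a bit more concrete, here is a toy simulation of the general idea: spray the packets of one flow over several paths with different delays, let them arrive out of order, and restore the order from sequence numbers at the receiving end. This is my own illustration and not AWS's implementation; the path delays, function names and the reorder buffer are all assumptions, and the talk does not say where exactly the real protocol restores ordering.

```python
import random


def spray(packets, num_paths, rng):
    """Send each packet on a random path; each path has its own fixed delay."""
    path_delay = [rng.uniform(1.0, 5.0) for _ in range(num_paths)]
    arrivals = []
    for seq, payload in enumerate(packets):
        path = rng.randrange(num_paths)
        send_time = seq * 0.1
        arrivals.append((send_time + path_delay[path], seq, payload))
    arrivals.sort()  # the order in which the receiver sees the packets
    return [(seq, payload) for _, seq, payload in arrivals]


def reassemble(arrivals):
    """Deliver payloads in sequence order, buffering anything that is early."""
    buffer, next_seq, delivered = {}, 0, []
    for seq, payload in arrivals:
        buffer[seq] = payload
        while next_seq in buffer:
            delivered.append(buffer.pop(next_seq))
            next_seq += 1
    return delivered


rng = random.Random(7)
message = [f"chunk-{i}" for i in range(20)]
on_the_wire = spray(message, num_paths=4, rng=rng)
print([seq for seq, _ in on_the_wire])    # arrival order of sequence numbers
print(reassemble(on_the_wire) == message)
```

The point of the sketch is that a single flow is no longer pinned to one path (as it would be with classic per-flow ECMP hashing), so one congested link does not cap the whole flow.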