Scale Systems 2024 (videos): GenAI Training: Short but interesting video. The main failure sources are GPUs, memory and network cables 🙂 On the network side, I liked this screenshot. Still, they were able to build two 24k-GPU clusters with IB and RoCEv2.
MS: GenAI for beginners.
Federer: "Effortless" is a myth (without hard work there is nothing), it’s only a point (resilience, staying present), life is bigger than the court.
Whisper WebGPU: Real-time in-browser speech recognition
Free Matrix Multiplication: This looks like a big deal.
Starlink TCP: Very quick summary: control protocols that use Selective ACK (SACK) perform better. The ping analysis is quite good. Being able to see that you change satellite every 15 s is cool.
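As a rough illustration of how that 15 s pattern can show up in ping data, here is a minimal sketch. It assumes you already have (timestamp, RTT) samples; the data below is synthetic, not a real Starlink measurement, and the function names and the 5 ms threshold are made up for the example. It groups samples into 15 s slots and flags the slots where the median RTT jumps.

```python
from statistics import median


def slot_medians(samples, slot_seconds=15.0):
    """Group (timestamp_s, rtt_ms) samples into fixed slots.

    Returns a list of (slot_index, median_rtt_ms), sorted by slot index.
    """
    slots = {}
    for t, rtt in samples:
        slots.setdefault(int(t // slot_seconds), []).append(rtt)
    return [(idx, median(rtts)) for idx, rtts in sorted(slots.items())]


def rtt_shifts(medians, threshold_ms=5.0):
    """Return the slot indices whose median RTT differs from the
    previous slot by more than threshold_ms (an arbitrary cutoff)."""
    return [
        cur_idx
        for (_, prev), (cur_idx, cur) in zip(medians, medians[1:])
        if abs(cur - prev) > threshold_ms
    ]


# Synthetic data: one ping per second for 45 s. The base RTT changes
# every 15 s to mimic a satellite change, plus a small repeating jitter.
base_rtts = [30.0, 45.0, 32.0]
synthetic = [(t, base_rtts[(t // 15) % 3] + (t % 3) * 0.5) for t in range(45)]

medians = slot_medians(synthetic)
shifts = rtt_shifts(medians)
print(medians)
print(shifts)
```

With real ping output the slot boundaries will not line up with t = 0, so you would need to try different offsets or look for the step changes directly.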