SemiAnalysis – 100k cluster

This is a site that a friend shared with me some months ago, and it is PURE gold from my point of view. They share a lot of info for free, but not all of it: you have to subscribe/pay for the whole report. I would pay for it if my job were in that “business”.

This is the link for a 100k GPU cluster.

It covers all the details for building such infrastructure, up to the network/hardware side: power distribution, cooling, racking, network design, etc. It is all there.

It is something to read slowly to try to digest all the info.

This report on electrical systems (p1) shows that the power facilities can be as big as the datacenter itself! So it is not rare to read that hyperscalers want nuclear reactors.

MS GB200 rack, Malaysia DC boom, Oracle DCs, FuriosaAI, OCP concrete, IBM Mainframe Telum II, NotebookLM youtube summary, EdgeShark, RSA Quantum, OCP24 Meta

It seems Malaysia is getting a DC boom, but it is based on coal???

This is an MS rack based on the NVIDIA GB200. I am quite impressed with the cooling system being twice as big as the compute rack! And yes, MS is sticking with IB for AI networking.

I didn't know that Oracle OCI was getting that big in the DC/AI business. And they were related to xAI. Their biggest DC is 800 megawatts… and a new one will have three nuclear reactors??

FuriosaAI: A new AI accelerator in the market. Good: cheap, less power. Bad: memory size.

OCP concrete: Interesting how far the OCP consortium can go.

IBM Mainframe Telum II: You might think the mainframe business doesn't exist anymore. Well, it does. Honestly, at some point, I would like to fully understand the differences between a “standard” CPU and a mainframe CPU.

NotebookLM: It seems it is possible to make summaries of YouTube videos! (and for free)

EdgeShark: Wireshark for containers. This has to be good for troubleshooting.

22-bit RSA broken with a quantum computer: I think quantum computing is the underdog in the current “all-is-AI” world. Schneier says we are OK.
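To put the 22-bit number in perspective, here is a small sketch of how little work a modulus of that size takes on a normal computer. It uses plain trial division, nothing quantum and nothing from the paper. The primes `p` and `q`, the exponent `e` and the helper `factor_semiprime` are my own hypothetical picks for illustration; the real modulus from the experiment is not reproduced here.

```python
from math import isqrt


def factor_semiprime(n):
    """Factor n = p * q by plain trial division (only sensible for tiny moduli)."""
    if n % 2 == 0:
        return 2, n // 2
    # Only odd candidates up to sqrt(n) need to be checked.
    for candidate in range(3, isqrt(n) + 1, 2):
        if n % candidate == 0:
            return candidate, n // candidate
    raise ValueError("n has no non-trivial factor")


# Hypothetical toy key, NOT the modulus used in the quantum experiment.
p, q = 1523, 2003
n = p * q   # a 22-bit modulus
e = 65537   # common public exponent, assumed here for the example

# "Attack": recover the factors, then rebuild the private exponent.
found_p, found_q = factor_semiprime(n)
phi = (found_p - 1) * (found_q - 1)
d = pow(e, -1, phi)  # modular inverse, needs Python 3.8+

# Check that the recovered key really decrypts.
message = 42
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)

print(n.bit_length(), (found_p, found_q), recovered)
```

For any 22-bit modulus the loop tries at most about a thousand odd candidates, so a laptop does this instantly. Real RSA keys are 2048 bits or more, which is a completely different league.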

OCP24 Meta AI: It is interesting to compare the Catalina rack with the one from MS above. The MS one has the power rack next to it, but FB doesn't show it; it just mentions that Orv4 supports 140kW and is liquid cooled. I assume that will be next to Catalina, like in the MS design. And AMD GPUs are getting into the mix with NVIDIA. It mentions Disaggregated Scheduled Fabric (DSF), with more details here. And more pictures here from STH.