GPU Fabrics, Optimizations, Network Acceleration, Learning at Cambridge, British Library

Several posts worth reading. There are plenty of things that go beyond my knowledge. I already posted this one; it is a good refresher.

GPU Fabrics: The first part of the article is where I am most lost, as it is about training and the communication between GPUs depending on the approach taken to partition the models. There are several references to improvements, such as the use of FP8 and different topologies. As well, NVLink is a bit clearer to me now (an internal switch for connecting GPUs inside the same server or rack).
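
To make the FP8 point concrete, here is my own back-of-the-envelope Python sketch of why dropping from 16-bit to 8-bit values matters for the fabric; the model size is illustrative, not from the article:

    # Halving the bytes per value halves the gradient traffic that
    # all-reduce has to move across the fabric on every training step.
    params = 70e9                    # hypothetical 70B-parameter model
    bytes_fp16, bytes_fp8 = 2, 1
    print(f"gradients per step: {params * bytes_fp16 / 2**30:.0f} GiB (FP16) "
          f"vs {params * bytes_fp8 / 2**30:.0f} GiB (FP8)")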

When it moved to the inter-server traffic, I started to understand a bit more, for example “rail-optimized” designs (it is like having a “plane” from my old job, where a leaf only connects to one spine instead of all spines; in this case, each GPU connects to just one leaf. If your cluster is bigger, then you need spines). I am not keen on modular chassis from an operations point of view, but it is mentioned as an option. Fat-tree CLOS, Dragonfly: reminds me of InfiniBand. Like all RDMA.
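
And a toy model of the rail idea, just to check I understood it (the sizes are made up):

    # Rail-optimized: GPU k of every server attaches to leaf (rail) k,
    # so same-index GPUs across servers are a single leaf hop apart.
    SERVERS, GPUS_PER_SERVER = 4, 8          # 8 GPUs per server -> 8 rails/leaves

    rails = {leaf: [(srv, leaf) for srv in range(SERVERS)]
             for leaf in range(GPUS_PER_SERVER)}

    for leaf, members in rails.items():
        print(f"leaf {leaf}: " + ", ".join(f"srv{s}/gpu{g}" for s, g in members))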

And fabric congestion is a big topic with many different approaches: adaptive load balancing (IB again), several congestion control protocols, and mentions of the Google (CSIG) and Amazon (SRD) implementations.

In general I liked the article because I don't really feel any bias (she works for Juniper) and it is very open about the solutions from the different players.

LLM Inference – HW/SW Optimizations: The explanation about LLM inferencing is interesting (I doubt I could explain it myself though), as are all the different optimizations. The HW optimization section (different custom HW solutions vs GPUs) was a bit more familiar. My summary: you don't need the same infrastructure (and cost) for inference as for training, and there is an interest for companies to own that part, as it should be better and cheaper than hosting with somebody else.
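
A quick bit of arithmetic on why inference is so much lighter: the main per-request state is the KV cache, and for a 7B-class model it fits on a single GPU next to the weights. The numbers below are my own rough assumptions (roughly Llama-2-7B shaped), not from the article:

    # KV cache bytes = 2 (K and V) x layers x heads x head_dim x seq_len x bytes
    layers, kv_heads, head_dim = 32, 32, 128
    seq_len, batch, bytes_per = 4096, 1, 2           # FP16
    kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per
    print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")   # ~2 GiB vs ~13 GiB of FP16 weights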

Network Acceleration for AI/ML workloads: Nice to have a summary of the different “collectives”. “Collectives” refer to a set of operations involving communication among a group of processing nodes (like GPUs) to perform coordinated tasks. For example, NCCL (Nvidia Collective Communication Library) efficiently implements the collective operations designed for their GPU architecture. When a model is partitioned across a set of GPUs, NCCL manages all communication between them. Network switches can help offload some or all of the collective operations. Nvidia supports this in their InfiniBand and NVLink switches using SHARP (Scalable Hierarchical Aggregation and Reduction Protocol – proprietary). This is called “in-network computing”. For Ethernet, there are no standards yet; the Ultra Ethernet Consortium is working on it, but it will take years until something is seen in production. And Juniper has the programmable Trio architecture (MX routers – paper) that can do this offloading (you need to program it yourself though, in a language similar to C). Still, using switches is not a perfect solution. The usage of collectives in inference is less common than their extensive use during the training phase of deep learning models, primarily because inference tasks can often be executed on a single GPU.
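
The most common collective is all-reduce. This is a minimal sketch of what it looks like from the software side, using PyTorch's wrapper around NCCL (my own example, assuming a box with 2 GPUs; run it with torchrun --nproc_per_node=2 allreduce.py):

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")      # NCCL moves the data between GPUs
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # rank 0 holds [1,1,1,1], rank 1 holds [2,2,2,2]
    t = torch.ones(4, device="cuda") * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)     # every rank ends up with [3,3,3,3]
    print(f"rank {rank}: {t.tolist()}")

    dist.destroy_process_group()

With something like SHARP, the SUM above would happen inside the switch instead of on the GPUs, and the code would not change.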

On different topics:

Learning at Cambridge: Spend fewer hours studying, don't take notes (that's hard for me), and go wild with active learning (work on exercises until you fully understand them).

British Library CyberAttack: blog and public lessons learned. I know this is happening too often to many different institutions, but this one caught my eye 🙁 I think it is a recurrent theme in most government institutions: upgrading is expensive (because it is not done often), budgets are tight and IT experts are scarce.

“Our major software systems cannot be brought back in their pre-attack form, either because they are no longer supported by the vendor or because they will not function on the new secure infrastructure that is currently being rolled out”

“However, the first detected unauthorised access to our network was identified at the Terminal Services server.” Likely a compromised account.

Personally, I wonder what you can get from “stealing” from a library???

Google Networking, AI Cooling, MATx

OpenFlow at Google – 2012: OpenFlow to manage the network and to simulate it. Two backbones: the first for customer traffic and the second for inter-DC traffic.

UKNOF32 – Google Datacenter networking 2015: The evolution up to Jupiter. Moving from chassis-based solutions to pizza boxes: smaller blast radius than a chassis. These switches have small buffers, but Google uses ECN (QoS) to deal with that.

Google DC Network via Optical Circuit 2022: (other video, paper, Google post) Adding optical circuit switches: no more Clos network!!! Full-mesh connection of the aggregation blocks. Spines are expensive and become bottlenecks. Traffic flows are predictable at large scale, so they do not build for the worst-case scenario. Drawback: complex topology and routing control! Shortest-path routing is insufficient. TE: variable hedging allows operating at different points along the continuum, trading off optimality under correct prediction vs robustness under misprediction -> no more spikes. Hitless topology reconfiguration. It seems it has been running already for 5 years… To be honest, it goes a bit… beyond my knowledge.

Google TPUv4 + Optical reconfigurable AI Network 2023: Based on the above, but for AI at scale (although there is already a TPUv5). From this page, the pictures help to get a view of the connectivity. Still complex though.

Open Compute Project 2023: AI Datacenter – Mainly about how to cool down the AI infra given the huge GPU/power requirements.

MATx: A new company designing HW for AI models.

AI will save the world, Nutanix kernel upgrade, GPU Programming

AI will save the world: A positive view of AI development. The attack on China/Karl Marx at the end is interesting. In general I feel confident this will be good.

Nutanix kernel upgrade story: This is a bit hardcore for me (and looks a bit old, from 2021), but it is still quite interesting to see how they did the troubleshooting.

GPU programming: I had never read about how to code for a GPU, and this looks interesting and quite different from what I would do on a CPU. From the “Execution Model of the GPU” section I started to lose track. Still, it is nice to have a summary at the end and resources/books.
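
To get a feel for the execution model, this is the classic first example, a vector add, written in Python via Numba's CUDA target (my own toy, not from the article). The grid/block split is exactly the part that threw me off:

    import numpy as np
    from numba import cuda

    @cuda.jit
    def vec_add(a, b, out):
        i = cuda.grid(1)           # absolute index of this thread in the grid
        if i < out.size:           # the grid may overshoot the array length
            out[i] = a[i] + b[i]

    n = 1_000_000
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(a)

    threads = 256
    blocks = (n + threads - 1) // threads   # ceil-divide so every element is covered
    vec_add[blocks, threads](a, b, out)     # Numba copies the arrays to/from the GPU
    assert np.allclose(out, a + b)

Each of the one million elements gets its own thread; on a CPU I would have written a loop instead.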

LaVague, S3, Stratego

LaVague: There are web services that don't have an API, so could this help me to automate the interaction with them? I need to test it. Another question: I am not sure if LaVague has an API itself!
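
For context, this is what automating an API-less web service looks like by hand with Selenium; as I understand it, LaVague's pitch is generating this kind of code from natural language. A sketch of mine, where the URL and selectors are hypothetical:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/login")       # hypothetical service without an API
    driver.find_element(By.NAME, "username").send_keys("me")
    driver.find_element(By.NAME, "password").send_keys("secret")
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
    print(driver.title)                           # sanity check that we got in
    driver.quit()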

S3: I had this in my to-read list for a long time… and after reading it today I was a bit surprised, because it wasn't as technical as I expected. The takeaways are: durability reviews, lightweight formal verification and ownership.

Stratego: I have never played this game, but I was surprised that it is more “complex” than chess and Go. And by how DeepNash can bluff and do unexpected things.

Banana Bread v2

I already have a banana bread recipe that I like quite a lot, but I decided to try a new one: video, recipe. I adapted it to my kitchen because sometimes I think vegan recipes use ingredients that are too difficult and expensive to find… I didn't make it vegan, I just used normal low-fat cow milk.

Ingredients:

3 ripe bananas, mashed
125ml coconut oil, melted (pretty sure you can use any flavourless oil)
125ml milk
6 tbsp dried coconut
6 tbsp dried cranberries
6 tbsp almond flakes
Half handful of pumpkin and sunflower seeds
240g white flour
1 tsp baking powder
125g coconut sugar (or any brown sugar)
1 tsp ground cinnamon
1 tsp ground ginger
1 pinch of fresh grated nutmeg
1/4 tsp ground cardamom
40g dark chocolate, chopped into chips

Topping
1 banana, sliced lengthways
4 tbsp maple syrup

Process:

  • Preheat your oven to 180C and line a loaf tin with greaseproof paper.
  • Mix your mashed banana with milk, cranberries, seeds, coconut and melted coconut oil.
  • In a separate bowl, mix together the dry ingredients: flour, baking powder, coconut sugar, spices and chocolate chips.
  • Mix the wet and dry ingredients together until well incorporated. Do not over mix.
  • Pour the batter into the tin. Then lay the sliced banana on top and drizzle over a little maple.
  • Bake for approx. 45 minutes.
  • Check it after 35 minutes; if you can see it getting too caramelised on top, cover it with foil and leave to cook for the remaining time.
  • Once golden on top, leave it to cool down.

Not a bad result! I am not sure the coconut oil or the coconut sugar adds any flavour, although it smelled a bit of coconut while baking. Go easy with the cardamom; it can overpower the rest of the flavours!

Love Languages, imposter syndrome, self-compassion, GTC-2024, Juniper Express 5

Love Languages: I read this book in 2018. The conclusion I took at that time (a bit late…) is that you have to F*! communicate…

Interesting story about imposter syndrome:

We’d like to believe that if we only had the adulation, market success, and fan support of superstars like these, then we’d finally be comfortable and able to do our best.

In fact, it seems the opposite is true. Imposter syndrome shows up because we are imposters, imposters acting ‘as if’ in search of making something better.

Perhaps the best plan is to show up and not walk out.

Self-compassion: Something I have learnt the hard way; I think being hard on yourself works at the beginning, but long term it doesn't. I practice self-compassion often while climbing and, honestly, I feel the difference; sometimes it is mind-blowing. Nobody is going to cheer me up, so I am better off doing it myself.

GTC-2024: Like last year, I registered to watch some talks. As a network engineer, I haven't been able to find any (good) recording, just PDFs… so quite disappointing. This is a summary from somebody who was on site and says it was great. And some other notes that look interesting: keynote (NVLink and InfiniBand at 800G), NVIDIA DGX GB200 (indeed, we need nuclear energy to feed all this…).

Juniper Express 5: Looks like quite an interesting ASIC. But as far as I can see, most ASICs for DC and AI/ML come from Broadcom, and the main players are Cisco/Arista. I like the deep buffers feature… this is still a bit of a religious dilemma… deep vs shallow buffers. And it looks like it was announced at HotChips 2022… so it is not very new? And it is only in the PTX platform. What is the future of QFX?

Croissants

I forgot that I didn't have my croissant recipe here.

These are my best croissants ever! It is important to use strong white flour (14% protein).

Ingredients:

  • 500g strong white flour (14% protein)
  • 12g fine salt
  • 55g sugar
  • 40g soft butter
  • 15g instant dried yeast
  • 140ml cold water
  • 140ml milk
  • 280g unsalted cold butter block
  • 1 egg for glazing

Part-1

  • Place flour, salt and sugar in a bowl and combine.
  • Add the yeast to the flour and mix
  • Add the butter and crumb through with your fingers.
  • In another bowl, add water and milk.
  • Make a well in the center of the flour and pour in the liquid.
  • Mix together until all of the dry ingredients are incorporated. Don't overwork it.
  • Push down the dough so it is spread out in the bottom of the bowl.
  • Cover and place in the fridge for 8-12h
  • Using baking paper, make an approx 20x15cm “envelope” with the block of butter inside. With the rolling pin, flatten the butter out so that it fills the envelope.

Part-2

  • Take the dough out of the fridge and place it onto a lightly floured surface.
  • Roll it out to form a square that fits your butter envelope.
  • Then roll each side out so you have 4 thin sides and a slightly raised square center.
  • Place the butter in the center, then start by pulling the top side of the dough across to totally encase the butter. Next pull the bottom flap up to the top edge. Then bring the two side flaps across.
  • Tap the dough gently with the rolling pin to seal the dough flaps.
  • With the seam running top to bottom, roll the dough out into a rectangle, roughly 1/2 rolling pin wide and 1 1/4 rolling pins long.
  • Brush off any flour and fold the dough in thirds. This is a half fold.
  • With a sharp knife, cut the seams so you have only layers of dough.
  • Wrap the dough in film and put in the fridge for 1h to rest.
  • Make two more half folds: take the dough out of the fridge and, with the seam running top to bottom, roll the dough into a rectangle and fold in thirds.
  • After the last fold, put it back in the fridge for 12h.

Part-3

  • Take the dough out of the fridge and place it onto a lightly floured surface, seam running top to bottom. Roll into a rectangle roughly 1/2 rolling pin wide and 1 1/4 rolling pins long. The dough should be approx 4mm thick.
  • With a sharp knife, trim the sides to be sure you have a perfect rectangle.
  • With a sharp knife, make triangles. The base of each triangle should be roughly 4 fingers wide. Cut a small slit about 1/4″ long at the base of each triangle.
  • Gently stretch your triangle out, then roll up your croissant, making sure the tail ends up on the bottom of the rolled-up croissant.
  • Place onto a tray lined with baking paper leaving enough space to prove.
  • Put the tray into the oven, with just the lights on. They should double in size.
  • Once they are ready, whisk the egg and brush it over each croissant.
  • Preheat the oven at 210C.
  • Place the tray in the oven and lightly spritz the oven chamber with a water spray. That makes them super crunchy and gives them a nice colour.
  • Bake until dark golden. You may need to turn the tray. It should take at least 20 minutes.

This is the result.

The Three Body Problem

This book was recommended by an ex-colleague and I really enjoyed it. I didn't have any background on the book, and it was interesting to discover that there is actually a real three-body problem. I struggled a bit with the science at the end of the book, but I was hooked from nearly the beginning.

In general, I am surprised that this book wasn't censored in China, but still I am quite happy to read something coming from China, taking into account that I mainly read books from Western countries (mostly USA and UK/EU), which are a small portion of the world.

Garlic Chili Noodles

I tried this recipe this weekend.

Ingredients:
150g extra firm tofu (more is ok)
5-6 cloves of garlic
small piece of ginger
3 stalks of green onion
olive oil for frying
3 tsp dark soy sauce
150g noodles (of your choice)
1 tbsp paprika (didn't have gochugaru)
1 tbsp plant-based oyster sauce (didn't use)

Process:

  1. Mash the tofu into a crumble with a fork
  2. Finely chop the garlic, ginger, and green onions
  3. Heat up a nonstick pan to medium heat with olive oil. Add the crumbled tofu. Sauté for 3-4min
  4. Add 2 tsp dark soy sauce and sauté for another couple of minutes. Set the tofu aside
  5. Bring a pot of water to boil for the noodles
  6. Place the nonstick pan back on medium low heat. Add 2 tbsp olive oil and the garlic and ginger. Cook for about 2-3min
  7. When the water comes to a boil, cook the noodles for 1 min less than the package instructions. Stir occasionally to keep them from sticking
  8. Add the paprika to the garlic and ginger. Cook for about 1min. Then, add the oyster sauce.
  9. Strain out the noodles and add to the pan. Add 1 tsp dark soy sauce and the crumbled tofu. Turn the heat up to medium and sauté for 2-3min. Add the green onions and sauté for another minute

This is my result:

As usual, it doesn't look like the video, but it is tasty!