flipaoXIX – T.I.L

Very Bad People

Very good book about the origin of Global Witness.

It is really humbling how three people started all this on their own. We are capable of horrible things, but of amazing things as well. So maybe your effort is just a drop of water, but it can add up.

I had no idea about the deforestation mafia of the Khmer Rouge in Cambodia and how it financed the war. And worse, how both sides were profiting from it. As always, the population is the one that pays the consequences.

Then follow the diamonds from Angola, the oil from Guinea, the wars of Sierra Leone and Liberia, etc.

I think of several countries as corrupt but, as the book shows, it is the first world, with its financial engineering and PR/law firms, that allows all the money to be moved away from those countries.

If we don't allow those people to buy property, Ferraris, etc., and we make all the companies accountable, it would be less convenient for them.

At the end of the day, while tax havens exist, this will continue, but at least there are people like GW who will pull back the carpet and show the ugliness…

The Courage To Be Disliked

I have been quite surprised by this book. It is based on the psychology work of Alfred Adler, and I take it more as philosophy than anything else.

It is based on teleology: the study of the purpose of a given phenomenon, instead of its cause (that is aetiology). We determine our own lives according to the meaning we give to our past experiences. So it negates the influence of the past and of traumas. The important thing is not what one is born with, but what use one makes of that equipment. We need the courage to be happy, because that requires change (of lifestyle), and change is scary. It also makes you focus on the present.

All problems are interpersonal relationship problems. Personally, I feel my goal is not to be hurt in relationships with other people. But it is impossible not to get hurt (or hurt somebody).

Feelings of inferiority are subjective assumptions (and excuses), and we can change them. Boasting is an inverted feeling of inferiority. If one really has confidence, one doesn't need to boast.

Life is not a competition (winner vs loser), and this applies to relationships too. This makes you see people as comrades. Only compare yourself with your ideal self. You are the only one worrying about your appearance. When there is competition, there is a power struggle. Step out of the conflict as soon as possible and don't answer the action with a reaction (this is not admitting defeat, though), because it escalates into revenge.

Our objectives are: to be self-reliant (I have the ability) and to live in harmony with society (people are my comrades). Our life tasks are: tasks of work, friendship and love.

Life lie = I am making up flaws in other people just so that I can avoid my life tasks, and more, I can avoid interpersonal relationships -> courage.

As we have our tasks, others have their tasks. When we focus on our own tasks, looking for recognition is debilitating: it creates a dependency, a vertical relationship (you want horizontal relationships). Do not live to satisfy the expectations of others. This means freedom. Freedom to be disliked by other people. In the same way, you don't have to praise or rebuke. Saying “Thank you” is good enough.

This leads to the goal of interpersonal relationships, that is the feeling of community. This is acquired by your own efforts, active commitment.

Friction, Morning Routine, RoCE Meta Paper, AWS RNG, MRC, Slurm, Rail-Optimize, 800VDC, Phyllo, Approach

Manson AI: You need friction

AI Agent: narrow focus: goal, proof, steps

Bear Grylls’ Morning Routine: Cold (never get used to it), barefoot, strength training, 30 minutes

RoCE networks for distributed AI training at scale: I have managed to read the paper! Although in the AI world two years is an eternity, I think it is still interesting.

1) Network Topology: backend network only for GPUs (RDMA nics), non-blocking. Frontend network: data ingestion, checkpointing, logging.

Pod = AI zone
 leaf = RTSW, DAC cables, shallow buffer
 spine = CTSW, deep buffers. fiber between leaf-spine.
SuperSpine = ATSW, oversubscribed, connect AI zones

intra-node -> nvlink
ROCE: cpu offloading, ethernet (standard)

collective communication library serves as the sw abstraction between training workloads and the NIC
                                 schedules verbs calls over QP (Queue Pairs)
 parallelism strategy determines collective: allreduce, allgather, alltoall
 choice logical topology:

------------------

2) Routing: work load. low entropy flows (few flows) -> ECMP bad (5-tuple udp: src/dst ip, src/dst port, protocol), burstiness, elephant flows
--

 RTSW uplinks 1:2 under-subscribed! -> expensive (short-term)
 1) QP scaling: use destination QP of Roce packet using the UDF capability in switch to increase entropy -> Enhanced ECMP -> short-term
 2) Central TE controller -> long-term: CP real-time topology end-to-end cluster, 
                                        flow matrix (flow bps) + CSPF (constrained SPF)
                                        write in switches dataplane
                                     DP: TE overrides default BGP routing policy in leaf. Use Exact Match table.
                           Not good with multiple link failures. Doesn't scale
 3) Flowlet switching: try to improve 1 and 2. hw-assisted scheme. put packets in different ports in ECMP
     out-of-order: move packets only after 1/2 RTT
     load-aware path assignment: better than TE
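
A minimal Python sketch of why low-entropy RoCE traffic hurts ECMP and why adding the destination QP to the hash helps. Everything here is an assumption for illustration: CRC32 stands in for the switch ASIC hash, the IP addresses, the fixed UDP source port and the 16 QPs are made up, and 8 uplinks is an arbitrary choice.

```python
import zlib

N_UPLINKS = 8  # hypothetical number of leaf uplinks


def ecmp_pick(fields, n_uplinks=N_UPLINKS):
    """Hash header fields to an uplink index (CRC32 as a stand-in for the ASIC hash)."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % n_uplinks


def uplinks_used(packets, use_dst_qp):
    """Return the set of uplinks chosen for a list of packets."""
    used = set()
    for src_ip, dst_ip, src_port, dst_port, proto, dst_qp in packets:
        fields = [src_ip, dst_ip, src_port, dst_port, proto]
        if use_dst_qp:
            fields.append(dst_qp)  # extra entropy, in the spirit of the UDF-based hashing
        used.add(ecmp_pick(fields))
    return used


# One GPU pair, 16 QPs, identical 5-tuple (RoCEv2 uses UDP destination port 4791).
packets = [("10.0.0.1", "10.0.1.1", 49152, 4791, "udp", qp) for qp in range(16)]

plain = uplinks_used(packets, use_dst_qp=False)
enhanced = uplinks_used(packets, use_dst_qp=True)
print("5-tuple only:", len(plain), "uplink(s)")
print("5-tuple + dst QP:", len(enhanced), "uplink(s)")
```

With the 5-tuple alone every packet lands on the same uplink by construction. With the QP in the hash, the 16 QPs can spread over several uplinks; how evenly depends on the hash.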

------------------
                     
3) Transport: congestion management. Start with DCQCN. packet drops on ACK/NACK can cause prolonged Local ACK timeout (LAT)
--
 Tuning DCQCN was not great (strict ECN -> minimizes PFC, which can lead to head-of-line blocking)

 200G, we stayed with relaxed ECN marking, allowing for buffer build up in the CTSW, while keeping default DCQCN settings.
 400G We proceeded without DCQCN. just PFC for flow control
 re-design collective library: two-stage copy
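
To picture what strict versus relaxed ECN marking means, here is a small Python sketch of the usual WRED-style marking curve (Kmin, Kmax, Pmax) that switches apply for DCQCN. The threshold values are hypothetical, not the ones from the paper.

```python
def ecn_mark_probability(queue_kb, kmin_kb, kmax_kb, pmax):
    """Probability of ECN-marking a packet for a given queue depth.

    Below kmin nothing is marked, above kmax everything is marked,
    and in between the probability ramps linearly up to pmax.
    """
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb >= kmax_kb:
        return 1.0
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


# Hypothetical profiles: strict marks early, relaxed lets the buffer build up first.
STRICT = dict(kmin_kb=100, kmax_kb=400, pmax=0.2)
RELAXED = dict(kmin_kb=1000, kmax_kb=4000, pmax=0.2)

for depth_kb in (50, 300, 2000, 5000):
    strict_p = ecn_mark_probability(depth_kb, **STRICT)
    relaxed_p = ecn_mark_probability(depth_kb, **RELAXED)
    print(f"queue {depth_kb} KB: strict {strict_p:.3f}, relaxed {relaxed_p:.3f}")
```

The relaxed profile stays silent at queue depths where the strict one is already marking, which is the buffer build-up that the deep-buffer CTSW absorbs.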

------------------

4) Operations:
 Change QoS priority of Clear to Send (CTS) messages. In RTSW ASIC, modify DSCP marking for ACK messages
 Tuning VOQ in CTSW
 observability: OOS: out of seq.
                 Link flaps
                 Local ACK timeouts (LAT)
                 PFC watchdog: catch any long-duration PFC pause (>200ms)
                 buffer utilization RTSW
                 reachability (pings)
                 constant latency monitoring loaded and unloaded (catch regressions)
                 base lines!!!

Perplexity: Hosting Qwen on Blackwell:

AWS RNG – Random Graph Network: The paper is totally out of my space, but the concept looks brutal. With an operations hat on, how do you troubleshoot it? (ping, traceroute, link congestion, data flow patterns, etc.)

MRC1 and MRC2 (OCI): Why we need planes (breakouts) and not just a big plane.

As SerDes speeds continue increasing, every microsecond of congestion creates much larger pressure inside the fabric. A 100G transport domain may be manageable. A 400G domain amplifies the same congestion into roughly 4x pressure. An 800G domain, and eventually a 1.6T domain, becomes much harder to coordinate.

This pressure appears as larger switch buffer requirements, larger congestion domains, harder retransmission coordination, larger cache pressure, larger synchronization storms, and harder thermal and power scaling inside ASICs.

At hyperscale, switch ASIC cache and transport coordination become fundamental scaling bottlenecks. Increasing switch buffer size is extremely difficult: high-speed SRAM is expensive, larger cache arrays consume significant power, thermal density rises quickly, die area scaling becomes inefficient, and routing complexity increases dramatically.

Splitting transport into many smaller lanes naturally reduces these pressures. Reliability improvements then emerge as a byproduct, because congestion, retransmission, and buffering become more distributed.
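
The 4x figure is just arithmetic: the bytes that pile up during a congestion event grow linearly with link speed. A quick Python check, where the 1 microsecond window is only an example value:

```python
def bytes_arriving(link_gbps, congestion_us):
    """Bytes that arrive on one link during a congestion window.

    1 Gbps is 1000 bits per microsecond, so this stays in exact integer math.
    """
    return link_gbps * 1000 * congestion_us // 8


baseline = bytes_arriving(100, 1)
for speed in (100, 400, 800, 1600):
    arrived = bytes_arriving(speed, 1)
    print(f"{speed}G: {arrived} bytes per microsecond, {arrived // baseline}x the 100G case")
```

Splitting one fast domain into several slower planes divides that per-plane number by the plane count, which is the buffer pressure argument in a nutshell.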

THE QUESTION: which breakout keeps the fabric at the shallowest practical Clos depth while keeping plane count and operations manageable? -> fewer hops, fewer switches, less latency

Slurm: I like the “Slurm vs. Kubernetes”

Slurm Workload Manager (short for Simple Linux Utility for Resource Management) has become a cornerstone of large-scale computing. Originally created in the early 2000s to support large-scale high-performance computing (HPC) environments, Slurm is now widely recognized as the de facto scheduler for HPC clusters. Today, it orchestrates jobs across thousands of servers and GPUs in some of the world’s most advanced computing environments.

Interview Question: 512 GPUs, non-blocking (full bisection) and 2xUFM! I really liked this. I think for once I understand rail-optimized (fat-tree = leaf-spine). Just break one leaf-spine link, beautiful!!!
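
A rough Python sketch of the sizing behind that question, assuming 64-port switches and fully populated leaves. The switch radix is my assumption; the question only gives 512 GPUs and non-blocking. The function names are made up for this note.

```python
import math


def two_tier_nonblocking(endpoints, radix):
    """Size a two-tier leaf-spine fabric with as many uplinks as downlinks per leaf."""
    down = radix // 2                          # half the ports face the GPUs
    leaves = math.ceil(endpoints / down)
    spines = math.ceil(leaves * down / radix)  # every uplink needs a spine port
    links_per_leaf_spine_pair = down // spines
    return leaves, spines, links_per_leaf_spine_pair


def leaf_oversubscription(radix, failed_uplinks=0):
    """Downlink to uplink ratio on one leaf; 1.0 means non-blocking."""
    down = radix // 2
    return down / (down - failed_uplinks)


leaves, spines, links = two_tier_nonblocking(512, 64)
print(leaves, "leaves,", spines, "spines,", links, "links per leaf-spine pair")
print("healthy leaf:", leaf_oversubscription(64))
print("one leaf-spine link down:", round(leaf_oversubscription(64, 1), 3))
```

Under these assumptions the fabric is 16 leaves and 8 spines, and a single broken leaf-spine link is enough to push that leaf above 1:1, so it is no longer non-blocking.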

800VDC: Next step in electrical infra in DC space.

Phyllo by hand

Approaching women: curiosity and no performance. Practice. Be at peace with discomfort and awkwardness. Rejection as learning.

Genghis Khan

Very interesting book. In the Western world we know a lot about the Roman Empire, Alexander the Great, etc., but we don't look to Asia very often. Yet Genghis Khan and the Mongol empires that followed shaped much of world society at the time and until now.

He was focused on meritocracy. Part of his war strategy was the elimination of the aristocracy of the conquered land. There was a very strong focus on integration. They never imposed their culture, and there was full freedom of religious belief. They were brutal in war but never cruel. Torture was common in Europe and other empires; for them, it was against their beliefs.

They had a very clear war strategy: travel light, strike fast. They carried little luggage and kept a basic diet. They cleared the path for their horses for advancing and returning, so they destroyed the agriculture along their conquering paths.

It is interesting how Genghis Khan's empire crumbled after his death because he didn't manage his family properly. But still, the new kingdoms kept a balance for a long time.

And something that reminded me of Rome: they had to keep expanding the empire just to keep the capital happy… They introduced paper money, and women ruled while the men were fighting… and their campaigns lasted years!

Trade was critical for the Mongols. They reached Hungary and the Balkans. They traded slaves with Venice and Genoa.

The climate was critical for their success: when the weather became warmer, their pastures in Mongolia were less productive and they had fewer horses, so the base of their strength was undermined.

They were masters of propaganda, spreading fear so that their conquering was easier. The empire was based on a good army, good propaganda and good administration (just think of the sheer size of the empire). They founded public education.

The Mongols unified China. I didn't know that: they founded Beijing and started the Forbidden City. They created the Chinese identity, but they followed Mongol customs behind the curtains.

And the end of it was the Plague. The Plague stopped commerce and the movement of people. Without the fluid transit of people and goods, they couldn't keep it together.

And it is really shocking how bad a reputation has been written about the Mongols, given their incredible empire and success.

Portokalopita

This is a cake I wanted to try after I visited Greece with my friends. I never got the name of the cake, but after that holiday I did some research and I think I found the name: portokalopita. I got a recipe, but then I did nothing.

So finally, I tried. The taste is similar but the execution is not great. Mine is too runny.

Ingredients

Syrup

1 1/2 cup orange juice (just squeeze it from oranges… I didn't do it)
1/2 cup water
1 1/2 cup sugar
1 cinnamon stick

Cake

180g phyllo sheets (I think I need double)
4 eggs
1/2 cup sugar
1 cup olive oil
2 tsp vanilla extract
200g yogurt
2 tsp baking powder
orange zest from 2 medium oranges.

Instructions

Preheat oven at 120C.
Place the phyllo sheets on a tray. Cut them into slices so they fit easily. Put them in the oven until hard and crunchy. Turn them over when needed. Remove from the oven and let them cool down.
Make the syrup. In a small sauce pan mix the orange juice, water, sugar and cinnamon. Bring to boil, then reduce heat and simmer for 7 minutes. Set aside to cool
Set oven at 180C
In a large bowl add the eggs, sugar, oil and vanilla. Beat until frothy.
In a smaller bowl, mix the yogurt and baking powder and set aside for 2-3 minutes. Add the yogurt to the egg mix.
Add the orange zest and crumbled phyllo into the egg mix gradually.
Grease an oven dish, and pour the cake mix. Even it out
Bake for 35-40 minutes or until the top is dark golden.
Remove from oven and make a few slashes with a knife and immediately drizzle the syrup slowly.
Let the cake sit for 2-3 hours.
Keep in the fridge for 1h before serving.

The result:

I think there was too much syrup and my cake mix needed more phyllo.

I will try this next time

Cowboy commandments

This is a tiny book I found in a toilet during holidays a couple of years ago. I bought it a while ago and can’t find it anymore; this is one from the same author.

Fall in love (only when you can’t help it)

Don’t forget that there are always consequences

When you get bucked off, get back on

Skin your own deer

If it breaks, fix it.

Never cut, what you can untie

If you make a mess, clean it up

Talk less and say more

Never betray a Trust

Make apologies, not excuses

Don’t get even, get over it

Don’t waste good money on cheap boots

Help what you can; endure what you can’t

Do it Today, tomorrow is promised to no one.

Never miss a chance to Dance

Act right, behave yourself, do your job, and things will turn out all right

Take care of your knees; you are going to need them all your life

Do your best, that’s all.

AWS SRD, MRC, OSPF, Manson, JEPA, Ultra Ethernet, NCCL, Calypso, GTC2026, ChiNOG12, TRAP, SWAP

AWS owns its networking stacks:

NVIDIA releases MRC: Multipath Reliable Connection – I assume they need to do something to compete with UltraEthernet

OSPF shutdown router: I would have to test this. In my opinion, the key thing is that, although the router's type 1 LSA is still in the neighbors' LSDB, SPF ignores it. Still quite interesting, you always learn/re-learn something.

Mark Manson – 10y therapy: I like the very beginning, and then when Chris says you have to go through shit to understand those rules and appreciate them.

JEPA (Joint Embedding Predictive Architecture): LeCun against LLM. Proof.

Microsoft Ultra Ethernet: The first is a bit more interesting as you can see the flows

NCCL cheat codes

Tech Field Day 2025 – Arista AI

GTC2026 CoreRabbit.

GTC2026 Summary

ChiNOG12 – Petr Lapukhov

AWS SRD: Why AWS doesn't use InfiniBand

Calypso Submarine Cable: Interesting to visualize your global network infrastructure

TRAP: How to remember/learn

Test: desirable difficulties. Testing helps to retain.
Retain: reviewing timing (RemNote)
Associate: with something you already know
Perform: Use it, build.

Linux SWAP: Because I have many open tabs, I sometimes kill my laptop. It is a bit old, but it has 8GB RAM.

When Chrome spikes memory, the kernel may struggle to reclaim fast enough, leading to:

Many processes waking → “procs” spike
Heavy disk I/O → swap/page reclaim
System stalls → direct reclaim + possible OOM pressure

As I use ZFS, the recommendation is not to create extra swap on it. So I created just 1G from my main volume group:

Create logical volume:

sudo lvcreate -L 1G -n swap athens-vg

Format + enable:

sudo mkswap /dev/athens-vg/swap
sudo swapon /dev/athens-vg/swap

Persist:

echo '/dev/athens-vg/swap none swap sw 0 0' | sudo tee -a /etc/fstab

result:


# swapon --show
NAME       TYPE      SIZE USED PRIO
/dev/dm-2  partition 976M   0B   -1
/dev/zram0 partition 3.8G 2.7G  100
# 
# lvs
  LV      VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home    athens-vg -wi-ao----  22.00g                                                    
  root    athens-vg -wi-ao---- <27.94g                                                    
  storage athens-vg -wi-ao---- 186.00g                                                    
  swap_1  athens-vg -wi-ao---- 976.00m                                                    
#

I also installed earlyoom (this avoids full system lockups by killing memory hogs earlier) and zram-tools (uses compressed RAM as swap).

root@athens:/boot# dpkg -l | grep zram
ii zram-tools 0.3.7-1 all utilities for working with zram
root@athens:/boot# dpkg -l | grep earlyoom
ii earlyoom 1.9.0-1 amd64 Early OOM Daemon
# systemctl status earlyoom
● earlyoom.service - Early OOM Daemon
     Loaded: loaded (/usr/lib/systemd/system/earlyoom.service; enabled; preset: enabled)
     Active: active (running) since Thu 2026-04-30 08:19:50 BST; 1 week 3 days ago
 Invocation: e3dc68822a604ac6befe12fa2f44b650
       Docs: man:earlyoom(1)
             https://github.com/rfjakob/earlyoom
   Main PID: 1176 (earlyoom)
      Tasks: 1 (limit: 10)
     Memory: 628K (max: 50M, available: 49.3M, peak: 3M)
        CPU: 10.274s
     CGroup: /system.slice/earlyoom.service
             └─1176 /usr/bin/earlyoom -r 3600

May 08 09:06:59 athens earlyoom[1176]: mem avail:  4112 of  6185 MiB (66.48%), swap free: 3678 of 4899 MiB (75.09>
May 08 10:06:59 athens earlyoom[1176]: mem avail:  3741 of  6029 MiB (62.06%), swap free: 3996 of 4899 MiB (81.57>
May 08 15:21:37 athens earlyoom[1176]: mem avail:  3821 of  6005 MiB (63.63%), swap free: 3546 of 4899 MiB (72.39>
May 08 22:52:18 athens earlyoom[1176]: mem avail:  3909 of  5998 MiB (65.17%), swap free: 3719 of 4899 MiB (75.92>
May 09 09:46:15 athens earlyoom[1176]: mem avail:  3855 of  6055 MiB (63.66%), swap free: 3994 of 4899 MiB (81.53>
May 09 17:39:36 athens earlyoom[1176]: mem avail:  3211 of  5856 MiB (54.83%), swap free: 4054 of 4899 MiB (82.75>
May 09 18:39:37 athens earlyoom[1176]: mem avail:  2400 of  5291 MiB (45.37%), swap free: 3890 of 4899 MiB (79.41>
May 09 21:15:33 athens earlyoom[1176]: mem avail:  1925 of  4791 MiB (40.18%), swap free: 4398 of 4899 MiB (89.79>
May 09 22:15:34 athens earlyoom[1176]: mem avail:  2899 of  6040 MiB (48.00%), swap free: 4074 of 4899 MiB (83.17>
May 10 09:00:46 athens earlyoom[1176]: mem avail:  2563 of  5991 MiB (42.79%), swap free: 4315 of 4899 MiB (88.08>
# 
# sudo systemctl status zramswap
● zramswap.service - Linux zramswap setup
     Loaded: loaded (/usr/lib/systemd/system/zramswap.service; enabled; preset: enabled)
     Active: active (exited) since Thu 2026-04-30 08:19:50 BST; 1 week 3 days ago
 Invocation: 9f4e1a2534c8409292782da4512fbae9
       Docs: man:zramswap(8)
   Main PID: 1198 (code=exited, status=0/SUCCESS)
   Mem peak: 3.8M
        CPU: 58ms

Apr 30 08:19:50 athens systemd[1]: Starting zramswap.service - Linux zramswap setup...
Apr 30 08:19:50 athens zramswap[1248]: Setting up swapspace version 1, size = 3.8 GiB (4113920000 bytes)
Apr 30 08:19:50 athens zramswap[1248]: no label, UUID=0728b3a9-007b-4d71-8255-009f509bca63
Apr 30 08:19:50 athens systemd[1]: Finished zramswap.service - Linux zramswap setup.
# 
# zramctl
NAME       ALGORITHM DISKSIZE   DATA  COMPR  TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4           3.8G 886.9M 264.6M 449.4M         [SWAP]
#
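
To see how much the zram swap is actually saving, the zramctl line above can be turned into compression ratios. A small Python sketch; the helper names are made up, and it assumes the column order shown above (NAME, ALGORITHM, DISKSIZE, DATA, COMPR, TOTAL) and single-letter binary units.

```python
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a size such as '886.9M' into bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def zram_ratios(zramctl_line):
    """Return (DATA/COMPR, DATA/TOTAL) for one zramctl output line."""
    fields = zramctl_line.split()
    data, compr, total = (parse_size(f) for f in fields[3:6])
    return data / compr, data / total


line = "/dev/zram0 lz4           3.8G 886.9M 264.6M 449.4M         [SWAP]"
compression, effective = zram_ratios(line)
print(f"compression ratio: {compression:.2f}")
print(f"effective ratio including overhead: {effective:.2f}")
```

DATA is the uncompressed size, COMPR the compressed size and TOTAL the memory really used including overhead, so the second number is the honest one: here roughly 2x, versus a bit over 3x for the raw compression.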

Scale-up vs Scale-out: I still keep forgetting the difference

The Man Who Solved The Market

Interesting book about the “start” of quant trading by Jim Simons. Funny that he was a heavy smoker yet quick, sharp and active till the end; he was a great mathematician and a code breaker too! I didn't know anything about Renaissance. In part, it reminds me of the book by Edward O. Thorp. It was weird that, with so much tech and so many algorithms developed, in key moments he didn't trust them. It reminds me of Nassim Taleb and the black swans. Still, he never crashed and always made money. I always feel uneasy with this subject. Is it moral? The thing that surprised me the most was the connections of members of his company with Donald Trump and the Brexit referendum, while he supported and financed the Democrats.

Nepal, Hobby, Alibaba, Sewing, HRT, Potato Salad, SCION, Jets, Quantum, Octopus, Peel, Virginia, Fairwater, Microfluidics, Energy Storage, Infiniband HPC, Lose a nuke, ECH, NCCL for Network Engineers, RoCE Lossless, Nanog93 Networking AI

Ridgeline VII Nepal: Totally crazy, the views, the descent, the speed…

In the era of AI, take a hobby

Tofu + greens curry

Alibaba Crypto AI Agent

How a sewing machine works

HRT GTC2026 Modern Resource Responsible AI Factory: Interesting all the ways to save watts. The emphasis is on copper and water cooling.

German Potato Salad

SCION Routing: I didn't know that it was used in production

Jet Engines: crazy, amazing

Quantum companies I didnt know about:

qant.com (Germany)

ionq (USA)

Octopus can rewire RNA: Incredible animal.

Eat the peel: After reading this, I started eating the kiwi’s peel. Not going back. I would like to do something with banana and orange peels, but I don't want to use a ton of sugar either. I will try my spicy banana bread with peel next time. Orange peels, once dried, can be kept as an aromatic.

Virginia Air Space Museum: I was there 3 years ago, I think. Amazing. You have an SR-71 Blackbird, Concorde, Space Shuttle, etc. Totally worth visiting.

Fairwater: This is already old news in the AI datacenter world. But still interesting at high level.

Microfluidics: “Tiny channels are etched directly on the back of the silicon chip, creating grooves that allow cooling liquid to flow directly onto the chip and more efficiently remove heat”.

Liquid air energy storage:

The process works in three stages. First, air is taken in from the surroundings and cleaned. Second, the air is repeatedly compressed until it is at very high pressure. Third, the air is cooled until it becomes liquid, using a multi-stream heat exchanger: a device that includes multiple channels and tubes carrying substances at different temperatures, allowing heat to be transferred between them in a controlled way.

“The energy that we’re pulling from the grid is powering this charging process,” says Cetegen.

When the grid needs extra energy, the liquid air is put to work. It is pumped out of storage and evaporated, becoming a gas again. It is then used to drive turbines, generating electricity for the grid. Afterwards, the air is released back into the atmosphere.

Infiniband HPC: This is a good intro to InfiniBand; it helped me to refresh the training I did two years ago

Designing HPC Cluster Infiniband: It seems more practical, as you have the different types of deployment based on the required nodes. Avoid credit loops.

Lose a nuke: I need to visit Palomares one day

Encrypted Client Hello: ECH: I am quite naive, I would have thought that with most browsers compatible, full support would be easy.

NCCL for Network Engineers: Good explanation. I would like to get more real/practical experience on these things.

RoCE Lossless: Good explanation of ECN and PFC with Arista example.

Nanog93 Networking AI: Interesting overview. I am still missing the low level details.

Supercommunicators

Very good ebook.

Three types of conversation

Do you want to be helped? – What’s this really about? Decision Making mindset. Lean into data and reasoning

What does everyone want? How will we make choices together?

How to figure out what this is really about? First, recognize that this is a negotiation. Next determine what does everyone want? Then how will we make choices together?

Do you want to be hugged? – How do we feel? Emotional mindset. Lean into stories and compassion

Ask questions -> creates vulnerability -> triggers emotional contagion -> elicits connection -> prompt more questions ….

In a conflict, we learn why we are fighting by discussing emotions.

In a conflict, we draw out emotions by proving we are listening.

In a conflict, we prove we are listening by looping for understanding: ask questions, summarize what you heard, ask if you got it right.

In a conflict, focus on controlling:

Yourself
Your environment
The conflict boundaries

Check mood and energy!

Do you want to be heard? – Who are we? Social mindset

We all possess social identities that shape how we speak and hear.

How to talk about who we are:

Draw out multiple identities
Put everyone on equal footing
Create a new group by building on existing identities

Be aware of this loop:

Telling someone they belong to a group they abhor -> triggers identity threat -> causes defensiveness -> prompts counter-attacks -> leads to telling someone they belong to a group they abhor -> loop

Before discussion:

What do you hope to accomplish?
How will this conversation start?
What obstacles might emerge?
When those obstacles appear, what’s the plan?
What are the benefits of this dialogue?

Rules:

Pay attention to what kind of conversation is occurring (above)
Share your goals, and ask what others are seeking: prepare for the conversation. Ask many questions!
Ask about others’ feelings, and share your own
Explore if identities are important to this discussion.

The examples of “The Big Bang Theory” (how do you hear emotions no one says aloud?), the court case, gun ownership, anti-vax, COVID, the football team, Netflix (no-rules), etc. are really good.