This week a colleague passed along this link about a Kubernetes cluster running on Cilium. The interesting point is that the high throughput is achieved with BIG TCP and IPv6!
The summary (copied) is:
TCP segments in the OS are up to 65K and the NIC hardware does the segmentation – we do this already, but the 65K cap comes from the 16-bit packet-length field of the IPv4 header. BIG TCP uses IPv6 and allows much larger TCP segments within the OS, currently 512K but theoretically higher. End result – better perf (>20% higher in this video) and latency (2.2x faster through the OS).
Then I saw this other video from John Ousterhout. It covers a similar topic to the Kubernetes video above, since K8s is used mainly in datacenters.
– data throughput: full link speed for large messages
– low tail latency: <10us for short messages? (DC)
– message throughput: 100M short messages per second? (DC)
TCP issues in DC:
1- stream oriented (no load balancing) -> message based
2- connection oriented (can break InfiniBand!, expensive) -> connectionless
3- fair scheduling (bw sharing) -> run to completion (SRPT)
4- sender-driven congestion control (based on buffer occupancy) -> receiver-driven congestion control
5- in-order delivery -> no ordering requirements
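Point 3 above (run to completion with SRPT, shortest remaining processing time first) can be illustrated with a toy scheduler. The message sizes below are made up; the effect is the one Ousterhout describes – short messages stop waiting behind long ones, which is exactly what drives down tail latency:

```python
def srpt_completion_times(sizes):
    """Run messages to completion, shortest first; return finish times."""
    t, done = 0, {}
    for i in sorted(range(len(sizes)), key=lambda i: sizes[i]):
        t += sizes[i]
        done[i] = t
    return [done[i] for i in range(len(sizes))]

def fair_completion_times(sizes):
    """Processor-sharing approximation of fair bandwidth sharing:
    every active message gets an equal slice until it finishes."""
    remaining = list(sizes)
    finished = [0.0] * len(sizes)
    t = 0.0
    active = [i for i, s in enumerate(sizes) if s > 0]
    while active:
        # Under equal sharing, the smallest remaining message finishes next.
        m = min(remaining[i] for i in active)
        t += m * len(active)          # time to drain m units from each
        for i in active:
            remaining[i] -= m
            if remaining[i] == 0:
                finished[i] = t
        active = [i for i in active if remaining[i] > 0]
    return finished

sizes = [100, 2, 3]                   # one long and two short messages
print(srpt_completion_times(sizes))   # short messages finish almost immediately
print(fair_completion_times(sizes))   # short messages stall behind the long one
```

Note that both schedules finish all the work at the same total time; SRPT only reorders it, so the short messages win without the long one finishing any later. That is the appeal of run-to-completion for datacenter RPC workloads dominated by short messages.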
As well, he stresses the importance of moving the transport into the NIC (there is already a lot of NIC offloading).
His proposal, Homa, looks very nice, but I like how he explains how difficult it is going to be for it to succeed. Still worth trying.