Wafer-on-Wafer

After reading about wafer scale, I found this other new type of processor:

The Bow IPU is the first processor in the world to use Wafer-on-Wafer (WoW) 3D stacking technology, taking the proven benefits of the IPU to the next level.

An IPU is an “Intelligence Processing Unit”. With the boom of AI/ML, we get a new term.

There is a paper about it from 2019 that is still interesting. I haven’t been able to fully read and understand it, but section 1.2, about the differences between CPU, GPU and IPU, helps to see (at a very high level) how each one works.

CPU: Tends to offer complex cores in relatively small counts. Excels at single-thread performance and control-dominated code, possibly at the expense of energy efficiency and aggregate arithmetic throughput per area of silicon.

GPU: Features smaller cores in a significantly higher count per device (e.g., thousands). GPU cores are architecturally simpler. Excels at regular, dense, numerical, data-flow-dominated workloads that naturally lead to coalesced accesses and a coherent control flow.

IPU: Provides large core counts (1,216 per processor) and offers cores complex enough to execute completely distinct programs. The IPU only offers small, distributed memories that are local, tightly coupled to each core, and implemented as SRAM. A toy sketch contrasting these execution models follows below.
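
Out of curiosity, here is a toy Python sketch (my own, not from the paper) of that execution-model difference: the GPU-style path applies one kernel to a big array in lockstep, while the IPU-style path gives every “core” its own distinct program and its own local data.

```python
# Toy illustration only: real GPUs/IPUs obviously don't run Python threads.
from concurrent.futures import ThreadPoolExecutor

def gpu_style(data):
    """Same kernel applied to every element (coherent control flow)."""
    return [x * 2 for x in data]

def ipu_style(local_programs, local_memories):
    """Each 'core' runs its own distinct program on its own local memory."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda pm: pm[0](pm[1]),
                             zip(local_programs, local_memories)))

print(gpu_style([1, 2, 3, 4]))                            # [2, 4, 6, 8]
print(ipu_style([sum, max, min], [[1, 2], [3, 4], [5, 6]]))  # [3, 4, 5]
```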

From a pure networking view, their pod solution uses 100G RoCEv2. So no InfiniBand.

In general, all this goes beyond my knowledge, but the advances in processor technology with wafer-like designs are still interesting. It seems everything was focused on CPUs (adding more cores, etc.) and GPUs, but there are new options, although the applications look very niche.

Chip wars

I was reading this news recently about Huawei’s capability of building 14nm chips.

Today, the EDA (Electronic Design Automation) market is largely controlled by three companies: California-based Synopsys and Cadence, as well as Germany's Siemens. 

I wonder, are the export controls good long term? Maybe the cure is worse than the disease…

And based on that, I learned that ASML is the highest-valued company in Europe!

Also, there is a book about this, “Chip War”, that I want to read at some point.

CXL

In one meeting somebody mentioned CXL, and I had to look it up.

Interesting:

Eventually CXL is expected to be an all-encompassing cache-coherent interface for connecting any number of CPUs, memory, process accelerators (notably FPGAs and GPUs), and other peripherals.

Wafer-Scale

Somehow I came across this company that provides some crazy numbers in just one rack. Then, nearly by coincidence, I saw this news in an email that mentioned “Cerebras” and wafer-scale, a term I had never heard of. So I found some info in Wikipedia, and it all looks amazing. I also learned about Gene Amdahl, as he was the first one to attempt wafer-scale integration, and about his law. I didn’t know he was such a figure in computer architecture history.
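
As a side note (my addition, not from any of those articles): Amdahl’s law says that if only a fraction p of a program can be parallelized, the speedup on N processors is bounded:

\[
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
\]

Even with p = 0.95 and infinite processors, the speedup tops out at 1/(1 − 0.95) = 20×, which puts a hard ceiling on what throwing more silicon (wafer-scale included) at a problem can do.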

VMware Co-stop / LPM in hardware

This is a very interesting article about how Longest Prefix Matching (LPM) is done in network chips. I remember reading about Bloom filters in some Cloudflare blog, but I didn’t think they would be used in network chips too. Also, I had forgotten how critical LPM is in networking.
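
To make the idea stick, here is a toy Python sketch of Bloom-filter-assisted LPM as I understand it from the article: one Bloom filter per prefix length cheaply screens out lengths that cannot match, so the expensive exact-match table is only probed on hits. All class and parameter names are mine, purely illustrative.

```python
import hashlib
import ipaddress

class Bloom:
    """Tiny Bloom filter: no false negatives, occasional false positives."""
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0
    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size
    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p
    def may_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

class BloomLPM:
    """One Bloom filter per IPv4 prefix length guards an exact-match table."""
    def __init__(self):
        self.blooms = {length: Bloom() for length in range(33)}
        self.exact = {}  # (prefix_bits, length) -> next hop
    def insert(self, cidr, nexthop):
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address) >> (32 - net.prefixlen) if net.prefixlen else 0
        key = (bits, net.prefixlen)
        self.blooms[net.prefixlen].add(key)
        self.exact[key] = nexthop
    def lookup(self, addr):
        a = int(ipaddress.ip_address(addr))
        for length in range(32, -1, -1):               # longest prefix first
            key = (a >> (32 - length) if length else 0, length)
            if self.blooms[length].may_contain(key):   # cheap on-chip filter
                hop = self.exact.get(key)              # expensive exact match
                if hop is not None:
                    return hop
        return None

table = BloomLPM()
table.insert("10.0.0.0/8", "hop-A")
table.insert("10.1.0.0/16", "hop-B")
print(table.lookup("10.1.2.3"))   # hop-B (the longer prefix wins)
print(table.lookup("10.9.9.9"))   # hop-A
```

A false positive from a Bloom filter only costs an extra exact-match probe; it never returns a wrong route, which is why the trick is safe to use in hardware.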

Lately I had to deal with some performance issues with an application running in a VM. I am quite a noob regarding virtualization and always assumed the bigger the VM, the better (a very masculine thing, I guess…). But a virtualization expert at work explained to me the issues with that assumption via this link. I learned a lot from it (still a noob though).

I see most vendors asking for crazy requirements when offering products to run in a VM… and that seems to kill the very idea of a virtualization environment, because such a VM practically requires a dedicated server. So right-sizing your product/VM is very important. I agree with the statement that vendors don’t really do load testing for their VM offerings and that the high requirements are an excuse to “avoid” problems from customers.
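
To convince myself, I wrote a toy simulation of the co-stop intuition. It assumes strict gang scheduling (all vCPUs must land on free pCPUs at the same instant); real ESXi uses relaxed co-scheduling, so this is only an intuition pump, and all the numbers are made up.

```python
import random

def p_schedulable(host_pcpus, vm_vcpus, other_load, trials=20_000):
    """Fraction of scheduler ticks with vm_vcpus free pCPUs available at once."""
    hits = 0
    for _ in range(trials):
        busy = sum(random.random() < other_load for _ in range(host_pcpus))
        if host_pcpus - busy >= vm_vcpus:
            hits += 1
    return hits / trials

# A 16-pCPU host where other workloads keep each pCPU busy ~50% of the time:
# the wider the VM, the rarer the moment when all its vCPUs can run together.
for width in (2, 4, 8, 12):
    p = p_schedulable(host_pcpus=16, vm_vcpus=width, other_load=0.5)
    print(f"{width:2d}-vCPU VM finds a slot {p:6.1%} of the time")
```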

Analog Computing

This is an interesting video about how we can use analog computing. It seems a good fit for the matrix calculations used in AI.

All our technology is digital, but we are reaching limits (power usage, physical limits, etc.), and the “boom” in AI seems to benefit from analog computing.
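
A minimal sketch of the idea from the video, as I understand it: in a resistive crossbar, Ohm’s law does the multiplications (I = G·V) and Kirchhoff’s current law does the additions (currents sum on each column wire), so a matrix-vector product happens in one “analog step”. The noise term is my own addition to mimic analog imprecision.

```python
import random

def crossbar_matvec(G, V, noise=0.01):
    """G: conductance matrix (rows x cols), V: input voltage per row.
    Returns the per-column output currents, i.e. V @ G, with analog noise."""
    cols = len(G[0])
    out = []
    for c in range(cols):
        current = sum(G[r][c] * V[r] for r in range(len(V)))  # physics sums the currents
        out.append(current * (1 + random.gauss(0, noise)))    # device imperfection
    return out

G = [[0.1, 0.2],
     [0.3, 0.4]]    # conductances "programmed" into the array
V = [1.0, 0.5]      # input voltages
print(crossbar_matvec(G, V))   # roughly [0.25, 0.40], in one analog step
```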

RISC-V

I have been reading this book during my lunch breaks for several months. Most of the time just a couple of pages, to be honest, as my knowledge of CPU architecture is generally very poor. I really enjoyed this subject at uni, and this book was one of my favourites during that time. It was like a bible of CPU architecture. And Patterson is an author of both books.

I remember that there were two main architecture styles: RISC vs CISC. In a very summarized way, RISC uses simple instructions that are easy to pipeline and execute (at the cost of more instructions to execute), while CISC uses complex instructions (fewer to execute) that are harder to scale. So let’s say simplicity (RISC) “won” the race in CPU architecture.
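
A toy example of the load/store split (the instruction names mimic RISC-V, but the machine is entirely made up): computing a = b + c could be one memory-to-memory instruction on a CISC-style machine, but a RISC-style machine breaks it into loads, an ALU op and a store, each simple enough to pipeline.

```python
def run_risc(program, mem):
    """Execute a tiny RISC-like load/store instruction list."""
    regs = {}
    for op, *args in program:
        if op == "lw":      # load word: reg <- mem[addr]
            regs[args[0]] = mem[args[1]]
        elif op == "add":   # ALU op: only touches registers
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "sw":    # store word: mem[addr] <- reg
            mem[args[1]] = regs[args[0]]
    return mem

mem = {"a": 0, "b": 2, "c": 3}
# CISC style (hypothetical): one instruction like  add [a], [b], [c]
# RISC style: four simple instructions
program = [
    ("lw", "r1", "b"),
    ("lw", "r2", "c"),
    ("add", "r3", "r1", "r2"),
    ("sw", "r3", "a"),
]
print(run_risc(program, mem))   # {'a': 5, 'b': 2, 'c': 3}
```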

RISC-V is an open standard, so anybody can produce CPUs that execute those instructions. And you can easily get your hands dirty by getting a board.

One of the goals of RISC-V is to learn from the mistakes of all the earlier architectures and provide a design that works for any type of processor (from embedded to supercomputers) and is efficient, modular and stable.

The book compares RISC-V with existing ISAs like ARM-32, MIPS-32, Intel x86-32, etc., based on cost, simplicity, performance, isolation from implementation, room for growth, code size and ease of programming.

There were many parts of the book that I couldn’t really understand, but I found chapter 8 quite interesting. It is about how to compute data concurrently. The best-known approach is Single Instruction Multiple Data (SIMD). The alternative is a vector architecture, and this is what RISC-V uses. The implementation details are way out of my league.
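
Here is how I picture the difference, in a toy Python model with my own naming: SIMD bakes the register width into the code, so binaries must change when the hardware gets wider lanes, while a vector machine asks the hardware on each iteration how many elements it can take (what RISC-V’s vsetvli instruction does) and the same loop runs on any implementation.

```python
def simd_add(a, b, width=4):
    """SIMD style: fixed 'width'-lane operations, plus a separate scalar tail."""
    out = []
    i = 0
    while i + width <= len(a):
        out.extend(x + y for x, y in zip(a[i:i+width], b[i:i+width]))
        i += width
    out.extend(x + y for x, y in zip(a[i:], b[i:]))   # scalar tail loop
    return out

def vector_add(a, b, max_vl=8):
    """Vector style: hardware reports the vector length vl each iteration,
    so there is no tail code and no lane width baked into the 'binary'."""
    out = []
    i = 0
    while i < len(a):
        vl = min(max_vl, len(a) - i)                  # what vsetvli would return
        out.extend(x + y for x, y in zip(a[i:i+vl], b[i:i+vl]))
        i += vl
    return out

a, b = list(range(10)), list(range(10, 20))
assert simd_add(a, b) == vector_add(a, b) == [x + y for x, y in zip(a, b)]
```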

In summary, it was a nice read to refresh my CPU architecture knowledge a bit.