FP8-LM

From the AlphaSignal email list, which most of the time goes way over my limited knowledge, I found this piece of info quite interesting:

FP8-LM: Training FP8 Large Language Models

Goal: Optimize LLM training with FP8 low-bit data formats.
Issue: High cost of LLM computational resources.
Solution: FP8 automatic mixed-precision framework for LLMs.
Results: Reduced memory by 42%, increased speed by 64%.
Insight: FP8 maintains accuracy, optimizes training efficiency.

Repo. Paper

This is something I really want to understand at some point. Floating-point (FP) numbers come in several sizes (8, 16, 32, 64 bits), and the bigger the format, the better the precision. I guess that matters for some scientific tasks, but it looks like for AI, FP8 could be good enough.
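To get a feel for what fewer bits means in practice, here is a minimal sketch using NumPy (FP8 is not a NumPy dtype; as I understand it, the E4M3 variant typically used for training keeps only 3 mantissa bits, so its relative precision is roughly 2^-3):

```python
# A quick look (assuming NumPy) at how relative precision shrinks with
# the size of the floating-point format; FP8 itself is not available here.
import numpy as np

for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{info.bits:2d} bits -> machine epsilon ~ {info.eps:.1e}")

# The same value, rounded into smaller and smaller formats:
x = 0.123456789
print(np.float64(x), np.float32(x), np.float16(x))
# An FP8 E4M3 value has only 3 mantissa bits, so it would round this to 0.125
```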

What Limits Computer Performance

Reading through this blog, I came across this statement:

What limits computer performance today is predictability, and the two big ones are instruction/branch predictability, and data locality.

That is from this interview. I didn't know Jim Keller, but it is a long and interesting conversation. I liked it when he said he was the laziest person at Tesla!

And actually, I found I already had a tab open from his company.
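To get a rough feel for the data-locality half of that quote, here is a small sketch (assuming NumPy; exact timings obviously depend on the machine). The same matrix is summed twice with the same result, once walking memory in order and once jumping a whole row's worth of bytes between elements:

```python
# A rough illustration of data locality: summing a C-ordered matrix by rows
# (contiguous, cache friendly) vs by columns (strided, cache hostile).
import time
import numpy as np

a = np.ones((4096, 4096))  # C order: each row is contiguous in memory

def bench(label, slices):
    t0 = time.perf_counter()
    total = sum(s.sum() for s in slices)
    print(f"{label}: total={total:.0f}  time={time.perf_counter() - t0:.2f}s")

bench("row-wise (cache friendly)  ", (a[i, :] for i in range(a.shape[0])))
bench("column-wise (cache hostile)", (a[:, j] for j in range(a.shape[1])))
```

Same data, same arithmetic, but the column-wise pass touches a different cache line on every access, which is exactly the kind of thing the quote is pointing at.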