From the AlphaSignal email list, that most of the times go over my lame knowledge, I found this piece of info, quite interesting:
FP8-LM: Training FP8 Large Language Models
Goal: Optimize LLM training with FP8 low-bit data formats.
Issue: High cost of LLM computational resources.
Solution: FP8 automatic mixed-precision framework for LLMs.
Results: Reduced memory by 42%, increased speed by 64%.
Insight: FP8 maintains accuracy, optimizes training efficiency.
This is something I want to really understand at one point. FP (Floating-Point) instructions can be from several sizes (8, 16, 32, 64). So the bigger, the better precision. I guess for some scientific tasks that is important. But looks like for AI, with FP8 could be good enough.