FP8-LM – T.I.L

From the AlphaSignal email list, which most of the time goes way over my head, I found this piece of info quite interesting:

FP8-LM: Training FP8 Large Language Models

Goal: Optimize LLM training with FP8 low-bit data formats.
Issue: High cost of LLM computational resources.
Solution: FP8 automatic mixed-precision framework for LLMs.
Results: Reduced memory by 42%, increased speed by 64%.
Insight: FP8 maintains accuracy, optimizes training efficiency.

Repo. Paper

This is something I really want to understand at some point. Floating-point (FP) numbers come in several sizes (8, 16, 32, or 64 bits), and the more bits, the better the precision. I guess that matters for some scientific tasks, but it looks like FP8 could be good enough for AI.
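To get a feel for what "more bits, more precision" means, here is a small Python sketch that stores 0.1 at each size and measures how far the stored value drifts. The 16-, 32-, and 64-bit cases use the standard library's struct module. Python has no built-in 8-bit float, so the FP8 case is my own simplification: I assume the E4M3 layout (3 mantissa bits), which is one common FP8 variant, and the helper round_to_mantissa_bits is a made-up name that only models mantissa rounding and ignores the limited exponent range.

```python
import math
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack value into an IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def round_to_mantissa_bits(value: float, bits: int) -> float:
    """Keep only `bits` stored mantissa bits.

    Hypothetical helper: it models precision loss only and ignores
    the exponent range, overflow, and subnormals.
    """
    if value == 0.0:
        return 0.0
    # value == mantissa * 2**exponent, with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(value)
    # One implicit leading bit plus `bits` stored bits.
    scale = 2 ** (bits + 1)
    return math.ldexp(round(mantissa * scale) / scale, exponent)


x = 0.1
stored = {
    "FP8 (assumed E4M3)": round_to_mantissa_bits(x, 3),
    "FP16": round_trip(x, "<e"),
    "FP32": round_trip(x, "<f"),
    "FP64": round_trip(x, "<d"),
}
# Absolute error relative to Python's own 64-bit float.
errors = {name: abs(value - x) for name, value in stored.items()}

for name, value in stored.items():
    print(f"{name:20s} stored={value!r:24} error={errors[name]:.3e}")
```

The error shrinks at every step up in size, which is the precision trade-off the paper is betting LLM training can tolerate.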
