Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ This model is optimized for use with [VLLM](https://github.com/vllm-project/vllm
 
 ### Key Features of FP8 Marlin
 
-The Marlin kernel achieves impressive efficiency by packing 4 8-bit values into an int32 and performing a 4xFP8 to 4xFP16/BF16 dequantization using bit arithmetic and SIMT operations. This approach yields nearly a **2x speedup** over FP16 on most models while maintaining **near lossless quality**.
+The NeuralMagic FP8 Marlin kernel achieves impressive efficiency by packing 4 8-bit values into an int32 and performing a 4xFP8 to 4xFP16/BF16 dequantization using bit arithmetic and SIMT operations. This approach yields nearly a **2x speedup** over FP16 on most models while maintaining **near lossless quality**.
 
 #### FP8 Advantages on NVIDIA GPUs
 
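The packing and bit-arithmetic dequantization described in the changed paragraph can be illustrated with a small NumPy sketch. This is not the Marlin kernel's code: it assumes the FP8 values use the E4M3 layout (1 sign, 4 exponent, 3 mantissa bits, bias 7), which the README does not state, and the function names `pack_fp8x4` and `dequant_fp8x4_to_fp16` are hypothetical. The idea is to drop each byte's exponent and mantissa bits into the FP16 bit pattern with shifts and masks, then correct the exponent bias difference (15 - 7 = 8) with one multiply by 2^8. The E4M3 NaN codes (`0x7F`, `0xFF`) are not handled here.

```python
import numpy as np

def pack_fp8x4(codes):
    """Pack four FP8 bit patterns (ints in 0..255) into one 32-bit word, lowest byte first."""
    assert len(codes) == 4
    word = 0
    for i, c in enumerate(codes):
        word |= (int(c) & 0xFF) << (8 * i)
    return word

def dequant_fp8x4_to_fp16(word):
    """Convert four packed FP8 values to FP16 using only shifts, masks, and one multiply.

    Assumption: E4M3 encoding. NaN codes are not special-cased.
    """
    # Unpack the four bytes of the 32-bit word.
    codes = np.array([(word >> (8 * i)) & 0xFF for i in range(4)], dtype=np.uint16)
    # Move the sign bit from bit 7 to bit 15 of the FP16 pattern.
    sign = (codes & 0x80) << 8
    # Place the 4 exponent bits in FP16 bits 13..10 and the 3 mantissa bits in bits 9..7.
    body = (codes & 0x7F) << 7
    halves = (sign | body).astype(np.uint16).view(np.float16)
    # The FP16 bias is 15 and the E4M3 bias is 7, so scale by 2**(15 - 7) = 256.
    # The same scaling is also correct for subnormal inputs.
    return halves * np.float16(256.0)

word = pack_fp8x4([0x38, 0xC0, 0x7E, 0x01])
print(hex(word))                    # 0x17ec038
print(dequant_fp8x4_to_fp16(word))  # values 1.0, -2.0, 448.0, 2**-9
```

A GPU kernel would apply the same masks and shifts to two values at a time inside a 32-bit register rather than looping over bytes. The per-byte version above is only meant to make the bit layout easy to follow.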