gokaygokay posted an update 3 days ago
FlashPack: Lightning-Fast Model Loading for PyTorch

https://github.com/fal-ai/flashpack

FlashPack is a new, high-throughput file format and loading mechanism for PyTorch that makes model checkpoint I/O blazingly fast, even on systems without access to GPU Direct Storage (GDS).

With FlashPack, loading a model can be 3–6× faster than with current state-of-the-art approaches such as accelerate or the standard load_state_dict() + .to() flow, all wrapped in a lightweight, pure-Python package that works anywhere.
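For a rough sense of what that comparison looks like in code, here is a minimal timing sketch (not a rigorous benchmark). MyModel, "model.pt", and "model.flashpack" are placeholders I'm assuming exist: the class mixes in FlashPackMixin, "model.pt" is an ordinary state_dict checkpoint, "model.flashpack" was written with save_flashpack(), and from_flashpack() is assumed to be callable on the class.

import time

import torch
import torch.nn as nn
from flashpack import FlashPackMixin

# Placeholder model: any nn.Module that also mixes in FlashPackMixin will do.
class MyModel(nn.Module, FlashPackMixin):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4096, 4096)

    def forward(self, x):
        return self.layer(x)

# Standard path: unpickle the checkpoint, then copy the weights into the module.
# "model.pt" is assumed to exist, e.g. from torch.save(model.state_dict(), "model.pt").
t0 = time.perf_counter()
model = MyModel()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
print(f"torch.load + load_state_dict: {time.perf_counter() - t0:.3f}s")

# FlashPack path: "model.flashpack" is assumed to have been written with save_flashpack(),
# and from_flashpack() is assumed to work as a class-level constructor.
t0 = time.perf_counter()
model = MyModel.from_flashpack("model.flashpack")
print(f"from_flashpack:               {time.perf_counter() - t0:.3f}s")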

A quick Kaggle test:

# Install FlashPack (if not already done)
# !pip install git+https://github.com/fal-ai/flashpack.git

import torch
import torch.nn as nn
import torch.optim as optim
from flashpack import FlashPackMixin

device = "cpu"

# Toy model: a single learnable 2-element parameter added to the input
class ToyModel(nn.Module, FlashPackMixin):
    def __init__(self):
        super().__init__()
        self.x = nn.Parameter(torch.tensor([1.0, 2.0]))

    def forward(self, x):
        return self.x + x

model = ToyModel().to(device)
inputs = torch.tensor([1.0, 2.0])
target = torch.tensor([10.0, 20.0])

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

max_epochs = 5000
tolerance = 1e-6

# Train until the loss drops below the tolerance (or the epoch budget runs out)
for epoch in range(1, max_epochs + 1):
    optimizer.zero_grad()
    output = model(inputs).to(device)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    if loss < tolerance:
        print(f"Converged after {epoch} epochs")
        break

print(model.x)


# Save the trained weights in FlashPack format
model.save_flashpack("model.flashpack", target_dtype=torch.float32)

# Load the model back using the FlashPack API
loaded_module = model.from_flashpack("model.flashpack")

print("Original parameter:", model.x)
print("Loaded parameter:", loaded_module.x)


Output

Step                          Time             What it does
build_index                   10.90 µs         Scans model parameters and builds the index (ultra-fast)
create_memmap                 233.28 µs        Creates an on-disk memory-mapped file for large tensors (very fast)
copy_to_memmap                3.50 ms          Copies tensors to the file via efficient mmap writes (excellent speed)
flush_payload                 5.83 ms          Final flush of the binary payload to disk (great performance)
append_footer                 751.49 µs        Writes metadata (dtype, shape, offsets) (very small cost)
atomic_rename                 45.43 µs         Final rename to ensure an atomic save (instant)
read_metadata + mmap_payload  ~0.2 ms (total)  Loading phase: reads metadata and memory-maps the file
cpu_from_memmap + assign      ~100 µs          Loads tensors directly from the mmap without full deserialization
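
Those step names suggest a simple storage pattern: one flat, memory-mappable payload of raw tensor bytes, a small metadata footer, and an atomic rename at the end. The toy sketch below reproduces that pattern for illustration only; it is not FlashPack's actual implementation, and toy_save/toy_load, the float32-only restriction, and the file layout are all my own assumptions.

import json
import os

import numpy as np
import torch


def toy_save(tensors, path):
    """Write float32 tensors as one flat payload + JSON footer, then rename atomically."""
    tmp_path = path + ".tmp"
    index, arrays, offset = {}, {}, 0

    # "build_index": record shape, byte offset, and size of every tensor
    for name, t in tensors.items():
        arr = t.detach().to(torch.float32).contiguous().cpu().numpy()
        arrays[name] = arr
        index[name] = {"shape": list(arr.shape), "offset": offset, "nbytes": arr.nbytes}
        offset += arr.nbytes

    # "create_memmap" + "copy_to_memmap": raw bytes go into one flat file
    payload = np.memmap(tmp_path, dtype=np.uint8, mode="w+", shape=(offset,))
    for name, arr in arrays.items():
        meta = index[name]
        payload[meta["offset"]:meta["offset"] + meta["nbytes"]] = arr.view(np.uint8).reshape(-1)
    payload.flush()  # "flush_payload"
    del payload

    # "append_footer": metadata (shapes, offsets) plus its length at the very end
    footer = json.dumps(index).encode()
    with open(tmp_path, "ab") as f:
        f.write(footer)
        f.write(len(footer).to_bytes(8, "little"))

    os.replace(tmp_path, path)  # "atomic_rename"


def toy_load(path):
    """Read the footer, memory-map the payload, and view tensors straight out of it."""
    # "read_metadata": the footer tells us where every tensor lives
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        footer_len = int.from_bytes(f.read(8), "little")
        f.seek(-(8 + footer_len), os.SEEK_END)
        index = json.loads(f.read(footer_len))

    # "mmap_payload": one mapping, no pickle, no per-tensor file reads
    payload = np.memmap(path, dtype=np.uint8, mode="r")

    # "cpu_from_memmap" + "assign": reinterpret the mapped bytes as tensors
    out = {}
    for name, meta in index.items():
        raw = payload[meta["offset"]:meta["offset"] + meta["nbytes"]]
        arr = raw.view(np.float32).reshape(meta["shape"]).copy()
        out[name] = torch.from_numpy(arr)
    return out


# Example round-trip with the toy format
state = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
toy_save(state, "toy_state.fp")
restored = toy_load("toy_state.fp")
print(torch.allclose(state["weight"], restored["weight"]))  # True

The payoff shows up in toy_load: loading is just a footer read plus a memory map, so there is no pickling step and nothing is deserialized tensor by tensor.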