gokaygokay posted an update 3 days ago
FlashPack: Lightning-Fast Model Loading for PyTorch

https://github.com/fal-ai/flashpack

FlashPack is a new, high-throughput file format and loading mechanism for PyTorch that makes model checkpoint I/O blazingly fast, even on systems without access to GPU Direct Storage (GDS).

With FlashPack, loading a model can be 3–6× faster than with current state-of-the-art approaches such as accelerate or the standard load_state_dict() + .to() flow, all wrapped in a lightweight, pure-Python package that works anywhere.
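For a rough sense of what that comparison looks like in code, here is a minimal timing sketch (not a rigorous benchmark). MyModel, "model.pt", and "model.flashpack" are placeholders I'm assuming exist: the class mixes in FlashPackMixin, "model.pt" is an ordinary state_dict checkpoint, "model.flashpack" was written with save_flashpack(), and from_flashpack() is assumed to be callable on the class.

import time

import torch
import torch.nn as nn
from flashpack import FlashPackMixin

# Placeholder model: any nn.Module that also mixes in FlashPackMixin will do.
class MyModel(nn.Module, FlashPackMixin):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4096, 4096)

    def forward(self, x):
        return self.layer(x)

# Standard path: unpickle the checkpoint, then copy the weights into the module.
# "model.pt" is assumed to exist, e.g. from torch.save(model.state_dict(), "model.pt").
t0 = time.perf_counter()
model = MyModel()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
print(f"torch.load + load_state_dict: {time.perf_counter() - t0:.3f}s")

# FlashPack path: "model.flashpack" is assumed to have been written with save_flashpack(),
# and from_flashpack() is assumed to work as a class-level constructor.
t0 = time.perf_counter()
model = MyModel.from_flashpack("model.flashpack")
print(f"from_flashpack:               {time.perf_counter() - t0:.3f}s")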

A quick Kaggle test:

# Install FlashPack (if not already done)
# !pip install git+https://github.com/fal-ai/flashpack.git

import torch
import torch.nn as nn
import torch.optim as optim
from flashpack import FlashPackMixin

device = "cpu"

# Toy model: a single learnable 2-element parameter added to the input
class ToyModel(nn.Module, FlashPackMixin):
    def __init__(self):
        super().__init__()
        self.x = nn.Parameter(torch.tensor([1.0, 2.0]))

    def forward(self, x):
        return self.x + x

model = ToyModel().to(device)
inputs = torch.tensor([1.0, 2.0])
target = torch.tensor([10.0, 20.0])

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

max_epochs = 5000
tolerance = 1e-6

# Train until the loss drops below the tolerance (or the epoch budget runs out)
for epoch in range(1, max_epochs + 1):
    optimizer.zero_grad()
    output = model(inputs).to(device)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    if loss < tolerance:
        print(f"Converged after {epoch} epochs")
        break

print(model.x)


# Save the trained weights in FlashPack format
model.save_flashpack("model.flashpack", target_dtype=torch.float32)

# Load the model back using the FlashPack API
loaded_module = model.from_flashpack("model.flashpack")

print("Original parameter:", model.x)
print("Loaded parameter:", loaded_module.x)


Output

Step                          Time             What it does
build_index                   10.90 µs         Scans model parameters and builds the index (ultra-fast)
create_memmap                 233.28 µs        Creates an on-disk memory-mapped file for large tensors (very fast)
copy_to_memmap                3.50 ms          Copies tensors to the file via efficient mmap writes (excellent speed)
flush_payload                 5.83 ms          Final flush of the binary payload to disk (great performance)
append_footer                 751.49 µs        Writes metadata (dtype, shape, offsets) (very small cost)
atomic_rename                 45.43 µs         Final rename to ensure an atomic save (instant)
read_metadata + mmap_payload  ~0.2 ms (total)  Loading phase: reads metadata and memory-maps the file
cpu_from_memmap + assign      ~100 µs          Loads tensors directly from the mmap without full deserialization
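
Those step names suggest a simple storage pattern: one flat, memory-mappable payload of raw tensor bytes, a small metadata footer, and an atomic rename at the end. The toy sketch below reproduces that pattern for illustration only; it is not FlashPack's actual implementation, and toy_save/toy_load, the float32-only restriction, and the file layout are all my own assumptions.

import json
import os

import numpy as np
import torch


def toy_save(tensors, path):
    """Write float32 tensors as one flat payload + JSON footer, then rename atomically."""
    tmp_path = path + ".tmp"
    index, arrays, offset = {}, {}, 0

    # "build_index": record shape, byte offset, and size of every tensor
    for name, t in tensors.items():
        arr = t.detach().to(torch.float32).contiguous().cpu().numpy()
        arrays[name] = arr
        index[name] = {"shape": list(arr.shape), "offset": offset, "nbytes": arr.nbytes}
        offset += arr.nbytes

    # "create_memmap" + "copy_to_memmap": raw bytes go into one flat file
    payload = np.memmap(tmp_path, dtype=np.uint8, mode="w+", shape=(offset,))
    for name, arr in arrays.items():
        meta = index[name]
        payload[meta["offset"]:meta["offset"] + meta["nbytes"]] = arr.view(np.uint8).reshape(-1)
    payload.flush()  # "flush_payload"
    del payload

    # "append_footer": metadata (shapes, offsets) plus its length at the very end
    footer = json.dumps(index).encode()
    with open(tmp_path, "ab") as f:
        f.write(footer)
        f.write(len(footer).to_bytes(8, "little"))

    os.replace(tmp_path, path)  # "atomic_rename"


def toy_load(path):
    """Read the footer, memory-map the payload, and view tensors straight out of it."""
    # "read_metadata": the footer tells us where every tensor lives
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        footer_len = int.from_bytes(f.read(8), "little")
        f.seek(-(8 + footer_len), os.SEEK_END)
        index = json.loads(f.read(footer_len))

    # "mmap_payload": one mapping, no pickle, no per-tensor file reads
    payload = np.memmap(path, dtype=np.uint8, mode="r")

    # "cpu_from_memmap" + "assign": reinterpret the mapped bytes as tensors
    out = {}
    for name, meta in index.items():
        raw = payload[meta["offset"]:meta["offset"] + meta["nbytes"]]
        arr = raw.view(np.float32).reshape(meta["shape"]).copy()
        out[name] = torch.from_numpy(arr)
    return out


# Example round-trip with the toy format
state = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
toy_save(state, "toy_state.fp")
restored = toy_load("toy_state.fp")
print(torch.allclose(state["weight"], restored["weight"]))  # True

The payoff shows up in toy_load: loading is just a footer read plus a memory map, so there is no pickling step and nothing is deserialized tensor by tensor.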