FineData

Team

AI & ML interests

We release large pre-training datasets to accelerate open LLM development. We are part of the Hugging Face Science team (hf.co/science).

Recent Activity

guipenedo new activity 1 day ago

HuggingFaceFW/finewiki: Filtered Cebuano?

hynky new activity 4 days ago

HuggingFaceFW/finepdfs: OCR or not classifier

hynky new activity 4 days ago

HuggingFaceFW/finepdfs: A Few Questions About the Implementation Details of the finepdfs Project

Papers

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language (submitted by guipenedo)

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale (submitted by philschmid)