dvruette's Collections
OpenWebText BPE

updated 18 days ago

BPE tokenizers with vocabulary sizes from 1k to 131k, trained on OpenWebText, together with a pre-tokenized copy of the dataset for each tokenizer.

  • dvruette/openwebtext-bpe-1k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-2k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-4k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-8k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-16k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-33k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-66k

    Updated Dec 17, 2025

  • dvruette/openwebtext-bpe-131k

    Updated Dec 17, 2025

  • dvruette/openwebtext-tokenized-1k

    Viewer • Updated Dec 19, 2025 • 8.01M • 238

  • dvruette/openwebtext-tokenized-2k

    Viewer • Updated Dec 19, 2025 • 8.01M • 207

  • dvruette/openwebtext-tokenized-4k

    Viewer • Updated Dec 19, 2025 • 8.01M • 4

  • dvruette/openwebtext-tokenized-8k

    Viewer • Updated Dec 19, 2025 • 8.01M • 167

  • dvruette/openwebtext-tokenized-16k

    Viewer • Updated Dec 19, 2025 • 8.01M • 10

  • dvruette/openwebtext-tokenized-33k

    Viewer • Updated Dec 19, 2025 • 8.01M • 28

  • dvruette/openwebtext-tokenized-66k

    Viewer • Updated Dec 19, 2025 • 8.01M • 29

  • dvruette/openwebtext-tokenized-131k

    Viewer • Updated Dec 19, 2025 • 8.01M • 105
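
The repositories listed above follow a regular naming pattern: one `openwebtext-bpe-<size>` tokenizer and one `openwebtext-tokenized-<size>` dataset per vocabulary size. The sketch below shows one way to pair them up and load them. The helper names `repo_ids` and `load_pair` are hypothetical, and the loader rests on two assumptions the listing does not confirm: that the tokenizer repositories can be read by `AutoTokenizer`, and that the datasets expose a `train` split.

```python
from typing import Tuple

# Vocabulary-size suffixes exactly as they appear in the collection listing.
VOCAB_SIZES = ("1k", "2k", "4k", "8k", "16k", "33k", "66k", "131k")


def repo_ids(size: str, owner: str = "dvruette") -> Tuple[str, str]:
    """Return (tokenizer_repo, dataset_repo) for a vocabulary-size suffix."""
    if size not in VOCAB_SIZES:
        raise ValueError(f"unknown vocab size {size!r}; expected one of {VOCAB_SIZES}")
    return (
        f"{owner}/openwebtext-bpe-{size}",
        f"{owner}/openwebtext-tokenized-{size}",
    )


def load_pair(size: str, split: str = "train"):
    """Load a tokenizer and stream the matching pre-tokenized dataset.

    Assumptions (not confirmed by the listing): the tokenizer repo is
    loadable with AutoTokenizer, and the dataset has a split named `split`.
    """
    # Imported lazily so repo_ids() works without these packages installed.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    tok_repo, data_repo = repo_ids(size)
    tokenizer = AutoTokenizer.from_pretrained(tok_repo)
    # Streaming avoids downloading the full dataset before iterating.
    dataset = load_dataset(data_repo, split=split, streaming=True)
    return tokenizer, dataset
```

Calling `load_pair("8k")` would then return the 8k tokenizer together with an iterable over the matching dataset, provided the assumptions above hold. Inspect the first record to see the actual column names before relying on them.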