I'm excited to announce the release of Kanon 2 Embedder, the world's best legal embedding model, ranked first on the Massive Legal Embedding Benchmark ๐
This model is the product of quite literally months of painstaking work alongside @abdurrahmanbutler collecting, cleaning, and processing terabytes of data as well as coming up with novel improvements to the standard embedder training recipe to push the limits of what's possible.
Kanon 2 Embedder is my most advanced model to date. On MLEB, it benchmarks as 9% more accurate than OpenAI's best embedding model and 30% faster.
Even when truncated from 1,792 to 768 dimensions, Kanon 2 Embedder continues to hold the number one spot on MLEB.
Importantly, Kanon 2 Embedder is also privacy and security friendly โ unlike Voyage, Cohere and Jina, none of your data is used to train our models by default.
Kanon 2 Embedder can also be self-hosted for enterprises with heightened security or reliability requirements.