Chinese version

About TexTeller

  • 📮[2024-03-25] TexTeller 2.0 released! The training data for TexTeller 2.0 has grown to 7.5M samples (about 15 times that of TexTeller 1.0, with improved data quality). The trained TexTeller 2.0 model performs better on the test set, especially when recognizing rare symbols, complex multi-line formulas, and matrices.

    More test images, along with a side-by-side comparison of recognition models from different companies, are available here.

TexTeller is a ViT-based model designed for end-to-end formula recognition. It can recognize formulas in natural images and convert them into LaTeX-style formulas.
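
To make "ViT-based" a little more concrete: a Vision Transformer encoder generally starts by cutting the input image into fixed-size patches and flattening each one into a token. The snippet below is a minimal pure-Python sketch of that patching step only. The helper name `split_into_patches`, the toy 4x4 image, and the patch size of 2 are illustrative assumptions; TexTeller's actual preprocessing and patch size are not stated in this card.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def split_into_patches(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W image into flattened patch x patch tiles, row-major.

    Hypothetical helper for illustration; not part of TexTeller.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile into a single token, row by row.
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            tokens.append(tile)
    return tokens


# A toy 4x4 "image" whose pixel value encodes its position.
toy = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tokens = split_into_patches(toy, patch=2)
print(len(tokens), tokens[0])  # 4 [0.0, 1.0, 4.0, 5.0]
```

In a real ViT each flattened patch is then linearly projected and combined with a position embedding before entering the transformer; for formula recognition, a decoder would generate LaTeX tokens from those encoded patches.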

TexTeller was trained on a larger dataset of image-formula pairs (a 550K dataset available here) and shows stronger generalization and higher accuracy than LaTeX-OCR, which uses approximately 100K data points. This larger dataset enables TexTeller to cover most usage scenarios more effectively.
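
As a quick sanity check on the dataset sizes quoted in this card, the ratios can be worked out directly. This sketch assumes the 550K dataset is the one used for TexTeller 1.0, which the card implies but does not state outright; the variable names are illustrative.

```python
# Dataset sizes as quoted in this card (sample counts).
# Assumption: the 550K dataset corresponds to TexTeller 1.0.
texteller_v2 = 7_500_000
texteller_v1 = 550_000
latex_ocr = 100_000

v2_over_v1 = texteller_v2 / texteller_v1
v1_over_latex_ocr = texteller_v1 / latex_ocr
print(f"{v2_over_v1:.1f}x, {v1_over_latex_ocr:.1f}x")  # 13.6x, 5.5x
```

Under that assumption, TexTeller 2.0 was trained on roughly 14 times as much data as TexTeller 1.0, in line with the "about 15 times" figure, and TexTeller 1.0 on about 5.5 times as much as LaTeX-OCR.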

For more details, please refer to the TexTeller GitHub repository.

Model size: 0.3B params (Safetensors)
Tensor type: F32

Dataset used to train Brian314/TexTeller