---
license: mit
datasets:
- dleemiller/wiki-sim
- sentence-transformers/stsb
language:
- en
metrics:
- spearmanr
- pearsonr
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- sts
- stsb
- stsbenchmark-sts
model-index:
- name: CrossEncoder based on answerdotai/ModernBERT-base
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.9162245947821821
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9121555789491528
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.9260833551026787
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9236030687487745
      name: Spearman Cosine
---
# ModernBERT Cross-Encoder: Semantic Similarity (STS)

Cross-encoders are high-performing encoder models that compare two texts and output a 0-1 similarity score.
I've found the `cross-encoder/stsb-roberta-large` model to be very useful for building evaluators for LLM outputs.
They're simple to use, fast, and very accurate.

Like many people, I was excited about the architecture and training uplift of ModernBERT (`answerdotai/ModernBERT-base`).
So I've applied it to an STS-B cross-encoder, which is a very handy kind of model. Additionally, I've added
pretraining on `dleemiller/wiki-sim`, a much larger semi-synthetic dataset that targets this kind of objective.
The inference efficiency, expanded context length, and simplicity make this a really nice platform for an evaluator model.

---
## Features
- **High performing:** Achieves **Pearson: 0.9162** and **Spearman: 0.9122** on the STS-Benchmark test set.
- **Efficient architecture:** Based on ModernBERT-base (149M parameters), offering faster inference than older encoders such as `cross-encoder/stsb-roberta-large` (see the Performance table below).
- **Extended context length:** Processes sequences up to 8192 tokens, great for evaluating LLM outputs (see the long-input sketch after this list).
- **Diversified training:** Pretrained on `dleemiller/wiki-sim` and fine-tuned on `sentence-transformers/stsb`.

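
To make the extended context concrete, here's a minimal sketch of scoring a pair of long texts. The repeated sentences are only stand-ins for real long inputs (e.g., a lengthy LLM answer versus a reference), and anything beyond the 8192-token limit is truncated by the tokenizer.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/ModernCE-base-sts")

# Stand-ins for long, multi-paragraph texts such as lengthy LLM outputs.
# Inputs beyond the model's 8192-token limit are truncated by the tokenizer.
reference = " ".join(["The report describes quarterly revenue growth across all regions."] * 200)
candidate = " ".join(["Revenue grew in every region this quarter, according to the report."] * 200)

score = model.predict([(reference, candidate)])[0]
print(f"similarity: {score:.3f}")
```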
---

## Performance

| Model                                     | STS-B Test Pearson | STS-B Test Spearman | Context Length | Parameters | Speed     |
|-------------------------------------------|--------------------|---------------------|----------------|------------|-----------|
| `dleemiller/ModernCE-large-sts`           | **0.9256**         | **0.9215**          | 8192           | 395M       | Medium    |
| `dleemiller/CrossGemma-sts-300m`          | 0.9175             | 0.9135              | 2048           | 303M       | Medium    |
| `dleemiller/ModernCE-base-sts`            | 0.9162             | 0.9122              | 8192           | 149M       | Fast      |
| `cross-encoder/stsb-roberta-large`        | 0.9147             | -                   | 512            | 355M       | Slow      |
| `dleemiller/EttinX-sts-m`                 | 0.9143             | 0.9102              | 8192           | 149M       | Fast      |
| `dleemiller/NeoCE-sts`                    | 0.9124             | 0.9087              | 4096           | 250M       | Fast      |
| `dleemiller/EttinX-sts-s`                 | 0.9004             | 0.8926              | 8192           | 68M        | Very Fast |
| `cross-encoder/stsb-distilroberta-base`   | 0.8792             | -                   | 512            | 82M        | Fast      |
| `dleemiller/EttinX-sts-xs`                | 0.8763             | 0.8689              | 8192           | 32M        | Very Fast |
| `dleemiller/EttinX-sts-xxs`               | 0.8414             | 0.8311              | 8192           | 17M        | Very Fast |
| `dleemiller/sts-bert-hash-nano`           | 0.7904             | 0.7743              | 8192           | 0.97M      | Very Fast |
| `dleemiller/sts-bert-hash-pico`           | 0.7595             | 0.7474              | 8192           | 0.45M      | Very Fast |

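
To sanity-check the numbers for this model, the sketch below scores the STS-Benchmark test split and computes both correlations. It assumes the `sentence-transformers/stsb` test split exposes `sentence1`, `sentence2`, and a 0-1 `score` column, and it uses `scipy` for the statistics; minor numeric differences from the table are expected.

```python
from datasets import load_dataset
from scipy.stats import pearsonr, spearmanr
from sentence_transformers import CrossEncoder

# STS-Benchmark test split; gold scores are normalized to [0, 1].
test = load_dataset("sentence-transformers/stsb", split="test")

model = CrossEncoder("dleemiller/ModernCE-base-sts")
predictions = model.predict(list(zip(test["sentence1"], test["sentence2"])))

print("Pearson: ", pearsonr(predictions, test["score"])[0])
print("Spearman:", spearmanr(predictions, test["score"])[0])
```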
---

## Usage

To use ModernCE for semantic similarity tasks, you can load the model with the Hugging Face `sentence-transformers` library:

```python
from sentence_transformers import CrossEncoder

# Load ModernCE model
model = CrossEncoder("dleemiller/ModernCE-base-sts")

# Predict similarity scores for sentence pairs
sentence_pairs = [
    ("It's a wonderful day outside.", "It's so sunny today!"),
    ("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)

print(scores)  # Outputs: array([0.9184, 0.0123], dtype=float32)
```

### Output
The model returns similarity scores in the range `[0, 1]`, where higher scores indicate stronger semantic similarity.

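
Because the output is a calibrated 0-1 similarity, a common pattern is to use it as a reference-based evaluator for LLM outputs by thresholding the score. The helper below is only an illustrative sketch; the 0.8 threshold is an arbitrary assumption, not a calibrated recommendation.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("dleemiller/ModernCE-base-sts")

def matches_reference(reference: str, candidate: str, threshold: float = 0.8) -> bool:
    """Hypothetical evaluator helper: does a generated answer match the reference?"""
    score = model.predict([(reference, candidate)])[0]
    return bool(score >= threshold)

print(matches_reference(
    "The capital of France is Paris.",
    "Paris is France's capital city.",
))  # True for sufficiently similar answers
```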
---

## Training Details

### Pretraining
The model was pretrained on the `pair-score-sampled` subset of the [`dleemiller/wiki-sim`](https://huggingface.co/datasets/dleemiller/wiki-sim) dataset. This dataset provides diverse sentence pairs with semantic similarity scores, helping the model build a robust understanding of relationships between sentences.
- **Classifier Dropout:** A relatively large classifier dropout of 0.3 was used to reduce over-reliance on the teacher scores.
- **Objective:** Regression against STS-B-style similarity scores produced by `cross-encoder/stsb-roberta-large` (sketched below).

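
For context, the pretraining targets are simply teacher-generated similarity scores, along the lines of this minimal distillation-style sketch. The sentence pairs here are hypothetical placeholders, not the actual `wiki-sim` data pipeline.

```python
from sentence_transformers import CrossEncoder

# Teacher model whose STS-B-style scores serve as the pretraining targets.
teacher = CrossEncoder("cross-encoder/stsb-roberta-large")

# Hypothetical unlabeled pairs; in practice, pairs come from dleemiller/wiki-sim.
pairs = [
    ("The committee approved the budget.", "The budget was approved by the committee."),
    ("The committee approved the budget.", "The weather was cold last night."),
]

# The teacher's scores become regression labels for the student model.
labels = teacher.predict(pairs)
print(labels)
```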
### Fine-Tuning
Fine-tuning was performed on the [`sentence-transformers/stsb`](https://huggingface.co/datasets/sentence-transformers/stsb) dataset.

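
For reference, a fine-tuning run on `sentence-transformers/stsb` can be sketched with the classic `CrossEncoder.fit` API as below. This is a minimal illustration, not the exact training script: the released model starts from the wiki-sim-pretrained checkpoint rather than the raw base model, the hyperparameters shown are assumptions, and newer `sentence-transformers` releases favor the `CrossEncoderTrainer` API instead.

```python
from torch.utils.data import DataLoader
from datasets import load_dataset
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

def to_examples(split):
    # Build InputExample pairs with 0-1 similarity labels from the STS-B split.
    ds = load_dataset("sentence-transformers/stsb", split=split)
    return [InputExample(texts=[s1, s2], label=score)
            for s1, s2, score in zip(ds["sentence1"], ds["sentence2"], ds["score"])]

# Illustrative starting point; the released model was first pretrained on wiki-sim.
model = CrossEncoder("answerdotai/ModernBERT-base", num_labels=1)

model.fit(
    train_dataloader=DataLoader(to_examples("train"), shuffle=True, batch_size=32),
    evaluator=CECorrelationEvaluator.from_input_examples(to_examples("validation"), name="sts-dev"),
    epochs=4,          # assumed values, not the actual training recipe
    warmup_steps=100,
    output_path="ModernCE-base-sts-finetuned",
)
```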
### Test Results
The model achieved the following test set performance after fine-tuning:
- **Pearson Correlation:** 0.9162
- **Spearman Correlation:** 0.9122

---
## Model Card

- **Architecture:** ModernBERT-base
- **Tokenizer:** Inherited from `answerdotai/ModernBERT-base`, built for long-context handling.
- **Pretraining Data:** `dleemiller/wiki-sim` (`pair-score-sampled` subset)
- **Fine-Tuning Data:** `sentence-transformers/stsb`

---
## Thank You

Thanks to the AnswerAI team for providing the ModernBERT models, and the Sentence Transformers team for their leadership in transformer encoder models.

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{moderncestsb2025,
  author = {Miller, D. Lee},
  title = {ModernCE STS: An STS cross encoder model},
  year = {2025},
  publisher = {Hugging Face Hub},
  url = {https://huggingface.co/dleemiller/ModernCE-base-sts},
}
```

---

## License

This model is licensed under the [MIT License](LICENSE).