ColBERT MUVERA Pico
This is a PyLate model finetuned from neuml/bert-hash-pico on the unnormalized split of the msmarco-en-bge-gemma dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity with the MaxSim operator.
This model is trained with unnormalized scores, making it compatible with MUVERA fixed-dimensional encoding.
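For reference, below is a minimal NumPy sketch of the MaxSim operator, using random toy vectors in place of the model's token embeddings (the shapes are illustrative only).

import numpy as np

# Toy token embeddings: 4 query tokens and 9 document tokens, 128 dimensions each
query_embeddings = np.random.randn(4, 128)
document_embeddings = np.random.randn(9, 128)

# MaxSim: for each query token, take the best-matching document token,
# then sum those maxima into a single (unnormalized) relevance score
token_scores = query_embeddings @ document_embeddings.T
score = token_scores.max(axis=1).sum()
print(score)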
Usage (txtai)
This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG).
Note: txtai 9.0+ is required for late interaction model support
import txtai

# Create an embeddings database backed by this late interaction model
embeddings = txtai.Embeddings(
  sparse="neuml/colbert-muvera-pico",
  content=True
)

# Index an iterable of documents (for example, (id, text) tuples or dicts)
embeddings.index(documents())

# Run a query
embeddings.search("query to run")
Late interaction models also excel when used in reranking pipelines.
from txtai.pipeline import Reranker, Similarity

# Late interaction similarity scoring
similarity = Similarity(path="neuml/colbert-muvera-pico", lateencode=True)

# Rerank embeddings database query results with the similarity pipeline
ranker = Reranker(embeddings, similarity)
ranker("query to run")
Usage (PyLate)
Alternatively, the model can be loaded with PyLate.
from pylate import rank, models

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# Load the model
model = models.ColBERT(
    model_name_or_path="neuml/colbert-muvera-pico",
)

# Encode queries and documents into per-token embeddings
queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

# Rerank each query's candidate documents with MaxSim
reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
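reranked_documents holds one list per query, sorted by score; a small usage sketch, assuming PyLate's documented output of id/score dictionaries per document:

for query, results in zip(queries, reranked_documents):
    print(query, [(result["id"], result["score"]) for result in results])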
Full Model Architecture
ColBERT(
  (0): Transformer({'max_seq_length': 299, 'do_lower_case': False}) with Transformer model: BertHashModel
  (1): Dense({'in_features': 80, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
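As a quick sanity check, here is a hedged sketch of inspecting the per-token output dimensionality with PyLate (the input text is arbitrary):

from pylate import models

model = models.ColBERT(model_name_or_path="neuml/colbert-muvera-pico")

# Each document is encoded into one vector per token;
# the Dense layer projects the 80-dim transformer outputs to 128 dimensions
embeddings = model.encode(["a short test passage"], is_query=False)
print(embeddings[0].shape)  # expected: (num_tokens, 128)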
Evaluation
BEIR Subset
The following tables show a subset of BEIR scored with the txtai benchmarks script.
Scores reported are ndcg@10, grouped into the following three categories.
FULL multi-vector maxsim
| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |
MUVERA encoding + maxsim re-ranking of the top 100 results per MUVERA paper
| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |
MUVERA encoding only
| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |
Note: The scores reported here don't match those in the respective papers due to different default settings in the txtai benchmark scripts.
As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.
At 450K parameters, this model does shockingly well! It's not too far off from the baseline 4M parameter model at 1/10th the size. It's also not too far off from the original ColBERT v2 model, which has 110M parameters.
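To make the MUVERA categories concrete, here is a heavily simplified fixed-dimensional encoding sketch in the spirit of the MUVERA paper (single repetition, no final projection or fill-empty handling, random toy vectors); it is not the encoder used in the benchmarks above.

import numpy as np

def fixed_dimensional_encoding(token_embeddings, hyperplanes, is_query):
    # Bucket each token vector by the signs of its random hyperplane projections (SimHash)
    signs = (token_embeddings @ hyperplanes.T) > 0
    bucket_ids = signs.astype(int) @ (1 << np.arange(hyperplanes.shape[0]))

    # Queries sum each bucket, documents average it, so the dot product of the
    # two encodings approximates the multi-vector MaxSim score
    parts = []
    for bucket in range(2 ** hyperplanes.shape[0]):
        members = token_embeddings[bucket_ids == bucket]
        if len(members) == 0:
            parts.append(np.zeros(token_embeddings.shape[1]))
        else:
            parts.append(members.sum(axis=0) if is_query else members.mean(axis=0))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
hyperplanes = rng.standard_normal((3, 128))  # 2^3 = 8 buckets

query_fde = fixed_dimensional_encoding(rng.standard_normal((4, 128)), hyperplanes, is_query=True)
document_fde = fixed_dimensional_encoding(rng.standard_normal((9, 128)), hyperplanes, is_query=False)

# Single-vector approximation of the MaxSim score
print(query_fde @ document_fde)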
Nano BEIR
- Dataset: NanoBEIR_mean
- Evaluated with pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator
| Metric | Value |
|---|---|
| MaxSim_accuracy@1 | 0.4826 |
| MaxSim_accuracy@3 | 0.6368 |
| MaxSim_accuracy@5 | 0.7015 |
| MaxSim_accuracy@10 | 0.7585 |
| MaxSim_precision@1 | 0.4826 |
| MaxSim_precision@3 | 0.2979 |
| MaxSim_precision@5 | 0.2345 |
| MaxSim_precision@10 | 0.1649 |
| MaxSim_recall@1 | 0.2728 |
| MaxSim_recall@3 | 0.4051 |
| MaxSim_recall@5 | 0.4649 |
| MaxSim_recall@10 | 0.5320 |
| MaxSim_ndcg@10 | 0.5069 |
| MaxSim_mrr@10 | 0.5733 |
| MaxSim_map@100 | 0.4287 |
Training Details
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- learning_rate: 0.0003
- num_train_epochs: 1
- warmup_ratio: 0.05
- fp16: True
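For reference, a minimal sketch of how these non-default hyperparameters could be expressed as Sentence Transformers training arguments; the output directory is a hypothetical placeholder, and dataset preparation, loss, and trainer setup are omitted.

from sentence_transformers import SentenceTransformerTrainingArguments

# Non-default hyperparameters from the list above; output_dir is a placeholder
args = SentenceTransformerTrainingArguments(
    output_dir="output/colbert-muvera-pico",
    eval_strategy="steps",
    per_device_train_batch_size=32,
    learning_rate=3e-4,
    num_train_epochs=1,
    warmup_ratio=0.05,
    fp16=True,
)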
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 0.0003
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.05
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Framework Versions
- Python: 3.10.18
- Sentence Transformers: 4.0.2
- PyLate: 1.3.2
- Transformers: 4.57.0
- PyTorch: 2.8.0+cu128
- Accelerate: 1.10.1
- Datasets: 4.1.1
- Tokenizers: 0.22.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084"
}
PyLate
@misc{PyLate,
title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
author={Chaffin, Antoine and Sourty, Raphaël},
url={https://github.com/lightonai/pylate},
year={2024}
}