ColBERT MUVERA Pico

This is a PyLate model finetuned from neuml/bert-hash-pico on the unnormalized split of the msmarco-en-bge-gemma dataset. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity with the MaxSim operator.

This model is trained with un-normalized scores, making it compatible with MUVERA fixed-dimensional encoding.
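For intuition, MaxSim scores a query against a document by taking, for each query token vector, the maximum similarity over the document's token vectors and summing those maxima. Below is a minimal sketch with NumPy, using random vectors in place of real token embeddings:

import numpy as np

def maxsim(query_tokens, document_tokens):
    # Sum over query tokens of the best-matching document token similarity
    similarities = query_tokens @ document_tokens.T  # (num_query_tokens, num_doc_tokens)
    return similarities.max(axis=1).sum()

rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))      # 8 query token vectors
document = rng.normal(size=(50, 128))  # 50 document token vectors
print(maxsim(query, document))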

Usage (txtai)

This model can be used to build embeddings databases with txtai for semantic search and/or as a knowledge source for retrieval augmented generation (RAG).

Note: txtai 9.0+ is required for late interaction model support

import txtai

embeddings = txtai.Embeddings(
  sparse="neuml/colbert-muvera-pico",
  content=True
)
embeddings.index(documents())

# Run a query
embeddings.search("query to run")
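Here, documents() stands in for any iterable of documents; a minimal sketch using (id, text, tags) tuples:

def documents():
    # Any iterable works: plain strings, (id, text, tags) tuples or dicts
    yield (0, "first document to index", None)
    yield (1, "second document to index", None)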

Late interaction models also work well as rerankers via the Reranker pipeline.

from txtai.pipeline import Reranker, Similarity

# Rerank results from the embeddings index created above
similarity = Similarity(path="neuml/colbert-muvera-pico", lateencode=True)
ranker = Reranker(embeddings, similarity)
ranker("query to run")

Usage (PyLate)

Alternatively, the model can be loaded with PyLate.

from pylate import rank, models

queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# Load the model as a PyLate ColBERT encoder
model = models.ColBERT(
    model_name_or_path="neuml/colbert-muvera-pico",
)

# Encode queries and documents (is_query toggles query vs. document encoding)
queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

# Rerank each candidate list against its query with MaxSim
reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
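PyLate can also build a standalone index for first-stage retrieval. The sketch below follows PyLate's documented Voyager index workflow; the folder and index names are placeholders:

from pylate import indexes, models, retrieve

model = models.ColBERT(model_name_or_path="neuml/colbert-muvera-pico")

# Index document token embeddings in a Voyager (HNSW) index
index = indexes.Voyager(index_folder="pylate-index", index_name="index", override=True)
documents_embeddings = model.encode(["document A", "document B"], is_query=False)
index.add_documents(documents_ids=["A", "B"], documents_embeddings=documents_embeddings)

# Retrieve the top-k documents for each query with MaxSim scoring
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(["query A"], is_query=True)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=10)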

Full Model Architecture

ColBERT(
  (0): Transformer({'max_seq_length': 299, 'do_lower_case': False}) with Transformer model: BertHashModel 
  (1): Dense({'in_features': 80, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
)

Evaluation

BEIR Subset

The following tables show a subset of BEIR scored with the txtai benchmarks script.

Scores reported are ndcg@10, grouped into the following three categories.

Full multi-vector MaxSim

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|
| ColBERT v2 | 110M | 0.3165 | 0.1497 | 0.6456 | 0.3706 |
| ColBERT MUVERA Femto | 0.2M | 0.2513 | 0.0870 | 0.4710 | 0.2698 |
| ColBERT MUVERA Pico | 0.4M | 0.3005 | 0.1117 | 0.6452 | 0.3525 |
| ColBERT MUVERA Nano | 0.9M | 0.3180 | 0.1262 | 0.6576 | 0.3673 |
| ColBERT MUVERA Micro | 4M | 0.3235 | 0.1244 | 0.6676 | 0.3718 |

MUVERA encoding + MaxSim re-ranking of the top 100 results (per the MUVERA paper)

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|
| ColBERT v2 | 110M | 0.3025 | 0.1538 | 0.6278 | 0.3614 |
| ColBERT MUVERA Femto | 0.2M | 0.2316 | 0.0858 | 0.4641 | 0.2605 |
| ColBERT MUVERA Pico | 0.4M | 0.2821 | 0.1004 | 0.6090 | 0.3305 |
| ColBERT MUVERA Nano | 0.9M | 0.2996 | 0.1201 | 0.6249 | 0.3482 |
| ColBERT MUVERA Micro | 4M | 0.3095 | 0.1228 | 0.6464 | 0.3596 |

MUVERA encoding only

| Model | Parameters | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|
| ColBERT v2 | 110M | 0.2356 | 0.1229 | 0.5002 | 0.2862 |
| ColBERT MUVERA Femto | 0.2M | 0.1851 | 0.0411 | 0.3518 | 0.1927 |
| ColBERT MUVERA Pico | 0.4M | 0.1926 | 0.0564 | 0.4424 | 0.2305 |
| ColBERT MUVERA Nano | 0.9M | 0.2355 | 0.0807 | 0.4904 | 0.2689 |
| ColBERT MUVERA Micro | 4M | 0.2348 | 0.0882 | 0.4875 | 0.2702 |

Note: These scores don't match the scores reported in the respective papers due to different default settings in the txtai benchmarks script.

As noted earlier, models trained with min-max score normalization don't perform well with MUVERA encoding. See this GitHub Issue for more.

At 450K parameters, this model does shockingly well! It's not far off from the baseline 4M-parameter model at roughly a tenth of the size, and it's also not far off from the original ColBERT v2 model, which has 110M parameters.
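To give a rough feel for the MUVERA-only rows, fixed-dimensional encoding collapses a bag of token vectors into a single vector by hashing each token into a bucket with random hyperplanes (SimHash), aggregating per bucket and concatenating the buckets. The sketch below is a toy, single-repetition approximation of that idea, not the exact encoder used in these benchmarks:

import numpy as np

def fde(token_vectors, planes, is_query=True):
    # Toy MUVERA-style fixed-dimensional encoding: bucket token vectors by
    # SimHash sign pattern, aggregate per bucket, concatenate the buckets
    k, dim = planes.shape[0], token_vectors.shape[1]
    bits = (token_vectors @ planes.T) > 0                # (num_tokens, k)
    bucket_ids = bits.astype(int) @ (1 << np.arange(k))  # bucket per token
    buckets = np.zeros((2 ** k, dim))
    counts = np.zeros(2 ** k)
    for vector, bucket in zip(token_vectors, bucket_ids):
        buckets[bucket] += vector
        counts[bucket] += 1
    if not is_query:                                     # documents average within a bucket
        buckets /= np.maximum(counts, 1)[:, None]
    return buckets.reshape(-1)                           # single (2**k * dim) vector

rng = np.random.default_rng(0)
planes = rng.normal(size=(3, 128))  # 3 hyperplanes -> 8 buckets
query_fde = fde(rng.normal(size=(8, 128)), planes, is_query=True)
doc_fde = fde(rng.normal(size=(40, 128)), planes, is_query=False)
print(query_fde @ doc_fde)          # single dot product approximating MaxSim

The middle table corresponds to using this kind of single-vector score to fetch the top 100 candidates and then re-scoring them exactly with MaxSim.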

Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator
| Metric | Value |
|---|---|
| MaxSim_accuracy@1 | 0.4826 |
| MaxSim_accuracy@3 | 0.6368 |
| MaxSim_accuracy@5 | 0.7015 |
| MaxSim_accuracy@10 | 0.7585 |
| MaxSim_precision@1 | 0.4826 |
| MaxSim_precision@3 | 0.2979 |
| MaxSim_precision@5 | 0.2345 |
| MaxSim_precision@10 | 0.1649 |
| MaxSim_recall@1 | 0.2728 |
| MaxSim_recall@3 | 0.4051 |
| MaxSim_recall@5 | 0.4649 |
| MaxSim_recall@10 | 0.532 |
| MaxSim_ndcg@10 | 0.5069 |
| MaxSim_mrr@10 | 0.5733 |
| MaxSim_map@100 | 0.4287 |
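The NanoBEIR numbers above can be reproduced with PyLate's evaluator. A minimal sketch, assuming the evaluator follows the usual Sentence Transformers interface of instantiating it and then calling it with the model:

from pylate import models
from pylate.evaluation.nano_beir_evaluator import NanoBEIREvaluator

model = models.ColBERT(model_name_or_path="neuml/colbert-muvera-pico")

# Downloads the NanoBEIR datasets and reports the MaxSim metrics shown above
evaluator = NanoBEIREvaluator()
results = evaluator(model)
print(results)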

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 0.0003
  • num_train_epochs: 1
  • warmup_ratio: 0.05
  • fp16: True
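These non-default values map directly onto Sentence Transformers training arguments, which PyLate training typically reuses. A minimal sketch, with output_dir as a placeholder:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/colbert-muvera-pico",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    learning_rate=3e-4,
    num_train_epochs=1,
    warmup_ratio=0.05,
    fp16=True,
)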

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0003
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 4.0.2
  • PyLate: 1.3.2
  • Transformers: 4.57.0
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084"
}

PyLate

@misc{PyLate,
    title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
    author={Chaffin, Antoine and Sourty, Raphaël},
    url={https://github.com/lightonai/pylate},
    year={2024}
}