xml-base trained on query triplets

This is a sentence-transformers model finetuned from heydariAI/persian-embeddings on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: heydariAI/persian-embeddings
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: fa (Persian)
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
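
The stack is the XLM-R encoder followed by mean pooling over non-padding tokens. As a minimal sketch of what these two modules compute, run directly on the base checkpoint with plain 🤗 Transformers (illustrative only; the fine-tuned weights differ, so values won't match this model's output):

import torch
from transformers import AutoTokenizer, AutoModel

# Sketch of the Transformer -> mean-pooling pipeline shown above.
tokenizer = AutoTokenizer.from_pretrained("heydariAI/persian-embeddings")
encoder = AutoModel.from_pretrained("heydariAI/persian-embeddings")

batch = tokenizer(["پنکه رومیزی"], padding=True, truncation=True,
                  max_length=128, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 1024)

# pooling_mode_mean_tokens: average token vectors, ignoring padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 1024])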

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mjaliz/xml-base-gis-basalam-1MQ")
# Run inference
sentences = [
    'پنکه رومیزی',  # desk fan
    'پنکه رومیزی کوچک',  # small desk fan
    'چراغ رومیزی',  # desk lamp
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8511, 0.2971],
#         [0.8511, 1.0000, 0.2242],
#         [0.2971, 0.2242, 1.0000]])
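
Since the model was trained on query triplets, a natural application is ranking product titles against a search query. A minimal semantic-search sketch, reusing the `model` loaded above; the corpus below is made up for illustration:

from sentence_transformers import util

corpus = [
    'پنکه رومیزی کوچک',  # small desk fan
    'چراغ رومیزی',  # desk lamp
    'شارژر گوشی',  # phone charger
]
corpus_embeddings = model.encode(corpus)

query_embedding = model.encode('پنکه رومیزی')  # desk fan
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])  # list of {'corpus_id': ..., 'score': ...} dicts, best match first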

Evaluation

Metrics

Triplet

Metric           query-dev  query-test
cosine_accuracy  0.9676     0.9668
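
Cosine accuracy is the fraction of triplets for which the anchor is closer to its positive than to its negative under cosine similarity. A sketch of reproducing such a score with the library's TripletEvaluator; the dev_anchors/dev_positives/dev_negatives lists are placeholders for the evaluation split:

from sentence_transformers.evaluation import TripletEvaluator

# dev_anchors, dev_positives, dev_negatives: parallel lists of strings (placeholders).
dev_evaluator = TripletEvaluator(
    anchors=dev_anchors,
    positives=dev_positives,
    negatives=dev_negatives,
    name="query-dev",
)
results = dev_evaluator(model)
print(results["query-dev_cosine_accuracy"])  # reported above as 0.9676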

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 801,402 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
            anchor       positive     negative
    type    string       string       string
    min     3 tokens     4 tokens     4 tokens
    mean    7.99 tokens  9.86 tokens  8.13 tokens
    max     44 tokens    24 tokens    16 tokens
  • Samples (English glosses added in parentheses):
    • anchor: حراجی لباس بچه (kids' clothing sale); positive: لباس بچگانه حراجی (children's clothing on sale); negative: حراجی کفش زنانه (women's shoe sale)
    • anchor: گوشواره طلا دو حلقه اس (gold double-hoop earrings; query cut off mid-word); positive: گوشواره طلا زنانه دو حلقه (women's gold double-hoop earrings); negative: انگشتر طلا زنانه دو بندی (women's gold two-band ring)
    • anchor: redmy a3قاب گوشی (redmy a3 phone case; misspelled query); positive: قاب گوشی مناسب برای گوشی ردمی A3 (phone case for the Redmi A3); negative: شارژر گوشی ردمی A3 (Redmi A3 phone charger)
  • Loss: GISTEmbedLoss with these parameters:
    {
        "guide": "SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')",
        "temperature": 0.0493,
        "margin_strategy": "relative",
        "margin": 0.0516,
        "contrast_anchors": true,
        "contrast_positives": true,
        "gather_across_devices": false
    }
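
GISTEmbedLoss is an in-batch contrastive loss in which the guide model filters out in-batch negatives that it already scores as near-duplicates of the positive. A sketch of constructing the dataset and loss with the parameters above; "train.json" is a placeholder path:

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import GISTEmbedLoss

# The file must yield (anchor, positive, negative) columns; the path is a placeholder.
train_dataset = load_dataset("json", data_files="train.json", split="train")

model = SentenceTransformer("heydariAI/persian-embeddings")
guide = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
loss = GISTEmbedLoss(
    model,
    guide,
    temperature=0.0493,
    margin_strategy="relative",
    margin=0.0516,
)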
    

Evaluation Dataset

json

  • Dataset: json
  • Size: 100,175 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
            anchor       positive     negative
    type    string       string       string
    min     3 tokens     4 tokens     4 tokens
    mean    7.8 tokens   9.86 tokens  8.09 tokens
    max     25 tokens    23 tokens    16 tokens
  • Samples (English glosses added in parentheses):
    • anchor: کراپ تیشرت زنانه ورزشی (women's athletic crop t-shirt); positive: تیشرت کراپ زنانه ورزشی (women's athletic crop t-shirt, reordered); negative: شلوار ورزشی زنانه (women's athletic pants)
    • anchor: فیشیال دستگاه (facial device; inverted word order); positive: دستگاه بخور صورت برای فیشیال (facial steamer device); negative: دستگاه تصفیه هوای خانگی (household air purifier)
    • anchor: پیراهن مشکی مردانه یقه خرگوشی (men's black shirt with bunny-ear collar); positive: پیراهن مردانه مشکی یقه دار طرح خرگوشی (men's black collared shirt in a bunny-ear style); negative: شلوار مشکی مردانه یقه خرگوشی (men's black pants with bunny-ear collar)
  • Loss: GISTEmbedLoss with these parameters:
    {
        "guide": "SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')",
        "temperature": 0.0493,
        "margin_strategy": "relative",
        "margin": 0.0516,
        "contrast_anchors": true,
        "contrast_positives": true,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 1.1701480000238433e-05
  • num_train_epochs: 5
  • warmup_ratio: 0.15873389962653162
  • fp16: True
  • batch_sampler: no_duplicates
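
A hedged reconstruction of the training run from these values with the trainer API; output_dir is a placeholder, and model, train_dataset, eval_dataset, and loss are assumed to be defined as in the loss sketch above:

from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="xml-base-gis-basalam-1MQ",  # placeholder output path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=1.1701480000238433e-05,
    num_train_epochs=5,
    warmup_ratio=0.15873389962653162,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()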

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1.1701480000238433e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.15873389962653162
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 3
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss query-dev_cosine_accuracy query-test_cosine_accuracy
-1 -1 - - 0.8824 -
0.0799 1000 0.2209 0.1147 0.9180 -
0.1597 2000 0.1248 0.0842 0.9316 -
0.2396 3000 0.0962 0.0693 0.9370 -
0.3195 4000 0.0842 0.0611 0.9426 -
0.3993 5000 0.0742 0.0555 0.9458 -
0.4792 6000 0.0681 0.0538 0.9490 -
0.5591 7000 0.0661 0.0498 0.9488 -
0.6389 8000 0.0637 0.0471 0.9525 -
0.7188 9000 0.0609 0.0461 0.9528 -
0.7987 10000 0.0573 0.0452 0.9525 -
0.8785 11000 0.055 0.0449 0.9550 -
0.9584 12000 0.0541 0.0431 0.9556 -
1.0383 13000 0.0553 0.0427 0.9547 -
1.1181 14000 0.053 0.0402 0.9586 -
1.1980 15000 0.0464 0.0401 0.9583 -
1.2779 16000 0.0437 0.0380 0.9586 -
1.3577 17000 0.0426 0.0373 0.9599 -
1.4376 18000 0.038 0.0376 0.9593 -
1.5175 19000 0.037 0.0361 0.9605 -
1.5973 20000 0.0348 0.0364 0.9607 -
1.6772 21000 0.033 0.0349 0.9621 -
1.7570 22000 0.029 0.0347 0.9609 -
1.8369 23000 0.0278 0.0345 0.9617 -
1.9168 24000 0.0261 0.0346 0.9620 -
1.9966 25000 0.0269 0.0334 0.9626 -
2.0765 26000 0.0267 0.0335 0.9632 -
2.1564 27000 0.0246 0.0333 0.9643 -
2.2362 28000 0.0227 0.0330 0.9629 -
2.3161 29000 0.0224 0.0327 0.9642 -
2.3960 30000 0.0209 0.0325 0.9642 -
2.4758 31000 0.0195 0.0330 0.9648 -
2.5557 32000 0.0191 0.0327 0.9652 -
2.6356 33000 0.0189 0.0316 0.9643 -
2.7154 34000 0.0165 0.0324 0.9645 -
2.7953 35000 0.015 0.0309 0.9644 -
2.8752 36000 0.0142 0.0323 0.9654 -
2.9550 37000 0.0139 0.0316 0.9646 -
3.0349 38000 0.0151 0.0303 0.9650 -
3.1148 39000 0.0145 0.0307 0.9664 -
3.1946 40000 0.0128 0.0303 0.9656 -
3.2745 41000 0.0127 0.0300 0.9659 -
3.3544 42000 0.0125 0.0305 0.9663 -
3.4342 43000 0.0106 0.0305 0.9661 -
3.5141 44000 0.011 0.0308 0.9670 -
3.5940 45000 0.0105 0.0295 0.9665 -
3.6738 46000 0.0101 0.0297 0.9666 -
3.7537 47000 0.0091 0.0299 0.9667 -
3.8336 48000 0.009 0.0297 0.9666 -
3.9134 49000 0.0082 0.0298 0.9662 -
3.9933 50000 0.0086 0.0301 0.9668 -
4.0732 51000 0.0087 0.0290 0.9674 -
4.1530 52000 0.0084 0.0287 0.9678 -
4.2329 53000 0.0078 0.0288 0.9667 -
4.3128 54000 0.008 0.0287 0.9669 -
4.3926 55000 0.0074 0.0287 0.9669 -
4.4725 56000 0.007 0.0288 0.9677 -
4.5524 57000 0.0068 0.0288 0.9674 -
4.6322 58000 0.007 0.0282 0.9677 -
4.7121 59000 0.0064 0.0286 0.9678 -
4.7919 60000 0.006 0.0283 0.9675 -
4.8718 61000 0.0059 0.0284 0.9675 -
4.9517 62000 0.0057 0.0284 0.9676 -
-1 -1 - - 0.9676 0.9668

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.55.0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.10.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4
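
To approximately reproduce this environment, the versions above can be pinned at install time (a CUDA-matched PyTorch build may need its own index URL):

pip install "sentence-transformers==5.1.0" "transformers==4.55.0" "torch==2.7.1" "accelerate==1.10.0" "datasets==4.0.0" "tokenizers==0.21.4"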

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}