SentenceTransformer based on codersan/FaLabse

This is a sentence-transformers model finetuned from codersan/FaLabse. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: codersan/FaLabse
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
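A quick way to double-check these properties once the model is loaded (a minimal sketch; the printed values should match the description above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codersan/FaLabse_Mizan4")
print(model.max_seq_length)                      # 256
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # "cosine"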


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
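Because the pipeline ends with a Normalize() module, every embedding has unit length, so cosine similarity reduces to a plain dot product. A small illustrative sketch (the example sentences are arbitrary):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codersan/FaLabse_Mizan4")

# The final Normalize() module makes every embedding unit-length
emb = model.encode(["جملهٔ اول برای آزمایش.", "A second test sentence."])
print(np.linalg.norm(emb, axis=1))  # both values are approximately 1.0
print(emb @ emb.T)                  # identical to the cosine-similarity matrix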

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("codersan/FaLabse_Mizan4")
# Run inference
sentences = [
    'If this were continued, the barricade was no longer tenable.',
    'اگر این کار مداومت می\u200cیافت، سنگر قادر به مقاومت نمی\u200cبود.',
    'خوب، در این لحظه او یک محافظ داشت.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
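The same embeddings can also drive semantic search over a corpus. A minimal sketch using util.semantic_search (the corpus, query, and top_k value are purely illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("codersan/FaLabse_Mizan4")

# Illustrative corpus and query
corpus = [
    "سنگر دیگر قادر به مقاومت نبود.",
    "او در آن لحظه یک محافظ داشت.",
    "The weather was unusually warm that spring.",
]
query = "آیا سنگر می‌توانست مقاومت کند؟"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the closest corpus sentence by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}]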

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,021,596 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 16.37 tokens, max: 85 tokens
    • positive: string; min: 3 tokens, mean: 18.63 tokens, max: 81 tokens
  • Samples:
    • anchor: They arose to obey.
      positive: دختران برای اطاعت امر پدر از جا برخاستند.
    • anchor: You'll know it all in time
      positive: همه چیز را بم وقع خواهی دانست.
    • anchor: She is in hysterics up there, and moans and says that we have been 'shamed and disgraced.
      positive: او هر لحظه گرفتار یک‌ وضع است، زارزار گریه می‌کند. می‌گوید به ما توهین کرده‌اند، حیثیتمان را لکه‌دار نمودند.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
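For reference, here is a minimal sketch of how a loss with these parameters is typically constructed in Sentence Transformers, using the sample pairs above (this is not the original training script):

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("codersan/FaLabse")

# Two illustrative (anchor, positive) pairs; the real dataset has ~1.02M rows
train_dataset = Dataset.from_dict({
    "anchor": ["They arose to obey.", "You'll know it all in time"],
    "positive": [
        "دختران برای اطاعت امر پدر از جا برخاستند.",
        "همه چیز را بم وقع خواهی دانست.",
    ],
})

# With in-batch negatives, every other positive in a batch acts as a negative
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)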
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • push_to_hub: True
  • hub_model_id: codersan/FaLabse_Mizan4
  • eval_on_start: True
  • batch_sampler: no_duplicates
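Under Sentence Transformers 3.x, these values map directly onto SentenceTransformerTrainingArguments. A sketch reproducing only the non-default hyperparameters listed above (the output_dir is an assumption):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="FaLabse_Mizan4",                # illustrative output path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    eval_strategy="steps",
    load_best_model_at_end=True,
    eval_on_start=True,
    push_to_hub=True,
    hub_model_id="codersan/FaLabse_Mizan4",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)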

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: codersan/FaLabse_Mizan4
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0 0 -
0.0031 100 0.1023
0.0063 200 0.1162
0.0094 300 0.0976
0.0125 400 0.088
0.0157 500 0.0691
0.0188 600 0.0678
0.0219 700 0.082
0.0251 800 0.08
0.0282 900 0.0758
0.0313 1000 0.0763
0.0345 1100 0.0786
0.0376 1200 0.0666
0.0407 1300 0.0722
0.0439 1400 0.0638
0.0470 1500 0.0615
0.0501 1600 0.0623
0.0532 1700 0.0639
0.0564 1800 0.0692
0.0595 1900 0.0625
0.0626 2000 0.0774
0.0658 2100 0.06
0.0689 2200 0.0543
0.0720 2300 0.0611
0.0752 2400 0.0697
0.0783 2500 0.0703
0.0814 2600 0.058
0.0846 2700 0.075
0.0877 2800 0.062
0.0908 2900 0.0756
0.0940 3000 0.0668
0.0971 3100 0.054
0.1002 3200 0.0626
0.1034 3300 0.0645
0.1065 3400 0.0714
0.1096 3500 0.0644
0.1128 3600 0.0693
0.1159 3700 0.0734
0.1190 3800 0.0622
0.1222 3900 0.0741
0.1253 4000 0.0761
0.1284 4100 0.0582
0.1316 4200 0.0804
0.1347 4300 0.0708
0.1378 4400 0.0734
0.1410 4500 0.0709
0.1441 4600 0.0759
0.1472 4700 0.085
0.1504 4800 0.0573
0.1535 4900 0.056
0.1566 5000 0.0601
0.1597 5100 0.0596
0.1629 5200 0.079
0.1660 5300 0.0679
0.1691 5400 0.0553
0.1723 5500 0.0677
0.1754 5600 0.0795
0.1785 5700 0.0779
0.1817 5800 0.0599
0.1848 5900 0.0667
0.1879 6000 0.064
0.1911 6100 0.0637
0.1942 6200 0.0747
0.1973 6300 0.0829
0.2005 6400 0.0589
0.2036 6500 0.0623
0.2067 6600 0.0589
0.2099 6700 0.0648
0.2130 6800 0.0527
0.2161 6900 0.0519
0.2193 7000 0.0668
0.2224 7100 0.0729
0.2255 7200 0.0627
0.2287 7300 0.0539
0.2318 7400 0.055
0.2349 7500 0.0663
0.2381 7600 0.0589
0.2412 7700 0.0555
0.2443 7800 0.0875
0.2475 7900 0.055
0.2506 8000 0.0584
0.2537 8100 0.0607
0.2569 8200 0.0551
0.2600 8300 0.0527
0.2631 8400 0.0773
0.2662 8500 0.0696
0.2694 8600 0.062
0.2725 8700 0.0716
0.2756 8800 0.06
0.2788 8900 0.0536
0.2819 9000 0.0604
0.2850 9100 0.0563
0.2882 9200 0.0734
0.2913 9300 0.0714
0.2944 9400 0.0658
0.2976 9500 0.0623
0.3007 9600 0.0713
0.3038 9700 0.0674
0.3070 9800 0.0708
0.3101 9900 0.0579
0.3132 10000 0.0616
0.3164 10100 0.0653
0.3195 10200 0.0614
0.3226 10300 0.0626
0.3258 10400 0.0611
0.3289 10500 0.0521
0.3320 10600 0.056
0.3352 10700 0.0761
0.3383 10800 0.0629
0.3414 10900 0.0658
0.3446 11000 0.0576
0.3477 11100 0.0483
0.3508 11200 0.0654
0.3540 11300 0.0602
0.3571 11400 0.065
0.3602 11500 0.0787
0.3634 11600 0.0634
0.3665 11700 0.0678
0.3696 11800 0.0758
0.3727 11900 0.0637
0.3759 12000 0.0577
0.3790 12100 0.0572
0.3821 12200 0.0614
0.3853 12300 0.0685
0.3884 12400 0.0641
0.3915 12500 0.0583
0.3947 12600 0.0502
0.3978 12700 0.0481
0.4009 12800 0.0546
0.4041 12900 0.0664
0.4072 13000 0.0699
0.4103 13100 0.0513
0.4135 13200 0.0423
0.4166 13300 0.0554
0.4197 13400 0.0592
0.4229 13500 0.0457
0.4260 13600 0.0612
0.4291 13700 0.0507
0.4323 13800 0.0592
0.4354 13900 0.0566
0.4385 14000 0.0806
0.4417 14100 0.0648
0.4448 14200 0.0535
0.4479 14300 0.0748
0.4511 14400 0.0488
0.4542 14500 0.0539
0.4573 14600 0.0597
0.4605 14700 0.065
0.4636 14800 0.0594
0.4667 14900 0.05
0.4699 15000 0.0488
0.4730 15100 0.0537
0.4761 15200 0.0396
0.4792 15300 0.0616
0.4824 15400 0.0605
0.4855 15500 0.0599
0.4886 15600 0.0616
0.4918 15700 0.0731
0.4949 15800 0.0654
0.4980 15900 0.0463
0.5012 16000 0.0463
0.5043 16100 0.0594
0.5074 16200 0.0575
0.5106 16300 0.056
0.5137 16400 0.0542
0.5168 16500 0.052
0.5200 16600 0.0438
0.5231 16700 0.0675
0.5262 16800 0.0619
0.5294 16900 0.0515
0.5325 17000 0.0575
0.5356 17100 0.0568
0.5388 17200 0.0508
0.5419 17300 0.059
0.5450 17400 0.0505
0.5482 17500 0.0582
0.5513 17600 0.0574
0.5544 17700 0.0613
0.5576 17800 0.048
0.5607 17900 0.0553
0.5638 18000 0.0571
0.5670 18100 0.0543
0.5701 18200 0.0484
0.5732 18300 0.0763
0.5764 18400 0.056
0.5795 18500 0.0533
0.5826 18600 0.044
0.5857 18700 0.0515
0.5889 18800 0.0516
0.5920 18900 0.0586
0.5951 19000 0.0523
0.5983 19100 0.0733
0.6014 19200 0.0453
0.6045 19300 0.0663
0.6077 19400 0.0381
0.6108 19500 0.0568
0.6139 19600 0.0492
0.6171 19700 0.0489
0.6202 19800 0.0575
0.6233 19900 0.0642
0.6265 20000 0.0535
0.6296 20100 0.0598
0.6327 20200 0.0569
0.6359 20300 0.0513
0.6390 20400 0.0515
0.6421 20500 0.053
0.6453 20600 0.0569
0.6484 20700 0.0372
0.6515 20800 0.0464
0.6547 20900 0.0522
0.6578 21000 0.0427
0.6609 21100 0.0584
0.6641 21200 0.0616
0.6672 21300 0.0552
0.6703 21400 0.0509
0.6735 21500 0.0439
0.6766 21600 0.0762
0.6797 21700 0.0539
0.6829 21800 0.0475
0.6860 21900 0.0557
0.6891 22000 0.0421
0.6922 22100 0.0471
0.6954 22200 0.0398
0.6985 22300 0.0521
0.7016 22400 0.0472
0.7048 22500 0.0579
0.7079 22600 0.0539
0.7110 22700 0.0527
0.7142 22800 0.0677
0.7173 22900 0.0509
0.7204 23000 0.0478
0.7236 23100 0.0593
0.7267 23200 0.0419
0.7298 23300 0.0576
0.7330 23400 0.0485
0.7361 23500 0.0544
0.7392 23600 0.0537
0.7424 23700 0.0481
0.7455 23800 0.0597
0.7486 23900 0.0464
0.7518 24000 0.0537
0.7549 24100 0.0508
0.7580 24200 0.045
0.7612 24300 0.0337
0.7643 24400 0.0478
0.7674 24500 0.0495
0.7706 24600 0.0427
0.7737 24700 0.0596
0.7768 24800 0.0468
0.7800 24900 0.0404
0.7831 25000 0.0467
0.7862 25100 0.0514
0.7894 25200 0.0462
0.7925 25300 0.0401
0.7956 25400 0.0539
0.7987 25500 0.0541
0.8019 25600 0.0639
0.8050 25700 0.0392
0.8081 25800 0.0466
0.8113 25900 0.0543
0.8144 26000 0.0507
0.8175 26100 0.0465
0.8207 26200 0.0386
0.8238 26300 0.0606
0.8269 26400 0.0558
0.8301 26500 0.0488
0.8332 26600 0.0556
0.8363 26700 0.047
0.8395 26800 0.0548
0.8426 26900 0.0423
0.8457 27000 0.0529
0.8489 27100 0.0513
0.8520 27200 0.0432
0.8551 27300 0.0605
0.8583 27400 0.0448
0.8614 27500 0.0508
0.8645 27600 0.0578
0.8677 27700 0.0409
0.8708 27800 0.0487
0.8739 27900 0.058
0.8771 28000 0.0461
0.8802 28100 0.0389
0.8833 28200 0.0427
0.8865 28300 0.0473
0.8896 28400 0.061
0.8927 28500 0.0423
0.8958 28600 0.0435
0.8990 28700 0.0389
0.9021 28800 0.0466
0.9052 28900 0.042
0.9084 29000 0.0466
0.9115 29100 0.0412
0.9146 29200 0.0444
0.9178 29300 0.059
0.9209 29400 0.0466
0.9240 29500 0.0381
0.9272 29600 0.0408
0.9303 29700 0.0557
0.9334 29800 0.0567
0.9366 29900 0.0537
0.9397 30000 0.041
0.9428 30100 0.0383
0.9460 30200 0.0412
0.9491 30300 0.0489
0.9522 30400 0.046
0.9554 30500 0.0525
0.9585 30600 0.0493
0.9616 30700 0.0485
0.9648 30800 0.0532
0.9679 30900 0.0446
0.9710 31000 0.0372
0.9742 31100 0.0472
0.9773 31200 0.0399
0.9804 31300 0.0402
0.9836 31400 0.0372
0.9867 31500 0.0497
0.9898 31600 0.0432
0.9930 31700 0.0382
0.9961 31800 0.0475
0.9992 31900 0.0367
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}