SentenceTransformer based on codersan/FaLabse

This is a sentence-transformers model finetuned from codersan/FaLabse. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: codersan/FaLabse
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
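A quick way to double-check these properties once the model is loaded (a minimal sketch; the printed values should match the description above):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codersan/FaLabse_Mizan4")
print(model.max_seq_length)                      # 256
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # "cosine"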


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
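Because the pipeline ends with a Normalize() module, every embedding has unit length, so cosine similarity reduces to a plain dot product. A small illustrative sketch (the example sentences are arbitrary):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("codersan/FaLabse_Mizan4")

# The final Normalize() module makes every embedding unit-length
emb = model.encode(["جملهٔ اول برای آزمایش.", "A second test sentence."])
print(np.linalg.norm(emb, axis=1))  # both values are approximately 1.0
print(emb @ emb.T)                  # identical to the cosine-similarity matrix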

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("codersan/FaLabse_Mizan4")
# Run inference
sentences = [
    'If this were continued, the barricade was no longer tenable.',
    'اگر این کار مداومت می\u200cیافت، سنگر قادر به مقاومت نمی\u200cبود.',
    'خوب، در این لحظه او یک محافظ داشت.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
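The same embeddings can also drive semantic search over a corpus. A minimal sketch using util.semantic_search (the corpus, query, and top_k value are purely illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("codersan/FaLabse_Mizan4")

# Illustrative corpus and query
corpus = [
    "سنگر دیگر قادر به مقاومت نبود.",
    "او در آن لحظه یک محافظ داشت.",
    "The weather was unusually warm that spring.",
]
query = "آیا سنگر می‌توانست مقاومت کند؟"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the closest corpus sentence by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
print(hits[0])  # e.g. [{'corpus_id': 0, 'score': ...}]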

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,021,596 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 16.37 tokens, max: 85 tokens
    • positive: string; min: 3 tokens, mean: 18.63 tokens, max: 81 tokens
  • Samples:
    • anchor: They arose to obey.
      positive: دختران برای اطاعت امر پدر از جا برخاستند.
    • anchor: You'll know it all in time
      positive: همه چیز را بم وقع خواهی دانست.
    • anchor: She is in hysterics up there, and moans and says that we have been 'shamed and disgraced.
      positive: او هر لحظه گرفتار یک‌ وضع است، زارزار گریه می‌کند. می‌گوید به ما توهین کرده‌اند، حیثیتمان را لکه‌دار نمودند.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
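For reference, here is a minimal sketch of how a loss with these parameters is typically constructed in Sentence Transformers, using the sample pairs above (this is not the original training script):

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("codersan/FaLabse")

# Two illustrative (anchor, positive) pairs; the real dataset has ~1.02M rows
train_dataset = Dataset.from_dict({
    "anchor": ["They arose to obey.", "You'll know it all in time"],
    "positive": [
        "دختران برای اطاعت امر پدر از جا برخاستند.",
        "همه چیز را بم وقع خواهی دانست.",
    ],
})

# With in-batch negatives, every other positive in a batch acts as a negative
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)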
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • push_to_hub: True
  • hub_model_id: codersan/FaLabse_Mizan4
  • eval_on_start: True
  • batch_sampler: no_duplicates
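Under Sentence Transformers 3.x, these values map directly onto SentenceTransformerTrainingArguments. A sketch reproducing only the non-default hyperparameters listed above (the output_dir is an assumption):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="FaLabse_Mizan4",                # illustrative output path
    num_train_epochs=1,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    eval_strategy="steps",
    load_best_model_at_end=True,
    eval_on_start=True,
    push_to_hub=True,
    hub_model_id="codersan/FaLabse_Mizan4",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate texts within a batch
)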

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: codersan/FaLabse_Mizan4
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0 0 -
0.0031 100 0.1023
0.0063 200 0.1162
0.0094 300 0.0976
0.0125 400 0.088
0.0157 500 0.0691
0.0188 600 0.0678
0.0219 700 0.082
0.0251 800 0.08
0.0282 900 0.0758
0.0313 1000 0.0763
0.0345 1100 0.0786
0.0376 1200 0.0666
0.0407 1300 0.0722
0.0439 1400 0.0638
0.0470 1500 0.0615
0.0501 1600 0.0623
0.0532 1700 0.0639
0.0564 1800 0.0692
0.0595 1900 0.0625
0.0626 2000 0.0774
0.0658 2100 0.06
0.0689 2200 0.0543
0.0720 2300 0.0611
0.0752 2400 0.0697
0.0783 2500 0.0703
0.0814 2600 0.058
0.0846 2700 0.075
0.0877 2800 0.062
0.0908 2900 0.0756
0.0940 3000 0.0668
0.0971 3100 0.054
0.1002 3200 0.0626
0.1034 3300 0.0645
0.1065 3400 0.0714
0.1096 3500 0.0644
0.1128 3600 0.0693
0.1159 3700 0.0734
0.1190 3800 0.0622
0.1222 3900 0.0741
0.1253 4000 0.0761
0.1284 4100 0.0582
0.1316 4200 0.0804
0.1347 4300 0.0708
0.1378 4400 0.0734
0.1410 4500 0.0709
0.1441 4600 0.0759
0.1472 4700 0.085
0.1504 4800 0.0573
0.1535 4900 0.056
0.1566 5000 0.0601
0.1597 5100 0.0596
0.1629 5200 0.079
0.1660 5300 0.0679
0.1691 5400 0.0553
0.1723 5500 0.0677
0.1754 5600 0.0795
0.1785 5700 0.0779
0.1817 5800 0.0599
0.1848 5900 0.0667
0.1879 6000 0.064
0.1911 6100 0.0637
0.1942 6200 0.0747
0.1973 6300 0.0829
0.2005 6400 0.0589
0.2036 6500 0.0623
0.2067 6600 0.0589
0.2099 6700 0.0648
0.2130 6800 0.0527
0.2161 6900 0.0519
0.2193 7000 0.0668
0.2224 7100 0.0729
0.2255 7200 0.0627
0.2287 7300 0.0539
0.2318 7400 0.055
0.2349 7500 0.0663
0.2381 7600 0.0589
0.2412 7700 0.0555
0.2443 7800 0.0875
0.2475 7900 0.055
0.2506 8000 0.0584
0.2537 8100 0.0607
0.2569 8200 0.0551
0.2600 8300 0.0527
0.2631 8400 0.0773
0.2662 8500 0.0696
0.2694 8600 0.062
0.2725 8700 0.0716
0.2756 8800 0.06
0.2788 8900 0.0536
0.2819 9000 0.0604
0.2850 9100 0.0563
0.2882 9200 0.0734
0.2913 9300 0.0714
0.2944 9400 0.0658
0.2976 9500 0.0623
0.3007 9600 0.0713
0.3038 9700 0.0674
0.3070 9800 0.0708
0.3101 9900 0.0579
0.3132 10000 0.0616
0.3164 10100 0.0653
0.3195 10200 0.0614
0.3226 10300 0.0626
0.3258 10400 0.0611
0.3289 10500 0.0521
0.3320 10600 0.056
0.3352 10700 0.0761
0.3383 10800 0.0629
0.3414 10900 0.0658
0.3446 11000 0.0576
0.3477 11100 0.0483
0.3508 11200 0.0654
0.3540 11300 0.0602
0.3571 11400 0.065
0.3602 11500 0.0787
0.3634 11600 0.0634
0.3665 11700 0.0678
0.3696 11800 0.0758
0.3727 11900 0.0637
0.3759 12000 0.0577
0.3790 12100 0.0572
0.3821 12200 0.0614
0.3853 12300 0.0685
0.3884 12400 0.0641
0.3915 12500 0.0583
0.3947 12600 0.0502
0.3978 12700 0.0481
0.4009 12800 0.0546
0.4041 12900 0.0664
0.4072 13000 0.0699
0.4103 13100 0.0513
0.4135 13200 0.0423
0.4166 13300 0.0554
0.4197 13400 0.0592
0.4229 13500 0.0457
0.4260 13600 0.0612
0.4291 13700 0.0507
0.4323 13800 0.0592
0.4354 13900 0.0566
0.4385 14000 0.0806
0.4417 14100 0.0648
0.4448 14200 0.0535
0.4479 14300 0.0748
0.4511 14400 0.0488
0.4542 14500 0.0539
0.4573 14600 0.0597
0.4605 14700 0.065
0.4636 14800 0.0594
0.4667 14900 0.05
0.4699 15000 0.0488
0.4730 15100 0.0537
0.4761 15200 0.0396
0.4792 15300 0.0616
0.4824 15400 0.0605
0.4855 15500 0.0599
0.4886 15600 0.0616
0.4918 15700 0.0731
0.4949 15800 0.0654
0.4980 15900 0.0463
0.5012 16000 0.0463
0.5043 16100 0.0594
0.5074 16200 0.0575
0.5106 16300 0.056
0.5137 16400 0.0542
0.5168 16500 0.052
0.5200 16600 0.0438
0.5231 16700 0.0675
0.5262 16800 0.0619
0.5294 16900 0.0515
0.5325 17000 0.0575
0.5356 17100 0.0568
0.5388 17200 0.0508
0.5419 17300 0.059
0.5450 17400 0.0505
0.5482 17500 0.0582
0.5513 17600 0.0574
0.5544 17700 0.0613
0.5576 17800 0.048
0.5607 17900 0.0553
0.5638 18000 0.0571
0.5670 18100 0.0543
0.5701 18200 0.0484
0.5732 18300 0.0763
0.5764 18400 0.056
0.5795 18500 0.0533
0.5826 18600 0.044
0.5857 18700 0.0515
0.5889 18800 0.0516
0.5920 18900 0.0586
0.5951 19000 0.0523
0.5983 19100 0.0733
0.6014 19200 0.0453
0.6045 19300 0.0663
0.6077 19400 0.0381
0.6108 19500 0.0568
0.6139 19600 0.0492
0.6171 19700 0.0489
0.6202 19800 0.0575
0.6233 19900 0.0642
0.6265 20000 0.0535
0.6296 20100 0.0598
0.6327 20200 0.0569
0.6359 20300 0.0513
0.6390 20400 0.0515
0.6421 20500 0.053
0.6453 20600 0.0569
0.6484 20700 0.0372
0.6515 20800 0.0464
0.6547 20900 0.0522
0.6578 21000 0.0427
0.6609 21100 0.0584
0.6641 21200 0.0616
0.6672 21300 0.0552
0.6703 21400 0.0509
0.6735 21500 0.0439
0.6766 21600 0.0762
0.6797 21700 0.0539
0.6829 21800 0.0475
0.6860 21900 0.0557
0.6891 22000 0.0421
0.6922 22100 0.0471
0.6954 22200 0.0398
0.6985 22300 0.0521
0.7016 22400 0.0472
0.7048 22500 0.0579
0.7079 22600 0.0539
0.7110 22700 0.0527
0.7142 22800 0.0677
0.7173 22900 0.0509
0.7204 23000 0.0478
0.7236 23100 0.0593
0.7267 23200 0.0419
0.7298 23300 0.0576
0.7330 23400 0.0485
0.7361 23500 0.0544
0.7392 23600 0.0537
0.7424 23700 0.0481
0.7455 23800 0.0597
0.7486 23900 0.0464
0.7518 24000 0.0537
0.7549 24100 0.0508
0.7580 24200 0.045
0.7612 24300 0.0337
0.7643 24400 0.0478
0.7674 24500 0.0495
0.7706 24600 0.0427
0.7737 24700 0.0596
0.7768 24800 0.0468
0.7800 24900 0.0404
0.7831 25000 0.0467
0.7862 25100 0.0514
0.7894 25200 0.0462
0.7925 25300 0.0401
0.7956 25400 0.0539
0.7987 25500 0.0541
0.8019 25600 0.0639
0.8050 25700 0.0392
0.8081 25800 0.0466
0.8113 25900 0.0543
0.8144 26000 0.0507
0.8175 26100 0.0465
0.8207 26200 0.0386
0.8238 26300 0.0606
0.8269 26400 0.0558
0.8301 26500 0.0488
0.8332 26600 0.0556
0.8363 26700 0.047
0.8395 26800 0.0548
0.8426 26900 0.0423
0.8457 27000 0.0529
0.8489 27100 0.0513
0.8520 27200 0.0432
0.8551 27300 0.0605
0.8583 27400 0.0448
0.8614 27500 0.0508
0.8645 27600 0.0578
0.8677 27700 0.0409
0.8708 27800 0.0487
0.8739 27900 0.058
0.8771 28000 0.0461
0.8802 28100 0.0389
0.8833 28200 0.0427
0.8865 28300 0.0473
0.8896 28400 0.061
0.8927 28500 0.0423
0.8958 28600 0.0435
0.8990 28700 0.0389
0.9021 28800 0.0466
0.9052 28900 0.042
0.9084 29000 0.0466
0.9115 29100 0.0412
0.9146 29200 0.0444
0.9178 29300 0.059
0.9209 29400 0.0466
0.9240 29500 0.0381
0.9272 29600 0.0408
0.9303 29700 0.0557
0.9334 29800 0.0567
0.9366 29900 0.0537
0.9397 30000 0.041
0.9428 30100 0.0383
0.9460 30200 0.0412
0.9491 30300 0.0489
0.9522 30400 0.046
0.9554 30500 0.0525
0.9585 30600 0.0493
0.9616 30700 0.0485
0.9648 30800 0.0532
0.9679 30900 0.0446
0.9710 31000 0.0372
0.9742 31100 0.0472
0.9773 31200 0.0399
0.9804 31300 0.0402
0.9836 31400 0.0372
0.9867 31500 0.0497
0.9898 31600 0.0432
0.9930 31700 0.0382
0.9961 31800 0.0475
0.9992 31900 0.0367
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}