SentenceTransformer based on izdastolga/checkpoint-18-Aixr-Turkish-QA
This is a sentence-transformers model finetuned from izdastolga/checkpoint-18-Aixr-Turkish-QA on the aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: izdastolga/checkpoint-18-Aixr-Turkish-QA
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
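Because the final Normalize() module L2-normalizes every embedding, cosine similarity between outputs reduces to a plain dot product. A quick sanity check of this (a minimal sketch, not part of the original card):

from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer("tizdas/checkpoint-1925-aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad-0.8030")

# Mean pooling over token embeddings, then L2 normalization (modules 1 and 2 above)
emb = model.encode(["Örnek bir cümle."], convert_to_tensor=True)
print(emb.shape)                      # torch.Size([1, 1024])
print(torch.linalg.norm(emb, dim=1))  # ~1.0: embeddings are unit-length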
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tizdas/checkpoint-1925-aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad-0.8030")
# Run inference
sentences = [
'Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.\nQuery: kiribati vanuatu nerede bulunur',
"Kiribati'nin en doğudaki adaları, Hawaii'nin güneyindeki güney Line Adaları, Dünya'daki en gelişmiş zamana sahiptir: UTC + 14 saat. Kiribati 1979'da Birleşik Krallık'tan bağımsız oldu. Başkent ve şimdi en kalabalık bölge olan Güney Tarawa, bir dizi geçitle bağlı olan bir dizi adacıktan oluşur.",
'Memlük Sultanı Mansur Seyfettin Kalaunun',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
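For retrieval, the instruct prefix shown in the example above is applied to queries only, while passages are encoded without it. A minimal retrieval sketch along those lines (the query is reused from the example; the passages are illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tizdas/checkpoint-1925-aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad-0.8030")

prompt = "Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.\nQuery: "
query = "kiribati vanuatu nerede bulunur"
passages = [
    "Kiribati 1979'da Birleşik Krallık'tan bağımsız oldu.",
    "Memlük Sultanı Mansur Seyfettin Kalaunun",
]

# Prefix only the query with the instruct prompt; encode passages as-is
query_emb = model.encode([query], prompt=prompt)
passage_emb = model.encode(passages)

# Higher score = more relevant passage
scores = model.similarity(query_emb, passage_emb)  # shape [1, 2]
best = scores[0].argmax().item()
print(passages[best])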
Training Details
Training Dataset
aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad
- Dataset: aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad at 5e80fa0
- Size: 415,046 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 36 tokens, mean: 44.21 tokens, max: 131 tokens | min: 3 tokens, mean: 97.81 tokens, max: 512 tokens |
- Samples:
| anchor | positive |
|---|---|
| Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.<br>Query: Bir balıkta kişi başına ne kadar balık kızartılır | Kişi başına yaklaşık 4 ons veya 113.4 gram çiğ balık kızartması gerekir. 0.25 pound'a eşdeğerdir. |
| Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.<br>Query: İnsanın dış görünüşüne bakarak asla hüküm vermemeliyiz. Buna katılıyor musun, neden? | Dış görünüş tek başına bir şey ifade etmez ama fikir verir. Fakat dış görünüş çok yanıltıcı da olabilmektedir. Bu nedenle her zaman dış görünüşten ziyade insanların karakterlerine odaklanmak önemlidir. |
| Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.<br>Query: Kürtçe nerede konuşulur | Kürtçe, Türkiye, Kürdistan, İran, Suriye, Irak, İran, Ermenistan, Gürcistan ve Azerbaycan'ı kapsayan ve Avrupa ve Amerika Birleşik Devletleri'ne yayılmış büyük bir diasporaya sahip büyük bir bölgede konuşulan yakın ilişkili dillerin sürekliliğinden oluşan bir makro dildir. Status. |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:

  { "scale": 50, "similarity_fct": "cos_sim" }
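Conceptually, this loss treats every other positive in the batch as a negative for a given anchor and applies cross-entropy over scaled cosine similarities; the cached variant computes the same objective in mini-batches with gradient caching so large batches fit in memory. A minimal sketch of the core computation (not the library implementation):

import torch
import torch.nn.functional as F

def mnrl(anchors, positives, scale=50.0):
    # Inputs: L2-normalized embeddings of shape [batch, dim]
    scores = scale * anchors @ positives.T  # scaled cosine similarities
    labels = torch.arange(scores.size(0))   # matching positive sits on the diagonal
    return F.cross_entropy(scores, labels)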
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 144
- per_device_eval_batch_size: 144
- learning_rate: 1e-06
- weight_decay: 0.01
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- tf32: True
- dataloader_num_workers: 4
- prompts: {'anchor': 'Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.\nQuery: '}
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 144
- per_device_eval_batch_size: 144
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-06
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: {'anchor': 'Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.\nQuery: '}
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
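A minimal training sketch wiring up the non-default values above (the dataset repo id and output directory are assumptions; the card records only the dataset name and revision 5e80fa0):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("izdastolga/checkpoint-18-Aixr-Turkish-QA")

# Hypothetical repo id; replace with the actual dataset path
train_dataset = load_dataset(
    "username/aixr-msmarco-rag-wikirag-gpt-soru_cevap-tur_hist_quad",
    split="train",
    revision="5e80fa0",
)

loss = losses.CachedMultipleNegativesRankingLoss(model, scale=50)

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # assumption: not recorded in the card
    per_device_train_batch_size=144,
    per_device_eval_batch_size=144,
    learning_rate=1e-6,
    weight_decay=0.01,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    tf32=True,
    dataloader_num_workers=4,
    prompts={"anchor": "Instruct: Given a Turkish search query, retrieve relevant passages written in Turkish that best answer the query.\nQuery: "},
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()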
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.0997 | 275 | 0.33 |
| 0.1995 | 550 | 0.0964 |
| 0.2992 | 825 | 0.0862 |
| 0.3990 | 1100 | 0.081 |
| 0.4987 | 1375 | 0.0764 |
| 0.5985 | 1650 | 0.0745 |
| 0.6982 | 1925 | 0.0726 |
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 2.21.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}