SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation
  • Repository: Sentence Transformers on GitHub
  • Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
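
The stack above is a token-level MPNet encoder (inputs truncated at 384 tokens), mean pooling over the token embeddings, and L2 normalization of the pooled vector. Because the output is normalized, cosine similarity and dot product rank sentences identically. As a rough sketch (not the saved configuration itself), an equivalent pipeline can be assembled by hand from the base model:

from sentence_transformers import SentenceTransformer, models

# Sketch: rebuild the same Transformer -> Pooling -> Normalize stack on top of the base model.
# The fine-tuned weights live in zoharzaig/emoji-prediction-model; this only mirrors the architecture.
word_embedding = models.Transformer("sentence-transformers/all-mpnet-base-v2", max_seq_length=384)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
normalize = models.Normalize()
model = SentenceTransformer(modules=[word_embedding, pooling, normalize])

embedding = model.encode("hello world")
print(embedding.shape)         # (768,)
print((embedding ** 2).sum())  # ~1.0, thanks to the Normalize() module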

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zoharzaig/emoji-prediction-model")
# Run inference
sentences = [
    'Inspired by the history behind Norfolk Island’s flag.',
    "The flag of Norfolk Island emoji represents the unique flag of Norfolk Island, which is an external territory of Australia. It is used to symbolize the island's culture and identity.",
    'The gear emoji is commonly used to represent machinery, equipment, tools, or mechanics. It can also symbolize maintenance, repair, or work involving gears and mechanical parts.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.7065, -0.0235],
#         [ 0.7065,  1.0000, -0.0110],
#         [-0.0235, -0.0110,  1.0000]])
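
Since every training pair links a sentence to an emoji description, a natural way to apply the model is to rank a pool of emoji descriptions against an input sentence and pick the closest match. The snippet below is a minimal sketch of that idea; the candidate descriptions and emoji labels are illustrative placeholders, not part of the model or its training data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("zoharzaig/emoji-prediction-model")

# Hypothetical candidate pool: emoji descriptions paired with the emoji they describe.
candidates = {
    "⚙️": "The gear emoji is commonly used to represent machinery, equipment, tools, or mechanics.",
    "🕚": "The eleven o'clock emoji is used to indicate the time of 11:00 on a clock.",
    "🇳🇫": "The flag of Norfolk Island emoji represents the unique flag of Norfolk Island.",
}

query = "Lunch is scheduled for eleven today"
query_embedding = model.encode([query])                         # shape (1, 768)
candidate_embeddings = model.encode(list(candidates.values()))  # shape (3, 768)

# Cosine similarity between the query and every candidate description.
scores = model.similarity(query_embedding, candidate_embeddings)[0]
best = int(scores.argmax())
print(list(candidates.keys())[best], float(scores[best]))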

Training Details

Training Dataset

Unnamed Dataset

  • Size: 139,891 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 6 tokens, mean: 12.16 tokens, max: 29 tokens
    • sentence_1: string; min: 22 tokens, mean: 46.23 tokens, max: 89 tokens
  • Samples:
    • sentence_0: Lunch is scheduled for eleven today
      sentence_1: The eleven o’clock emoji is used to indicate the time of 11:00 on a clock. It can be used to show that it is late morning, or to signify that an event is happening at this specific time. It can also be used in a more figurative sense to represent the idea of being right on time for something.
    • sentence_0: Just finished reading an inspiring article on trans rights.
      sentence_1: The transgender symbol emoji is often used to represent individuals who identify as transgender or non-binary.
    • sentence_0: I'm curious about the history behind Lesotho’s flag.
      sentence_1: The flag of Lesotho represents the country of Lesotho in southern Africa. It is a tricolor flag of horizontal stripes with a blue triangle on the left side. The colors symbolize different aspects of the country's history and culture.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • multi_dataset_batch_sampler: round_robin
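
For context, the snippet below is a minimal sketch of a comparable training run with the Sentence Transformers trainer: it uses two illustrative (sentence_0, sentence_1) pairs in place of the actual 139,891 samples, the MultipleNegativesRankingLoss configuration from the Training Dataset section, and the batch size and epoch count listed above. It is not the exact script that produced this model.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Placeholder pairs in the same (sentence_0, sentence_1) format as the training dataset.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "Lunch is scheduled for eleven today",
        "Just finished reading an inspiring article on trans rights.",
    ],
    "sentence_1": [
        "The eleven o'clock emoji is used to indicate the time of 11:00 on a clock.",
        "The transgender symbol emoji is often used to represent individuals who identify as transgender or non-binary.",
    ],
})

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
# In-batch negatives: every other sentence_1 in the batch serves as a negative for a given sentence_0.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="emoji-prediction-model",
    per_device_train_batch_size=16,
    num_train_epochs=5,
)

trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()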

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0572 500 1.2611
0.1144 1000 1.0953
0.1715 1500 0.9964
0.2287 2000 0.9722
0.2859 2500 0.9712
0.3431 3000 0.918
0.4003 3500 0.9296
0.4575 4000 0.9069
0.5146 4500 0.9062
0.5718 5000 0.8788
0.6290 5500 0.895
0.6862 6000 0.8601
0.7434 6500 0.8461
0.8005 7000 0.8379
0.8577 7500 0.8209
0.9149 8000 0.8015
0.9721 8500 0.8103
1.0293 9000 0.7828
1.0865 9500 0.7064
1.1436 10000 0.6881
1.2008 10500 0.7004
1.2580 11000 0.7121
1.3152 11500 0.7222
1.3724 12000 0.7183
1.4296 12500 0.7024
1.4867 13000 0.7114
1.5439 13500 0.7115
1.6011 14000 0.6858
1.6583 14500 0.6944
1.7155 15000 0.6867
1.7726 15500 0.6776
1.8298 16000 0.7172
1.8870 16500 0.7086
1.9442 17000 0.6882
2.0014 17500 0.6788
2.0586 18000 0.5488
2.1157 18500 0.5428
2.1729 19000 0.5628
2.2301 19500 0.5524
2.2873 20000 0.5695
2.3445 20500 0.5708
2.4016 21000 0.5703
2.4588 21500 0.5512
2.5160 22000 0.5646
2.5732 22500 0.5753
2.6304 23000 0.5739
2.6876 23500 0.554
2.7447 24000 0.5744
2.8019 24500 0.5236
2.8591 25000 0.5471
2.9163 25500 0.5576
2.9735 26000 0.5601
3.0306 26500 0.5004
3.0878 27000 0.4471
3.1450 27500 0.4588
3.2022 28000 0.4439
3.2594 28500 0.4283
3.3166 29000 0.4452
3.3737 29500 0.4446
3.4309 30000 0.4413
3.4881 30500 0.4377
3.5453 31000 0.4504
3.6025 31500 0.4312
3.6597 32000 0.4397
3.7168 32500 0.4376
3.7740 33000 0.4596
3.8312 33500 0.4501
3.8884 34000 0.4338
3.9456 34500 0.4609
4.0027 35000 0.4476
4.0599 35500 0.3652
4.1171 36000 0.3506
4.1743 36500 0.3481
4.2315 37000 0.3805
4.2887 37500 0.3574
4.3458 38000 0.3622
4.4030 38500 0.3686
4.4602 39000 0.3572
4.5174 39500 0.3791
4.5746 40000 0.3736
4.6317 40500 0.3514
4.6889 41000 0.3682
4.7461 41500 0.3625
4.8033 42000 0.3601
4.8605 42500 0.3703
4.9177 43000 0.3783
4.9748 43500 0.3583

Framework Versions

  • Python: 3.9.6
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.2
  • PyTorch: 2.7.1
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}