SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
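
Because the module stack ends with Normalize(), every embedding is L2-normalized, so a plain dot product equals cosine similarity. A quick sketch to confirm this and the 384-token limit (the repo id is this model's Hub id):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("hamzax001/sentence_seg")

# The trailing Normalize() module makes every embedding unit length.
emb = model.encode(["estate tax closing letter"])
print(np.linalg.norm(emb, axis=1))  # ~[1.]

# Inputs longer than 384 tokens are truncated by the Transformer module.
print(model.max_seq_length)  # 384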

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hamzax001/sentence_seg")
# Run inference
sentences = [
    'specifically, the proposed regulations proposed the addition of new sec. 300.13 to the user fee regulations to establish a $67 user fee for issuing an estate tax closing letter for an estate.',
    'additionally, the preamble to the proposed regulations explains the special benefits conferred by the issuance of estate tax closing letters and analyzes how the irs has computed that the full cost of issuing an estate tax closing letter is $67.',
    'with respect to whether and how the partnership allocates the rehabilitation credit to partners, the comment specifically asked ``whether the partners are allocated 20 percent of the credit each year although all of the credit basis is reduced in the first year when the property is placed in service or whether, after the first year, the remaining four years over which the credit is spread is taken into account and applied solely at the partner level over those remaining years, consistent with the section 1.50-1 regulations.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
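
For semantic search, one of the use cases listed above, encode a query and a corpus separately and rank corpus entries by model.similarity. A minimal sketch, with a hypothetical query and corpus for illustration only:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hamzax001/sentence_seg")

# Hypothetical corpus and query, not from the training data.
corpus = [
    "the irs charges a $67 user fee for issuing an estate tax closing letter.",
    "the rehabilitation credit is allocated to partners over a five-year period.",
]
query = "how much does an estate tax closing letter cost?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Cosine similarity, per the similarity function noted above
scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
best = scores.argmax().item()
print(corpus[best], scores[0, best].item())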

Training Details

Training Dataset

Unnamed Dataset

  • Size: 24,004 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 6 tokens, mean: 56.37 tokens, max: 384 tokens
    • sentence_1: string; min: 8 tokens, mean: 59.22 tokens, max: 384 tokens
  • Samples (excerpts from the training pairs; the original table pairs a sentence_0 with a sentence_1 per row, shown flattened here):
    • in such cases, the staff will accept the use of the simplified method for only some but not all share option grants. if a company uses this simplified method, the company should disclose in the notes to its financial statements the use of the method, the reason why the method was used, the types of share option grants for which the method was used if the method was not used for all share option grants, and the periods for which the method was used if the method was not used in all periods.
    • background subject to various exceptions, section 6033(a)(1) of the internal revenue code (code) requires every organization exempt from taxation under section 501(a) (tax-exempt organization) to file an annual return, stating specifically the items of gross income, receipts, and disbursements, and such other information for the purpose of carrying out the internal revenue laws as the secretary of the treasury or his delegate (secretary) may by forms or regulations prescribe, and keep such records, render under oath such statements, make such other returns, and comply with such rules and regulations as the secretary may from time to time prescribe. the annual information returns required under section 6033 are forms 990, return of organization exempt from income tax;'' 990-ez, short form return of organization exempt from income tax;'' 990-pf, return of private foundation;'' and 990-bl, information and initial excise tax return for black lung benefit trusts and certain related persons.'' annual returns filed by tax- exempt organizations, section 527 organizations, nonexempt private foundations described in section 6033(d), and section 4947(a)(1) trusts (which are both treated as organizations described in section 501(c)(3) for this purpose) are information returns intended to help ensure that the filing organizations comply with applicable federal tax laws.
    • interpretive response: no. before becoming a public entity, company a did not use the fair-value-based method for either its share options or its liability awards. \12\ \12\ this view is consistent with the fasb's basis for rejecting full retrospective application of fasb asc topic 718 as described in the basis for conclusions of statement 123r, paragraph b251.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
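
MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) row as a positive pair and uses the other in-batch sentence_1 values as negatives. A minimal training sketch under those assumptions, with toy pairs standing in for the 24,004-row dataset:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Toy pairs standing in for the real 24,004-row dataset.
train_dataset = Dataset.from_dict({
    "sentence_0": ["the proposed regulations establish a $67 user fee."],
    "sentence_1": ["the preamble explains how the irs computed the $67 fee."],
})

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# scale=20.0 with the default cosine similarity matches the parameters above;
# in-batch negatives only help when the batch holds more than one pair.
loss = MultipleNegativesRankingLoss(model, scale=20.0)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()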
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
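
These map directly onto SentenceTransformerTrainingArguments; a sketch (output_dir is a placeholder) that could be passed as args= to a trainer like the one above:

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    fp16=True,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)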

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch   Step   Training Loss
0.0833  500    0.5984
0.1666  1000   0.4190
0.2500  1500   0.3454
0.3333  2000   0.3111
0.4166  2500   0.2628
0.4999  3000   0.2747
0.5832  3500   0.2567
0.6666  4000   0.2184
0.7499  4500   0.1802
0.8332  5000   0.1796
0.9165  5500   0.1740
0.9998  6000   0.1742
1.0832  6500   0.1043
1.1665  7000   0.1011
1.2498  7500   0.1193
1.3331  8000   0.1167
1.4164  8500   0.1037
1.4998  9000   0.1097
1.5831  9500   0.1018
1.6664  10000  0.1017
1.7497  10500  0.1028
1.8330  11000  0.0854
1.9163  11500  0.0880
1.9997  12000  0.1027
2.0830  12500  0.0778
2.1663  13000  0.0645
2.2496  13500  0.0503
2.3329  14000  0.0822
2.4163  14500  0.0616
2.4996  15000  0.0688
2.5829  15500  0.0543
2.6662  16000  0.0678
2.7495  16500  0.0565
2.8329  17000  0.0683
2.9162  17500  0.0412
2.9995  18000  0.0726
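
The loss falls from roughly 0.60 to 0.07 over the three epochs, with a visible drop at each epoch boundary (steps 6000 and 12000). A small sketch to visualize an abridged subset of the rows above (values transcribed from the table):

import matplotlib.pyplot as plt

# Abridged (step, loss) pairs transcribed from the log above.
steps = [500, 2000, 4000, 6000, 6500, 9000, 12000, 12500, 15000, 18000]
losses = [0.5984, 0.3111, 0.2184, 0.1742, 0.1043, 0.1097, 0.1027, 0.0778, 0.0688, 0.0726]

plt.plot(steps, losses, marker="o")
plt.xlabel("step")
plt.ylabel("training loss")
plt.title("MultipleNegativesRankingLoss, 3 epochs")
plt.show()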

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2
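
To reproduce this environment, pinning the versions above should suffice (the +cu124 build of PyTorch comes from the matching CUDA wheel index; plain torch==2.6.0 resolves from the default index):

pip install "sentence-transformers==4.1.0" "transformers==4.52.4" "torch==2.6.0" "accelerate==1.8.1" "datasets==3.6.0" "tokenizers==0.21.2"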

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}