SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-mpnet-base-v2
- Maximum Sequence Length: 384 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
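These values can be read directly off the loaded model; a minimal sanity check (using the repository id hamzax001/sentence_seg for this model):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hamzax001/sentence_seg")

print(model.max_seq_length)                      # 384
print(model.get_sentence_embedding_dimension())  # 768
print(model.similarity_fn_name)                  # "cosine"
```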
Model Sources
- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
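As a rough sketch of what these three modules compute, the same pipeline can be reproduced by hand with plain transformers: run the MPNet encoder, mean-pool the token embeddings over non-padding positions, then L2-normalize. The sketch below uses the base checkpoint; the fine-tuned weights load the same way.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/all-mpnet-base-v2")

encoded = tokenizer(["an example sentence"], padding=True, truncation=True,
                    max_length=384, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (0) Transformer

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens).
mask = encoded["attention_mask"].unsqueeze(-1).float()
embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity.
embedding = F.normalize(embedding, p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 768])
```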
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("hamzax001/sentence_seg")
# Run inference
sentences = [
'specifically, the proposed regulations proposed the addition of new sec. 300.13 to the user fee regulations to establish a $67 user fee for issuing an estate tax closing letter for an estate.',
'additionally, the preamble to the proposed regulations explains the special benefits conferred by the issuance of estate tax closing letters and analyzes how the irs has computed that the full cost of issuing an estate tax closing letter is $67.',
'with respect to whether and how the partnership allocates the rehabilitation credit to partners, the comment specifically asked ``whether the partners are allocated 20 percent of the credit each year although all of the credit basis is reduced in the first year when the property is placed in service or whether, after the first year, the remaining four years over which the credit is spread is taken into account and applied solely at the partner level over those remaining years, consistent with the section 1.50-1 regulations.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
Training Details
Training Dataset
Unnamed Dataset
- Size: 24,004 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |
| details | min: 6 tokens, mean: 56.37 tokens, max: 384 tokens | min: 8 tokens, mean: 59.22 tokens, max: 384 tokens |
- Samples:

| sentence_0 | sentence_1 |
|---|---|
| in such cases, the staff will accept the use of the simplified method for only some but not all share option grants. | if a company uses this simplified method, the company should disclose in the notes to its financial statements the use of the method, the reason why the method was used, the types of share option grants for which the method was used if the method was not used for all share option grants, and the periods for which the method was used if the method was not used in all periods. |
| background subject to various exceptions, section 6033(a)(1) of the internal revenue code (code) requires every organization exempt from taxation under section 501(a) (tax-exempt organization) to file an annual return, stating specifically the items of gross income, receipts, and disbursements, and such other information for the purpose of carrying out the internal revenue laws as the secretary of the treasury or his delegate (secretary) may by forms or regulations prescribe, and keep such records, render under oath such statements, make such other returns, and comply with such rules and regulations as the secretary may from time to time prescribe. | the annual information returns required under section 6033 are forms 990, ``return of organization exempt from income tax;'' 990-ez, ``short form return of organization exempt from income tax;'' 990-pf, ``return of private foundation;'' and 990-bl, ``information and initial excise tax return for black lung benefit trusts and certain related persons.'' annual returns filed by tax-exempt organizations, section 527 organizations, nonexempt private foundations described in section 6033(d), and section 4947(a)(1) trusts (which are both treated as organizations described in section 501(c)(3) for this purpose) are information returns intended to help ensure that the filing organizations comply with applicable federal tax laws. |
| interpretive response: no. before becoming a public entity, company a did not use the fair-value-based method for either its share options or its liability awards.\12\ | \12\ this view is consistent with the fasb's basis for rejecting full retrospective application of fasb asc topic 718 as described in the basis for conclusions of statement 123r, paragraph b251. |
- Loss: MultipleNegativesRankingLoss with these parameters: `{"scale": 20.0, "similarity_fct": "cos_sim"}`
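For reference, a sketch of how this loss is constructed (scale=20.0 with cosine similarity are also the library defaults):

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Each sentence_0 is pulled toward its paired sentence_1 and pushed away from
# every other sentence_1 in the same batch (in-batch negatives).
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0,
                                           similarity_fct=util.cos_sim)
```

With the per-device batch size of 4 used here, each positive pair is therefore contrasted against 3 in-batch negatives.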
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- fp16: True
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
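Putting the non-default values together, a minimal sketch of an equivalent training run; the two-pair train_pairs list and the output_dir are hypothetical stand-ins for the actual 24,004-pair dataset and save path:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Hypothetical stand-in for the 24,004 (sentence_0, sentence_1) training pairs.
train_pairs = [
    ("a first regulatory sentence.", "a related follow-on sentence."),
    ("another source sentence.", "its paired sentence."),
]
train_dataset = Dataset.from_dict({
    "sentence_0": [a for a, _ in train_pairs],
    "sentence_1": [b for _, b in train_pairs],
})

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
args = SentenceTransformerTrainingArguments(
    output_dir="models/sentence_seg",  # hypothetical save path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.MultipleNegativesRankingLoss(model),
)
trainer.train()
```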
Training Logs
| Epoch | Step | Training Loss |
|---|---|---|
| 0.0833 | 500 | 0.5984 |
| 0.1666 | 1000 | 0.419 |
| 0.2500 | 1500 | 0.3454 |
| 0.3333 | 2000 | 0.3111 |
| 0.4166 | 2500 | 0.2628 |
| 0.4999 | 3000 | 0.2747 |
| 0.5832 | 3500 | 0.2567 |
| 0.6666 | 4000 | 0.2184 |
| 0.7499 | 4500 | 0.1802 |
| 0.8332 | 5000 | 0.1796 |
| 0.9165 | 5500 | 0.174 |
| 0.9998 | 6000 | 0.1742 |
| 1.0832 | 6500 | 0.1043 |
| 1.1665 | 7000 | 0.1011 |
| 1.2498 | 7500 | 0.1193 |
| 1.3331 | 8000 | 0.1167 |
| 1.4164 | 8500 | 0.1037 |
| 1.4998 | 9000 | 0.1097 |
| 1.5831 | 9500 | 0.1018 |
| 1.6664 | 10000 | 0.1017 |
| 1.7497 | 10500 | 0.1028 |
| 1.8330 | 11000 | 0.0854 |
| 1.9163 | 11500 | 0.088 |
| 1.9997 | 12000 | 0.1027 |
| 2.0830 | 12500 | 0.0778 |
| 2.1663 | 13000 | 0.0645 |
| 2.2496 | 13500 | 0.0503 |
| 2.3329 | 14000 | 0.0822 |
| 2.4163 | 14500 | 0.0616 |
| 2.4996 | 15000 | 0.0688 |
| 2.5829 | 15500 | 0.0543 |
| 2.6662 | 16000 | 0.0678 |
| 2.7495 | 16500 | 0.0565 |
| 2.8329 | 17000 | 0.0683 |
| 2.9162 | 17500 | 0.0412 |
| 2.9995 | 18000 | 0.0726 |
Framework Versions
- Python: 3.11.13
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.8.1
- Datasets: 3.6.0
- Tokenizers: 0.21.2
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}