---
tags:
- ontology-embedding
- hyperbolic-space
- hierarchical-reasoning
- biomedical-ontology
- generated_from_trainer
- dataset_size:150000
- loss:HierarchyTransformerLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: cellular response to stimulus
sentences:
- response to stimulus
- medial transverse frontopolar gyrus
- biological regulation
- source_sentence: regulation of cell differentiation involved in embryonic placenta
development
sentences:
- thoracic wall
- ectoderm-derived structure
- regulation of cell differentiation
- source_sentence: regulation of hippocampal neuron apoptotic process
sentences:
- external genitalia morphogenesis
- compact layer of ventricle
- biological regulation
- source_sentence: transitional myocyte of internodal tract
sentences:
- secretory epithelial cell
- internodal tract myocyte
- insect haltere disc
- source_sentence: alveolar atrium
sentences:
- organ part
- superior recess of lesser sac
- foramen of skull
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# OnT: Language Models as Ontology Encoders
This is an OnT (Ontology Transformer) model trained on the GO (Gene Ontology) dataset, based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). OnT is a language-model-based framework for ontology embedding: it represents concepts as points in hyperbolic space and encodes axioms as hierarchical relationships between those concepts.
## Model Details
### Model Description
- **Model Type:** Ontology Transformer (OnT)
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Training Dataset:** GO
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Embedding Space:** Hyperbolic Space
- **Key Features:**
- Hyperbolic embeddings for ontology concept encoding (see the distance sketch after this list)
- Modeling of hierarchical relationships between concepts
- Support for role embeddings as rotations over hyperbolic spaces
- Concept rotation, transition, and existential quantifier representation
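For intuition about the embedding space, points in a Poincaré ball can be compared with a closed-form hyperbolic distance, the geometric primitive behind hierarchy-aware scoring. The helper below is a minimal illustrative sketch: the curvature `c` and any rescaling of the ball are assumptions, not values read from this model's configuration.
```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Distance in a Poincare ball of curvature -c (illustrative; c is an assumption).

    u, v: tensors of shape [..., dim] with sqrt(c) * norm(x) < 1.
    """
    sq_u = (u * u).sum(dim=-1)
    sq_v = (v * v).sum(dim=-1)
    sq_diff = ((u - v) ** 2).sum(dim=-1)
    x = 1 + 2 * c * sq_diff / ((1 - c * sq_u) * (1 - c * sq_v))
    # The clamp guards against arccosh domain errors when u == v.
    return torch.acosh(x.clamp(min=1.0 + 1e-7)) / c ** 0.5
```
In such a space, general concepts tend to lie nearer the origin and specific concepts nearer the boundary, so distance and norm together can signal subsumption.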
### Model Sources
- **Repository:** [OnT on GitHub](https://github.com/HuiYang1997/OnT)
- **Paper:** [Language Models as Ontology Encoders](https://arxiv.org/abs/2507.14334)
### Available Versions
This model is available in **4 versions** (Git branches) to suit different use cases:
| Branch | Training Dataset | Role Embedding | Use Case |
|--------|------------------|----------------|----------|
| **`main`** (default) | Prediction dataset | ✅ With role embedding | Default version: trained on the prediction dataset with role embedding support |
| **`role-free`** | Prediction dataset | ❌ Without role embedding | Trained on the prediction dataset without role embeddings |
| **`inference-default`** | Inference dataset | ✅ With role embedding | Trained on the inference dataset with role embedding support |
| **`inference-role-free`** | Inference dataset | ❌ Without role embedding | Trained on the inference dataset without role embeddings |
**How to use different versions:**
```python
from OnT import OntologyTransformer

# Default version (main branch - OnTr with role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go")

# Role-free version (without role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="role-free")

# Inference version with role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-default")

# Inference version without role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-role-free")
```
### Full Model Architecture
```
OntologyTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
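The `Pooling` module above is configured for mean pooling (`pooling_mode_mean_tokens: True`). For reference, masked mean pooling amounts to the following standalone sketch, not the library's own code:
```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions.

    token_embeddings: [batch, seq_len, 768]; attention_mask: [batch, seq_len] of 0/1.
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(dim=1)                   # [batch, 768]
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # tokens per sequence
    return summed / counts
```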
## Usage
### Installation
First, install the required dependencies:
```bash
pip install sentence-transformers==3.4.0.dev0
```
You also need to install [HierarchyTransformers](https://github.com/KRR-Oxford/HierarchyTransformers) following the instructions in their repository.
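As an optional sanity check (not part of the official setup), confirm the pinned version imports before loading the model:
```python
# Optional: verify the environment before downloading model weights
import sentence_transformers
print(sentence_transformers.__version__)  # expect 3.4.0.dev0, as pinned above
```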
### Direct Usage
Load the model and use it for ontology concept encoding:
```python
import torch
from OnT import OntologyTransformer

# Load the OnT model
path = "Hui97/OnT-MPNet-go"
ont = OntologyTransformer.from_pretrained(path)

# Entity names to be encoded
entity_names = [
    'alveolar atrium',
    'organ part',
    'superior recess of lesser sac',
]

# Get the entity embeddings in hyperbolic space
entity_embeddings = ont.encode_concept(entity_names)
print(entity_embeddings.shape)
# [3, 768]

# Role sentences to be encoded
role_sentences = [
    "application attribute",
    "attribute",
    "chemical modifier",
]

# Get the role embeddings (rotations and scalings)
role_rotations, role_scalings = ont.encode_roles(role_sentences)
```
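As a follow-up, the concept embeddings can be compared directly in hyperbolic space, for example to rank the candidate concepts by distance to `'alveolar atrium'`. The snippet below reuses the illustrative `poincare_distance` helper sketched under Key Features; since its curvature is an assumption, treat the outputs as relative rankings rather than calibrated subsumption scores.
```python
# Illustrative ranking of candidates by hyperbolic distance to 'alveolar atrium'.
# torch.as_tensor covers the case where encode_concept returns a NumPy array.
emb = torch.as_tensor(entity_embeddings)
child, candidates = emb[0], emb[1:]
dists = poincare_distance(child.unsqueeze(0), candidates)  # helper sketched earlier
for name, dist in zip(entity_names[1:], dists.tolist()):
    print(f"{name}: {dist:.4f}")
```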
## Citation
### BibTeX
If you use this model, please cite:
```bibtex
@article{yang2025language,
title={Language Models as Ontology Encoders},
author={Yang, Hui and Chen, Jiaoyan and He, Yuan and Gao, Yongsheng and Horrocks, Ian},
journal={arXiv preprint arXiv:2507.14334},
year={2025}
}
```