---
tags:
- ontology-embedding
- hyperbolic-space
- hierarchical-reasoning
- biomedical-ontology
- generated_from_trainer
- dataset_size:150000
- loss:HierarchyTransformerLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: cellular response to stimulus
sentences:
- response to stimulus
- medial transverse frontopolar gyrus
- biological regulation
- source_sentence: regulation of cell differentiation involved in embryonic placenta
development
sentences:
- thoracic wall
- ectoderm-derived structure
- regulation of cell differentiation
- source_sentence: regulation of hippocampal neuron apoptotic process
sentences:
- external genitalia morphogenesis
- compact layer of ventricle
- biological regulation
- source_sentence: transitional myocyte of internodal tract
sentences:
- secretory epithelial cell
- internodal tract myocyte
- insect haltere disc
- source_sentence: alveolar atrium
sentences:
- organ part
- superior recess of lesser sac
- foramen of skull
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# OnT: Language Models as Ontology Encoders
This is an OnT (Ontology Transformer) model trained on the GO (Gene Ontology) dataset, based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). OnT is a language-model-based framework for ontology embedding: it represents concepts as points in hyperbolic space and encodes axioms as hierarchical relationships between those concepts.
## Model Details
### Model Description
- **Model Type:** Ontology Transformer (OnT)
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Training Dataset:** GO
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Embedding Space:** Hyperbolic Space
- **Key Features:**
- Hyperbolic embeddings for ontology concept encoding (see the distance sketch after this list)
- Modeling of hierarchical relationships between concepts
- Support for role embeddings as rotations over hyperbolic spaces
- Concept rotation, transition, and existential quantifier representation
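For intuition about the embedding space, points in a Poincaré ball can be compared with a closed-form hyperbolic distance, the geometric primitive behind hierarchy-aware scoring. The helper below is a minimal illustrative sketch: the curvature `c` and any rescaling of the ball are assumptions, not values read from this model's configuration.
```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, c: float = 1.0) -> torch.Tensor:
    """Distance in a Poincare ball of curvature -c (illustrative; c is an assumption).

    u, v: tensors of shape [..., dim] with sqrt(c) * norm(x) < 1.
    """
    sq_u = (u * u).sum(dim=-1)
    sq_v = (v * v).sum(dim=-1)
    sq_diff = ((u - v) ** 2).sum(dim=-1)
    x = 1 + 2 * c * sq_diff / ((1 - c * sq_u) * (1 - c * sq_v))
    # The clamp guards against arccosh domain errors when u == v.
    return torch.acosh(x.clamp(min=1.0 + 1e-7)) / c ** 0.5
```
In such a space, general concepts tend to lie nearer the origin and specific concepts nearer the boundary, so distance and norm together can signal subsumption.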
### Model Sources
- **Repository:** [OnT on GitHub](https://github.com/HuiYang1997/OnT)
- **Paper:** [Language Models as Ontology Encoders](https://arxiv.org/abs/2507.14334)
### Available Versions
This model is available in **4 versions** (Git branches) to suit different use cases:
| Branch | Training Dataset | Role Embedding | Use Case |
|--------|------------------|----------------|----------|
| **`main`** (default) | Prediction dataset | ✅ With role embedding | Default version: trained on the prediction dataset with role embedding support |
| **`role-free`** | Prediction dataset | ❌ Without role embedding | Trained on the prediction dataset without role embeddings |
| **`inference-default`** | Inference dataset | ✅ With role embedding | Trained on the inference dataset with role embedding support |
| **`inference-role-free`** | Inference dataset | ❌ Without role embedding | Trained on the inference dataset without role embeddings |
**How to use different versions:**
```python
from OnT import OntologyTransformer

# Default version (main branch - OnTr with role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go")

# Role-free version (without role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="role-free")

# Inference version with role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-default")

# Inference version without role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-role-free")
```
### Full Model Architecture
```
OntologyTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
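The `Pooling` module above is configured for mean pooling (`pooling_mode_mean_tokens: True`). For reference, masked mean pooling amounts to the following standalone sketch, not the library's own code:
```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions.

    token_embeddings: [batch, seq_len, 768]; attention_mask: [batch, seq_len] of 0/1.
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(dim=1)                   # [batch, 768]
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # tokens per sequence
    return summed / counts
```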
## Usage
### Installation
First, install the required dependencies:
```bash
pip install sentence-transformers==3.4.0.dev0
```
You also need to install [HierarchyTransformers](https://github.com/KRR-Oxford/HierarchyTransformers) following the instructions in their repository.
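As an optional sanity check (not part of the official setup), confirm the pinned version imports before loading the model:
```python
# Optional: verify the environment before downloading model weights
import sentence_transformers
print(sentence_transformers.__version__)  # expect 3.4.0.dev0, as pinned above
```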
### Direct Usage
Load the model and use it for ontology concept encoding:
```python
import torch
from OnT import OntologyTransformer

# Load the OnT model
path = "Hui97/OnT-MPNet-go"
ont = OntologyTransformer.from_pretrained(path)

# Entity names to be encoded
entity_names = [
    'alveolar atrium',
    'organ part',
    'superior recess of lesser sac',
]

# Get the entity embeddings in hyperbolic space
entity_embeddings = ont.encode_concept(entity_names)
print(entity_embeddings.shape)
# [3, 768]

# Role sentences to be encoded
role_sentences = [
    "application attribute",
    "attribute",
    "chemical modifier",
]

# Get the role embeddings (rotations and scalings)
role_rotations, role_scalings = ont.encode_roles(role_sentences)
```
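As a follow-up, the concept embeddings can be compared directly in hyperbolic space, for example to rank the candidate concepts by distance to `'alveolar atrium'`. The snippet below reuses the illustrative `poincare_distance` helper sketched under Key Features; since its curvature is an assumption, treat the outputs as relative rankings rather than calibrated subsumption scores.
```python
# Illustrative ranking of candidates by hyperbolic distance to 'alveolar atrium'.
# torch.as_tensor covers the case where encode_concept returns a NumPy array.
emb = torch.as_tensor(entity_embeddings)
child, candidates = emb[0], emb[1:]
dists = poincare_distance(child.unsqueeze(0), candidates)  # helper sketched earlier
for name, dist in zip(entity_names[1:], dists.tolist()):
    print(f"{name}: {dist:.4f}")
```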
## Citation
### BibTeX
If you use this model, please cite:
```bibtex
@article{yang2025language,
title={Language Models as Ontology Encoders},
author={Yang, Hui and Chen, Jiaoyan and He, Yuan and Gao, Yongsheng and Horrocks, Ian},
journal={arXiv preprint arXiv:2507.14334},
year={2025}
}
```