---
tags:
- ontology-embedding
- hyperbolic-space
- hierarchical-reasoning
- biomedical-ontology
- generated_from_trainer
- dataset_size:150000
- loss:HierarchyTransformerLoss
base_model: sentence-transformers/all-mpnet-base-v2
widget:
- source_sentence: cellular response to stimulus
  sentences:
  - response to stimulus
  - medial transverse frontopolar gyrus
  - biological regulation
- source_sentence: regulation of cell differentiation involved in embryonic placenta
    development
  sentences:
  - thoracic wall
  - ectoderm-derived structure
  - regulation of cell differentiation
- source_sentence: regulation of hippocampal neuron apoptotic process
  sentences:
  - external genitalia morphogenesis
  - compact layer of ventricle
  - biological regulation
- source_sentence: transitional myocyte of internodal tract
  sentences:
  - secretory epithelial cell
  - internodal tract myocyte
  - insect haltere disc
- source_sentence: alveolar atrium
  sentences:
  - organ part
  - superior recess of lesser sac
  - foramen of skull
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# OnT: Language Models as Ontology Encoders

This is an OnT (Ontology Transformer) model trained on the GO dataset, based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). OnT is a language-model-based framework for ontology embedding: it represents concepts as points in hyperbolic space and captures axioms as hierarchical relationships between those concepts.

## Model Details

### Model Description

- **Model Type:** Ontology Transformer (OnT)
- **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
- **Training Dataset:** GO
- **Maximum Sequence Length:** 384 tokens
- **Output Dimensionality:** 768 dimensions
- **Embedding Space:** Hyperbolic space
- **Key Features:**
  - Hyperbolic embeddings for ontology concept encoding (a geometric sketch follows this list)
  - Modeling of hierarchical relationships between concepts
  - Role embeddings realized as rotations over hyperbolic space
  - Representation of concept rotation, translation, and existential quantifiers
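
For geometric intuition only, the sketch below computes the standard geodesic distance between two points of the unit Poincaré ball, the usual model of hyperbolic space in this line of work. This is not OnT's actual scoring function (that is defined in the paper and repository); the function name `poincare_distance` and the toy points are illustrative.

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Geodesic distance between points strictly inside the unit Poincaré ball."""
    sq_dist = torch.sum((u - v) ** 2, dim=-1)
    denom = (1 - torch.sum(u * u, dim=-1)) * (1 - torch.sum(v * v, dim=-1))
    return torch.acosh(1 + 2 * sq_dist / denom)

# Toy 2D points: hyperbolic hierarchy embeddings typically place broader
# concepts nearer the origin and more specific ones nearer the boundary.
parent = torch.tensor([0.10, 0.05])
child = torch.tensor([0.60, 0.30])
print(poincare_distance(parent, child))
```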

### Model Sources

- **Repository:** [OnT on GitHub](https://github.com/HuiYang1997/OnT)
- **Paper:** [Language Models as Ontology Encoders](https://arxiv.org/abs/2507.14334)

### Available Versions

This model is available in **4 versions** (Git branches) to suit different use cases:

| Branch | Training Dataset | Role Embedding | Use Case |
|--------|------------------|----------------|----------|
| **`main`** (default) | Prediction dataset | ✅ With role embedding | Default version, trained on the prediction dataset with role embedding |
| **`role-free`** | Prediction dataset | ❌ Without role embedding | Trained on the prediction dataset without role embedding |
| **`inference-default`** | Inference dataset | ✅ With role embedding | Trained on the inference dataset with role embedding |
| **`inference-role-free`** | Inference dataset | ❌ Without role embedding | Trained on the inference dataset without role embedding |

**How to use different versions:**

```python
from OnT import OntologyTransformer

# Default version (main branch - OnTr with role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go")

# Role-free version (without role embedding)
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="role-free")

# Inference version with role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-default")

# Inference version without role embedding
ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-role-free")
```
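
To pin a branch for offline or reproducible use, one option is to fetch that revision with `huggingface_hub` and load it from disk. This is a sketch: it assumes `from_pretrained` accepts a local directory path, which is conventional but worth confirming against the OnT repository.

```python
from huggingface_hub import snapshot_download
from OnT import OntologyTransformer

# Download the 'role-free' branch once and reuse it from disk.
local_dir = snapshot_download("Hui97/OnT-MPNet-go", revision="role-free")

# Assumption: from_pretrained accepts a local directory, as is conventional.
ont = OntologyTransformer.from_pretrained(local_dir)
```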

### Full Model Architecture

```
OntologyTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
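
The `Pooling` module above is configured for mean pooling (`pooling_mode_mean_tokens: True`). For reference, this is the standard masked mean over token embeddings that such a module computes; the snippet below is a generic reimplementation for illustration, not OnT internals.

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len).
    """
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts

# Example: batch of 2 sentences, 4 token positions, hidden size 768.
emb = torch.randn(2, 4, 768)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
print(mean_pooling(emb, mask).shape)  # torch.Size([2, 768])
```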

## Usage

### Installation

First, install the required dependencies:

```bash
pip install sentence-transformers==3.4.0.dev0
```

You also need to install [HierarchyTransformers](https://github.com/KRR-Oxford/HierarchyTransformers) following the instructions in their repository.
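
At the time of writing, their README installs the package from PyPI; the package name below is taken from there, but verify against their current instructions:

```bash
pip install hierarchy_transformers
```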

### Direct Usage

Load the model and use it for ontology concept encoding:

```python
import torch
from OnT import OntologyTransformer

# Load the OnT model
path = "Hui97/OnT-MPNet-go"
ont = OntologyTransformer.from_pretrained(path)

# Entity names to be encoded
entity_names = [
    'alveolar atrium',
    'organ part',
    'superior recess of lesser sac',
]

# Get the entity embeddings in hyperbolic space
entity_embeddings = ont.encode_concept(entity_names)
print(entity_embeddings.shape)
# [3, 768]

# Role sentences to be encoded
role_sentences = [
    "application attribute",
    "attribute",
    "chemical modifier",
]

# Get the role embeddings (rotations and scalings)
role_rotations, role_scalings = ont.encode_roles(role_sentences)
```
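
The paper models roles as rotations (with scalings) over hyperbolic space; how those parameters act on a concept embedding is defined in the OnT repository. Purely to illustrate the idea, the hypothetical sketch below rotates consecutive coordinate pairs by per-pair angles and then scales the result. All shapes and the composition itself are assumptions, not the actual OnT operator.

```python
import torch

def apply_role(concept: torch.Tensor, angles: torch.Tensor, scaling: torch.Tensor) -> torch.Tensor:
    """Hypothetical role application: rotate each consecutive coordinate pair
    of `concept` by the matching angle, then scale.

    Assumed shapes: concept (dim,), angles (dim // 2,), scaling scalar.
    """
    pairs = concept.view(-1, 2)                  # (dim // 2, 2) coordinate pairs
    cos, sin = torch.cos(angles), torch.sin(angles)
    rotated = torch.stack(
        (cos * pairs[:, 0] - sin * pairs[:, 1],  # 2D rotation of each pair
         sin * pairs[:, 0] + cos * pairs[:, 1]),
        dim=-1,
    )
    return scaling * rotated.reshape(-1)

# Illustrative shapes matching the 768-dimensional embeddings above.
concept = torch.randn(768)
angles = torch.randn(384)
out = apply_role(concept, angles, torch.tensor(0.9))
print(out.shape)  # torch.Size([768])
```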

## Citation

### BibTeX

If you use this model, please cite:

```bibtex
@article{yang2025language,
  title={Language Models as Ontology Encoders},
  author={Yang, Hui and Chen, Jiaoyan and He, Yuan and Gao, Yongsheng and Horrocks, Ian},
  journal={arXiv preprint arXiv:2507.14334},
  year={2025}
}
```