This model is a fine-tuned version of PhilBerta for sequence classification of intertextual links between Jerome (Hieronymus) and other classical authors. It is intended to integrate with the LociSimiles Python package for Latin intertextuality workflows: https://pypi.org/project/locisimiles/.
For sequence-pair classification, the tokenizer builds a RoBERTa-style input in which the two phrases are joined with special tokens:
<s> Jerome_phrase </s></s> Candidate_phrase </s>
Here is a complete example:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("julian-schelb/PhilBerta-class-latin-intertext-v1")
model = AutoModelForSequenceClassification.from_pretrained("julian-schelb/PhilBerta-class-latin-intertext-v1")
model.eval()  # switch to evaluation mode (disables dropout)

# Define your sentence pair
sentence1 = "omnia fert aetas, animum quoque; saepe ego longos cantando puerum memini me condere soles."
sentence2 = "saepe ego longos cantando puerum memini me condere soles."

# Tokenize the sentence pair for the model
inputs = tokenizer(
    sentence1,  # Jerome (Hieronymus)
    sentence2,  # Classical author
    add_special_tokens=True,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)

# Run inference without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # With binary labels 0 = "no citation" and 1 = "citation",
    # probs[0][1] is the probability of an intertextual link.
    print("Prediction probabilities:", probs)
TBD
Base model: bowphs/PhilBerta