This model is a fine-tuned version of PhilBerta for sequence classification of intertextual links between Jerome (Hieronymus) and other classical authors. It is intended to integrate with the LociSimiles Python package for Latin intertextuality workflows: https://pypi.org/project/locisimiles/.
For sequence-pair classification, the tokenizer builds a RoBERTa-style input in which the two phrases are joined with special tokens:
<s> Jerome_phrase </s></s> Candidate_phrase </s>
Here is a complete example:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("julian-schelb/PhilBerta-class-latin-intertext-v1")
model = AutoModelForSequenceClassification.from_pretrained("julian-schelb/PhilBerta-class-latin-intertext-v1")
model.eval()  # switch to evaluation mode (disables dropout)

# Define your sentence pair
sentence1 = "omnia fert aetas, animum quoque; saepe ego longos cantando puerum memini me condere soles."
sentence2 = "saepe ego longos cantando puerum memini me condere soles."

# Tokenize the sentence pair for the model
inputs = tokenizer(
    sentence1,  # Jerome (Hieronymus)
    sentence2,  # Classical author
    add_special_tokens=True,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)

# Run inference without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    # With binary labels 0 = "no citation" and 1 = "citation",
    # probs[0][1] is the probability of an intertextual link.
    print("Prediction probabilities:", probs)
TBD
Base model: bowphs/PhilBerta