|
|
--- |
|
|
license: mit |
|
|
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned |
|
|
tags: |
|
|
- text-embeddings |
|
|
- sentence-transformers |
|
|
- llm2vec |
|
|
- medical |
|
|
- chest-xray |
|
|
- radiology |
|
|
- clinical-nlp |
|
|
language: |
|
|
- en |
|
|
pipeline_tag: feature-extraction |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis |
|
|
|
|
|
LLM2Vec4CXR is a text encoder optimized for chest X-ray report analysis and medical text understanding. |
|
|
It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234). |
|
|
|
|
|
## Model Description |
|
|
|
|
|
LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy. |
|
|
This design enhances the semantic representation of chest X-ray reports, making the model robust across different reporting styles and effective even with domain-specific abbreviations. |
|
|
It improves performance on clinical text similarity, retrieval, and interpretation tasks. |
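
For intuition, here is a minimal PyTorch sketch of latent-attention pooling, using the dimensions from the training configuration below (512 latents, 8 heads, hidden size 2048). It is illustrative only and may differ from the model's actual implementation:

```python
import torch
import torch.nn as nn

class LatentAttentionPooling(nn.Module):
    """Illustrative latent-attention pooling: learned latent vectors
    cross-attend to the token hidden states, and the attended latents
    are averaged into one embedding. Dimensions follow the training
    config on this card; the model's real module may differ."""

    def __init__(self, hidden_size=2048, num_latents=512, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, hidden_size))
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        batch = hidden_states.size(0)
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        pooled, _ = self.attn(
            queries, hidden_states, hidden_states,
            key_padding_mask=(attention_mask == 0),  # ignore padding tokens
        )
        return pooled.mean(dim=1)  # (batch, hidden)
```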
|
|
|
|
|
### Key Features |
|
|
|
|
|
- **Base Architecture**: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned |
|
|
- **Pooling Mode**: Latent Attention (trained weights automatically loaded) |
|
|
- **Bidirectional Processing**: Enabled for better context understanding |
|
|
- **Medical Domain**: Specialized for chest X-ray report analysis |
|
|
- **Max Length**: 512 tokens |
|
|
- **Precision**: bfloat16 |
|
|
- **Automatic Loading**: Latent attention weights are automatically loaded from safetensors |
|
|
- **Simple API**: Built-in methods for similarity computation and instruction-based encoding |
|
|
|
|
|
|
|
|
## Usage |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
# Only transformers and torch are needed |
|
|
pip install transformers torch |
|
|
``` |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from transformers import AutoModel |
|
|
|
|
|
# Load the model - that's it! |
|
|
model = AutoModel.from_pretrained( |
|
|
"lukeingawesome/llm2vec4cxr", |
|
|
trust_remote_code=True, |
|
|
torch_dtype=torch.bfloat16 |
|
|
).to("cuda" if torch.cuda.is_available() else "cpu").eval() |
|
|
|
|
|
# Simple text encoding |
|
|
report = "Small left pleural effusion with basal atelectasis." |
|
|
embedding = model.encode_text([report]) |
|
|
print(embedding.shape) # torch.Size([1, 2048]) |
|
|
|
|
|
# Multiple texts at once |
|
|
reports = [ |
|
|
"No acute cardiopulmonary abnormality.", |
|
|
"Small bilateral pleural effusions.", |
|
|
"Large left pleural effusion with compressive atelectasis." |
|
|
] |
|
|
embeddings = model.encode_text(reports) |
|
|
print(embeddings.shape) # torch.Size([3, 2048]) |
|
|
``` |
|
|
|
|
|
### Instruction-Based Encoding and Similarity |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from transformers import AutoModel |
|
|
|
|
|
# Load model |
|
|
model = AutoModel.from_pretrained( |
|
|
"lukeingawesome/llm2vec4cxr", |
|
|
trust_remote_code=True, |
|
|
torch_dtype=torch.bfloat16 |
|
|
).to("cuda" if torch.cuda.is_available() else "cpu").eval() |
|
|
|
|
|
# Instruction-based task with separator |
|
|
instruction = "Determine the status of the pleural effusion." |
|
|
report = "There is a small increase in the left-sided effusion." |
|
|
query = instruction + "!@#$%^&*()" + report |
|
|
|
|
|
# Compare against multiple candidates |
|
|
candidates = [ |
|
|
"No pleural effusion", |
|
|
"Pleural effusion present", |
|
|
"Worsening pleural effusion", |
|
|
"Improving pleural effusion" |
|
|
] |
|
|
|
|
|
# One-line similarity computation |
|
|
scores = model.compute_similarities(query, candidates) |
|
|
print(scores) |
|
|
# tensor([0.7171, 0.8270, 0.9155, 0.8113], device='cuda:0') |
|
|
|
|
|
best_match = candidates[torch.argmax(scores)] |
|
|
print(f"Best match: {best_match}") |
|
|
# Best match: Worsening pleural effusion |
|
|
``` |
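
The separator `!@#$%^&*()` marks where the instruction ends and the report begins. A small hypothetical helper (`build_query` is not part of the model's API) makes query construction less error-prone; this continues from the example above:

```python
SEPARATOR = "!@#$%^&*()"  # separator string the model expects (see above)

def build_query(instruction: str, report: str) -> str:
    """Hypothetical convenience helper: join an instruction and a
    report with the model's separator."""
    return f"{instruction}{SEPARATOR}{report}"

query = build_query(
    "Determine the status of the pleural effusion.",
    "There is a small increase in the left-sided effusion.",
)
scores = model.compute_similarities(query, candidates)
```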
|
|
|
|
|
### Medical Report Retrieval Example |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from transformers import AutoModel |
|
|
|
|
|
# Load model |
|
|
model = AutoModel.from_pretrained( |
|
|
"lukeingawesome/llm2vec4cxr", |
|
|
trust_remote_code=True, |
|
|
torch_dtype=torch.bfloat16 |
|
|
).to("cuda" if torch.cuda.is_available() else "cpu").eval() |
|
|
|
|
|
# Instruction for retrieval |
|
|
instruction = "Retrieve semantically similar reports" |
|
|
query_report = "Small left pleural effusion with basal atelectasis." |
|
|
query = instruction + "!@#$%^&*()" + query_report |
|
|
|
|
|
# Candidate reports |
|
|
candidates = [ |
|
|
"No acute cardiopulmonary abnormality.", |
|
|
"Small left pleural effusion is present.", |
|
|
"Large right pleural effusion causing compressive atelectasis.", |
|
|
"Heart size is normal with no evidence of pleural effusion.", |
|
|
] |
|
|
|
|
|
# Compute similarities |
|
|
scores = model.compute_similarities(query, candidates) |
|
|
|
|
|
# Get most similar |
|
|
best_idx = torch.argmax(scores) |
|
|
print(f"Most similar: {candidates[best_idx]}") |
|
|
print(f"Score: {scores[best_idx]:.4f}") |
|
|
``` |
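
To rank every candidate rather than only the best match, the scores from the example above can be sorted:

```python
# Continuing from the example above: sort candidates by similarity
ranked = torch.argsort(scores, descending=True)
for i in ranked:
    print(f"{scores[i]:.4f}  {candidates[i]}")
```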
|
|
|
|
|
## API Reference |
|
|
|
|
|
The model provides three main methods: |
|
|
|
|
|
### `encode_text(texts, max_length=512)` |
|
|
Simple text encoding for one or more texts. |
|
|
|
|
|
**Parameters:** |
|
|
- `texts`: List of strings or single string |
|
|
- `max_length`: Maximum sequence length (default: 512) |
|
|
|
|
|
**Returns:** Tensor of shape `(batch_size, 2048)` |
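
A short usage sketch (assuming, per the parameter description, that a single string is treated as a batch of one):

```python
# `model` loaded as shown in the Usage section above
single = model.encode_text("No focal consolidation.")  # single string
batch = model.encode_text([
    "No focal consolidation.",
    "Small right pleural effusion.",
])  # list of strings
print(single.shape, batch.shape)  # torch.Size([1, 2048]) torch.Size([2, 2048])
```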
|
|
|
|
|
### Separator-based encoding |


Encodes texts in which an instruction and a report are joined by the separator (see the usage examples above). |
|
|
|
|
|
**Parameters:** |
|
|
- `texts`: List of strings with optional separator |
|
|
- `separator`: String separator (default: `'!@#$%^&*()'`) |
|
|
- `max_length`: Maximum sequence length (default: 512) |
|
|
|
|
|
**Returns:** Tensor of shape `(batch_size, 2048)` |
|
|
|
|
|
### `compute_similarities(query_text, candidate_texts, separator='!@#$%^&*()', max_length=512)` |


Computes cosine similarity between a single query and a list of candidate texts. |
|
|
|
|
|
**Parameters:** |
|
|
- `query_text`: Single query string |
|
|
- `candidate_texts`: List of candidate strings |
|
|
- `separator`: String separator (default: `'!@#$%^&*()'`) |
|
|
- `max_length`: Maximum sequence length (default: 512) |
|
|
|
|
|
**Returns:** Tensor of shape `(num_candidates,)` with cosine similarity scores |
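
The returned scores are cosine similarities between the pooled embeddings. A rough manual equivalent using `encode_text` is sketched below; note that `compute_similarities` additionally applies the separator handling to the query, so its scores can differ from this plain-text version:

```python
import torch
import torch.nn.functional as F

# `model` loaded as shown in the Usage section above
candidates = ["No pleural effusion", "Worsening pleural effusion"]
query_emb = model.encode_text(["Small increase in the left-sided effusion."])  # (1, 2048)
cand_embs = model.encode_text(candidates)                                      # (2, 2048)
scores = F.cosine_similarity(query_emb, cand_embs)                             # (2,)
```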
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
- Fully fine-tuned on chest X-ray reports and medical text data |
|
|
- Training focused on understanding pleural effusion status and other chest X-ray findings |
|
|
|
|
|
### Training Configuration |
|
|
- **Pooling Mode**: `latent_attention` (modified from the base model; 512 latents, 8 attention heads) |
|
|
- **Enable Bidirectional**: True |
|
|
- **Max Length**: 512 tokens |
|
|
- **Torch Dtype**: bfloat16 |
|
|
- **Full Fine-tuning**: All model weights were updated during training |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
- **Model Type**: Bidirectional Language Model (LLM2Vec) |
|
|
- **Architecture**: LlamaBiModel (modified Llama 3.2) + Latent Attention Pooling |
|
|
- **Parameters**: ~1B |
|
|
- **Hidden Size**: 2048 |
|
|
- **Input Length**: Up to 512 tokens |
|
|
- **Output Dimension**: 2048 |
|
|
- **Precision**: bfloat16 |
|
|
- **Dependencies**: Only transformers and torch |
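
A quick sanity check of the output dimension listed above (the embedding dtype is printed rather than asserted, since it may depend on the pooling implementation):

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()

emb = model.encode_text(["No acute findings."])
assert emb.shape == (1, 2048)  # hidden size / output dimension from the specs
print(emb.dtype)               # likely torch.bfloat16, per the precision spec
```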
|
|
|
|
|
## Intended Use |
|
|
|
|
|
### Primary Use Cases |
|
|
- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports |
|
|
- **Clinical Text Similarity**: Compare medical texts for semantic similarity |
|
|
- **Medical Information Retrieval**: Find relevant medical reports or findings |
|
|
- **Clinical NLP Research**: Foundation model for medical text analysis |
|
|
|
|
|
### Limitations |
|
|
- Specialized for chest X-ray reports - may not generalize to other medical domains |
|
|
- Requires careful preprocessing for optimal performance |
|
|
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
The model has been evaluated on chest X-ray report analysis tasks, particularly for: |
|
|
- Text retrieval and encoding |
|
|
- Medical text similarity comparison |
|
|
- Clinical finding extraction |
|
|
|
|
|
### Sample Performance |
|
|
|
|
|
The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks. |
|
|
**LLM2Vec4CXR** shows stronger performance in: |
|
|
- Handling medical abbreviations and radiological terminology |
|
|
- Capturing fine-grained semantic differences in chest X-ray reports |
|
|
- Understanding clinical context and temporal changes |
|
|
|
|
|
## Related Resources |
|
|
|
|
|
📄 **Paper**: [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234) |
|
|
|
|
|
🔗 **Related Projects**: |
|
|
- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```bibtex |
|
|
@article{ko2025exploring, |
|
|
title={Exploring the Capabilities of LLM Encoders for Image--Text Retrieval in Chest X-rays}, |
|
|
author={Ko, Hanbin and Cho, Gihun and Baek, Inhyeok and Kim, Donguk and Koo, Joonbeom and Kim, Changi and Lee, Dongheon and Park, Chang Min}, |
|
|
journal={arXiv preprint arXiv:2509.15234}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
This model is built upon: |
|
|
- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders |
|
|
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models |
|
|
|
|
|
## License |
|
|
|
|
|
This model is licensed under the MIT License. |
|
|
|