---
license: mit
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
tags:
- text-embeddings
- sentence-transformers
- llm2vec
- medical
- chest-xray
- radiology
- clinical-nlp
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
---
# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis
LLM2Vec4CXR is a text encoder optimized for chest X-ray report analysis and medical text understanding.
It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).
## Model Description
LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy.
This design enhances semantic representation of chest X-ray reports, making the model robust across different reporting styles and effective even with domain-specific abbreviations.
It improves performance on clinical text similarity, retrieval, and interpretation tasks.
### Key Features
- **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
- **Pooling Mode**: Latent Attention
- **Bidirectional Processing**: Enabled for better context understanding
- **Medical Domain**: Specialized for chest X-ray report analysis
- **Max Length**: 512 tokens
- **Precision**: bfloat16
- **Automatic Loading**: Latent attention weights are automatically loaded from safetensors
- **Simple API**: Built-in methods for similarity computation and instruction-based encoding
## Usage
### Installation
```bash
# Only transformers and torch are needed
pip install transformers torch
```
### Basic Usage
```python
import torch
from transformers import AutoModel

# Load the model - that's it!
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Simple text encoding
report = "Small left pleural effusion with basal atelectasis."
embedding = model.encode_text([report])
print(embedding.shape)  # torch.Size([1, 2048])

# Multiple texts at once
reports = [
    "No acute cardiopulmonary abnormality.",
    "Small bilateral pleural effusions.",
    "Large left pleural effusion with compressive atelectasis.",
]
embeddings = model.encode_text(reports)
print(embeddings.shape)  # torch.Size([3, 2048])
```
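For a large collection of reports, encoding everything in a single call may use more memory than is available. A minimal sketch of batching follows. The helpers `batched` and `encode_in_batches` are hypothetical, the batch size is an arbitrary choice, and `fake_encode` is a stand-in for `model.encode_text` so the snippet runs without downloading the model:

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def encode_in_batches(encode_fn: Callable, texts: Sequence[str], batch_size: int = 32) -> list:
    """Call `encode_fn` on each batch and collect one result per text, in order."""
    results = []
    for batch in batched(texts, batch_size):
        results.extend(encode_fn(batch))
    return results


def fake_encode(batch: List[str]) -> List[List[float]]:
    # Stand-in for model.encode_text (assumption): one tiny vector per text.
    return [[float(len(text))] for text in batch]


reports = [
    "No acute cardiopulmonary abnormality.",
    "Small bilateral pleural effusions.",
    "Large left pleural effusion with compressive atelectasis.",
]
vectors = encode_in_batches(fake_encode, reports, batch_size=2)
print(len(vectors))  # one vector per report
```

With the real model, `encode_fn` could be `model.encode_text`, and the per-batch tensors could be joined with `torch.cat` rather than collected in a Python list.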
### Instruction-Based Encoding and Similarity
```python
import torch
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Instruction-based task with separator
instruction = "Determine the status of the pleural effusion."
report = "There is a small increase in the left-sided effusion."
query = instruction + "!@#$%^&*()" + report

# Compare against multiple candidates
candidates = [
    "No pleural effusion",
    "Pleural effusion present",
    "Worsening pleural effusion",
    "Improving pleural effusion",
]

# One-line similarity computation
scores = model.compute_similarities(query, candidates)
print(scores)
# tensor([0.7171, 0.8270, 0.9155, 0.8113], device='cuda:0')

best_match = candidates[torch.argmax(scores)]
print(f"Best match: {best_match}")
# Best match: Worsening pleural effusion
```
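The query in the example above is simply the instruction and the report joined by the separator string. A minimal sketch of building and splitting such strings is shown here. `build_query` and `split_query` are hypothetical helpers, not part of the model's API, and how the model treats each part internally is not described in this card:

```python
from typing import Tuple

SEPARATOR = "!@#$%^&*()"  # default separator used in the examples above


def build_query(instruction: str, text: str, separator: str = SEPARATOR) -> str:
    """Join an instruction and a report with the separator."""
    if separator in instruction or separator in text:
        raise ValueError("inputs must not already contain the separator")
    return instruction + separator + text


def split_query(query: str, separator: str = SEPARATOR) -> Tuple[str, str]:
    """Return (instruction, text); the instruction is empty if no separator is found."""
    instruction, found, text = query.partition(separator)
    if not found:
        return "", query
    return instruction, text


query = build_query(
    "Determine the status of the pleural effusion.",
    "There is a small increase in the left-sided effusion.",
)
print(split_query(query))
```

Checking for the separator inside the inputs guards against a report that happens to contain the same character sequence, which would otherwise make the split ambiguous.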
### Medical Report Retrieval Example
```python
import torch
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Instruction for retrieval
instruction = "Retrieve semantically similar reports"
query_report = "Small left pleural effusion with basal atelectasis."
query = instruction + "!@#$%^&*()" + query_report

# Candidate reports
candidates = [
    "No acute cardiopulmonary abnormality.",
    "Small left pleural effusion is present.",
    "Large right pleural effusion causing compressive atelectasis.",
    "Heart size is normal with no evidence of pleural effusion.",
]

# Compute similarities
scores = model.compute_similarities(query, candidates)

# Get most similar
best_idx = torch.argmax(scores)
print(f"Most similar: {candidates[best_idx]}")
print(f"Score: {scores[best_idx]:.4f}")
```
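To return several matches instead of only the best one, the scores can be sorted. The sketch below uses plain Python floats in place of the score tensor (with the real model, something like `scores.float().tolist()` would provide them). `top_k` is a hypothetical helper, and the inputs are the candidates and printed scores from the instruction-based example:

```python
from typing import List, Sequence, Tuple


def top_k(candidates: Sequence[str], scores: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (candidate, score) pairs, best first."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Candidates and scores copied from the instruction-based example.
candidates = [
    "No pleural effusion",
    "Pleural effusion present",
    "Worsening pleural effusion",
    "Improving pleural effusion",
]
scores = [0.7171, 0.8270, 0.9155, 0.8113]

for text, score in top_k(candidates, scores, k=2):
    print(f"{score:.4f}  {text}")
# 0.9155  Worsening pleural effusion
# 0.8270  Pleural effusion present
```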
## API Reference
The model provides three main methods:
### `encode_text(texts, max_length=512)`
Simple text encoding for one or more texts.
**Parameters:**
- `texts`: List of strings or a single string
- `max_length`: Maximum sequence length (default: 512)
**Returns:** Tensor of shape `(batch_size, 2048)`
### Instruction-based encoding
Encodes texts in which an instruction and the content are joined by the separator.
**Parameters:**
- `texts`: List of strings with optional separator
- `separator`: String separator (default: `'!@#$%^&*()'`)
- `max_length`: Maximum sequence length (default: 512)
**Returns:** Tensor of shape `(batch_size, 2048)`
### `compute_similarities(query_text, candidate_texts, separator='!@#$%^&*()', max_length=512)`
Computes the cosine similarity between one query and each candidate text.
**Parameters:**
- `query_text`: Single query string
- `candidate_texts`: List of candidate strings
- `separator`: String separator (default: `'!@#$%^&*()'`)
- `max_length`: Maximum sequence length (default: 512)
**Returns:** Tensor of shape `(num_candidates,)` with cosine similarity scores
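The returned scores are cosine similarities between the query embedding and each candidate embedding. As a reminder of what that quantity measures, here is a pure-Python sketch on toy 3-dimensional vectors. Real embeddings have 2048 dimensions and the model's own method works on tensors; `cosine_similarity` below is an illustrative helper, not the model's implementation:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


query_vec = [1.0, 0.0, 0.0]
toy_candidates = {
    "same direction": [2.0, 0.0, 0.0],
    "orthogonal": [0.0, 3.0, 0.0],
    "opposite": [-1.0, 0.0, 0.0],
}
for name, vec in toy_candidates.items():
    print(f"{name}: {cosine_similarity(query_vec, vec):+.1f}")
# same direction: +1.0
# orthogonal: +0.0
# opposite: -1.0
```

Because the score depends only on direction, not on vector length, scaling an embedding does not change its similarity to the query.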
## Training Details
### Training Data
- Fully fine-tuned on chest X-ray reports and medical text data
- Training focused on understanding pleural effusion status and other chest X-ray findings
### Training Configuration
- **Pooling Mode**: `latent_attention` (512 latents, 8 attention heads; modified from the base model)
- **Enable Bidirectional**: True
- **Max Length**: 512 tokens
- **Torch Dtype**: bfloat16
- **Full Fine-tuning**: All model weights were updated during training
## Technical Specifications
- **Model Type**: Bidirectional Language Model (LLM2Vec)
- **Architecture**: LlamaBiModel (modified Llama 3.2) + Latent Attention Pooling
- **Parameters**: ~1B parameters
- **Hidden Size**: 2048
- **Input Length**: Up to 512 tokens
- **Output Dimension**: 2048
- **Precision**: bfloat16
- **Dependencies**: Only transformers and torch
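These figures allow a rough, back-of-the-envelope size estimate. The sketch below assumes exactly 1e9 parameters (the card only says ~1B) and 2 bytes per bfloat16 value. It ignores activations and framework overhead, and the corpus size of 100,000 reports is a hypothetical example:

```python
BYTES_PER_BF16 = 2
APPROX_PARAMS = 1_000_000_000  # assumption: the card states only "~1B"
EMBEDDING_DIM = 2048


def weights_gib(num_params: int = APPROX_PARAMS, bytes_per_value: int = BYTES_PER_BF16) -> float:
    """Approximate size of the model weights in GiB."""
    return num_params * bytes_per_value / 1024**3


def embedding_store_mib(num_reports: int, dim: int = EMBEDDING_DIM,
                        bytes_per_value: int = BYTES_PER_BF16) -> float:
    """Approximate size of a stored embedding matrix in MiB."""
    return num_reports * dim * bytes_per_value / 1024**2


print(f"weights: {weights_gib():.2f} GiB")  # weights: 1.86 GiB
print(f"100k embeddings: {embedding_store_mib(100_000):.1f} MiB")  # 100k embeddings: 390.6 MiB
```

Each bfloat16 embedding takes 2048 × 2 = 4096 bytes, so storage grows linearly with the number of indexed reports.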
## Intended Use
### Primary Use Cases
- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports
- **Clinical Text Similarity**: Compare medical texts for semantic similarity
- **Medical Information Retrieval**: Find relevant medical reports or findings
- **Clinical NLP Research**: Foundation model for medical text analysis
### Limitations
- Specialized for chest X-ray reports; may not generalize to other medical domains
- Requires careful preprocessing for optimal performance
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis
## Evaluation
The model has been evaluated on chest X-ray report analysis tasks, particularly for:
- Text retrieval and encoding
- Medical text similarity comparison
- Clinical finding extraction
### Sample Performance
The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks.
**LLM2Vec4CXR** shows stronger performance in:
- Handling medical abbreviations and radiological terminology
- Capturing fine-grained semantic differences in chest X-ray reports
- Understanding clinical context and temporal changes
## Related Resources
📄 **Paper**: [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)
🔗 **Related Projects**:
- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays
## Citation
If you use this model in your research, please cite:
```bibtex
@article{ko2025exploring,
title={Exploring the Capabilities of LLM Encoders for Image--Text Retrieval in Chest X-rays},
author={Ko, Hanbin and Cho, Gihun and Baek, Inhyeok and Kim, Donguk and Koo, Joonbeom and Kim, Changi and Lee, Dongheon and Park, Chang Min},
journal={arXiv preprint arXiv:2509.15234},
year={2025}
}
```
## Acknowledgments
This model is built upon:
- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models
## License
This model is licensed under the MIT License.