---
license: mit
base_model: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
tags:
- text-embeddings
- sentence-transformers
- llm2vec
- medical
- chest-xray
- radiology
- clinical-nlp
language:
- en
pipeline_tag: feature-extraction
library_name: transformers
---

# LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis

LLM2Vec4CXR is a text encoder optimized for chest X-ray report analysis and medical text understanding.  
It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).

## Model Description

LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy.  
This design enhances semantic representation of chest X-ray reports, making the model robust across different reporting styles and effective even with domain-specific abbreviations.  
It improves performance on clinical text similarity, retrieval, and interpretation tasks.

### Key Features

- **Base Architecture**: microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned
- **Pooling Mode**: Latent Attention (trained weights automatically loaded)
- **Bidirectional Processing**: Enabled for better context understanding
- **Medical Domain**: Specialized for chest X-ray report analysis
- **Max Length**: 512 tokens
- **Precision**: bfloat16
- **Automatic Loading**: Latent attention weights are automatically loaded from safetensors
- **Simple API**: Built-in methods for similarity computation and instruction-based encoding

## Training Details

### Training Data
- Fully fine-tuned on chest X-ray reports and medical text data
- Training focused on understanding pleural effusion status and other chest X-ray findings

### Training Configuration
- **Pooling Mode**: `latent_attention` (512 latents, 8 attention heads; modified from the base model)
- **Enable Bidirectional**: True
- **Max Length**: 512 tokens
- **Torch Dtype**: bfloat16
- **Full Fine-tuning**: All model weights were updated during training
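
To make the pooling strategy concrete, here is a minimal PyTorch sketch of latent-attention pooling with the hyperparameters above (512 latents, 8 heads, hidden size 2048). It illustrates the general technique only and is not the repository's exact implementation:

```python
import torch
import torch.nn as nn

class LatentAttentionPooling(nn.Module):
    """Illustrative latent-attention pooling: learned latent vectors
    cross-attend over token states, then are averaged into one embedding."""

    def __init__(self, hidden_size=2048, num_latents=512, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, hidden_size))
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden_size) from the bidirectional LM
        batch = token_states.size(0)
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        attended, _ = self.cross_attn(queries, token_states, token_states)
        return attended.mean(dim=1)  # (batch, hidden_size)

pool = LatentAttentionPooling()
pooled = pool(torch.randn(2, 128, 2048))
print(pooled.shape)  # torch.Size([2, 2048])
```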

## Usage

### Installation

```bash
# Only transformers and torch are needed
pip install transformers torch
```

### Basic Usage

```python
import torch
from transformers import AutoModel

# Load the model - that's it!
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Simple text encoding
report = "Small left pleural effusion with basal atelectasis."
embedding = model.encode_text([report])
print(embedding.shape)  # torch.Size([1, 2048])

# Multiple texts at once
reports = [
    "No acute cardiopulmonary abnormality.",
    "Small bilateral pleural effusions.",
    "Large left pleural effusion with compressive atelectasis."
]
embeddings = model.encode_text(reports)
print(embeddings.shape)  # torch.Size([3, 2048])
```
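
Embeddings returned by `encode_text` are ordinary tensors, so downstream comparisons can use standard PyTorch. A minimal sketch (plain `torch.nn.functional`, not a method of this model) computing a pairwise cosine-similarity matrix for the three reports above:

```python
import torch.nn.functional as F

# Pairwise cosine similarities between the three report embeddings above.
normed = F.normalize(embeddings.float(), dim=-1)  # cast from bfloat16 for stability
sim_matrix = normed @ normed.T                    # (3, 3), ones on the diagonal
print(sim_matrix)
```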

### Instruction-Based Encoding and Similarity

```python
import torch
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Instruction-based task with separator
instruction = "Determine the status of the pleural effusion."
report = "There is a small increase in the left-sided effusion."
query = instruction + "!@#$%^&*()" + report

# Compare against multiple candidates
candidates = [
    "No pleural effusion",
    "Pleural effusion present",
    "Worsening pleural effusion",
    "Improving pleural effusion"
]

# One-line similarity computation
scores = model.compute_similarities(query, candidates)
print(scores)
# tensor([0.7171, 0.8270, 0.9155, 0.8113], device='cuda:0')

best_match = candidates[torch.argmax(scores)]
print(f"Best match: {best_match}")
# Best match: Worsening pleural effusion
```

### Medical Report Retrieval Example

```python
import torch
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Instruction for retrieval
instruction = "Retrieve semantically similar reports"
query_report = "Small left pleural effusion with basal atelectasis."
query = instruction + "!@#$%^&*()" + query_report

# Candidate reports
candidates = [
    "No acute cardiopulmonary abnormality.",
    "Small left pleural effusion is present.",
    "Large right pleural effusion causing compressive atelectasis.",
    "Heart size is normal with no evidence of pleural effusion.",
]

# Compute similarities
scores = model.compute_similarities(query, candidates)

# Get most similar
best_idx = torch.argmax(scores)
print(f"Most similar: {candidates[best_idx]}")
print(f"Score: {scores[best_idx]:.4f}")
```

## API Reference

The model provides three main methods:

### `encode_text(texts, max_length=512)`
Simple text encoding for one or more texts.

**Parameters:**
- `texts`: List of strings or single string
- `max_length`: Maximum sequence length (default: 512)

**Returns:** Tensor of shape `(batch_size, 2048)`
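
Both input forms produce a batched tensor; for example (reusing `model` from Basic Usage, with shapes following the Returns note above):

```python
single = model.encode_text("No pneumothorax.")  # a single string is accepted
batch = model.encode_text(["No pneumothorax.", "Stable cardiomegaly."])
print(single.shape, batch.shape)  # expected: torch.Size([1, 2048]) torch.Size([2, 2048])
```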

### `encode_with_separator(texts, separator='!@#$%^&*()', max_length=512)`
Encodes texts in which an instruction and an input report are joined by the separator string, as shown in the usage examples above.

**Parameters:**
- `texts`: List of strings with optional separator
- `separator`: String separator (default: `'!@#$%^&*()'`)
- `max_length`: Maximum sequence length (default: 512)

**Returns:** Tensor of shape `(batch_size, 2048)`
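
A hypothetical call, reusing `model` from Basic Usage (the method name follows the reconstruction above):

```python
# Hypothetical usage; the method name follows the section heading above.
instruction = "Determine the status of the pleural effusion."
report = "There is a small increase in the left-sided effusion."
emb = model.encode_with_separator([instruction + "!@#$%^&*()" + report])
print(emb.shape)  # (1, 2048) per the Returns note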

### `compute_similarities(query_text, candidate_texts, separator='!@#$%^&*()', max_length=512)`
One-call cosine similarity between a query and a list of candidate texts.

**Parameters:**
- `query_text`: Single query string
- `candidate_texts`: List of candidate strings
- `separator`: String separator (default: `'!@#$%^&*()'`)
- `max_length`: Maximum sequence length (default: 512)

**Returns:** Tensor of shape `(num_candidates,)` with cosine similarity scores
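
Functionally, the scores should match encoding the query and candidates separately and comparing them with cosine similarity. A sketch of that manual equivalent (an assumption about the internals, using only the documented `encode_text` method and the `query`/`candidates` variables from the earlier example):

```python
import torch.nn.functional as F

# Assumed manual equivalent: encode query and candidates, then cosine-compare.
query_emb = model.encode_text([query])       # (1, 2048)
cand_embs = model.encode_text(candidates)    # (num_candidates, 2048)
scores = F.cosine_similarity(query_emb.float(), cand_embs.float(), dim=-1)
print(scores)                                # (num_candidates,)
```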

## Technical Specifications

- **Model Type**: Bidirectional Language Model (LLM2Vec)
- **Architecture**: LlamaBiModel (modified Llama 3.2) + Latent Attention Pooling
- **Parameters**: ~1B parameters
- **Hidden Size**: 2048
- **Input Length**: Up to 512 tokens
- **Output Dimension**: 2048
- **Precision**: bfloat16
- **Dependencies**: Only transformers and torch
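
A quick sanity check of the I/O specs above (assuming the model was loaded as in Basic Usage):

```python
emb = model.encode_text(["No acute findings."])
assert emb.shape == (1, 2048)        # output dimension matches the hidden size
assert emb.dtype == torch.bfloat16   # assumption: dtype follows torch_dtype at load
```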

## Intended Use

### Primary Use Cases
- **Medical Text Embeddings**: Generate embeddings for chest X-ray reports
- **Clinical Text Similarity**: Compare medical texts for semantic similarity
- **Medical Information Retrieval**: Find relevant medical reports or findings
- **Clinical NLP Research**: Foundation model for medical text analysis

### Limitations
- Specialized for chest X-ray reports; may not generalize to other medical domains
- Requires careful preprocessing for optimal performance
- Should be used as part of a larger clinical decision support system, not for standalone diagnosis

## Evaluation

The model has been evaluated on chest X-ray report analysis tasks, particularly for:
- Text retrieval and encoding
- Medical text similarity comparison
- Clinical finding extraction

### Sample Performance

The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks.  
**LLM2Vec4CXR** shows stronger performance in:
- Handling medical abbreviations and radiological terminology  
- Capturing fine-grained semantic differences in chest X-ray reports  
- Understanding clinical context and temporal changes

## Related Resources

📄 **Paper**: [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)  

🔗 **Related Projects**:
- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays

## Citation

If you use this model in your research, please cite:

```bibtex
@article{ko2025exploring,
  title={Exploring the Capabilities of LLM Encoders for Image--Text Retrieval in Chest X-rays},
  author={Ko, Hanbin and Cho, Gihun and Baek, Inhyeok and Kim, Donguk and Koo, Joonbeom and Kim, Changi and Lee, Dongheon and Park, Chang Min},
  journal={arXiv preprint arXiv:2509.15234},
  year={2025}
}
```

## Acknowledgments

This model is built upon:
- [LLM2Vec](https://github.com/McGill-NLP/llm2vec) - Framework for converting decoder-only LLMs into text encoders
- [LLM2CLIP](https://github.com/microsoft/LLM2CLIP) - Microsoft's implementation for connecting LLMs with CLIP models

## License

This model is licensed under the MIT License.