lukeingawesome committed
Commit 8c10378 · verified · 1 Parent(s): 50d2262

Update README: self-contained usage with trust_remote_code

Files changed (1):
  1. README.md +120 -129

README.md CHANGED
@@ -17,11 +17,9 @@ library_name: transformers
 
 # LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis
 
-
-LLM2Vec4CXR is optimized for chest X-ray report analysis and medical text understanding.
+LLM2Vec4CXR is a text encoder optimized for chest X-ray report analysis and medical text understanding.
 It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).
 
-
 ## Model Description
 
 LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy.
@@ -31,7 +29,7 @@ It improves performance on clinical text similarity, retrieval, and interpretati
 ### Key Features
 
 - **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
-- **Pooling Mode**: Latent Attention (fine-tuned weights automatically loaded)
+- **Pooling Mode**: Latent Attention (trained weights automatically loaded)
 - **Bidirectional Processing**: Enabled for better context understanding
 - **Medical Domain**: Specialized for chest X-ray report analysis
 - **Max Length**: 512 tokens
@@ -57,38 +55,27 @@ It improves performance on clinical text similarity, retrieval, and interpretati
 ### Installation
 
 ```bash
-# Install the LLM2Vec4CXR package directly from GitHub
-pip install git+https://github.com/lukeingawesome/llm2vec4cxr.git
-
-# Or clone and install in development mode
-git clone https://github.com/lukeingawesome/llm2vec4cxr.git
-cd llm2vec4cxr
-pip install -e .
+# Only transformers is needed!
+pip install transformers torch
 ```
 
 ### Basic Usage
 
 ```python
 import torch
-from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
-
-# Load the model - latent attention weights are automatically loaded!
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-model = LLM2Vec.from_pretrained(
-    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
-    pooling_mode="latent_attention",
-    max_length=512,
-    enable_bidirectional=True,
-    torch_dtype=torch.bfloat16,
-    use_safetensors=True,
-).to(device).eval()
-
-# Configure tokenizer
-model.tokenizer.padding_side = 'left'
+from transformers import AutoModel
+
+# Load the model - that's it!
+model = AutoModel.from_pretrained(
+    "lukeingawesome/llm2vec4cxr",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16
+).to("cuda" if torch.cuda.is_available() else "cpu").eval()
 
 # Simple text encoding
-report = "There is a small increase in the left-sided effusion. There continues to be volume loss at both bases."
+report = "Small left pleural effusion with basal atelectasis."
 embedding = model.encode_text([report])
+print(embedding.shape)  # torch.Size([1, 2048])
 
 # Multiple texts at once
 reports = [
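The updated Basic Usage above loads the model with `AutoModel` and `trust_remote_code` and calls the repo-provided `encode_text` helper. As a quick illustration of what can be done with the resulting embeddings, here is a minimal sketch that compares two reports with plain cosine similarity; it assumes `encode_text` behaves as the README's API Reference describes (one 2048-dimensional vector per input), and the explicit `cosine_similarity` comparison is an illustrative addition rather than part of the model card.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel

# Load the model the same way the updated README shows
model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

reports = [
    "Small left pleural effusion with basal atelectasis.",
    "Large left pleural effusion with compressive atelectasis.",
]

# encode_text returns one embedding per report (shape [2, 2048] per the API Reference)
with torch.no_grad():
    embeddings = model.encode_text(reports)

# Direct cosine similarity between the two report embeddings
similarity = F.cosine_similarity(embeddings[0:1].float(), embeddings[1:2].float(), dim=-1)
print(f"Cosine similarity: {similarity.item():.4f}")
```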
@@ -97,148 +84,141 @@ reports = [
     "Large left pleural effusion with compressive atelectasis."
 ]
 embeddings = model.encode_text(reports)
+print(embeddings.shape)  # torch.Size([3, 2048])
 ```
 
-### Advanced Usage with Instructions and Similarity
+### Instruction-Based Encoding and Similarity
 
 ```python
-# For instruction-following tasks with separator
-instruction = 'Determine the change or the status of the pleural effusion.'
-report = 'There is a small increase in the left-sided effusion.'
-query_text = instruction + '!@#$%^&*()' + report
+import torch
+from transformers import AutoModel
 
-# Compare against multiple options
+# Load model
+model = AutoModel.from_pretrained(
+    "lukeingawesome/llm2vec4cxr",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16
+).to("cuda" if torch.cuda.is_available() else "cpu").eval()
+
+# Instruction-based task with separator
+instruction = "Determine the status of the pleural effusion."
+report = "There is a small increase in the left-sided effusion."
+query = instruction + "!@#$%^&*()" + report
+
+# Compare against multiple candidates
 candidates = [
-    'No pleural effusion',
-    'Pleural effusion present',
-    'Pleural effusion is worsening',
-    'Pleural effusion is improving'
+    "No pleural effusion",
+    "Pleural effusion present",
+    "Worsening pleural effusion",
+    "Improving pleural effusion"
 ]
 
-# Get similarity scores using the built-in method
-similarities = model.compute_similarities(query_text, candidates)
-print(f"Similarities: {similarities}")
+# One-line similarity computation
+scores = model.compute_similarities(query, candidates)
+print(scores)
+# tensor([0.7171, 0.8270, 0.9155, 0.8113], device='cuda:0')
 
-# For custom separator-based encoding
-embeddings = model.encode_with_separator([query_text], separator='!@#$%^&*()')
+best_match = candidates[torch.argmax(scores)]
+print(f"Best match: {best_match}")
+# Best match: Worsening pleural effusion
 ```
 
-**Note**: The model now includes convenient methods like `compute_similarities()` and `encode_with_separator()` that handle complex tokenization automatically.
-
-### Quick Start Example
-
-Here's a complete example showing the model's capabilities:
-
-```python
-import torch
-from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
-
-# Load model
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-model = LLM2Vec.from_pretrained(
-    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
-    pooling_mode="latent_attention",
-    max_length=512,
-    enable_bidirectional=True,
-    torch_dtype=torch.bfloat16,
-    use_safetensors=True,
-).to(device).eval()
-
-# Configure tokenizer
-model.tokenizer.padding_side = 'left'
-
-# Medical text analysis
-instruction = 'Determine the change or the status of the pleural effusion.'
-report = 'There is a small increase in the left-sided effusion.'
-query = instruction + '!@#$%^&*()' + report
-
-# Compare with different diagnoses
-options = [
-    'No pleural effusion',
-    'Pleural effusion is worsening',
-    'Pleural effusion is stable',
-    'Pleural effusion is improving'
-]
-
-# Get similarity scores
-scores = model.compute_similarities(query, options)
-best_match = options[torch.argmax(scores)]
-print(f"Best match: {best_match} (score: {torch.max(scores):.4f})")
-```
+### Medical Report Retrieval Example
 
-Or retrieving clinically similar reports:
 ```python
 import torch
-from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
+from transformers import AutoModel
 
 # Load model
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-model = LLM2Vec.from_pretrained(
-    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
-    pooling_mode="latent_attention",
-    max_length=512,
-    enable_bidirectional=True,
-    torch_dtype=torch.bfloat16,
-    use_safetensors=True,
-).to(device).eval()
-
-# Configure tokenizer
-model.tokenizer.padding_side = 'left'
+model = AutoModel.from_pretrained(
+    "lukeingawesome/llm2vec4cxr",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16
+).to("cuda" if torch.cuda.is_available() else "cpu").eval()
 
 # Instruction for retrieval
-instruction = 'Retrieve semantically similar sentences'
-query_report = "There is a small LLLF PE with basal atelectasis."
-query_text = instruction + '!@#$%^&*()' + query_report
+instruction = "Retrieve semantically similar reports"
+query_report = "Small left pleural effusion with basal atelectasis."
+query = instruction + "!@#$%^&*()" + query_report
 
 # Candidate reports
-candidate_reports = [
+candidates = [
     "No acute cardiopulmonary abnormality.",
     "Small left pleural effusion is present.",
     "Large right pleural effusion causing compressive atelectasis.",
    "Heart size is normal with no evidence of pleural effusion.",
-    "There is left pleural effusion."
 ]
 
-# Compute similarity scores
-scores = model.compute_similarities(query_text, candidate_reports)
+# Compute similarities
+scores = model.compute_similarities(query, candidates)
 
-# Retrieve the most similar report
-best_match = candidate_reports[torch.argmax(scores)]
-print(f"Most similar report: {best_match} (score: {torch.max(scores):.4f})")
+# Get most similar
+best_idx = torch.argmax(scores)
+print(f"Most similar: {candidates[best_idx]}")
+print(f"Score: {scores[best_idx]:.4f}")
 ```
 
 ## API Reference
 
-The model provides several convenient methods:
+The model provides three main methods:
 
-### Core Methods
+### `encode_text(texts, max_length=512)`
+Simple text encoding for one or more texts.
 
-- **`encode_text(texts)`**: Simple text encoding with automatic embed_mask handling
-- **`encode_with_separator(texts, separator='!@#$%^&*()')`**: Encoding with instruction/content separation
-- **`compute_similarities(query_text, candidate_texts)`**: One-line similarity computation
-- **`from_pretrained(..., pooling_mode="latent_attention")`**: Automatic latent attention weight loading
+**Parameters:**
+- `texts`: List of strings or single string
+- `max_length`: Maximum sequence length (default: 512)
 
+**Returns:** Tensor of shape `(batch_size, 2048)`
 
 📄 **Related Papers**:
 - [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)
   *Ko, Hanbin, et al. "Exploring the capabilities of LLM encoders for image–text retrieval in chest X-rays." arXiv preprint arXiv:2509.15234 (2025).*
 - [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays.
 
+**Parameters:**
+- `texts`: List of strings with optional separator
+- `separator`: String separator (default: `'!@#$%^&*()'`)
+- `max_length`: Maximum sequence length (default: 512)
 
-## Evaluation
+**Returns:** Tensor of shape `(batch_size, 2048)`
 
 The model has been evaluated on chest X-ray report analysis tasks, particularly for:
 - Text retrieval/encoder
 - Medical text similarity comparison
 - Clinical finding extraction
 
-### Sample Performance
+**Parameters:**
+- `query_text`: Single query string
+- `candidate_texts`: List of candidate strings
+- `separator`: String separator (default: `'!@#$%^&*()'`)
+- `max_length`: Maximum sequence length (default: 512)
 
-The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks.
-In particular, **LLM2Vec4CXR** shows stronger performance in:
-- Handling medical abbreviations and radiological terminology
-- Capturing fine-grained semantic differences in chest X-ray reports
+**Returns:** Tensor of shape `(num_candidates,)` with cosine similarity scores
 
+## Training Details
+
+### Training Data
+- Fully fine-tuned on chest X-ray reports and medical text data
+- Training focused on understanding pleural effusion status and other chest X-ray findings
+
+### Training Configuration
+- **Pooling Mode**: `latent_attention` (512 latents, 8 attention heads)
+- **Enable Bidirectional**: True
+- **Max Length**: 512 tokens
+- **Torch Dtype**: bfloat16
+- **Full Fine-tuning**: All model weights were updated during training
+
+## Technical Specifications
+
+- **Model Type**: Bidirectional Language Model (LLM2Vec)
+- **Architecture**: LlamaBiModel (modified Llama 3.2) + Latent Attention Pooling
+- **Parameters**: ~1B parameters
+- **Hidden Size**: 2048
+- **Input Length**: Up to 512 tokens
+- **Output Dimension**: 2048
+- **Precision**: bfloat16
+- **Dependencies**: Only transformers and torch
 
 ## Intended Use
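The API Reference added in the hunk above documents `compute_similarities(query_text, candidate_texts, ...)` as returning one cosine-similarity score per candidate. Below is a minimal sketch that ranks every candidate instead of reading only the top match; it assumes the method and the `'!@#$%^&*()'` instruction separator behave as the README describes, and the `torch.topk` ranking loop is an illustrative addition, not part of the commit's documentation.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "lukeingawesome/llm2vec4cxr",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda" if torch.cuda.is_available() else "cpu").eval()

# Instruction and report joined with the separator documented in the README
instruction = "Determine the status of the pleural effusion."
report = "There is a small increase in the left-sided effusion."
query = instruction + "!@#$%^&*()" + report

candidates = [
    "No pleural effusion",
    "Pleural effusion present",
    "Worsening pleural effusion",
    "Improving pleural effusion",
]

# One cosine-similarity score per candidate, per the API Reference
scores = model.compute_similarities(query, candidates)

# Rank all candidates from most to least similar
topk = torch.topk(scores, k=len(candidates))
for rank, (score, idx) in enumerate(zip(topk.values, topk.indices), start=1):
    print(f"{rank}. {candidates[idx]} ({score.item():.4f})")
```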
 
@@ -253,14 +233,27 @@ In particular, **LLM2Vec4CXR** shows stronger performance in:
 - Requires careful preprocessing for optimal performance
 - Should be used as part of a larger clinical decision support system, not for standalone diagnosis
 
-## Technical Specifications
+## Evaluation
 
-- **Model Type**: Bidirectional Language Model (LLM2Vec)
-- **Architecture**: LlamaBiModel (modified Llama 3.2)
-- **Parameters**: ~1B parameters
-- **Input Length**: Up to 512 tokens
-- **Output**: Dense embeddings
-- **Precision**: bfloat16
+The model has been evaluated on chest X-ray report analysis tasks, particularly for:
+- Text retrieval and encoding
+- Medical text similarity comparison
+- Clinical finding extraction
+
+### Sample Performance
+
+The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks.
+**LLM2Vec4CXR** shows stronger performance in:
+- Handling medical abbreviations and radiological terminology
+- Capturing fine-grained semantic differences in chest X-ray reports
+- Understanding clinical context and temporal changes
+
+## Related Resources
+
+📄 **Paper**: [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)
+
+🔗 **Related Projects**:
+- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays
 
 ## Citation
 
@@ -275,8 +268,6 @@ If you use this model in your research, please cite:
 }
 ```
 
-A preprint of this model will be released soon.
-
 ## Acknowledgments
 
 This model is built upon:
 