lukeingawesome/llm2vec4cxr

@@ -17,11 +17,9 @@ library_name: transformers
 # LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis
-LLM2Vec4CXR is optimized for chest X-ray report analysis and medical text understanding.
 It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).
 ## Model Description
 LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy.
@@ -31,7 +29,7 @@ It improves performance on clinical text similarity, retrieval, and interpretati
 ### Key Features
 - **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
-- **Pooling Mode**: Latent Attention (fine-tuned weights automatically loaded)
 - **Bidirectional Processing**: Enabled for better context understanding
 - **Medical Domain**: Specialized for chest X-ray report analysis
 - **Max Length**: 512 tokens
@@ -57,38 +55,27 @@ It improves performance on clinical text similarity, retrieval, and interpretati
 ### Installation
 ```bash
-# Install the LLM2Vec4CXR package directly from GitHub
-pip install git+https://github.com/lukeingawesome/llm2vec4cxr.git
-# Or clone and install in development mode
-git clone https://github.com/lukeingawesome/llm2vec4cxr.git
-cd llm2vec4cxr
-pip install -e .
 ```
 ### Basic Usage
 ```python
 import torch
-from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
-# Load the model - latent attention weights are automatically loaded!
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-model = LLM2Vec.from_pretrained(
-    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
-    pooling_mode="latent_attention",
-    max_length=512,
-    enable_bidirectional=True,
-    torch_dtype=torch.bfloat16,
-    use_safetensors=True,
-).to(device).eval()
-# Configure tokenizer
-model.tokenizer.padding_side = 'left'
 # Simple text encoding
-report = "There is a small increase in the left-sided effusion. There continues to be volume loss at both bases."
 embedding = model.encode_text([report])
 # Multiple texts at once
 reports = [
@@ -97,148 +84,141 @@ reports = [
     "Large left pleural effusion with compressive atelectasis."
 ]
 embeddings = model.encode_text(reports)
 ```
-### Advanced Usage with Instructions and Similarity
 ```python
-# For instruction-following tasks with separator
-instruction = 'Determine the change or the status of the pleural effusion.'
-report = 'There is a small increase in the left-sided effusion.'
-query_text = instruction + '!@#$%^&*()' + report
-# Compare against multiple options
 candidates = [
-    'No pleural effusion',
-    'Pleural effusion present',
-    'Pleural effusion is worsening',
-    'Pleural effusion is improving'
 ]
-# Get similarity scores using the built-in method
-similarities = model.compute_similarities(query_text, candidates)
-print(f"Similarities: {similarities}")
-# For custom separator-based encoding
-embeddings = model.encode_with_separator([query_text], separator='!@#$%^&*()')
 ```
-**Note**: The model now includes convenient methods like `compute_similarities()` and `encode_with_separator()` that handle complex tokenization automatically.
-### Quick Start Example
-Here's a complete example showing the model's capabilities:
-```python
-import torch
-from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
-# Load model
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-model = LLM2Vec.from_pretrained(
-    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
-    pooling_mode="latent_attention",
-    max_length=512,
-    enable_bidirectional=True,
-    torch_dtype=torch.bfloat16,
-    use_safetensors=True,
-).to(device).eval()
-# Configure tokenizer
-model.tokenizer.padding_side = 'left'
-# Medical text analysis
-instruction = 'Determine the change or the status of the pleural effusion.'
-report = 'There is a small increase in the left-sided effusion.'
-query = instruction + '!@#$%^&*()' + report
-# Compare with different diagnoses
-options = [
-    'No pleural effusion',
-    'Pleural effusion is worsening',
-    'Pleural effusion is stable',
-    'Pleural effusion is improving'
-]
-# Get similarity scores
-scores = model.compute_similarities(query, options)
-best_match = options[torch.argmax(scores)]
-print(f"Best match: {best_match} (score: {torch.max(scores):.4f})")
-```
-Or retrieving clinically similar reports:
 ```python
 import torch
-from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
 # Load model
-device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
-model = LLM2Vec.from_pretrained(
-    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
-    pooling_mode="latent_attention",
-    max_length=512,
-    enable_bidirectional=True,
-    torch_dtype=torch.bfloat16,
-    use_safetensors=True,
-).to(device).eval()
-# Configure tokenizer
-model.tokenizer.padding_side = 'left'
 # Instruction for retrieval
-instruction = 'Retrieve semantically similar sentences'
-query_report = "There is a small LLLF PE with basal atelectasis."
-query_text = instruction + '!@#$%^&*()' + query_report
 # Candidate reports
-candidate_reports = [
     "No acute cardiopulmonary abnormality.",
     "Small left pleural effusion is present.",
     "Large right pleural effusion causing compressive atelectasis.",
     "Heart size is normal with no evidence of pleural effusion.",
-    "There is left pleural effusion."
 ]
-# Compute similarity scores
-scores = model.compute_similarities(query_text, candidate_reports)
-# Retrieve the most similar report
-best_match = candidate_reports[torch.argmax(scores)]
-print(f"Most similar report: {best_match} (score: {torch.max(scores):.4f})")
 ```
 ## API Reference
-The model provides several convenient methods:
-### Core Methods
-- **`encode_text(texts)`**: Simple text encoding with automatic embed_mask handling
-- **`encode_with_separator(texts, separator='!@#$%^&*()')`**: Encoding with instruction/content separation
-- **`compute_similarities(query_text, candidate_texts)`**: One-line similarity computation
-- **`from_pretrained(..., pooling_mode="latent_attention")`**: Automatic latent attention weight loading
 📄 **Related Papers**:
 - [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)
   *Ko, Hanbin, et al. "Exploring the capabilities of LLM encoders for image–text retrieval in chest X-rays." arXiv preprint arXiv:2509.15234 (2025).*
 - [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays.
-## Evaluation
 The model has been evaluated on chest X-ray report analysis tasks, particularly for:
 - Text retrieval/encoder
 - Medical text similarity comparison
 - Clinical finding extraction
-### Sample Performance
-The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks.
-In particular, **LLM2Vec4CXR** shows stronger performance in:
-- Handling medical abbreviations and radiological terminology
-- Capturing fine-grained semantic differences in chest X-ray reports
 ## Intended Use
@@ -253,14 +233,27 @@ In particular, **LLM2Vec4CXR** shows stronger performance in:
 - Requires careful preprocessing for optimal performance
 - Should be used as part of a larger clinical decision support system, not for standalone diagnosis
-## Technical Specifications
-- **Model Type**: Bidirectional Language Model (LLM2Vec)
-- **Architecture**: LlamaBiModel (modified Llama 3.2)
-- **Parameters**: ~1B parameters
-- **Input Length**: Up to 512 tokens
-- **Output**: Dense embeddings
-- **Precision**: bfloat16
 ## Citation
@@ -275,8 +268,6 @@ If you use this model in your research, please cite:
 }
 ```
-A preprint of this model will be released soon.
 ## Acknowledgments
 This model is built upon:

 # LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis
+LLM2Vec4CXR is a text encoder optimized for chest X-ray report analysis and medical text understanding.
 It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).
 ## Model Description
 LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy.
 ### Key Features
 - **Base Architecture**: LLM2CLIP-Llama-3.2-1B-Instruct
+- **Pooling Mode**: Latent Attention (trained weights automatically loaded)
 - **Bidirectional Processing**: Enabled for better context understanding
 - **Medical Domain**: Specialized for chest X-ray report analysis
 - **Max Length**: 512 tokens
 ### Installation
 ```bash
+# Only transformers and torch are needed!
+pip install transformers torch
 ```
 ### Basic Usage
 ```python
 import torch
+from transformers import AutoModel
+# Load the model - that's it!
+model = AutoModel.from_pretrained(
+    "lukeingawesome/llm2vec4cxr",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16
+).to("cuda" if torch.cuda.is_available() else "cpu").eval()
 # Simple text encoding
+report = "Small left pleural effusion with basal atelectasis."
 embedding = model.encode_text([report])
+print(embedding.shape)  # torch.Size([1, 2048])
 # Multiple texts at once
 reports = [
     "Large left pleural effusion with compressive atelectasis."
 ]
 embeddings = model.encode_text(reports)
+print(embeddings.shape)  # torch.Size([3, 2048])
 ```
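`encode_text` takes a list of texts, so a large report collection can be encoded chunk by chunk to keep GPU memory bounded. The sketch below uses two hypothetical helpers (`iter_batches`, `encode_in_batches`) that are not part of the model's API. It also uses a stand-in encoder (`fake_encode`) so it runs without downloading the model. The batch size is an arbitrary assumption.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def encode_in_batches(
    texts: Sequence[str],
    encode_fn: Callable[[List[str]], T],
    batch_size: int = 32,
) -> List[T]:
    """Call `encode_fn` once per chunk and collect the per-chunk outputs (hypothetical helper)."""
    return [encode_fn(batch) for batch in iter_batches(texts, batch_size)]


def fake_encode(batch: List[str]) -> List[List[float]]:
    """Stand-in encoder so the sketch runs without the model: [characters, words] per text."""
    return [[float(len(t)), float(len(t.split()))] for t in batch]


reports = [f"Report {i}: no acute cardiopulmonary abnormality." for i in range(5)]
chunks = encode_in_batches(reports, fake_encode, batch_size=2)
print([len(chunk) for chunk in chunks])  # [2, 2, 1]
```

With the real model you would pass `model.encode_text` as `encode_fn` and join the per-batch `(batch, 2048)` tensors with `torch.cat(chunks)`. Wrapping the loop in `torch.no_grad()` is a common extra step, though whether `encode_text` already disables gradients is not stated here.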
+### Instruction-Based Encoding and Similarity
 ```python
+import torch
+from transformers import AutoModel
+# Load model
+model = AutoModel.from_pretrained(
+    "lukeingawesome/llm2vec4cxr",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16
+).to("cuda" if torch.cuda.is_available() else "cpu").eval()
+# Instruction-based task with separator
+instruction = "Determine the status of the pleural effusion."
+report = "There is a small increase in the left-sided effusion."
+query = instruction + "!@#$%^&*()" + report
+# Compare against multiple candidates
 candidates = [
+    "No pleural effusion",
+    "Pleural effusion present",
+    "Worsening pleural effusion",
+    "Improving pleural effusion"
 ]
+# One-line similarity computation
+scores = model.compute_similarities(query, candidates)
+print(scores)
+# tensor([0.7171, 0.8270, 0.9155, 0.8113], device='cuda:0')
+best_match = candidates[torch.argmax(scores)]
+print(f"Best match: {best_match}")
+# Best match: Worsening pleural effusion
 ```
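The instruction examples join the instruction and the report with the literal separator `'!@#$%^&*()'`. The sketch below shows that string convention with two hypothetical helpers, `build_query` and `split_query`, which are not part of the model's API. This card does not confirm how the model treats the two halves internally. One possibility is that instruction tokens are left out of pooling, as in the original LLM2Vec instruction format, but that is an assumption.

```python
from typing import Tuple

SEPARATOR = "!@#$%^&*()"  # the separator string used in the examples above


def build_query(instruction: str, text: str, separator: str = SEPARATOR) -> str:
    """Hypothetical helper: join an instruction and a report with the separator."""
    if separator in instruction or separator in text:
        raise ValueError("the separator must not appear inside the instruction or the text")
    return instruction + separator + text


def split_query(query: str, separator: str = SEPARATOR) -> Tuple[str, str]:
    """Hypothetical helper: return (instruction, text); without a separator the whole string is text."""
    instruction, found, text = query.partition(separator)
    return (instruction, text) if found else ("", query)


query = build_query(
    "Determine the status of the pleural effusion.",
    "There is a small increase in the left-sided effusion.",
)
print(split_query(query))
print(split_query("Small left pleural effusion."))  # ('', 'Small left pleural effusion.')
```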
+### Medical Report Retrieval Example
 ```python
 import torch
+from transformers import AutoModel
 # Load model
+model = AutoModel.from_pretrained(
+    "lukeingawesome/llm2vec4cxr",
+    trust_remote_code=True,
+    torch_dtype=torch.bfloat16
+).to("cuda" if torch.cuda.is_available() else "cpu").eval()
 # Instruction for retrieval
+instruction = "Retrieve semantically similar reports"
+query_report = "Small left pleural effusion with basal atelectasis."
+query = instruction + "!@#$%^&*()" + query_report
 # Candidate reports
+candidates = [
     "No acute cardiopulmonary abnormality.",
     "Small left pleural effusion is present.",
     "Large right pleural effusion causing compressive atelectasis.",
     "Heart size is normal with no evidence of pleural effusion.",
 ]
+# Compute similarities
+scores = model.compute_similarities(query, candidates)
+# Get most similar
+best_idx = torch.argmax(scores)
+print(f"Most similar: {candidates[best_idx]}")
+print(f"Score: {scores[best_idx]:.4f}")
 ```
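The retrieval example keeps only the single best match. To inspect a ranked shortlist instead, sort the candidates by score. The plain-Python sketch below uses a hypothetical helper, `top_k`, and takes as sample input the four scores printed in the instruction-based example. With a score tensor from `compute_similarities`, you can pass `scores.tolist()` to the helper or use the tensor-native `torch.topk(scores, k)`.

```python
from typing import List, Sequence, Tuple


def top_k(candidates: Sequence[str], scores: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical helper: the k highest-scoring (candidate, score) pairs, best first."""
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Sample input: the scores printed in the instruction-based example.
candidates = [
    "No pleural effusion",
    "Pleural effusion present",
    "Worsening pleural effusion",
    "Improving pleural effusion",
]
scores = [0.7171, 0.8270, 0.9155, 0.8113]

for rank, (text, score) in enumerate(top_k(candidates, scores, k=2), start=1):
    print(f"{rank}. {text} ({score:.4f})")
# 1. Worsening pleural effusion (0.9155)
# 2. Pleural effusion present (0.8270)
```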
 ## API Reference
+The model provides three main methods:
+### `encode_text(texts, max_length=512)`
+Simple text encoding for one or more texts.
+**Parameters:**
+- `texts`: List of strings or single string
+- `max_length`: Maximum sequence length (default: 512)
+**Returns:** Tensor of shape `(batch_size, 2048)`
+### `encode_with_separator(texts, separator='!@#$%^&*()', max_length=512)`
+Encoding with instruction/content separation.
+**Parameters:**
+- `texts`: List of strings, each optionally containing the separator between instruction and content
+- `separator`: String separator (default: `'!@#$%^&*()'`)
+- `max_length`: Maximum sequence length (default: 512)
+**Returns:** Tensor of shape `(batch_size, 2048)`
+### `compute_similarities(query_text, candidate_texts, separator='!@#$%^&*()', max_length=512)`
+One-line similarity computation between a query and a list of candidates.
+**Parameters:**
+- `query_text`: Single query string
+- `candidate_texts`: List of candidate strings
+- `separator`: String separator (default: `'!@#$%^&*()'`)
+- `max_length`: Maximum sequence length (default: 512)
+**Returns:** Tensor of shape `(num_candidates,)` with cosine similarity scores
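According to the API reference, `compute_similarities` returns cosine similarity scores, one per candidate. The formula is cos(q, c) = (q · c) / (|q| |c|). The sketch below applies it to toy vectors in plain Python to show the `(num_candidates,)` output and the [-1, 1] range. It illustrates the metric only and is not the model's internal implementation. In PyTorch the same computation is usually written as `torch.nn.functional.cosine_similarity(q.unsqueeze(0), C, dim=-1)`.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def similarities(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[float]:
    """One score per candidate, matching the documented `(num_candidates,)` shape."""
    return [cosine(query, c) for c in candidates]


# Toy 2-d vectors standing in for 2048-d embeddings.
query_vec = [3.0, 4.0]
candidate_vecs = [[3.0, 4.0], [4.0, -3.0], [-6.0, -8.0]]
print(similarities(query_vec, candidate_vecs))  # [1.0, 0.0, -1.0]
```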
+## Training Details
+### Training Data
+- Fully fine-tuned on chest X-ray reports and medical text data
+- Training focused on understanding pleural effusion status and other chest X-ray findings
+### Training Configuration
+- **Pooling Mode**: `latent_attention` (512 latents, 8 attention heads)
+- **Enable Bidirectional**: True
+- **Max Length**: 512 tokens
+- **Torch Dtype**: bfloat16
+- **Full Fine-tuning**: All model weights were updated during training
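The configuration lists `latent_attention` pooling with 512 latents and 8 heads but does not spell out the layer. The sketch below is a simplified NumPy version in the style of NV-Embed's latent attention layer. Token states act as queries over a trainable latent array (keys and values), followed by a residual connection and masked mean pooling. It omits the extra MLP and normalization layers such designs add. Whether LLM2Vec4CXR's pooling matches it exactly is an assumption, and all function names, weight names, and toy sizes are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def latent_attention_pool(hidden, mask, latents, w_q, w_k, w_v, w_o, num_heads):
    """Simplified latent-attention pooling (NV-Embed style); all names are hypothetical.

    hidden:  (seq_len, d) token states from the bidirectional encoder
    mask:    (seq_len,) 1 for tokens to pool over, 0 otherwise
    latents: (num_latents, d) trainable latent array, used as keys and values
    w_*:     (d, d) projection matrices
    """
    seq_len, d = hidden.shape
    if d % num_heads:
        raise ValueError("hidden size must be divisible by num_heads")
    head_dim = d // num_heads
    # Token states are the queries; the latent array supplies keys and values.
    q = (hidden @ w_q).reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
    k = (latents @ w_k).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)
    v = (latents @ w_v).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))  # (heads, seq_len, num_latents)
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d) @ w_o
    out = out + hidden  # residual connection
    # Masked mean pooling over tokens yields one vector per text.
    m = mask[:, None].astype(out.dtype)
    return (out * m).sum(axis=0) / m.sum()


# Toy sizes; the card lists hidden size 2048, 512 latents and 8 heads.
rng = np.random.default_rng(0)
d, seq_len, num_latents, heads = 16, 6, 8, 2
hidden = rng.standard_normal((seq_len, d))
mask = np.array([0, 0, 1, 1, 1, 1])  # assumption: first two tokens are instruction, excluded
latents = rng.standard_normal((num_latents, d))
w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
embedding = latent_attention_pool(hidden, mask, latents, w_q, w_k, w_v, w_o, heads)
print(embedding.shape)  # (16,)
```

Tokens act only as queries against the shared latent array, so each token's output depends on that token alone. As a result, changing masked-out positions leaves the pooled vector unchanged.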
+## Technical Specifications
+- **Model Type**: Bidirectional Language Model (LLM2Vec)
+- **Architecture**: LlamaBiModel (modified Llama 3.2) + Latent Attention Pooling
+- **Parameters**: ~1B parameters
+- **Hidden Size**: 2048
+- **Input Length**: Up to 512 tokens
+- **Output Dimension**: 2048
+- **Precision**: bfloat16
+- **Dependencies**: Only transformers and torch
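The specifications translate into rough storage numbers. The arithmetic below makes three assumptions: exactly 10^9 parameters, 2 bytes each (bfloat16), and 2048-dimensional embeddings stored as float32. Real GPU usage will be higher because of activations and framework overhead. `index_size_mib` is a hypothetical helper.

```python
# Back-of-the-envelope sizing from the specifications above.
BYTES_BF16 = 2
BYTES_FP32 = 4
NUM_PARAMS = 1_000_000_000  # "~1B parameters" (approximate)
EMBEDDING_DIM = 2048        # output dimension

weights_gib = NUM_PARAMS * BYTES_BF16 / 2**30
print(f"Weights in bfloat16: ~{weights_gib:.2f} GiB")  # Weights in bfloat16: ~1.86 GiB


def index_size_mib(num_reports: int, dim: int = EMBEDDING_DIM, bytes_per_value: int = BYTES_FP32) -> float:
    """Hypothetical helper: raw storage for a flat embedding matrix, ignoring index overhead."""
    return num_reports * dim * bytes_per_value / 2**20


print(f"100k reports as float32: {index_size_mib(100_000):.2f} MiB")  # 100k reports as float32: 781.25 MiB
```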
 ## Intended Use
 - Requires careful preprocessing for optimal performance
 - Should be used as part of a larger clinical decision support system, not for standalone diagnosis
+## Evaluation
+The model has been evaluated on chest X-ray report analysis tasks, particularly for:
+- Text retrieval and encoding
+- Medical text similarity comparison
+- Clinical finding extraction
+### Sample Performance
+The model demonstrates consistent improvements over the base LLM2CLIP architecture on medical text understanding benchmarks.
+**LLM2Vec4CXR** shows stronger performance in:
+- Handling medical abbreviations and radiological terminology
+- Capturing fine-grained semantic differences in chest X-ray reports
+- Understanding clinical context and temporal changes
+## Related Resources
+📄 **Paper**: [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)
+🔗 **Related Projects**:
+- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays
 ## Citation
 }
 ```
 ## Acknowledgments
 This model is built upon: