lukeingawesome/llm2vec4cxr

@@ -17,11 +17,15 @@ library_name: transformers
 # LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis
-This model is a fine-tuned version of [microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned](https://huggingface.co/microsoft/LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned) specifically optimized for chest X-ray report analysis and medical text understanding.
 ## Model Description
-LLM2Vec4CXR is a bidirectional language model that converts the base decoder-only LLM into a text encoder optimized for medical text embeddings. The model has been fully fine-tuned with modified pooling strategy (`latent_attention`) to better capture semantic relationships in chest X-ray reports.
 ### Key Features
@@ -161,6 +165,47 @@ best_match = options[torch.argmax(scores)]
 print(f"Best match: {best_match} (score: {torch.max(scores):.4f})")
 ```
 ## API Reference
 The model provides several convenient methods:
@@ -172,19 +217,12 @@ The model provides several convenient methods:
 - **`compute_similarities(query_text, candidate_texts)`**: One-line similarity computation
 - **`from_pretrained(..., pooling_mode="latent_attention")`**: Automatic latent attention weight loading
-### Migration from Manual Usage
-If you were previously using manual tokenization, you can now simply use:
-```python
-# Old way (still works)
-tokenized = model.tokenizer(text, return_tensors="pt", ...)
-tokenized["embed_mask"] = tokenized["attention_mask"].clone()
-embeddings = model(tokenized)
-# New way (recommended)
-embeddings = model.encode_text([text])
-```
 ## Evaluation
@@ -195,7 +233,11 @@ The model has been evaluated on chest X-ray report analysis tasks, particularly
 ### Sample Performance
-The model shows improved performance compared to the base model on medical text understanding tasks, particularly in distinguishing between different pleural effusion states and medical abbreviations.
 ## Intended Use
@@ -224,11 +266,11 @@ The model shows improved performance compared to the base model on medical text
 If you use this model in your research, please cite:
 ```bibtex
-@misc{llm2vec4cxr,
-  title={LLM2Vec4CXR: Fine-tuned LLM for Chest X-ray Report Analysis},
-  author={Hanbin Ko},
-  year={2025},
-  howpublished={\\url{https://huggingface.co/lukeingawesome/llm2vec4cxr}},
 }
 ```

 # LLM2Vec4CXR - Fine-tuned Model for Chest X-ray Report Analysis
+LLM2Vec4CXR is optimized for chest X-ray report analysis and medical text understanding.
+It is introduced in our paper [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234).
 ## Model Description
+LLM2Vec4CXR is a **bidirectional text encoder** fine-tuned with a `latent_attention` pooling strategy.
+This design enhances semantic representation of chest X-ray reports, improving performance on clinical text similarity, retrieval, and interpretation tasks.
 ### Key Features
 print(f"Best match: {best_match} (score: {torch.max(scores):.4f})")
 ```
+The same API can also retrieve clinically similar reports:
+```python
+import torch
+from llm2vec_wrapper import LLM2VecWrapper as LLM2Vec
+# Load model
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model = LLM2Vec.from_pretrained(
+    base_model_name_or_path='lukeingawesome/llm2vec4cxr',
+    pooling_mode="latent_attention",
+    max_length=512,
+    enable_bidirectional=True,
+    torch_dtype=torch.bfloat16,
+    use_safetensors=True,
+).to(device).eval()
+# Configure tokenizer
+model.tokenizer.padding_side = 'left'
+# Instruction for retrieval
+instruction = 'Retrieve semantically similar sentences'
+query_report = "There is a small LLLF PE with basal atelectasis."
+# The literal '!@#$%^&*()' separates the instruction from the text to embed
+query_text = instruction + '!@#$%^&*()' + query_report
+# Candidate reports
+candidate_reports = [
+    "No acute cardiopulmonary abnormality.",
+    "Small left pleural effusion is present.",
+    "Large right pleural effusion causing compressive atelectasis.",
+    "Heart size is normal with no evidence of pleural effusion.",
+    "There is left pleural effusion."
+]
+# Compute similarity scores
+scores = model.compute_similarities(query_text, candidate_reports)
+# Retrieve the most similar report
+best_match = candidate_reports[torch.argmax(scores)]
+print(f"Most similar report: {best_match} (score: {torch.max(scores):.4f})")
+```
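
To inspect the full ranking rather than only the top hit, the scores can be sorted. The following is a minimal sketch: `rank_candidates` is a hypothetical helper, not part of the model's API, and the scores in the usage line are illustrative placeholders rather than model outputs. If `compute_similarities` returns a tensor, as the `torch.argmax` call above suggests, `scores.tolist()` would presumably supply the input.

```python
from typing import List, Sequence, Tuple


def rank_candidates(
    scores: Sequence[float], candidates: Sequence[str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (candidate, score) pairs, highest score first.

    Hypothetical helper for illustration; not part of llm2vec4cxr.
    """
    if len(scores) != len(candidates):
        raise ValueError("scores and candidates must have the same length")
    # Pair each candidate with its score, then sort by score descending.
    paired = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return paired[:top_k]


# Placeholder scores for illustration only (not produced by the model).
example = rank_candidates([0.12, 0.87, 0.55], ["a", "b", "c"], top_k=2)
print(example)
```

Returning pairs keeps each report next to its score, which makes it easier to apply a similarity threshold afterwards.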
 ## API Reference
 The model provides several convenient methods:
 - **`compute_similarities(query_text, candidate_texts)`**: One-line similarity computation
 - **`from_pretrained(..., pooling_mode="latent_attention")`**: Automatic latent attention weight loading
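
The card does not show how `compute_similarities` is implemented. Embedding-based similarity is commonly computed as cosine similarity, so the sketch below assumes that; `cosine_similarity` and `similarities` are hypothetical names, and plain Python lists stand in for the embedding tensors.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def similarities(
    query_vec: Sequence[float], candidate_vecs: Sequence[Sequence[float]]
) -> List[float]:
    """Score one query embedding against every candidate embedding."""
    return [cosine_similarity(query_vec, c) for c in candidate_vecs]
```

Under this assumption, scores fall in the range [-1, 1], and a higher score means the candidate report is closer to the query in embedding space.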
+📄 **Related Papers**:
+- [Exploring the Capabilities of LLM Encoders for Image–Text Retrieval in Chest X-rays](https://arxiv.org/pdf/2509.15234)
+  *Ko, Hanbin, et al. "Exploring the capabilities of LLM encoders for image–text retrieval in chest X-rays." arXiv preprint arXiv:2509.15234 (2025).*
+- [LLM2CLIP4CXR](https://github.com/lukeingawesome/llm2clip4cxr): A CLIP-based model that leverages the LLM2Vec encoder to align visual and textual representations of chest X-rays.
 ## Evaluation
 ### Sample Performance
+The model shows consistent improvements over its base LLM2CLIP model on medical text understanding benchmarks.
+In particular, **LLM2Vec4CXR** shows stronger performance in:
+- Handling medical abbreviations and radiological terminology
+- Capturing fine-grained semantic differences in chest X-ray reports
 ## Intended Use
 If you use this model in your research, please cite:
 ```bibtex
+@article{ko2025exploring,
+  title={Exploring the Capabilities of LLM Encoders for Image--Text Retrieval in Chest X-rays},
+  author={Ko, Hanbin and Cho, Gihun and Baek, Inhyeok and Kim, Donguk and Koo, Joonbeom and Kim, Changi and Lee, Dongheon and Park, Chang Min},
+  journal={arXiv preprint arXiv:2509.15234},
+  year={2025}
 }
 ```