Robustness in Both Domains: CLIP Needs a Robust Text Encoder
Elias Abad Rocamora, Christian Schlarmann, Naman Deep Singh, Yongtao Wu, Matthias Hein and Volkan Cevher
LIONS @ EPFL and Tübingen AI Center
In this repo, you will find all the models trained for our NeurIPS 2025 paper.
Loading CLIPModels
You can load our models like any other CLIP model. For example, LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2 can be loaded by following the "openai/clip-vit-large-patch14" example snippet:
```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
```
When loading other model sizes, the `processor_name` needs to be changed accordingly:
| Model Size | Processor Name |
|---|---|
| ViT-L-14 | "openai/clip-vit-large-patch14" |
| ViT-H-14 | "laion/CLIP-ViT-H-14-laion2B-s32B-b79K" |
| ViT-g-14 | "laion/CLIP-ViT-g-14-laion2B-s12B-b42K" |
| ViT-bigG-14 | "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k" |
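
For instance, a minimal sketch of loading the ViT-H checkpoint released in this organization together with its matching processor name from the table above:

```python
from transformers import CLIPProcessor, CLIPModel

# ViT-H checkpoint from this organization, paired with the processor
# name listed for ViT-H-14 in the table above.
model_name = "LEAF-CLIP/OpenCLIP-ViT-H-rho50-k1-constrained-FARE2"
processor_name = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)
```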
Loading CLIPTextModels
If you just need the text encoder, you can load it with the following snippet:
```python
from transformers import CLIPTokenizer, CLIPTextModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPTextModel.from_pretrained(model_name)
tokenizer = CLIPTokenizer.from_pretrained(processor_name)

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")

outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
pooled_output = outputs.pooler_output  # pooled (EOS token) states
```
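
As a small usage sketch (not part of the original snippet), the pooled EOS-token embeddings can be compared with cosine similarity using standard PyTorch operations:

```python
import torch
import torch.nn.functional as F
from transformers import CLIPTokenizer, CLIPTextModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPTextModel.from_pretrained(model_name)
tokenizer = CLIPTokenizer.from_pretrained(processor_name)

texts = ["a photo of a cat", "a photo of a dog"]
inputs = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Cosine similarity between the pooled (EOS token) text embeddings.
embeddings = F.normalize(outputs.pooler_output, dim=-1)
similarity = embeddings @ embeddings.T
```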
Acknowledgements
Our codebase is based on the OpenCLIP codebase. We appreciate the effort of the OpenCLIP team and the release of their code and model weights.