Llammy3.2-3B-GUFF

Use the FLUX model as much as you want.

HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B • Updated Feb 23 • 13.7k • 91

Vivek/gptneo_piqa

Updated Aug 6, 2021 • 5 • 1

calcuis/qwen-image-edit-plus-gguf

Image-to-Image • 14B • Updated 24 days ago • 22.2k • 34

BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement

Paper • 2412.14203 • Published Dec 16, 2024 • 1

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

Paper • 2506.17450 • Published Jun 20 • 63

Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

Paper • 2510.19808 • Published 5 days ago • 22

TiC-CLIP: Continual Training of CLIP Models

Paper • 2310.16226 • Published Oct 24, 2023 • 9

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Paper • 2405.15613 • Published May 24, 2024 • 17

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 54

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

Paper • 2506.20639 • Published Jun 25 • 31

nvidia/omnivinci

Feature Extraction • Updated 4 days ago • 990 • 98

NexaAI/OmniNeural-4B

Any-to-Any • Updated 27 days ago • 102 • 155

nvidia/omni-embed-nemotron-3b

Feature Extraction • 5B • Updated 18 days ago • 1.42k • 51

nvidia/llama-nemoretriever-colembed-3b-v1

Visual Document Retrieval • 4B • Updated Jul 10 • 1.78k • 53

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166

nvidia/Nemotron-H-4B-Instruct-128K

Text Generation • 4B • Updated 4 days ago • 1.14k • 7

tencent/HunyuanImage-3.0

Text-to-Image • 83B • Updated 13 days ago • 34.8k • 939

ayjays132/Quantum-NeuralAdaptiveLearningSystem

Text Classification • 0.1B • Updated Dec 22, 2024 • 6 • 4

nvidia/Cosmos-Predict2-2B-Video2World

Image-to-Video • Updated Jul 23 • 750 • 33

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8, 2024 • 39

MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets

Paper • 2403.03194 • Published Mar 5, 2024 • 15

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8, 2024 • 45

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1, 2024 • 31

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Paper • 2404.09990 • Published Apr 15, 2024 • 13

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 18

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Paper • 2404.17672 • Published Apr 26, 2024 • 19

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 131

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Paper • 2408.09787 • Published Aug 19, 2024 • 10

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published Sep 25, 2024 • 121

Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction

Paper • 2503.09040 • Published Mar 12 • 1

FLUX Unlimited