WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published 12 days ago • 79
Durian: Dual Reference-guided Portrait Animation with Attribute Transfer Paper • 2509.04434 • Published Sep 4 • 10
OpenAI-GPT 20B, 37B ,120B: Neo, reg, uncensored, ablit. Collection OpenAI's models in various sizes and formats, including NEO Imatrix, DI, Tri Matrix, Uncensored, Abliterated, and Brainstorm 20x (37B). • 8 items • Updated 17 days ago • 4
200+ Roleplay, Creative Writing, Uncensored, NSFW models. Collection Oldest models listed first, with Newest models at bottom of the page. Most repos have full examples, instructions, best settings and so on. • 299 items • Updated 3 days ago • 340
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 201
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8 • 118
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30 • 88
WebSailor: Navigating Super-human Reasoning for Web Agent Paper • 2507.02592 • Published Jul 3 • 120
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs Paper • 2506.21656 • Published Jun 26 • 14
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Paper • 2506.21356 • Published Jun 26 • 22
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation Paper • 2506.21416 • Published Jun 26 • 28
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20 • 63
Light of Normals: Unified Feature Representation for Universal Photometric Stereo Paper • 2506.18882 • Published Jun 23 • 89
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection Paper • 2506.09827 • Published Jun 11 • 20
MoA: Heterogeneous Mixture of Adapters for Parameter-Efficient Fine-Tuning of Large Language Models Paper • 2506.05928 • Published Jun 6 • 4
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 22
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval Paper • 2506.08887 • Published Jun 10 • 4
Aligning Text, Images, and 3D Structure Token-by-Token Paper • 2506.08002 • Published Jun 9 • 21