100 462

Shikhar Singh

AxAI

·

axe--

AI & ML interests

Commonsense & Language Grounding

Recent Activity

liked a model about 4 hours ago

nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16

liked a Space 1 day ago

VAST-AI/MIDI-3D

liked a Space 1 day ago

EduardoPacheco/Grounding-Dino-Inference

Organizations

None yet

upvoted an article 4 days ago

Article

Fine-Tune a Semantic Segmentation Model with a Custom Dataset

Mar 17, 2022

• 29

upvoted 3 articles 9 days ago

Article

Fine-Tune ViT for Image Classification with 🤗 Transformers

Feb 11, 2022

• 52

Article

PP-OCRv5 on Hugging Face: A Specialized Approach to OCR

By

and 5 others •

Sep 10

• 108

Article

Supercharge your OCR Pipelines with Open Models

12 days ago

• 217

upvoted an article about 2 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

May 21

• 225

upvoted a collection about 2 months ago

August 29 Releases

40 items • Updated Sep 1 • 7

upvoted a paper 2 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 202

upvoted an article 3 months ago

Article

Faster fine-tuning using TRL & Unsloth

Jan 10, 2024

• 73

upvoted 2 collections 3 months ago

Qwen2.5-VL (All Versions)

All versions of Qwen2.5-VL including the new 32B version and 4-bit, 16-bit and more! • 16 items • Updated 2 days ago • 21

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 544

upvoted 4 articles 4 months ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

Mar 12

• 468

Article

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

By

and 11 others •

Jun 27

• 29

Article

Vision Language Models (Better, Faster, Stronger)

May 12

• 558

Article

Gemma 3n fully available in the open-source ecosystem!

Jun 26

• 118

upvoted an article 5 months ago

Article

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Jul 29, 2024

• 364

upvoted a collection 6 months ago

Describe Anything

Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 11 days ago • 58

upvoted a paper 7 months ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 117

upvoted an article 7 months ago

Article

Welcome Llama 4 Maverick & Scout on Hugging Face!

Apr 5

• 145

upvoted a collection 7 months ago

Whisper Release

Whisper includes both English-only and multilingual checkpoints for ASR and ST, ranging from 38M params for the tiny models to 1.5B params for large. • 12 items • Updated Sep 13, 2023 • 141

upvoted a collection 8 months ago

Gemma 3 Release

28 items • Updated Aug 11 • 522