Quang Huy
NothingLQH
AI & ML interests
None yet
Recent Activity
Updated a collection about 1 month ago: SpeechToText
Updated a collection about 2 months ago: Image
Updated a collection about 2 months ago: Image
Organizations
None yet
Translation
ControlVPS
ORC
Speech
- facebook/wav2vec2-lv-60-espeak-cv-ft
  Automatic Speech Recognition • Updated • 162k • 60
- Resemble Enhance
  Running on T4 • 439
  🚀 Enhance and denoise your audio files
- pyannote/speaker-diarization-3.1
  Automatic Speech Recognition • Updated • 13.9M • 1.44k
- Atotti/miipher-2-HuBERT-HiFi-GAN-v0.1
  Updated • 3 • 14
ImageToVideo
- Pushing the Boundaries of State Space Models for Image and Video Generation
  Paper • 2502.00972 • Published
- IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
  Paper • 2501.13920 • Published • 19
- tencent/HunyuanVideo-I2V
  Image-to-Video • Updated • 300 • 345
- IndexTeam/Index-anisora
  Updated • 8 • 215
TextToText
NLP
3D
LiveImage
DatasetLanguage
Image
LLM
- stepfun-ai/GOT-OCR2_0
  Image-Text-to-Text • 0.7B • Updated • 82.9k • 1.53k
- Midi Music Generator
  Running on Zero • Featured • 558
  🎼 Generate MIDI music from prompts
- OpenGVLab/InternVL2_5-78B-MPO
  Image-Text-to-Text • 78B • Updated • 78 • 54
- OpenGVLab/InternVL2_5-38B-MPO-AWQ
  Image-Text-to-Text • Updated • 55 • 6
Automation
TextToVideo
VLM
- FocusedAD: Character-centric Movie Audio Description
  Paper • 2504.12157 • Published • 8
- Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
  Paper • 2504.10465 • Published • 27
- PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
  Paper • 2504.13180 • Published • 19
- OS-Copilot/OS-Atlas-Base-7B
  Image-Text-to-Text • 8B • Updated • 377 • 42
Code
Prompt
Story
SpeechToText
- Vietnamese Streaming RNN-T
  Running • 1
  💻 RNN-T with Whisper Encoder
- erax-ai/EraX-WoW-Turbo-V1.0
  Automatic Speech Recognition • 0.8B • Updated • 30 • 54
- openai/whisper-large-v3-turbo
  Automatic Speech Recognition • 0.8B • Updated • 2.71M • 2.77k
- nvidia/canary-1b
  Automatic Speech Recognition • Updated • 2.34k • 457
Anime
Video
IdeaMusic
Vistral-7B-Chat
TextToSpeech
MJ6