Sam Heutmaker committed
Commit · 81d530d
1 Parent(s): 71b9500
update readme
README.md CHANGED
@@ -41,7 +41,7 @@ model-index:
 
 **ClipTagger-12b** is a 12-billion parameter vision-language model (VLM) designed for video understanding at massive scale. Developed by [Inference.net](https://inference.net) in collaboration with [Grass](https://grass.io), this model was created to meet the demanding requirements of trillion-scale video frame captioning workloads, without sacrificing output quality.
 
-**ClipTagger-12b
+**ClipTagger-12b exceeds or matches the performance of GPT-4.1 and Claude 4 Sonnet, while costing 15x less per generation.**
 
 The model generates structured, schema-consistent JSON outputs for every video frame, making it ideal for building searchable video databases, content moderation systems, and accessibility tools. It maintains temporal consistency across frames while delivering frontier-quality performance at a fraction of the cost of closed-source alternatives.
 
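The README text above promises schema-consistent JSON output per frame. As a minimal sketch of what a downstream consumer might do with such output before indexing it (the commit does not show ClipTagger-12b's actual schema, so the field names `description`, `objects`, `actions`, and `environment` below are assumptions, not the model's documented keys):

```python
import json

# Hypothetical per-frame schema: the real ClipTagger-12b output schema
# is not shown in this commit, so these keys are assumptions.
REQUIRED_KEYS = {"description", "objects", "actions", "environment"}

def validate_frame_caption(raw: str) -> dict:
    """Parse one model output and check it against the assumed schema."""
    record = json.loads(raw)  # JSONDecodeError (a ValueError) on malformed input
    if not isinstance(record, dict):
        raise ValueError("frame caption must be a JSON object")
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"frame caption missing keys: {sorted(missing)}")
    return record

# Usage with a hypothetical output for a single video frame.
sample = json.dumps({
    "description": "A cyclist rides along a coastal road at sunset.",
    "objects": ["bicycle", "person", "road", "ocean"],
    "actions": ["riding"],
    "environment": "outdoor, coastal, evening",
})
print(validate_frame_caption(sample)["description"])
```

A fixed set of required keys is what makes captions like these cheap to load into a searchable video database: every frame row has the same columns, so no per-record schema inference is needed.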