Recent Activity
nroggendorff posted an update 1 day ago
Post · 4381
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding (see the sketch below)
> very efficient vision-tokens-to-performance ratio
> covers 100 languages
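On the CLIP + SAM point, here's a hedged, generic sketch of what concatenating two vision encoders' features looks like in principle (plain PyTorch; this is not DeepSeek-OCR's actual code, and all names and dimensions are invented):

```python
# Generic sketch of dual-encoder fusion: run CLIP-style and SAM-style
# encoders on the same image, concatenate patch features channel-wise,
# and project into the LM's embedding space. Illustrative only.
import torch
import torch.nn as nn

class DualEncoderFusion(nn.Module):
    def __init__(self, clip_encoder: nn.Module, sam_encoder: nn.Module,
                 clip_dim: int, sam_dim: int, out_dim: int):
        super().__init__()
        self.clip_encoder = clip_encoder  # semantic, language-aligned features
        self.sam_encoder = sam_encoder    # fine-grained, localization features
        self.proj = nn.Linear(clip_dim + sam_dim, out_dim)

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        clip_feats = self.clip_encoder(pixels)  # (B, N, clip_dim)
        sam_feats = self.sam_encoder(pixels)    # (B, N, sam_dim), same patch grid assumed
        fused = torch.cat([clip_feats, sam_feats], dim=-1)
        return self.proj(fused)                 # (B, N, out_dim) vision tokens for the LM
```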
Post · 1436
Two new tools added to the Nymbo/Tools MCP server: File_System and Shell_Exec. You can theoretically do basically anything with these two tools, and they should enable support for many Claude Skills. GPT-5-Codex proves that for many cases, shell commands really are all you need, and Claude Skills seem to lean into this. The thing is, nothing about the design of Claude Skills actually restricts them to proprietary models!
# File_System
There's a new directory inside the repo called Filesystem; that's the agent's "root". It can perform the following actions: list, read, write, append, mkdir, move, copy, delete, info, and help. It keeps all of this within the scope of one tool call by making the Action field required and all other fields optional. Using a filesystem shouldn't require 15 different tools. A minimal sketch of the single-tool design is below.
Files created in the public HF space live in the space's running container and get cleared when the space is restarted. When running the server locally, files are actually stored on disk.
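To make the one-call design concrete, here's a minimal hypothetical sketch of an action-dispatch filesystem tool. The function signature, field names, and sandbox check are my assumptions for illustration, not the actual Nymbo/Tools implementation:

```python
# Hypothetical sketch: one filesystem tool, one required Action field.
# Every operation dispatches through a single call. Illustrative only.
import shutil
from pathlib import Path

ROOT = Path("Filesystem").resolve()  # assumed name for the agent's sandbox root

def file_system(action: str, path: str = ".", content: str = "", dest: str = "") -> str:
    p = (ROOT / path).resolve()
    if ROOT != p and ROOT not in p.parents:
        return "Error: path escapes the sandbox root (relative paths only)"
    if action == "list":
        return "\n".join(child.name for child in p.iterdir())
    if action == "read":
        return p.read_text()
    if action == "write":
        p.write_text(content)
        return f"Wrote {len(content)} chars to {path}"
    if action == "append":
        with p.open("a") as f:
            f.write(content)
        return f"Appended to {path}"
    if action == "mkdir":
        p.mkdir(parents=True, exist_ok=True)
        return f"Created {path}"
    if action in ("move", "copy"):
        d = (ROOT / dest).resolve()
        (shutil.move if action == "move" else shutil.copy2)(str(p), str(d))
        return f"{action}: {path} -> {dest}"
    if action == "delete":
        shutil.rmtree(p) if p.is_dir() else p.unlink()
        return f"Deleted {path}"
    if action == "info":
        st = p.stat()
        return f"{path}: {st.st_size} bytes, dir={p.is_dir()}"
    return "Actions: list, read, write, append, mkdir, move, copy, delete, info, help"
```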
# Shell_Exec
What good is a filesystem if you can't execute commands in it? This tool automatically detects whether the server is running on Windows or Linux and suggests the appropriate shell (PowerShell/Bash). Both of these new tools require the agent to use relative paths rather than absolute paths. I could be convinced to backpedal on this.
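For the OS detection, a short hedged sketch (again with assumed names, not the actual implementation):

```python
# Hypothetical sketch of Shell_Exec: detect the host OS and run the
# command through PowerShell on Windows or Bash elsewhere, from inside
# the sandbox root. Illustrative only.
import platform
import subprocess

def shell_exec(command: str, cwd: str = "Filesystem") -> str:
    if platform.system() == "Windows":
        args = ["powershell", "-Command", command]
    else:
        args = ["bash", "-c", command]
    result = subprocess.run(args, cwd=cwd, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr
```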
# Closing Thoughts
The File_System and Shell_Exec tools aren't super polished yet; I'll continue to improve the agent's instructions and the UX of the new tools. Most of my testing was done with gpt-oss-20b, and when it messes up, it gets the gist after one failed tool call. It should work perfectly fine for the GPU poor.
nroggendorff posted an update 8 days ago
Post · 3307
I love getting emails telling me when there's somebody else's active access token in one of my commit SHAs. HF should really only tell you if it is your token, otherwise I could just make a dataset with a bunch of random strings and wait for a valid token.
Also, don't comment about how unlikely this is. I've gotten a warning email about a token I 'leaked' at least four times.
In all cases, it has been in the digest hash.
user,permission,token
nroggendorff,write,hf_...
pepper13,finegrained,hf_...
...,...,...
Post · 1640
I've made some improvements to my custom Deep_Research tool in the Nymbo/Tools MCP server. I've added a second LLM process and it still takes less than 1 minute to complete!
The original version of my Deep_Research tool would basically dump up to 50 fetched webpages onto the Researcher model (Qwen3-235B), with only a little bit of context shown from each page.
# New "Filterer" Process
The new process includes another LLM call before the researcher process. The Filterer (also Qwen3-235B) gets the query summary and the original 50 pages with low context, and decides which pages are most relevant to the research topic. The Filterer then outputs the URLs to the relevant pages, which are then re-fetched (with more context) and sent to the Researcher.
# Researcher Context
The Researcher now gets only the relevant webpages, then begins writing the report. When testing with 50 initial results, the researcher would often end up with 10-20 relevant results as context.
Thanks entirely to Cerebras inference, it still accomplishes everything in under a minute: about 35-45 seconds once the tool is run.
It's also worth noting that both the Filterer and Researcher are now provided the current time/date before they see the content, reducing hallucinations caused by knowledge cutoffs.
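For the curious, here's a hedged sketch of the two-stage flow described above. The `llm`, `search`, and `fetch` helpers, the prompts, and the parameter names are all assumptions for illustration, not the tool's actual code:

```python
# Sketch of the Filterer -> Researcher pipeline: a first LLM pass prunes
# the 50 low-context results, survivors are re-fetched with more context,
# and only then does the Researcher write the report. Helpers are assumed.
from datetime import datetime, timezone

def deep_research(query_summary: str, llm, search, fetch) -> str:
    now = f"Current date/time: {datetime.now(timezone.utc).isoformat()}"
    results = search(query_summary, max_results=50)  # low-context snippets

    # Stage 1 - Filterer (Qwen3-235B): choose the relevant URLs.
    listing = "\n".join(f"{r['url']}: {r['snippet']}" for r in results)
    relevant_urls = llm(
        f"{now}\nQuery: {query_summary}\n"
        f"List the URLs relevant to the query, one per line:\n{listing}"
    ).splitlines()

    # Re-fetch only the survivors, with more context per page.
    pages = [fetch(url, max_chars=8000) for url in relevant_urls]

    # Stage 2 - Researcher (Qwen3-235B): write the report from filtered context.
    return llm(
        f"{now}\nWrite a thorough research report on: {query_summary}\n\n"
        + "\n\n".join(pages)
    )
```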
Post · 287
New Technique to Deeply Poison AI on Images and Prove Creative Provenance
I've developed a new method to protect creative work from unauthorized AI training. My Poisonous Shield for Images algorithm embeds a deep, removal-resistant poison into the mathematical structure of your images. It's designed to be toxic to machine learning models, achieving 20-348% disruption of AI training convergence in benchmark tests.
Unlike traditional watermarks, this protection survives compression and resizing and is not removed by standard tools. The technique also embeds cryptographic proof of provenance directly into the image, verifying ownership and detecting tampering.
You can see examples and learn more about how and WHY it works better than current methods:
https://severian-poisonous-shield-for-images.static.hf.space
If you are interested in using this technology to protect your work from AI training and unauthorized use, please reach out to me. It is currently in the prototype phase but fully functional and effective; I'm still working on expanding it into a production-grade app.
This is not intended as pure self-promotion. I genuinely want to help creators and to gauge interest from different communities. I've spent the past year and a half building this from scratch, with new math and code, to try to solve this massive problem.
Post · 3141
MLX port of BDH (Baby Dragon Hatchling) is up!
I’ve ported the BDH ( https://github.com/pathwaycom/bdh ) model to MLX for Apple Silicon. It’s a faithful conversion of the PyTorch version: same math, same architecture (byte-level vocab, shared weights across layers, ReLU sparsity, RoPE attention with Q=K), with MLX-friendly APIs and a detailed README explaining the few API-level differences and why results are equivalent.
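To give a feel for the Q=K attention mentioned above, here's a hedged MLX sketch in which queries and keys come from one shared projection; the class and variable names are mine, not the BDH-MLX repo's:

```python
# Hedged sketch of "RoPE attention with Q=K" in MLX: one projection is
# shared by queries and keys, so pre-mask attention scores are symmetric.
# Illustrative only; not code from the BDH-MLX repo.
import math
import mlx.core as mx
import mlx.nn as nn

class SharedQKAttention(nn.Module):
    def __init__(self, dims: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dims // num_heads
        self.scale = 1.0 / math.sqrt(self.head_dim)
        self.qk_proj = nn.Linear(dims, dims, bias=False)  # shared by Q and K
        self.v_proj = nn.Linear(dims, dims, bias=False)
        self.out_proj = nn.Linear(dims, dims, bias=False)
        self.rope = nn.RoPE(self.head_dim)

    def __call__(self, x: mx.array) -> mx.array:
        B, L, D = x.shape

        def split(t):  # (B, L, D) -> (B, heads, L, head_dim)
            return t.reshape(B, L, self.num_heads, self.head_dim).transpose(0, 2, 1, 3)

        qk = self.rope(split(self.qk_proj(x)))  # Q == K by construction
        v = split(self.v_proj(x))
        scores = (qk @ qk.transpose(0, 1, 3, 2)) * self.scale
        rows = mx.expand_dims(mx.arange(L), 1)
        cols = mx.expand_dims(mx.arange(L), 0)
        mask = mx.where(rows >= cols, 0.0, float("-inf"))  # causal mask
        attn = mx.softmax(scores + mask, axis=-1)
        out = (attn @ v).transpose(0, 2, 1, 3).reshape(B, L, D)
        return self.out_proj(out)
```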
Code, docs, and the training script are ready to use. You may need to adjust the training script a bit to fit your own custom dataset. Only tested on an M4 so far, but it should work perfectly well for any M1/M2/M3 users out there.
I’m currently training this MLX build on my Internal Knowledge Map (IKM) dataset Severian/Internal-Knowledge-Map
Training’s underway; expect a day or so before I publish weights. When it’s done, I’ll upload the checkpoint to Hugging Face for anyone to test.
Repo: https://github.com/severian42/BDH-MLX
HF model (coming soon): Severian/BDH-MLX
If you try it on your own data, feedback and PRs are welcome.
Post · 2947
🚀 New blog: Maintain the unmaintainable – 1M+ Python LOC, 400+ models
How do you stop a million-line library built by thousands of contributors from collapsing under its own weight?
At 🤗 Transformers, we do it with explicit software-engineering tenets, principles that make the codebase hackable at scale.
🔍 Inside the post:
– One Model, One File: readability first — you can still open a modeling file and see the full logic, top to bottom.
– Modular Transformers: visible inheritance that cuts maintenance cost by ~15× while keeping models readable.
– Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.
Written with @lysandre, @pcuenq, and @yonigozlan, this is a deep dive into how Transformers stays fast, open, and maintainable.
Read it here → transformers-community/Transformers-tenets
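To illustrate the "Modular Transformers" tenet from the list above (a rough sketch, not actual library code): a small modular file visibly inherits an existing model's blocks and overrides only what differs, and tooling expands it into a self-contained modeling file.

```python
# Hedged sketch of a modular-style definition: inherit Llama's blocks,
# override the one thing that changes, and let codegen produce the
# flattened one-model-one-file modeling_*.py. Class names are invented.
from transformers.models.llama.modeling_llama import LlamaAttention, LlamaModel

class MyModelAttention(LlamaAttention):
    """Identical to Llama attention except for a custom scaling (example tweak)."""
    def __init__(self, config, layer_idx):
        super().__init__(config, layer_idx)
        head_dim = config.hidden_size // config.num_attention_heads
        self.scaling = head_dim ** -0.25  # the one deliberate difference

class MyModelModel(LlamaModel):
    pass  # everything else is visibly inherited, so reviewers see only the diff
```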
Post · 2123
Cool stuff these past weeks on Hugging Face! 🤗🚀
• 📈Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.co/blog/embeddinggemma
• 💻Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖Smol2Operator GUI agents
https://huggingface.co/blog/smol2operator
• 🖼️Gradio visible watermarking
https://huggingface.co/blog/watermarking-with-gradio
nroggendorff posted an update 25 days ago
nroggendorff posted an update about 1 month ago
Post · 6576
large AI labs open-sourced a ton of models last week 🔥
here are a few picks; find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
Post · 679
the French Ministry of Culture releases its first conversation datasets on Hugging Face 👇🏻
ministere-culture/comparia-conversations
Post · 1044
There's now a custom Deep_Research tool in my Nymbo/Tools MCP server! TL;DR: the agent using the tools writes a summary of your request and up to five DuckDuckGo searches (up to 50 results). Each webpage found in the searches is then fetched and given to our researcher (Qwen3-235B-A22B-Thinking-2507). The researcher sees the summary, searched queries, and fetched links, then writes a thorough research report. The agent using the tool provides the user with a summary of the report and a link to download research_report.txt. The researcher's instructions are similar to some leaked Perplexity system prompts.
# Deep_Research Tool
It accomplishes everything in under a minute so it doesn't hit MCP's 60-second timeout, mostly thanks to Cerebras. The only thing required to make this work is a HF_READ_TOKEN for inference.
The Deep_Research tool could certainly be improved. It still needs some mechanism for sorting URLs by importance (I've got some ideas, but I don't want it to be the responsibility of the agent using the tool). I'll probably add a second researcher to filter out the bad sources before inferencing the big researcher. I'm hellbent on keeping this all within the scope of one tool call.
# More Fetch/Web Search Improvements
The Search_DuckDuckGo tool has been further enhanced. It now allows the agent to browse through all pages of results, and results now include the published date (if detected). It also now supports every DDG search type: the default is text, but it can also search news, images, videos, and books.
The Fetch_Webpage tool now specifies how much of the page has been truncated and a cursor index, allowing it to pick up where it left off without re-consuming tokens. The model can also choose to strip CSS selectors to remove excess noise, and there's a new URL Scraper mode that only returns URLs found on the full page. A sketch of the cursor idea is below.
More to come soon ~
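Here's a hedged sketch of the cursor idea behind Fetch_Webpage: return one window of page text plus the index to resume from. The extraction method and names are assumptions, not the tool's actual code:

```python
# Sketch of cursor-based fetching: each call returns a slice of the page
# text plus the cursor for the next call, so the agent resumes without
# re-consuming tokens. Illustrative only.
import requests
from bs4 import BeautifulSoup

def fetch_webpage(url: str, cursor: int = 0, max_chars: int = 4000) -> dict:
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    window = text[cursor:cursor + max_chars]
    next_cursor = cursor + len(window)
    return {
        "content": window,
        "cursor": next_cursor,  # pass back in to pick up where it left off
        "truncated_chars": max(0, len(text) - next_cursor),
    }
```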
Post · 3246
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥
> not only a document converter; it can also do document question answering and understands multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗 (hedged loading sketch below)
> built on SigLIP2 & granite-165M
model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗
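Since it ships with transformers support, loading presumably follows the standard vision-to-text pattern; a hedged sketch (the exact prompt and processor usage may differ, so treat the model card as authoritative):

```python
# Hedged sketch: load granite-docling with the generic transformers
# vision-to-seq classes. The prompt/processor details may differ from
# the model card; this only shows the general shape of the API.
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_id = "ibm-granite/granite-docling-258M"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("page.png")
inputs = processor(images=image, text="Convert this page to docling.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```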
freddyaboulton posted an update about 1 month ago
Post · 1505
Gradio 6.0 is launching this year!
We're revamping the core to give you performance improvements and unprecedented customization. Build better, faster.
Check out the GitHub milestone to learn what's planned under the hood!
https://github.com/gradio-app/gradio/issues?q=is:issue%20state:open%20milestone:%22Gradio%206%22
Post · 667
COMPUTER CONTROL IS ON-DEVICE !
🏡🤖 78% of EU smart-home owners DON’T trust cloud voice assistants.
So we killed the cloud.
Meet Exté: a palm-sized Android device that sees, hears & speaks your language - 100% offline, 0% data sent anywhere.
🔓 We submitted our technologies for consideration to the Liquid AI hackathon.
📊 Dataset: 79k UI-action pairs on Hugging Face (largest Android-control corpus ever) Tonic/android-operator-episodes
⚡ Model: 98% task accuracy, 678 MB compressed, fits on existing Android devices! Tonic/l-android-control
🛤️ Experiment Tracker: check out the training on our TrackioApp Tonic/l-android-control
🎮 Live Model Demo: upload an Android screenshot and instructions to see the model in action! Tonic/l-operator-demo
Built in a garage, funded by pre-orders, no VC. Now we’re scaling to 1k installer units.
We’re giving 50 limited-edition prototypes to investors, installers & researchers who want to co-design the sovereign smart home.
👇 Drop “EUSKERA” in the comments if you want an invite, tag a friend who still thinks Alexa is “convenient,” and smash ♥️ if AI should belong to people - not servers.
Post · 1115
a ton of image/video generation models and LLMs from big labs 🔥
> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, high res image generation model and tencent/POINTS-Reader, cutting edge OCR 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️
find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
ehristoforu posted an update about 1 month ago
Post · 2128
🚀Hello from the Project Fluently team!
✨ We are happy to share with you our new universal LLMs based on Qwen3 1.7B and 4B: powerful, multilingual, and ready to solve a wide range of problems!
🛠️ We have conducted additional training and carefully merged them to achieve even better results and maximize the potential of the models.
🆓 And most importantly — the models are completely open and free under the Apache-2.0 license!
🔗 Links to repositories:
- FluentlyQwen3-4B: fluently/FluentlyQwen3-4B
- FluentlyQwen3-1.7B: fluently/FluentlyQwen3-1.7B
😍 We will be very glad to hear your feedback and impressions! Your opinion is very important to us!
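If you want to try them quickly, here's a minimal sketch using transformers, assuming the repos follow the standard Qwen3-style causal-LM setup (check the model cards for recommended settings):

```python
# Minimal sketch: load FluentlyQwen3-4B as a standard causal LM.
# Assumes the usual Qwen3-style setup; see the model card for specifics.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fluently/FluentlyQwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Introduce yourself in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```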