MultiTransformer (Multi🤖Transformers)

mindchain

posted an update 7 days ago

Post

2867

Claude Code Self & Continual Learning

Hey everyone! 👋

30 GitHub Stars in 4 Days - Thank You!

I'm really grateful for the positive response to the Claude Reflect System. In just 4 days, 30 developers have shown interest by starring the project. Thank you so much!

What Is Claude Reflect?

Correct once, never again. Claude Reflect helps Claude Code remember your corrections and preferences across sessions. Instead of repeating the same feedback, the system learns and applies it automatically.

Main Features:

🧠 Learning System
- Detects corrections and preferences from conversations
- Stores them permanently in skill files
- Applies learnings in future sessions

🔒 Safety First
- Automatic backups before changes
- YAML validation
- Git version control

⚡ Two Modes
- Manual: Run /reflect when you want
- Auto: Reflects automatically at session end

How It Works

If you correct Claude to use pytest instead of unittest, this preference gets saved. Next time, Claude will remember and use pytest automatically. It's that simple.
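To make the idea concrete, here is a minimal sketch of the "back up, then store the learning" step. It is not the project's actual code: the function name `record_learning`, the `.bak` suffix, and the one-bullet-per-learning layout are assumptions for illustration only.

```python
import shutil
from pathlib import Path


def record_learning(skill_file: Path, learning: str) -> Path:
    """Back up a skill file, then append one learning as a bullet line.

    Hypothetical helper: the file layout and backup naming are assumptions,
    not the Claude Reflect System's real format.
    """
    backup = skill_file.with_name(skill_file.name + ".bak")
    existing = ""
    if skill_file.exists():
        shutil.copy2(skill_file, backup)  # automatic backup before any change
        existing = skill_file.read_text(encoding="utf-8")

    bullet = f"- {learning}"
    if bullet not in existing.splitlines():  # do not store the same learning twice
        with skill_file.open("a", encoding="utf-8") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(bullet + "\n")
    return backup
```

Calling `record_learning(Path("SKILL.md"), "Use pytest instead of unittest")` twice would leave a single bullet in the file, with the previous version kept next to it as a backup.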

Getting Started

1. Clone the repository
2. Install dependencies
3. Activate the skill
4. Try it out!

The python-project-creator example shows how the system learns from your feedback.

Give It a Try

https://github.com/haddock-development/claude-reflect-system

Feel free to check it out, give feedback, or contribute. Every bit of input helps improve the project!

Thank you so much for your support!

---
#ClaudeCode #AI #MachineLearning #ContinualLearning #OpenSource #Developer #Coding #Python #Productivity #DevTools #GitHub #SoftwareDevelopment #Programming #AIAssistant #DeveloperTools #CodeQuality #Tech

Feel free to give it a try by yourself.
https://github.com/haddock-development/claude-reflect-system

2 replies


mindchain

posted an update 11 days ago

Post

1705

Scaling Physical AI: SAM 3D, NVIDIA Cosmos, and Unreal Engine!

The "Sim-to-Real" gap is officially history. In early 2026, we are no longer just rendering data; we are simulating reality. By bridging Meta’s SAM 3D, Unreal Engine, and the NVIDIA Cosmos suite, we’ve built an autonomous pipeline for Physical AI that evolves itself.

The 2026 Tech Stack:
SAM 3D: Generates high-fidelity digital twins from 2D photos in seconds.

Unreal Engine + MCP: The AI "Director" orchestrates environments via the Model Context Protocol, providing perfect Ground Truth.

NeMo Data Designer: The orchestration hub on GitHub. Following NVIDIA’s acquisition of Gretel in early 2025, its leading generative privacy and tabular tech are now fully integrated here.

NVIDIA Cosmos Transfer: Neural rendering that adds hyper-realism to Unreal Engine outputs.

NVIDIA Cosmos Predict: Predicts physically accurate motion (falling, sliding) without manual animation.

NVIDIA Cosmos Reason: The automated supervisor checking every frame for logical and physical consistency.

The Workflow:
Asset Capture: SAM 3D turns real-world photos into Nanite meshes for Unreal Engine.

Orchestration: NeMo Data Designer (with Gretel-powered integrity) defines the data schema, while AI builds the world in Unreal Engine.

Completion: NVIDIA Cosmos (Transfer & Predict) adds photorealism and physics, while NVIDIA Cosmos Reason guarantees quality.

By combining Gretel’s data heritage with the visual power of Unreal Engine, we generate 100,000 perfect frames per hour. Weights and tools are on Hugging Face. Stop labeling. Start simulating.

#PhysicalAI #SAM3D #NVIDIACosmos #UnrealEngine #NeMo #Gretel #SyntheticData #HuggingFace #Robotics #AI #ComputerVision

mindchain

posted an update 12 days ago

Post

2047

Skill Reflect: A Concept for Automated AI Skill Mastery

Let’s be real for a second: most of us are using AI all wrong. We send a prompt, get a "meh" answer, and then spend twenty minutes fixing it ourselves. That’s not a workflow; that’s just a digital chore. I wanted to see if I could push Claude further—to see if I could build a system that actually learns and refines itself. That’s how the Claude-Reflect-System (Skill Reflect) was born.

But here’s the thing: this isn’t some polished, final product. It’s a concept. It’s a blueprint. I’ve built the foundation of a recursive reflection loop that forces the AI to step back, look at its work, and act as its own harshest critic. It identifies the "skill delta"—the gap between "okay" and "mastery"—and closes it. This logic isn't just for Claude; you can grab this architecture and drop it right into codex-cli, terminal agents, or whatever stack you're building.

I’m a big believer in the law of causality. Action, reaction. Cause and effect. If you control the cause—the way the AI thinks about its mistakes—you dictate the effect: a perfected skill. This is a playground for builders who are tired of stochastic guessing. I want you to take this. Fork it. Break it. Make it better. This is an open invitation to the community to take this reflection loop and see how far we can push the boundaries of agentic reasoning. Whether you're building Claude Code plugins or just want to automate your self-learning, the code is there for you to smash. Stop accepting the first draft. Let’s build something that actually thinks.

https://github.com/haddock-development/claude-reflect-system

#Skills #ClaudeCode #ClaudeCodeSkills #ClaudeCodePlugins #ClaudeCodeMarketplace #CodexCLI #AI #SelfLearning #Automation #OpenSource #LLM #Reasoning #Causality #Matrix #Concept

mindchain

posted an update 13 days ago

Post

1832

Neural Traffic Control: Orchestrating Multi-Path Reasoning 🚥
The future of AI isn't just about "better" models—it’s about high-precision orchestration. We are moving from linear processing to Parallel MTP-Reasoning, where we manage neural traffic across stabilized, transparent, and recursive highways.

1️⃣ The Backbone: Stabilized High-Dimensional Routing (arXiv:2512.24880)
Using DeepSeek’s mHC (Manifold-Constrained Hyper-Connections), we solve the instability of deep MoE architectures. By projecting weight updates onto the Birkhoff Polytope, we ensure that our "Simpsons-style" expert lanes maintain mathematical identity. This is the hardware-level stability needed to run multiple reasoning paths without collapse.
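The Birkhoff polytope is the set of doubly stochastic matrices (non-negative entries, every row and every column summing to 1). One standard way to push a positive matrix toward that set is Sinkhorn-Knopp normalization. The sketch below shows that idea on a toy matrix; it is an illustration only, not the mHC implementation, and the function name and iteration count are assumptions.

```python
def sinkhorn(matrix, iterations=50):
    """Alternately normalize rows and columns of a positive square matrix.

    Illustrative only: approximates a doubly stochastic matrix, i.e. a
    point in the Birkhoff polytope. Not taken from the mHC paper's code.
    """
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(iterations):
        for row in m:  # scale each row to sum to 1
            total = sum(row)
            for j in range(n):
                row[j] /= total
        for j in range(n):  # scale each column to sum to 1
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
    return m


mixing = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
row_sums = [sum(row) for row in mixing]
col_sums = [sum(mixing[i][j] for i in range(2)) for j in range(2)]
```

Because the product of two doubly stochastic matrices is again doubly stochastic, stacking many such mixing steps cannot blow up the overall scale of the signal, which is the intuition behind the stability claim.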

2️⃣ The Vision: Gemma Scope 2 & Feature Steering
You can't steer what you can't see. Gemma Scope 2 provides the "X-ray" for our highways. By using Sparse Autoencoders (SAEs), our Meta-Controller identifies the active features in each expert lane. We don't just route data; we route intent by monitoring feature-drift in real-time.

3️⃣ The Logic: Recursive Open Meta-Agents (arXiv:2512.24601)
We integrate the ROMA (Recursive Open Meta-Agent) framework. Instead of a flat response, the model operates in a recursive loop, refining its internal state before any output occurs. This is the "brain" of our [Meta-Controller GitHub Repo], enabling the model to simulate and discard weak logic internally.

4️⃣ The Simulation: Parallel MTP-Reasoning
This is where it comes together: Multi-Token Prediction (MTP) meets Parallel Simulation. Our Python-driven controller runs three parallel Gemma 3 instances.

The Process: 3 paths generated simultaneously.

The Filter: A 500-token lookahead window.

The Decision: The Meta-Controller uses SAE-data from Gemma Scope to select the path with the highest logical fidelity.
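The process, filter, and decision steps above can be sketched as a small controller loop. This is a hedged illustration, not the controller from the post: `generate` and `score` are hypothetical callables standing in for a model instance and an SAE-based fidelity score, and only the defaults (3 paths, 500-token lookahead) come from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def select_best_path(
    generate: Callable[[int, int], List[str]],
    score: Callable[[List[str]], float],
    num_paths: int = 3,
    lookahead: int = 500,
) -> Tuple[int, List[str]]:
    """Generate candidate continuations in parallel and keep the best-scoring one.

    generate(path_index, lookahead) and score(path) are placeholders supplied
    by the caller; nothing here talks to a real model.
    """
    with ThreadPoolExecutor(max_workers=num_paths) as pool:
        futures = [pool.submit(generate, i, lookahead) for i in range(num_paths)]
        paths = [f.result() for f in futures]  # keep submission order
    best = max(range(num_paths), key=lambda i: score(paths[i]))
    return best, paths[best]
```

Passing the generator and scorer in as arguments keeps the selection logic independent of which model or interpretability tool produces the candidates and the scores.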

The Result: A self-correcting, transparent, and multi-threaded reasoning engine. We aren't just scaling parameters; we are scaling architectural precision. 🧠

mindchain

posted an update 15 days ago

Post

3650

The Architecture of 2026: Beyond the Token Trap 🚀

We are witnessing a tectonic shift in Transformer architecture. It’s no longer just about "predicting the next token"—it’s about executing latent plans on a high-speed data highway.

What happens when we combine DeepSeek’s stability with Google’s strategic intelligence?

1️⃣ The Infrastructure: DeepSeek’s mHC
Moving from a single-lane residual stream to a multi-lane highway. Using the Birkhoff Polytope, mHC ensures mathematical stability (Identity Mapping) while routing specialized data through dedicated lanes.

2️⃣ The Intelligence: Google’s Meta-Controller
An internal AI unit that lives inside the Transformer. It escapes the "Token Trap" by extracting data to create a latent plan, steering the model via Temporal Abstraction.

The Synergy: In a Topological Transformer, the Meta-Controller finally has the "dedicated lanes" it needs to steer complex reasoning without causing gradient explosions.

We aren't just making models bigger; we are making them architecturally smarter. 🧠

#MachineLearning #DeepSeek #GoogleAI #Transformer #AIArchitecture

Reubencf

posted an update 15 days ago

Post

3179

Happy New Year 2026
I have planned to build many things this year, most of them cheaper or free alternatives to paid products

I am looking forward to releasing some useful Spaces ✌️ Stay tuned!

1 reply


Reubencf

posted an update 19 days ago

Post

2683

As 2025 is ending, I would like to thank everyone for trying out
Reubencf/Nano_Banana_Editor

Looking forward to building and releasing more in the future for the open source community

Parveshiiii

posted an update 27 days ago

Post

3556

Hey everyone!
We’re excited to introduce our new Telegram group: https://t.me/XenArcAI

This space is built for **model builders, tech enthusiasts, and developers** who want to learn, share, and grow together. Whether you’re just starting out or already deep into AI/ML, you’ll find a supportive community ready to help with knowledge, ideas, and collaboration.

💡 Join us to:
- Connect with fellow developers and AI enthusiasts
- Share your projects, insights, and questions
- Learn from others and contribute to a growing knowledge base

👉 If you’re interested, hop in and be part of the conversation: https://t.me/XenArcAI

12 replies


daavoo

posted an update 30 days ago

Post

1847

2025: The Year of Agents.
2026: The Year of Local Agents?

Relying on cloud-hosted LLMs is often overkill. While frontier models still lead in complex coding, local models are now more than capable of handling many agentic workflows—with zero latency and total privacy.

To help bridge the gap between local inference and usable agents, I’m releasing agent.cpp: https://github.com/mozilla-ai/agent.cpp

It provides minimal, high-performance building blocks for agents in C++, built directly around the awesome llama.cpp ecosystem.
Stop sending your data to a remote API. Start building and running agents on your own hardware.

1 reply


Reubencf

posted an update about 1 month ago

Post

4840

Great news!
Reubencf/Nano_Banana_Editor now supports black-forest-labs/FLUX.1-Kontext-dev and Qwen/Qwen-Image-Edit-2509

Just log in with Hugging Face and try it out

KingNish

posted an update about 1 month ago

Post

2544

Muon vs MuonClip vs Muon+AdamW

Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.

Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed, with lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers. It delivered the best balance of stability and final performance and even beat vanilla AdamW.
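The hybrid rule (Muon for 2D layers, AdamW for 1D layers) boils down to routing parameters by tensor rank. Here is a minimal, framework-free sketch of that routing step; the function name and the example parameter names are made up for illustration, and the actual setup used in the blog post may differ.

```python
from typing import Dict, List, Tuple


def split_params_for_hybrid(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, List[str]]:
    """Route 2D parameters (weight matrices) to Muon and everything else to AdamW.

    `shapes` maps parameter names to tensor shapes. Only the 2D/1D rule from
    the post is modelled; any further exclusions are outside this sketch.
    """
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in shapes.items():
        target = "muon" if len(shape) == 2 else "adamw"
        groups[target].append(name)
    return groups


groups = split_params_for_hybrid({
    "attn.q_proj.weight": (8, 8),  # 2D matrix -> Muon
    "norm.weight": (8,),           # 1D gain -> AdamW
    "mlp.bias": (8,),              # 1D bias -> AdamW
})
```

In a real training script, the two name lists would be turned into two optimizer instances that are stepped together on every batch.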

Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.

Next Step: scale to larger models/datasets to see if Muon’s spikes become catastrophic or if clipping wins out.

Full Blog Link: https://huggingface.co/blog/KingNish/optimizer-part1

KingNish

posted an update about 1 month ago

Post

2538

I tested Muon vs MuonClip vs Muon+AdamW for fine-tuning LLMs
Just published a blog on that, Read here 👉 https://huggingface.co/blog/KingNish/optimizer-part1

1 reply


hesamation

posted an update about 2 months ago

Post

2942

this is big... 50 AI researchers from Bytedance, Alibaba, Tencent, and other labs/universities just published a 300-page paper with surprising lessons about coding models and agents (data, pre and post-training, etc).

key highlights:

> small LLMs can beat proprietary giants
RL (RLVR specifically) gives small open-source models an edge over big models in reasoning. a 14B model trained with RLVR on high-quality verified problems can match the performance of OpenAI's o3.
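"Verifiable" here means the reward comes from an automatic check rather than a learned judge. A toy sketch of such a reward for coding problems follows; it is an illustration of the idea only, not the paper's setup, and the function name and test-case format are assumptions.

```python
from typing import List, Tuple


def verifiable_reward(candidate_source: str, func_name: str,
                      cases: List[Tuple[tuple, object]]) -> float:
    """Return 1.0 if the generated function passes every test case, else 0.0.

    Illustrative only: real pipelines run untrusted model output inside a
    sandbox with time and memory limits instead of a bare exec().
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
        func = namespace[func_name]
        passed = all(func(*args) == expected for args, expected in cases)
        return 1.0 if passed else 0.0
    except Exception:
        return 0.0  # crashes, syntax errors, and missing functions earn nothing


cases = [((2, 3), 5), ((0, 0), 0)]
good = verifiable_reward("def add(a, b):\n    return a + b\n", "add", cases)
bad = verifiable_reward("def add(a, b):\n    return a - b\n", "add", cases)
```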

> models have a hard time learning Python.
mixing languages during pre-training is good, but Python behaves differently from statically typed languages. languages with similar syntax (Java and C#, or JavaScript and TypeScript) create high positive synergy. mixing Python heavily into the training of statically typed languages can actually hurt because of Python's dynamic typing.

> not all languages are equal (coding scaling laws)
the amount of data required to specialize a model on a language depends drastically on the language. the paper argues that languages like C# and Java are easier to learn (less training data required). languages like Python and JavaScript are actually trickier to learn, ironically (you see AI used most for these languages :)

> MoE vs Dense (ability vs stability)
MoE models offer higher capacity, but are much more fragile during SFT than dense models. hyperparams in training have a more drastic effect in MoE models, while dense models are more stable. MoE models also require constant learning rate schedules to avoid routing instability.

> code models are "insecure" by default (duh)
training on public repos makes models learn years of accumulated insecure coding patterns. safety fine-tuning often fails to work much on code. a model might refuse to write a hate speech email but will happily generate a SQL-injection vulnerable function because it "works."
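For readers who have not seen the pattern, here is a small self-contained illustration of a SQL-injection vulnerable function next to its parameterized fix, using Python's built-in `sqlite3`. The table, function names, and payload are made up for the example and do not come from the paper.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    payload = "x' OR '1'='1"  # classic injection string
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)


leaked, protected = demo()
```

Both functions "work" for ordinary names, which is why the unsafe version is so common in public code: the difference only shows up when the input is hostile, where the unsafe query returns every row and the parameterized one returns none.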

read the full paper:
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence (2511.18538)

1 reply


Reubencf

posted an update about 2 months ago

Post

2512

Hey everyone! 👋

I am thrilled to present MCP-1st-Birthday/Reuben_OS, my submission for the Hugging Face MCP 1st Birthday Hackathon (Creative Track).

ReubenOS is a virtual cloud-based operating system designed specifically to act as a backend for Claude Desktop via the Model Context Protocol (MCP). It gives Claude a persistent environment to work in!

✨ Key Features

* 📱 Flutter IDE: Claude can write Flutter code and I can view/execute the files directly in the ReubenOS dashboard.
* 🎵 AI Audio Studio: Integrated with ElevenLabs to generate songs and voiceovers from text prompts within Claude.
* 🔒 Secure File System: A passkey-protected file system (private & public folders) to store code, JSON, and documents.
* 🧠 Gemini Integration: Access Google's Gemini model directly inside the OS.
* 📝 Quiz Engine: Ask Claude to "Create a Python quiz," and it deploys a graded interactive quiz to the web instantly.

9 replies


Parveshiiii

posted an update 2 months ago

Post

1648

Another banger from XenArcAI! 🔥

We’re thrilled to unveil three powerful new releases that push the boundaries of AI research and development:

🔗 https://huggingface.co/XenArcAI/SparkEmbedding-300m

- A lightning-fast embedding model built for scale.
- Optimized for semantic search, clustering, and representation learning.

🔗 https://huggingface.co/datasets/XenArcAI/CodeX-7M-Non-Thinking

- A massive dataset of 7 million code samples.
- Designed for training models on raw coding patterns without reasoning layers.

🔗 https://huggingface.co/datasets/XenArcAI/CodeX-2M-Thinking

- A curated dataset of 2 million code samples.
- Focused on reasoning-driven coding tasks, enabling smarter AI coding assistants.

Together, these projects represent a leap forward in building smarter, faster, and more capable AI systems.

💡 Innovation meets dedication.
🌍 Knowledge meets responsibility.

Parveshiiii

posted an update 2 months ago

Post

3049

SparkEmbedding - SoTA cross-lingual retrieval

I am very happy to announce our latest embedding model, SparkEmbedding-300m, based on embeddinggemma-300m. We fine-tuned it on 1M extra examples spanning 119 languages, and the resulting model achieves exceptional cross-lingual retrieval.

Model: https://huggingface.co/XenArcAI/SparkEmbedding-300m

lunarflu

posted an update 2 months ago

Post

922

The #1 trending AI/ML dataset today 🏆

Massive scale, diversity and end-to-end potential from NVIDIA!
nvidia/PhysicalAI-Autonomous-Vehicles

lunarflu

posted an update 2 months ago

Post

663

The new King 👑 has arrived!

Moonshot AI now has the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking

lunarflu

posted an update 2 months ago

Post

2776

💸🤑You don’t need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗 :
HuggingFaceTB/smol-training-playbook

Parveshiiii

posted an update 3 months ago

Post

214

AIRealNet - SoTA - Image detection model

We’re proud to release AIRealNet — a binary image classifier built to detect whether an image is AI-generated or a real human photograph. Based on SwinV2 and fine-tuned on the AI-vs-Real dataset, this model is optimized for high-accuracy classification across diverse visual domains.

If you care about synthetic media detection or want to explore the frontier of AI vs human realism, we’d love your support. Please like the model and try it out. Every download helps us improve and expand future versions.

Model page: https://huggingface.co/XenArcAI/AIRealNet

AI & ML interests

Team members 106

MultiTransformer's activity