
Matricardi Fabio

FM-1976


https://medium.com/@fabio.matricardi

AI & ML interests

Control systems engineering, AI, LLMs with Python. ThePoorGPUguy on Substack.

Recent Activity

liked a model about 22 hours ago

LiquidAI/LFM2-2.6B-Exp

reacted to codelion's post with 🚀 about 22 hours ago

Introducing Dhara-70M: a diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% more parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m

liked a model about 22 hours ago

codelion/dhara-70m


Organizations

None yet

FM-1976's models (11)

FM-1976/gemma-2b-docjoybot-lora-F16-GGUF

10.4M • Updated May 9 • 14 • 1

FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF

8B • Updated May 9 • 15 • 1

FM-1976/Qwen-1.5B-Tweet-Generations-F16-GGUF

2.18M • Updated May 8 • 12 • 1

FM-1976/SmolLM2-360M-it-llamafile

Text Generation • Updated Apr 15 • 12

FM-1976/Qwen2.5-1.6b-llamafile

Text Generation • Updated Apr 15 • 23 • 1

FM-1976/Lite-Oute-1-300M-Instruct-openvino

Text Generation • Updated Mar 7 • 10

FM-1976/stablelm-zephyr-3b-openvino-4bit

Updated Feb 24 • 16

FM-1976/ov_Llama-SmolTalk-3.2-1B-Instruct

Text Generation • Updated Nov 29, 2024 • 14

FM-1976/ov_NuExtract-1.5-tiny

Text Generation • Updated Nov 29, 2024 • 10

FM-1976/NuExtract-1.5-tiny-ONNX

Updated Nov 28, 2024 • 9

FM-1976/gemma-2-2b-it-Q5_K_M-GGUF

Text Generation • 3B • Updated Oct 13, 2024 • 14 • 1