archit PRO

archit11

archit-spec

AI & ML interests

small language models

Recent Activity

updated a dataset about 1 month ago

archit11/claude_code_traces_dirty

published a dataset about 1 month ago

archit11/claude_code_traces_dirty

upvoted an article 17 days ago

Article

Controlling Language Model Generation with NVIDIA's LogitsProcessorZoo

Dec 23, 2024

upvoted an article about 2 months ago

Article

Exploring Direct Tensor Manipulation in Language Models: A Case Study in Binary-Level Model Enhancement

Nov 7, 2025

upvoted 2 articles 5 months ago

Article

From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels

Aug 18, 2025

Article

How to Run a Hugging Face Model in JAX (Part 1)

Jul 20, 2025

upvoted a paper 5 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 158

upvoted 3 articles 6 months ago

Article

You could have designed state of the art positional encoding

Nov 25, 2024 • 429

Article

Understanding Gemma 3n: How MatFormer Gives You Many Models in One

Jun 26, 2025

Article

G2P Shrinks Speech Models

Feb 5, 2025

upvoted 3 articles 7 months ago

Article

State of open video generation models in Diffusers

Jan 27, 2025

Article

How Long Prompts Block Other Requests - Optimizing LLM Performance

Jun 12, 2025

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

Apr 16, 2025

upvoted 2 papers 7 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 187

upvoted an article 9 months ago

Article

Enabling Long Context Training with Sequence Parallelism in Axolotl

Apr 4, 2025

upvoted an article 10 months ago

Article

SigLIP 2: A better multilingual vision language encoder

Feb 21, 2025 • 193

upvoted an article 11 months ago

Article

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

Aug 4, 2024

upvoted 3 collections 11 months ago

upvoted an article 11 months ago

Article

How to deploy and fine-tune DeepSeek models on AWS

Jan 30, 2025
