Llama3.2_MoE_1Bx8_working
Llama3.2_MoE_1Bx8_working is a Mixture of Experts (MoE) model built from the following models with LazyMergekit, one entry per expert slot:
- Divyansh008/Urvashi-1B-rp
- dphn/Dolphin3.0-Llama3.2-1B
- phamhai/Llama-3.2-1B-CyberFrog
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-1B-Instruct
- JingyaoLi/ScienceLLaMA-1b
- meta-llama/Llama-3.2-1B-Instruct
- meta-llama/Llama-3.2-1B-Instruct
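
The list above has eight expert slots, but several of them reuse the base model. As a small illustration (the variable names `expert_slots` and `slot_counts` are made up for this sketch), the slots can be tallied by checkpoint:

```python
from collections import Counter

# Expert slots in the order listed above.
expert_slots = [
    "Divyansh008/Urvashi-1B-rp",
    "dphn/Dolphin3.0-Llama3.2-1B",
    "phamhai/Llama-3.2-1B-CyberFrog",
    "meta-llama/Llama-3.2-1B-Instruct",
    "meta-llama/Llama-3.2-1B-Instruct",
    "JingyaoLi/ScienceLLaMA-1b",
    "meta-llama/Llama-3.2-1B-Instruct",
    "meta-llama/Llama-3.2-1B-Instruct",
]

# Count how many slots each distinct checkpoint fills.
slot_counts = Counter(expert_slots)
for name, count in slot_counts.most_common():
    print(f"{count} x {name}")
```

Four of the eight slots use meta-llama/Llama-3.2-1B-Instruct, so the merge draws on five distinct checkpoints.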
🧩 Configuration
base_model: meta-llama/Llama-3.2-1B-Instruct
dtype: bfloat16
gate_mode: hidden
experts:
  # Expert 1: Casual Conversation & Creative Storytelling
  # Unique markers: narrative, character, dialogue, imagination
  - source_model: Divyansh008/Urvashi-1B-rp
    positive_prompts:
      - "Once upon a time there was"
      - "The character felt nervous because"
      - "She walked into the room and saw"
      - "In a distant galaxy far away"
      - "The wizard cast a spell that"
      - "Let's pretend we're adventurers who"
      - "Describe what the scene looks like"
      - "The hero's journey began when"
  # Expert 2: Programming & Software Engineering
  # Unique markers: code syntax, technical commands, development terms
  - source_model: dphn/Dolphin3.0-Llama3.2-1B
    positive_prompts:
      - "def function_name(parameters):"
      - "import pandas as pd"
      - "SELECT * FROM database WHERE"
      - "git commit -m 'fixed bug in'"
      - "const apiEndpoint = fetch()"
      - "class ClassName extends React.Component"
      - "docker-compose up --build"
      - "npm install package-name --save"
  # Expert 3: Pure Mathematics & Numerical Computation
  # Unique markers: mathematical symbols, equations, numbers
  - source_model: phamhai/Llama-3.2-1B-CyberFrog
    positive_prompts:
      - "∫(x² + 3x) dx ="
      - "lim(x→∞) of the function"
      - "Matrix multiplication A × B where"
      - "The probability P(X|Y) equals"
      - "Differentiate y = 5x³ - 2x² + 7"
      - "Find eigenvalues of matrix"
      - "Taylor series expansion of e^x"
      - "Calculate 847 × 653 step by step"
  # Expert 4: Hindi & Indic Languages Communication
  # Unique markers: Devanagari, Tamil, Telugu scripts, regional phrases
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "यह हिंदी में कैसे लिखें"
      - "मुझे बताइए कि आज का मौसम"
      - "தமிழில் இதை எப்படி"
      - "తెలుగులో ఎలా చెప్పాలి"
      - "ಕನ್ನಡದಲ್ಲಿ ಈ ಪದದ ಅರ್ಥ"
      - "मराठीत हे शब्द म्हणजे"
      - "ગુજરાતી ભાષામાં અનુવાદ કરો"
      - "বাংলা ভাষায় এই বাক্য"
  # Expert 5: Indian Cultural Heritage & Historical Knowledge
  # Unique markers: specific Indian names, festivals, historical figures
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "Emperor Ashoka's reign during Mauryan"
      - "Diwali celebration involves lighting diyas"
      - "Rabindranath Tagore wrote Gitanjali"
      - "Taj Mahal was built by Shah Jahan"
      - "Holi festival colors represent spring"
      - "Mahatma Gandhi led Salt March in"
      - "Kathakali dance originated in Kerala"
      - "Cricket World Cup victory in 1983"
  # Expert 6: Natural Sciences & Laboratory Research
  # Unique markers: scientific terminology, experimental methods, formulas
  - source_model: JingyaoLi/ScienceLLaMA-1b
    positive_prompts:
      - "Hypothesis: photosynthesis rate increases with"
      - "Experimental procedure: titrate HCl with"
      - "DNA replication occurs during S-phase"
      - "Newton's second law F=ma demonstrates"
      - "Mitochondria generate ATP through oxidative"
      - "Chemical formula C₆H₁₂O₆ represents glucose"
      - "Quantum entanglement phenomenon occurs when"
      - "Peer-reviewed study published in Nature"
  # Expert 7: Ancient Philosophy & Spiritual Texts
  # Unique markers: Sanskrit terms, philosophical concepts, scripture names
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "Bhagavad Gita Chapter 2 Verse 47"
      - "Advaita Vedanta teaches non-duality"
      - "Buddha's Four Noble Truths state"
      - "Upanishads describe Brahman as"
      - "Concept of Maya illusion explains"
      - "Patanjali's Yoga Sutras define chitta"
      - "Jain principle of Ahimsa nonviolence"
      - "Karma-phala means fruits of action"
  # Expert 8: Ayurveda & Traditional Wellness Practices
  # Unique markers: Ayurvedic terms, dosha names, traditional remedies
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "Vata dosha imbalance causes dryness"
      - "Turmeric milk recipe for immunity"
      - "Surya Namaskar yoga sequence benefits"
      - "Triphala churna powder cleanses digestive"
      - "Pitta constitution people should avoid"
      - "Pranayama Nadi Shodhana alternate nostril"
      - "Ashwagandha herb reduces cortisol stress"
      - "Kapha body type gains weight easily"
gate:
  type: topk
  k: 2
  capacity_factor: 1.5
  drop_tokens: false
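
The configuration asks for top-k gating with `k: 2`, meaning each token is sent to its two highest-scoring experts. The following is a minimal sketch of that idea in plain Python, not the code that mergekit or transformers actually runs. The function name `top_k_route` and the example logits are hypothetical, and the softmax over the selected logits is one common formulation, assumed here rather than taken from this model's implementation.

```python
import math

def top_k_route(logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router logits for one token across the 8 experts.
example_logits = [0.1, 2.0, -0.5, 0.3, 0.0, 1.0, -1.2, 0.4]
routes = top_k_route(example_logits, k=2)
print(routes)
```

With these made-up logits, the token goes to the experts at indices 1 and 5, and the first of the two receives the larger weight. In the real model the logits come from a learned router layer per MoE block, which `gate_mode: hidden` initializes from hidden-state representations of the positive prompts.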
💻 Usage
# Notebook cell: install dependencies (drop the leading "!" when running in a shell).
!pip install -qU transformers bitsandbytes accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "Divyansh008/Llama3.2_MoE_1Bx8_working"
tokenizer = AutoTokenizer.from_pretrained(model)

# Load the merged model in 4-bit; this needs bitsandbytes and a CUDA-capable GPU.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the request with the model's chat template, then sample a reply.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])