Llama3.2_MoE_1Bx8_working

Llama3.2_MoE_1Bx8_working is a Mixture of Experts (MoE) built with LazyMergekit from the expert models listed in the configuration below.

🧩 Configuration

base_model: meta-llama/Llama-3.2-1B-Instruct
dtype: bfloat16
gate_mode: hidden

experts:
  # Expert 1: Casual Conversation & Creative Storytelling
  # Unique markers: narrative, character, dialogue, imagination
  - source_model: Divyansh008/Urvashi-1B-rp
    positive_prompts:
      - "Once upon a time there was"
      - "The character felt nervous because"
      - "She walked into the room and saw"
      - "In a distant galaxy far away"
      - "The wizard cast a spell that"
      - "Let's pretend we're adventurers who"
      - "Describe what the scene looks like"
      - "The hero's journey began when"

  # Expert 2: Programming & Software Engineering
  # Unique markers: code syntax, technical commands, development terms
  - source_model: dphn/Dolphin3.0-Llama3.2-1B
    positive_prompts:
      - "def function_name(parameters):"
      - "import pandas as pd"
      - "SELECT * FROM database WHERE"
      - "git commit -m 'fixed bug in'"
      - "const apiEndpoint = fetch()"
      - "class ClassName extends React.Component"
      - "docker-compose up --build"
      - "npm install package-name --save"

  # Expert 3: Pure Mathematics & Numerical Computation
  # Unique markers: mathematical symbols, equations, numbers
  - source_model: phamhai/Llama-3.2-1B-CyberFrog
    positive_prompts:
      - "∫(x² + 3x) dx ="
      - "lim(x→∞) of the function"
      - "Matrix multiplication A × B where"
      - "The probability P(X|Y) equals"
      - "Differentiate y = 5x³ - 2x² + 7"
      - "Find eigenvalues of matrix"
      - "Taylor series expansion of e^x"
      - "Calculate 847 × 653 step by step"

  # Expert 4: Hindi & Indic Languages Communication
  # Unique markers: Devanagari, Tamil, Telugu scripts, regional phrases
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "यह हिंदी में कैसे लिखें"
      - "मुझे बताइए कि आज का मौसम"
      - "தமிழில் இதை எப்படி"
      - "తెలుగులో ఎలా చెప్పాలి"
      - "ಕನ್ನಡದಲ್ಲಿ ಈ ಪದದ ಅರ್ಥ"
      - "मराठीत हे शब्द म्हणजे"
      - "ગુજરાતી ભાષામાં અનુવાદ કરો"
      - "বাংলা ভাষায় এই বাক্য"

  # Expert 5: Indian Cultural Heritage & Historical Knowledge
  # Unique markers: specific Indian names, festivals, historical figures
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "Emperor Ashoka's reign during Mauryan"
      - "Diwali celebration involves lighting diyas"
      - "Rabindranath Tagore wrote Gitanjali"
      - "Taj Mahal was built by Shah Jahan"
      - "Holi festival colors represent spring"
      - "Mahatma Gandhi led Salt March in"
      - "Kathakali dance originated in Kerala"
      - "Cricket World Cup victory in 1983"

  # Expert 6: Natural Sciences & Laboratory Research
  # Unique markers: scientific terminology, experimental methods, formulas
  - source_model: JingyaoLi/ScienceLLaMA-1b
    positive_prompts:
      - "Hypothesis: photosynthesis rate increases with"
      - "Experimental procedure: titrate HCl with"
      - "DNA replication occurs during S-phase"
      - "Newton's second law F=ma demonstrates"
      - "Mitochondria generate ATP through oxidative"
      - "Chemical formula C₆H₁₂O₆ represents glucose"
      - "Quantum entanglement phenomenon occurs when"
      - "Peer-reviewed study published in Nature"

  # Expert 7: Ancient Philosophy & Spiritual Texts
  # Unique markers: Sanskrit terms, philosophical concepts, scripture names
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "Bhagavad Gita Chapter 2 Verse 47"
      - "Advaita Vedanta teaches non-duality"
      - "Buddha's Four Noble Truths state"
      - "Upanishads describe Brahman as"
      - "Concept of Maya illusion explains"
      - "Patanjali's Yoga Sutras define chitta"
      - "Jain principle of Ahimsa nonviolence"
      - "Karma-phala means fruits of action"

  # Expert 8: Ayurveda & Traditional Wellness Practices
  # Unique markers: Ayurvedic terms, dosha names, traditional remedies
  - source_model: meta-llama/Llama-3.2-1B-Instruct
    positive_prompts:
      - "Vata dosha imbalance causes dryness"
      - "Turmeric milk recipe for immunity"
      - "Surya Namaskar yoga sequence benefits"
      - "Triphala churna powder cleanses digestive"
      - "Pitta constitution people should avoid"
      - "Pranayama Nadi Shodhana alternate nostril"
      - "Ashwagandha herb reduces cortisol stress"
      - "Kapha body type gains weight easily"

gate:
  type: topk
  k: 2
  capacity_factor: 1.5
  drop_tokens: false
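
To make the `topk` gate with `k: 2` above more concrete, here is a minimal, library-free sketch of top-2 routing for a single token. It is only an illustration: the function name `top_k_route` and the example router scores are hypothetical, and the softmax-then-renormalize scheme is an assumption about how the selected experts are weighted, not a statement of what the merge tool or the resulting model actually computes.

```python
import math


def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    # Softmax over all experts, shifted by the max for numerical stability.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the k most probable experts, best first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize so the kept weights sum to 1 (assumed weighting scheme).
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


# Hypothetical router scores for one token across the 8 experts.
logits = [0.1, 2.3, -0.5, 0.0, 0.4, 1.7, -1.2, 0.2]
routes = top_k_route(logits, k=2)
print(routes)
```

With these made-up scores, the two experts at indices 1 and 5 are selected, and the token's output would be the weighted sum of those two experts' outputs.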

💻 Usage

!pip install -qU transformers bitsandbytes accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "Divyansh008/Llama3.2_MoE_1Bx8_working"

# Load the tokenizer and build a 4-bit quantized text-generation pipeline.
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the model's chat template, then sample a reply.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

Model size: 7B params · Tensor type: BF16 · Format: Safetensors