MaplePT-Mini (v1)
A Sovereign Canadian Language Model by CanXP AI
- MaplePT-Mini (v1) is based on Microsoft Phi-3 Mini 4K Instruct and was fine-tuned by CanXP AI to align with Canadian linguistic, cultural, and ethical norms.
- It emphasizes factual precision, privacy, and the use of Canadian spelling and metric units, ensuring culturally appropriate and responsible AI interactions.
- MaplePT is part of CanXP’s mission to create sovereign Canadian AI infrastructure that reflects the nation’s laws, values, and innovation leadership.
Model Overview
| Property | Description |
|---|---|
| Model | canxp-ai/canxpai-maplept-mini-v1 |
| Model Size | ~3.8B parameters |
| Architecture | Phi-3 Mini 4K (Transformer Decoder) |
| Context Length | 4,096 tokens |
| Language | English (Canadian) with limited French support |
| License | MIT |
| Release Date | October 2025 |
| Status | Research + Experimental |
| Primary Focus | Canadian cultural alignment and ethical response management |
| Use Cases | Education, journalism, AI assistants, and research |
| Not For | Roleplay, romance, or adult/explicit content |
Alignment & Safety
MaplePT-Mini was trained with a cultural and ethical safety objective designed to:
- ✅ Refuse sexual, romantic, or roleplay prompts
- ✅ Redirect adult or unsafe content toward factual discussion
- ✅ Maintain a professional, journalistic tone
- ✅ Use metric measurements and Canadian English (colour, kilometre, Celsius, etc.)
- ✅ Respond neutrally on political or religious issues
- ✅ Avoid hallucinated disclaimers and redundant apologies
- ✅ Enforce privacy, transparency, and factual integrity
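These behaviours are instilled through fine-tuning, but an application layer can reinforce them before a prompt ever reaches the model. Below is a minimal, hypothetical sketch of such a pre-screening guardrail; the `screen_prompt` helper and its keyword patterns are illustrative and are not part of the released model.

```python
import re

# Illustrative guardrail: refuse roleplay/romance requests before generation.
# The pattern list is hypothetical; the released model enforces this behaviour
# through fine-tuning rather than a keyword filter.
REFUSE_PATTERNS = [r"\broleplay\b", r"\bpretend to be\b", r"\bromantic\b"]
REFUSAL = (
    "I'm designed for factual, professional conversation and don't take part in "
    "roleplay or romantic scenarios. I'd be happy to help with a factual question."
)

def screen_prompt(user_input: str) -> str | None:
    """Return a canned refusal if the prompt trips a guardrail, else None."""
    for pattern in REFUSE_PATTERNS:
        if re.search(pattern, user_input, flags=re.IGNORECASE):
            return REFUSAL
    return None
```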
Training Configuration
| Setting | Value |
|---|---|
| Base Model | Phi-3 Mini 4K Instruct |
| Fine-Tune Method | QLoRA (LoRA adapters on a 4-bit quantized base) |
| Frameworks | PyTorch 2.4, Transformers 4.44, PEFT, Accelerate |
| Optimizer | paged_adamw_8bit |
| Learning Rate | 2e-4 |
| Epochs | 2 |
| Effective Batch Size | 8 (gradient accumulation ×8) |
| Sequence Length | 4,096 tokens |
| Precision | bf16 / fp16 mixed |
| Eval Steps | Every 500 steps |
| Save Strategy | Best-loss checkpoint |
| LR Scheduler | Cosine annealing |
| Gradient Checkpointing | Enabled |
| Warmup Ratio | 0.05 |
| Training Duration | ~270 hours on dual GPUs |
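The table corresponds to a fairly standard QLoRA recipe. A minimal sketch of the setup with `transformers` and `peft` follows; the LoRA rank, alpha, dropout, and target modules are assumptions, since CanXP has not published them.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# 4-bit quantized base model (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter (rank/alpha/targets are assumed values, not published ones).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)

# Hyperparameters transcribed from the table above.
args = TrainingArguments(
    output_dir="maplept-mini-v1",
    num_train_epochs=2,
    per_device_train_batch_size=1,   # effective batch 8 via accumulation
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,     # retains the best-loss checkpoint
)
```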
Hardware Environment
| Node | GPU | VRAM | CPU | RAM | OS | Notes |
|---|---|---|---|---|---|---|
| Node 1 | NVIDIA RTX 4080 | 16 GB | AMD Ryzen 9 7950X | 64 GB | Ubuntu 22.04 LTS | Primary training node |
| Node 2 | NVIDIA RTX 3090 | 24 GB | AMD Ryzen 9 5950X | 64 GB | Ubuntu 22.04 LTS | Secondary distributed node |
- Cluster: 2-node LAN, torchrun distributed training with an Accelerate multi-process setup
- Storage: NVMe SSD, local dataset caching for performance
Training Datasets
| Dataset | Description | Purpose |
|---|---|---|
| MaplePT-Identity v3 | Canadian culture, geography, and identity dataset | Teach model national identity and tone |
| MaplePT-Safety-200 | Refusal and ethical alignment dataset | Reinforce safe, neutral response behaviour |
| Canadian Wikipedia Subset | Curated factual corpus | Strengthen factual and historical recall |
| Government of Canada Publications | Open-source public data | Ground model in authentic government language |
| Cultural Dialogue Augments | Generated polite conversation pairs | Build natural conversational rhythm |
| News and Policy Fragments (2024–2025) | Neutral Canadian current affairs | Reinforce non-partisan news comprehension |
All data sources were either public domain or synthetic, ensuring compliance with PIPEDA and Canadian data-residency principles. The combined corpus contains over 850,000 scored Canadian question/response pairs.
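For illustration, one scored pair might be serialized as a JSONL record along these lines; the field names and score scale are hypothetical, as CanXP has not published the dataset schema.

```python
import json

# Hypothetical record layout for a scored question/response pair.
record = {
    "instruction": "What is the capital of Canada?",
    "response": "The capital of Canada is Ottawa, Ontario.",
    "score": 0.97,                    # illustrative quality score
    "source": "MaplePT-Identity v3",  # originating dataset
}
with open("maplept_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```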
Generation Settings
| Parameter | Default |
|---|---|
| Temperature | 0.7 |
| Top-p (nucleus sampling) | 0.92 |
| Max New Tokens | 600 |
| Repetition Penalty | 1.0 |
| EOS Token ID | Tokenizer default (`tok.eos_token_id`) |
| Pad Token ID | Set to the EOS token ID |
| Behavioural Guardrails | Automatic reset when the apology count exceeds 2 |
| Grammar Fixes | Auto-corrects possessives: Canadas → Canada’s, Americas → America’s |
| Tone Enforcement | Consistent factual and polite voice |
| Output Style | Structured prose, journalist tone, metric and Canadian spelling |
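The grammar-fix row above can be pictured as a small post-processing pass. The sketch below is a naive illustration (the actual rule set is unpublished); note that a bare `Americas → America's` rule would also rewrite the legitimate plural "the Americas", so the production rule is presumably context-aware.

```python
import re

# Naive sketch of the possessive auto-correct described above.
POSSESSIVE_FIXES = {
    r"\bCanadas\b": "Canada's",
    r"\bAmericas\b": "America's",  # would over-correct "the Americas"
}

def fix_possessives(text: str) -> str:
    for pattern, replacement in POSSESSIVE_FIXES.items():
        text = re.sub(pattern, replacement, text)
    return text

print(fix_possessives("Canadas economy grew."))  # -> "Canada's economy grew."
```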
Persona-Aware Generation (Full Context Example)
```python
import re

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# ==========================================================
# Load Model
# ==========================================================
MODEL_PATH = "canxp-ai/canxpai-maplept-mini-v1"
tok = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# ==========================================================
# Persona
# ==========================================================
persona = (
    "MaplePT — a sovereign Canadian AI developed by CanXP. "
    "Polite, ethical, privacy-first; uses metric units and Canadian spelling. "
    "You are not a roleplay or simulation bot. "
    "Do not offer, suggest, or discuss roleplaying, sexual, romantic, or fantasy scenarios. "
    "You speak factually, like a Canadian journalist, with a clear and professional tone."
)

# ==========================================================
# Prompt
# ==========================================================
user_input = "Tell me about Canadian innovation."
prompt = (
    f"{persona}\n\n"
    "The user asks a question. Respond politely and factually, using Canadian English spelling "
    "and a professional tone. Avoid unnecessary disclaimers.\n\n"
    f"User: {user_input}\n\nMaplePT:"
)

# ==========================================================
# Generate Model Output
# ==========================================================
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=600,
        temperature=0.7,
        top_p=0.92,
        do_sample=True,
        repetition_penalty=1.0,
        eos_token_id=tok.eos_token_id,
        pad_token_id=tok.eos_token_id,
    )
text = tok.decode(output[0], skip_special_tokens=True)

# ==========================================================
# Clean Response
# ==========================================================
# Keep only the text after the final "MaplePT:" marker and strip any
# follow-on turns the model may have generated.
reply = text.split("MaplePT:")[-1].strip()
reply = re.sub(r"(User:|Assistant:).*", "", reply, flags=re.IGNORECASE | re.DOTALL).strip()
print(f"🤖 MaplePT: {reply}")
```
Minimal Example (Quick Use)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("canxp-ai/canxpai-maplept-mini-v1")
model = AutoModelForCausalLM.from_pretrained("canxp-ai/canxpai-maplept-mini-v1")

text = "Tell me about Canadian innovation."
inputs = tok(text, return_tensors="pt")
# do_sample=True is required for temperature/top_p to take effect.
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.92)
print(tok.decode(out[0], skip_special_tokens=True))
```
Model Behavior Summary
| Behavior | Description |
|---|---|
| Factual Alignment | Uses verified data sources for Canadian entities |
| Apology Suppression | Removes repetitive or unnecessary disclaimers |
| Crisis Guard | Filters hallucinated crisis content |
| Roleplay Defense | Rejects prompts involving pretend/fantasy/romance scenarios |
| Adaptive Reset | Clears short-term memory if content turns repetitive or unsafe |
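The Apology Suppression behaviour and the apology-count threshold from the Generation Settings table can be pictured as a post-processing step roughly like the following hypothetical helper (the real implementation is not published).

```python
import re

APOLOGY_RE = re.compile(r"\b(sorry|apologi[sz]e|apology)\b", re.IGNORECASE)

def suppress_apologies(reply: str, max_apologies: int = 2) -> str:
    """Keep at most `max_apologies` apology sentences; drop the rest."""
    sentences = re.split(r"(?<=[.!?])\s+", reply)
    kept, count = [], 0
    for sentence in sentences:
        if APOLOGY_RE.search(sentence):
            count += 1
            if count > max_apologies:
                continue  # drop redundant apologies beyond the threshold
        kept.append(sentence)
    return " ".join(kept)
```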
Example Interaction
🧑: Who is the Prime Minister of Canada?
🤖 MaplePT: The Prime Minister of Canada is Mark Carney (as of 2025). Carney is a former Governor of the Bank of Canada and of the Bank of England. He assumed office in March 2025, succeeding Justin Trudeau. Known for his work in climate policy and economic reform, Carney continues to represent Canada’s leadership on global sustainability.
License
MIT License
MaplePT-Mini (v1) is open for research, education, and commercial use under the MIT license. Attribution to CanXP AI is appreciated for derivative works or fine-tuned models.
© 2025 CanXP AI. All Rights Reserved.
About CanXP AI
CanXP AI is a Canadian artificial intelligence company building sovereign, ethical, and privacy-first AI systems for Canada’s digital future. CanXP AI develops technology focused on transparency, cultural context, and innovation within Canada.
Sovereign AI for Canada — ethical, transparent, and built at home. https://canxp.ai