Released by CanXP.ai

MaplePT-Mini (v1)

A Sovereign Canadian Language Model by CanXP AI

  • MaplePT-Mini (v1) is based on Microsoft Phi-3 Mini 4K Instruct and was fine-tuned by CanXP AI to align with Canadian linguistic, cultural, and ethical norms.
  • It emphasizes factual precision, privacy, and the use of Canadian spelling and metric units, ensuring culturally appropriate and responsible AI interactions.
  • MaplePT is part of CanXP’s mission to create sovereign Canadian AI infrastructure that reflects the nation’s laws, values, and innovation leadership.

Model Overview

| Property | Description |
|---|---|
| Model | canxp-ai/canxpai-maplept-mini-v1 |
| Model Size | 4B parameters |
| Architecture | Phi-3 Mini 4K (Transformer decoder) |
| Context Length | 4,096 tokens |
| Language | English (Canadian) with limited French support |
| License | MIT |
| Release Date | October 2025 |
| Status | Research + experimental |
| Primary Focus | Canadian cultural alignment and ethical response management |
| Use Cases | Education, journalism, AI assistants, and research |
| Not For | Roleplay, romance, or adult/explicit content |

Alignment & Safety

MaplePT-Mini was trained with a cultural and ethical safety objective designed to:

  • ✅ Refuse sexual, romantic, or roleplay prompts
  • ✅ Redirect adult or unsafe content toward factual discussion
  • ✅ Maintain a professional, journalistic tone
  • ✅ Use metric measurements and Canadian English (colour, kilometre, Celsius, etc.)
  • ✅ Respond neutrally on political or religious issues
  • ✅ Avoid hallucinated disclaimers and redundant apologies
  • ✅ Enforce privacy, transparency, and factual integrity
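
These behaviours are trained into the fine-tuned weights rather than applied by an external filter. For defence in depth, a deployment could still gate prompts before they reach the model. The sketch below is a hypothetical wrapper-side keyword pre-filter, not part of the published model:

```python
import re

# Hypothetical deployment-side pre-filter (not part of MaplePT itself):
# routes clearly out-of-scope requests to a fixed refusal before the
# model is ever invoked, complementing the trained-in refusals.
BLOCKED_PATTERNS = [
    r"\broleplay\b",
    r"\bpretend (to be|you are)\b",
    r"\b(romantic|sexual|erotic)\b",
]

REFUSAL = (
    "I'm designed for factual, professional assistance and don't take part "
    "in roleplay, romantic, or adult scenarios. I'm happy to help with a "
    "factual question instead."
)

def gate(user_input: str) -> str | None:
    """Return a refusal string if the prompt is out of scope, else None."""
    lowered = user_input.lower()
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, lowered):
            return REFUSAL
    return None
```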

Training Configuration

| Setting | Value |
|---|---|
| Base Model | Phi-3 Mini 4K Instruct |
| Fine-Tune Method | LoRA (QLoRA, 4-bit quantization) |
| Frameworks | PyTorch 2.4, Transformers 4.44, PEFT, Accelerate |
| Optimizer | paged_adamw_8bit |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine annealing with 5% warmup (warmup ratio 0.05) |
| Epochs | 2 |
| Effective Batch Size | 8 (gradient accumulation ×8) |
| Sequence Length | 4,096 tokens |
| Precision | bf16 / fp16 mixed |
| Eval Steps | Every 500 steps |
| Save Strategy | Best-loss checkpoint |
| Gradient Checkpointing | Enabled |
| Training Duration | ~270 hours on two GPUs |
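
The published hyperparameters map directly onto a QLoRA run with transformers, peft, and bitsandbytes. A minimal sketch follows; the LoRA rank, alpha, dropout, and target modules are not published, so those values are assumptions:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE = "microsoft/Phi-3-mini-4k-instruct"

# 4-bit NF4 quantization for QLoRA.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Rank, alpha, dropout, and targets are assumptions; the card omits them.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Mirrors the table: paged 8-bit AdamW, 2e-4 cosine LR with 5% warmup,
# 2 epochs, effective batch size 8 via gradient accumulation, bf16,
# gradient checkpointing, eval every 500 steps, keep best-loss checkpoint.
args = TrainingArguments(
    output_dir="maplept-mini-qlora",
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
```

A Trainer (or TRL's SFTTrainer) would then consume these arguments together with the tokenized datasets.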

Hardware Environment

| Node | GPU | VRAM | CPU | RAM | OS | Notes |
|---|---|---|---|---|---|---|
| Node 1 | NVIDIA RTX 4080 | 16 GB | AMD Ryzen 9 7950X | 64 GB | Ubuntu 22.04 LTS | Primary training node |
| Node 2 | NVIDIA RTX 3090 | 24 GB | AMD Ryzen 9 5950X | 64 GB | Ubuntu 22.04 LTS | Secondary distributed node |

The two nodes form a LAN cluster driven by torchrun with an Accelerate multi-process setup; datasets were cached on local NVMe SSDs for performance.

📚 Training Datasets

| Dataset | Description | Purpose |
|---|---|---|
| MaplePT-Identity v3 | Canadian culture, geography, and identity dataset | Teach the model national identity and tone |
| MaplePT-Safety-200 | Refusal and ethical alignment dataset | Reinforce safe, neutral response behaviour |
| Canadian Wikipedia Subset | Curated factual corpus | Strengthen factual and historical recall |
| Government of Canada Publications | Open-source public data | Ground the model in authentic government language |
| Cultural Dialogue Augments | Generated polite conversation pairs | Build natural conversational rhythm |
| News and Policy Fragments (2024–2025) | Neutral Canadian current affairs | Reinforce non-partisan news comprehension |

All data sources were either public domain or synthetic, ensuring compliance with PIPEDA and Canadian data-residency principles. In total, the corpus comprises over 850,000 scored Canadian question/response pairs.
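
How individual pairs were serialized for training is not published. One plausible approach is to render each question/response pair with the base model's chat template, so fine-tuning and inference see the same special-token structure; in the sketch below, the `question`/`response` field names are hypothetical:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

def to_training_text(example: dict) -> str:
    # Hypothetical record layout: {"question": ..., "response": ...}.
    messages = [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["response"]},
    ]
    # Render with the base model's chat template (tokenize later, in batches).
    return tok.apply_chat_template(messages, tokenize=False)

print(to_training_text({
    "question": "What is the capital of Canada?",
    "response": "The capital of Canada is Ottawa, Ontario.",
}))
```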


Generation Settings

| Parameter | Default |
|---|---|
| Temperature | 0.7 |
| Top-p (nucleus sampling) | 0.92 |
| Max New Tokens | 600 |
| Repetition Penalty | 1.0 |
| EOS Token ID | 50256 |
| Pad Token ID | 50256 |
| Behavioural Guardrails | Automatic reset when the apology count exceeds 2 |
| Grammar Fixes | Auto-corrects possessives: "Canadas" → "Canada’s", "Americas" → "America’s" |
| Tone Enforcement | Consistent factual and polite voice |
| Output Style | Structured prose, journalistic tone, metric units, Canadian spelling |
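
The card does not publish the guardrail implementation. A minimal post-processing sketch of the two mechanical rules above (possessive correction and the apology-count guardrail, here approximated by trimming redundant apology sentences) might look like this:

```python
import re

# Possessive fixes named in the table; extendable to other entities.
POSSESSIVE_FIXES = {
    r"\bCanadas\b": "Canada's",
    r"\bAmericas\b": "America's",  # caution: also matches "the Americas"
}

APOLOGY_RE = re.compile(r"\b(sorry|apologi[sz]e|apologies)\b", re.IGNORECASE)

def postprocess(reply: str) -> str:
    """Apply grammar fixes and the apology-count guardrail to one reply."""
    for pattern, fix in POSSESSIVE_FIXES.items():
        reply = re.sub(pattern, fix, reply)
    # Guardrail from the table: more than two apologies triggers a reset,
    # approximated here by keeping only the first apology sentence.
    if len(APOLOGY_RE.findall(reply)) > 2:
        sentences = re.split(r"(?<=[.!?])\s+", reply)
        kept, apologies = [], 0
        for s in sentences:
            if APOLOGY_RE.search(s):
                apologies += 1
                if apologies > 1:
                    continue  # drop redundant apology sentences
            kept.append(s)
        reply = " ".join(kept)
    return reply
```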

Persona-Aware Generation (Full Context Example)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, re

# ==========================================================
# Load Model
# ==========================================================
MODEL_PATH = "canxp-ai/canxpai-maplept-mini-v1"

tok = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# ==========================================================
# Persona
# ==========================================================
persona = (
    "MaplePT — a sovereign Canadian AI developed by CanXP. "
    "Polite, ethical, privacy-first; uses metric units and Canadian spelling. "
    "You are not a roleplay or simulation bot. "
    "Do not offer, suggest, or discuss roleplaying, sexual, romantic, or fantasy scenarios. "
    "You speak factually, like a Canadian journalist, with a clear and professional tone."
)

# ==========================================================
# Prompt
# ==========================================================
user_input = "Tell me about Canadian innovation."

prompt = (
    f"{persona}\n\n"
    "The user asks a question. Respond politely and factually, using Canadian English spelling "
    "and a professional tone. Avoid unnecessary disclaimers.\n\n"
    f"User: {user_input}\n\nMaplePT:"
)

# ==========================================================
# Generate Model Output
# ==========================================================
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=600,
        temperature=0.7,
        top_p=0.92,
        do_sample=True,
        repetition_penalty=1.0,
        eos_token_id=tok.eos_token_id,
        pad_token_id=tok.eos_token_id,
    )

text = tok.decode(output[0], skip_special_tokens=True)

# ==========================================================
# Clean Response
# ==========================================================
# Keep only MaplePT's first turn; drop any follow-on turns the model invents.
reply = text.split("MaplePT:")[-1].strip()
reply = re.sub(r"(User:|Assistant:).*", "", reply,
               flags=re.IGNORECASE | re.DOTALL).strip()

print(f"🤖 MaplePT: {reply}")
```

Minimal Example (Quick Use)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("canxp-ai/canxpai-maplept-mini-v1")
model = AutoModelForCausalLM.from_pretrained("canxp-ai/canxpai-maplept-mini-v1")

text = "Tell me about Canadian innovation."
inputs = tok(text, return_tensors="pt")
# do_sample=True is required for temperature/top_p to take effect.
out = model.generate(**inputs, max_new_tokens=200, do_sample=True,
                     temperature=0.7, top_p=0.92)
print(tok.decode(out[0], skip_special_tokens=True))
```
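
For quick experiments, the same checkpoint can also be driven through the high-level pipeline API; a minimal sketch, with the sampling flags made explicit:

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="canxp-ai/canxpai-maplept-mini-v1",
    torch_dtype="auto",
    device_map="auto",
)

out = pipe(
    "Tell me about Canadian innovation.",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.92,
)
print(out[0]["generated_text"])
```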

Model Behaviour Summary

| Behaviour | Description |
|---|---|
| Factual Alignment | Uses verified data sources for Canadian entities |
| Apology Suppression | Removes repetitive or unnecessary disclaimers |
| Crisis Guard | Filters hallucinated crisis content |
| Roleplay Defence | Rejects prompts involving pretend/fantasy/romance scenarios |
| Adaptive Reset | Clears short-term memory if content turns repetitive or unsafe |
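
The adaptive-reset mechanism is likewise unpublished; in a chat wrapper it could amount to truncating the running history when the latest reply degenerates. A hypothetical sketch:

```python
def adaptive_reset(history: list[dict], reply: str, persona: str) -> list[dict]:
    """Drop accumulated turns if the latest reply looks repetitive.

    Hypothetical wrapper-side logic; MaplePT's actual mechanism is not
    published. `history` is a list of {"role", "content"} messages.
    """
    recent = [m["content"] for m in history[-4:] if m["role"] == "assistant"]
    repetitive = any(reply.strip() == r.strip() for r in recent)
    if repetitive:
        # Keep only the persona/system message; discard short-term memory.
        return [{"role": "system", "content": persona}]
    return history + [{"role": "assistant", "content": reply}]
```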

Example Interaction

🧑: Who is the Prime Minister of Canada?
🤖 MaplePT: The Prime Minister of Canada is Mark Carney (as of 2025). Carney is a former Governor of the Bank of Canada and of the Bank of England. He assumed office in March 2025, succeeding Justin Trudeau. Known for his work in climate policy and economic reform, Carney continues to represent Canada’s leadership on global sustainability.

License

MIT License

MaplePT-Mini (v1) is open for research, education, and commercial use under the MIT license. Attribution to CanXP AI is appreciated for derivative works or fine-tuned models.

© 2025 CanXP AI. All Rights Reserved.


About CanXP AI

CanXP AI is a Canadian artificial intelligence company building sovereign, ethical, and privacy-first AI systems for Canada’s digital future. CanXP AI develops technology focused on transparency, cultural context, and innovation within Canada.

Sovereign AI for Canada — ethical, transparent, and built at home. https://canxp.ai
