MaplePT-Mini (v1)
A Sovereign Canadian Language Model by CanXP AI
- MaplePT-Mini (v1) is based on Microsoft Phi-3 Mini 4K Instruct and was fine-tuned by CanXP AI to align with Canadian linguistic, cultural, and ethical norms.
- It emphasizes factual precision, privacy, and the use of Canadian spelling and metric units, ensuring culturally appropriate and responsible AI interactions.
- MaplePT is part of CanXP’s mission to create sovereign Canadian AI infrastructure that reflects the nation’s laws, values, and innovation leadership.
Model Overview
| Property | Description |
|---|---|
| Model | canxp-ai/canxpai-maplept-mini-v1 |
| Model Size | ~3.8B parameters |
| Architecture | Phi-3 Mini 4K (Transformer Decoder) |
| Context Length | 4,096 tokens |
| Language | English (Canadian) with limited French support |
| License | MIT |
| Release Date | October 2025 |
| Status | Research + Experimental |
| Primary Focus | Canadian cultural alignment and ethical response management |
| Use Cases | Education, journalism, AI assistants, and research |
| Not For | Roleplay, romance, or adult/explicit content |
Alignment & Safety
MaplePT-Mini was trained with a cultural and ethical safety objective designed to:
- ✅ Refuse sexual, romantic, or roleplay prompts
- ✅ Redirect adult or unsafe content toward factual discussion
- ✅ Maintain a professional, journalistic tone
- ✅ Use metric measurements and Canadian English (colour, kilometre, Celsius, etc.)
- ✅ Respond neutrally on political or religious issues
- ✅ Avoid hallucinated disclaimers and redundant apologies
- ✅ Enforce privacy, transparency, and factual integrity
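These behaviours are instilled through fine-tuning, but an application layer can reinforce them before a prompt ever reaches the model. Below is a minimal, hypothetical sketch of such a pre-screening guardrail; the `screen_prompt` helper and its keyword patterns are illustrative and are not part of the released model.

```python
import re

# Illustrative guardrail: refuse roleplay/romance requests before generation.
# The pattern list is hypothetical; the released model enforces this behaviour
# through fine-tuning rather than a keyword filter.
REFUSE_PATTERNS = [r"\broleplay\b", r"\bpretend to be\b", r"\bromantic\b"]
REFUSAL = (
    "I'm designed for factual, professional conversation and don't take part in "
    "roleplay or romantic scenarios. I'd be happy to help with a factual question."
)

def screen_prompt(user_input: str) -> str | None:
    """Return a canned refusal if the prompt trips a guardrail, else None."""
    for pattern in REFUSE_PATTERNS:
        if re.search(pattern, user_input, flags=re.IGNORECASE):
            return REFUSAL
    return None
```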
Training Configuration
| Setting | Value |
|---|---|
| Base Model | Phi-3 Mini 4K Instruct |
| Fine-Tune Method | QLoRA (LoRA adapters on a 4-bit quantized base) |
| Frameworks | PyTorch 2.4, Transformers 4.44, PEFT, Accelerate |
| Optimizer | paged_adamw_8bit |
| Learning Rate | 2e-4 |
| Epochs | 2 |
| Effective Batch Size | 8 (gradient accumulation ×8) |
| Sequence Length | 4,096 tokens |
| Precision | bf16 / fp16 mixed |
| Eval Steps | Every 500 steps |
| Save Strategy | Best-loss checkpoint |
| LR Scheduler | Cosine annealing |
| Gradient Checkpointing | Enabled |
| Warmup Ratio | 0.05 |
| Training Duration | ~270 hours on dual GPUs |
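The table corresponds to a fairly standard QLoRA recipe. A minimal sketch of the setup with `transformers` and `peft` follows; the LoRA rank, alpha, dropout, and target modules are assumptions, since CanXP has not published them.

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments

# 4-bit quantized base model (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter (rank/alpha/targets are assumed values, not published ones).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)
model = get_peft_model(model, lora_config)

# Hyperparameters transcribed from the table above.
args = TrainingArguments(
    output_dir="maplept-mini-v1",
    num_train_epochs=2,
    per_device_train_batch_size=1,   # effective batch 8 via accumulation
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,     # retains the best-loss checkpoint
)
```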
Hardware Environment
| Node | GPU | VRAM | CPU | RAM | OS | Notes |
|---|---|---|---|---|---|---|
| Node 1 | NVIDIA RTX 4080 | 16 GB | AMD Ryzen 9 7950X | 64 GB | Ubuntu 22.04 LTS | Primary training node |
| Node 2 | NVIDIA RTX 3090 | 24 GB | AMD Ryzen 9 5950X | 64 GB | Ubuntu 22.04 LTS | Secondary distributed node |
- Cluster: 2-node LAN, torchrun distributed training with an Accelerate multi-process setup
- Storage: NVMe SSD, local dataset caching for performance
Training Datasets
| Dataset | Description | Purpose |
|---|---|---|
| MaplePT-Identity v3 | Canadian culture, geography, and identity dataset | Teach model national identity and tone |
| MaplePT-Safety-200 | Refusal and ethical alignment dataset | Reinforce safe, neutral response behaviour |
| Canadian Wikipedia Subset | Curated factual corpus | Strengthen factual and historical recall |
| Government of Canada Publications | Open-source public data | Ground model in authentic government language |
| Cultural Dialogue Augments | Generated polite conversation pairs | Build natural conversational rhythm |
| News and Policy Fragments (2024–2025) | Neutral Canadian current affairs | Reinforce non-partisan news comprehension |
All data sources were either public domain or synthetic, ensuring compliance with PIPEDA and Canadian data-residency principles. The combined corpus contains over 850,000 scored Canadian question/response pairs.
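For illustration, one scored pair might be serialized as a JSONL record along these lines; the field names and score scale are hypothetical, as CanXP has not published the dataset schema.

```python
import json

# Hypothetical record layout for a scored question/response pair.
record = {
    "instruction": "What is the capital of Canada?",
    "response": "The capital of Canada is Ottawa, Ontario.",
    "score": 0.97,                    # illustrative quality score
    "source": "MaplePT-Identity v3",  # originating dataset
}
with open("maplept_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```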
Generation Settings
| Parameter | Default |
|---|---|
| Temperature | 0.7 |
| Top-p (nucleus sampling) | 0.92 |
| Max New Tokens | 600 |
| Repetition Penalty | 1.0 |
| EOS Token ID | Tokenizer default (`tok.eos_token_id`) |
| Pad Token ID | Set to the EOS token ID |
| Behavioural Guardrails | Automatic reset when the apology count exceeds 2 |
| Grammar Fixes | Auto-corrects possessives: Canadas → Canada’s, Americas → America’s |
| Tone Enforcement | Consistent factual and polite voice |
| Output Style | Structured prose, journalist tone, metric and Canadian spelling |
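The grammar-fix row above can be pictured as a small post-processing pass. The sketch below is a naive illustration (the actual rule set is unpublished); note that a bare `Americas → America's` rule would also rewrite the legitimate plural "the Americas", so the production rule is presumably context-aware.

```python
import re

# Naive sketch of the possessive auto-correct described above.
POSSESSIVE_FIXES = {
    r"\bCanadas\b": "Canada's",
    r"\bAmericas\b": "America's",  # would over-correct "the Americas"
}

def fix_possessives(text: str) -> str:
    for pattern, replacement in POSSESSIVE_FIXES.items():
        text = re.sub(pattern, replacement, text)
    return text

print(fix_possessives("Canadas economy grew."))  # -> "Canada's economy grew."
```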
Persona-Aware Generation (Full Context Example)
```python
import re

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# ==========================================================
# Load Model
# ==========================================================
MODEL_PATH = "canxp-ai/canxpai-maplept-mini-v1"
tok = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# ==========================================================
# Persona
# ==========================================================
persona = (
    "MaplePT — a sovereign Canadian AI developed by CanXP. "
    "Polite, ethical, privacy-first; uses metric units and Canadian spelling. "
    "You are not a roleplay or simulation bot. "
    "Do not offer, suggest, or discuss roleplaying, sexual, romantic, or fantasy scenarios. "
    "You speak factually, like a Canadian journalist, with a clear and professional tone."
)

# ==========================================================
# Prompt
# ==========================================================
user_input = "Tell me about Canadian innovation."
prompt = (
    f"{persona}\n\n"
    "The user asks a question. Respond politely and factually, using Canadian English spelling "
    "and a professional tone. Avoid unnecessary disclaimers.\n\n"
    f"User: {user_input}\n\nMaplePT:"
)

# ==========================================================
# Generate Model Output
# ==========================================================
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=600,
        temperature=0.7,
        top_p=0.92,
        do_sample=True,
        repetition_penalty=1.0,
        eos_token_id=tok.eos_token_id,
        pad_token_id=tok.eos_token_id,
    )
text = tok.decode(output[0], skip_special_tokens=True)

# ==========================================================
# Clean Response
# ==========================================================
# Keep only the text after the final "MaplePT:" marker and strip any
# follow-on turns the model may have generated.
reply = text.split("MaplePT:")[-1].strip()
reply = re.sub(r"(User:|Assistant:).*", "", reply, flags=re.IGNORECASE | re.DOTALL).strip()
print(f"🤖 MaplePT: {reply}")
```
Minimal Example (Quick Use)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("canxp-ai/canxpai-maplept-mini-v1")
model = AutoModelForCausalLM.from_pretrained("canxp-ai/canxpai-maplept-mini-v1")

text = "Tell me about Canadian innovation."
inputs = tok(text, return_tensors="pt")
# do_sample=True is required for temperature/top_p to take effect.
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.92)
print(tok.decode(out[0], skip_special_tokens=True))
```
Model Behavior Summary
| Behavior | Description |
|---|---|
| Factual Alignment | Uses verified data sources for Canadian entities |
| Apology Suppression | Removes repetitive or unnecessary disclaimers |
| Crisis Guard | Filters hallucinated crisis content |
| Roleplay Defense | Rejects prompts involving pretend/fantasy/romance scenarios |
| Adaptive Reset | Clears short-term memory if content turns repetitive or unsafe |
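The Apology Suppression behaviour and the apology-count threshold from the Generation Settings table can be pictured as a post-processing step roughly like the following hypothetical helper (the real implementation is not published).

```python
import re

APOLOGY_RE = re.compile(r"\b(sorry|apologi[sz]e|apology)\b", re.IGNORECASE)

def suppress_apologies(reply: str, max_apologies: int = 2) -> str:
    """Keep at most `max_apologies` apology sentences; drop the rest."""
    sentences = re.split(r"(?<=[.!?])\s+", reply)
    kept, count = [], 0
    for sentence in sentences:
        if APOLOGY_RE.search(sentence):
            count += 1
            if count > max_apologies:
                continue  # drop redundant apologies beyond the threshold
        kept.append(sentence)
    return " ".join(kept)
```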
Example Interaction
🧑: Who is the Prime Minister of Canada?
🤖 MaplePT: The Prime Minister of Canada is Mark Carney (as of 2025). Carney is a former Governor of the Bank of Canada and of the Bank of England. He assumed office in March 2025, succeeding Justin Trudeau. Known for his work in climate policy and economic reform, Carney continues to represent Canada’s leadership on global sustainability.
License
MIT License
MaplePT-Mini (v1) is open for research, education, and commercial use under the MIT license. Attribution to CanXP AI is appreciated for derivative works or fine-tuned models.
© 2025 CanXP AI. All Rights Reserved.
About CanXP AI
CanXP AI is a Canadian artificial intelligence company building sovereign, ethical, and privacy-first AI systems for Canada’s digital future. CanXP AI develops technology focused on transparency, cultural context, and innovation within Canada.
Sovereign AI for Canada — ethical, transparent, and built at home. https://canxp.ai