niro

niro is an improvement over the excellent WizardLM-Evol-V2-Unfiltered model, which at the time of writing is the best 1.8-billion-parameter mistral model. Keep in mind that niro is an untrained merge; further improvements are yet to come.
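Since niro is an untrained merge, a recipe along these lines (for example, with mergekit) is the usual way such a model is produced. Everything below is illustrative: the model ids, layer ranges, and merge method are placeholders, not niro's actual recipe.

```yaml
# Hypothetical mergekit recipe -- illustrative only, not niro's actual config.
slices:
  - sources:
      - model: base-model-a        # placeholder model id
        layer_range: [0, 24]
      - model: base-model-b        # placeholder model id
        layer_range: [0, 24]
merge_method: slerp                # spherical interpolation of the weights
base_model: base-model-a
parameters:
  t: 0.5                           # interpolation weight between the two models
dtype: float16
```

Because no gradient updates are involved, a merge like this runs in minutes on CPU; the quality then depends entirely on how compatible the parent models are.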

benchmarks

zero-shot evaluations against current sota small models; mmlu is still the main reason qwen models score better on average. Currently, niro is on par with the best language models below 2b parameters.

| Parameters | Model        | MMLU  | ARC   | HellaSwag | PIQA  | Winogrande | Average |
|------------|--------------|-------|-------|-----------|-------|------------|---------|
| 0.5b       | qwen 2.5     | 47.29 | 31.83 | 52.17     | 70.29 | 57.06      | 51.72   |
| 0.5b       | arco         | 26.17 | 37.29 | 62.88     | 74.37 | 62.27      | 52.60   |
| 0.5b       | arco (exp)   | 25.51 | 38.82 | 63.02     | 74.70 | 61.25      | 52.66   |
| 1.7b       | smollm       | 27.65 | 46.26 | 65.74     | 76.06 | 60.93      | 55.33   |
| 1.8b       | niro-preview | 41.75 | 40.96 | 72.07     | 77.97 | 65.51      | 59.65   |
| 1.5b       | qwen 2.5     | 58.68 | 44.71 | 67.62     | 75.73 | 62.67      | 61.88   |
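As a quick sanity check, the Average column is the plain mean of the five benchmark scores; for niro-preview:

```python
# Recompute the reported Average for niro-preview from its five scores.
scores = {
    "MMLU": 41.75,
    "ARC": 40.96,
    "HellaSwag": 72.07,
    "PIQA": 77.97,
    "Winogrande": 65.51,
}

average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 59.65, matching the table
```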
Model size: 2B parameters (safetensors, F16)