niro

niro is an improvement over the excellent WizardLM-Evol-V2-Unfiltered model, which at the time of writing is the best 1.8-billion-parameter mistral model. Keep in mind that niro is an untrained merge; further improvements are yet to come.
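Since niro is an untrained merge, a recipe along these lines (for example, with mergekit) is the usual way such a model is produced. Everything below is illustrative: the model ids, layer ranges, and merge method are placeholders, not niro's actual recipe.

```yaml
# Hypothetical mergekit recipe -- illustrative only, not niro's actual config.
slices:
  - sources:
      - model: base-model-a        # placeholder model id
        layer_range: [0, 24]
      - model: base-model-b        # placeholder model id
        layer_range: [0, 24]
merge_method: slerp                # spherical interpolation of the weights
base_model: base-model-a
parameters:
  t: 0.5                           # interpolation weight between the two models
dtype: float16
```

Because no gradient updates are involved, a merge like this runs in minutes on CPU; the quality then depends entirely on how compatible the parent models are.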

benchmarks

zero-shot evaluations against current sota small models; mmlu is still the main reason qwen models score better on average. Currently, niro is on par with the best language models below 2b parameters.

| Parameters | Model        | MMLU  | ARC   | HellaSwag | PIQA  | Winogrande | Average |
|------------|--------------|-------|-------|-----------|-------|------------|---------|
| 0.5b       | qwen 2.5     | 47.29 | 31.83 | 52.17     | 70.29 | 57.06      | 51.72   |
| 0.5b       | arco         | 26.17 | 37.29 | 62.88     | 74.37 | 62.27      | 52.60   |
| 0.5b       | arco (exp)   | 25.51 | 38.82 | 63.02     | 74.70 | 61.25      | 52.66   |
| 1.7b       | smollm       | 27.65 | 46.26 | 65.74     | 76.06 | 60.93      | 55.33   |
| 1.8b       | niro-preview | 41.75 | 40.96 | 72.07     | 77.97 | 65.51      | 59.65   |
| 1.5b       | qwen 2.5     | 58.68 | 44.71 | 67.62     | 75.73 | 62.67      | 61.88   |
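As a quick sanity check, the Average column is the plain mean of the five benchmark scores; for niro-preview:

```python
# Recompute the reported Average for niro-preview from its five scores.
scores = {
    "MMLU": 41.75,
    "ARC": 40.96,
    "HellaSwag": 72.07,
    "PIQA": 77.97,
    "Winogrande": 65.51,
}

average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 59.65, matching the table
```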
Model size: 2B parameters (safetensors, F16)