
Base model evaluation

timestamp: 2025-10-14 01:41:37

  • Model: base_model (step 21400)
  • CORE metric: 0.1976
  • hellaswag_zeroshot: 0.2598
  • jeopardy: 0.0874
  • bigbench_qa_wikidata: 0.5113
  • arc_easy: 0.5354
  • arc_challenge: 0.1183
  • copa: 0.2800
  • commonsense_qa: 0.0796
  • piqa: 0.3798
  • openbook_qa: 0.1627
  • lambada_openai: 0.3839
  • hellaswag: 0.2595
  • winograd: 0.2821
  • winogrande: 0.0513
  • bigbench_dyck_languages: 0.1430
  • agi_eval_lsat_ar: 0.1304
  • bigbench_cs_algorithms: 0.3727
  • bigbench_operators: 0.1762
  • bigbench_repeat_copy_logic: 0.0312
  • squad: 0.2389
  • coqa: 0.2088
  • boolq: -0.5218
  • bigbench_language_identification: 0.1757
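
The per-task numbers above appear to be baseline-centered scores rather than raw accuracies, since boolq is negative. The sketch below copies the listed values and checks that their unweighted mean reproduces the reported CORE metric. The `centered_accuracy` helper is an assumption about how such scores are usually rescaled (0 at the random-guess baseline, 1 at perfect accuracy); the report itself does not state the formula or the baselines.

```python
# Per-task scores copied verbatim from the list above.
scores = {
    "hellaswag_zeroshot": 0.2598,
    "jeopardy": 0.0874,
    "bigbench_qa_wikidata": 0.5113,
    "arc_easy": 0.5354,
    "arc_challenge": 0.1183,
    "copa": 0.2800,
    "commonsense_qa": 0.0796,
    "piqa": 0.3798,
    "openbook_qa": 0.1627,
    "lambada_openai": 0.3839,
    "hellaswag": 0.2595,
    "winograd": 0.2821,
    "winogrande": 0.0513,
    "bigbench_dyck_languages": 0.1430,
    "agi_eval_lsat_ar": 0.1304,
    "bigbench_cs_algorithms": 0.3727,
    "bigbench_operators": 0.1762,
    "bigbench_repeat_copy_logic": 0.0312,
    "squad": 0.2389,
    "coqa": 0.2088,
    "boolq": -0.5218,
    "bigbench_language_identification": 0.1757,
}


def centered_accuracy(accuracy: float, baseline: float) -> float:
    """Hypothetical helper (assumed formula, not taken from the report):
    rescale raw accuracy so the random-guess baseline maps to 0 and a
    perfect score maps to 1. Scores below the baseline come out negative."""
    return (accuracy - baseline) / (1.0 - baseline)


# Unweighted mean over all listed tasks.
core = sum(scores.values()) / len(scores)

# Tasks scoring below their (assumed) random-guess baseline.
below_baseline = [name for name, value in scores.items() if value < 0]

print(f"{len(scores)} tasks, mean = {core:.4f}")  # 22 tasks, mean = 0.1976
print("below baseline:", below_baseline)          # below baseline: ['boolq']
```

Under that assumed formula, a negative score such as boolq's means the model did worse than random guessing on that task, which pulls the mean down.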