[Update_25.10.02]

#1
by sr-admin - opened
Samsung Research org
edited Oct 3
  1. Replaced the token-length information in the table with timing measurements:
  • Time to First Answer Token: The median time, in seconds, from sending the request until the first token of the response arrives (after internal thinking, if any).
  • End-to-End Response Time: The median time, in seconds, from sending the request until the complete response arrives.
  2. Added per-GPU speed measurements for open-source models:
  • Speed per GPU: The median number of tokens generated per second, divided by the number of GPUs used during inference.
  3. Added new models:
  • GLM-4.6 FP8
  • Gemini 2.5 Flash-lite Preview
  • DeepSeek V3.1 Terminus
  • Apriel 1.5 15B Thinker
  4. Added the link and citation information for the TRUEBench paper.
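The timing and speed metrics described in this update come down to medians over per-request measurements. Below is a minimal Python sketch of that arithmetic. It is not the leaderboard's measurement code. The `RequestTiming` record and its field names are hypothetical. The sketch also assumes that "tokens generated per second" for a request means its generated tokens divided by its end-to-end duration, and the actual benchmark may define throughput differently.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestTiming:
    """Hypothetical per-request record; all timestamps are in seconds."""
    sent_at: float                # when the request was sent
    first_answer_token_at: float  # first answer token (after any internal thinking)
    completed_at: float           # when the complete response had arrived
    generated_tokens: int         # number of tokens generated for the response


def time_to_first_answer_token(records: list[RequestTiming]) -> float:
    """Median seconds from sending a request until its first answer token."""
    return median(r.first_answer_token_at - r.sent_at for r in records)


def end_to_end_response_time(records: list[RequestTiming]) -> float:
    """Median seconds from sending a request until the full response arrives."""
    return median(r.completed_at - r.sent_at for r in records)


def speed_per_gpu(records: list[RequestTiming], num_gpus: int) -> float:
    """Median tokens generated per second, divided by the number of GPUs.

    Assumption: per-request throughput = generated_tokens / end-to-end time.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    tokens_per_second = median(
        r.generated_tokens / (r.completed_at - r.sent_at) for r in records
    )
    return tokens_per_second / num_gpus


# Usage with three made-up requests served on 2 GPUs
records = [
    RequestTiming(0.0, 1.0, 4.0, 200),   # 50 tokens/s
    RequestTiming(0.0, 2.0, 5.0, 300),   # 60 tokens/s
    RequestTiming(0.0, 3.0, 10.0, 400),  # 40 tokens/s
]
print(time_to_first_answer_token(records))  # 2.0
print(end_to_end_response_time(records))    # 5.0
print(speed_per_gpu(records, num_gpus=2))   # 25.0
```

Using the median rather than the mean keeps a few unusually slow requests (for example, ones with very long thinking phases) from dominating the reported figures.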
