[Update_25.10.02]
#1, opened by sr-admin
- Replaced token length information in the table with time-related measurements:
  - Time to First Answer Token: the median number of seconds from sending the request until the first token of the answer arrives (after internal thinking, if any).
  - End-to-End Response Time: the median number of seconds from sending the request until the complete response arrives.
- Added per-GPU speed measurements for open-source models:
  - Speed per GPU: the median number of tokens generated per second during inference, divided by the number of GPUs.
- Added new models:
  - GLM-4.6 FP8
  - Gemini 2.5 Flash-lite Preview
  - DeepSeek V3.1 Terminus
  - Apriel 1.5 15B Thinker
- Added the link and citation information for the TRUEBench paper.
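
For readers who want to reproduce numbers of this kind, the three metrics defined above could be computed roughly as follows. This is a minimal sketch, not the leaderboard's actual measurement code: the `Run` record, its field names, and the `summarize` helper are hypothetical, and it assumes tokens per second is taken over the full request duration, which the post does not specify.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    """One timed request (hypothetical record; all times in seconds)."""
    sent_at: float                # when the request was sent
    first_answer_token_at: float  # first token of the answer, after any thinking
    completed_at: float           # when the complete response had arrived
    generated_tokens: int         # tokens generated for this request


def summarize(runs, num_gpus=1):
    """Return the median of each metric over a list of runs."""
    time_to_first_answer_token = median(
        r.first_answer_token_at - r.sent_at for r in runs
    )
    end_to_end_response_time = median(r.completed_at - r.sent_at for r in runs)
    # Assumption: throughput is measured over the whole request duration.
    speed_per_gpu = median(
        r.generated_tokens / (r.completed_at - r.sent_at) / num_gpus for r in runs
    )
    return {
        "time_to_first_answer_token": time_to_first_answer_token,
        "end_to_end_response_time": end_to_end_response_time,
        "speed_per_gpu": speed_per_gpu,
    }


if __name__ == "__main__":
    example_runs = [
        Run(0.0, 1.0, 5.0, 100),
        Run(0.0, 2.0, 4.0, 200),
        Run(0.0, 3.0, 10.0, 300),
    ]
    print(summarize(example_runs, num_gpus=2))
```

The median is used instead of the mean so that a single unusually slow request does not dominate the reported value.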
sr-admin changed discussion title from [test] to [Update_25.10.03]
sr-admin changed discussion title from [Update_25.10.03] to [Update_25.10.02]