BigCodeArena: Judging code generations end to end with code executions By bigcode • 21 days ago • 16
BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18, 2024 • 52