BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published 19 days ago • 32
DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text Paper • 2306.05540 • Published May 23, 2023
Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names? Paper • 2309.07804 • Published Sep 14, 2023 • 2
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 42
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts Paper • 2404.15247 • Published Apr 23, 2024 • 3
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity Paper • 2301.12867 • Published Jan 30, 2023
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models Paper • 2411.05830 • Published Nov 5, 2024 • 21
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing Paper • 2305.17497 • Published May 27, 2023
Rethinking Round-Trip Translation for Machine Translation Evaluation Paper • 2209.07351 • Published Sep 15, 2022
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo Paper • 2508.18370 • Published Aug 25, 2025 • 3
CodeArena: A Collective Evaluation Platform for LLM Code Generation Paper • 2503.01295 • Published Mar 3, 2025 • 8
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 48
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models Paper • 2401.00788 • Published Jan 1, 2024 • 23
Source Code Data Augmentation for Deep Learning: A Survey Paper • 2305.19915 • Published May 31, 2023 • 1
Large Language Models Are State-of-the-Art Evaluators of Code Generation Paper • 2304.14317 • Published Apr 27, 2023 • 2
OctoPack: Instruction Tuning Code Large Language Models Paper • 2308.07124 • Published Aug 14, 2023 • 30