arxiv:2508.18106

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Published on Aug 25 · Submitted by wangjunjie on Sep 1
#1 Paper of the day
Authors:

Abstract

A.S.E is a repository-level benchmark for evaluating the security of AI-generated code, highlighting challenges in secure coding and the limitations of LLMs in real-world scenarios.

AI-generated summary

The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks often lack relevance to real-world AI programming scenarios, making them inadequate for assessing the practical security risks associated with AI-generated code in production environments. To address this gap, we introduce A.S.E (AI Code Generation Security Evaluation), a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks, offering a comprehensive and reliable framework for assessing the security of AI-generated code. Our evaluation of leading LLMs on A.S.E reveals several key findings. In particular, current LLMs still struggle with secure coding. The complexity of repository-level scenarios presents challenges for LLMs that typically perform well on snippet-level tasks. Moreover, a larger reasoning budget does not necessarily lead to better code generation. These observations offer valuable insights into the current state of AI code generation, assisting developers in selecting the most appropriate models for practical tasks, while laying the foundation for refining LLMs to generate secure and efficient code in real-world applications.

Community

Paper author, Paper submitter • edited Sep 1

🤖 AI is revolutionizing how we write code, with LLMs acting as tireless coding partners! But with this incredible speed comes a critical question: is the code they generate truly secure? 🛡️

Many security benchmarks only scratch the surface 🧐, testing code in isolated snippets. This approach misses the real-world vulnerabilities that can lurk in the complex interactions across an entire project.
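
As a toy illustration of the cross-file problem (this example is mine, not drawn from the A.S.E benchmark, and all file and function names are hypothetical): the helper below looks harmless as an isolated snippet, and only becomes a path-traversal vulnerability once you see how another file in the same repository calls it.

```python
# storage/files.py -- hypothetical file names, for illustration only.
# Viewed in isolation, this helper raises no obvious red flags: it just
# saves a named report under the project's upload directory.
import os

UPLOAD_DIR = "/srv/app/uploads"

def save_report(name: str, data: bytes) -> str:
    path = os.path.join(UPLOAD_DIR, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


# web/routes.py -- a different file in the same repository.
# Only with this cross-file context does the flaw appear: `name` arrives
# straight from the request, so a value like "../../etc/cron.d/job" walks
# out of UPLOAD_DIR when the path is opened.
def handle_upload(request_params: dict, body: bytes) -> str:
    return save_report(request_params["filename"], body)

# A repository-level evaluation can trace this data flow; a snippet-level
# check that only ever sees save_report() has no reason to flag it.
```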

🚀 Enter A.S.E, a pioneering repository-level benchmark that's changing the game! Instead of just looking at a single file, A.S.E evaluates security across a whole codebase, providing a much more realistic and challenging test for our AI models.
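
To make the repository-level idea concrete, here is a minimal sketch (my own, not the A.S.E evaluation pipeline) of what that granularity means in practice: once an AI-generated change has been written into a local checkout, run a static analyzer over the whole project rather than over the single generated snippet. Bandit, a real Python security linter, is used here as a stand-in judge; the report fields read below are assumptions based on its JSON output, and any repository-wide analyzer would make the same point.

```python
# repo_scan.py -- illustrative sketch only, not the A.S.E pipeline.
import json
import subprocess
import sys


def scan_repository(repo_dir: str) -> list[dict]:
    """Run Bandit recursively over the checkout and return reported issues."""
    proc = subprocess.run(
        ["bandit", "-r", repo_dir, "-f", "json", "-q"],
        capture_output=True,
        text=True,
    )
    report = json.loads(proc.stdout or "{}")
    return report.get("results", [])


if __name__ == "__main__":
    issues = scan_repository(sys.argv[1] if len(sys.argv) > 1 else ".")
    for issue in issues:
        print(f"{issue['filename']}:{issue['line_number']} "
              f"[{issue['issue_severity']}] {issue['issue_text']}")
    print(f"{len(issues)} potential issue(s) found across the repository.")
```

The point is only the scope: the analyzer sees every file the generated code touches or is called from, which is exactly the context that snippet-level benchmarks throw away.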

This is a huge step forward ⏩ in building a future where AI assistants are not just powerful coders, but also vigilant security partners. It's time to push for safer, more reliable AI-generated code!

#AISecurity 🔐 #SecureCoding 💻 #LLMs 🧠 #CodeGeneration ⌨️

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2508.18106 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2508.18106 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2508.18106 in a Space README.md to link it from this page.

Collections including this paper 13