Abstract
A framework converts transient critiques into retrievable guidelines using a file-based memory system and agent-controlled tool calls, enabling LLMs to match test-time refinement performance with reduced inference costs.
We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines through a file-based memory system and agent-controlled tool calls. We evaluate this method on the Rubric Feedback Bench, a novel dataset for rubric-based learning. Experiments demonstrate that our augmented LLMs rapidly match the performance of test-time refinement pipelines while drastically reducing inference cost.
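The abstract describes the mechanism only at a high level. The sketch below illustrates one plausible reading of it: critiques produced at test time are distilled into short guidelines, persisted in a file, and exposed to the agent as tool calls. The class and tool names (`GuidelineStore`, `save_guideline`, `retrieve_guidelines`), the JSONL format, and the keyword-overlap retrieval are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a file-based guideline memory with agent-controlled tools.
# All names and the retrieval heuristic are hypothetical, not the paper's API.
import json
from pathlib import Path


class GuidelineStore:
    """Persists distilled critiques as retrievable guidelines on disk."""

    def __init__(self, path: str = "guidelines.jsonl"):
        self.path = Path(path)
        self.path.touch(exist_ok=True)

    def save_guideline(self, topic: str, text: str) -> None:
        # Called by the agent after a critique: the transient feedback is
        # rewritten as a reusable rule and appended to the memory file.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"topic": topic, "text": text}) + "\n")

    def retrieve_guidelines(self, query: str, k: int = 3) -> list[str]:
        # Naive keyword-overlap ranking; a real system would likely use
        # embeddings. Returns the k most relevant stored guidelines.
        entries = [json.loads(line) for line in self.path.read_text().splitlines() if line]
        terms = set(query.lower().split())
        scored = sorted(
            entries,
            key=lambda e: len(terms & set((e["topic"] + " " + e["text"]).lower().split())),
            reverse=True,
        )
        return [e["text"] for e in scored[:k]]


# Usage: after a test-time critique the agent stores the lesson; on a later,
# similar task it retrieves the guideline instead of re-running refinement.
store = GuidelineStore()
store.save_guideline("rubric: citations", "Always cite a source for every quantitative claim.")
print(store.retrieve_guidelines("how should I handle quantitative claims and citations?"))
```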
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- DeepCode: Open Agentic Coding (2025)
- Prompt Repetition Improves Non-Reasoning LLMs (2025)
- In-Context Distillation with Self-Consistency Cascades: A Simple, Training-Free Way to Reduce LLM Agent Costs (2025)
- DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing (2026)
- Recursive Language Models (2025)
- From Failure to Mastery: Generating Hard Samples for Tool-use Agents (2026)
- The Instruction Gap: LLMs get lost in Following Instruction (2025)