onekq posted an update 1 day ago:
Context rot is such a catchy phrase, but the problem was identified 2+ years ago under the name attention decay.
Lost in the Middle: How Language Models Use Long Contexts (2307.03172)
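For anyone who wants to see the effect for themselves, here is a minimal sketch of the kind of probe that paper describes: place the one relevant passage at different depths in a padded context and measure accuracy per position. `ask_model` is a hypothetical placeholder for your own model client, not a real API.

```python
# Minimal "lost in the middle" probe sketch. Replace ask_model with your own client.
import random

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your chat/completions client here")

def build_context(needle: str, distractors: list[str], position: int) -> str:
    """Insert the single relevant passage at a chosen index among distractor passages."""
    docs = distractors[:position] + [needle] + distractors[position:]
    return "\n\n".join(docs)

def probe(needle: str, question: str, answer: str, distractors: list[str], trials: int = 20):
    """Accuracy as a function of where the needle sits in the context."""
    results = {}
    step = max(1, len(distractors) // 10)
    for position in range(0, len(distractors) + 1, step):
        correct = 0
        for _ in range(trials):
            random.shuffle(distractors)
            context = build_context(needle, distractors, position)
            reply = ask_model(f"{context}\n\nQuestion: {question}")
            correct += int(answer.lower() in reply.lower())
        results[position] = correct / trials
    return results  # typically U-shaped: best at the start and end, worst in the middle
```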

I spotted the same problem in coding tasks and documented it in my book (https://www.amazon.com/dp/9999331130).

Why did this problem become hot again? Because many of us thought it had been solved by long-context models, which is not true.

Here we were misled by benchmarks. Most long-context benchmarks are built around the QA scenario, i.e. finding a needle in a haystack. But in agentic scenarios, the model needs to find EVERYTHING in the haystack, and simply cannot spare enough attention for that challenge.
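To make the contrast concrete, here is a toy sketch (not any specific benchmark's format) of the two prompt shapes: QA-style, where a single passage matters, versus agentic, where every chunk may carry a constraint the model has to track.

```python
# Toy illustration of the two evaluation shapes, assuming a list of text chunks.

def needle_in_haystack_prompt(chunks: list[str], question: str) -> str:
    """QA-style benchmark: only ONE chunk matters, the rest are filler."""
    haystack = "\n\n".join(chunks)
    return f"{haystack}\n\nAnswer using the single relevant passage: {question}"

def agentic_prompt(chunks: list[str], task: str) -> str:
    """Agentic task: every chunk may carry a file, contract, or constraint to respect."""
    haystack = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks))
    return (f"{haystack}\n\nComplete the task below. Every numbered item above may "
            f"contain a constraint you must honor:\n{task}")
```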

I'm not sure those findings are concerning; to me they seem natural for instruct models.
They are finetuned to respond to the input with a single answer.
The most important part is the start, which frames the question; then the context usually grows more complex as you keep framing the question and introducing its variables.
Toward the end sits the most important content, and the actual questions to answer are usually written there.
So, much like a human mind, the primary focus is on the start and the late parts of the context when handling typical finetuning questions.


Do you know of any work that has studied how agents use context?
