Context Is Finite. Who Maintains It?
Georgios Chinis
The performance of a coding agent depends on two things: the capability of the underlying model, and the context it receives. Improving the model is out of our hands; that is the domain of AI research organizations. Context is the lever we actually control.
And today, we treat that lever as ours alone. We write the instructions, we maintain the CLAUDE.md, and we decide what the agent needs to know. The agent consumes; we author.
That assumption is so common it rarely gets examined. But it is worth asking: does it have to be true? What changes if the agent is allowed to update its own context?
At that point, context is no longer only something defined at the start of a session. It becomes something that can change as work progresses.
In this post, I explore that question and describe a small experiment I ran to test it.
Causal graph of coding agent performance.
Context and Its Limits
Context is a finite resource.
Every token spent on instructions is a token not spent on the task itself.
That makes bad context actively harmful because:
- outdated information misleads
- irrelevant detail distracts
- excess structure crowds out reasoning
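To make the cost concrete, a common rule of thumb is that English prose averages roughly four characters per token. A quick back-of-the-envelope estimate of how much of the window a prompt file consumes (the ratio, function names, and window size are illustrative assumptions, not figures from this post):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: English prose averages ~4 characters per token."""
    return max(1, len(text) // 4)


def context_share(prompt: str, window_tokens: int = 200_000) -> float:
    """Fraction of the context window a prompt file would consume."""
    return estimate_tokens(prompt) / window_tokens


prompt = "You are a self-organizing agent with finite memory.\n" * 40
print(f"{estimate_tokens(prompt)} tokens, {context_share(prompt):.3%} of the window")
```

Even a crude measure like this makes the trade-off visible: every line added to the prompt has a quantifiable cost against the window.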
Of the inputs that shape context, the prompt is the most accessible lever. Files like CLAUDE.md are loaded into the context window by the agent at the start of every session. They are persistent, editable, and version-controlled. Updating the prompt is the most direct way a practitioner can influence what the agent knows before it begins.
Developer-authored context has a structural weakness: it is static. It accurately describes the codebase at the moment it was written, but it does not update as the codebase evolves. Over time, the agent's instructions go stale, and the developer must continuously intervene to keep them current. This is not a discipline problem that more care can solve. It is a property of who holds the pen.
What the Brain Tells Us
The human brain offers a useful point of reference. It is a biologically constrained system: storage capacity is finite, and to continue learning, it must also forget. Crucially, this process requires no external coordinator. The brain self-organizes. The question worth asking is whether an agent could be designed to do something similar with its own context.
Context as an Agent Responsibility
Most coding agents today are passive consumers of context. They receive instructions and act on them. The proposal here is to change that relationship: to instruct the agent that managing its context over time is itself part of its job.
Claude Code includes a feature called auto-memory, which already moves in this direction and is worth acknowledging. However, auto-memory is user-scoped: it accumulates information about a developer’s working patterns and travels with that developer across sessions and repositories. It is not tied to the repository itself. The knowledge lives with the user, not with the code. A new developer who clones the project inherits none of it.
The goal I am pursuing is different. The prompt should be a codebase artifact: committed alongside the code, reviewed in pull requests, and inherited by anyone who works with the repository. The agent contributes to it through its work, and developers can review and refine what it produces. This represents a shift from a model where the developer alone authors the agent’s prompt, to one where the agent and developer co-own it.
The self-organizing agent loop: the agent updates its own prompt as it works.
The Experiment
The question is simple: if you give an agent responsibility for its own context, what actually happens?
Each task follows the same loop:
- Execute task
- Produce code
- Generate follow-up tasks
- Update its own prompt
To explore this idea in practice, I set up a small repository with the following structure.
A minimal CLAUDE.md. Rather than detailed instructions about architecture or conventions, the file contains a single directive: as you complete tasks, record what you have learned. Decide what is worth retaining, and update your own prompt accordingly. You may also create new tasks for yourself if you observe something that warrants further work.
A task box. A directory containing plain text files, one per task, each describing a discrete unit of work. This is the mechanism through which work is assigned to the agent.
The initial content of CLAUDE.md is as follows:
You are a self-organizing agent with finite memory.
In order to be effective you need to learn from past (remember) and
also maintain your overall memory within limit (forget).
In the end of every task:
- update this file (CLAUDE.md) with the knowledge you want to maintain or forget.
- add tasks in the `task-box` for future improvements.
An application directory. Where the code under development lives.
The CLAUDE.md is intentionally sparse. Architectural patterns, naming conventions, and implementation decisions are not described in advance. The expectation is that the agent will discover and record them through the process of doing the work.
Repository structure for the experiment.
Task 1: Set Up a Hello World Application
The first task provided to the coding agent is in ./task-box/1-skeleton-app.md:
# Skeleton Application
Create a hello world web app in python using good software development practices.
Ask the user clarifying question.
Starting a session with `claude "$(cat ./task-box/1-skeleton-app.md)"`.
The following interaction takes place:
⏺ Before I start, a few clarifying questions:
1. Framework — any preference (Flask, FastAPI, Django)? Or should I pick the most minimal one?
2. Scope of "good practices" — are you thinking: project structure, virtual env setup, linting config, tests, or all of
the above?
3. "Hello world" — just a single route returning text, or something with a basic HTML page?
❯ 1 Flask, 2 all of the above, 3 basic HTML page
The resulting application structure:
⏺ Done. Here's what was built in application/:
application/
├── app/
│ ├── __init__.py # app factory (create_app)
│ ├── routes.py # Blueprint with / route
│ └── templates/
│ └── index.html # centered "Hello, World!" page
├── tests/
│ ├── __init__.py
│ └── test_routes.py # 2 pytest tests (status 200, body content)
├── .flake8 # max line length 88, excludes .venv
├── requirements.txt # Flask>=3.0
├── requirements-dev.txt # + pytest, flake8
└── run.py # entry point (debug=True)
To run it:
cd application
source .venv/bin/activate
python run.py
The agent also generated follow-up tasks for itself:
Wrote 9 lines to ../task-box/2-app-improvements.md
1 # Application Improvements
2
3 Potential next steps for the skeleton Flask app in `application/`:
4
5 - Add environment-based config (development/production) using a `config.py`
6 - Add a `.env` file with `python-dotenv` for secret management
7 - Add CI config (GitHub Actions) to run tests and linting on push
8 - Add `black` for auto-formatting alongside `flake8`
9 - Add coverage reporting to pytest (`pytest-cov`)
The resulting update to CLAUDE.md:
You are a self-organizing agent with finite memory.
In order to be effective you need to learn from past (remember) and
also maintain your overall memory within limit (forget).
In the end of every task:
- update this file (CLAUDE.md) with the knowledge you want to maintain or forget.
- add tasks in the `task-box` for future improvements.
## Project Knowledge
### Application (`application/`)
- Flask web app using the app factory pattern (`create_app()` in `app/__init__.py`)
- Routes defined in a Blueprint (`app/routes.py`)
- HTML templates in `app/templates/`
- Virtual env at `application/.venv`; activate with `source .venv/bin/activate`
- Run: `python run.py` (debug mode)
- Test: `.venv/bin/pytest tests/ -v`
- Lint: `.venv/bin/flake8 app/ run.py`
- Dev deps in `requirements-dev.txt`; prod deps in `requirements.txt`
The agent also moved the completed task into a `task-done` directory to record that it had been processed.
The full state of the repository at this point is available here.
The prompt update and the self-generated follow-up task are both well-targeted.
Task 2: Execute Application Improvements
Before proceeding with the next functional task, I ran the improvement task the agent had identified for itself. I extended it to ensure the agent would ask clarifying questions:
# Application Improvements
Potential next steps for the skeleton Flask app in `application/`:
- Add environment-based config (development/production) using a `config.py`
- Add a `.env` file with `python-dotenv` for secret management
- Add CI config (GitHub Actions) to run tests and linting on push
- Add `black` for auto-formatting alongside `flake8`
- Add coverage reporting to pytest (`pytest-cov`)
Ask the user clarifying question.
Starting a session with `claude "$(cat ./task-box/2-app-improvements.md)"`.
The agent implemented the improvements and moved the task to `task-done`. The full change is available here.
The corresponding update to CLAUDE.md:
You are a self-organizing agent with finite memory.
In order to be effective you need to learn from past (remember) and
also maintain your overall memory within limit (forget).
In the end of every task:
- update this file (CLAUDE.md) with the knowledge you want to maintain or forget.
- add tasks in the `task-box` for future improvements.
## Project Knowledge
### Application (`application/`)
- Flask web app using the app factory pattern
(`create_app(config_name)` in `app/__init__.py`)
- Routes defined in a Blueprint (`app/routes.py`)
- HTML templates in `app/templates/`
- Config loaded from `config.py`; env selection via `FLASK_ENV`
(development/production/testing); defaults to development
- `config.py` calls `load_dotenv()` at import time,
copy `.env.example` to `.env` for local secrets
- Virtual env at `application/.venv`;
activate with `source .venv/bin/activate`
- Run: `python run.py` (port 8000; port 5000 conflicts with
macOS AirPlay Receiver; debug mode comes from config)
- Test: `.venv/bin/pytest tests/ -v --cov=app --cov-report=term-missing`
- Lint: `.venv/bin/flake8 app/ run.py config.py`
- Format: `.venv/bin/black app/ run.py config.py`
- Dev deps in `requirements-dev.txt`; prod deps in `requirements.txt`
- CI: GitHub Actions at `.github/workflows/ci.yml`,
runs pytest+cov, flake8, black --check on push/PR
Task 3: Dynamic Content
The third task is added to the task box as 3-dynamic-content.md:
# Dynamic Content
An administrator can log in and update the displayed message.
Ask the user clarifying question.
Starting a session with `claude "$(cat ./task-box/3-dynamic-content.md)"`.
The agent asked a few clarifying questions, which I answered:
⏺ A few clarifying questions before I build this out:
1. Persistence — Should the updated message survive a server restart? (i.e., stored in a database/file, or is in-memory OK for now?)
2. Admin credentials — Should there be a single hardcoded admin account (username/password in .env), or do you want a user table with proper user management?
3. Scope of "message" — Is this just the Hello, World! heading on the homepage, or a more general "site message" concept (e.g., a banner, announcement)?
4. Session handling — Flask-Login for full session management, or a simpler approach (e.g., a password-protected form that issues a session cookie)?
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯1. stored in file, 2. hardcoded username and password, 3. the hello world message, 4. full session managment
The full change is available here.
The corresponding update to CLAUDE.md:
You are a self-organizing agent with finite memory.
In order to be effective you need to learn from past (remember) and
also maintain your overall memory within limit (forget).
In the end of every task:
- update this file (CLAUDE.md) with the knowledge you want to maintain or forget.
- add tasks in the `task-box` for future improvements.
## Project Knowledge
### Application (`application/`)
- Flask web app using the app factory pattern
(`create_app(config_name)` in `app/__init__.py`)
- Routes defined in a Blueprint (`app/routes.py`)
- HTML templates in `app/templates/`;
base layout in `base.html`, extended by all pages
- Config loaded from `config.py`; env selection via `FLASK_ENV`
(development/production/testing); defaults to development
- `config.py` calls `load_dotenv()` at import time,
copy `.env.example` to `.env` for local secrets
- Virtual env at `application/.venv`;
activate with `source .venv/bin/activate`
- Run: `python run.py` (port 8000; port 5000 conflicts with
macOS AirPlay Receiver; debug mode comes from config)
- Test: `.venv/bin/pytest tests/ -v --cov=app --cov-report=term-missing`
- Lint: `.venv/bin/flake8 app/ run.py config.py`
- Format: `.venv/bin/black app/ run.py config.py`
- Dev deps in `requirements-dev.txt`; prod deps in `requirements.txt`
- CI: GitHub Actions at `.github/workflows/ci.yml`,
runs pytest+cov, flake8, black --check on push/PR
### Auth & Dynamic Content
- Flask-Login handles sessions; Flask-WTF provides CSRF on all forms
- Single hardcoded admin: credentials in `.env`
as `ADMIN_USERNAME` / `ADMIN_PASSWORD`
- `app/auth.py`: `AdminUser(UserMixin)`, no database, id is always `"admin"`
- `app/forms.py`: `LoginForm`, `MessageForm`
- Routes: `GET/POST /login`, `GET /logout`, `GET/POST /admin` (login_required)
- Homepage message persisted to `MESSAGE_FILE`
(config default: `application/data/message.txt`);
falls back to `"Hello, World!"` if file absent
- `TestingConfig` sets `WTF_CSRF_ENABLED = False`;
test fixture overrides `MESSAGE_FILE` with `tmp_path`
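The persistence scheme the agent recorded, a message file with a hard-coded fallback, reduces to a few lines. A plausible sketch of that behaviour (the agent's actual implementation is not reproduced in this post):

```python
from pathlib import Path

DEFAULT_MESSAGE = "Hello, World!"


def load_message(message_file: str) -> str:
    """Return the stored homepage message, or the default if the file is absent."""
    path = Path(message_file)
    if path.exists():
        return path.read_text().strip() or DEFAULT_MESSAGE
    return DEFAULT_MESSAGE


def save_message(message_file: str, message: str) -> None:
    """Persist the admin-supplied message so it survives a server restart."""
    path = Path(message_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(message)
```

The notable design choice the agent made is the graceful fallback: a missing file is not an error, it simply yields the original greeting.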
At this point, the agent is no longer just executing tasks.
It is modifying the system that shapes its future behaviour.
Results
The experiment probed two behaviours: remembering and forgetting.
The accumulation behaviour worked reliably. After each task, the agent updated its own prompt, recording what it had built, the patterns it had followed, and the decisions it had made. Without being instructed on what to capture, it produced prompt content that would meaningfully inform subsequent tasks.
The forgetting behaviour did not emerge. Across three tasks, the agent accumulated prompt knowledge but did not independently decide to prune any of it. Whether this is because the context remained small, or because selective forgetting only emerges under pressure when the available space becomes constrained, remains an open question.
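One way to test whether forgetting emerges under pressure would be to make the budget explicit: check the prompt's size after each task and, when it exceeds a limit, feed a pruning task back through the experiment's own task box. A hypothetical guard (the budget, filename, and task wording are my assumptions, not part of the experiment):

```python
from pathlib import Path


def enforce_budget(claude_md: str, task_box: str, max_chars: int = 8000) -> bool:
    """If CLAUDE.md exceeds its budget, drop a pruning task into the task box.

    Returns True when a pruning task was created.
    """
    text = Path(claude_md).read_text()
    if len(text) <= max_chars:
        return False
    task = Path(task_box) / "0-prune-memory.md"
    task.write_text(
        "# Prune Memory\n\n"
        f"CLAUDE.md is {len(text)} characters; the budget is {max_chars}.\n"
        "Remove outdated or low-value knowledge until it fits.\n"
    )
    return True
```

Because the pruning request arrives as an ordinary task, the agent would decide for itself what to discard, which is exactly the behaviour the experiment failed to observe spontaneously.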
Conclusion
The results suggest that agents can accumulate useful prompt knowledge through task execution without the developer specifying what to record. The prompt, in this case, begins to reflect the work that has been done.
The complementary behaviour did not emerge. The agent did not choose to forget: across the tasks, it accumulated context but never reduced it.
This matters because context is a finite resource. Any system operating within a context window must remain within its limits. Deciding what to retain and what to discard is therefore unavoidable.
Today, that responsibility sits with the developer. Introducing agent-written context changes how knowledge is added, but not how it is removed.
Whether agents can also take on that second part remains an open question.
That is the next step to explore.
This post was proofread with the help of Claude Code, the CLI tool.