From software engineering to context engineering

Why the developer craft is shifting from writing code to managing the context AI works in — and four concrete steps a team can take this week.

I still program. I just spend most of my time not dealing with code

For the last couple of years I've been doing something that looks like software development from the outside but is a different discipline on the inside. I still read code, I still understand the architecture. But I don't spend most of the day writing functions — I spend it managing the context AI works in. I call that the shift from software engineering to context engineering.

It's not a slogan. It's a change in where the work sits. The value used to be in how fast and cleanly I could write a solution. Today it's increasingly in how well I can hand a machine the context of our domain, our conventions, and our boundaries — and how I measure it.

And right up front, so there's no doubt: AI is a tool, not a replacement. I strengthen the team, I don't replace it. This piece isn't about agents replacing developers. It's about the craft changing.

What context engineering is in practice

Context engineering is managing what AI has at its disposal before it starts working. Anthropic describes it as "the strategy for curating the optimal set of tokens during inference." In practice, for a team it means five things:

Instructions — versioned files in the repo (copilot-instructions.md, AGENTS.md, CLAUDE.md) that describe conventions, the stack, and what's off-limits. We review them like code.
Skills — repeatable procedures (writing a PR description, generating architecture docs) as portable SKILL.md files, not as a prompt I keep in my head.
MCP servers — a bridge between the agent and your systems: a spec in Confluence, a ticket in Jira, a database schema — just-in-time, not hard-coded in.
Data / grounding — curated context (Spaces, retrieval) instead of "here's the whole monorepo, figure it out."
Measurement — without numbers it's just faith. DORA metrics, the share of merged AI suggestions, time to review. What I don't measure I can neither defend nor improve.

The key term is context rot: the fuller the window, the worse the recall. More context ≠ better context. The discipline is in curating, not in cramming everything in.

Why it's more than prompt tweaking

The most common frustration with AI I hear isn't "AI is dumb." It's "almost right, but not quite" — it generates a component with the wrong state library, the wrong import, naming that doesn't fit. And then I spend more time fixing almost-right code than I would have writing it myself.

People typically respond by improving the prompt. They add "please use our conventions," "be precise." It helps marginally. Because the problem wasn't in what I said, it was in what I provided.

Prompt engineering = what I tell AI. Context engineering = what I give AI.

When I run the same prompt twice — once with no context, once with an instruction file describing our stack — I get two completely different outputs. I didn't change the prompt. I changed the context. And there's the evidence: you don't rescue an almost-right output with better wording. You rescue it with better grounding.

How it differs from "I just use Copilot / ChatGPT"

"We use Copilot" usually means: every developer chats with autocomplete on their own, with no shared context, no measurement, no guarantees. It works for snippets. It doesn't work for a team and it doesn't scale.

Context engineering is the difference between:

ad hoc — everyone prompts their own way, quality swings from person to person;
discipline — shared versioned instructions, skills in the repo, guardrails via CODEOWNERS and code review, MCP wired into your sources of truth, and metrics that show whether it's helping at all.

The other thing I enjoy: the investment is portable. Tools will get swapped in a year. But a well-written SKILL.md or instruction file survives the migration from one coding agent to another. Don't invest in the tool, invest in the context.

And don't forget the guardrails — the human owns the output. AI proposes, the developer approves. That's not just good practice, it's also the direction regulation like the EU AI Act is pushing: traceability and human accountability for decisions.

Four things a team can do this week

No big rollout. Small, zero-risk steps on clean code, not on legacy:

Create a copilot-instructions.md (or AGENTS.md). Ten lines: stack, conventions, what's off-limits. Commit it to the repo.

# Project conventions
- State management: Zustand (NOT Redux)
- Components: functional, TypeScript strict
- Tests: Vitest, mandatory for new business logic
- NEVER edit files in /generated

Pick one boring repeated task and turn it into a skill. PR descriptions or test generation. Low risk, you see the result immediately, and the team gets a feel for what a skill looks like.
Wire up one MCP server to your source of truth. Most often Confluence/Jira/the DB schema. The goal: the agent pulls the spec just-in-time instead of hallucinating.
Measure one number before and after. Say, the share of AI suggestions that pass review without a big rewrite. Without a baseline you don't know whether you're helping yourself or just playing around.

Conclusion

Context engineering isn't magic and it isn't a trick for a better prompt. It's a discipline: curating context, versioning instructions, wiring up your data, and measuring honestly. Boring, repeatable, reviewable.

Start with a small step this week — one instruction file, one skill. A month from now you don't want a "consultant's setup." You want your own kit of instructions and skills that you review yourselves, like code. That's the shift.