Human governance for living specs

A professional view on why traceability, conflict resolution, and human control matter when specs evolve with AI.

Software engineering has always been a discipline of change. We moved from static blueprints to iterative development, from manual merges to automated CI/CD, from centralized control to distributed teams. Each shift required new governance mechanisms: version control, code review, deployment pipelines. The arrival of AI coding agents introduces another layer of change - and with it, a new governance challenge. When humans and agents both edit specifications, who changed what? How do we undo mistakes? How do we resolve conflicts without losing work? And how do we keep the human in control?

Gartner has observed that AI will not replace software engineers; it will transform their roles and may actually require more skilled engineers to manage AI systems effectively. That management is precisely what we mean by governance: version history, conflict resolution, traceability, and the ability to restore earlier versions. Without it, living specs become chaotic. With it, they remain manageable and auditable.

Specs that evolve

For decades, specifications were treated as relatively static documents. You wrote them, you reviewed them, and you locked them down before implementation. Agile loosened that model - specs could evolve - but humans were still the primary authors. With AI agents, specs evolve at machine speed. An agent can update dozens of files in minutes. Multiple agents can work in parallel. Humans and agents can edit the same document simultaneously. The traditional model - sequential human authorship with occasional merges - breaks down.

Andreessen Horowitz, in their analysis of emerging developer patterns, notes that "we may begin to layer in richer metadata, such as which agent or model made a change, which sections are protected, and where human oversight is required." The shift is from who wrote the line to who or what changed it, and with what intent. Version control must evolve to track not just diffs, but provenance: human vs agent vs system.

Academic research has begun to quantify the challenge. The CONGRA benchmark (2024) evaluates LLMs on automatic merge conflict resolution across nearly 45,000 real-world conflicts from projects including TensorFlow and the Linux kernel. Results show that AI-generated resolutions are accepted at higher rates in practice than automatic metrics predict - suggesting a promising role for human-in-the-loop workflows where developers review and approve AI suggestions rather than resolving from scratch. But the key phrase is human-in-the-loop. Human oversight remains essential to verify semantic correctness and project-specific requirements.

Expected changes and innovations

We can anticipate several innovations:

Provenance-aware versioning. Every change will carry metadata: author type (human, agent, system), model identifier, and optional intent. Block-level diffs will show not just what changed, but who or what changed it. This enables audit trails and selective rollbacks - e.g., "revert all agent changes from the last hour."
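
To make the selective-rollback idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any particular tool: the `Change` record, its field names, and the `revert_agent_changes` helper are hypothetical, and the spec is modeled as a plain dictionary mapping block IDs to text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One edit to one block, with provenance metadata (hypothetical schema)."""
    block_id: str                 # which block of the spec was edited
    before: str                   # block text before the edit
    after: str                    # block text after the edit
    author_type: str              # "human", "agent", or "system"
    at: datetime                  # when the edit was made
    model: Optional[str] = None   # model identifier, for agent edits
    intent: Optional[str] = None  # optional free-text intent


def revert_agent_changes(blocks, log, since):
    """Undo agent edits made at or after `since`, newest first.

    An edit is undone only if the block still holds the text that edit
    produced. Otherwise someone changed the block later, and the edit is
    returned in `skipped` instead of being overwritten.
    """
    result = dict(blocks)
    skipped = []
    for change in sorted(log, key=lambda c: c.at, reverse=True):
        if change.author_type != "agent" or change.at < since:
            continue
        if result.get(change.block_id) == change.after:
            result[change.block_id] = change.before
        else:
            skipped.append(change)
    return result, skipped


# Example: "revert all agent changes from the last hour".
now = datetime(2025, 1, 1, 12, 0)
log = [
    Change("auth", "Tokens expire after 24h.", "Tokens expire after 1h.",
           "agent", now - timedelta(minutes=30), model="example-model"),
    Change("limits", "100 req/min.", "200 req/min.",
           "human", now - timedelta(minutes=10)),
]
blocks = {"auth": "Tokens expire after 1h.", "limits": "200 req/min."}
restored, skipped = revert_agent_changes(blocks, log, since=now - timedelta(hours=1))
```

In this sketch the human edit to `limits` is left alone, and an agent edit that a human has since modified would land in `skipped` for manual review rather than being reverted blindly.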

Conflict resolution without overwrites. When humans and agents both edit, the system should create branches, not overwrites. Resolution options: keep current (canonical), use branch (agent's version), or merge manually. The human chooses. Research from Microsoft (DeepMerge) and others has shown that neural approaches can assist in merge resolution, but human approval is critical.
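
The branch-instead-of-overwrite rule and the three resolution options can be sketched in a few lines of Python. The names `save_edit`, `resolve`, and `Resolution` are assumptions made for illustration; a real system would also track versions and authors rather than bare strings.

```python
from enum import Enum
from typing import Optional


class Resolution(Enum):
    KEEP_CURRENT = "keep_current"      # keep the canonical version
    USE_BRANCH = "use_branch"          # accept the branch (e.g. the agent's version)
    MERGE_MANUALLY = "merge_manually"  # the human supplies the merged text


def save_edit(canonical: str, base: str, proposed: str):
    """Return (new_canonical, branch).

    `base` is the text the editor started from. If the canonical text has
    moved on since then, the proposal is parked as a branch, not applied.
    """
    if canonical == base:
        return proposed, None   # no concurrent edit: apply directly
    return canonical, proposed  # concurrent edit: keep canonical, create a branch


def resolve(canonical: str, branch: str, choice: Resolution,
            merged: Optional[str] = None) -> str:
    """Apply the human's choice and return the new canonical text."""
    if choice is Resolution.KEEP_CURRENT:
        return canonical
    if choice is Resolution.USE_BRANCH:
        return branch
    if merged is None:
        raise ValueError("a manual merge requires the merged text")
    return merged


# Example: a human changed the block after the agent read it.
canonical = "Tokens expire after 24h."  # current canonical text (human edit)
base = "Tokens expire after 12h."       # what the agent originally read
proposed = "Tokens expire after 1h."    # the agent's edit

canonical, branch = save_edit(canonical, base, proposed)
final = resolve(canonical, branch, Resolution.USE_BRANCH)
```

The point of the design is that `save_edit` never discards either side; the only way text becomes canonical after a conflict is through an explicit `resolve` call made by a person.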

Async agent work with clear handoff. a16z describes a shift toward "asynchronous workflows where agents operate in the background, pursue parallel threads of work, and report back when they've made progress." In that model, branching and delegating to agents becomes "the new Git branch - not a static fork of code, but a dynamic thread of intent, running asynchronously until it's ready to land." Governance means the human decides when to land, what to keep, and what to discard.

Protected sections and human oversight. Some sections of a spec may be marked as requiring human approval before an agent can modify them. Governance becomes configurable: strict for critical contracts, looser for draft sections.
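
One way to picture configurable governance is a small policy table consulted before any edit is applied. The sketch below is hypothetical throughout: the section names, the "strict" and "open" levels, and the `may_apply` function are assumptions, and it treats every non-human author (agent or system) the same way.

```python
from typing import Optional

# Hypothetical policy: section name -> "strict" (approval required) or "open".
POLICY = {
    "api-contract": "strict",
    "draft-notes": "open",
}


def may_apply(section: str, author_type: str,
              approved_by: Optional[str] = None,
              policy: dict = POLICY) -> bool:
    """Decide whether an edit to `section` may be applied right now."""
    if author_type == "human":
        return True                         # humans retain final authority
    level = policy.get(section, "strict")   # unknown sections default to strict
    if level == "open":
        return True                         # draft sections are agent-editable
    return approved_by is not None          # strict: needs a named human approver


# Example checks.
agent_on_draft = may_apply("draft-notes", "agent")
agent_on_contract = may_apply("api-contract", "agent")
approved_agent_edit = may_apply("api-contract", "agent", approved_by="alice")
```

Defaulting unknown sections to strict is a deliberate choice in this sketch: a section nobody has classified yet is treated as protected until a human says otherwise.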

Governance in the tool layer. As agents increasingly access specs via MCP (Model Context Protocol) and other tool integrations, governance must live where the specs live. An agent that reads specs from a canonical source and writes changes back should encounter the same version history, conflict handling, and provenance tracking as a human. Otherwise, the governance layer becomes a bottleneck or is bypassed entirely.

What the research shows

The CONGRA benchmark (2024) and related work offer concrete data. Across its nearly 45,000 real-world merge conflicts, some findings are counterintuitive: models with larger context windows do not always outperform models with shorter ones; semantic understanding of comments and variable names matters. General-purpose LLMs sometimes outperform specialized code models. And critically: user studies with developers suggest that AI-generated resolutions are accepted at higher rates in practice than automatic metrics predict. The implication is clear: AI can assist conflict resolution effectively, but human review remains essential. The ideal workflow is AI-assisted resolution with human approval - governance in action.

Microsoft's DeepMerge (2022) reported accuracy between 37% and 78%, with the higher figure on small conflicts of only a few lines. MergeBERT reported 63–68% using transformer-based three-way differencing. These numbers underscore that fully autonomous resolution is not yet reliable. Human oversight is not a nice-to-have; it is a requirement for correctness and project-specific context.

Pain points today

The pain is already real. When an agent overwrites a human edit, there is no easy undo. When two agents edit the same file, the result can be nonsensical. When a spec evolves over weeks with mixed human and agent contributions, nobody can confidently say what the "current" state is or how it got there. Version history, when it exists, is line-oriented - optimized for code, not for specifications with block-level structure. Conflict resolution is ad hoc: manual three-way merges, lost work, and frustration.

As agent adoption grows, the pain intensifies. More agents mean more concurrent edits. More automation means faster divergence if there is no governance. Teams that lack traceability and restore capability will find themselves in a dangerous position: living specs that have "grown" without accountability.

Consider a realistic scenario: an agent refactors an API spec, merging several sections and rephrasing requirements, and a senior engineer approves the result. A week later, a downstream consumer reports a breaking change. The team needs to restore the previous version - but the version history is sparse, and the diff between "before" and "after" is buried in a thousand-line markdown file. Without block-level history and one-action restore, recovering takes hours. With proper governance, it would be a single click.

Solutions

We need systems that:

Provide full version history. Every edit creates a new version. Users can browse history, compare any two versions, and restore any previous state. No lost work, even after agent edits.
Offer block-level diff. Specifications are not code. Paragraph-level or block-level comparison is more meaningful than line-by-line. Diffs should show added, removed, and changed blocks - and who changed them.
Support conflict resolution with clear options. When conflicts occur, the human chooses: keep current (canonical), use branch (e.g., agent's version), or merge on desktop. No silent overwrites.
Track provenance. Each version carries metadata: human, agent, or system. Author labels ("You," "Agent," "System") make it clear who changed what. This enables accountability and targeted rollbacks.
Enable one-action restore. Restoring a previous version should be a single action - no complex merge, no manual copy-paste. The human stays in control.
Keep specs living while remaining auditable. Governance does not mean locking specs down. It means that evolution is traceable, reversible, and under human oversight. Specs stay living, manageable, and auditable.
Scale with agent adoption. As more agents join the workflow, governance must scale. Batch restores (e.g., "revert all agent changes in this session"), bulk provenance filters ("show only human edits"), and clear delegation ("this section is agent-editable; this section requires approval") become essential. Governance that works for one human and one agent must work for ten humans and five agents.
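
Three of the requirements above (full version history, block-level diff, and one-action restore) can be sketched together in Python. This is a toy model under stated assumptions, not a reference implementation: `SpecHistory`, `block_diff`, and `split_blocks` are hypothetical names, a "block" is simply a paragraph separated by a blank line, and the diff relies on `difflib.SequenceMatcher` from the standard library.

```python
import difflib
from dataclasses import dataclass, field


def split_blocks(text):
    """Split a spec into blocks, assuming blank lines separate them."""
    return [b.strip() for b in text.split("\n\n") if b.strip()]


def block_diff(old, new):
    """Compare two versions block by block instead of line by line."""
    a, b = split_blocks(old), split_blocks(new)
    labels = {"replace": "changed", "delete": "removed", "insert": "added"}
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (labels[tag], a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


@dataclass
class SpecHistory:
    """Append-only history: every edit, including a restore, is a new version."""
    versions: list = field(default_factory=list)  # (author_type, text) pairs

    def commit(self, text, author_type):
        self.versions.append((author_type, text))
        return len(self.versions) - 1  # index of the new version

    def restore(self, index, author_type="human"):
        # Restoring copies an old version forward; nothing is deleted.
        _, text = self.versions[index]
        return self.commit(text, author_type)

    @property
    def current(self):
        return self.versions[-1][1]


# Example: a human writes the spec, an agent edits it, the human restores.
original = "Intro.\n\nTokens expire after 24h.\n\nRate limit: 100 req/min."
edited = ("Intro.\n\nTokens expire after 1h.\n\nRate limit: 100 req/min."
          "\n\nErrors return a JSON body.")

history = SpecHistory()
v0 = history.commit(original, "human")
v1 = history.commit(edited, "agent")
changes = block_diff(original, edited)  # one "changed" block, one "added" block
v2 = history.restore(v0)                # single action; the agent's version is kept
```

Because `restore` appends rather than truncates, the agent's version remains in the history and can itself be compared or restored later, which is what keeps the record auditable.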

Organizational and economic implications

Governance is not merely a technical concern. It has organizational and economic dimensions. When specs evolve without traceability, accountability breaks down. Who approved that change? Was it the product owner or an agent? In regulated industries, audit trails are mandatory. In fast-moving startups, they prevent "how did we get here?" confusion. Provenance tracking - human vs agent vs system - is not just a feature; it is a prerequisite for responsible deployment of AI in critical workflows.

Gartner's observation that AI may require more skilled engineers is instructive. The engineers of the future will spend less time writing boilerplate and more time orchestrating agents, reviewing outputs, and maintaining governance. The teams that build governance into their spec workflow now will have a structural advantage: they can scale agent usage without losing control.

Regulatory frameworks are also evolving. As AI is used in critical systems - healthcare, finance, infrastructure - provenance and auditability become compliance requirements, not merely best practices. Governance that tracks human vs agent edits, provides full history, and supports one-action restore will be table stakes for teams operating in regulated domains.

Why the human stays in control

Marc Andreessen has argued that "agents are eating software" - that we are moving from static applications to dynamic, agent-driven systems. But agents taking over more of the software does not mean humans abdicating control. The human remains the orchestrator. The human defines intent, approves changes, and retains final authority. AI assists; it does not replace judgment.

Replit CEO Amjad Masad has emphasized that coding agents transform development through a partnership model. AI handles routine implementation; humans focus on higher-level direction, creativity, and judgment. That partnership requires governance: clear handoffs, traceability, and the ability to correct course when the agent goes astray.

Academic work on AI-assisted conflict resolution reinforces this. User studies with developers on real-world conflicts suggest that AI-generated resolutions are most valuable when paired with human review. The ideal workflow is not fully autonomous resolution, but AI-assisted resolution with human approval. The human stays in control.

Looking ahead

The next few years will test whether the industry adopts governance early or reacts after painful incidents. Teams that treat specs as living, versioned, and provenance-tracked will scale agent usage with confidence. Teams that treat specs as flat files in a repo will hit limits: overwrites, lost work, and opaque evolution. The tools exist today to build governance into the spec workflow. The question is whether organizations will prioritize it before the pain becomes acute.

The stakes

The last thirty years have shown that governance mechanisms determine who thrives in paradigm shifts. Teams that adopted version control early could collaborate at scale. Teams that adopted code review could maintain quality. Teams that adopted CI/CD could ship faster without sacrificing stability. The shift to agentic development demands a new governance layer: version history, conflict resolution, traceability, and restore. The teams that implement it now will have living specs that evolve safely. The teams that delay will face chaos - specs that have grown without accountability, changes that cannot be undone, and a loss of control just when it matters most.

Specularis exists to provide that governance: version history and block-level diff, conflict resolution with clear options, provenance tracking (human vs agent vs system), and one-action restore. You stay in control while specs stay living, manageable, and auditable. The paradigm is shifting. Governance is how we ensure it shifts in our favor.

References

Andreessen Horowitz (Yoko Li), "Emerging Developer Patterns for the AI Era," May 2025. a16z.com
Gartner, "AI Will Not Replace Software Engineers (and May, in Fact, Require More)."
CONGRA: Benchmarking Automatic Conflict Resolution (2024), OpenReview/PromptLayer.
Microsoft Research, "DeepMerge: Learning to Merge Programs" (TSE 2022).
Marc Andreessen / a16z, "Software ate the world. Agents are eating software."
Replit CEO Amjad Masad on AI coding agents, autonomy, and the partnership model.