A Practical Framework for Safe Deployment of Autonomous AI Agents
Motivation
Autonomous coding agents can greatly accelerate productivity, but they also introduce operational, financial, and security risks.
The author decided to publish and circulate this paper because the topic is timely, given the proliferation of fully autonomous agents with unrestricted access to resources, which could and will result in disastrous consequences.
This paper presents an architecture for safer autonomy:
explicit Functional Requirements instead of vague prompts,
operating system–enforced containment,
staged promotion controls,
deterministic stop conditions,
and auditable execution artifacts.
The paper is written to help engineering leaders, security teams, and executives move from experimentation to controlled deployment without treating (often inadequate) model obedience as a security mechanism.
This is the author’s practical approach rather than a universal policy template.
The objective is practical adoption through OS-enforced authority, measurable progress, and reversible outcomes.
Overview
Improving productivity by at least one order of magnitude is not only attractive but also becoming a competitive necessity for businesses.
However, autonomous software agents that can read repositories, generate code, execute commands, and interact with chat, email, data, and Internet systems represent a qualitative shift in operational risk.
These systems are not productivity tools in the traditional sense. They are actors capable of modifying state, consuming financial resources, and introducing security vulnerabilities in milliseconds.
The central challenge is not whether agents increase developer velocity. They do. The challenge is whether their authority is bounded, observable, and reversible.
This paper presents a practical framework for deploying autonomous or semi-autonomous coding agents in a controlled environment. It is written for engineers designing systems, security teams responsible for governance, and corporate decision-makers accountable for operational risk.
The core thesis is simple: autonomous agents are best treated as high-privilege distributed systems. They benefit from containment, separation of duties, economic guardrails, and auditability by design.
This requirement becomes more critical as organizations push agent continuous runs from minutes to hours, days, or weeks. Without strict control loops, long-running autonomy can consume tens of thousands of dollars while delivering mediocre outcomes. The framework below treats long-running agent execution as an operations discipline with explicit ownership, measurable progress, and clear stop conditions.
Scope and Non-Goals
This framework targets coding agents with execution authority, including agents that can modify files, run commands, execute tests, and initiate delivery workflows.
It does not focus on read-only advisory models (chat) that cannot change the system state.
It assumes that baseline engineering infrastructure already exists, including version control, CI pipelines, and protected branches.
Organizations and individuals can adapt these controls to fit their own regulatory, technical, and operational context.
Reframing the Risk Model
Traditional developer tooling assumes a human-in-the-loop at all times. An IDE may suggest code, but a human compiles, runs, and deploys it.
Autonomous agents collapse these boundaries.
An agent that can:
modify source files
execute shell commands
access network resources
run tests
open pull requests
deploy artifacts
is functionally equivalent to a junior engineer with root access, creativity, and infinite stamina.
Risk, therefore, shifts from “prompting” with “incorrect suggestion” to:
silent corruption of data or code
lateral movement across filesystem boundaries
accidental exposure of credentials
runaway cloud API costs
infinite execution loops
dependency injection attacks
supply chain manipulation
system crashes
Security posture needs to evolve accordingly. It is useful to treat agents less as assistants and more as privileged processes.
Functional Requirements as Execution Contracts
Long-running agent runs should not start from a vague prompt.
Each run should bind to explicit Functional Requirements (FRs) that are testable and auditable (following S.M.A.R.T. best practices).
Each FR should define:
FR identifier and owner.
Objective and non-goals.
Allowed input and output boundaries.
Acceptance criteria tied to executable tests.
Security constraints and prohibited actions.
Cost ceiling and runtime ceiling.
Done criteria and rollback criteria.
If an objective cannot be expressed as an FR with executable acceptance criteria, it is likely not ready for autonomous execution.
Directory-Scoped Agent Threads
One practical pattern for safe scaling is to treat each repository directory as an independent agent thread with its own contract, tests, and policy boundary.
Agents (and human engineers) have limited context windows.
In this model, autonomy is partitioned by directory rather than by a whole repository-wide agent context.
Each directory-scoped thread should maintain:
Its own Functional Requirements document.
Its own unit test scope and acceptance thresholds.
Its own “WORK_PLAN.json” for declared intent and stop conditions.
Its own “config.yaml” for runtime limits, tool allowlists, and policy references.
Its own security review artifact is linked to execution history.
Operational boundary rules should be explicit:
A thread may read and write only within its directory scope and declared shared interfaces.
Cross-directory modification requires a formal handoff or integration gate.
An agentic thread should not mutate another thread’s policy artifacts without explicit promotion approval.
Shared dependency or infrastructure changes should be routed through a dedicated integration thread.
This pattern introduces practical benefits by converting autonomy from a single broad permission domain into many small, auditable execution domains.
The governance benefits are substantial:
Smaller blast radius for faulty behavior.
Cleaner ownership and accountability by subsystem.
More deterministic rollback and incident containment.
Per-thread budget tracking and performance measurement.
Safer parallel execution across independent workstreams.
This pattern also introduces complexity and should be adopted with discipline:
Fragmented policies can drift across directories.
Integration risk can move from the code level to the contract level.
Operational overhead increases if standards are inconsistent.
These risks can be controlled through shared policy templates, schema validation for config.yaml and WORK_PLAN.json, and periodic integration verification across thread boundaries.
Principles for Safe Agent Deployment
The following principles form a practical baseline.
Least Privilege by Construction
Do not rely only on prompt instructions to restrict behavior. Enforce boundaries at the operating system and runtime levels.
Concrete measures:
Run agents under dedicated OS users.
Use access-limiting disk mounts for repository sources.
Allow writes only to explicitly declared staging directories.
Remove undesirable network traffic access by default.
Provide whitelisted command wrappers instead of raw shell access.
If an agent needs to run tests, expose a single controlled binary, such as `run_tests.sh`. Avoid unrestricted shell access.
Separation of Duties
Avoid designing a single all-powerful agent.
Instead, fragment authority across specialized agents:
Planning agent: read-only access, writes append-only plan files.
Code generation agent: writes only to staging.
Security review agent: repository read-only, produces risk reports.
Test execution agent: allowed to execute only approved test directories.
Cost control service: enforces budget policy and can terminate runs.
Progress evaluator: evaluates no-progress runs from test outcomes.
Promotion gate: a deterministic program that validates policy compliance before merge.
Each agent should have a single narrow capability.
No single agent should be able to read, write, execute, and deploy unilaterally.
This mirrors established enterprise controls such as code review workflows and financial approval chains.
Deterministic Guardrails
Autonomous systems benefit from hard limits:
Maximum wall-clock runtime per job.
Maximum tool calls per job.
Maximum file modifications per run.
Maximum token or API budget per request and per month.
Explicit iteration caps in reasoning loops.
Explicit no-progress iteration caps based on test outcomes.
Maximum CPU, GPU, and memory usage to prevent system crashes.
Each run should declare these limits in a machine-readable policy before execution begins.
When limits are exceeded, the system should slow or stop and avoid circumventing constraints.
Stop behavior should be deterministic:
Stop all new tool calls.
Record an append-only stop event with a reason code.
Persist the execution manifest, logs, and diff snapshot.
Revert to the last safe checkpoint or quarantine the workspace.
Require explicit human re-authorization before any resume action.
Operating System–Enforced Sandbox Isolation
Prompt restrictions are (only) advisory controls, and they fail.
Autonomous execution requires enforceable containment at the operating system boundary, where permissions, process identity, and mount policy determine what an agent can and cannot do.
Why macOS Is Suitable for This Model
The author uses macOS as the reference platform in this section because it provides the strong security needed for autonomous-agent workloads.
The macOS provides practical primitives for this containment architecture:
A strong Unix permission model for deterministic access control.
First-class disk image support for mount-based writable isolation.
User-level isolation for dedicated execution identities.
Full-disk encryption support for protected data at rest.
Large unified memory (64-512 GB of RAM) and Metal (hardware) acceleration for local LLM workloads.
Mature BSD-derived process and sandboxing architecture.
Dedicated Execution Identity
Agent workloads should run under a separate, non-administrative macOS user account dedicated to automation.
This account should have no sudo privileges and no access to sensitive user directories, credential stores, or administrative control paths.
All agent subprocesses inherit this identity.
As a result, every child process inherits the same permission limits, producing deterministic containment across the execution tree.
Isolated Filesystem Boundary
A dedicated APFS disk image mounted as a sandbox volume provides a clean, writable boundary for autonomous runs.
A mounted volume is preferable to a normal directory because mount lifecycle, OS group ownership, and access controls are explicit and independently managed from the host filesystem hierarchy.
The sandbox volume should be the only writable location available to the agent runtime. Read access outside this boundary should be denied by OS, narrowly scoped, and policy-defined.
Detaching the volume serves as an immediate kill switch for agent writes and constrains further state mutation.
Group-Based Access Control
A dedicated operating system group should define which users and services are allowed to access the sandbox volume.
Access is enforced by Unix permissions and ACL policies, not by model instructions.
Without explicit operating-system reconfiguration, agent processes cannot extend their file-system scope beyond the mounted boundary.
Execution Surface Minimization
Autonomous runtimes should expose whitelisted command wrappers rather than unrestricted shell execution.
Execution rights should be narrowly scoped to predefined scripts and approved operational interfaces.
Privilege is granted through file system and process policies, not by prompt language. This reduces ambiguity and prevents instruction-level bypass attempts.
Optional Network and Resource Controls
Outbound network access should be explicitly permitted where required (e.g., for API calls) and denied by default elsewhere.
Additional containment layers should define CPU, memory, and runtime ceilings for agent workloads to limit runaway execution (system crashes) and protect host stability.
These controls complement filesystem isolation and provide bounded behavior under failure conditions.
Architectural Conclusion
Operating system enforcement transforms agent control from instruction-based guidance to policy-enforced boundaries.
Autonomous execution should be constrained first by operating system primitives, then by higher-level controls such as Functional Requirements, unit tests, work planning, and security review.
Economic Controls
Agentic systems introduce a new class of financial risk: algorithmic spending.
A poorly bound reasoning loop can consume thousands of dollars in API usage without producing meaningful output.
In practice:
All external API calls should flow through a single gateway.
Per-request cost estimation should occur before execution.
Monthly budget caps should be enforced programmatically.
Context size should be minimized by design.
Static content should be cached to avoid repeated transmission.
Real-time spend should be tracked during execution, not only at the end of the job.
Economic policy is most reliable when implemented in code rather than merely documented in guidelines.
Practical budget policy for multi-day runs:
Warning threshold: 70% of the allocated budget.
Escalation threshold at 90%, requiring explicit approval to continue.
Hard stop at 100% with automatic job termination.
Optional degrade-to-local LLM mode only if FR acceptance criteria still remain achievable.
Observability and Auditability
Autonomous behavior should be fully inspectable.
Every run should produce a structured execution manifest including:
Agent identity
FR identifiers
WORK_PLAN hash
Start and end timestamps
Files read
Files written
Diff summary
External calls made
Token usage
Cumulative cost versus budget
Per-iteration progress score
Exit condition
Stop reason code, if terminated
Logs should be append-only. Overwriting history should be avoided.
Security teams should be able to reconstruct any run without relying on agent self-description.
In addition, anomaly detection rules should be implemented:
Unexpected file type modifications.
Large diffs beyond defined thresholds.
New dependencies introduced.
Changes to security-critical directories.
Repeated iteration without state improvement.
Reversibility and Recovery
Agent actions should be reversible where feasible.
Modifications should occur in:
Version-controlled repositories, or
Disposable sandbox volumes.
Rollback should be achievable with a single command or mount discard. If recovery requires manual reconstruction, the system is usually not production-ready.
Periodic snapshotting of working directories provides an additional safety net.
WORK_PLAN.json as the Execution Contract
Before modifying code, agents should declare intent in a structured format using WORK_PLAN.json.
A WORK_PLAN.json document should include:
Unique plan identifier.
Referenced FR identifiers.
Declared target files.
Expected changes.
Risk classification.
Test coverage requirements.
Iteration and no-progress limits.
Budget limits and stop thresholds.
Git branch and promotion policy.
Security gate policy.
Mandatory stop conditions.
Subsequent diffs can be compared against declared intent. If actual modifications exceed the declared scope, execution is halted.
This creates a contract between intention and effect.
Development and Staging Git Policy
Long-running autonomous work is safer when isolated from production branches.
Recommended policy:
Agents can write only to ephemeral agent branches.
Direct commits to protected branches should be avoided.
Merging from the agent to development requires tests and security gates.
Merging from development to staging requires policy validation.
Promotion from staging to main requires explicit human approval or a pre-approved policy gate.
Every commit message should include plan_id and the iteration number for auditability.
Force-push is disallowed on protected branches.
Human-in-the-Loop at the Right Boundary
Full manual review of every command does not scale; that is the chat mode.
However, a single approval gate at the transition from staging to production is defensible.
Recommended pattern:
Agents operate in staging.
Security and test gates run automatically.
Promotion requires explicit human approval, or policy-based automated approval only for pre-scoped low-risk changes with zero critical findings and a full test pass.
This balances velocity and accountability.
Testing Beyond Functionality
Unit tests validate correctness. They do not validate containment.
Minimum unit-test requirements for autonomous execution:
The existing unit test suite should pass before any run starts.
Changed production files should have corresponding unit tests in the same run.
Coverage on changed files should meet the threshold declared in WORK_PLAN.json.
Global test coverage regression beyond declared tolerance triggers an automatic stop.
New FR acceptance criteria should map to executable test cases.
No-progress detection should be test-driven:
An iteration counts as progress only when it reduces failing tests, satisfies new acceptance tests, or improves declared quality metrics.
If no measurable progress is made for x consecutive iterations, the run should terminate automatically.
The x should be declared before the run starts in WORK_PLAN.json; it should not be changed mid-run without explicit approval.
Additional test categories are required:
Security tests
Path traversal attempts.
Unauthorized write attempts.
Network access attempts.
Secret information exfiltration simulation.
Economic tests
Simulated high-cost prompt loops.
Budget overflow scenarios.
Behavioral tests
Iteration cap enforcement.
Proper abort on missing success criteria.
Testing should intentionally attempt to break the system.
Security Review and Mandatory Stop Conditions
The security review should be independent of the code-generation role.
Required controls:
Static analysis and dependency vulnerability scanning on every run.
Secret scanning before any push or pull request creation.
Policy checks for unauthorized paths, network calls, and command usage.
Security review output stored as an append-only artifact linked to plan_id.
Recommended stop conditions for strict deployments:
Any critical or high-severity security violation.
Any confirmed secret information exposure.
Any unauthorized access to a filesystem or network.
Any budget limit breach.
Any no-progress limit breach.
On stop, the system should terminate execution, record the reason, and require human approval before restarting.
Standardized stop reason codes improve auditability:
`STOP_SECURITY_VIOLATION`
`STOP_SECRET_EXPOSURE`
`STOP_UNAUTHORIZED_ACCESS`
`STOP_BUDGET_EXCEEDED`
`STOP_NO_PROGRESS`
`STOP_RUNTIME_EXCEEDED`
The Role of the Software Engineer in an Agentic Development Landscape
In an agentic development landscape, software engineers remain the accountable technical authority for system intent, control design, and production impact.
Software engineers in this landscape are:
Idea generators
Architects
Technical requirements creators
Novel algorithm designers
Unit test designers (not coders)
Code functionality assessors
Idea Generator
Engineers define the problem space, scope, constraints, and non-goals.
Agents execute within defined intent; engineers are responsible for the clarity and correctness of that intent.
Poorly framed problems produce well-executed but misaligned outcomes.
Architect
Engineers design system boundaries, trust zones, and promotion gates.
They define how autonomy is partitioned, isolated, and governed.
Architecture is the primary mechanism for safe autonomy.
Technical Requirements Creator
Engineers translate objectives into explicit Functional Requirements with measurable acceptance criteria.
They define runtime ceilings, budget ceilings, security constraints, and mandatory stop conditions.
Agents should not operate on vague goals; engineers are accountable for formalizing execution contracts.
Novel Algorithm Designer
Engineers remain responsible for designing new algorithms, strategies, and system-level innovations.
Agents may implement known patterns efficiently, but conceptual and strategic breakthroughs remain (for the time being) a responsibility of human engineering.
Architectural trade-offs and long-term system evolution remain under engineering ownership.
Unit Test Designer (Not Coder)
Engineers design the tests that encode correctness and safety.
They define coverage expectations, regression safeguards, and metrics for no-progress detection.
Tests serve as control instruments for autonomous execution, not merely validation tools.
Engineers are responsible for the integrity and completeness of test design.
Code Functionality Assessor
Engineers evaluate whether the generated code satisfies intent beyond superficial test success.
They assess edge cases, performance characteristics, maintainability, architectural consistency, and security implications.
Final promotion decisions should remain grounded in human judgment and technical expertise.
Autonomous agents execute within constraints. Software engineers design the constraints. Responsibility for system behavior, safety, and production impact remains with human engineers.
Governance and Organizational Implications
Corporate leadership should recognize that deploying autonomous coding agents is a governance decision, not only a tooling decision.
Key considerations:
Who owns agent configuration?
Who defines privilege scopes?
Who monitors cost and behavior?
What is the incident response plan for agent-induced failures?
Are there audit requirements for regulated environments?
Agent infrastructure should be reviewed with the same rigor as CI/CD pipelines and production deployment systems.
Conclusion
Autonomous agents can materially improve developer productivity and system maintenance.
However, their deployment is most effective when engineered with explicit constraints.
The appropriate mental model is not “advanced autocomplete.”
It is “a distributed automated contributor with privileged access.”
Safe deployment requires:
Least privilege enforcement.
Separation of duties.
Deterministic guardrails.
Functional requirements as executable contracts.
Economic caps.
Full observability.
Reversible changes.
Structured WORK_PLAN.json declarations.
Test-driven progress gates.
Independent security review.
Enforced the git promotion policy.
A defined human approval boundary.
Autonomy is not inherently unsafe. Unbounded autonomy is.
Organizations that treat agent systems as first-class operational actors, subject to the same architectural discipline as production services, will capture their benefits while containing their risks.
Maturity Levels
A practical adoption path avoids false binary choices between “no agents” and “full autonomy.”
Level 0: Prompt-only experimentation.
Level 1: Functional Requirements-driven, staging-only agent writes.
Level 2: Disk-restricted, gated performance verification and budget controls.
Level 3: OS-policy, long-running autonomous programs.


The 70/90/100% budget gate structure is elegant - cleaner than the fuzzy version I've been running. The OS-enforced sandbox point is where I keep landing too. Self-imposed rules don't hold under pressure; the constraint needs to be architectural, not a promise. What I added on top: explicit 'blast radius' documentation before each agent run. It states what it can affect. Reviewability is the piece nobody builds first but ends up most valuable for overnight automation: https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1 That review layer compounds in value over time.