Case Study: The PM Agent Team
This is the story of one Compound Sprint, start to finish, with every artifact shown. We replaced a project coordinator role with an agent team — five AI agents handling the coordination work that had consumed twenty hours a week of human labor. It took about six weeks from Signal to Deliver, with a platform pivot in the middle that nearly derailed the whole thing. Every stage of the Compound Sprint is here with its artifacts and design decisions. If you’ve read the framework chapters and wondered what a full sprint looks like in practice, this is it.
The role belonged to a subcontractor we’ll call Rachel. She was hired in August 2024 for graphic design and content management, then transitioned into project coordination by September. The transition happened fast — within weeks she’d moved from creating visual assets to owning the operational backbone of how work moved through our project management system. By the time we decided to replace the role with agents, her responsibilities had expanded to cover project tracking, scorecard updates, meeting facilitation, social media scheduling, task creation from transcripts, video editing coordination, and QA work alongside our VP of Operations, Sofia.
The decision to replace the role wasn’t just structural. The work was overwhelmingly rule-based — the kind of work agents handle well. And Rachel’s performance had become a problem. She was slow, non-responsive, and increasingly free with her time without explanation. Sofia flagged it directly in one of our design sessions: “A lot of things are not done properly” and “it’s becoming a habit for her to be very free with her time and not give any explanations.” The constraint was the combination: rule-based work that agents can handle, performance issues with the person doing it, and $24K/year in subcontractor cost while we were falling behind on projects.
Signal: Naming the Constraint
The constraint showed up in our numbers before it showed up in any conversation. We were falling behind on projects, and that hit our billing: we couldn’t bill for work we hadn’t finished on time, so cash flow suffered.
But the constraint wasn’t just financial. Sofia had been watching Rachel’s output deteriorate. She put it plainly: “She has a high salary, and for somebody that is working 20 hours a week, that’s not fair.” She went further: “If I don’t or we don’t see any change in the next few weeks, I wouldn’t even keep her.”
There wasn’t a single “this is the constraint” moment. The signal accumulated. Revenue was falling short. Rachel’s performance was declining. And the more we looked at what the role actually entailed, the clearer it became that most of the work was information movement — taking data from one place, putting it in another, following rules that didn’t require judgment. Sofia and I had been having versions of this conversation for months before we formalized it.
Artifact: Constraint Statement
| Field | Detail |
|---|---|
| Constraint | A $24K/year subcontractor role is consuming budget on work that is 70-80% rule-based data operations (CRM updates, scorecard maintenance, task creation, scheduling), while the person in the role is underperforming and the company is falling behind on project delivery — causing billing delays and cash flow pressure. |
| Cost | $24K/year subcontractor cost + opportunity cost of not reallocating that budget to revenue-generating work + cost of slow/missed coordination on active projects |
| Owner | Jesse Flores / Sofia Reyes (VP of Operations, Human Orchestrator) |
| Evidence | Multiple leadership conversations documenting staffing issues, cash flow impact from delayed project delivery, and Sofia flagging Rachel’s performance and time management |
Source: Mapping the Knowledge
Sofia was the key to this stage. She had trained Rachel, created her 90-day plan, and was accountable for operations and project delivery. If anyone knew what Rachel actually did — not the job description version, the real version — it was Sofia.
The source mapping happened organically through our design sessions. Sofia and I sat down for a two-hour working session that became the origin story for the PM Agent Team. We pulled up the Hybrid Accountability Chart, started walking through what needed to happen, and the picture became clear fast. Sofia summarized the shape of the work: “A lot of the effort that we expend is literally moving information from one place or one person to another. And there’s not really a lot of — there’s no value add other than coordination.”
That was the Source insight. The knowledge domains weren’t exotic. They were data operations wrapped in a job title.
But when we started listing what Rachel “did,” we realized we were conflating different things. Her accountability chart had roles on it — project coordination, QA support, content management — and those roles contained tasks, and the tasks had different characteristics. We needed to pull those apart before we could design anything.
Artifact: Roles, Tasks, and Accountability Deconstruction
| Role (from Accountability Chart) | Tasks within Role | Task Type | Judgment Required | Current Owner |
|---|---|---|---|---|
| Project Coordination | Update CRM project records | Data entry | None — rule-based | Rachel |
| | Create tasks from project charters | Pattern extraction | Low — follows templates | Rachel |
| | Assign tasks to team members | Decision-making | Medium — requires capacity awareness | Rachel / Sofia |
| | Flag overdue and blocked work | Monitoring | None — data-driven | Rachel |
| | Manage project handoff process | Process execution | Low — follows checklist | Rachel |
| Scorecard & Reporting | Pull weekly scorecard numbers | Data aggregation | None — rule-based | Rachel |
| | Enter numbers into scorecard template | Data entry | None — rule-based | Rachel |
| | Interpret variance and flag issues | Analysis | Medium — requires context | Sofia |
| Meeting Support | Schedule and send agendas | Scheduling | None — rule-based | Rachel |
| | Capture action items from transcripts | Extraction | Low — pattern matching | Rachel |
| | Create CRM tasks from action items | Data entry | None — rule-based | Rachel |
| Content & Asset Management | Create visual assets | Creative production | High — brand judgment | Rachel / Lucas |
| | Schedule social media posts | Scheduling | Low — follows calendar | Rachel |
| | Coordinate video editing workflow | Creative coordination | Medium — editorial judgment | Rachel / Lucas |
| Client Operations | Support client onboarding process | Relationship + process | High — relationship context | Rachel / Sofia |
| | Manage client communication cadence | Relationship | High — trust and context | Sofia |
| Quality Assurance | Execute QA checklists | Process execution | Medium — judgment calls | Rachel / Sofia |
The deconstruction made the design decision obvious. The roles that looked like one job were actually a collection of tasks with wildly different judgment requirements. The low-judgment, rule-based tasks clustered together — and they were the majority of Rachel’s hours.
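As a rough check on that clustering, the table can be tallied by judgment level. The sketch below, in Python, transcribes the seventeen task rows and counts them. It counts tasks, not hours, so it shows the shape of the role rather than the time split. The names `TASKS`, `tally`, and `agent_candidates` are illustrative and do not come from the agent codebase.

```python
from collections import Counter

# (task, judgment level) pairs transcribed from the deconstruction table.
TASKS = [
    ("Update CRM project records", "None"),
    ("Create tasks from project charters", "Low"),
    ("Assign tasks to team members", "Medium"),
    ("Flag overdue and blocked work", "None"),
    ("Manage project handoff process", "Low"),
    ("Pull weekly scorecard numbers", "None"),
    ("Enter numbers into scorecard template", "None"),
    ("Interpret variance and flag issues", "Medium"),
    ("Schedule and send agendas", "None"),
    ("Capture action items from transcripts", "Low"),
    ("Create CRM tasks from action items", "None"),
    ("Create visual assets", "High"),
    ("Schedule social media posts", "Low"),
    ("Coordinate video editing workflow", "Medium"),
    ("Support client onboarding process", "High"),
    ("Manage client communication cadence", "High"),
    ("Execute QA checklists", "Medium"),
]


def tally(tasks):
    """Count tasks per judgment level."""
    return Counter(level for _, level in tasks)


def agent_candidates(tasks):
    """Tasks needing no or low judgment: the natural candidates for agents."""
    return [name for name, level in tasks if level in ("None", "Low")]


counts = tally(TASKS)
candidates = agent_candidates(TASKS)
print(dict(counts))
print(f"{len(candidates)} of {len(TASKS)} tasks are None/Low judgment")
# 10 of 17 tasks are None/Low judgment
```

Ten of seventeen tasks fall in the None or Low bands. Whether those ten account for most of the hours is a separate question that only time data could answer.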
Artifact: Knowledge Map
| Source | Type | Owner | Status | Pipeline | Notes |
|---|---|---|---|---|---|
| CRM project records | Digital | Rachel (maintained) / Sofia (accountable) | Clean | Connected — API available | Core system of record for all project tracking |
| Scorecard data | Digital | Rachel (updated weekly) | Clean | Manual — Rachel pulled and entered numbers | Numbers came from multiple sources; Rachel was the aggregator |
| Meeting transcripts | Digital | Automated (transcription service) | Needs Work | Manual — Rachel read transcripts and created tasks by hand | Unstructured; required human interpretation to extract action items |
| Visual asset files | Digital | Rachel | Clean | Google Drive / Canva | Original hire scope; became secondary after coordinator transition |
| Client onboarding process | Organic | Sofia / Rachel | Needs Work | Manual | Some documented, some in Rachel’s head |
| QA checklists and processes | Organic | Sofia / Rachel | Partially documented | Manual | Sofia and Rachel coordinated; Sofia retained the judgment calls |
| Project handoff procedures | Organic | Rachel | Needs Work | Manual | Rachel had a documented handoff process but unclear how complete |
| Client relationship context | Organic | Rachel | At Risk | None — lived in Rachel’s head | Which clients needed extra lead time, unwritten rules per account |
| Video editing workflow | Organic | Rachel / Lucas | Coordination role | Manual | Rachel was coordinating editors, not editing herself |
Artifact: Source Classification
| Source | Structured / Unstructured | Durable / Ephemeral | AI Tier |
|---|---|---|---|
| CRM project records | Structured | Durable | Tier 1 — AI can use directly via API |
| Scorecard data | Structured | Durable | Tier 1 — once pipeline is automated |
| Meeting transcripts | Unstructured | Durable | Tier 2 — AI can process with extraction pipeline |
| Visual asset files | Unstructured | Durable | Tier 3 — creative judgment required for creation; storage/retrieval is Tier 1 |
| Client onboarding process | Unstructured | Durable (if documented) | Tier 2 — needs capture before AI can execute |
| QA checklists | Semi-structured | Durable | Tier 1 — if documented; Tier 3 if judgment-dependent |
| Project handoff procedures | Semi-structured | Durable | Tier 2 — needs formalization |
| Client relationship context | Unstructured | Ephemeral | Tier 3 — requires human capture; some elements can become Tier 2 |
| Video editing workflow | Unstructured | Durable | Tier 2 — coordination is automatable; editorial judgment is not |
Artifact: Source Completeness Test
| Check | Result |
|---|---|
| Designer test. Could someone who wasn’t in the room look at this map and know what they’re designing against? | Yes — sources, types, owners, and pipeline status are all named. The Roles/Tasks deconstruction makes it clear which work is rule-based and which requires judgment. |
| Constraint scope test. Every row connects to the constraint. | Yes — every source feeds Rachel’s coordination work. No extraneous systems included. |
| Pipeline test. Every digital source has a pipeline status. | Yes — CRM is Connected, scorecard is Manual, transcripts are Manual, assets are Connected. |
| At-risk test. Single points of failure are named. | Yes — client relationship context is flagged as At Risk (lives in Rachel’s head, no capture pipeline). |
| Gap test. The Missing column isn’t empty. | Gap identified: no historical data on task estimation accuracy (would validate whether AI-generated estimates match human judgment). No documented exception-handling rules for client-specific processes. |
| One-page test. The map fits on one page. | Yes — scoped to the coordination constraint, not a general data audit. |
Design: Allocating the Work
The design happened across two sessions — both with Sofia, totaling about three hours.
The first session was the big one. We pulled up the Hybrid Accountability Chart, started walking through every function, and Sofia started articulating the shape of the problem. The conversation wasn’t “who should we hire?” It was “what does this role actually do, and how much of it requires a human?”
I framed it as a manufacturing problem. If you think about digital work the way you think about physical production — input, transformation, output — then most of what Rachel was doing was assembly-line work:
“If you thought about digital work as being manufactured the same way you would physical work — if we thought about making a car — you’ve got to source raw materials, you’ve got to have some machining done, you’ve got to start assembling different pieces. The only reason robots work in a factory is because the processes are so well thought through that it’s really easy for a robot to say, once something comes here, I get this thing, I do this thing, and then I move it over here. Input, transformation, output.”
That framing — the information factory — became the mental model for how we designed the agent team. If you could map the inputs, transformations, and outputs for each piece of Rachel’s work, you could figure out which robots to build.
Sofia got it immediately: “I think if this goes well, it will be only a matter of figuring out — yeah, doing a couple of iterations. But I think this is where you can see, man, there’s so many interesting applications that could work for so many different kinds of companies.” She also flagged the prerequisite: “It kind of depends how well you manage your initial data. We know we have almost everything in the CRM. So that makes it easier. I think other companies will have to face that question first — where’s our information? How good is it?”
In that session, I designed four agents on a whiteboard. I didn’t just describe them; I defined them. The difference matters. Describing an agent is saying “we need something that creates tasks.” Defining an agent means specifying its inputs, outputs, triggers, and guardrails. Every agent went through both stages in that session.
Artifact: Agent Description to Definition
| Agent | Described (Conceptual) | Defined (Spec’d) |
|---|---|---|
| Task Agent | “We need something that creates tasks from project charters” | Input: Project charter + milestone definitions from CRM API. Output: Complete task descriptions with SMART outcomes, point estimates, dependency ordering — written back to CRM via API. Trigger: Daily poll at 7:15am for projects needing tasks. Guardrails: Cannot modify existing tasks; creates only. All generated tasks posted to Slack for review. |
| Coordination Agent | “We need something that assigns work and flags problems” | Input: All open tasks + team capacity data + skills matrix from CRM. Output: Task assignments, conflict flags, daily summary posted to Slack, review task for Sofia. Trigger: Daily at 7:45am. Guardrails: Cannot override Sofia’s manual assignments. Excluded users list (Sofia, Jesse). Creates a daily review task so Sofia sees everything before the team does. |
| Reporting Agent | “We need something that generates scorecards automatically” | Input: CRM project data + time tracking data (Kimai). Output: Daily reports (completed tasks, overdue items, open cases) + weekly reports (cost variance, schedule variance, team productivity). Uses earned value management: EV = estimated points x completion %, AC = hours logged. Trigger: Daily at 7:30am, weekly summary on Fridays. Guardrails: Read-only access to source systems. Reports posted to Slack, not sent to clients. |
| Garbage Collector | “We need something that cleans up bad data” | Input: Full scan of CRM task and project descriptions. Output: Flagged items with specific quality issues (missing descriptions, incomplete project definitions, unestimated tasks). Trigger: Daily at 7:00am. Guardrails: Cannot modify data — flags only. Sofia reviews flagged items and decides what to fix. |
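
One way to make the described-versus-defined distinction concrete is to treat each row as a record whose spec fields must all be filled in. This is a minimal Python sketch under that assumption. `AgentSpec` and `missing_fields` are hypothetical names, and the example values are abbreviated from the Garbage Collector row.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentSpec:
    """One row of the Description-to-Definition table as a data structure."""
    name: str
    description: str                  # the conceptual, one-sentence version
    inputs: Tuple[str, ...] = ()
    outputs: Tuple[str, ...] = ()
    trigger: str = ""
    guardrails: Tuple[str, ...] = ()

    def missing_fields(self):
        """Return the spec fields still empty; an empty list means 'defined'."""
        required = ("inputs", "outputs", "trigger", "guardrails")
        return [name for name in required if not getattr(self, name)]


described = AgentSpec(
    name="Garbage Collector",
    description="We need something that cleans up bad data",
)

defined = AgentSpec(
    name="Garbage Collector",
    description="We need something that cleans up bad data",
    inputs=("Full scan of CRM task and project descriptions",),
    outputs=("Flagged items with specific quality issues",),
    trigger="Daily at 7:00am",
    guardrails=("Cannot modify data; flags only", "Sofia reviews flagged items"),
)

print(described.missing_fields())  # ['inputs', 'outputs', 'trigger', 'guardrails']
print(defined.missing_fields())    # []
```

An agent with anything left in `missing_fields()` is still only described and is not ready to build.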
The second session continued the design work. We reviewed the diagram, talked through capacity planning, and I pushed Sofia to think about the data pipelines the agents would need: where each agent gets its input and where it sends its output. Sofia raised a practical question about how far ahead the Coordination Agent should plan: “I gave it a couple weeks, and for now, I think that’s enough.” I’d suggested 90 days; she was right to scope it down for the first version.
I named the whole thing on the whiteboard: “PM Agent Team.” Sofia would take the design and build from where I left off.
Artifact: Work Deconstruction
| Task | Before (Owner) | After (Owner) | Source / Input | Output / Destination | Rationale |
|---|---|---|---|---|---|
| CRM record updates | Rachel (manual) | Agent — Task Agent + Coordination Agent | Project charters, milestone definitions via CRM API | Updated CRM records, Slack confirmation post | Rule-based data entry; structured source, API available |
| Scorecard maintenance | Rachel (manual pull + entry) | Agent — Reporting Agent | CRM project data + Kimai time tracking data | Scorecard reports posted to Slack + archived in CRM | Data aggregation from known sources; no judgment required |
| Task creation from project charters | Rachel (manual) | Agent — Task Agent | Project charters in CRM | SMART task descriptions written to CRM via API + Slack summary | Pattern extraction from project charters; AI handles well |
| Social media scheduling | Rachel (manual) | Agent — Content Scheduling Agent | Content calendar + approved assets in Google Drive | Scheduled posts in social platform + confirmation to Slack | Calendar-based; content decisions stay human, scheduling doesn’t |
| Meeting coordination | Rachel | Agent — Coordination Agent | Calendar events + agenda templates | Agendas sent to attendees + action items extracted to CRM tasks | Scheduling and agenda prep are rule-based; facilitation judgment stays human |
| Client onboarding coordination | Rachel / Sofia | Human (Sofia) + Agent assist | Client intake form + onboarding checklist in CRM | CRM client record + welcome sequence triggered + onboarding tasks created | Relationship component stays human; checklist execution goes agent |
| QA coordination | Rachel / Sofia | Human (Sofia) | QA checklists + project deliverables | QA sign-off records in CRM | Judgment-driven; Sofia retained full ownership |
| Visual asset creation | Rachel | Agent with Lucas review | Brand guidelines + creative brief | Draft assets in Google Drive for Lucas review | AI generates, human reviews for brand consistency |
| Video editing coordination | Rachel | Human (Lucas) | Raw footage + editorial direction from Sofia | Edited video delivered to client | Creative coordination requires relationship and editorial judgment |
| Project handoff management | Rachel | Agent — Coordination Agent | Handoff checklist + project records in CRM | Handoff completion record in CRM + notification to receiving team | Process-driven; automated if procedures are documented |
| Client relationship management | Rachel | Human (Sofia / account manager) | Client communication history + CRM contact records | Client emails, meeting notes logged in CRM | Judgment, trust, context — stays human |
Artifact: Hybrid Accountability Chart Entry
| Function | Human Role | Who | Agent Role | Which Agent |
|---|---|---|---|---|
| Task creation and estimation | Reviews and approves generated tasks | Sofia | Generates tasks from charters, estimates points, orders dependencies | Task Agent |
| Daily work assignment | Makes strategic prioritization calls | Sofia | Assigns tasks based on skills/capacity, flags conflicts | Coordination Agent |
| Progress reporting | Interprets reports, makes decisions | Jesse / Sofia | Generates daily and weekly variance reports | Reporting Agent |
| Data quality | Reviews flagged items | Sofia | Scans for incomplete descriptions, bad data | Garbage Collector |
| Client communication | Owns all client relationships | Sofia | Drafts status updates for review | Coordination Agent |
| QA coordination | Full ownership | Sofia | None | N/A |
| Visual assets | Reviews for brand consistency | Lucas | Generates drafts | AI generation tools |
| Video editing | Full ownership of creative direction | Lucas | None | N/A |
| Strategic decisions | Capacity allocation, project prioritization | Jesse / Sofia | Provides data for decisions | Reporting Agent |
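
The Daily work assignment row carries two hard rules: never override a manual assignment, and never assign work to the excluded users. The production Coordination Agent uses Claude to choose assignees. The Python sketch below is a rule-based stand-in that illustrates only those two guardrails. The dictionary fields and the team member "Dev A" are hypothetical.

```python
EXCLUDED_USERS = {"Sofia", "Jesse"}  # from the guardrails: never auto-assigned


def assign(task, team):
    """Pick an assignee for one task, or return None to flag a conflict.

    task: dict with "skill", "points", and an optional "assignee".
    team: list of dicts with "name", "skills" (a set), and "free_points".
    """
    if task.get("assignee"):
        # A human already assigned this task; the agent must not override it.
        return task["assignee"]
    eligible = [
        member for member in team
        if member["name"] not in EXCLUDED_USERS
        and task["skill"] in member["skills"]
        and member["free_points"] >= task["points"]
    ]
    if not eligible:
        return None  # nobody fits: flag for Sofia instead of guessing
    # Prefer whoever has the most spare capacity.
    return max(eligible, key=lambda member: member["free_points"])["name"]


team = [
    {"name": "Sofia", "skills": {"qa", "design"}, "free_points": 20},
    {"name": "Lucas", "skills": {"design", "video"}, "free_points": 5},
    {"name": "Dev A", "skills": {"design"}, "free_points": 8},  # hypothetical
]

print(assign({"skill": "design", "points": 3}, team))                       # Dev A
print(assign({"skill": "design", "points": 3, "assignee": "Lucas"}, team))  # Lucas
print(assign({"skill": "qa", "points": 2}, team))                           # None
```

Returning `None` rather than a best guess keeps unresolved cases on the human side of the chart.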
Artifact: Information Flow — Client Communication
The Client Communication row in the Hybrid Accountability Chart involves the tightest collaboration between human and agent. Sofia owns all client relationships, but the Coordination Agent drafts status updates, pulls project data, and prepares the information Sofia needs to communicate. The swim lane below shows how information moves through that accountability.
The handoff points are where governance matters most. The Coordination Agent can pull data and draft — it cannot send anything to a client. Sofia reviews every outbound communication. That boundary is non-negotiable.
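That boundary can be sketched as a small approval gate: a draft can be created by anyone, but only a named human can approve or send it. This is an illustrative Python sketch, not the production code. `StatusUpdate`, `approve`, `send`, and the client name are all hypothetical.

```python
from dataclasses import dataclass

HUMAN_OWNERS = {"Sofia"}  # only the relationship owner may approve or send


@dataclass
class StatusUpdate:
    client: str
    body: str
    drafted_by: str
    approved_by: str = ""
    sent: bool = False


def approve(update, reviewer):
    """Record human approval; agents are rejected."""
    if reviewer not in HUMAN_OWNERS:
        raise PermissionError(f"{reviewer} cannot approve client communication")
    update.approved_by = reviewer


def send(update, sender):
    """Mark the update as sent, but only for an approved draft and a human sender."""
    if sender not in HUMAN_OWNERS:
        raise PermissionError(f"{sender} cannot send to clients")
    if not update.approved_by:
        raise PermissionError("draft has not been reviewed")
    update.sent = True


draft = StatusUpdate(
    client="Example Client",
    body="Milestone update (placeholder).",
    drafted_by="Coordination Agent",
)
try:
    send(draft, "Coordination Agent")
except PermissionError as err:
    print("blocked:", err)

approve(draft, "Sofia")
send(draft, "Sofia")
print(draft.sent)  # True
```

Putting the check inside `send` means the rule holds no matter which code path tries to send.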
Artifact: Governance and Guardrails
| Domain | What Agents Can Do | What Agents Cannot Do | Escalation Trigger | Review Cadence | Kill Switch Condition |
|---|---|---|---|---|---|
| Task creation | Generate task descriptions with SMART outcomes, point estimates, and dependency ordering from project charters | Modify or delete existing tasks; override manually created tasks | Task Agent generates a task with zero confidence on scope or estimation | Sofia reviews generated tasks daily via Slack summary | Agent enters an infinite loop or generates more than 50 tasks in a single run |
| Work assignment | Assign tasks based on skills matrix and capacity data; flag conflicts | Override Sofia’s manual assignments; assign work to excluded users (Sofia, Jesse) | Coordination Agent detects a scheduling conflict it cannot resolve | Sofia reviews assignments daily before the team sees them | Agent assigns work to excluded users or assigns the same task to multiple people |
| Reporting | Generate daily and weekly variance reports using earned value management | Send reports directly to clients; modify source data in CRM or time tracking | Report shows variance exceeding 20% on cost or schedule | Jesse and Sofia review weekly reports every Friday | API token spend exceeds $50 in a single day or agent fails to post for two consecutive days |
| Data quality | Scan for incomplete descriptions, missing estimates, and data quality issues | Modify any data — flagging only | Garbage Collector flags more than 30% of active tasks as deficient | Sofia reviews flagged items weekly | Agent attempts a write operation on any record |
| Client communication | Draft status updates; pull project data for Sofia’s review | Send any communication to a client; access client email directly | Draft contains language Sofia hasn’t approved or references confidential data | Sofia reviews every draft before sending | Agent sends any outbound communication without human approval |
| Reminders | Send personalized Slack DMs to team members listing overdue and blocked tasks | Contact anyone outside the internal team; escalate on its own | Team member reports receiving inaccurate or duplicate reminders | Sofia reviews reminder logs weekly | Agent sends more than 10 DMs to a single person in one day |
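
Several of the kill switch conditions in the table are plain numeric thresholds, which makes them easy to check after each run. The Python sketch below encodes four of them. The thresholds come from the table, but `RunStats` and `kill_switch_reasons` are hypothetical names, and how the real agents collect these counters is not described in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_TASKS_PER_RUN = 50
MAX_DAILY_SPEND_USD = 50.0
MAX_DMS_PER_PERSON_PER_DAY = 10


@dataclass
class RunStats:
    tasks_created: int = 0
    daily_spend_usd: float = 0.0
    dms_by_person: Dict[str, int] = field(default_factory=dict)
    garbage_collector_writes: int = 0


def kill_switch_reasons(stats: RunStats) -> List[str]:
    """Return every kill switch condition the run has tripped."""
    reasons = []
    if stats.tasks_created > MAX_TASKS_PER_RUN:
        reasons.append("Task Agent created more than 50 tasks in one run")
    if stats.daily_spend_usd > MAX_DAILY_SPEND_USD:
        reasons.append("API spend exceeded $50 in one day")
    for person, count in stats.dms_by_person.items():
        if count > MAX_DMS_PER_PERSON_PER_DAY:
            reasons.append(f"more than 10 DMs sent to {person} in one day")
    if stats.garbage_collector_writes > 0:
        reasons.append("Garbage Collector attempted a write")
    return reasons


print(kill_switch_reasons(RunStats(tasks_created=12, daily_spend_usd=3.40)))  # []
print(kill_switch_reasons(RunStats(tasks_created=51)))
```

A non-empty list would mean the agent stops and stays stopped until a human investigates.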
Artifact: Design Brief — PM Agent Team
This is the design brief as it existed when we moved from Design into Build. It captures the full picture of what the agent team would do, who it affected, and what it needed to work.
Workflow Summary
The PM Agent Team replaces the coordination work previously done by a subcontractor. Five agents run on weekday mornings between 7:00am and 7:45am, each handling a specific coordination function: task creation from project charters, daily work assignment and conflict flagging, variance reporting with earned value management, personalized reminders for overdue and blocked work, and data quality scanning. All agent output posts to Slack for human review. Sofia Reyes supervises the team as Human Orchestrator, monitoring by exception rather than approving every action.
Stakeholders
| Stakeholder | Relationship to Agent Team |
|---|---|
| Sofia Reyes (VP of Operations) | Human Orchestrator — supervises all agents, reviews output daily, intervenes on exceptions, owns client relationships |
| Jesse Flores | Strategic oversight — sets policy, reviews weekly reports, makes capacity allocation decisions based on agent-generated data |
| Lucas (Creative Lead) | Receives task assignments from Coordination Agent; reviews AI-generated visual assets for brand consistency |
| Delivery team members | Receive daily Slack reminders and task assignments; interact with agent output without needing to know it’s agent-generated |
| Clients | Indirect — receive faster status updates and more consistent project delivery; never interact with agents directly |
Systems Involved
| System | Role in Agent Team | Access Method |
|---|---|---|
| CRM (EspoCRM-based) | System of record for all projects, tasks, contacts, and cases | REST API with scoped API keys per agent |
| Kimai (time tracking) | Source of actual hours logged; feeds earned value calculations | API integration |
| Slack | Output channel for all agent summaries, reminders, and flags | Slack API with bot token |
| Claude API | AI reasoning for task generation, estimation, assignment, and reporting | Direct HTTPS calls; Sonnet for Task Agent, Opus for Coordination Agent |
| Google Drive / Canva | Storage for visual assets and brand materials | Connected via existing integrations |
| Node.js agent server | Runtime environment for all agents; handles scheduling, API calls, error recovery | Self-hosted; each agent runs as a standalone service |
Data Requirements
| Data | Source | Freshness Required | Access |
|---|---|---|---|
| Project charters and milestone definitions | CRM | Real-time (API poll) | Task Agent reads at 7:15am daily |
| Open tasks with status, assignee, and estimates | CRM | Real-time | Coordination Agent reads at 7:45am daily |
| Team skills matrix and capacity data | CRM | Updated weekly by Sofia | Coordination Agent reads for assignment logic |
| Hours logged per task | Kimai | Previous day’s entries | Reporting Agent reads at 7:30am daily |
| Task and project descriptions | CRM | Real-time | Garbage Collector reads at 7:00am daily |
| Overdue and blocked task flags | CRM | Real-time | Reminder Agent reads at 7:00am daily |
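
The read times in this table imply a fixed morning run order. The Python sketch below lays the schedule out as data and answers two questions: which agents are due at a given minute, and in what order they run. It assumes all five agents run Monday to Friday, as the workflow summary states. The production agents use cron scheduling inside Node.js services, so `SCHEDULE`, `agents_due`, and `run_order` are illustrative only.

```python
from datetime import datetime, time

# Start times taken from the Data Requirements table.
SCHEDULE = {
    "Garbage Collector": time(7, 0),
    "Reminder Agent": time(7, 0),
    "Task Agent": time(7, 15),
    "Reporting Agent": time(7, 30),
    "Coordination Agent": time(7, 45),
}


def agents_due(now: datetime):
    """Names of agents scheduled for this exact minute; empty on weekends."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return []
    current = now.time().replace(second=0, microsecond=0)
    return [name for name, start in SCHEDULE.items() if start == current]


def run_order():
    """Agent names sorted by start time (ties keep table order)."""
    return sorted(SCHEDULE, key=lambda name: SCHEDULE[name])


print(agents_due(datetime(2026, 3, 16, 7, 0)))   # a Monday
print(agents_due(datetime(2026, 3, 14, 7, 0)))   # a Saturday: []
print(run_order())
```

The order matters for review: the Coordination Agent runs last, after the data quality scan, task creation, and reporting have already posted.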
Success Criteria
| Criterion | Measurement | Target |
|---|---|---|
| Cost reduction | Annual spend on coordination work | From $24K/year to under $8K/year |
| Report timeliness | Daily reports posted to Slack by 8:00am | 95% on-time delivery |
| Task creation accuracy | Percentage of agent-generated tasks Sofia approves without edits | Above 80% within first month |
| Assignment accuracy | Percentage of assignments Sofia does not override | Above 85% within first month |
| Data quality improvement | Percentage of active tasks with complete descriptions and estimates | Increase from baseline within 60 days |
| Team satisfaction | Qualitative feedback from delivery team on task clarity and communication speed | Positive or neutral — no degradation from previous state |
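
Three of these criteria are simple ratios against a target, so they can be checked mechanically. The Python sketch below shows one way to do that. It reads "above 80%" and "above 85%" as strict thresholds and "95% on-time" as a minimum, which is my interpretation of the table. The sample counts are made up for illustration and are not measured results.

```python
def rate(passed, total):
    """Share of items that passed, as a percentage; 0.0 when there is no data."""
    return 100.0 * passed / total if total else 0.0


# criterion -> (target percentage, True if the target must be strictly exceeded)
TARGETS = {
    "task_creation_accuracy": (80.0, True),   # tasks approved without edits
    "assignment_accuracy": (85.0, True),      # assignments not overridden
    "report_timeliness": (95.0, False),       # daily reports posted by 8:00am
}


def evaluate(measured):
    """Map each criterion name to True (target met) or False."""
    results = {}
    for name, (target, strict) in TARGETS.items():
        value = measured[name]
        results[name] = value > target if strict else value >= target
    return results


# Hypothetical sample counts, for illustration only.
sample = {
    "task_creation_accuracy": rate(41, 50),   # 82.0
    "assignment_accuracy": rate(34, 40),      # 85.0
    "report_timeliness": rate(19, 20),        # 95.0
}
print(evaluate(sample))
```

In this sample, assignment accuracy lands exactly on 85% and so misses an "above 85%" target. That kind of edge is worth settling before the first monthly review.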
Constraints and Guardrails
Agents operate on fixed schedules only — no event-driven triggers in v1. All agent output posts to Slack before reaching the team. Agents cannot modify existing data unless explicitly designed to do so (Task Agent creates only; Garbage Collector flags only). No agent communicates with clients. Sofia reviews all output daily. API token spend is monitored; any single-day spend exceeding $50 triggers an alert. If an agent enters an infinite loop, exhausts memory, or generates runaway API calls, it is killed and does not restart until Sofia or Jesse investigates.
Autonomy level: Automated (human-on-the-loop). Sofia monitors by exception.
Three ways a human can relate to an agent team’s work:
- Human-in-the-loop. The human approves every action before it executes. Nothing ships without a human sign-off. This is AI-assisted mode — the agent drafts, the human decides.
- Human-on-the-loop. The agents run autonomously on schedule. The human reviews output and intervenes when something is wrong. This is monitoring by exception — you’re not approving every action, you’re catching the ones that go sideways.
- Human-over-the-loop. The human sets policy and strategy. Agents execute within those boundaries. The human reviews aggregate results periodically — weekly reports, monthly trends — not individual outputs.
We started the PM Agent Team at human-on-the-loop. Sofia reviews agent output daily, intervenes when something looks off, and trusts the system to handle the routine correctly. She doesn’t approve every task assignment or every report — she reads the Slack summaries and acts on exceptions.
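The three modes differ mainly in when the human looks and at what. The short Python sketch below encodes that as an enum. The names are illustrative and the strings paraphrase the definitions above.

```python
from enum import Enum


class Autonomy(Enum):
    IN_THE_LOOP = "human-in-the-loop"
    ON_THE_LOOP = "human-on-the-loop"
    OVER_THE_LOOP = "human-over-the-loop"


def needs_approval_before_execution(level):
    """Only human-in-the-loop blocks every action on a sign-off."""
    return level is Autonomy.IN_THE_LOOP


def human_reviews(level):
    """What the human looks at under each mode."""
    return {
        Autonomy.IN_THE_LOOP: "every action, before it executes",
        Autonomy.ON_THE_LOOP: "agent output after the fact, intervening on exceptions",
        Autonomy.OVER_THE_LOOP: "aggregate results, periodically",
    }[level]


PM_AGENT_TEAM = Autonomy.ON_THE_LOOP

print(needs_approval_before_execution(PM_AGENT_TEAM))  # False
print(human_reviews(PM_AGENT_TEAM))
```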
Human Orchestrator: Sofia Reyes. She built it, she runs it, she’s accountable for it.
Artifact: Design Gate Checklist
| Gate Item | Status | Notes |
|---|---|---|
| Constraint validated with numbers | Yes | Cash flow impact from delayed project delivery + $24K/yr subcontractor cost documented |
| Knowledge Map complete | Yes | Sources mapped across digital and organic; Completeness Test passed |
| Every accountability assigned (human or agent) | Yes | Work Deconstruction table complete; all items assigned |
| Human Orchestrator named | Yes | Sofia Reyes |
| Autonomy level set | Yes | Automated (human-on-the-loop) for all agents |
| Guardrails defined | Yes | Agents run on schedule, post to Slack for review; Sofia checks daily |
| Escalation path documented | Yes | Agents flag conflicts and overdue items; Sofia triages |
| Data access scoped | Yes | CRM API, time tracking system, Slack, code repositories |
Build: What Got Built
The build happened in two phases, and the first one failed.
Phase 1: n8n (September 2025 - February 2026). We started with n8n — a workflow automation tool — running scheduled workflows that fetched API data and fed it to AI for analysis. In the first design session, I was still designing around n8n. I drew the architecture on the whiteboard — a CRM workflow triggering an n8n webhook, an AI agent in n8n with tools for searching tasks and making API calls back to the CRM.
It was brittle. The webhook approach worked for simple triggers but fell apart when we needed agents to chain decisions, maintain context across multiple API calls, and handle the kind of error recovery that production systems demand. We couldn’t get it reliable enough to trust.
Phase 2: Agent Server (late February - March 2026). We abandoned n8n and built a dedicated agent server. Sofia drove the build.
Here’s something worth pausing on: Sofia isn’t a developer. She’s an operations leader who learned enough about Claude Code and how agent teams work to build this system. She had me to help, but she didn’t need much. Once she learned the tools and had some pre-configured skills, her domain expertise let her build the suite better than I could have. She knew the CRM inside and out, understood every edge case in the coordination workflow, and could spec the agents’ behavior from lived experience rather than documentation. Design work, and even the build itself, can be done by people who aren’t traditionally technical, as long as they’re willing to learn the tools and have deep knowledge of the domain.
The main agent suite consisted of five agents, each running as a standalone Node.js service with built-in cron scheduling:
| Agent | Schedule | What It Does |
|---|---|---|
| Task Agent | 7:15am Mon-Fri | Polls for projects needing tasks, gathers context from the CRM, calls Claude to generate task descriptions with SMART outcomes, point estimates, and dependency ordering. Writes everything back to the CRM via API. |
| Coordination Agent | 7:45am Mon-Fri | Gathers all open tasks, uses Claude to estimate unestimated tasks and assign unassigned ones based on team skills and capacity. Posts flags and summary to Slack. Creates a daily review task for Sofia. |
| Reporting Agent | 7:30am Mon-Fri | Generates daily reports (completed tasks, overdue items, open cases) and weekly reports (project cost variance, schedule variance, team productivity, case responsiveness). Uses earned value management: EV = estimated points x completion %, AC = hours logged from time tracking. |
| Reminder Agent | 7:00am Mon-Fri | Sends personalized Slack DMs to each team member listing overdue and blocked tasks. Flags completed tasks missing point estimates or evidence of completion. No AI needed — purely data-driven. |
| Garbage Collector | 7:00am (via n8n) | Scans for data quality issues — poor task descriptions, incomplete project definitions. |
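
The Reporting Agent’s earned value arithmetic can be worked through in a few lines. The Python sketch below uses the two formulas from the table (EV = estimated points x completion %, AC = hours logged) plus the standard cost variance, CV = EV - AC. Two things are assumptions on my part: the `hours_per_point` conversion, which the source does not give, and measuring the 20% escalation threshold relative to EV. Schedule variance is left out because it needs a planned value that the source does not define.

```python
def earned_value(estimated_points, completion_pct, hours_per_point=1.0):
    """EV = estimated points x completion %, expressed in hours.

    hours_per_point is an assumed conversion so EV and AC share a unit.
    """
    return estimated_points * hours_per_point * completion_pct / 100.0


def cost_variance(ev, ac):
    """CV = EV - AC. Negative means more hours were logged than earned."""
    return ev - ac


def exceeds_threshold(ev, ac, threshold=0.20):
    """True when |CV| is more than `threshold` of EV."""
    if ev == 0:
        return ac > 0
    return abs(cost_variance(ev, ac)) / ev > threshold


# Made-up example: a 20-point task, half done, with 13 hours logged.
ev = earned_value(estimated_points=20, completion_pct=50)
ac = 13.0
print(ev, cost_variance(ev, ac), exceeds_threshold(ev, ac))  # 10.0 -3.0 True
```

In the example, the task has earned 10 hours of value against 13 hours spent. That is a 30% overrun, enough to trip the escalation trigger.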
The architecture is straightforward: native Node.js modules, no heavy dependencies. CRM API auth via API key header. Claude API via direct HTTPS. Each agent has three run modes: HTTP server with cron (default for production), CLI one-shot, and CLI with arguments for testing.
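The three run modes amount to a small dispatch on command-line arguments. The sketch below shows that dispatch in Python with `argparse`. The real agents are Node.js services, and the flag names `--once` and `--project-id` are hypothetical stand-ins, not the actual CLI.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="agent")
    parser.add_argument("--once", action="store_true",
                        help="run a single pass and exit (CLI one-shot)")
    parser.add_argument("--project-id",
                        help="run against one record, for testing")
    return parser


def select_mode(argv):
    """Map command-line arguments to one of the three run modes."""
    args = build_parser().parse_args(argv)
    if args.project_id:
        return "cli-with-arguments"
    if args.once:
        return "cli-one-shot"
    return "server-with-cron"  # the production default


print(select_mode([]))                            # server-with-cron
print(select_mode(["--once"]))                    # cli-one-shot
print(select_mode(["--project-id", "abc123"]))    # cli-with-arguments
```

Making the scheduled server the no-argument default means nothing extra has to be passed in production.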
We also built a complementary scheduled agent that operates as a first-class user inside the CRM. It has its own user account, API key, and role. It handles task types like reviews, reports, digests, and emails — picking up assigned tasks, gathering context, calling Claude, and writing structured output back. It can also create follow-up tasks automatically, with traceability lines injected into each description.
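A traceability line can be as simple as a footer appended to each follow-up description. The sketch below assumes a hypothetical footer format and hypothetical field names (`parentTaskId`, `agentName`, `createdOn`); the scheduled agent's real format is not shown in this chapter.

```typescript
// Hypothetical shape of a follow-up task; field names are assumptions.
interface FollowUpTask {
  name: string;
  description: string;
  parentTaskId: string;
}

// Append a line recording which agent created the task and from which parent.
function withTraceability(
  task: FollowUpTask,
  agentName: string,
  createdOn: string
): FollowUpTask {
  const trace = `Created by ${agentName} on ${createdOn} from task ${task.parentTaskId}`;
  // Avoid stacking duplicate lines if the same task is processed twice.
  if (task.description.includes(trace)) {
    return task;
  }
  const body = task.description.replace(/\s+$/, ""); // drop trailing whitespace
  return { ...task, description: `${body}\n\n---\n${trace}` };
}
```

Keeping the function idempotent matters for a scheduled agent: a retried run should not pile up repeated footers on the same task.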
By early March, the shift was visible. Sofia had been doing project charters manually — “I used to do the project charters. Before I started, there were no project charters in projects. It was like a brief description and you had to kind of guess what you had to do. So I started putting together project charters, but I was doing them manually. It took me a long time.” I created an agent for it. That was the first time I ever sat down and documented a step-by-step process for turning a human workflow into an agent workflow.
Sofia was cautious about autonomy, and that was the right call: “I’m taking it step by step. I won’t give it full autonomy right now. I will test it for a few months. We’ll see how it goes.”
By mid-March, the agents were connected to the CRM and handling real work. I described the state of play in a team conversation: “Sofia is building a team of agents to help us with project task and capacity management. A lot of the things that Sofia spends time on and that Rachel was spending time on are things we’ve realized — okay, if we chain together AI agents, we can get most of that work done by the AI agents. And so we can spend our time on things that are more valuable, like actually communicating with customers, new products, all that kind of stuff.”
I also laid out the governance model in that same conversation: “The human responsibility at the moment should be that the team lead sends this email, AI reviews, creates tasks, associates to the project. The next person gets it in their queue tomorrow and is able to complete it. In that case, the human in the loop wasn’t even you, it was the team member. Bypassing a lot of that delegation in the first place.” The long-term vision: “Once we feel like the agent system prompt is working the way it’s supposed to, the team is working, the agent team is working the way it’s supposed to, then you look at it less and less and less, and all of that time that we spend on this starts to go away.”
Artifact: Build Spec
| Section | Detail |
|---|---|
| Solution name | PM Agent Team |
| Constraint addressed | $24K/yr subcontractor cost on rule-based coordination work + performance issues + billing delays from slow project delivery |
| Systems connected | CRM (EspoCRM-based) via API, time tracking system, Slack (notifications + DMs), code repositories, Claude API (Sonnet for Task Agent, Opus for Coordination Agent) |
| Tools / platforms | Node.js agent server (five agents), scheduled CRM agent (operates as first-class CRM user), n8n (Garbage Collector only) |
| Agent capabilities | Task generation + estimation, daily work assignment + conflict flagging, daily/weekly variance reporting, personalized reminders, data quality scanning |
| Access / permissions | Each agent has its own CRM user with scoped API key and role-based permissions |
| Guardrails | Agents run on fixed schedules (7:00-7:45am Mon-Fri), post all output to Slack for review, create review tasks for Sofia, excluded users list prevents agents from touching Sofia’s or Jesse’s work |
| Human Orchestrator | Sofia Reyes |
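The excluded users guardrail in the Build Spec amounts to a filter applied before any agent acts on a task. Here is a minimal sketch of that idea, assuming a hypothetical task shape and placeholder user IDs; none of these names come from the real system.

```typescript
// Hypothetical task shape; `assigneeId` and the IDs below are assumptions.
interface OpenTask {
  id: string;
  assigneeId: string | null; // null means unassigned
}

// Users whose tasks the agents must never modify (placeholder IDs).
const EXCLUDED_USER_IDS: ReadonlySet<string> = new Set<string>([
  "user-sofia",
  "user-jesse",
]);

// Split tasks into those an agent may act on and those it must skip.
function applyExclusions(
  tasks: OpenTask[],
  excluded: ReadonlySet<string> = EXCLUDED_USER_IDS
): { actionable: OpenTask[]; skipped: OpenTask[] } {
  const actionable: OpenTask[] = [];
  const skipped: OpenTask[] = [];
  for (const task of tasks) {
    const isExcluded = task.assigneeId !== null && excluded.has(task.assigneeId);
    if (isExcluded) {
      skipped.push(task);
    } else {
      actionable.push(task);
    }
  }
  return { actionable, skipped };
}
```

Returning the skipped tasks alongside the actionable ones lets an agent mention in its Slack summary what it deliberately left alone.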
Deliver: Shipping It
Rachel left right before the agent team went live. There was no parallel transition period — no window where Rachel and the agents ran side by side. Rachel was gone, and the agents picked up where she left off.
Only Sofia was trained on supervising the agent team. That’s because she built it. She shared the system with me and documented everything in our knowledge base. There was no broader training needed — the rest of the team interacted with the agents’ output (Slack messages, CRM tasks, daily reports) without needing to know or care that an agent generated them.
The team didn’t notice the difference. What they noticed was that task management got better and faster. Delegation happened automatically. Status updates arrived on time. The work that used to sit waiting for Rachel to get to it just happened.
We tracked the same metrics we’d always tracked: cost variance, time variance, schedule variance. The Reporting Agent automated what Rachel had been doing manually with scorecards — and did it with actual earned value management math instead of self-reported numbers.
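To make the earned value math concrete, here is a rough sketch built on the Reporting Agent's definitions (EV = estimated points × completion %, AC = hours logged). Because EV is in points and AC is in hours, the sketch assumes a hypothetical `hoursPerPoint` conversion factor; that factor and the field names are illustrative assumptions. Cost variance and CPI follow the standard earned value formulas, CV = EV − AC and CPI = EV / AC.

```typescript
// Hypothetical task record; field names are assumptions.
interface TrackedTask {
  estimatedPoints: number;
  completionPct: number; // 0 to 100
  hoursLogged: number;
}

function earnedValueSummary(tasks: TrackedTask[], hoursPerPoint: number) {
  let earnedPoints = 0;
  let actualHours = 0;
  for (const t of tasks) {
    earnedPoints += t.estimatedPoints * (t.completionPct / 100); // EV in points
    actualHours += t.hoursLogged; // AC in hours
  }
  // Assumed conversion so EV and AC share a unit.
  const earnedHours = earnedPoints * hoursPerPoint;
  return {
    earnedHours,
    actualHours,
    // CV = EV - AC; a negative value means more effort spent than earned.
    costVariance: earnedHours - actualHours,
    // CPI = EV / AC; undefined when no hours are logged, so return null.
    cpi: actualHours > 0 ? earnedHours / actualHours : null,
  };
}
```

For example, a 10-point task at 50% completion with 12 hours logged, at an assumed 2 hours per point, earns 10 hours of value against 12 actual hours, for a cost variance of −2 hours.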
The biggest early adjustment was the n8n-to-agent-server migration itself. We tried n8n first. It was brittle. The webhook architecture worked for simple triggers but couldn’t handle the chaining, context management, and error recovery that production agent work demands. The decision to rebuild as a dedicated agent server — each agent as a standalone Node.js service with its own cron, its own system prompt, its own CRM user — was the build-phase pivot that made everything else possible.
The cost comparison tells the story. Agent tooling runs about $500/month — roughly $200 for server hosting and $300 for Claude API tokens. That’s about $6K/year. Rachel’s role cost $24K/year. A 75% cost reduction, and the agents don’t call in sick or deteriorate over time.
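The arithmetic behind that percentage, using only the approximate figures quoted above (the variable names are illustrative):

```typescript
// Approximate figures from the text; all values in US dollars.
const hostingPerMonth = 200;
const claudeApiPerMonth = 300;
const coordinatorPerYear = 24000;

// (200 + 300) * 12 = 6,000 per year for agent tooling.
const agentsPerYear = (hostingPerMonth + claudeApiPerMonth) * 12;

// (24,000 - 6,000) / 24,000 = 0.75, the 75% reduction.
const costReduction = (coordinatorPerYear - agentsPerYear) / coordinatorPerYear;
```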
Artifact: Delivery Metrics
| Metric | Before (with Rachel) | After (PM Agent Team) | Delta |
|---|---|---|---|
| Coordinator cost | $24K/year (subcontractor) | ~$6K/year (~$500/mo for server + Claude API) | 75% cost reduction |
| Report generation | Manual, self-reported, weekly | Automated daily + weekly, earned value math | From lagging self-reports to daily automated variance tracking |
| Task creation | Manual — Rachel read charters and transcripts | Automated — Task Agent generates from charters with SMART outcomes | From hours of manual work to minutes of agent processing |
| Work assignment | Manual — Rachel/Sofia assigned tasks by hand | Automated — Coordination Agent assigns daily based on skills/capacity | From ad-hoc delegation to systematic daily assignment |
| Team notification | Manual — Rachel/Sofia messaged individuals | Automated — Reminder Agent sends personalized Slack DMs | From inconsistent follow-up to daily personalized reminders |
| Data quality | No systematic review | Automated — Garbage Collector flags poor descriptions | From no QA to continuous automated scanning |
| Communication speed | Delayed — waited for Rachel to relay priorities | Immediate — agents post directly to Slack each morning | Team knows what’s past due and what’s priority before standup |
Compound: What the Next Sprint Inherits
The biggest learning from this Compound Sprint was that we could move even more work to agentic teams than we initially expected. When we started, we assumed things like work assignment and capacity management would stay human for a long time — they felt too nuanced for agents. Sofia proved that wrong. She figured out how to make those automatable sooner than either of us predicted. She’s now focused on exceptions rather than coordination — which is exactly where a Human Orchestrator should be spending her time.
The scope expanded after initial deployment, and it continues to expand as agents become more powerful and Sofia gets better at identifying new signals worth solving. Every week she finds another coordination task that follows a pattern the agents can learn. The boundary between “requires human judgment” and “agents can handle this” keeps moving — not because the agents are getting smarter (though they are), but because Sofia is getting better at decomposing the work.
Work assignment and capacity management were the surprise. We’d marked those as “stays human” in the original Work Deconstruction. Sofia had them automated within weeks. The Coordination Agent turned out to be good enough at matching skills to tasks and flagging capacity conflicts that Sofia could shift to reviewing its decisions rather than making them herself.
The PM Agent Team sprint was one of several sprints that contributed to our headcount reduction from 13 to 8. We lost the coordinator role entirely. Sofia got significant time back, because the coordination and reporting that had consumed hours of her week were now handled by agents. Communication flowed faster to the team: what was past due, what was priority, what needed attention today. That information used to pass through a human bottleneck. Now it flows directly.
Email triage emerged as the next coordination challenge: how to categorize and route incoming email so agents could handle more of the coordination automatically. That became the next Compound Sprint.
Artifact: Sprint Retrospective
| Category | Detail |
|---|---|
| What worked | The information factory framing — mapping inputs, transformations, and outputs for each piece of work made it clear which pieces were automatable. Sofia building the system herself meant the Human Orchestrator understood it deeply from day one. The Roles/Tasks deconstruction in Source prevented us from designing against a job title instead of the actual work. |
| What didn’t work | n8n as the initial platform. Too brittle for production agent work. The pivot to a dedicated agent server cost time but was necessary. |
| What changes for next sprint | Agent teams can absorb more coordination work than we assumed. Start future sprints with higher ambition for what goes to agents, and let the Design Gate pull it back if the judgment requirements are real. |
| New constraints surfaced | Email triage emerged as the next coordination challenge — categorizing and routing incoming email so agents could handle more coordination automatically. |
| Knowledge captured | Sofia documented the agent system in the knowledge base. Rachel’s organic knowledge (client relationship context, unwritten rules per account) was partially lost — this is the cost of not running Source before the person leaves. |
| HAC update | PM Agent Team row confirmed in Hybrid Accountability Chart. Sofia as Human Orchestrator, five agents handling task creation, coordination, reporting, reminders, and data quality. |