Why we put MCP at the center of FinButler
A short, candid memo on the decision to make agents and the Model Context Protocol first-class — and what changed in the product the day we stopped treating them as a feature.
By Erick Agrazal · Founder
We almost shipped a “sparkles” icon last quarter.
The PR was queued. A modal with a little starburst, a Suggest categories button, the usual. It worked — sort of. The model picked a category, the user accepted or rejected it, the modal closed, the dashboard moved on. The whole thing felt like a tab we’d added to a product that didn’t really want it there.
The thing that bothered me wasn’t the design. It was that on the same call that morning, two of us had been using Claude to triage a client’s transactions for a tax filing. Claude could read the spreadsheet. Claude could not change anything in FinButler. So we’d open the assistant, paste numbers, get suggestions, switch tabs, edit the transaction, save, switch back, paste the next row. The “AI feature” we were about to ship had nothing to do with how we, the people building this, were actually using AI.
So we pulled the PR.
This post is about what we did instead, why MCP is now the spine of FinButler, and what changed in the product the day we stopped treating agents as a feature.
The honest brief
The brief we wrote for ourselves after pulling that PR was small, but it bound everything else:
A person and an assistant should be able to do the same task in FinButler, under the same rules, without anyone copying data into a chat window.
That sentence does more work than it looks like. Three constraints fall out of it:
- Same data, same rules. Whatever an assistant can see has to go through the same permission model a teammate goes through. No backdoor service account, no “AI mode” that quietly elevates scope.
- Same surface area. Whatever an assistant can do should be a thing the product itself does. If the agent can categorise, a human can categorise the same way. If a human can invite a client, the agent can too — with the same approvals.
- No data exfiltration. Numbers should not be copy-pasted into an LLM context as the price of using an LLM. We can’t fix every workflow on the internet, but we can make ours not require it.
Those three lines mean you can’t bolt an agent onto a product. You have to bend the product around the agent. We didn’t love that, because it meant rewriting parts we’d shipped. But it’s the right call, and I’ll explain why with the part that took us the longest to figure out.
Workspaces had to come first
FinButler already had workspaces — team, firm, client. They existed because real finance work has multiple parties: a small business owner doesn’t need the same view as their bookkeeper, who doesn’t need the same view as their accountant, who definitely doesn’t need the same view as the bank.
The mistake we’d been making was treating the workspace boundary as a UI concern. You picked your workspace at the top of the page, and the product filtered. That worked when humans were doing the filtering with their eyes. It was a disaster the first time we asked an agent, “summarise this week.” Whose week? At what scope? With what visible-vs-hidden data?
So we did the boring, slow work first. The workspace boundary became a policy fabric — a single source of truth that every read, every write, every report, every export, and every agent call goes through. The UI sits on top of it. The agents sit on top of it. The MCP server sits on top of it. The dashboard, the accountants app, and the mobile app are three lenses on the same fabric.
If you take one architectural idea from this post, take that one: the permission system is the product, not a wrapper on top of it.
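To make the idea concrete, here is a minimal sketch of a single permission check that humans and agents both pass through. It is an illustration, not FinButler's actual code: `PolicyFabric`, `Actor`, `Transaction`, and the workspace names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    name: str
    kind: str  # "human" or "agent"; recorded for audit, never used to widen access
    workspaces: frozenset  # workspaces this actor has been granted


@dataclass(frozen=True)
class Transaction:
    id: str
    workspace: str
    amount: float


class PolicyFabric:
    """Hypothetical single entry point for workspace-scoped reads."""

    def __init__(self, transactions):
        self._transactions = list(transactions)

    def list_transactions(self, actor, workspace):
        # One check for every caller: UI, MCP server, or report job.
        if workspace not in actor.workspaces:
            raise PermissionError(f"{actor.name} cannot read {workspace!r}")
        return [t for t in self._transactions if t.workspace == workspace]


fabric = PolicyFabric([
    Transaction("t1", "client-acme", 42.00),
    Transaction("t2", "firm-internal", 980.00),
])
bookkeeper = Actor("dana", "human", frozenset({"client-acme"}))
assistant = Actor("dana-assistant", "agent", frozenset({"client-acme"}))

# Same grant, same answer, regardless of who is asking.
human_view = fabric.list_transactions(bookkeeper, "client-acme")
agent_view = fabric.list_transactions(assistant, "client-acme")
```

The point of the sketch is that `kind` never appears in the check. An assistant asking for a workspace it was not granted gets the same `PermissionError` a teammate would.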
Why MCP, not a custom GPT plugin
Once the workspace fabric was real, the question was how an assistant should talk to it. We considered three options:
- A custom GPT in OpenAI.
- A Claude desktop extension.
- An MCP server we host.
We seriously considered shipping all three. We weren’t purists about MCP; we wanted what worked. What pushed us decisively to MCP was a meeting where someone asked, “If a customer’s firm standardises on Claude, do we have to maintain a separate plugin from the ChatGPT one? And what happens when Cursor or another tool shows up that we want to support?”
Maintaining N plugins for the same workspace was a future tax we couldn’t afford. MCP let us write the integration once. Claude, ChatGPT (through MCP-aware connectors), Cursor, Continue, and anything else MCP-friendly all point at the same hosted endpoint and see the same tools: list_transactions, categorize_pending, create_budget_review, invite_client, and export_report. The protocol is dull. That’s why it’s good.
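A rough sketch of what "write the integration once" means in practice: one table of tools, and a dispatcher that gives every client the same answer. This is plain Python rather than a real MCP SDK. Only the tool names come from the list above; the `tool` decorator, `list_tools`, `call_tool`, and the placeholder return values are assumptions made for illustration.

```python
TOOLS = {}


def tool(fn):
    """Register a function under its own name in the shared tool table."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def list_transactions(workspace):
    # Placeholder data; a real handler would read through the permission layer.
    return [{"id": "t1", "workspace": workspace, "amount": 42.0}]


@tool
def categorize_pending(workspace):
    # Returns proposals only; nothing is applied here.
    return [{"id": "t1", "proposed_category": "Meals"}]


def list_tools(client_name):
    # The client name is accepted for logging but does not change the answer.
    return sorted(TOOLS)


def call_tool(client_name, tool_name, arguments):
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](**arguments)


# Two different clients, one integration.
claude_tools = list_tools("claude")
cursor_tools = list_tools("cursor")
```

Adding a new client means pointing it at the endpoint. Adding a new tool means adding one function, and every client sees it.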
The other reason was emotional, and I’ll own it. We didn’t want to be a “custom GPT company” the way some products are “Slack-app companies”: a brand riding on someone else’s platform, hostage to their roadmap. MCP feels more like USB-C than like vendor lock-in. We can support new clients as they show up without rebuilding our integration story.
Held-for-review is not a feature, it’s the default
The other thing that changed once we stopped bolting agents on: held-for-review stopped being an “advanced setting” and became the default behaviour.
When a person opens FinButler and clicks Suggest categories, the agent runs and proposes. Nothing is applied. You see the changes pending and you decide.
When an assistant calls categorize_pending through MCP, the agent runs and proposes. Nothing is applied. The assistant tells you what it suggested. You decide.
Both go through the same code path. There is no “automation mode” that quietly applies things; there’s a per-workspace threshold setting that says, “for transactions under $50, auto-apply categorisation that matches my last 90 days of decisions.” The user wrote that rule. The agent doesn’t override it. The MCP client doesn’t override it. Even a future “fully autonomous” agent would have to obey it.
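Here is a small sketch of how a rule like that could be evaluated. It makes assumptions the post does not spell out: "matches my last 90 days of decisions" is read as "the user gave this merchant this category within the window", and `AutoApplyRule`, `Decision`, `Proposal`, and `review` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Decision:
    merchant: str
    category: str
    decided_on: date


@dataclass(frozen=True)
class Proposal:
    transaction_id: str
    merchant: str
    amount: float
    category: str


@dataclass(frozen=True)
class AutoApplyRule:
    max_amount: float = 50.0  # "for transactions under $50"
    lookback_days: int = 90   # "my last 90 days of decisions"


def review(proposals, rule, history, today):
    """Split proposals into (applied, held). With no rule, everything is held."""
    recent = set()
    if rule is not None:
        cutoff = today - timedelta(days=rule.lookback_days)
        recent = {(d.merchant, d.category) for d in history if d.decided_on >= cutoff}
    applied, held = [], []
    for p in proposals:
        auto = (
            rule is not None
            and p.amount < rule.max_amount
            and (p.merchant, p.category) in recent
        )
        (applied if auto else held).append(p)
    return applied, held


today = date(2025, 6, 1)
history = [
    Decision("Coffee Co", "Meals", date(2025, 5, 1)),
    Decision("Old Vendor", "Software", date(2024, 12, 1)),  # outside the window
]
proposals = [
    Proposal("t1", "Coffee Co", 4.50, "Meals"),
    Proposal("t2", "Coffee Co", 120.00, "Meals"),    # over the threshold
    Proposal("t3", "Old Vendor", 20.00, "Software"), # no recent matching decision
]

applied, held = review(proposals, AutoApplyRule(), history, today)
default_applied, default_held = review(proposals, None, history, today)
```

The caller's identity is not a parameter. Whether the proposals came from the Suggest categories button or from categorize_pending over MCP, only the user's rule decides what gets applied, and with no rule set everything is held.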
This is the part our early users keep telling us is the difference. Not the model. Not the chat. The fact that approval rules belong to the user, not the platform.
What we cut
If this all sounds tidy, it wasn’t free: we cut a lot. We removed a half-built “AI insights” page that lived next to the dashboard, because it duplicated state. We dropped a plan tier called “AI” because it implied that AI was a feature; it isn’t. We rewrote two onboarding flows because they introduced data into the system that didn’t fit the workspace fabric.
We are also still slower than I want us to be at adding banks. Putting workspaces and MCP at the centre took weeks we hadn’t budgeted. My answer to that is: OK, it took weeks, and now everything we ship afterwards inherits the structure for free. Slow up front, fast afterwards.
What you should expect from us
If you’re looking at FinButler and trying to decide whether this is the right place to put your numbers, here is what I want you to be able to count on:
- Your data stays in the workspace. Agents and assistants query it; they don’t take it home.
- Approvals are yours, by default. If you turn “auto” on, you wrote the rule.
- The MCP endpoint is hosted. You don’t maintain servers, but you can audit every action.
- New assistant clients will keep showing up, and we’ll support them through the same MCP endpoint.
If we ever ship a sparkles modal that ignores any of that, please email me. I owe you a coffee.
— Erick