How Code Execution Enables Accurate AI Accounting

Philip Andersson

At Bluebook, we are building AI that understands accounting deeply enough to execute real financial workflows, not just describe them as checklists. Whether it is summing a spreadsheet, calculating KPIs from a P&L, or generating long amortization schedules, precision is non-negotiable. Every number must be right.

As we have scaled our use of AI to automate financial operations, one truth has become clear: accuracy depends as much on how an AI uses context as it does on what it knows. This is where code execution has become a game changer.

From Reasoning to Execution

Anthropic’s recent article on Code Execution with MCP captured something we have felt firsthand at Bluebook. Two patterns in large-scale agent workflows repeatedly create inefficiency and inaccuracy:

Tool definitions overloading the context window
Intermediate tool results consuming unnecessary tokens

These issues are not just about higher cost or latency. They degrade task accuracy by overloading the model with irrelevant or redundant information. In accounting, that can mean a miscalculated KPI, an incomplete journal entry, or a broken reconciliation.

The solution Anthropic describes, code execution with MCP, allows agents to use context more intelligently. Instead of reasoning through every step in natural language, the agent can load tools on demand, filter data before it reaches the model, and execute complex logic in a single step. It is also far more efficient than looping between multiple tool calls and “sleep” commands through the agent cycle.

This shift changes everything about how we think of AI for accounting.

Why Bluebook Is Leaning into Code Execution

At Bluebook, we are designing an AI system that sits on top of existing accounting and finance tools like QuickBooks Online, NetSuite, and Xero. Our goal is to automate key workflows that are still heavily manual today: prepaid schedules, accruals, FP&A variance analyses, and spreadsheet-based reconciliations.

These workflows require both reasoning and calculation. Traditional LLM-based agents can understand instructions, but they struggle to maintain accuracy when asked to perform step-by-step computations. By combining reasoning with on-demand code execution, our system can reason about accounting logic and then execute the actual computation with deterministic precision.

For example:

When generating a prepaid expense schedule, the model can identify relevant transactions in QuickBooks, determine the appropriate amortization period, and then generate code to allocate the expense month by month.
In an FP&A scenario, it can run SQL queries against structured data to calculate variances or margins, limiting both the number of rows and columns before analysis to stay context-efficient.
For accruals, the agent can automatically identify unbilled expenses, compute the adjusting entries, and post summaries back into the accounting system, all through executable code.
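To make the prepaid example concrete, here is a minimal sketch of the kind of code an agent might generate for the allocation step. It assumes straight-line recognition by calendar month and uses only the Python standard library; the function name `amortization_schedule` and the sample amounts are illustrative, not Bluebook's actual implementation.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP


def amortization_schedule(total: Decimal, start: date, months: int) -> list[tuple[date, Decimal]]:
    """Allocate a prepaid amount evenly by month (straight-line is an assumption)."""
    if months <= 0:
        raise ValueError("months must be positive")
    cent = Decimal("0.01")
    monthly = (total / months).quantize(cent, rounding=ROUND_HALF_UP)
    schedule = []
    for i in range(months):
        # Advance the calendar month by i, rolling over the year as needed.
        years_ahead, month_index = divmod(start.month - 1 + i, 12)
        period = date(start.year + years_ahead, month_index + 1, 1)
        # The last period takes whatever remains so the rows sum exactly to total.
        amount = monthly if i < months - 1 else total - monthly * (months - 1)
        schedule.append((period, amount))
    return schedule


# Hypothetical input: a 1,000.00 prepaid recognized over three months.
schedule = amortization_schedule(Decimal("1000.00"), date(2025, 11, 1), 3)
for period, amount in schedule:
    print(period.isoformat(), amount)
```

Using `Decimal` rather than floating point, and letting the final month absorb the rounding difference, keeps the schedule total equal to the original prepaid amount to the cent.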

Managing the Dimensions of Data

Financial data is both wide and deep. A P&L may have hundreds of accounts and dozens of metrics. Pulling all that data into a model context is wasteful. Code execution lets us manage not only the volume but also the dimensions of data before the model sees it.

Using SQL and GraphQL, we can query only the rows and columns relevant to a given workflow. For example, when calculating SaaS KPIs, the agent can request only the “Revenue” and “Operating Expense” accounts from the general ledger instead of the entire chart of accounts. When deeper manipulation is needed, we use Pandas for in-memory transformations: aggregating, pivoting, or cleaning data without ever leaving the execution sandbox.
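As a rough illustration of narrowing both rows and columns before anything reaches the model, the sketch below runs a query against an in-memory SQLite database. SQLite stands in for whatever data source is actually used, and the `general_ledger` table, its columns, and the sample figures are all hypothetical. Amounts are stored as integer cents to avoid floating-point drift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE general_ledger "
    "(account TEXT, category TEXT, period TEXT, amount_cents INTEGER, memo TEXT)"
)
conn.executemany(
    "INSERT INTO general_ledger VALUES (?, ?, ?, ?, ?)",
    [
        ("Subscription Revenue", "Revenue", "2025-10", 12_000_000, "sample memo"),
        ("Services Revenue", "Revenue", "2025-10", 1_500_000, "sample memo"),
        ("Salaries", "Operating Expense", "2025-10", 7_000_000, "sample memo"),
        ("Software", "Operating Expense", "2025-10", 500_000, "sample memo"),
        ("Interest Income", "Other Income", "2025-10", 20_000, "sample memo"),
        ("Subscription Revenue", "Revenue", "2025-09", 11_000_000, "sample memo"),
    ],
)


def category_totals(conn: sqlite3.Connection, period: str, categories: list[str]) -> dict[str, int]:
    """Return one total per requested category; memos and other columns never leave the database."""
    placeholders = ", ".join("?" for _ in categories)
    query = (
        "SELECT category, SUM(amount_cents) FROM general_ledger "
        f"WHERE period = ? AND category IN ({placeholders}) "
        "GROUP BY category ORDER BY category"
    )
    return dict(conn.execute(query, [period, *categories]).fetchall())


totals = category_totals(conn, "2025-10", ["Revenue", "Operating Expense"])
print(totals)
```

Here six ledger rows with five columns collapse into two numbers, which is all the model needs to reason about a margin.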

The impact is smaller and more precise context, lower cost, faster execution, and more consistent results. The model can focus on reasoning instead of data wrangling.

Privacy-Preserving Operations

In finance, privacy is not optional. Traditional AI agent architectures can unintentionally expose sensitive data because intermediate results often pass through the model context. Code execution eliminates this risk.

With this approach, intermediate results remain inside the execution environment by default. The agent only sees what is explicitly returned. That means sensitive data, such as individual transactions, payroll details, or customer identifiers, can flow through the workflow without ever entering the model’s context.
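A small sketch of this pattern, under the assumption that the function below runs inside the execution environment and that only its return value is passed back to the model. The record fields and the name `summarize_payroll` are hypothetical.

```python
from decimal import Decimal


def summarize_payroll(records: list[dict]) -> dict:
    """Aggregate payroll rows; names and employee IDs are read but never returned."""
    total = Decimal("0")
    by_department: dict[str, Decimal] = {}
    for record in records:
        pay = Decimal(record["gross_pay"])
        total += pay
        department = record["department"]
        by_department[department] = by_department.get(department, Decimal("0")) + pay
    return {
        "employee_count": len(records),
        "total_gross_pay": str(total),
        "by_department": {d: str(v) for d, v in sorted(by_department.items())},
    }


# Hypothetical rows that would stay inside the sandbox.
records = [
    {"employee_id": "E-001", "name": "Example Person A", "department": "R&D", "gross_pay": "8000.00"},
    {"employee_id": "E-002", "name": "Example Person B", "department": "R&D", "gross_pay": "7500.50"},
    {"employee_id": "E-003", "name": "Example Person C", "department": "Sales", "gross_pay": "6000.00"},
]
summary = summarize_payroll(records)
print(summary)
```

Note that aggregation alone is not a complete privacy control: a department with a single employee would still reveal that person's pay, so a real system would likely need a minimum group size or similar rule.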

This design enables privacy-preserving operations by default. At Bluebook, we see this as a foundation for trust. Accountants and finance teams can safely delegate tasks like reconciliations, allocations, or report generation to the AI while ensuring confidential data stays contained.

Type Safety and Predictable Behavior

Allowing an AI to execute code raises an obvious question: how do we keep it predictable and safe?

Our answer is TypeScript + Zod. Together, they provide type safety and runtime validation for every interaction between the model and the execution environment. When the model writes code, we validate its structure and enforce constraints before execution, then verify its outputs before they are used. This ensures every generated calculation, whether it is a deferred revenue schedule or a variance report, behaves deterministically.
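The idea of runtime validation can be sketched independently of any particular library. The example below is written in plain Python rather than TypeScript + Zod, purely to illustrate the kind of checks a schema might enforce on a generated journal entry; the `JournalLine` type, the rules, and the account names are assumptions, not Bluebook's actual schema.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class JournalLine:
    account: str
    debit: Decimal
    credit: Decimal


def validate_entry(lines: list[JournalLine], allowed_accounts: set[str]) -> list[str]:
    """Return a list of validation errors; an empty list means the entry passes."""
    errors = []
    for i, line in enumerate(lines):
        if line.account not in allowed_accounts:
            errors.append(f"line {i}: unknown account {line.account!r}")
        if line.debit < 0 or line.credit < 0:
            errors.append(f"line {i}: amounts must be non-negative")
        # Each line must carry a debit or a credit, but not both and not neither.
        if (line.debit > 0) == (line.credit > 0):
            errors.append(f"line {i}: exactly one of debit or credit must be set")
    total_debit = sum((line.debit for line in lines), Decimal("0"))
    total_credit = sum((line.credit for line in lines), Decimal("0"))
    if total_debit != total_credit:
        errors.append(f"entry does not balance: debits {total_debit} vs credits {total_credit}")
    return errors


accounts = {"Prepaid Expenses", "Insurance Expense"}
good = [
    JournalLine("Insurance Expense", Decimal("333.33"), Decimal("0")),
    JournalLine("Prepaid Expenses", Decimal("0"), Decimal("333.33")),
]
bad = [
    JournalLine("Insurance Expense", Decimal("333.33"), Decimal("0")),
    JournalLine("Prepaid Expense", Decimal("0"), Decimal("300.00")),  # misspelled account, wrong amount
]
print(validate_entry(good, accounts))
print(validate_entry(bad, accounts))
```

An entry that fails any check would be rejected before it is posted, in the same way a schema validator rejects a malformed payload.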

Type safety gives our system the same kind of reliability controls that accountants rely on in traditional workflows: reconciliation, auditability, and repeatability.

Efficiency and Governance at Scale

Efficiency and governance usually pull in opposite directions. In accounting automation, you often get one at the cost of the other. Code execution changes that balance.

By handling data transformations outside the model, we minimize both token usage and exposure risk. Every piece of information that enters the model has already been filtered, validated, and reduced to what is strictly necessary. This makes the system not only faster but also more auditable.

Each code execution can be logged, versioned, and re-run, which is a key feature for financial systems that require traceability. When the AI generates a journal entry or a forecast, we can trace exactly how it got there, what data it used, and what logic it applied. That level of reproducibility is rare in AI systems and essential in finance.
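One way such a trail could be built is to hash the generated code and its inputs alongside the output, so a later re-run can be compared against the original record. The sketch below is an assumption about how this might look, not a description of Bluebook's logging. In particular, Python's built-in `exec` is used only as a stand-in for a real sandbox and is not safe for untrusted code.

```python
import hashlib
import json


def audit_record(code: str, inputs: dict, output: dict) -> dict:
    """Build a log entry tying an output to the exact code and inputs that produced it."""

    def digest(value: str) -> str:
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    return {
        "code_sha256": digest(code),
        # sort_keys makes the hash independent of dictionary key order.
        "inputs_sha256": digest(json.dumps(inputs, sort_keys=True)),
        "output": output,
    }


def run_logged(code: str, inputs: dict) -> dict:
    """Execute code that defines run(inputs) and return its audit record.

    exec() is a placeholder for an isolated execution environment.
    """
    namespace: dict = {}
    exec(code, namespace)
    output = namespace["run"](inputs)
    return audit_record(code, inputs, output)


# Hypothetical generated code and inputs.
code = "def run(inputs):\n    return {'total': sum(inputs['amounts'])}\n"
first = run_logged(code, {"amounts": [100, 250, 50]})
second = run_logged(code, {"amounts": [100, 250, 50]})
print(first["output"], first == second)
```

Because the code is deterministic, re-running it with the same inputs yields an identical record, and any change to the code or the data shows up as a different hash.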

Real Examples in Action

Here are a few examples of how this approach works inside Bluebook:

Automating accruals:
The AI identifies vendor invoices that have not yet been received in QuickBooks. It calculates accruals based on prior patterns, generates adjusting entries, and posts them automatically without exposing any underlying invoice data to the model.

Prepaids and amortization:
It extracts prepaid transactions, determines the recognition period, and executes code that builds an amortization schedule in Pandas. The final entries are summarized and pushed back to the accounting system.

FP&A and reporting:
By connecting to data sources like NetSuite or Google Sheets, the AI can calculate budget-to-actuals, margins, or headcount ratios. Using SQL and GraphQL queries, it limits both the dataset size and dimensions before performing analysis. The model then explains the results in plain English.
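For the budget-to-actuals case, the calculation itself is simple enough to sketch in a few lines. The function name, account names, and figures below are hypothetical, and the sketch assumes the budget and actual totals have already been reduced to one number per account.

```python
from decimal import Decimal, ROUND_HALF_UP


def budget_variance(budget: dict[str, Decimal], actual: dict[str, Decimal]) -> list[dict]:
    """Compare actuals to budget per account; percent is None when there is no budget."""
    rows = []
    for account in sorted(budget.keys() | actual.keys()):
        budgeted = budget.get(account, Decimal("0"))
        spent = actual.get(account, Decimal("0"))
        variance = spent - budgeted
        if budgeted == 0:
            pct = None  # avoid dividing by zero for unbudgeted accounts
        else:
            pct = (variance / budgeted * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        rows.append({"account": account, "variance": variance, "variance_pct": pct})
    return rows


budget = {"Revenue": Decimal("100000"), "Salaries": Decimal("60000")}
actual = {"Revenue": Decimal("110000"), "Salaries": Decimal("63000"), "Travel": Decimal("1200")}
rows = budget_variance(budget, actual)
for row in rows:
    print(row)
```

Only these few summary rows would need to reach the model, which can then explain them in plain English.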

Each of these workflows demonstrates how AI code execution can turn accounting logic into something executable, auditable, and secure.

A New Era for Accounting AI

We are only beginning to see what is possible when AI agents can write and execute code safely. The ability to sit on top of existing systems like QuickBooks Online and NetSuite and to orchestrate tasks across spreadsheets, CRMs, and databases opens the door to a new generation of AI for accounting.

This is not about replacing accountants. It is about giving them a system that understands accounting deeply enough to automate the repetitive tasks (reconciling, allocating, forecasting, and so on) while keeping accountants in control.
