From Pilots to Production: California's Operational Playbook for Governed AI
AI is already in the building. Not because every agency has formally adopted a frontier model, but because employees are bringing it into their workflows and vendors are embedding it into their products. The question facing California's public sector is no longer "Should we try AI?" It's "How do we move the experiments we're already running safely into production?"
That was the throughline of our recent fireside chat, where Darwin AI's Chief AI Officer Dustin Haisler and Linxar Senior Managing Director Bharat Bagaria sat down to talk through what it actually takes to scale governed AI in California. Here's the playbook that emerged.
California set the narrative. Now comes the hard part.
California moved early. Executive Order N-12-23 put a stake in the ground on safe, responsible public-sector AI, directing procurement reforms and introducing risk inventories and ethical guardrails. The statewide GenAI policy and the "Choose Your Own GenAI Journey" framework followed, and the legislative pipeline hasn't slowed since—roughly 30 AI-related bills have crossed over to a second chamber, touching everything from state operations down to local implementation.
Layer on the federal picture, where Washington has signaled it wants a say in which models even reach the public, and you get what Dustin called a patchwork: overlapping rules, shifting deadlines, and real uncertainty about whose legislation ultimately sticks.
His advice on that uncertainty is blunt: it is not a reason to do nothing. Existing rules already apply to AI today—acceptable use policies, data security standards, open-records laws. "This is not a license to say, let's let the courts figure it out and we'll do something afterwards," he noted. The agencies pulling ahead are the ones building on the foundation they already have, in a way that can adapt as the rules change.
Why pilots stall
Bharat sees the same three bottlenecks again and again in the field, and they're rarely about the technology itself.
First, visibility. Pilots are happening in pockets—different vendors, different departments, different IT corners—and nobody has a single picture of what's running, what's working, and what's quietly failing.
Second, procurement. California already attaches a GenAI supplement to contracts, but teams sign it without fully understanding what it commits them to, and some are afraid to check the box even when they are using GenAI. The result is a gap between what procurement expects and what vendors actually deliver, and pilots stall in the middle of it.
Third, data readiness. A pilot runs beautifully on a clean, small dataset—then breaks the moment real production data flows in. As Bharat put it, garbage in, garbage out: if the underlying data isn't clean, the output won't be either, and hallucinations stop being theoretical.
What governance actually looks like
Say governance and eyes glaze over. But Dustin argues it starts somewhere concrete: you can't govern what you can't see. Before any policy, agencies need an honest, real-time view of where AI actually lives in their environment—not what a survey says, but what employees and vendors are genuinely doing.
His analogy stuck with the room: deploying AI today is like handing every government employee a McLaren F1 and putting them on an open road with no speed-limit signs. Good governance doesn't confiscate the car. It puts up clear, easy-to-understand boundaries—guardrails the average employee doesn't have to look up in a manual.
In practice, that means meeting people where the work happens. Darwin demonstrated an employee pasting sensitive text into a free consumer chatbot to draft a reply. In the moment, the PII was redacted, the unsanctioned tool was flagged for IT, and the employee was educated—without driving the behavior onto a personal device. Over-restrict, and you simply push AI use into the shadows.
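For readers who want to picture that in-the-moment check, here is a minimal sketch in Python. It is not Darwin's implementation, which the session did not detail. The regex patterns, the `SANCTIONED_TOOLS` allowlist, and the `review_prompt` helper are all hypothetical, and real PII detection needs far more than three regular expressions.

```python
import re

# Hypothetical patterns for illustration only; production PII detection
# would cover many more formats and use more than regular expressions.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

# Hypothetical allowlist of tools the agency has approved.
SANCTIONED_TOOLS = {"copilot.example.gov"}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each PII match with a labeled placeholder; report what was found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label} REDACTED]", text)
        if count:
            found.append(label)
    return text, found


def review_prompt(text: str, tool_domain: str) -> dict:
    """Redact the prompt and flag the tool for IT if it is not sanctioned."""
    redacted, found = redact(text)
    return {
        "text": redacted,
        "pii_found": found,
        "flag_for_it": tool_domain not in SANCTIONED_TOOLS,
    }


result = review_prompt(
    "Draft a reply to Jane, SSN 123-45-6789, email jane@example.com",
    "chat.example.com",
)
print(result)
```

The design point matches the one in the session: the prompt still goes through, minus the sensitive fields, so the employee has no reason to move the work to a personal device.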
As for structure, there's no single right model—embedded governance, a center of excellence, a chief AI officer, or, increasingly at the state level, a hybrid of all three. The future is adaptive and contextual to each organization's culture.
A use case, end to end
To make it real, Bharat walked through social-services eligibility. Today a caseworker hops across several legacy systems to verify citizenship, income, and household details, plus fraud checks—work that can stretch to days per case.
The path to production follows a repeatable arc: intake (define the use case, secure a sponsor, name the business outcome—here, clearing the eligibility backlog), data (bring the sources together and assess the risk in aging, disconnected systems), procurement (nobody writes models from scratch anymore—find vendors with the right ones), pilot (run a small set of cases and check accuracy), then production with a crawl-walk-run rollout.
At one client, that approach cut average case processing from roughly four hours to three, a saving of about 25%, with a human still in the loop. Across 200,000 cases, the ROI landed at three to four times the investment.
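The arithmetic behind those figures is easy to rerun with your own numbers. The sketch below takes the rounded values at face value: four hours to three is a 25% reduction, and one hour saved across 200,000 cases is 200,000 staff hours. The loaded hourly cost and the investment amount are hypothetical placeholders, not figures from the session. They are chosen only to show how a multiple in the three-to-four range could arise.

```python
def percent_reduction(before_hours: float, after_hours: float) -> float:
    """Percentage drop in processing time per case."""
    return (before_hours - after_hours) / before_hours * 100


def hours_saved(cases: int, before_hours: float, after_hours: float) -> float:
    """Total staff hours saved across all cases."""
    return cases * (before_hours - after_hours)


def roi_multiple(total_hours_saved: float, hourly_cost: float, investment: float) -> float:
    """Value of the hours saved, divided by what the program cost."""
    return total_hours_saved * hourly_cost / investment


reduction = percent_reduction(4, 3)   # figures from the article
saved = hours_saved(200_000, 4, 3)    # figures from the article

# Hypothetical inputs: substitute your agency's own numbers.
HOURLY_COST = 60.0        # assumed loaded cost per staff hour, in dollars
INVESTMENT = 3_500_000.0  # assumed total program investment, in dollars

print(f"Reduction per case: {reduction:.0f}%")
print(f"Hours saved: {saved:,.0f}")
print(f"ROI multiple: {roi_multiple(saved, HOURLY_COST, INVESTMENT):.1f}x")
```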
Dustin's caution: don't do AI for AI's sake. Ground every use case in a business problem, give it a clear owner (too many owners and it stalls in groupthink), and remember that multi-model is the future—Copilot, ChatGPT, Claude, and vendor-embedded AI will coexist, which makes a common data layer essential to avoid new silos.
The costs nobody budgets for
Here's the number that surprises people: the software is only 10–15% of the total. Bharat's breakdown puts data cleanup and migration at 30–40% (some California efforts spent two years just getting data ready) and transformation, governance, and independent verification and validation (IV&V) setup at another 20–25%. Then come the line items that get quietly ignored: change management, which has to start *before* the POCs, and ongoing model monitoring, which is heavier than traditional application monitoring.
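To see what that breakdown implies for a budget, here is a rough sketch that works backward from a software quote. The percentage ranges come from Bharat's breakdown. The $1M software quote is a hypothetical input, and treating everything left over as change management plus monitoring is an assumption of this sketch, not a figure from the session.

```python
# Share of total program cost, in whole percentages (low, high), per the breakdown above.
SHARES = {
    "software": (10, 15),
    "data_cleanup_and_migration": (30, 40),
    "transformation_governance_ivv": (20, 25),
}


def remainder_share(shares: dict[str, tuple[int, int]] = SHARES) -> tuple[int, int]:
    """Percentage left for the unlisted items (assumed here to be change
    management and model monitoring), as a (low, high) range."""
    low_sum = sum(low for low, _ in shares.values())
    high_sum = sum(high for _, high in shares.values())
    return 100 - high_sum, 100 - low_sum


def total_budget_range(software_cost: float) -> tuple[float, float]:
    """Implied total program cost if software is 10-15% of the whole."""
    low_pct, high_pct = SHARES["software"]
    return software_cost * 100 / high_pct, software_cost * 100 / low_pct


SOFTWARE_QUOTE = 1_000_000  # hypothetical vendor quote, in dollars

low_total, high_total = total_budget_range(SOFTWARE_QUOTE)
low_rest, high_rest = remainder_share()
print(f"Implied total program cost: ${low_total:,.0f} to ${high_total:,.0f}")
print(f"Left for change management and monitoring: {low_rest}% to {high_rest}%")
```

Read this way, the listed categories account for 60 to 80% of the total, so the items that tend to go unbudgeted could be anywhere from a fifth to two fifths of the program.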
Start small, start now
Small agency without a dedicated AI team? You may be better positioned than the giants. You can be nimble—you just need a clear owner, a process, and guardrails, not a chief AI officer. Get scrappy: partner with local colleges hunting for real-world test cases, or band together with a neighboring city or county.
The first move on Monday morning is the same for everyone: get visibility into the AI already in your environment, apply the rules you already have, and bring your staff along with honest change management. Inventory what's happening, then pick one use case and run it well.
---
Darwin AI and Linxar are partnering to help California agencies turn governance from a PDF into operational reality. If you'd like the follow-up resources from this session—a use case prioritization scorecard, a cost and ROI model, or a first 100-day plan—reach out and connect with us on LinkedIn.