California's AI Executive Order Is a Good Start. Here's What Needs to Come Next.

The state has advanced meaningful policy on its use of AI, but Daniel C. Kim explores how it should continue to evolve.

Gov. Gavin Newsom signed Executive Order N-5-26 at the end of March, and it deserves a careful read. It builds on his 2023 order but moves the state from policy design into implementation. It directs deliverables from the Department of General Services (DGS) and the California Department of Technology (CDT) within 120 days, opens vetted generative AI (GenAI) tools to state employees, and lays the groundwork for new vendor standards around bias and civil rights. There is a lot to like here, and a lot that will be hard to execute.

Having spent years inside state government, I know how these directives play out. The 2023 order was the right move at the right time. AI was moving fast, risks were poorly understood, and the state needed guardrails before departments started using these tools in consequential decisions.

But parts of that framework are showing their age. CIOs and procurement officials tell me the current process doesn't appropriately weigh risk. Disclosure requirements get triggered on contracts where AI plays no meaningful role. Routine low-risk uses sit in the approval queue for months alongside high-stakes proposals.

The framework is also slowing down work that should be happening now. One CIO at a large department recently told me that agentic AI tools could compress code development that takes months into hours. The state should be racing toward this rather than leaving it idling at the curb. We should test drive it as we write the rules of the road.

WHERE THE NEW ORDER GETS IT RIGHT


N-5-26 makes some smart moves. Giving state employees access to vetted GenAI tools is the most overdue piece. Under the current rules, vendors must still disclose GenAI use, complete risk assessments, and wait for CDT before they can proceed. That process needs streamlining too, while still preventing the illegal content, harmful bias and civil rights violations the order wants to address. The state has moved past asking whether AI is in the product. It is rightly asking whether AI can cause harm. But that's where implementation gets complicated, and where the order still has work to do.

NOT ALL AI CARRIES THE SAME RISK


The framework groups very different uses of AI into the same process, a "one size fits all" approach. There is a meaningful difference between AI that influences decisions affecting residents and AI that speeds up internal work. Where a benefits eligibility decision turns on a straightforward calculation or formula, AI can do that work faster and more consistently than a human. Where the decision involves judgment, interpretation or weighing competing factors, we need more human review. Either way, there should still be human supervision, the same way a supervisor reviews the work of staff. The question is the intensity of that review, not whether it exists.

That distinction needs to live somewhere in policy. CDT is the natural place to make those calls, but not in a vacuum. Agency information officers across state government, along with the vendor community, should have a real seat at the table. We talk often about collaborative product development. We should apply that same idea to collaborative policy development.

WHAT THE CERTIFICATION STANDARDS SHOULD ACCOUNT FOR


The order doesn't create a vendor certification framework on its own. It directs DGS and CDT to recommend one within 120 days. I am glad to see the administration revisiting its approach, and now is the right moment to think carefully about what those certifications should try to do.

A certification may ask vendors to attest that their AI does not display harmful bias or violate civil rights. The intent is right, but a certification alone cannot guarantee that outcome. Bias rarely exists as an abstract property. It shows up in different applications and data sets, under conditions nobody fully anticipated. Asking a vendor to certify the absence of bias before deployment is asking them to guarantee how their tool will behave once a department configures it and puts it to use. It's like asking a car manufacturer to promise that drivers won't crash their cars.

Certifications still have value in setting expectations. But they work best paired with something that catches problems where they emerge, in the way a system gets used.

A MORE PRACTICAL APPROACH: UAT AND ONGOING MONITORING


The state already knows how to validate systems before they go live. We call it user acceptance testing (UAT), and it has been a core part of every major IT implementation I have been involved with. Instead of relying on a vendor's pre-deployment certification, the state should require joint testing in a controlled environment that mirrors how the system will operate. Test with representative data and build measurable baselines into the contract.

Ironically, agentic AI makes this oversight easier. Building test personas and edge cases used to require weeks of manual work from business analysts, and we never had enough test scripts. With AI tools, analysts can generate a much richer set of scenarios, then refine and approve them before launch. The work shifts from production to supervision, exactly where experienced state staff add the most value. Humans serve as the architects, engineers and inspectors. AI hammers the nails and lays the pipe.

For departments that lack the capacity to lead this testing, the state could consider an independent adviser along the lines of the criteria architect model used in state construction. It should not be mandatory, but available for departments that want it.

UAT should be the primary check, but not the only one. The order is silent on what happens after a system goes live, and that is the gap I would most urge the state to close. The state needs a systematic way to conduct ongoing AI audits, not a one-time exercise but a routine part of operating any AI-enabled system. A risk-scoring tool might be audited quarterly against a fresh sample of cases to confirm its outputs still align with policy. Systems need revalidation as policies evolve, populations shift and data changes.
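To make the quarterly audit idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Case` and `AuditResult` types, the `audit_sample` function, and the idea of comparing the tool's output with an outcome a human reviewer determined under current policy. The baseline agreement rate stands in for whatever measurable threshold a contract might set.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """One closed case (hypothetical shape)."""
    case_id: str
    inputs: dict
    expected: str  # outcome a human reviewer determined under current policy


@dataclass(frozen=True)
class AuditResult:
    sampled: int
    agreed: int
    agreement_rate: float
    passed: bool
    disagreements: tuple  # case IDs to send for human follow-up


def audit_sample(
    cases: Sequence[Case],
    score: Callable[[dict], str],
    sample_size: int,
    baseline: float,
    seed: int,
) -> AuditResult:
    """Draw a fresh random sample and compare tool output with policy."""
    rng = random.Random(seed)  # fixed seed keeps the audit reproducible
    sample = rng.sample(list(cases), min(sample_size, len(cases)))
    if not sample:
        raise ValueError("no cases available to audit")

    # Collect every case where the tool disagrees with the reviewer.
    disagreements = tuple(
        case.case_id for case in sample if score(case.inputs) != case.expected
    )
    agreed = len(sample) - len(disagreements)
    rate = agreed / len(sample)
    return AuditResult(len(sample), agreed, rate, rate >= baseline, disagreements)
```

Rerun each quarter with a new sample, a check like this yields a pass or fail against the baseline plus a list of specific cases for staff to examine, which is where human judgment comes back in.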

AUDIT LOGS AND VENDOR TRANSPARENCY


Auditability is the third piece. A well-designed audit log lets the state understand how a decision was reached, which inputs mattered and what would have changed the outcome. This is one area where AI can outperform humans, but only if someone reviews the logs. The state should define what triggers that review and how it is carried out: a complaint, a statistical anomaly, periodic sampling or a formal audit.
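As a rough illustration of what such a log and its review triggers could look like, here is a short Python sketch. The record fields, the `ReviewTrigger` names and the every-Nth sampling rule are all hypothetical and not drawn from any state system or vendor product. Only two of the four triggers are implemented; anomaly detection and formal audits would need their own logic.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewTrigger(Enum):
    COMPLAINT = "complaint"
    ANOMALY = "statistical_anomaly"   # not implemented in this sketch
    SAMPLING = "periodic_sampling"
    FORMAL_AUDIT = "formal_audit"     # not implemented in this sketch


@dataclass(frozen=True)
class DecisionRecord:
    """One logged decision (hypothetical shape)."""
    decision_id: str
    inputs: dict    # the values the system saw
    outcome: str    # what the system decided
    factors: dict   # factor name -> weight the system reported
    complaint_filed: bool = False


def top_factors(record: DecisionRecord, n: int = 3) -> list:
    """Return the factor names that mattered most, largest weight first."""
    ranked = sorted(
        record.factors, key=lambda name: abs(record.factors[name]), reverse=True
    )
    return ranked[:n]


def review_queue(records: list, sample_every: int = 50) -> list:
    """Pick records for human review: every complaint, plus every Nth record."""
    queue = []
    for index, record in enumerate(records):
        if record.complaint_filed:
            queue.append((record.decision_id, ReviewTrigger.COMPLAINT))
        elif index % sample_every == 0:
            queue.append((record.decision_id, ReviewTrigger.SAMPLING))
    return queue
```

The point of the sketch is that each log entry carries enough to answer "which inputs mattered," and that the rule for who looks at it is written down rather than left to chance.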

Vendor cooperation will be essential, but this will likely generate some pushback. Most large state IT systems are configurable COTS or MOTS platforms. The business rules, prompts, training approaches and configuration belong to the state or should be transparent to it. Beyond legitimately proprietary intellectual property, the state owns the work performed on its dime, and it should not accept vendor lock-in as the cost of doing business.

USING THE 120 DAYS WELL


The 120-day window is a real opportunity to sharpen this. The challenge now is an implementation framework that focuses oversight where real risk exists, frees up lower-risk uses and builds the ongoing monitoring that makes the rest credible. California has a chance to set a meaningful standard here, not just for itself but for other states watching.

Daniel C. Kim is director of procurement for the Weideman Group. His 25+ years of experience in state and local government includes serving as director of California’s Department of General Services under two governors, in executive positions at three counties, and as president of the National Association of State Chief Administrators.