How AI Projects Beget Data Work in State Government

At the recent California Government Innovation Summit, representatives of several agencies discussed their AI projects — and the prep work they needed to do on the data front before they could achieve their goals.

If a government agency has ambitions to work with artificial intelligence, there's a good chance there's other IT work it will need to do first: getting its data house in order.

That was a repeated theme of discussions about AI at last week's California Government Innovation Summit* in Sacramento. One panel focused on an effort, spearheaded by the Office of Data and Innovation (ODI), to run proof-of-concept projects for generative AI use cases at different state departments.

Two of those projects were at the Department of Transportation (Caltrans). When asked what investment he would prioritize for AI readiness, Inder Preet Singh of Caltrans said: “Data, data, data.”

The Caltrans work involved making use of the massive amounts of data the department receives daily from, among other things, thousands of sensors on roads across the state.

“We have a lot of disparate data sets,” said Singh, deputy division chief of transformational mobility at Caltrans, during the panel. “Something is in one organization, within one division, another division — to bring that all and layer it in one area, show it on a GIS map, I think that opened a lot of stuff for a lot of people. Even that by itself, without even prompting, was a huge success, for people to know ‘Oh OK, all of this is available? I did not know that, because I’ve never used it.’”

That kind of data work was not only necessary to enable the AI use cases Caltrans had in mind, it also produced the kind of data framework that can be useful for other projects.

A similar thing happened with the proof-of-concept project at the Department of Public Health (CDPH). There, the department was seeking ways to streamline the process of health-care facility inspection: Some 900 state surveyors regularly enter those facilities and note deficiencies that need correcting. Sometimes those notes are handwritten, and they need to be compared against two manuals before a deficiency can be properly reported.

Given all that manual effort, CDPH sought to use AI to speed things up, and it did so with success. But along the way, the department needed to do data prep work to reach its goals. That included applying optical character recognition technology to the handwritten notes — technology that can have broad applications once introduced in an organization.

Efrain Cornejo, chief of the Business Operations Branch within CDPH’s Center for Health Care Quality, said the team would have benefited from doing more discovery work surrounding its data before beginning the project. That might have made the sensitivities of that health-care data — and what they needed to do to protect it — more apparent earlier in the process.

“The nature of our data, we were very transparent, it’s in the solicitation, what it is. We’re working with investigative notes, everybody knew that, it wasn’t a secret to anybody at all in this process,” Cornejo said. “But everyone felt surprised or acted surprised when we got to a point of actually using the data in the solution. Had we spent more time in discovery, we probably could have come up with a better way to do this. But hindsight’s always 20/20. So, I would say more time in dialog with the vendors before getting in and working with the architectural diagram would’ve been a huge benefit.”

Another example emerged from a separate panel focused on an AI project at the Department of Social Services (CDSS). The department was looking to address a substantial theft problem in programs such as CalWORKs. Through methods such as installing card skimmers on ATMs, thieves have been able to steal billions from the programs, according to Monica Bobra, a principal data scientist with ODI.

To combat the problem, CDSS turned to third-party data: It obtained a data set of more than 300 million transactions spanning a three-year period and built a machine learning model to identify which transactions were illegitimate.

The department had never had such thorough transaction data before. It not only enabled CDSS to report on thefts within 72 hours — as opposed to two months later — it also gave the department much better options for intervening before theft occurred. As a result, theft has declined about 80 percent, said Konrad Franco, a principal researcher at CDSS.

As Darren Pulsipher, chief solutions architect at Intel, who sat on the panel, put it: “That’s a story about data management, not a story about the model.”

*The Government Innovation Summit was hosted by Government Technology, Industry Insider — California's sister publication. Both are part of e.Republic.
Ben Miller is the associate editor of data and business for Government Technology.