Data Collaborative: How California can become the world’s open data capital

Governments in California and beyond use data sharing to address housing challenges, meet water efficiency goals and support other community efforts.

Governor Newsom has set an ambitious public technology agenda that has seen a groundswell of support from civic minded technologists. This post will highlight successful examples of how local governments have come together to share data and improve how public problems like our state’s pronounced housing crisis or California’s unique water issues are addressed.

These examples share common features and highlight opportunities to build on California’s strengths in attracting top tier talent, its diverse local communities and its global connections to cross pollinate successful practices. In general:

  • Meaningfully opening up government data requires more than simply posting machine readable datasets on the web.
  • Successful examples of standardized, regularly curated open data build on a collaborative community of practice and sustainable resources for the work.
  • The state can play a strategic role in supporting and coordinating domain specific data collaboratives to tackle common challenges like housing affordability.
Case Study: Why open data is critical for California’s housing crisis

Our current housing crisis offers an illustrative example. New homes require natural resources like water, so both existing and new housing stock must become more efficient to meet California’s newly legislated water efficiency goals. California faces a gap of 3.5 million homes by 2025. Yet even the most basic data on what land is zoned for which type of land use, critical information for building more homes, is incredibly opaque and hard to come by.

See here for a blog post detailing the errors and issues with utilizing publicly available parcel level land use data for calculating residential water budgets across our state. This reality is particularly pronounced when attempting to link additional supplemental information like transit access to zoning data. Here is a legislative analysis of a historic housing proposal that could only be done in LA County due to data availability.

In the 2013 case Sierra Club vs. Orange County, the California Supreme Court ruled that “GIS-formatted database[s]… are public records that, unless otherwise exempt, must be produced upon request at the actual cost of duplication.” While that ruling is the law of the land, in practice users must navigate a byzantine series of bureaucratic hoops and outdated technologies to acquire the data they need.

That means that Californians struggle to find meaningful information about where they can build what type of housing and other structures. It’s radically common sense for current or potential property owners to be able to access such information, yet vastly different vendor systems and the lack of consistent data standards make the underlying zoning data difficult to navigate.

Linking that parcel level zoning data with contextual information like customer level water demand or available social services requires a great deal of finesse and collaboration across the local governments, and increasingly private entities, that steward such data. Those integrations are critical for answering basic questions about how to support sustainable new housing development, how to build housing so it doesn’t increase traffic, or how to create the right mix of remedies to address the underlying conditions creating homelessness.
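
To make the linking problem concrete, here is a minimal Python sketch of joining parcel level zoning records to customer level water use. The field names (`apn`, `zone`, `parcel_id`, `gallons`), the sample records and the `normalize_apn` helper are all hypothetical, invented for illustration. Real county and utility systems each use their own formats, which is a large part of why this work takes collaboration.

```python
import re
from collections import defaultdict


def normalize_apn(raw):
    """Reduce a parcel number to its digits so differently formatted IDs match."""
    return re.sub(r"\D", "", raw)


def link_zoning_to_water(zoning_rows, water_rows):
    """Total water use per parcel and attach it to each zoning record."""
    gallons_by_parcel = defaultdict(int)
    for row in water_rows:
        gallons_by_parcel[normalize_apn(row["parcel_id"])] += row["gallons"]

    linked = []
    for row in zoning_rows:
        key = normalize_apn(row["apn"])
        linked.append({
            "apn": key,
            "zone": row["zone"],
            # None marks a parcel with no matching water account
            "gallons": gallons_by_parcel.get(key),
        })
    return linked


# Hypothetical sample records from two different systems
zoning = [
    {"apn": "123-456-78", "zone": "R1"},
    {"apn": "987-654-32", "zone": "C2"},
]
water = [
    {"parcel_id": "12345678", "gallons": 4000},
    {"parcel_id": "123 456 78", "gallons": 1500},
]

linked = link_zoning_to_water(zoning, water)
```

Even in this toy case, the join only works because both sides agree on how a parcel is identified. Parcels that fail to match are kept and flagged rather than silently dropped, so an analyst can see where the two systems disagree.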

Why public data collaboratives are the future of open government data

“Open data” has served as a useful rallying cry and point of unity in the civic technology community to put government data on the web and in a machine readable format. Many efforts have called for increased standardization in government data. The biggest public data wins are still primarily the often cited weather and transit information.

More and more attention in the civic minded technology community is focusing on the subtler opportunity to streamline the sharing and increase the openness of data that can’t be made fully accessible to everyone. The Data Spectrum diagram below from the Open Data Institute, a British think tank, provides a succinct summary of the various types of data sharing available.

Many valuable public datasets — like social service records or customer water usage — contain personally identifiable information that emphatically should not be open and accessible to anyone. A canonical example of closed government data creating public value is the New York City Fire Inspection optimization. By bringing together data across siloed city departments and using the full operational intelligence of the City’s data, the New York Mayor’s Office of Data and Analytics was able to achieve transformative improvements in inspection efficacy.

Many municipalities, however, lack the size and scale of a New York City. A growing number of global examples show how governments are collaborating across agencies and with other sectors to achieve improvements in public service delivery. New York University’s (“NYU”) GovLab defines a “Data Collaborative” as the public and private sectors exchanging data to create public value.

The GovLab catalogs over 150 examples. Noteworthy international examples include:

  • The United Kingdom launched the Administrative Data Research Network to provide a comprehensive repository of the country’s data for social science research purposes. The initiative also includes handy explainer videos highlighting the value of this data sharing for the general public.
  • The New Zealand Integrated Data Research Network provides a similar “one stop shop” for academic researchers to access sensitive government data for important social science research. The goal is likewise to improve the quality of public policy for the benefit of the country’s people.
America’s federal system makes those sorts of unified, nation-scale initiatives more difficult. Several federal government data sharing initiatives have been launched, including a high level precision medicine initiative led by DJ Patil during the Obama administration. That federal leadership has been complemented by local actions, including many regional healthcare data sharing collaboratives as well as other municipal data sharing efforts:

  • The NYU Center for Urban Science and Progress Data Facility, which stewards data from the City of New York for social science research.
  • The Western Pennsylvania Regional Data Center, which provides open data services, including education and training, across that region.
California has several local government and academic partnerships that can be built upon and developed as the state works to achieve its ambitious public technology goals.

  • The Los Angeles Data Science Federation, which streamlines how data is shared between the city’s departments and universities across the region.
  • The California Policy Lab, which connects academic experts at the University of California, Los Angeles and the University of California, Berkeley with policy experts.
  • The California Water Data Collaborative, which automates the standardization of customer level water usage data along with key contextual information.
These examples showcase the importance of deeper connection across sectors and collaboration to unlock the potential of open government data. In 2013, then-President Obama issued an executive order “making open and machine readable the new default” for federal government data. That strategy of simply putting data on the web saw mixed success.

The biggest users of open government data tend to be staff who were previously siloed from other parts of government bureaucracies. Better open data technologies developed to share data with the public may not have resulted in most citizens spending their leisure time poring through obscure zoning ordinances, yet those tools have made the work of analysts who work with and inside government much easier.

A radically common sense approach to unlock the potential of open government data

Like an iceberg, much of the work to meaningfully open up government data lies beneath the surface. Quality technology to ensure appropriate levels of secure data sharing and access is necessary but far from sufficient. Data by itself does nothing. Putting a whiz bang tool in front of a decision maker similarly does not inevitably lead to impact. Human interpretation and analysis must also play a role. The “iceberg of open data” diagram below shows at an abstract level the roles generally necessary for a data collaborative effort to achieve impact.

Data users are the analysts, statisticians, app developers, and others who actually work with the data to generate meaningful insights. Those users ideally come from a diverse community of practice with a healthy mix of organizations and sectors — such as the business community, local news outlets, government staff and the larger social sector. That enables dialogue and deliberation about what the data means for key policy, management and operational decisions.

Data stewards curate data and work to make sure the information available adheres to agreed upon standards and is up to date. Often data stewards have an administrative or operational role at the organization that is responsible for generating the dataset in question. There can also be additional roles for stewarding data that is standardized across a consortium of many municipalities.
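
As a rough illustration of what that stewardship can look like in practice, the Python sketch below checks records against an agreed upon standard and flags stale entries. The `STANDARD` schema, the one year freshness threshold and the sample records are assumptions made up for this example, not part of any actual California data standard.

```python
from datetime import date

# Hypothetical agreed-upon standard: required fields and their types
STANDARD = {"apn": str, "zone": str, "updated": date}
MAX_AGE_DAYS = 365  # assumed freshness threshold


def check_record(record, today):
    """Return a list of problems found in one record (empty if it passes)."""
    problems = []
    for field, expected_type in STANDARD.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    updated = record.get("updated")
    if isinstance(updated, date) and (today - updated).days > MAX_AGE_DAYS:
        problems.append("record is out of date")
    return problems


# Hypothetical sample records
today = date(2019, 6, 1)
good = {"apn": "12345678", "zone": "R1", "updated": date(2019, 1, 15)}
stale = {"apn": "98765432", "updated": date(2016, 3, 1)}

good_problems = check_record(good, today)
stale_problems = check_record(stale, today)
```

A check like this is only the mechanical part of the job. Deciding what the standard should contain, and keeping it current across many municipalities, is the collaborative work described above.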

Those data users and stewards work in the trenches, but they must also be in dialogue with the actual decision makers who can take the results of their analysis to create public impact. It’s important for those groups to proceed from a place of humility and remember that data is only an imperfect map of reality. Human history is littered with examples that highlight the importance of that lesson.

Civic minded technologists should advocate for that more thoughtful, collaborative approach in tackling domain specific challenges like California’s housing crisis. Those types of public problems take sustainable funding and a multi-year commitment to meaningfully address, not just volunteers working nights and weekends or philanthropically funded fellows parachuting in. Governor Newsom’s visionary commitment of financial resources offers a new hope to realize the long standing potential of open government data.

California’s opportunity to pioneer the future of open government data

California has the opportunity to build on its early successes and make strategic data sharing the new normal in government operations. A housing data collaborative could help develop standardized parcel level zoning information. A related data collaborative could focus on streamlining data collected from social services provided to homeless and at risk populations. California can build a leading global model for such collaborations.

Many other areas of government operations could benefit from data collaboratives to enable more efficient, effective and imaginative public service delivery. For example, the City of Los Angeles has led the development of a path breaking mobility data specification for dockless scooters. Many other innovations are coming down the pipeline, and cities must also grapple with legacy infrastructure problems like potholes.

Those data collaboratives will necessarily be interrelated and overlap as new housing development, for example, creates new demands on urban mobility systems. Working in the open by default and establishing clear lines of authority can enable coordinated action even with many disparate operating teams. California’s water industry has a long tradition of deploying such polycentric governance schemes to successfully manage Southern California groundwater.

Here California’s age old challenge of hyper-fragmented local governments can be abstracted away as increased data sharing enables coordinated operations and service delivery.

The Unique Civic Data Opportunity in Southern California

A century and a half ago most municipalities simply spent money until they ran out for the year. Formal budgeting with agreed upon professional standards was once a radical innovation. NYU’s GovLab estimates that only $1 out of every $100 spent by American taxpayers is backed by rigorous evidence. By rallying around not just open and machine readable data but a truly collaborative community of practice, we can continue to close that evidence gap.

The beautiful thing about digital technology is that progress made in California can be replicated elsewhere across the globe, at a different pace and scale than legacy government processes.

How to get involved

Please share this article to build momentum for better open government data and champion the need for thoughtful, deliberative data collaboration in California and beyond. If you’re interested in learning more about future California Public Technology Roundtable events, please complete this form.




