Managing high-risk data, whether Protected Health Information or Social Security numbers, is difficult for academic researchers across many domains. Each institution has its own guidelines to safeguard different kinds of datasets, and governmental agencies and funding organizations have their own regulations and compliance requirements. To address these challenges, Stanford Research Computing Center (SRCC) teamed up with Stanford’s School of Medicine and Google Cloud to fund, design, and launch Carina, a customizable high-risk data platform for Stanford researchers. Powered by Google Anthos and Kubernetes, Carina aims to reduce lead time for project setup through a scalable yet compliant compute environment that meets the different needs of each research project. “The privacy as well as the security of the data are paramount. That means we need to architect technological solutions that are tighter in many ways,” says Ruth Marinshaw, SRCC’s CTO for Research Computing. “Our goal was to make reproducible science easier on our platforms. Carina fills the need for a secure on-premise compute environment for high-risk data.” Started in 2021 and rolled out to beta users in 2022, the platform is now ready for Stanford’s research community to access on demand.
SRCC advances research at Stanford by offering and supporting traditional high-performance computing (HPC) systems, as well as systems for high-throughput and data-intensive computing, platforms for working with high-risk data, and data storage at scale. “But it's not just about the hardware,” says Nan McKenna, SRCC’s Senior Director of Research Computing. “Team members also help researchers transition their analyses and models from the desktop to more capable and plentiful resources, providing the opportunity to explore their data and answer research questions (on-premise or in the cloud) at a scale typically not possible on desktops or departmental servers.” The group partners with other campus organizations to offer training and learning opportunities around high-end computing tools and technologies. In addition, SRCC provides consultation to help researchers find the best solution for the kinds of computing and analytics they want to do.
Cutting workflows from one day to one hour
Stanford has had a longstanding relationship with Google, so when SRCC began working on their own platform for high-risk data, it made sense to start on Google Cloud. “There's a good community of support for Kubernetes, and that seemed to meet the needs for what we were trying to do,” says Addis O’Connor, Director of Research Computing Systems at SRCC. “Researchers come to us with a variety of requests for packages or workflows they need to run. We would like to make it as easy as possible for them to get up and running.” Google Anthos allows for simple and consistent administration and management of Kubernetes compute clusters, regardless of their location. “Leveraging tooling from Google allows us to automate and streamline the way we deploy all these different containers,” says O’Connor. “That frees up resources and staff for other things. Having cluster infrastructure and deployment as code within source repositories helps to easily identify problems and audit changes in real time,” adds Neal Soderquist, Research Services Manager with SRCC.
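The article doesn't detail Carina's internal configuration, but as a rough sketch of the "deployment as code" pattern O’Connor and Soderquist describe, the Python example below uses the official Kubernetes client to provision an isolated, labeled namespace and resource quota for a single research project. The project name, label scheme, and quota values are illustrative assumptions, not Carina's actual settings.

```python
# Hypothetical sketch: defining a per-project namespace and resource quota in
# code so it can live in a source repository and be reviewed and audited like
# any other change. Requires the `kubernetes` package and a valid kubeconfig.
from kubernetes import client, config


def provision_project(name: str, cpu_limit: str = "32", mem_limit: str = "128Gi") -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a cluster
    core = client.CoreV1Api()

    # Isolated namespace for one research project (name and labels are illustrative).
    namespace = client.V1Namespace(
        metadata=client.V1ObjectMeta(
            name=name,
            labels={"example.stanford.edu/data-risk": "high"},  # assumed label scheme
        )
    )
    core.create_namespace(namespace)

    # Cap what the project can consume on the shared cluster.
    quota = client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name=f"{name}-quota", namespace=name),
        spec=client.V1ResourceQuotaSpec(
            hard={"limits.cpu": cpu_limit, "limits.memory": mem_limit}
        ),
    )
    core.create_namespaced_resource_quota(namespace=name, body=quota)


if __name__ == "__main__":
    provision_project("covid-outcomes-demo")
```

In practice, a GitOps tool such as Anthos Config Management would typically apply declarative manifests from a source repository rather than making imperative API calls, but the underlying idea is the same: every cluster change is versioned, reviewable, and auditable.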
In an initial pilot with internal beta testers, SRCC successfully deployed bare-metal and cloud clusters while adhering to the CIS Kubernetes Benchmark. They also added two primary tools, JupyterHub and Slurm, to meet researchers’ needs. Today, Carina runs on-premise, high-risk data workloads for more than 100 Stanford researchers, supporting projects that range from natural language processing of legal texts to analysis of COVID outcomes for the School of Medicine. O’Connor estimates that workflows that used to take a day and a half on a faculty laptop now take about an hour on Carina.
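Carina's actual submission workflow isn't described in the article, but on a standard Slurm installation a researcher's hour-long analysis might be submitted along the lines of the sketch below, which wraps the stock `sbatch` CLI from Python. The partition name, resource requests, and script path are hypothetical.

```python
# Hypothetical sketch: submitting an analysis to a Slurm cluster from Python.
# Assumes the standard Slurm `sbatch` CLI is on PATH; the partition, resource
# requests, and script below are illustrative, not Carina's actual defaults.
import subprocess


def submit_job(script: str = "analyze_outcomes.sh") -> str:
    cmd = [
        "sbatch",
        "--job-name=covid-outcomes",
        "--partition=normal",       # hypothetical partition name
        "--cpus-per-task=16",
        "--mem=64G",
        "--time=01:00:00",          # roughly the hour-long runtime cited above
        script,
    ]
    # sbatch prints e.g. "Submitted batch job 12345"; return the job ID.
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.strip().split()[-1]


if __name__ == "__main__":
    print(f"Submitted Slurm job {submit_job()}")
```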
The SRCC team expects to continue iterating on Carina to streamline workflows as the tools and technologies evolve and mature. They are already in conversations with peer institutions to share knowledge and enable greater collaboration in secure settings. O’Connor believes they reached their goal: “We've organized the platform in a unique and secure way that gives researchers a lot of flexibility and compute power to make discoveries and potentially change patient outcomes or improve understanding in their fields.”
To find out how you can get started with generative AI for higher education, sign up for an interactive half-day workshop with Google Cloud and partners Nuvalence and Carahsoft. Participants will work with experts in small groups to design a gen AI strategy package customized for their needs. To learn more about funding opportunities, check your eligibility for cloud training and academic research credits.