
Discover datasets to enrich your analytics and AI initiatives

May 20, 2021
Michael Hamamoto Tribble

Head of Datasets for Google Cloud

With the mission of accelerating data-powered innovation for our customers, Google Cloud has always put data first. Recognizing that various organizations within Google have robust catalogs of data available for public or commercial use, we’re delighted to introduce a more unified view of those programs: Google Cloud datasets solutions. Building on the trends we're seeing across businesses of every size, our datasets solutions highlight the importance of high-value, curated data assets in strengthening and accelerating decision-making.

https://storage.googleapis.com/gweb-cloudblog-publish/images/Dataset_categories_for_blog.max-1200x1200.png

Building on the success of our existing Public Datasets Program, we’ve broadened its scope to include commercial datasets, synthetic datasets, and first-party Google data assets that can be used to increase the value of analytics and AI initiatives. Since its launch in 2016, the Google Cloud Public Datasets Program has provided a catalog of curated public data assets in optimized formats on BigQuery and Cloud Storage, in partnership with a number of data providers including the National Oceanic and Atmospheric Administration (NOAA), the National Institutes of Health (NIH), and the United States Census Bureau. Their data supports analytics workloads in many industries. For example, NOAA's severe storm event details public dataset can be joined to a retailer's private inventory dataset to better understand the impact severe weather has on sales. Similarly, property insurers can use weather data insights to inform policy pricing. These are just two of hundreds of examples of what’s possible when combining data from previously unrelated domains.
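To make the storm-and-sales example concrete, here is a minimal sketch of that kind of join in plain Python. Everything in it is an assumption made for illustration: the rows, the field names (`state`, `event_date`, `sale_date`, `units_sold`), and the helper functions are hypothetical and do not reflect the actual schema of the NOAA dataset or of any retailer's data. In practice a join like this would more likely be written as a SQL query in BigQuery.

```python
from collections import defaultdict
from datetime import date

# Hypothetical stand-in for rows from a public storm-events table.
storm_events = [
    {"state": "TX", "event_date": date(2021, 2, 15), "event_type": "Winter Storm"},
    {"state": "FL", "event_date": date(2021, 2, 16), "event_type": "Thunderstorm Wind"},
]

# Hypothetical stand-in for a retailer's private daily sales table.
daily_sales = [
    {"state": "TX", "sale_date": date(2021, 2, 14), "units_sold": 120},
    {"state": "TX", "sale_date": date(2021, 2, 15), "units_sold": 40},
    {"state": "FL", "sale_date": date(2021, 2, 15), "units_sold": 90},
    {"state": "FL", "sale_date": date(2021, 2, 16), "units_sold": 70},
]


def join_sales_to_storms(sales, storms):
    """Left-join each sales row to storm events in the same state on the same day."""
    storms_by_key = defaultdict(list)
    for storm in storms:
        storms_by_key[(storm["state"], storm["event_date"])].append(storm["event_type"])

    joined = []
    for row in sales:
        # Sales rows with no matching storm keep an empty list (left-join semantics).
        events = storms_by_key.get((row["state"], row["sale_date"]), [])
        joined.append({**row, "storm_events": events})
    return joined


def average_units(joined):
    """Average units sold on days with and without a recorded storm event."""
    buckets = defaultdict(list)
    for row in joined:
        label = "storm" if row["storm_events"] else "no_storm"
        buckets[label].append(row["units_sold"])
    return {label: sum(units) / len(units) for label, units in buckets.items()}


if __name__ == "__main__":
    joined_rows = join_sales_to_storms(daily_sales, storm_events)
    print(average_units(joined_rows))
```

The join key here is the pair of state and date, which is the simplest choice for a sketch. A real analysis would likely need a finer geographic key and some allowance for effects that lag the storm by a day or more.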

In adding commercial, synthetic, and first-party data to the program, we hope to further enhance our customers' ability to unearth unique insights through data analytics and artificial intelligence. What's more, the datasets in the Earth Engine and Kaggle catalogs are available to anyone who wishes to discover and use them.

To support our customers, we are also announcing an open source reference architecture for dataset onboarding, so that even customers who do not yet have their own private datasets on Google Cloud can begin their analytics journey. Learn more about this work, and how you can use the same architecture for your own data onboarding, on our Developers & Practitioners blog.

Over time, our goal is to grow each of these categories of data to make them more useful to our customers. We view it as imperative that the program include more than just public data. As we add new datasets and solutions, we'll continue to post regular updates on our datasets solution page, so be sure to check it out.
