GCP – Expanding BigQuery geospatial capabilities with Earth Engine raster analytics
At Google Cloud Next 25, we announced a major step forward in geospatial analytics: Earth Engine in BigQuery. This new capability unlocks Earth Engine raster analytics directly in BigQuery, making advanced analysis of geospatial datasets derived from satellite imagery accessible to the SQL community.
Before we get into the details of this new capability and how it can power your use cases, it’s helpful to distinguish between two types of geospatial data and where Earth Engine and BigQuery have historically excelled:
-
Raster data: This type of data represents geographic information as a grid of cells, or pixels, where each pixel stores a value that represents a specific attribute such as elevation, temperature, or land cover. Satellite imagery is a prime example of raster data. Earth Engine excels at storing and processing raster data, enabling complex image analysis and manipulation.
-
Vector data: This type of data represents geographic features such as points, lines, or polygons. Vector data is ideal for representing discrete objects like buildings, roads, or administrative boundaries. BigQuery is highly efficient at storing and querying vector data, making it well-suited for large-scale geographic analysis.
Earth Engine and BigQuery are both powerful platforms in their own right. By combining their geospatial capabilities, we are bringing the best of both raster and vector analytics to one place. That’s why we created Earth Engine in BigQuery, an extension to BigQuery’s current geospatial capabilities that will broaden access to raster analytics and make it easier than ever before to answer a wide range of real-world enterprise problems.
Earth Engine in BigQuery: Key features
You can use the two key features of Earth Engine in BigQuery to perform raster analytics in BigQuery:
-
A new function in BigQuery: Run
ST_RegionStats()
, a new BigQuery geography function that lets you efficiently extract statistics from raster data within specified geographic boundaries. -
New Earth Engine datasets in BigQuery Sharing (formerly Analytics Hub): Access a growing collection of Earth Engine datasets in BigQuery Sharing (formerly Analytics Hub), simplifying data discovery and access. Many of these datasets are analysis-ready, immediately usable for deriving statistics for an area of interest, and providing valuable information such as elevation, emissions, or risk prediction.
Five easy steps to raster analytics
The new ST_RegionStats()
function is similar to Earth Engine’s reduceRegion function, which allows you to compute statistics for one or more regions of an image. The ST_RegionStats()
function is a new addition to BigQuery’s set of geography functions invoked as part of any BigQuery SQL expression. It takes an area of interest (e.g., a county, parcel of land, or zip code) indicated by a geography and an Earth Engine-accessible raster image and computes a set of aggregate values for the pixels that intersect with the specified geography. Examples of aggregate statistics for an area of interest would be maximum flood depth or average methane emissions for a certain county.
These are the five steps to developing meaningful insights for an area of interest:
-
Identify a BigQuery table with vector data: This could be data representing administrative boundaries (e.g., counties, states), customer locations, or any other geographic areas of interest. You can pull a dataset from BigQuery public datasets or use your own based on your needs.
-
Identify a raster dataset: You can discover Earth Engine raster datasets in BigQuery Sharing, or you can use raster data stored as a Cloud GeoTiff or Earth Engine image asset. This can be any raster dataset that contains the information you want to analyze within the vector boundaries.
-
Use ST_RegionStats() to bring raster data into BigQuery: The
ST_RegionStats()
geography function takes the raster data (raster_id
), vector geometries (geography
), and optional band (band_name
) as inputs and calculates aggregate values (e.g., mean, min, max, sum, count) on the intersecting raster data and vector feature. -
Analyze the results: You can use the output of running
ST_RegionStats()
to analyze the relationship between the raster data and the vector features, generating valuable insights about an area of interest. -
Visualize the results: Geospatial analysis is usually most impactful when visualized on a map. Tools like BigQuery Geo Viz allow you to easily create interactive maps that display your analysis results, making it easier to understand spatial patterns and communicate findings.
Toward data-driven decision making
The availability of Earth Engine in BigQuery opens up new possibilities for scaled data-driven decision-making across various geospatial and sustainability use cases, by enabling raster analytics on datasets that were previously unavailable in BigQuery. These datasets can be used with the new ST_RegionStats()
geography function for a variety of use cases, such as calculating different land cover types within specific administrative boundaries or analyzing the average elevation suitability within proposed development areas. You can also find sample queries for these datasets in BigQuery Sharing’s individual dataset pages. For example, if you navigate to the GRIDMET CONUS Drought Indices dataset page, you can find a sample query for calculating mean Palmer Drought Severity Index (PDSI) for each county in California, used to monitor drought conditions across the United States.
Let’s take a deeper look at some of the use cases that this new capability unlocks:
1. Climate, physical risk, and disaster response
Raster data can provide critical insights on weather patterns and natural disaster monitoring. Many of the raster datasets available in BigQuery Sharing provide derived data on flood mapping, wildfire risk assessment, drought conditions, and more. These insights can be used for disaster risk and response, urban planning, infrastructure development, transportation management, and more. For example, you could use the Wildfire Risk to Communities dataset for predictive analytics, allowing you to assess wildfire hazard risk, exposure of communities, and vulnerability factors, so you can develop effective resilience strategies. For flood mapping, you could use the Global River Flood Hazard dataset to understand regions in the US that have the highest predicted inundation depth, or water height above ground surface.
2. Sustainable sourcing and agriculture
Raster data also provides insights on land cover and land use over time. Several of the new Earth Engine datasets in BigQuery include derived data on terrain, elevation, and land-cover classification, which are critical inputs for supply chain management and assessing agriculture and food security. For businesses that operate in global markets, sustainable sourcing requires bringing transparency and visibility to supply chains, particularly as regulatory requirements are shifting commitments to deforestation-free commodity production from being voluntary to mandatory. With the new Forest Data Partnership maps for cocoa, palm and rubber, you can analyze where commodities are grown over time, and add in the Forest Persistence or the JRC Global Forest Cover datasets to understand if those commodities are being grown in areas that had not been deforested or degraded before 2020. With a simple SQL query, you could, for instance, determine the estimated fraction of Indonesia’s land area that had undisturbed forest in 2020.
3. Methane emissions monitoring
Reducing methane emissions from the oil and gas industry is crucial to slow the rate of climate change. The MethaneSAT L4 Area Sources dataset, which can be used as an Earth Engine Image asset with the ST_RegionStats()
function, provides insights into small, dispersed area emissions of methane from various sources. This type of diffuse but widespread emissions can make up the majority of methane emissions in an oil and gas basin. You can analyze the location, magnitude, and trends of these emissions to identify hotspots, inform mitigation efforts, and understand how emissions are characterized across large areas, such as basins.
4. Custom use cases
In addition to these datasets, you can bring your own raster datasets via Cloud Storage GeoTiffs or Earth Engine image assets, to support other use cases, while still benefiting from BigQuery’s scalability and analytical tools.
- aside_block
- <ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3e4f833ed370>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>
Bringing it all together with an example
Let’s take a look at a more advanced example based on modeled wildfire risk and AI-driven weather forecasting technology. The SQL below uses the Wildfire Risk to Communities dataset listed in BigQuery Sharing, which is designed to help communities understand and mitigate their exposure to wildfire. The data contains bands that index the likelihood and consequence of wildfire across the landscape. Using geometries from a public dataset of census-designated places, you can compute values from this dataset using ST_RegionStats()
to compare communities’ relative risk exposures. You can also combine weather data from WeatherNext Graph forecasts to see how imminent fire weather is predicted to affect those communities.
To start, head to the BigQuery Sharing console, click “Search listings”, filter to “Climate and environment,” select the “Wildfire Risk to Community” dataset (or search for the dataset in the search bar), and click “Subscribe” to add the Wildfire Risk dataset to your BigQuery project. Then search for “WeatherNext Graph” and subscribe to the WeatherNext Graph dataset.
With these subscriptions in place, run a query to combine these datasets across many communities with a single query. You can break this task into subqueries using the SQL WITH
statement for clarity:
First, select the input tables that you subscribed to in the previous step.
Second, compute the weather forecast using WeatherNext Graph forecast data for a specific date and for the places of interest. The result is the average and maximum wind speeds within each community.
Third, use the ST_RegionStats()
function to sample the Wildfire Risk to Community raster data for each community. Since we are only concerned with computing mean values within regions, you can set the scale to 1 kilometer in the function options in order to use lower-resolution overviews and thus reduce compute time. To compute at the full resolution of the raster (in this case, 30 meters), you can leave this option out.
The result is a table containing the mean values of wildfire risk for both bands within each community and wind speeds projected over the course of a day. In addition, you can combine the computed values for wildfire risk, wildfire consequence, and maximum wind speed into a single composite index to show relative wildfire exposure for a selected day in Colorado.
WITH
-- Step 1: Select inputs from datasets that we've subscribed to
wildfire_raster AS (
SELECT
id
FROM
`wildfire_risk_to_community_v0_mosaic.fire`
),
places AS (
SELECT
place_id,
place_name,
place_geom AS geo,
FROM
`bigquery-public-data.geo_us_census_places.places_colorado`
),
-- Step 2: Compute the weather forecast using WeatherNext Graph forecast data
weather_forecast AS (
SELECT
ANY_VALUE(place_name) AS place_name,
ANY_VALUE(geo) AS geo,
AVG(SQRT(POW(t2.`10m_u_component_of_wind`, 2)
+ POW(t2.`10m_v_component_of_wind`, 2))) AS average_wind_speed,
MAX(SQRT(POW(t2.`10m_u_component_of_wind`, 2)
+ POW(t2.`10m_v_component_of_wind`, 2))) AS maximum_wind_speed
FROM
`weathernext_graph_forecasts.59572747_4_0` AS t1,
t1.forecast AS t2
JOIN
places
ON
ST_INTERSECTS(t1.geography_polygon, geo)
WHERE
t1.init_time = TIMESTAMP('2025-04-28 00:00:00 UTC')
AND t2.hours < 24
GROUP BY
place_id
),
-- Step 3: Combine with wildfire risk for each community
wildfire_risk AS (
SELECT
geo,
place_name,
ST_REGIONSTATS( -- Wildfire likelihood
geo, -- Place geometry
(SELECT id FROM wildfire_raster), -- Raster ID
'RPS', -- Band name (Risk to Potential Structures)
OPTIONS => JSON '{"scale": 1000}' -- Computation resolution in meters
).mean AS wildfire_likelihood,
ST_REGIONSTATS( -- Wildfire consequence
geo, -- Place geometry
(SELECT id FROM wildfire_raster), -- Raster ID
'CRPS', -- Band name (Conditional Risk to Potential Structures)
OPTIONS => JSON '{"scale": 1000}' -- Computation resolution in meters
).mean AS wildfire_consequence,
weather_forecast.* EXCEPT (geo, place_name)
FROM
weather_forecast
)
-- Step 4: Compute a composite index of relative wildfire risk.
SELECT
*,
PERCENT_RANK() OVER (ORDER BY wildfire_likelihood)
* PERCENT_RANK() OVER (ORDER BY wildfire_consequence)
* PERCENT_RANK() OVER (ORDER BY average_wind_speed)
AS relative_risk
FROM
wildfire_risk
Mean values of wildfire risk and wind speeds for each community
You can save this output in Google Sheets to visualize how wildfire risk and consequences are related among communities statewide.
Google sheet visualizing relationship between wildfire risk (x-axis) and wildfire consequence (y-axis) colored by wind speed
Alternatively, you can visualize relative wildfire risk exposure in BigQuery GeoViz with the single composite index to show relative wildfire exposure for a selected day in Colorado.
GeoViz map showing a composite index combining values for wildfire risk, wildfire consequence, and maximum wind speed for each community
What’s next for Earth Engine in BigQuery?
Earth Engine in BigQuery marks a significant advancement in geospatial analytics, and we’re excited to further expand raster analytics in BigQuery, making sustainability decision-making easier than ever before. Learn more about this new capability in the BigQuery documentation for working with raster data, and stay tuned for new Earth Engine capabilities in BigQuery in the near future!
Read More for the details.