AWS CodePipeline now enables you to use AWS Secrets Manager credentials in your Commands actions by specifying the secrets as environment variables in the action declaration. Additionally, Commands actions now support Windows commands and larger instance types, allowing you to run more complex workloads and accelerate execution times. To learn more about these new capabilities, visit our documentation.
For more information about AWS CodePipeline, visit our product page. This feature is available in all regions where AWS CodePipeline is supported.
Today, AWS announces significant enhancements to Amazon Q Developer in Amazon SageMaker AI Jupyter Lab, introducing customization of code suggestions based on private code repositories and the ability to include entire workspace context for improved code assistance. These new features empower organizations to leverage their proprietary code and improve the relevance of code suggestions, ultimately enhancing developer productivity and code quality within Jupyter Lab environments.
With the new customization feature, Amazon Q Developer can now assist with software development in ways that conform to your team’s internal libraries, proprietary algorithmic techniques, and enterprise code style. An Amazon Q Developer customization is a set of elements that enables Amazon Q to provide suggestions based on your company’s code-base. This ensures that code suggestions, both inline and chat based, align perfectly with your organization’s specific coding practices and standards.
Additionally, the workspace context enables Amazon Q Developer to locate files, understand how code is used across multiple files, and generate code that leverages multiple files, including those that aren’t currently opened. This contextual awareness results in more accurate and relevant code assistance, helping developers better understand their entire project structure before they start coding. Users can access the workspace features through the chat interface, ensuring a seamless development experience that takes into account the full scope of their project.
These enhancements to Amazon Q Developer in Amazon SageMaker AI Jupyter Lab are now available in all AWS Regions where Amazon SageMaker AI is offered.
To learn more about these new features, see the documentation.
BigQuery delivers optimized search/lookup query performance by efficiently pruning irrelevant files. However, in some cases, additional column information is required for search indexes to further optimize query performance. To help, we recently announced indexing with column granularity, which lets BigQuery pinpoint relevant data within columns, for faster search queries and lower costs.
BigQuery arranges table data into one or more physical files, each holding N rows. This data is stored in a columnar format, meaning each column has its own dedicated file block. You can learn more about this in the BigQuery Storage Internals blog. The default search index is at the file level, which means it maintains mappings from a data token to all the files containing it. Thus, at query time, the search index helps reduce the search space by only scanning those relevant files. This file-level indexing approach excels when search tokens are selective, appearing in only a few files. However, scenarios arise where search tokens are selective within specific columns but common across others, causing these tokens to appear in most files, and thus diminishing the effectiveness of file-level indexes.
For example, imagine a scenario where we have a collection of technical articles stored in a simplified table named TechArticles with two columns — Title and Content. And let’s assume that the data is distributed across four files, as shown below.
Our goal is to search for articles specifically related to Google Cloud Logging. Note that:
The tokens “google”, “cloud”, and “logging” appear in every file.
All three tokens appear together in the “Title” column only in the first file.
Therefore, the combination of the three tokens is common overall, but highly selective in the “Title” column.
Now, let’s say, we create a search index on both columns of the table with the following DDL statement:
CREATE SEARCH INDEX myIndex ON myDataset.TechArticles(Title, Content);
The search index stores the mapping of data tokens to the data files containing the tokens, without any column information; the index looks like the following (showing the three tokens of interest: “google”, “cloud”, and “logging”):
With the usual query SELECT * FROM TechArticles WHERE SEARCH(Title, "Google Cloud Logging"), using the index without column information, BigQuery ends up scanning all four files, adding unnecessary processing and latency to your query.
Indexing with column granularity
Indexing with column granularity, a new public preview feature in BigQuery, addresses this challenge by adding column information in the indexes. This lets BigQuery leverage the indexes to pinpoint relevant data within columns, even when the search tokens are prevalent across the table’s files.
Let’s go back to the above example. Now we can create the index with COLUMN granularity as follows:
CREATE SEARCH INDEX myIndex ON myDataset.TechArticles(Title, Content)
OPTIONS (default_index_column_granularity = 'COLUMN');
The index now stores the column information associated with each data token. The index is as follows:
With the same query SELECT * FROM TechArticles WHERE SEARCH(Title, "Google Cloud Logging"), but now using the index with column information, BigQuery only needs to scan file1, since the index lookup is the intersection of the following (sketched in code after this list):
Files where Token=’google’ AND Column=’Title’ (file1)
Files where Token=’cloud’ AND Column=’Title’ (file1, file2, file3, and file4)
Files where Token=’logging’ AND Column=’Title’ (file1)
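To make the pruning concrete, here is a small Python sketch of the two index shapes described above. It models only the hypothetical TechArticles example from this post, not BigQuery’s actual index format; the file names and token-to-file mappings follow the bullets above.
# Illustration of file-level vs. column-granularity index lookup for the
# hypothetical TechArticles example; not BigQuery's actual index structures.

# File-level index: token -> files containing it (no column information).
file_level = {
    "google":  {"file1", "file2", "file3", "file4"},
    "cloud":   {"file1", "file2", "file3", "file4"},
    "logging": {"file1", "file2", "file3", "file4"},
}

# Column-granularity index: (token, column) -> files.
column_level = {
    ("google", "Title"):  {"file1"},
    ("cloud", "Title"):   {"file1", "file2", "file3", "file4"},
    ("logging", "Title"): {"file1"},
}

query_tokens = ["google", "cloud", "logging"]

# Without column information, every file survives pruning and must be scanned.
print(set.intersection(*(file_level[t] for t in query_tokens)))
# -> {'file1', 'file2', 'file3', 'file4'}

# With column information, SEARCH(Title, ...) prunes the scan down to file1.
print(set.intersection(*(column_level[(t, "Title")] for t in query_tokens)))
# -> {'file1'}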
Performance improvement benchmark results
We benchmarked query performance on a 1TB table containing Google Cloud Logging data of an internal Google test project with the following query:
SELECT COUNT(*)
FROM `dataset.log_1T`
WHERE SEARCH((logName, trace, labels, metadata), 'appengine');
In this benchmark query, the token ‘appengine’ appears infrequently in the columns used for query filtering, but is more common in other columns. The default search index already pruned a large portion of the search space, cutting execution time in half and reducing processed bytes and slot usage. With column-granularity indexing, the improvements are even more significant.
In short, column-granularity indexing in BigQuery offers the following benefits:
Enhanced query performance: By precisely identifying relevant data within columns, column-granularity indexing significantly accelerates query execution, especially for queries with selective search tokens within specific columns.
Improved cost efficiency: Index pruning results in reduced bytes processed and/or slot time, translating to improved cost efficiency.
This is particularly valuable in scenarios where search tokens are selective within specific columns but common across others, or where queries frequently filter or aggregate data based on specific columns.
Best practices and getting started
Indexing with column granularity represents a significant advancement in BigQuery’s indexing capabilities, letting you achieve greater query performance and cost efficiency.
For best results, consider the following best practices:
Identify high-impact columns: Analyze your query patterns to identify columns that are frequently used in filters or aggregations and would benefit from column-granularity indexing.
Monitor performance: Continuously monitor query performance and adjust your indexing strategy as needed.
Consider indexing and storage costs: While column-granularity indexing can optimize query performance, be mindful of potential increases in indexing and storage costs.
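If you prefer to manage indexes and queries programmatically, the statements shown earlier can also be issued through the BigQuery Python client. A minimal sketch, assuming the google-cloud-bigquery package is installed, application-default credentials are configured, and myDataset.TechArticles exists in your project:
from google.cloud import bigquery

client = bigquery.Client()

# Create the search index with column granularity (same DDL as above).
# The DDL job returns quickly; BigQuery builds the index in the background.
ddl = """
CREATE SEARCH INDEX IF NOT EXISTS myIndex
ON myDataset.TechArticles(Title, Content)
OPTIONS (default_index_column_granularity = 'COLUMN')
"""
client.query(ddl).result()

# Run the search query; once the index is built, column-aware pruning applies.
sql = """
SELECT *
FROM myDataset.TechArticles
WHERE SEARCH(Title, "Google Cloud Logging")
"""
for row in client.query(sql).result():
    print(row["Title"])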
At Google Cloud Next 25, we announced a major step forward in geospatial analytics: Earth Engine in BigQuery. This new capability unlocks Earth Engine raster analytics directly in BigQuery, making advanced analysis of geospatial datasets derived from satellite imagery accessible to the SQL community.
Before we get into the details of this new capability and how it can power your use cases, it’s helpful to distinguish between two types of geospatial data and where Earth Engine and BigQuery have historically excelled:
Raster data: This type of data represents geographic information as a grid of cells, or pixels, where each pixel stores a value that represents a specific attribute such as elevation, temperature, or land cover. Satellite imagery is a prime example of raster data. Earth Engine excels at storing and processing raster data, enabling complex image analysis and manipulation.
Vector data: This type of data represents geographic features such as points, lines, or polygons. Vector data is ideal for representing discrete objects like buildings, roads, or administrative boundaries. BigQuery is highly efficient at storing and querying vector data, making it well-suited for large-scale geographic analysis.
Earth Engine and BigQuery are both powerful platforms in their own right. By combining their geospatial capabilities, we are bringing the best of both raster and vector analytics to one place. That’s why we created Earth Engine in BigQuery, an extension to BigQuery’s current geospatial capabilities that will broaden access to raster analytics and make it easier than ever before to answer a wide range of real-world enterprise problems.
Earth Engine in BigQuery: Key features
You can use the two key features of Earth Engine in BigQuery to perform raster analytics in BigQuery:
A new function in BigQuery: Run ST_RegionStats(), a new BigQuery geography function that lets you efficiently extract statistics from raster data within specified geographic boundaries.
New Earth Engine datasets in BigQuery Sharing (formerly Analytics Hub): Access a growing collection of Earth Engine datasets in BigQuery Sharing, simplifying data discovery and access. Many of these datasets are analysis-ready, immediately usable for deriving statistics for an area of interest and providing valuable information such as elevation, emissions, or risk prediction.
Five easy steps to raster analytics
The new ST_RegionStats() function is similar to Earth Engine’s reduceRegion function, which allows you to compute statistics for one or more regions of an image. The ST_RegionStats() function is a new addition to BigQuery’s set of geography functions invoked as part of any BigQuery SQL expression. It takes an area of interest (e.g., a county, parcel of land, or zip code) indicated by a geography and an Earth Engine-accessible raster image and computes a set of aggregate values for the pixels that intersect with the specified geography. Examples of aggregate statistics for an area of interest would be maximum flood depth or average methane emissions for a certain county.
These are the five steps to developing meaningful insights for an area of interest:
Identify a BigQuery table with vector data: This could be data representing administrative boundaries (e.g., counties, states), customer locations, or any other geographic areas of interest. You can pull a dataset from BigQuery public datasets or use your own based on your needs.
Identify a raster dataset: You can discover Earth Engine raster datasets in BigQuery Sharing, or you can use raster data stored as a Cloud GeoTiff or Earth Engine image asset. This can be any raster dataset that contains the information you want to analyze within the vector boundaries.
Use ST_RegionStats() to bring raster data into BigQuery: The ST_RegionStats() geography function takes the raster data (raster_id), vector geometries (geography), and optional band (band_name) as inputs and calculates aggregate values (e.g., mean, min, max, sum, count) on the intersecting raster data and vector feature.
Analyze the results: You can use the output of running ST_RegionStats() to analyze the relationship between the raster data and the vector features, generating valuable insights about an area of interest.
Visualize the results: Geospatial analysis is usually most impactful when visualized on a map. Tools like BigQuery Geo Viz allow you to easily create interactive maps that display your analysis results, making it easier to understand spatial patterns and communicate findings.
Toward data-driven decision making
The availability of Earth Engine in BigQuery opens up new possibilities for scaled data-driven decision-making across various geospatial and sustainability use cases, by enabling raster analytics on datasets that were previously unavailable in BigQuery. These datasets can be used with the new ST_RegionStats() geography function for a variety of use cases, such as calculating different land cover types within specific administrative boundaries or analyzing the average elevation suitability within proposed development areas. You can also find sample queries for these datasets in BigQuery Sharing’s individual dataset pages. For example, if you navigate to the GRIDMET CONUS Drought Indices dataset page, you can find a sample query for calculating mean Palmer Drought Severity Index (PDSI) for each county in California, used to monitor drought conditions across the United States.
Let’s take a deeper look at some of the use cases that this new capability unlocks:
1. Climate, physical risk, and disaster response
Raster data can provide critical insights on weather patterns and natural disaster monitoring. Many of the raster datasets available in BigQuery Sharing provide derived data on flood mapping, wildfire risk assessment, drought conditions, and more. These insights can be used for disaster risk and response, urban planning, infrastructure development, transportation management, and more. For example, you could use the Wildfire Risk to Communities dataset for predictive analytics, allowing you to assess wildfire hazard risk, exposure of communities, and vulnerability factors, so you can develop effective resilience strategies. For flood mapping, you could use the Global River Flood Hazard dataset to understand regions in the US that have the highest predicted inundation depth, or water height above ground surface.
2. Sustainable sourcing and agriculture
Raster data also provides insights on land cover and land use over time. Several of the new Earth Engine datasets in BigQuery include derived data on terrain, elevation, and land-cover classification, which are critical inputs for supply chain management and assessing agriculture and food security. For businesses that operate in global markets, sustainable sourcing requires bringing transparency and visibility to supply chains, particularly as regulatory requirements are shifting commitments to deforestation-free commodity production from being voluntary to mandatory. With the new Forest Data Partnership maps for cocoa, palm and rubber, you can analyze where commodities are grown over time, and add in the Forest Persistence or the JRC Global Forest Cover datasets to understand if those commodities are being grown in areas that had not been deforested or degraded before 2020. With a simple SQL query, you could, for instance, determine the estimated fraction of Indonesia’s land area that had undisturbed forest in 2020.
3. Methane emissions monitoring
Reducing methane emissions from the oil and gas industry is crucial to slow the rate of climate change. The MethaneSAT L4 Area Sources dataset, which can be used as an Earth Engine Image asset with the ST_RegionStats() function, provides insights into small, dispersed area emissions of methane from various sources. This type of diffuse but widespread emissions can make up the majority of methane emissions in an oil and gas basin. You can analyze the location, magnitude, and trends of these emissions to identify hotspots, inform mitigation efforts, and understand how emissions are characterized across large areas, such as basins.
4. Custom use cases
In addition to these datasets, you can bring your own raster datasets via Cloud Storage GeoTiffs or Earth Engine image assets, to support other use cases, while still benefiting from BigQuery’s scalability and analytical tools.
aside_block
<ListValue: [StructValue([(‘title’, ‘$300 in free credit to try Google Cloud data analytics’), (‘body’, <wagtail.rich_text.RichText object at 0x3e4f833ed370>), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>
Bringing it all together with an example
Let’s take a look at a more advanced example based on modeled wildfire risk and AI-driven weather forecasting technology. The SQL below uses the Wildfire Risk to Communities dataset listed in BigQuery Sharing, which is designed to help communities understand and mitigate their exposure to wildfire. The data contains bands that index the likelihood and consequence of wildfire across the landscape. Using geometries from a public dataset of census-designated places, you can compute values from this dataset using ST_RegionStats() to compare communities’ relative risk exposures. You can also combine weather data from WeatherNext Graph forecasts to see how imminent fire weather is predicted to affect those communities.
To start, head to the BigQuery Sharing console, click “Search listings”, filter to “Climate and environment,” select the “Wildfire Risk to Community” dataset (or search for the dataset in the search bar), and click “Subscribe” to add the Wildfire Risk dataset to your BigQuery project. Then search for “WeatherNext Graph” and subscribe to the WeatherNext Graph dataset.
With these subscriptions in place, run a query to combine these datasets across many communities with a single query. You can break this task into subqueries using the SQL WITH statement for clarity:
First, select the input tables that you subscribed to in the previous step.
Second, compute the weather forecast using WeatherNext Graph forecast data for a specific date and for the places of interest. The result is the average and maximum wind speeds within each community.
Third, use the ST_RegionStats() function to sample the Wildfire Risk to Community raster data for each community. Since we are only concerned with computing mean values within regions, you can set the scale to 1 kilometer in the function options in order to use lower-resolution overviews and thus reduce compute time. To compute at the full resolution of the raster (in this case, 30 meters), you can leave this option out.
The result is a table containing the mean values of wildfire risk for both bands within each community and wind speeds projected over the course of a day. In addition, you can combine the computed values for wildfire risk, wildfire consequence, and maximum wind speed into a single composite index to show relative wildfire exposure for a selected day in Colorado.
WITH
-- Step 1: Select inputs from datasets that we've subscribed to
wildfire_raster AS (
SELECT
id
FROM
`wildfire_risk_to_community_v0_mosaic.fire`
),
places AS (
SELECT
place_id,
place_name,
place_geom AS geo,
FROM
`bigquery-public-data.geo_us_census_places.places_colorado`
),
-- Step 2: Compute the weather forecast using WeatherNext Graph forecast data
weather_forecast AS (
SELECT
ANY_VALUE(place_name) AS place_name,
ANY_VALUE(geo) AS geo,
AVG(SQRT(POW(t2.`10m_u_component_of_wind`, 2)
+ POW(t2.`10m_v_component_of_wind`, 2))) AS average_wind_speed,
MAX(SQRT(POW(t2.`10m_u_component_of_wind`, 2)
+ POW(t2.`10m_v_component_of_wind`, 2))) AS maximum_wind_speed
FROM
`weathernext_graph_forecasts.59572747_4_0` AS t1,
t1.forecast AS t2
JOIN
places
ON
ST_INTERSECTS(t1.geography_polygon, geo)
WHERE
t1.init_time = TIMESTAMP('2025-04-28 00:00:00 UTC')
AND t2.hours < 24
GROUP BY
place_id
),
-- Step 3: Combine with wildfire risk for each community
wildfire_risk AS (
SELECT
geo,
place_name,
ST_REGIONSTATS( -- Wildfire likelihood
geo, -- Place geometry
(SELECT id FROM wildfire_raster), -- Raster ID
'RPS', -- Band name (Risk to Potential Structures)
OPTIONS => JSON '{"scale": 1000}' -- Computation resolution in meters
).mean AS wildfire_likelihood,
ST_REGIONSTATS( -- Wildfire consequence
geo, -- Place geometry
(SELECT id FROM wildfire_raster), -- Raster ID
'CRPS', -- Band name (Conditional Risk to Potential Structures)
OPTIONS => JSON '{"scale": 1000}' -- Computation resolution in meters
).mean AS wildfire_consequence,
weather_forecast.* EXCEPT (geo, place_name)
FROM
weather_forecast
)
-- Step 4: Compute a composite index of relative wildfire risk.
SELECT
*,
PERCENT_RANK() OVER (ORDER BY wildfire_likelihood)
* PERCENT_RANK() OVER (ORDER BY wildfire_consequence)
* PERCENT_RANK() OVER (ORDER BY average_wind_speed)
AS relative_risk
FROM
wildfire_risk
Mean values of wildfire risk and wind speeds for each community
You can save this output in Google Sheets to visualize how wildfire risk and consequences are related among communities statewide.
Google sheet visualizing relationship between wildfire risk (x-axis) and wildfire consequence (y-axis) colored by wind speed
Alternatively, you can visualize relative wildfire risk exposure in BigQuery GeoViz with the single composite index to show relative wildfire exposure for a selected day in Colorado.
GeoViz map showing a composite index combining values for wildfire risk, wildfire consequence, and maximum wind speed for each community
What’s next for Earth Engine in BigQuery?
Earth Engine in BigQuery marks a significant advancement in geospatial analytics, and we’re excited to further expand raster analytics in BigQuery, making sustainability decision-making easier than ever before. Learn more about this new capability in the BigQuery documentation for working with raster data, and stay tuned for new Earth Engine capabilities in BigQuery in the near future!
Today, we are announcing the availability of Route 53 Resolver Query Logging in the Asia Pacific (Thailand) and Mexico (Central) Regions, enabling you to log DNS queries that originate in your Amazon Virtual Private Clouds (Amazon VPCs). With query logging enabled, you can see which domain names have been queried, the AWS resources from which the queries originated – including source IP and instance ID – and the responses that were received.
Route 53 Resolver is the Amazon DNS server that is available by default in all Amazon VPCs. Route 53 Resolver responds to DNS queries from AWS resources within a VPC for public DNS records, Amazon VPC-specific DNS names, and Amazon Route 53 private hosted zones. With Route 53 Resolver Query Logging, customers can log DNS queries and responses for queries originating from within their VPCs, whether those queries are answered locally by Route 53 Resolver, or are resolved over the public internet, or are forwarded to on-premises DNS servers via Resolver Endpoints. You can share your query logging configurations across multiple accounts using AWS Resource Access Manager (RAM). You can also choose to send your query logs to Amazon S3, Amazon CloudWatch Logs, or Amazon Kinesis Data Firehose.
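As an illustration, the setup described above can be scripted with the AWS SDK for Python (boto3). This is a minimal sketch: the Region code, destination log-group ARN, and VPC ID are placeholders you would replace with your own values.
import boto3

resolver = boto3.client("route53resolver", region_name="ap-southeast-7")  # Asia Pacific (Thailand)

# Create a query logging configuration that delivers logs to CloudWatch Logs.
config = resolver.create_resolver_query_log_config(
    Name="vpc-dns-query-logs",
    DestinationArn="arn:aws:logs:ap-southeast-7:111122223333:log-group:/dns/queries",  # placeholder
    CreatorRequestId="example-request-1",
)["ResolverQueryLogConfig"]

# Associate the configuration with a VPC so its DNS queries are logged.
resolver.associate_resolver_query_log_config(
    ResolverQueryLogConfigId=config["Id"],
    ResourceId="vpc-0123456789abcdef0",  # placeholder VPC ID
)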
There is no additional charge to use Route 53 Resolver Query Logging, although you may incur usage charges from Amazon S3, Amazon CloudWatch, or Amazon Kinesis Data Firehose. To learn more about Route 53 Resolver Query Logging or to get started, visit the Route 53 product page or the Route 53 documentation.
Starting today, the Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances, optimized for generative AI, are generally available in the AWS Asia Pacific (Seoul) Region. Amazon EC2 Inf2 instances deliver up to 40% lower inference costs over comparable Amazon EC2 instances.
You can use Inf2 instances to run popular applications such as text summarization, code generation, video and image generation, speech recognition, personalization, and more. Inf2 instances are the first inference-optimized instances in Amazon EC2 to introduce scale-out distributed inference supported by NeuronLink, a high-speed, nonblocking interconnect. Inf2 instances offer up to 2.3 petaflops and up to 384 GB of total accelerator memory with 9.8 TB/s bandwidth.
The AWS Neuron SDK integrates natively with popular machine learning frameworks, so you can continue using your existing frameworks to deploy on Inf2. Developers can get started with Inf2 instances using AWS Deep Learning AMIs, AWS Deep Learning Containers, or managed services such as Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker.
Inf2 instances are now available in four sizes: inf2.xlarge, inf2.8xlarge, inf2.24xlarge, inf2.48xlarge in 14 AWS Regions as On-Demand Instances, Reserved Instances, and Spot Instances, or as part of a Savings Plan.
Amazon S3 Tables are now available in eleven additional AWS Regions: Africa (Cape Town), Asia Pacific (Hong Kong), Asia Pacific (Hyderabad), Asia Pacific (Jakarta), Asia Pacific (Malaysia), Asia Pacific (Melbourne), Canada West (Calgary), Europe (Milan), Europe (Zurich), Israel (Tel Aviv), and Middle East (Bahrain). S3 Tables deliver the first cloud object store with built-in Apache Iceberg support, and the easiest way to store tabular data at scale.
Amazon Web Services (AWS) has expanded the regional availability for Amazon AppStream 2.0. Starting today, AWS customers can deploy their applications and desktops in the AWS Europe (Paris) Region and stream them using AppStream 2.0.
Deploying your applications on AppStream 2.0 in a region closer to your end users helps provide a more responsive experience. Additionally, European Union customers now have another AWS region option to deploy their workloads on AppStream 2.0.
Amazon AppStream 2.0 is a fully managed, secure application streaming service that provides users with instant access to their desktop applications from anywhere. It allows users to stream applications and desktops from AWS to their devices, without requiring them to download, install, or manage any software locally. AppStream 2.0 manages the AWS resources required to host and run your applications, scales automatically, and provides access to your users on demand.
To get started with Amazon AppStream 2.0, sign into the AppStream 2.0 management console and select Europe (Paris) Region. For the full list of Regions where AppStream 2.0 is available, see the AWS Region Table. AppStream 2.0 offers pay-as-you-go pricing. For more information, see Amazon AppStream 2.0 Pricing.
Amazon Elastic Compute Cloud (Amazon EC2) C8gd instances, Amazon EC2 M8gd instances, and Amazon EC2 R8gd instances with up to 11.4 TB of local NVMe-based SSD block-level storage are now available in AWS Region Europe (Frankfurt). These instances are powered by AWS Graviton4 processors, delivering up to 30% better performance over Graviton3-based instances. They have up to 40% higher performance for I/O intensive database workloads, and up to 20% faster query results for I/O intensive real-time data analytics than comparable AWS Graviton3-based instances. These instances are built on the AWS Nitro System and are a great fit for applications that need access to high-speed, low latency local storage.
Each instance is available in 12 different sizes. They provide up to 50 Gbps of network bandwidth and up to 40 Gbps of bandwidth to the Amazon Elastic Block Store (Amazon EBS). Additionally, customers can now adjust the network and Amazon EBS bandwidth on these instances by 25% using EC2 instance bandwidth weighting configuration, providing greater flexibility with the allocation of bandwidth resources to better optimize workloads. These instances offer Elastic Fabric Adapter (EFA) networking on 24xlarge, 48xlarge, metal-24xl, and metal-48xl sizes.
These instances are now available in AWS Regions US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt).
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) R7g instances are available in AWS Middle East (UAE), Canada West (Calgary), Europe (Paris, Zurich), Asia Pacific (Jakarta, Osaka), and Israel (Tel Aviv) Regions. These instances are powered by AWS Graviton3 processors that provide up to 25% better compute performance compared to AWS Graviton2 processors, and built on top of the AWS Nitro System, a collection of AWS designed innovations that deliver efficient, flexible, and secure cloud services with isolated multi-tenancy, private networking, and fast local storage.
Amazon EC2 Graviton3 instances also use up to 60% less energy than comparable EC2 instances for the same performance, reducing your cloud carbon footprint. For increased scalability, these instances are available in 9 different instance sizes, including bare metal, and offer up to 30 Gbps networking bandwidth and up to 20 Gbps of bandwidth to the Amazon Elastic Block Store (EBS).
Amazon Managed Service for Prometheus is now available in the AWS Canada (Central) region. Amazon Managed Service for Prometheus is a fully managed Prometheus-compatible monitoring service that makes it easy to monitor and alarm on operational metrics at scale.
The list of all supported regions where Amazon Managed Service for Prometheus is generally available can be found on the user guide. Customers can send up to 1 billion active metrics to a single workspace and can create many workspaces per account, where a workspace is a logical space dedicated to the storage and querying of Prometheus metrics.
To learn more about Amazon Managed Service for Prometheus, visit the user guide or product page.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now provides seamless certificate renewal for all MSK Provisioned clusters, allowing the brokers in your cluster to receive the most up-to-date encryption certificates without going through a restart. The encryption certificates used by Amazon MSK require renewal every 13 months. With this launch, certificate renewals on Amazon MSK Provisioned clusters now occur seamlessly without disruption to client connectivity.
This feature is available in all AWS Regions where MSK Provisioned is supported. To learn more, visit the Amazon MSK Developer Guide.
AWS Systems Manager now enables customers to customize their configurations when enabling the new Systems Manager experience, which provides centralized node management capabilities across AWS accounts and Regions. Customers can choose to enable or disable default EC2 instance permissions for Systems Manager connectivity, set the frequency of inventory metadata collection, and define how often the SSM Agent automatically updates.
These options allow customers to tailor their Systems Manager setup while centrally managing their nodes. The new Systems Manager experience uses Default Host Management Configuration (DHMC) to grant EC2 instances permissions to connect to Systems Manager. This simplifies setup and permission management and replaces the previous approach that attached IAM instance profiles to each instance. Customers who prefer to self-manage SSM Agent permissions for EC2 instances can opt out of DHMC to use their own policies. Customers can also define inventory collection schedules and SSM Agent update frequencies that align with their operational requirements. By providing these configuration options, Systems Manager enables customers to manage settings through their preferred methods, including self-managed Infrastructure as Code (IaC) tools and processes.
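For example, DHMC itself is controlled through an SSM service setting, which can be inspected and changed with boto3. This is a hedged sketch: the setting ID below reflects our understanding of how DHMC is exposed, and the Region and role name are placeholders; verify the details against the Systems Manager documentation before use.
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")  # placeholder Region

# Service setting assumed to back Default Host Management Configuration (DHMC).
SETTING_ID = "/ssm/managed-instance/default-ec2-instance-management-role"

# Inspect the current DHMC state for this account and Region.
current = ssm.get_service_setting(SettingId=SETTING_ID)["ServiceSetting"]
print(current["SettingValue"])

# Opt in by naming an IAM role that Systems Manager can use for EC2 instances.
ssm.update_service_setting(
    SettingId=SETTING_ID,
    SettingValue="MyDefaultEC2InstanceManagementRole",  # placeholder role name
)

# To opt out and self-manage instance profile permissions, reset the setting:
# ssm.reset_service_setting(SettingId=SETTING_ID)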
These customization options are available in all AWS Regions where the new Systems Manager experience is available.
AWS Resource Explorer now supports 41 more resource types across all AWS commercial Regions from services including AWS CloudTrail, Amazon Connect, Amazon SageMaker, and more.
With this release, customers can now search for the following resource types in AWS Resource Explorer:
In the AI era, where data fuels intelligent applications and drives business decisions, demand for accurate and consistent data insights has never been higher. However, the complexity and sheer volume of data coupled with the diversity of tools and teams can lead to misunderstandings and inaccuracies. That’s why trusted definitions managed by a semantic layer become indispensable. Armed with unique information about your business, with standardized references, the semantic layer provides a business-friendly and consistent interpretation of your data, so that your AI initiatives and analytical endeavors are built on a foundation of truth and can drive reliable outcomes.
Looker’s semantic layer acts as a single source of truth for business metrics and dimensions, helping to ensure that your organization and tools are leveraging consistent and well-defined terms. By doing so, the semantic layer offers a foundation for generative AI tools to interpret business logic, not simply raw data, meaning answers are accurate, thanks to critical signals that map to business language and user intent, reducing ambiguity. LookML (Looker Modeling Language) helps you create the semantic model that empowers your organization to define the structure of your data and its logic, and abstracts complexity, easily connecting your users to the information they need.
A semantic layer is particularly important in the context of gen AI. When applied directly to ungoverned data, gen AI can produce impressive, but fundamentally inaccurate and inconsistent results. It sometimes miscalculates important variables, improperly groups data, or misinterprets definitions, including when writing complex SQL. The result can be misguided strategy and missed revenue opportunities.
In any data-driven organization, trustworthy business information is non-negotiable. Our own internal testing has shown that Looker’s semantic layer reduces data errors in gen AI natural language queries by as much as two thirds. According to a recent report by Enterprise Strategy Group, ensuring data quality and consistency proved to be the top challenge for organizations’ analytics and business intelligence platforms. Looker provides a single source of truth, ensuring data accuracy and delivering trusted business logic for the entire organization and all connected applications.
The foundation of trustworthy gen AI
To truly trust gen AI, it needs to be anchored to a robust semantic layer, which acts as your organization’s data intelligence engine, providing a centralized, governed framework that defines your core business concepts and helping to ensure a single, consistent source of truth.
The semantic layer is essential to deliver on the promise of trustworthy gen AI for BI, offering:
Trust: Reduce gen AI “hallucinations” by grounding AI responses in governed, consistently defined data.
Deep business context: AI and data agents should know your business as well as your analysts do. You can empower those agents with an understanding of your business language, metrics, and relationships to accurately interpret user queries and deliver relevant answers.
Governance: Enforce your existing data security and compliance policies within the gen AI environment, protecting sensitive information and providing auditable data access.
Organizational alignment: Deliver data consistency across your entire organization, so every user, report, and AI-driven insight uses the same definitions and terms and refers to them the same way.
LookML improves accuracy and reduces large language model guesswork
The semantic layer advantage in the gen AI era
LookML, Looker’s semantic modeling language, is architected for the cloud and offers a number of critical values for fully integrating gen AI in BI:
Centralized definitions: Experts can define metrics, dimensions, and join relationships once, to be re-used across all Looker Agents, chats and users, ensuring consistent answers that get everyone on the same page.
Deterministic advanced calculations: Ideal for complex mathematical or logistical operations, Looker eliminates randomness and provides predictable and repeatable outcomes. Additionally, our dimensionalized measures capability aggregates values so you can perform operations on them as a group, letting you perform complex actions quickly and simply.
Software engineering best practices: With continuous integration and version control, Looker ensures code changes are frequently tested and tracked, keeping production applications running smoothly.
Time-based analysis: Built-in dimension groups allow for time-based and duration-based calculations.
Deeper data drills: Drill fields allow users to explore data in detail through exploration of a single data point. Data agents can tap into this capability and assist users to dive deeper into different slices of data.
With the foundation of a semantic layer, rather than asking an LLM to write SQL code against raw tables with ambiguous field names (e.g., order.sales_sku_price_US), the LLM is empowered to do what it excels at: searching through clearly defined business objects within LookML (e.g., Orders > Total Revenue). These objects can include metadata and human-friendly descriptions (e.g., “The sum of transaction amounts or total sales price”). This is critical when business users speak in the language of business — “show me revenue” — versus the language of data — ”show me sum of sales (price), not quantity.” LookML bridges the data source and what a decision-maker cares about, so an LLM can better identify the correct fields, filters, and sorts and turn data agents into intelligent ad-hoc analysts.
LookML offers you a well-structured library catalog for your data, enabling an AI agent to find relevant information and summaries, so it can accurately answer your question. Looker then handles the task of actually retrieving that information from the right place.
The coming together of AI and BI promises intelligent, trustworthy and conversational insights. Looker’s semantic layer empowers our customers to gain benefit from these innovations in all the surfaces where they engage with their data. We will continue to expand support for a wide variety of data sources, enrich agent intelligence, and add functionality to conversational analytics to make data interaction as intuitive and powerful as a conversation with your most trusted business advisor.
To gain the full benefits of Looker’s semantic layer and Conversational Analytics, get started here. To learn more about the Conversational Analytics API, see our recent update from Google Cloud Next, or sign up here for preview access.
In today’s data-driven world, teams struggle with siloed data, lack of business context, data reliability concerns, and inconsistent governance that hinders actionable insights. But what if there was a way that could transform your data landscape, unlocking the true value of your information?
That’s the problem we aim to solve with the experimental launch of data products in BigQuery, announced at Google Cloud Next.
Data products in BigQuery offers an approach to organizing, sharing, and leveraging your most valuable asset by treating data as the product. Imagine a ‘Customer Sales’ data product: a curated bundle of BigQuery views combining customer order details and regional sales data. The Sales Analytics team, as the data product owner, provides business context for campaign analysis, along with data freshness guarantees and a dedicated point of contact. With this context and guarantees, data consumers can now effectively use this data product to make informed business decisions related to customer sales.
A data product in BigQuery simplifies the transaction between data producers and consumers by allowing data producers to bundle one or more BigQuery tables or views that address a use case, and distribute them as a logical block to data consumers. While BigQuery already provides a powerful way to share data through datasets listed in data exchanges, data products go beyond this by offering a higher level of abstraction and richer context on the business case the data addresses. Data products are available within the BigQuery experience, allowing data consumers to search and discover relevant assets as one consumable unit.
A data product allows data producers to manage their data as a product, which entails the following:
Build for use cases: Identify the customer, use case, and build a data product with one or more assets that addresses the use case.
Establish ownership: Define the owner and contact information for the data product, helping to ensure accountability and provide trust for consumers.
Democratize context: Distribute valuable context about the problems the product addresses, usage examples and expectations.
Streamline contracts: Provide data consumers the ability to annotate details on data freshness and quality to provide trust and cut down time to insight.
Govern assets: Control who can view the product and regulate access to the data that’s distributed via the data product.
Discover data: Provide data consumers the ability to easily discover and search data products.
Distribute data: Distribute the data product beyond the organization’s boundaries into private consortiums or to the public via a data exchange.
Evolve offerings: Iterate and evolve the product to address consumer needs.
When data producers build assets that address use cases and manage data as a product, it allows data teams to be more efficient, with:
Reduced redundancy: By creating standardized and reusable data products, data teams avoid building the same datasets or pipelines repeatedly for different users or purposes. This frees up their time and resources.
Better prioritization: Treating data as a product helps data teams prioritize their work based on the value and impact of each data product, aligning their efforts with business needs.
Demonstrable ROI: By tracking the usage and the impact of a data product, data teams can better measure and communicate the value of their work to the organization.
Built-in data governance: In the future, data products will be able to incorporate governance policies and compliance workflows, helping to ensure that data is managed responsibly and consistently.
Finally, all of these translate to efficiency for the data consumer by reducing the toil involved in finding the right asset. Data consumers get faster access to insight, since anyone within the organization can search, browse, and discover data products, as well as subscribe to the data product. They also get increased trust, because when data is well-defined, reliable, and properly documented, it’s easier to select the right data for a given use case.
Data products in BigQuery provide the building blocks and controls you need to manage data as a product. It leads to faster access to insights for data consumers through business-outcome-driven data management, maximizing value to the organization.
Are you ready to unlock the untapped potential of your data? Sign up for the experimental preview here.
Have you ever had something on the tip of your tongue, but you weren’t exactly sure how to describe what’s in your mind?
For developers, this is where “vibe coding” comes in. Vibe coding helps developers achieve their vision by using models like Gemini 2.5 Pro to generate code from natural language prompts. Instead of writing every line of code, developers can now describe the desired functionality in plain language, and AI translates these “vibes” into their vision.
Today, we’ll show you how vibe coding can help developers create Model Context Protocol (MCP) servers. MCP, launched in November 2024 by Anthropic, provides an open standard for integrating AI models with various data sources and tools. Since its release, it has become increasingly popular for building AI applications – including with new experimental models like Gemini 2.5.
You can use Gemini 2.5 Pro’s code generation capabilities to create MCP servers with ease, helping you build intuitive, natural language specifications and operational AI infrastructure.
The methodology
Effective AI-assisted coding, especially for specific tasks like generating MCP server code with models such as Gemini 2.5 Pro, starts with clear prompting. To achieve the best results:
Provide context: Offer relevant background information about the MCP server.
Be specific: Give clear and detailed instructions for the code you need.
Be patient: Generating and refining code can take time.
Remember that this is often an iterative process. Be prepared to refine your instructions and regenerate the code until the results are satisfactory.
How to create an MCP server using vibe coding, step by step
There are two ways to leverage Gemini 2.5 Pro for vibe coding: through the Gemini app (gemini.google.com) or by utilizing the Google Gen AI SDK.
Approach 1: Use the Gemini app
Visit gemini.google.com and upload the saved PDF file.
Enter your prompt to generate the desired code.
Here’s an example of a prompt to generate a Google Cloud BigQuery MCP server:
instruction = """
You are an MCP server expert. Your mission is to write python code for MCP server. The MCP server development guide and examples are provided.
Please create MCP server code for Google Cloud BigQuery. It has two tools:
One is to list tables for all datasets,
The other is to describe a table.
Google Cloud project ID and location will be provided in the query string. Please use project id to access BigQuery client.
"""
4. Copy your code, and test the server using this notebook. (A sketch of what the generated server code might look like follows below.)
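For reference, a server produced by a prompt like the one above might look roughly like the sketch below. It uses the MCP Python SDK’s FastMCP helper together with the google-cloud-bigquery client; treat it as an illustration of the shape of the generated code, not the exact output Gemini will return.
# Rough sketch of the kind of BigQuery MCP server the prompt above asks for.
# Assumes `pip install mcp google-cloud-bigquery`; not Gemini's literal output.
from google.cloud import bigquery
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("bigquery")

@mcp.tool()
def list_tables(project_id: str) -> str:
    """List all tables across all datasets in the given project."""
    client = bigquery.Client(project=project_id)
    lines = []
    for dataset in client.list_datasets():
        for table in client.list_tables(dataset.dataset_id):
            lines.append(f"{dataset.dataset_id}.{table.table_id}")
    return "\n".join(lines)

@mcp.tool()
def describe_table(project_id: str, table_id: str) -> str:
    """Describe a table's schema; table_id is 'dataset.table'."""
    client = bigquery.Client(project=project_id)
    table = client.get_table(table_id)
    return "\n".join(f"{field.name}: {field.field_type}" for field in table.schema)

if __name__ == "__main__":
    mcp.run(transport="stdio")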
Alternatively, you can use Google Gen AI SDK to create your server code in a notebook.
Approach 2: Use the Google Gen AI SDK
1. Set the system instruction
Begin by configuring the system instruction.
system_instruction = f"""
You are an MCP server expert.
Your mission is to write python code for MCP server.
Here's the MCP server development guide and example:
{reference_content}
"""
2. Set user prompt
This step involves defining the instructions or questions that will guide the Gemini Vibe Coding process. The user prompt acts as the input for the AI model, specifying the desired outcome for building MCP servers.
url = "https://medlineplus.gov/about/developers/webservices/"
prompt_base = """
Please create an MCP server code for https://medlineplus.gov/about/developers/webservices/. It has one tool:
- get_medical_term. You provide a medical term, this tool will return an explanation of the medical term.
Here's the API details:
"""
prompt = [prompt_base, types.Part.from_uri(file_uri=url, mime_type="text/html")]
The example above creates an MCP server for a government website that offers a free API service.
To help Gemini better understand the API being used, the API service URL is passed in as additional context for content generation.
3. Generate code
Utilize the provided function to create the necessary server code; a minimal sketch of such a generation call follows these steps.
4. Use this notebook to test the server. The complete and detailed code is available within this notebook.
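Under the hood, the “provided function” in step 3 is essentially a single Gen AI SDK call. A minimal sketch, reusing the system_instruction and prompt variables defined in the earlier steps; the model ID is a placeholder for the Gemini 2.5 Pro version available in your environment:
from google import genai
from google.genai import types

# Reuses `system_instruction` and `prompt` from the earlier steps.
client = genai.Client()  # or genai.Client(vertexai=True, project=..., location=...)

response = client.models.generate_content(
    model="gemini-2.5-pro",  # placeholder model ID
    contents=prompt,
    config=types.GenerateContentConfig(system_instruction=system_instruction),
)

print(response.text)  # generated MCP server code, ready to copy into a .py file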
Test it yourself
Gemini 2.5 Pro, currently in preview, offers exceptional code generation capabilities for MCP servers, drastically speeding up and easing the development of your MCP applications. Keep in mind that vibe coding, even with models like Gemini 2.5 Pro, may produce errors, so thorough code review is essential before implementation.
To begin creating your own code, explore the Gemini app. We suggest experimenting with various prompts and Gemini models.
Google Threat Intelligence Group (GTIG) has identified a new piece of malware called LOSTKEYS, attributed to the Russian government-backed threat group COLDRIVER (also known as UNC4057, Star Blizzard, and Callisto). LOSTKEYS is capable of stealing files from a hard-coded list of extensions and directories, along with sending system information and running processes to the attacker. Observed in January, March, and April 2025, LOSTKEYS marks a new development in the toolset of COLDRIVER, a group primarily known for credential phishing against high-profile targets like NATO governments, non-governmental organizations (NGOs), and former intelligence and diplomatic officers. GTIG has been tracking COLDRIVER for many years, including their SPICA malware in 2024.
COLDRIVER typically targets high-profile individuals at their personal email addresses or at NGO addresses. They are known for stealing credentials; after gaining access to a target’s account, they exfiltrate emails and steal contact lists from the compromised account. In select cases, COLDRIVER also delivers malware to target devices and may attempt to access files on the system.
Recent targets in COLDRIVER’s campaigns have included current and former advisors to Western governments and militaries, as well as journalists, think tanks, and NGOs. The group has also continued targeting individuals connected to Ukraine. We believe the primary goal of COLDRIVER’s operations is intelligence collection in support of Russia’s strategic interests. In a small number of cases, the group has been linked to hack-and-leak campaigns targeting officials in the UK and an NGO.
To safeguard at-risk users, we use our research on serious threat actors like COLDRIVER to improve the safety and security of Google’s products. We encourage potential targets to enroll in Google’s Advanced Protection Program, enable Enhanced Safe Browsing for Chrome, and ensure that all devices are updated.
Stage 1 — It Starts With A Fake CAPTCHA
LOSTKEYS is delivered at the end of a multi-step infection chain that starts with a lure website displaying a fake CAPTCHA. Once the CAPTCHA has been “verified,” PowerShell is copied to the user’s clipboard and the page prompts the user to execute it via the “run” prompt in Windows:
The first stage PowerShell that is pasted in will fetch and execute the second stage. In multiple observed cases, the second stage was retrieved from 165.227.148[.]68.
COLDRIVER is not the only threat actor to deliver malware by socially engineering targets into copying, pasting, and then executing PowerShell commands, a technique commonly called “ClickFix.” We have observed multiple APT and financially motivated actors use this technique, which has also been widely reported publicly. Users should exercise caution when encountering a site that prompts them to exit the browser and run commands on their device, and enterprise policies should implement least privilege and disallow users from executing scripts by default.
Stage 2 — Device Evasion
The second stage calculates the MD5 hash of the device’s display resolution; if the hash matches one of three specific values, execution stops, otherwise the third stage is retrieved. This step is likely done to evade execution in VMs. Each observed instance of this chain uses different, unique identifiers that must be present in the request to retrieve the next stage. In all observed instances the third stage is retrieved from the same host as the previous stages.
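For defenders who want to reproduce the check in a lab, the logic amounts to hashing a resolution string and comparing it against a blocklist. The sketch below is illustrative only: the exact string format used by the second stage and the three blocklisted hashes are not published here, so both are placeholders.
import ctypes
import hashlib
import sys

# Placeholder value; the real second stage compares against three specific MD5 hashes.
VM_RESOLUTION_MD5S = {"00000000000000000000000000000000"}

# Windows-only: read the primary display resolution (string format assumed).
user32 = ctypes.windll.user32
resolution = f"{user32.GetSystemMetrics(0)}x{user32.GetSystemMetrics(1)}"

if hashlib.md5(resolution.encode()).hexdigest() in VM_RESOLUTION_MD5S:
    sys.exit(0)  # likely a VM/sandbox: stop before retrieving stage 3
# ...otherwise the chain continues and fetches the third stage...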
Stage 3 — Retrieval of the Final Payload
The third stage is a Base64-encoded blob, which decodes to more PowerShell. This stage retrieves and decodes the final payload. To do this it pulls down two more files, from the same host as the others, and again using different unique identifiers per infection chain.
The first is a Visual Basic Script (VBS) file, which we call the “decoder” that is responsible for decoding the second one. The decoding process uses two keys, which are unique per infection chain. The decoder has one of the unique keys and the second key is stored in stage 3. The keys are used in a substitution cipher on the encoded blob, and are unique to each infection chain. A Python script to decode the final payload is:
# Args: encoded_file Ah90pE3b 4z7Klx1V
import base64
import sys
if len(sys.argv) != 4:
print("Usage: decode.py file key1 key2")
sys.exit(1)
if len(sys.argv[2]) != len(sys.argv[3]):
print("Keys must be the same length")
sys.exit(1)
with open(sys.argv[1], 'r') as f:
data = f.read()
x = sys.argv[2]
y = sys.argv[3]
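# Swap every occurrence of x[i] and y[i], using '!' as a temporary placeholder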
for i in range(len(x)):
data = data.replace(x[i], '!').replace(y[i], x[i]).replace('!', y[i])
with open(sys.argv[1] + '.out', 'wb') as f:
f.write(base64.b64decode(data))
The Final Payload (LOSTKEYS)
The end result of this is a VBS that we call LOSTKEYS. It is a piece of malware that is capable of stealing files from a hard-coded list of extensions and directories, along with sending system information and running processes to the attacker. The typical behavior of COLDRIVER is to steal credentials and then use them to steal emails and contacts from the target, but as we have previously documented they will also deploy malware called SPICA to select targets if they want to access documents on the target system. LOSTKEYS is designed to achieve a similar goal and is only deployed in highly selective cases.
A Link To December 2023
As part of the investigation into this activity, we discovered two additional samples, hashes of which are available in the IOCs section, dating back as early as December 2023. In each case, the samples end up executing LOSTKEYS but are distinctly different from the execution chain described here in that they are Portable Executable (PE) files pretending to be related to the software package Maltego.
It is currently unclear if these samples from December 2023 are related to COLDRIVER, or if the malware was repurposed from a different developer or operation into the activity seen starting in January 2025.
Protecting the Community
As part of our efforts to combat threat actors, we use the results of our research to improve the safety and security of Google’s products. Upon discovery, all identified malicious websites, domains and files are added to Safe Browsing to protect users from further exploitation. We also send targeted Gmail and Workspace users government-backed attacker alerts notifying them of the activity and encouraging potential targets to enable Enhanced Safe Browsing for Chrome and ensure that all devices are updated.
We are committed to sharing our findings with the security community to raise awareness and with companies and individuals that might have been targeted by these activities. We hope that improved understanding of tactics and techniques will enhance threat hunting capabilities and lead to stronger user protections across the industry.
Indicators of compromise (IOCs) and YARA rules are included in this post, and are also available as a GTI collection and rule pack.
YARA Rules
rule LOSTKEYS__Strings {
meta:
author = "Google Threat Intelligence"
description = "wscript that steals documents and beacons system information out to a hardcoded address"
hash = "28a0596b9c62b7b7aca9cac2a07b067109f27d327581a60e8cb4fab92f8f4fa9"
strings:
$rep0 = "my_str = replace(my_str,a1,"!" )"
$rep1 = "my_str = replace(my_str,b1 ,a1 )"
$rep2 = "my_str = replace(my_str,"!" ,b1 )"
$mid0 = "a1 = Mid(ch_a,ina+1,1)"
$mid1 = "b1 = Mid(ch_b,ina+1,1)"
$req0 = "ReqStr = base64encode( z & ";" & ws.ExpandEnvironmentStrings("%COMPUTERNAME%") & ";" & ws.ExpandEnvironmentStrings("%USERNAME%") & ";" & fso.GetDrive("C:\").SerialNumber)"
$req1 = "ReqStr = Chain(ReqStr,"=+/",",-_")"
$cap0 = "CapIN "systeminfo > """ & TmpF & """", 1, True"
$cap1 = "CapIN "ipconfig /all >> """ & TmpF & """", 1, True"
$cap2 = "CapIN "net view >> """ & TmpF & """", 1, True"
$cap3 = "CapIN "tasklist >> """ & TmpF & """", 1, True"
condition:
all of ($rep*) or all of ($mid*) or all of ($req*) or all of ($cap*)
}
Amazon Web Services has announced availability of Amazon WorkSpaces Personal, WorkSpaces Pools and WorkSpaces Core in the AWS Europe (Paris) Region. You can now provision WorkSpaces closer to your users, helping to provide in-country data residency and a more responsive experience. Additionally, you can quickly add or remove WorkSpaces to meet changing demand, without the cost and complexity of on-premises Virtual Desktop Infrastructure (VDI).
Amazon WorkSpaces is a fully managed virtual desktop infrastructure (VDI) service that helps organizations provide end users access to applications and data while optimizing costs and improving productivity. WorkSpaces gives organizations the flexibility to choose between highly configurable virtual desktops for workers that need access to a consistent, personalized environment each time they log in or pools of virtual desktops shared across a group of users to help reduce costs.
To get started, sign in to the Amazon WorkSpaces Management Console and select Europe (Paris) Region. For the full list of Regions where WorkSpaces is available, see the AWS Region Table. For pricing details, visit the WorkSpaces Pricing page.
Amazon SageMaker now supports direct connectivity to Oracle, Amazon DocumentDB, and Microsoft SQL Server databases, expanding the available data integration capabilities in Amazon SageMaker Lakehouse. This enhancement enables customers to seamlessly access and analyze data from these databases.
With these new data source connections, customers can directly query data and build ETL flows from their Oracle, Amazon DocumentDB, and Microsoft SQL Server databases. This integration simplifies data and AI/ML workflows by allowing you to work with your data alongside AWS data, analytics and AI capabilities.
Support for these new data sources is available in all AWS Regions where Amazon SageMaker Unified Studio is available. For the most up-to-date information about regional availability, visit the AWS Region table.
To learn more about connecting to data sources in Amazon SageMaker Lakehouse, visit the documentation.