Azure – Generally available: Azure HPC – CycleCloud 8.1.1
This release includes a considerable list of resolved issues and a retired feature.
Read More for the details.
AWS Firewall Manager now enables you to centrally deploy AWS Shield Advanced automatic application layer (L7) DDoS protections across accounts in your organization. AWS Shield Advanced automatic L7 DDoS protections block application layer DDoS events with no manual intervention needed. With this launch, security administrators for AWS Firewall Manager can now enable automatic L7 DDoS protections across accounts using the Firewall Manager security policy for AWS Shield Advanced.
Read More for the details.
Amazon RDS for SQL Server now supports SQL Server Analysis Services (SSAS) in Multidimensional mode. There is no additional cost to install SSAS directly on your Amazon RDS for SQL Server DB instance.
Read More for the details.
Editor’s note: To kick off the new year, we invited partners from across our retail ecosystem to share stories, best practices, and tips and tricks on how they are helping retailers transform during a time that has seen tremendous change. We hope you enjoy this series of guest blogs from our partners.
At Trigo, we are building the digital infrastructure for the future of retail with our computer vision and AI-powered technology. It is currently enabling our frictionless checkout solution, EasyOut™, to retrofit existing grocery stores for some of the world’s largest grocery retailers. Over the past few months, we have opened our first fully operational and autonomous stores integrated with EasyOut™ in the UK and Germany. From the beginning, we have been working with Google Cloud to provide the best product for our retailers, and our partnership is more important than ever as we scale with more retailers and stores in the pipeline over the next year.
Trigo’s computer vision and AI-driven technology creates 3D models of the stores, working behind the scenes to capture shoppers’ movement and interactions with thousands of SKUs. The amount of data – at least ½ billion images per day – that needs to be processed and analyzed requires a robust technological infrastructure to support these functions. We chose Google Cloud because of their offerings that can keep up with the pace at which we create, analyze and then utilize data. Google Cloud BigQuery has been instrumental in helping us improve our performance, specifically basket accuracy and time to receipt. We leveraged BigQuery to manage and process all of the data and queries that we inputted, which allowed us to improve our accuracy.
Each new store we open requires additional processing capabilities to sustain the massive amount of data being fed into our system. We plan to dramatically grow our store count within the next year, so our cloud partner must have the technological capacity to handle our scale goals. One of these goals is global expansion, which will require our system to communicate effectively with its cloud backends from any of our activity regions – North America, Europe, the Middle East, and more. Google Cloud has servers present in all of these regions, so latency for receipts and other data is shorter for all of these stores. This will allow us to deliver a better service and an overall better experience for our retailers’ shoppers, which also supports our scale goals with existing retailers.
Part of the EasyOut™ deployment process includes the installation of an off-the-shelf sensors kit. While our core business is developing the most cutting-edge technology for frictionless checkout, it is also choosing the right hardware partners to maximize the potential of our algorithms and software. Google Cloud has been proactive from the start in connecting us to their vast network of partners, giving us access to hardware providers that can improve the accuracy of our product.
Google Cloud has also been very hands-on in gaining a deep understanding of our systems. Our teams regularly collaborate to do a deep dive into the internals of our system and the Google Cloud team provides us with their professional insights. They also give us access to their experts in AI, big data and machine learning that find the right tools to help us solve complex problems in our system.
As we continue to grow into new regions and with new retailers, so will the products and services that we offer. Our upcoming StoreOS™ suite of products, powered by our proprietary 3D engine, will provide retailers with a range of additional solutions, including predictive inventory management, pricing optimization, security and fraud prevention, planogram compliance, and event-driven marketing. Google Cloud has been invaluable in paving the way for this expansion, whether through their extensive data storage and analysis capabilities or by providing support and insights from their internal experts. Google is at the forefront of innovation, creating cutting-edge solutions that have the potential to elevate our technological capabilities. We look forward to continuing our partnership as we pursue our quest to reinvent the in-store shopping experience by bringing all the benefits of online shopping to physical stores.
Read More for the details.
Starting today, compute-optimized Amazon EC2 C6g instances are available in the Middle East (Bahrain) Region. The C6g instances are ideal for compute-intensive applications such as high performance computing, video encoding, gaming, and CPU-based machine learning inference acceleration. Additionally, memory-optimized Amazon EC2 R6gd instances with local NVMe-based SSD storage are available in Asia Pacific (Mumbai), Canada (Central), and Europe (Paris). R6gd instances deliver up to 50% more NVMe storage GB/vCPU than comparable x86-based instances, and are ideal for applications that need access to high-speed, low-latency storage, as well as for temporary storage of data such as batch and log processing, and for high-speed caches and scratch files. The Amazon EC2 C6g and R6gd instances are powered by AWS Graviton2 processors and deliver the best price performance in EC2.
Read More for the details.
Starting today, Amazon EC2 M6a instances are available in the Asia Pacific (Mumbai) region. Designed to provide a balance of compute, memory, storage, and network resources, M6a instances are built on the AWS Nitro System, a combination of dedicated hardware and lightweight hypervisor, which delivers practically all of the compute and memory resources of the host hardware to your instances. These instances are SAP-certified and are ideal for workloads such as web and application servers, back-end servers supporting enterprise applications (e.g. Microsoft Exchange Server and SharePoint Server, SAP Business Suite, MySQL, Microsoft SQL Server, and PostgreSQL databases), microservices, multi-player gaming servers, caching fleets, as well as application development environments.
Read More for the details.
Over the past few months, I made the choice to move from the AWS ecosystem to Google Cloud — both great clouds! — and I think it’s made me a stronger, more well-rounded technologist.
But I’m just one data point in a big trend. Multicloud is an inevitability in medium-to-large organizations at this point, as I and others have been saying for a while now. As IT footprints get more complex, you should expect to see a broader range of cloud provider requirements showing up where you work and interview. Ready or not, multicloud is happening.
In fact, HashiCorp’s recent State of Cloud Strategy Survey found that 76% of employers are already using multiple clouds in some fashion, with more than 50% flagging a lack of skills among their employees as a top challenge to surviving in the cloud.
That spells opportunity for you as an engineer. But with limited time and bandwidth, where do you place your bets to ensure that you’re staying competitive in this ever-cloudier world?
You could pick one cloud to get good at and stick with it; that’s a perfectly valid career bet. (And if you do bet your career on one cloud, you should totally pick Google Cloud! I have reasons!) But in this post I’m arguing that expanding your scope of professional fluency to at least two of the three major US cloud providers (Google Cloud, AWS, Microsoft Azure) opens up some unique, future-optimized career options.
What do I mean by ‘multicloud fluency’?
For the sake of this discussion, I’m defining “multicloud fluency” as a level of familiarity with each cloud that would enable you to, say, pass the flagship professional-level certification offered by that cloud provider (for example, Google Cloud’s Professional Cloud Architect certification or AWS’s Certified Solutions Architect Professional). Notably, I am not saying that multicloud fluency implies experience maintaining production workloads on more than one cloud, and I’ll clarify why in a minute.
I asked the cloud community on Twitter to give me some examples of how knowledge of multiple clouds has helped their careers, and dozens of engineers responded with a great discussion.
Turns out that even if you never incorporate services from multiple clouds in the same project — and many people don’t! — there’s still value in understanding how the other cloud lives.
Learning the lingua franca of cloud
I like this framing of the different cloud providers as “Romance languages” — as with human languages in the same family tree, clouds share many of the same conceptual building blocks. Adults learn primarily by analogy to things we’ve already encountered. Just as learning one programming language makes it easier to learn more, learning one cloud reduces your ramp-up time on others.
More than just helping you absorb new information faster, understanding the strengths and tradeoffs of different cloud providers can help you make the best choice of services and architectures for new projects. I actually remember struggling with this at times when I worked for a consulting shop that focused exclusively on AWS. A client would ask “What if we did this on Azure?” and I really didn’t have the context to be sure. But if you have a solid foundational understanding of the landscape across the major providers, you can feel confident — and inspire confidence! — in your technical choices.
Becoming a unicorn
To be clear, this level of awareness isn’t common among engineering talent. That’s why people with multicloud chops are often considered “unicorns” in the hiring market. Want to stand out in 2022? Show that you’re conversant in more than just one cloud. At the very least, it expands the market for your skills to include companies that focus on each of the clouds you know.
Taking that idea to its extreme, some of the biggest advocates for the value of a multicloud resumé are consultants, which makes sense given that they often work on different clouds depending on the client project of the week. Lynn Langit, an independent consultant and one of the cloud technologists I most respect, estimates that she spends about 40% of her consulting time on Google Cloud, 40% on AWS, and 20% on Azure. Fluency across providers lets her select the engagements that are most interesting to her and allows her to recommend the technology that provides the greatest value.
But don’t get me wrong: multicloud skills can also be great for your career progression if you work on an in-house engineering team. As companies’ cloud posture becomes more complex, they need technical leaders and decision-makers who comprehend their full cloud footprint. Want to become a principal engineer or engineering manager at a mid-to-large-sized enterprise or growing startup? Those roles require an organization-wide understanding of your technology landscape, and that’s probably going to include services from more than one cloud.
We’ve established that some familiarity with multiple clouds expands your career options. But learning one cloud can seem daunting enough, especially if it’s not part of your current day job. How do you chart a multicloud career path that doesn’t end with you spreading yourself too thin to be effective at anything?
Get good at the core concepts
Yes, all the clouds are different. But they share many of the same basic approaches to IAM, virtual networking, high availability, and more. These are portable fundamentals that you can move between clouds as needed. If you’re new to cloud, an associate-level solutions architect certification will help you cover the basics. Make sure to do hands-on labs to help make the concepts real, though — we learn much more by doing than by reading.
Go deep on your primary cloud
Fundamentals aside, it’s really important that you have a native level of fluency in one cloud provider. You may have the opportunity to pick up multicloud skills on the job, but to get a cloud engineering role you’re almost certainly going to need to show significant expertise on a specific cloud.
Note: If you’re brand new to cloud and not sure which provider to start with, my biased (but informed) recommendation is to give Google Cloud a try. It has a free tier that won’t bill you until you give permission, and the nifty project structure makes it really easy to spin up and tear down different test environments.
It’s worth noting that engineering teams specialize, too; everybody has loose ends, but they’ll often try to standardize on one cloud provider as much as they can. If you work on such a team, take advantage of the opportunity to get as much hands-on experience with their preferred cloud as possible.
Go broad on your secondary cloud
You may have heard of the concept of T-shaped skills. A well-rounded developer is broadly familiar with a range of relevant technologies (the horizontal part of the “T”), and an expert in a deep, specific niche. You can think of your skills on your primary cloud provider as the deep part of your “T”. (Actually, let’s be real — even a single cloud has too many services for any one person to hold in their heads at an expert level. Your niche is likely to be a subset of your primary cloud’s services: say, security or data.)
We could put this a different way: build on your primary cloud, get certified on your secondary. This gives you hirable expertise on your “native” cloud and situational awareness of the rest of the market. As opportunities come up to build on that secondary cloud, you’ll be ready.
I should add that several people have emphasized to me that they sense diminishing returns when keeping up with more than one secondary cloud. At some point the cognitive switching gets overwhelming and the additional learning doesn’t add much value. Perhaps the sweet spot looks like this: 1 < 2 > 3 (two clouds beat one, but also beat three).
Bet on cloud-native services and multicloud tooling
The whole point of building on the cloud is to take advantage of what the cloud does best — and usually that means leveraging powerful, native managed services like Spanner and Vertex AI.
On the other hand, the cloud ecosystem has now matured to the point where fantastic, open-source multicloud management tooling for wrangling those provider-specific services is readily available. (Doing containers on cloud? Probably using Kubernetes! Looking for a DevOps role? The team is probably looking for Terraform expertise no matter what cloud they major on.) By investing learning time in some of these cross-cloud tools, you open even more doors to build interesting things with the team of your choice.
When I moved into the Google Cloud world after years of being an AWS Hero, I made sure to follow a new set of Google Cloud voices like Stephanie Wong and Richard Seroter. But I didn’t ghost my AWS-using friends, either! I’m a better technologist (and a better community member) when I keep up with both ecosystems.
“But I can hardly keep up with the firehose of features and updates coming from Cloud A. How will I be able to add in Cloud B?” Accept that you can’t know everything. Nobody does. Use your broad knowledge of cloud fundamentals as an index, read the docs frequently for services that you use a lot, and keep your awareness of your secondary cloud fresh:
Follow a few trusted voices who can help you filter the signal from the noise
Attend a virtual event once a quarter or so; it’s never been easier to access live learning
Build a weekend side project that puts your skills into practice
Ultimately, you (not your team or their technology choices!) are responsible for the trajectory of your career. If this post has raised career questions that I can help answer, please feel free to hit me up on Twitter. Let’s continue the conversation.
Read More for the details.
AWS IoT Core for LoRaWAN is a fully managed LoRaWAN Network Server (LNS) of AWS IoT Core that allows customers to connect wireless devices to the AWS cloud using the low-power long-range wide area network (LoRaWAN) technology. Now, AWS IoT Core for LoRaWAN supports two new features, Downlink Queue Management and Network Analyzer, to help customers manage and monitor communications between devices and the cloud. The Downlink Queue Management feature allows customers to schedule, delete, and even purge downlink messages, and Network Analyzer can be used to monitor messages and help troubleshoot issues related to uplink or downlink events.
Read More for the details.
Today, Amazon Elastic Container Registry (ECR) launched the ability to monitor repository pull statistics through Amazon CloudWatch. The new pull statistics help you monitor usage patterns and identify anomalous behavior by observing image pull requests per repository.
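As a quick sketch (assuming the metric is published as RepositoryPullCount in the AWS/ECR namespace with a RepositoryName dimension), you could retrieve a repository’s daily pull counts with the AWS CLI:

```bash
# Illustrative only: metric and dimension names are assumptions based on the announcement.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECR \
  --metric-name RepositoryPullCount \
  --dimensions Name=RepositoryName,Value=my-app-repo \
  --start-time 2022-01-01T00:00:00Z \
  --end-time 2022-01-08T00:00:00Z \
  --period 86400 \
  --statistics Sum
```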
Read More for the details.
Starting today, you can control access to specific applications within your Amazon AppStream 2.0 stacks based on SAML 2.0 attribute assertions. In addition, your SAML 2.0 federated user identities can access multiple AppStream 2.0 stacks from a single SAML 2.0 service provider (SP) application. Previously, each stack required a separate service provider application configured in your SAML 2.0 identity provider (IdP). These features will allow you to streamline access control to your AppStream stacks and reduce the number of fleets and images that need to be maintained due to application access restrictions. For example, from a single SAML 2.0 SP application in your IdP relaying to a single AppStream 2.0 stack, you can entitle users belonging to one group to one set of applications, and another group to a different set of applications.
Read More for the details.
Update your network access control rules if you see Traffic Manager health probes with new IP addresses.
Read More for the details.
Starting today, customers can use Amazon EC2 On-Demand Capacity Reservations to reserve capacity for cluster placement groups. With cluster placement groups, customers can launch EC2 instances into logical groups within a segment of the network with high bisection bandwidth, thus getting low latency and high throughput between instances inside the cluster. Cluster placement groups are beneficial for customers with workloads that require tightly coupled node-to-node communication, such as high-performance computing (HPC) workloads or in-memory databases like SAP HANA. With the addition of On-Demand Capacity Reservations for cluster placement groups, customers can get the assurance of reserved capacity as they scale compute resources within their cluster.
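As a rough sketch (the --placement-group-arn parameter and the values shown are assumptions for illustration), creating such a reservation with the AWS CLI might look like this:

```bash
# Hypothetical example: reserve 8 instances inside an existing cluster placement group.
aws ec2 create-capacity-reservation \
  --instance-type c5n.18xlarge \
  --instance-platform Linux/UNIX \
  --availability-zone us-east-1a \
  --instance-count 8 \
  --placement-group-arn arn:aws:ec2:us-east-1:123456789012:placement-group/my-hpc-cluster
```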
Read More for the details.
You can now access your instance’s tags from the EC2 Instance Metadata Service. Tags enable you to categorize your AWS resources in different ways, for example, by purpose, owner, or environment. This is useful when you have many resources of the same type—you can quickly identify a specific resource based on the tags that you’ve assigned to it. Previously, you could access your instance tags from the console or by using the describe-tags API.
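For example, from within the instance (assuming access to tags in instance metadata has been enabled for that instance), an IMDSv2 call to list and read tags might look like this:

```bash
# Request an IMDSv2 session token, then read instance tags from the metadata service.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# List available tag keys for this instance.
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/tags/instance/

# Read the value of a single tag, e.g. the Name tag.
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/tags/instance/Name
```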
Read More for the details.
AWS Lambda functions using the Node.js 14 runtime now support code packaged as ECMAScript modules, allowing Lambda customers to consume a wider range of JavaScript packages in their Lambda functions. In addition, Lambda customers can now take advantage of ‘top-level await’, a Node.js 14 language feature. When used with Provisioned Concurrency, this improves cold-start performance for functions with asynchronous initialization tasks. For more information, see the blog post Using Node.JS ES Modules and Top-Level Await in AWS Lambda.
Read More for the details.
Amazon OpenSearch Service (successor to Amazon Elasticsearch Service) now supports enabling fine-grained access control on existing domains. Fine-grained access control adds several capabilities to help you have better access control over the data stored in your domain.
Read More for the details.
Amazon EMR on EKS enables customers to easily run open-source big data frameworks such as Apache Spark on Amazon EKS. Amazon EMR on EKS customers set up and use a managed endpoint (available in preview) to run interactive workloads using integrated development environments (IDEs) such as EMR Studio.
Read More for the details.
Today AWS is announcing the general availability of AWS Serverless Application Model CLI (AWS SAM CLI) support for local testing of AWS Cloud Development Kit applications. AWS SAM and AWS CDK are both open-source frameworks for building applications using infrastructure as code (IaC). AWS SAM is made up of the SAM template, which is a way to describe infrastructure in an application using JSON or YAML, and SAM CLI, which is a tool to build, package, test and deploy AWS SAM applications. AWS CDK is a development framework to define your cloud application resources using familiar programming languages such as Python or Node.js.
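As a rough sketch of the flow (the stack and function names below are placeholders), you synthesize the CDK app and then point SAM CLI at the generated CloudFormation template:

```bash
# Synthesize the CDK app into cdk.out/, then invoke one of its Lambda functions locally.
# "MyCdkStack" and "MyFunction" are placeholder names for illustration.
cdk synth
sam local invoke --template ./cdk.out/MyCdkStack.template.json MyFunction
```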
Read More for the details.
Amazon Elastic Kubernetes Service (EKS) now supports IPv6, enabling customers to scale containerized applications on Kubernetes far beyond limits of private IPv4 address space, while achieving high network bandwidth with minimal complexity.
Read More for the details.
Since our last update in November 2021, AWS CloudFormation Registry has expanded to include support for 37 new resource types (refer to the complete list below) between November and December 2021. A resource type includes schema (resource properties and handler permissions) and handlers that allow API interactions with the underlying AWS or third-party services. Customers can now configure, provision, and manage the lifecycle of these newly supported resources as part of their cloud infrastructure through CloudFormation, by treating them as infrastructure as code. Furthermore, we are pleased to announce that three new AWS services added CloudFormation support on the day of launch. These services include: Amazon CloudWatch Evidently, Amazon CloudWatch RUM, and AWS Resilience Hub. CloudFormation now supports 170 AWS services spanning over 830 resource types, along with over 40 third-party resource types.
Read More for the details.
Today we’re announcing a public preview for the BigQuery native JSON data type, a capability which brings support for storing and analyzing semi-structured data in BigQuery.
With this new JSON storage type and advanced JSON features like JSON dot notation support, adaptable data type changes, and new JSON functions, semi-structured data in BigQuery is now intuitive to use and query in its native format.
You can enroll in the feature preview by signing up here.
Building a data pipeline involves many decisions. Where will my data be ingested from? Does my application require data to be loaded as a batch job or real-time streaming ingest? How should my tables be structured? Many of these decisions are often made up front before a data pipeline is built, meaning table or data type changes down the road can unfortunately be complex and/or costly.
To handle such events, customers have traditionally had to build complex change-handling automation, pause data ingestion to allow for manual intervention, or write unplanned data to a catch-all String field that later has to be parsed in a post-processing step.
These approaches all add cost and complexity, and they slow down your ability to derive data-driven insights.
JSON is a widely used format that allows for semi-structured data, because it does not require a schema. This offers you added flexibility to store and query data that doesn’t always adhere to fixed schemas and data types. By ingesting semi-structured data as a JSON data type, BigQuery allows each JSON field to be encoded and processed independently. You can then query the values of fields within the JSON data individually via dot notation, which makes JSON queries easy to use. This new JSON functionality is also cost efficient compared to previous methods of extracting JSON elements from String fields, which requires processing entire blocks of data.
Thanks to BigQuery’s native JSON support, customers can now write to BigQuery without worrying about future changes to their data. Customers like DeNA, a mobile gaming and e-commerce services provider, see value in this new capability because it provides faster time to value.
“Agility is key to our business. We believe Native JSON functionality will enable us to handle changes in data models more quickly and shorten the lead time to pull insights from our data.”—Ryoji Hasegawa, Data Engineer, DeNA Co Ltd.
The best way to learn is often by doing, so let’s see native JSON in action. Suppose we have two ingestion pipelines, one performing batch ingest and the other performing real-time streaming ingest, both of which ingest application login events into BigQuery for further analysis. By leveraging the native JSON feature, we can now embrace upstream data evolution and changes to our application.
JSON types are currently supported via batch load jobs of CSV-formatted files. So, as an example, let’s create a new table called json_example.batch_events and then ingest this correctly escaped login_events.csv file into BigQuery with the below bq commands. You’ll notice the batch_events table has both structured columns as well as a labels field, which uses the new JSON type for our semi-structured fields. In this example, some application values will remain highly structured, such as the event creationTime, event ID, and event name, so we’ll define this table as storing both structured and semi-structured data.
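As a rough sketch, the table creation and load might look like the following (the eventId and eventName column names are assumptions; the table name, the labels JSON column, and login_events.csv come from the example above):

```bash
# Create a table mixing structured columns with a semi-structured JSON column.
bq mk --table json_example.batch_events \
  creationTime:TIMESTAMP,eventId:STRING,eventName:STRING,labels:JSON

# Batch-load the correctly escaped CSV file into the table
# (add --skip_leading_rows=1 if the file has a header row).
bq load --source_format=CSV json_example.batch_events ./login_events.csv
```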
We’ll look at how to run queries using the new JSON functions a bit later in this blog, but first let’s also explore how we might stream semi-structured real-time events into BigQuery using the JSON type too.
Now let’s walk through an example of how to stream the same semi-structured application login events into BigQuery. We’ll first create a new table called json_example.streaming_events which leverages the same combination of structured and semi-structured columns. However, instead of using the bq command line, we’ll create this table by running the SQL Data definition language (DDL) statement:
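A minimal version of that statement might look like this (column names other than creationTime and labels are assumptions):

```sql
-- Create the streaming table with structured columns plus a semi-structured JSON column.
CREATE TABLE json_example.streaming_events (
  creationTime TIMESTAMP,
  eventId STRING,
  eventName STRING,
  labels JSON
);
```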
BigQuery supports two forms of real-time ingestion: the BigQuery Storage Write API and the legacy streaming API. The Storage Write API provides a unified data-ingestion API to write data into BigQuery via gRPC and provides advanced features like exactly-once delivery semantics, stream-level transactions, support for multiple workers, and is generally recommended over the legacy streaming API. However because the legacy streaming API is still in use by some customers, let’s walk through both examples: ingesting JSON data through the Storage Write API and ingesting JSON data through the legacy insertAll streaming API.
To ingest data via the Storage Write API, we’ll stream data as protocol buffers. For a quick refresher on working with protocol buffers, here’s a great tutorial.
We’ll first define our message format for writing into the json_example.streaming_events table using a .proto file in proto2. You can copy the file from here, then run the following command within a Linux environment to update your protocol buffer definition:
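Assuming the proto2 definition is saved locally as sample_data.proto, regenerating the Python bindings is a single protoc invocation:

```bash
# Generate sample_data_pb2.py (the file name is an assumption) in the current directory.
protoc --python_out=. sample_data.proto
```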
We’ll then use this sample Python code to stream both structured and semi-structured data into the streaming_events table. This code streams a batch of row data by appending proto2-serialized bytes to the serialized_rows repeated field, as in the example below. Of particular note is the labels field, which was defined within our table to be JSON.
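A condensed sketch of that flow, writing to the table’s _default stream and assuming a sample_data_pb2 module generated from the .proto above whose SampleData message mirrors the table schema, might look like this:

```python
# Sketch: append proto2-serialized rows to BigQuery via the Storage Write API.
# Assumes sample_data_pb2 was generated by protoc above and that its SampleData
# message mirrors streaming_events, with the JSON labels column carried as a string.
import json

from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types, writer
from google.protobuf import descriptor_pb2

import sample_data_pb2  # generated module (assumption)

PROJECT, DATASET, TABLE = "my-project", "json_example", "streaming_events"

client = bigquery_storage_v1.BigQueryWriteClient()
# Use the table's default stream, which commits rows as they are appended.
stream_name = client.table_path(PROJECT, DATASET, TABLE) + "/_default"

# Describe the serialized rows by sending the generated proto descriptor.
proto_descriptor = descriptor_pb2.DescriptorProto()
sample_data_pb2.SampleData.DESCRIPTOR.CopyToProto(proto_descriptor)
request_template = types.AppendRowsRequest(
    write_stream=stream_name,
    proto_rows=types.AppendRowsRequest.ProtoData(
        writer_schema=types.ProtoSchema(proto_descriptor=proto_descriptor)
    ),
)
append_stream = writer.AppendRowsStream(client, request_template)

# Build a batch of rows: each row is appended as proto2-serialized bytes
# to the serialized_rows repeated field.
proto_rows = types.ProtoRows()
row = sample_data_pb2.SampleData()
row.eventName = "login"  # illustrative field name
row.labels = json.dumps(  # JSON column values travel as strings in the proto
    {"property": "login_authentication_failure", "threatRating": 7.5}
)
proto_rows.serialized_rows.append(row.SerializeToString())

request = types.AppendRowsRequest(
    proto_rows=types.AppendRowsRequest.ProtoData(rows=proto_rows)
)
append_stream.send(request).result()  # block until the append is acknowledged
append_stream.close()
```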
Once executed, we can see our table now has ingested a few rows from the Storage Write API!
And lastly, let’s explore streaming data to the same streaming_events table with the legacy insertAll API. With this approach, we’ll ingest a set of JSON events stored within a local file in real time. The events will be structured like the below, with the labels field being highly variable and semi-structured:
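An illustrative event (field names other than labels are assumptions) might look like this:

```json
{
  "creationTime": "2022-01-24T12:00:00Z",
  "eventId": "8f2e1c3a",
  "eventName": "login",
  "labels": {
    "property": "login_authentication_failure",
    "threatRating": 7.5,
    "clientOS": "Android 12"
  }
}
```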
Now run the following Python code which reads data from the local JSON events file and streams it into BigQuery.
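A minimal sketch using the google-cloud-bigquery client’s insert_rows_json method (the local file name and the structured field names are assumptions) could look like this:

```python
# Sketch: stream events from a local JSON file into BigQuery via the legacy
# insertAll API (insert_rows_json). The JSON labels value is passed as a string,
# which is how the legacy streaming API expects JSON column values.
import json

from google.cloud import bigquery

client = bigquery.Client()
table_id = "json_example.streaming_events"

with open("login_events.json") as events_file:  # assumed local file: a JSON array of events
    events = json.load(events_file)

rows = [
    {
        "creationTime": e["creationTime"],
        "eventId": e["eventId"],
        "eventName": e["eventName"],
        "labels": json.dumps(e["labels"]),
    }
    for e in events
]

errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Encountered errors while inserting rows:", errors)
```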
Now that our JSON events have successfully been ingested into BigQuery (through batch ingest, the Storage Write API, the legacy streaming API, or even all three) we’re ready to query our semi-structured data in BigQuery!
With the introduction of the native JSON type, we’re also introducing new JSON functions to easily and efficiently query data in its native format.
For instance, we can get a count of the events we ingested which encountered a login authentication failure by filtering on the labels.property field of the JSON value using dot notation:
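A query along these lines (the exact labels.property value is an assumption) might be:

```sql
-- Count ingested events whose labels.property indicates a login authentication failure.
SELECT COUNT(*) AS failed_login_events
FROM json_example.streaming_events
WHERE JSON_VALUE(labels.property) = 'login_authentication_failure';
```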
We can also perform aggregations by averaging event threats caused by login failures within our data set by natively casting a threatRating field within labels as a FLOAT:
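Again as a sketch, using the FLOAT64 conversion function on the JSON field:

```sql
-- Average the threatRating stored inside the JSON labels column for failed logins.
SELECT AVG(FLOAT64(labels.threatRating)) AS avg_threat_rating
FROM json_example.streaming_events
WHERE JSON_VALUE(labels.property) = 'login_authentication_failure';
```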
What if you have existing tables? Can you take advantage of the native JSON type without rewriting all your data? Yes!
BigQuery makes operations like modifying existing table schemas a snap through DDL statements like the one below, which adds a new JSON column titled newJSONField to an existing table:
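For example, against the streaming table used above:

```sql
-- Add a JSON column to an existing table without rewriting any data.
ALTER TABLE json_example.streaming_events
ADD COLUMN newJSONField JSON;
```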
From here, you can decide how you want to use the new column: either convert existing data (perhaps JSON currently stored as a String) into the newJSONField column, or ingest net new data into it.
To convert existing data into JSON, you can use an UPDATE DML statement to update your existing rows through either the PARSE_JSON function, which converts a String into a JSON type, or the TO_JSON function, which converts any data type into a JSON type. Here are examples of each below:
Converting a String into JSON:
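A sketch, assuming the existing JSON text lives in a String column called oldLabelsString:

```sql
-- Parse JSON stored as a STRING into the new JSON column (source column name assumed).
UPDATE json_example.streaming_events
SET newJSONField = PARSE_JSON(oldLabelsString)
WHERE TRUE;
```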
Converting existing data stored as a nested and repeated STRUCT, like the example here, into JSON:
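A sketch, assuming the nested and repeated data lives in a STRUCT column called oldLabelsStruct:

```sql
-- Convert an existing STRUCT column into the new JSON column (source column name assumed).
UPDATE json_example.streaming_events
SET newJSONField = TO_JSON(oldLabelsStruct)
WHERE TRUE;
```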
Data comes in all forms, shapes, sizes, and never stops evolving. If you’d like to support your data and its future evolution with the BigQuery native JSON preview feature, please complete the sign up form here.
Read More for the details.