Cloud

AWS Glue streaming extract, transform, and load (ETL) jobs can now read from AWS Glue Data Catalog tables created using the AWS Glue Schema Registry. With streaming ETL in AWS Glue, you can set up continuous ingestion pipelines to prepare streaming data on the fly and make it available for analysis in seconds. The AWS Glue Schema Registry allows you to centrally discover, control, and evolve data stream schemas. This integration streamlines the job setup process and simplifies schema enforcement.

Read More for the details.

2021 12 07

Azure – Public preview availability of Virtual Machine restore points

Take multi-disk consistent point in time snapshots of all the disks attached to a VM for backup and disaster recovery purposes.


2021 12 07

Azure – India Central Availability Zones now generally available

Azure Availability Zones are now generally available in India Central. These three new zones provide customers with options for additional resiliency and tolerance to infrastructure impact.


2021 12 07

GCP – DevOps and CI/CD on Google Cloud explained

What is CI/CD?

Continuous Integration (CI), at its core, is about getting feedback early and often, which makes it possible to identify and correct problems early in the development process. With CI, you integrate your work frequently, often multiple times a day, instead of waiting for one large integration later on. Each integration is verified with an automated build, which enables you to detect integration issues as quickly as possible and reduce problems downstream.

Continuous Delivery (CD) extends CI. CD is about packaging and preparing the software with the goal of delivering incremental changes to users. Deployment strategies such as red/black and canary deployments can help reduce release risk and increase confidence in releases. CD makes the release process safer, lower risk, faster, and, when done well, boring. Once deployments are made painless with CD, developers can focus on writing code, not tweaking deployment scripts.
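To make the canary idea concrete, here is a minimal sketch of how a small, stable share of users could be routed to a new release. The function name route_version and the bucketing scheme are illustrative assumptions, not part of any Google Cloud API.

```python
import hashlib


def route_version(user_id: str, canary_percent: int) -> str:
    """Return 'canary' for roughly canary_percent% of users, else 'stable'.

    Hypothetical helper for illustration only.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the user ID so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Because the bucket depends only on the user ID, raising the percentage step by step keeps earlier canary users on the canary, and setting it back to 0 acts as a rollback.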


How has the application development landscape changed?

Much has changed in the app development space recently, and you’ll want to take these changes into account as part of your CI/CD strategy.

Hybrid and multi-cloud deployments – Large enterprises want to deploy applications in hybrid cloud environments, with tools and services that don’t lock them into a specific vendor.

The shift from monolith to microservices – Teams are breaking down large monoliths into microservices for greater agility. This makes it possible for different teams to use different languages, tech stacks, and development lifecycles, which means deployment patterns, tooling needs, and scaling patterns are changing.

Cloud-native applications – It’s not just VMs anymore; companies are shifting paradigms and embracing serverless, containers, and Kubernetes. While simplifying some aspects of app development, this move adds complexity in other areas. How do you handle rollbacks? Canary deployments? It’s different now.

Ideally, developers should be focused on their code, not on ushering their changes through a CI/CD process. CI/CD steps should be triggered and run behind the scenes as soon as code is checked in. So, your CI/CD pipeline should support:

Packaging source code

Automated unit and integration tests

Consistent build environments

Approvals before deploying to production

Blue/green and canary roll-outs

That’s where Cloud Build comes in.

Google Cloud DevOps overview

Cloud Build

Cloud Build is a fully-managed CI/CD platform that lets you build, test, and deploy across hybrid and multi-cloud environments that include VMs, serverless, Kubernetes, and Firebase. Cloud Build can import source code from Cloud Storage, Cloud Source Repositories, GitHub, or Bitbucket; execute a build to your specifications; and produce artifacts such as Docker container images or Java archives.

Cloud Build executes your build as a series of build steps, with each step run in a Docker container. A build step can do anything that can be done from a container irrespective of the environment. To perform your tasks, you can either use the supported build steps provided by Cloud Build or write your own build steps. As a part of the build step, Cloud Build deploys the app to a platform of your choice. You also have the ability to perform deep security scans within the CI/CD pipeline using Binary Authorization and ensure only trusted container images are deployed to production.
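As a rough sketch of what "a series of build steps" looks like, the snippet below assembles a two-step build config as a Python dictionary and prints it as JSON, one of the formats Cloud Build accepts for config files. The project, image, and tag values are placeholders, and the helper name build_config is an assumption made for this example.

```python
import json


def build_config(project_id: str, image: str, tag: str) -> dict:
    """Assemble a minimal build config with two steps: build, then push.

    project_id, image and tag are placeholder values for illustration.
    """
    image_uri = f"gcr.io/{project_id}/{image}:{tag}"
    return {
        "steps": [
            # Each step names the container image that runs it.
            {"name": "gcr.io/cloud-builders/docker",
             "args": ["build", "-t", image_uri, "."]},
            {"name": "gcr.io/cloud-builders/docker",
             "args": ["push", image_uri]},
        ]
    }


print(json.dumps(build_config("my-project", "my-app", "v1"), indent=2))
```

Each entry in steps runs in its own container, so adding a test or deploy stage is a matter of appending another step with the appropriate image.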

Cloud Build private pools help you meet enterprise security and compliance requirements. These are private, dedicated pools of workers that offer greater customization over the build environment, including the ability to access resources in a private network. For instance, you can trigger fully managed DevOps workflows from source code repositories hosted in private networks, including GitHub Enterprise.

Cloud Code

If you are working entirely in a cloud-native environment, then you’ll want to use Cloud Code to kick off your CI/CD pipeline. Use Cloud Code in your IDE; it comes with tools to help you write, run, and debug cloud-native applications quickly and easily. Then push your code to Cloud Build for the build process, package it in the Artifact Registry, and run it on GKE or Cloud Run. You can get all the visibility and metrics you want for the deployment in Google Cloud’s operations suite.

Cloud Deploy

Google Cloud Deploy (in preview) is a managed, opinionated continuous delivery service that makes continuous delivery to GKE easier, faster, and more reliable. It has built-in security controls, and it can be integrated with your existing DevOps ecosystem.

Cloud Shell Editor

Cloud Shell Editor, powered by the Eclipse Theia IDE platform, extends Cloud Shell with an online preconfigured cloud development environment that includes:

Local emulators for Kubernetes and serverless, and

Command line tools for working with cloud-native apps

Interested in getting started with CI/CD on Google Cloud? Check out the documentation here. For more #GCPSketchnote and similar cloud content, follow me on Twitter @pvergadia and keep an eye on thecloudgirl.dev


2021 12 07

GCP – Tokopedia’s journey to creating a Customer Data Platform (CDP) on Google Cloud Platform

Founded in 2009, Tokopedia is an ecommerce platform that enables millions of Indonesians to transact online. As the company grows, there is an urgent need to better understand customers’ behavior in order to improve their experience across the platform. Tokopedia now has more than 100 million Monthly Active Users, and the demographics and preferences of these users differ. One way to meet their needs is through personalization.

Normally, a user needs to browse through thousands of products to find the item they are looking for. By creating product recommendations that are relevant to each user, we shorten their search journey and hopefully increase conversion early on in the journey. To build personalization, the Data Engineering team’s Customer Data Platform (CDP) provides access to users’ attributes. These attributes, developed by the Data Engineering team, come in handy for different use cases across functions and teams.

Previously, two main challenges were observed:

The need for speed and answers caused an increase in data silos. As the need for personalization increased across the company, different teams built their own personalization features. However, limited time and the desire to simplify communication across teams led each team to create its own data pipeline. This caused redundancies, because similar data was developed by different teams, and these redundancies slowed the development of new personalized features, even though some of the attributes had already been built in a different module.

Inconsistent data definitions. As each team created its own data pipeline, there were many cases where teams had different definitions of a user’s attributes. On several occasions, this caused misunderstandings during meetings and unsynchronized user journeys, because different teams applied different attribute values to the same user. For example, team A evaluated user_id 001 as a woman in her 20s. Meanwhile, team B, having a different set of attributes and definitions, evaluated user_id 001 as a woman in her 30s. These differences in definitions and attributes can lead to different conclusions and results, and consequently to different personalizations. As a result, customers might face an inconsistent, and therefore poor, experience during their journey in Tokopedia. Imagine being shown content related to college necessities in one module and then content related to mom and baby in another.

Previous State of Data Distribution

Currently, with CDP, different teams do not have to constantly rebuild the infrastructure. The same attributes only need to be processed once and can then be used by different teams across the company. This optimizes development time, cost, and effort. Another advantage of having CDP is the single definition of attributes across services and teams. Since different teams look at the same attributes inside the CDP, this reduces the chances of misunderstanding and strengthens synchronization between teams. It gives customers a consistent experience across the Tokopedia platform and enables relevant content to be displayed.
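The value of a single definition can be illustrated with a small sketch. The age bands and the function name age_group below are assumptions made for the example, not Tokopedia's actual attribute definitions; the point is that every team calls the same function instead of re-deriving the attribute.

```python
from datetime import date


def age_group(birth_date: date, today: date) -> str:
    """Single shared definition of the 'age group' attribute.

    The band boundaries are illustrative assumptions.
    """
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    age = today.year - birth_date.year - (0 if had_birthday else 1)
    if age < 20:
        return "under_20"
    if age < 30:
        return "20s"
    if age < 40:
        return "30s"
    return "40_plus"
```

With one shared definition, team A and team B can no longer place the same user in different age bands.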

CDP High level Concept

Moreover, there are several key factors required in building the CDP platform in Tokopedia. The journey is as follows:

1. Define and Make a List of Attributes
During this phase, we worked with the Product and Analyst teams to define all of the user attributes required to build the CDP. Our product team interviewed several stakeholders to understand different perspectives regarding user attributes. As a result, an initial attributes list was made that included gender, age group, location, etc. This process was repeated iteratively in order to reach the best understanding of the user attributes.

2. Platform Design
After doing comprehensive reviews, we decided to build our CDP platform using several GCP tech stacks.

CDP Architecture

BigQuery was chosen as the analytics backend of our CDP self-service. Meanwhile, Cloud Bigtable was selected as the backend that our services interact with to enable personalization. In developing the storage on Bigtable, the design of the schema is very important. The frequency and categorization affect how we design the column qualifier, while the CDP attribute affects how we design the row key.

We also opted to create a caching mechanism to reduce the load on Bigtable for similar read activity. We built the cache system using Redis with a certain Time to Live (TTL) to ensure optimized performance. In addition, we applied a Role Based Access Control (RBAC) mechanism on the CDP API to control which services can access which attributes in the CDP.
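The source does not show the actual schema, so the following is only a hypothetical sketch of how such keys might be composed: the separators, field order, and the helper names row_key and column_qualifier are all assumptions made for illustration.

```python
def row_key(attribute_group: str, user_id: str) -> str:
    """Compose a row key from an attribute group and a user ID.

    Hypothetical layout: '<attribute_group>#<user_id>'.
    """
    if "#" in attribute_group or "#" in user_id:
        raise ValueError("'#' is reserved as the key separator")
    return f"{attribute_group}#{user_id}"


def column_qualifier(category: str, frequency: str) -> str:
    """Compose a column qualifier from a category and an update frequency.

    Hypothetical layout: '<category>_<frequency>'.
    """
    return f"{category}_{frequency}"


print(row_key("demographic", "001"))        # demographic#001
print(column_qualifier("purchase", "daily"))  # purchase_daily
```

Whatever layout is chosen, a row key that matches the most common lookup lets a single read return every attribute a service needs for a user.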
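A minimal sketch of the caching and access-control flow is shown below. It uses an in-memory dictionary as a stand-in for Redis, and the role names, the ROLE_ATTRIBUTES table, and the read_attribute function are hypothetical; the real CDP API is not described in the source.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with a fixed TTL per entry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # entry expired, drop it
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


# Hypothetical role table: which attributes each calling service may read.
ROLE_ATTRIBUTES = {"search": {"location"}, "marketing": {"age_group", "gender"}}


def read_attribute(role: str, user_id: str, attribute: str, cache: TTLCache,
                   backend: Callable[[str, str], str]) -> str:
    """Check the caller's role, then serve from cache or fall back to backend."""
    if attribute not in ROLE_ATTRIBUTES.get(role, set()):
        raise PermissionError(f"role {role!r} may not read {attribute!r}")
    key = f"{user_id}:{attribute}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = backend(user_id, attribute)  # e.g. a Bigtable read in practice
    cache.set(key, value)
    return value
```

The access check runs before the cache lookup so that a cached value is never returned to a service that is not allowed to see it.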

3. Monitoring and alerting
Another important point in building a CDP is developing the correct monitoring and alerting system to maintain stability on our platform. Soft and hard thresholds are established and monitored for each metric. Once a threshold is reached, alerts are sent through the communication channel. Based on the current architecture, there are several parts in which we need to enable monitoring and alerting.
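The soft and hard threshold idea can be sketched in a few lines. The level names and the function alert_level are assumptions for illustration; the source does not say which alerting tool or severity labels are used.

```python
def alert_level(value: float, soft: float, hard: float) -> str:
    """Classify a metric reading against a soft and a hard threshold.

    Assumes higher values are worse (e.g. CPU utilization or latency).
    """
    if soft > hard:
        raise ValueError("soft threshold must not exceed hard threshold")
    if value >= hard:
        return "critical"
    if value >= soft:
        return "warning"
    return "ok"
```

For example, with a soft threshold of 70 and a hard threshold of 90, a CPU reading of 75 would produce a warning and a reading of 95 a critical alert.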

Data Pipeline
One of the things we need to monitor is resource consumption during computation and in the data pipeline from the data sources to the CDP storage, as we use BigQuery and Dataflow for data computation and the data pipeline. In BigQuery, we need to monitor the slot utilization used to compute the data aggregations or manipulations that produce each attribute.

Data Quality
When building the CDP, high-quality data was important in order for it to be a trusted platform. Several metrics are important in terms of data quality: Data Completeness, Data Validity, Data Anomaly, and Data Consistency. Therefore, monitoring needs to be enabled for each of these metrics.
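As an illustration, two of these metrics can be computed as simple ratios. The definitions below (share of non-null values, and share of non-null values within an allowed set) are common conventions assumed for this sketch, not necessarily the ones Tokopedia uses.

```python
from typing import Iterable, List, Mapping, Optional


def completeness(records: List[Mapping[str, Optional[str]]], field: str) -> float:
    """Share of records in which the field is present and not None."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)


def validity(records: List[Mapping[str, Optional[str]]], field: str,
             allowed: Iterable[str]) -> float:
    """Share of non-null values that belong to the allowed set."""
    allowed_set = set(allowed)
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if v in allowed_set) / len(values)
```

Each ratio can then be compared against a threshold, so that a drop in completeness or validity raises an alert before the attribute reaches downstream services.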

Storage and API Performance
Since CDP’s backend and API directly interact with several front-facing features, we have to ensure the availability of the CDP service. Since we’re using Bigtable as the backend, monitoring of CPU, latency, and RPS is required. These metrics are provided by default in Bigtable monitoring.

4. Discoverability across company
Many users have been asking how they can browse the attributes that our CDP offers. Initially, we started out by documenting our attributes and sharing them with our stakeholders. However, as the number of attributes increased, it became increasingly hard for people to go through our documentation. This pushed us to start integrating the CDP terminology into our Data Catalog. In this case, our Data Catalog plays an important role in enabling users to browse attributes in CDP, including the definition of each attribute and how they can retrieve the data.

5. Implementation and adoption of the platform
Another key point for a successful CDP implementation is collaboration across teams on the front end services. There are several types of CDP implementation in Tokopedia: Personalization, Marketing Analytics, and Self Service Analytics.

Personalization
The most common usage of CDP would be in personalizing a user’s journey. One example of personalization is the search feature. The product team personalizes the user’s search result based on the user’s address, so that the user will be able to find products that are in proximity to their location. After discussing the definition of user address, we created a CDP API contract with the Search team, so the development can run in parallel. As a result, today our users are able to have a better user experience based on their location.

Marketing Analytics
When we started building the CDP platform, we discussed existing use cases with the Marketing team. One of their goals was to personalize and optimize marketing efforts, such as sending notifications to the right users based on their attributes, in order to reduce unnecessary notification costs for unrelated users and to enhance the overall user experience by avoiding spam notifications. Once we understood their needs, we looked at the ways in which CDP could cater to them. We discussed with the relevant team which user attributes to use when sending marketing push notifications, and how to integrate the segmentation engine and communication channel with the CDP platform.
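A toy version of attribute-based targeting might look like the following. The attribute names and the select_recipients helper are hypothetical; the actual segmentation engine is not described in the source.

```python
from typing import Dict, List


def select_recipients(users: List[Dict[str, str]],
                      required: Dict[str, str]) -> List[str]:
    """Return the IDs of users whose attributes match every required value."""
    return [
        u["user_id"]
        for u in users
        if all(u.get(key) == value for key, value in required.items())
    ]


users = [
    {"user_id": "001", "age_group": "20s", "location": "jakarta"},
    {"user_id": "002", "age_group": "30s", "location": "jakarta"},
    {"user_id": "003", "age_group": "20s", "location": "bandung"},
]
print(select_recipients(users, {"age_group": "20s", "location": "jakarta"}))
# ['001']
```

Only the matching users receive the notification, which is how targeting reduces both notification cost and spam.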

Self-Service Analytics
The CDP is also often used for self-service analytics, to enable quick insights on user demographics and behavior in certain segments. To build this self-service analytics tool, our team consulted with the Product and Analyst teams to define the user demographic attributes that business and product users often select for insights. After understanding the attributes required, we worked with the Business Intelligence team to enable the visualization for the end user. This allowed different teams to understand our users better and gain insights on how we can improve our platform.

CDP implementation has created a significant impact on different use cases and helped Tokopedia to be a more data-driven company. Through CDP, we are also able to strengthen one of the core elements of our DNA, which is Focus on Consumer. By sharing the CDP framework, we hope to bring value and help others to more easily create a thriving CDP platform.


2021 12 07

GCP – Cloud Security podcast by Google turns 46 – Reflections and lessons!

Time flies when you’re having fun! We’ve produced 46 episodes of the Cloud Security Podcast by Google since our launch in February 2021. Looking back, we’d like to share some cloud security lessons and insights we picked up along the way.

Over the course of 2021, the following themes emerged as the most popular with our audience.

Zero trust security

Cloud threat detection

Making Cloud migrations more secure

Data security in the cloud

Let’s explore each of these while highlighting some of the more interesting episodes:

Zero trust

On the zero trust side, we had a great episode where we interviewed the creator of the term “zero trust”, John Kindervag. We looked through more than 10 years of history of zero trust beginning with the coining of the term in 2010 and early Google efforts in this area. John also shared some practical tips on how to approach zero trust in today’s IT environments.

The second zero trust episode focused on the technical details of collecting data for successful zero trust implementations. We covered some of the critical tasks and data points you must have before beginning any zero trust project.

Rest assured, more episodes on zero trust are coming.

Cloud migration security

The topic of security during cloud migration has been covered both at the leadership level, in our CISO panels, as well as using field lessons from customers, partners, and Googlers. For example, in our CISO panel, Phil Venables, Google Cloud’s CISO, and others emphasized that security in the cloud involves a mindset shift, not just technology change. On the other hand, while looking at some of the implementation lessons, we covered common mistakes that companies make while migrating. One of our partners shared lessons they’ve learned supporting cloud migrations. We also touched on how some organizations faced challenges abandoning pre-cloud thinking and practices.

When migrating to the cloud, where you’re starting from matters as much as where you’re going, and even where you and your customers are located, as we cover in our Europe-focused episode. Specifically, for our users in Europe, a different set of regulatory challenges are in play, including the overlapping and multiplicative regulatory complexity that arises from European federalism.

Finally, most organizations really migrate data and workloads to multiple clouds, and there are specific multi-cloud security challenges covered in this episode.

Cloud threat detection

We dug deep into the topic of threat detection, looking at many angles: from more philosophical challenges down to operational issues with creating rules and practicing detection engineering. A very popular episode shares how some threat detection challenges are solved here at Google. Specifically, we covered how our engineers pursue threat research, then create detection code, and then follow up triaging and responding to the “signals” generated by their detection logic. Yes, Google security engineers both write detection logic and respond to the output of that detection logic. Talk about aligned incentives to create low-noise rules!

No Google security story would be complete without mentioning our fun episode with Heather Adkins. She shared perspective on securing Google and her talk at RSA 2021, which, unlike the proverbial tree falling in the forest, really did happen, even if virtually.

Some great content on SIEM modernization was revealed in the episode where we interviewed one of the key implementation partners for Chronicle and Google Cloud security. We covered how SIEM technology is evolving in the cloud age, and plan to further explore this rich topic in future episodes.

Another excellent episode with a Chronicle user focused on how SIEM technology evolved and how to make it work for you now and in the future.

Data security in the cloud

Data security in the cloud presents both new challenges and solutions to old challenges. Things like pervasive encryption in GCP certainly solve some challenges, while at the same time, reliance on identity is difficult for organizations that are used to building network security barriers between attackers and data. We covered foundational approaches to data security in the cloud and key pillars of a strategy in our second episode. Next, we asked more key questions about how secure data in the cloud really is and what controls are most important to address customer needs.

A NEXT 2021 special episode gathered together several product managers that build various data security products at Google Cloud (our DLP, encryption, etc.). They spoke to some of the data security innovations built here at Google and how they’ve been productized for our Cloud customers.

What’s next

You can review past episodes on the site and subscribe for upcoming episodes (please!) via Google Podcasts, Apple Podcasts and Spotify. Also, do follow Cloud Security Podcast on Twitter for episode announcements and audience commentary.

Finally, let us know what we should cover in 2022! We look forward to another exciting year bringing you some of the most interesting and diverse voices across the Cloud Security community.


2021 12 07

GCP – The past, present, and future of Kubernetes with Eric Brewer

In 2014, the first echoes of the word Kubernetes in tech were heard throughout the industry. Back then, the first thing that usually came to mind was “How do you even pronounce it?” Fast forward seven years and it’s become one of the largest open source projects in the world. One of the early stewards of Kubernetes was Google Fellow Eric Brewer. For over a decade, Eric has taken a driver’s seat in advocating for, building, and externalizing technologies at Google. Though he now focuses on a broad set of Google Cloud services—think Kubernetes, serverless, DevOps, Istio, and services—he previously led groundbreaking efforts to separate storage from compute, drive the use of VM live migration at scale, and shape the use of appliances for disaggregation. I had the chance to sit down with him over a series of sessions to learn from his years of experience and dig into the four Kubernetes and open source insights that Eric says have defined the future of cloud computing.

1. Kubernetes became central to cloud native computing because it was open sourced, and we must continue to invest in open source technologies.

When Eric joined the UC Berkeley faculty, he focused on what later became cloud computing – a model based on clusters of commodity servers that use many processes, services, and APIs. Once he came to Google in 2011, he brought this view over to develop a new kind of cloud that centered on a higher level of abstraction. This meshed well with the early prototypes that led to Kubernetes, an open-source system for automating deployment, scaling, and management of containerized applications.

While cloud was still forming in the early 2010s, Eric knew that Google’s internal container-based approach would lead to a more powerful cloud than just VMs and disks. Though it was relatively easy to attract a group of supporters at Google, widespread industry adoption is often slower for novel and unproven ideas. With that foresight, Eric knew right away that open sourcing the project would be the only viable way to achieve the potential he knew Kubernetes held to revolutionize cloud computing.

Of course, he faced some resistance. By 2012, Google Cloud already had App Engine and VMs available. The common question from critics was, “Why do we need a third way to do computing?” Well, Google was already running billions of containers per week prior to the emergence of Kubernetes, and Eric saw massive value in further developing the technology for the rest of the industry. Kubernetes’ automation and flexibility make it much easier to operate compared to raw VMs or raw disks.

After years of open source support, Kubernetes has become the de facto way to run applications in the cloud, with more and more opinionated and vertically oriented services that run on top of it, like Knative and Kubeflow. The project is still maturing, even as we now face another pivotal shift in cloud computing. Eric is currently spearheading efforts to combine the philosophy that underpins Kubernetes with the strict protection needed by security-sensitive industries. His focus is on open source and software supply chain security, with a goal of creating more opinionated tooling from source code to deployment in order to minimize attack points.

2. As the number of dependencies used in software development grows, the security risks multiply. Investing in software supply chain security is imperative, and a move towards managed services is actually safer than self-managed solutions.

Recent attacks, like those on SolarWinds and CodeCov, have shown that increasing reuse and development velocity across the software industry has created more openings for attacks. Eric is laying the groundwork to address a challenge that he believes should be a P0 for the entire planet.

“99% of our vulnerabilities are not in the code you write in your application. They’re in a very deep tree of dependencies, some of which you may know about, some of which you may not know about.” - Eric Brewer

Because of the growing use of open source software and dependencies in software development, it’s critical for organizations to understand what pieces of software they want to bet on and why. Instead of including unvetted software dependencies in code, organizations must take time to evaluate this software and identify the elements that are either not quite up to par or poorly maintained.
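To see why the dependency tree matters more than the direct dependency list, here is a small sketch that walks a dependency graph and collects everything reachable from an application. The graph and package names are made up for illustration; a real project would read them from a lock file or package manager.

```python
from typing import Dict, List, Set


def transitive_dependencies(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every package reachable from root, excluding root itself."""
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; also guards against cycles
        seen.add(dep)
        stack.extend(graph.get(dep, []))
    seen.discard(root)
    return seen


# Hypothetical graph: the app declares two dependencies but pulls in four.
graph = {
    "app": ["web", "log"],
    "web": ["http", "log"],
    "http": ["tls"],
}
print(sorted(transitive_dependencies(graph, "app")))
# ['http', 'log', 'tls', 'web']
```

Every package in the result is code the application ships and therefore has to vet, even though only two of the four were chosen directly.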

When asked about how Google is investing in Kubernetes (which has several hundred software dependencies), Eric explained that Google Cloud helped form the Cloud Native Computing Foundation (CNCF) in 2015 to serve as the vendor-neutral home for many of the fastest-growing open source projects, including Kubernetes, Prometheus, and Envoy. The foundation’s mission is to make cloud native computing ubiquitous and foster the growth of the ecosystem. Under the auspices of the CNCF, Google has made over 680,000 additional contributions to the project, including over 123,000 contributions in 2020.

Google has a long history of committing to open source. In fact, Google recently committed another $100M to third-party foundations supporting open source security. In addition, Eric helped found the Open Source Security Foundation (OpenSSF), which focuses on open source security tooling and best practices so that those responsible for their organization’s security are able to understand and verify the security of open source dependency chains. Eric sees this work as absolutely essential in order to set a precedent. Though it will require lots of largely mundane work to get open source to be as secure as possible, this work is necessary and requires financial support.

“Open source is a public infrastructure also. And like all public infrastructures, it needs maintenance and support.” - Eric Brewer

As services continue to move to higher levels of abstraction, managed services set a robust foundation for secure software delivery. Managed services allow providers to enable automatic security preventative controls and attestations. GKE Autopilot, for example, provisions and manages the cluster’s underlying infrastructure, including nodes and node pools, giving you an optimized cluster with a hands-off experience. It follows Google Kubernetes Engine (GKE) best practices and recommendations for cluster and workload set up and security, while also enforcing settings that provide enhanced isolation for your containers. In Eric’s view, this model will continue as a dominant trend moving forward: Providers will manage more features (like security) over time, taking responsibility for features that you don’t want to manage yourself while making the most of the proven protocols and best practices they have built up over years.

3. Platform operators should run GKE as a general purpose platform while imposing guidelines the enterprise cares about.

A common question Eric has gotten over the years is how an enterprise should use a managed Kubernetes platform, like GKE. The first thing to remember is that a cloud provider offers more levers, options, and features to tinker with than you really want your developers to use. These levers, however, give platform owners the ability to create secure and maintainable platforms to power their modern apps. For example, it’s wise to apply backups by default and policies to prevent root file system access or creation of public IPs for backend systems. If you’re processing credit card transactions, you don’t want to give your internal developers free rein; instead you want to give them a platform where the transactions they execute are guaranteed by the structure of services to be compliant with the regulations where you operate.

Think of Kubernetes as the way to build customized platforms that enforce rules your enterprise cares about through controls over project creation, the nodes you use, and libraries and repositories you pull from. Background controls are not typically managed by app developers, rather they provide developers with a governed and secure framework to operate within.
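The kind of platform guardrails described above can be sketched as a simple policy check. The workload fields (backups_enabled, read_only_root_filesystem, tier, public_ip) and the function policy_violations are hypothetical names for this example and do not correspond to a specific Kubernetes or GKE API.

```python
from typing import Dict, List


def policy_violations(workload: Dict[str, object]) -> List[str]:
    """Return the platform rules that a workload description breaks."""
    violations = []
    if not workload.get("backups_enabled", False):
        violations.append("backups must be enabled by default")
    if not workload.get("read_only_root_filesystem", False):
        violations.append("root file system must be read-only")
    if workload.get("tier") == "backend" and workload.get("public_ip", False):
        violations.append("backend systems must not have public IPs")
    return violations


workload = {"tier": "backend", "public_ip": True, "backups_enabled": True}
print(policy_violations(workload))
```

A platform team would run checks like these automatically at deploy time, so app developers get a clear rejection message instead of having to know every rule up front.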

Managed services often provide or support automated policy controls and best practices that platform operators can easily leverage. Anthos Service Mesh, for example, helps control traffic flows and API calls between services. With the ability to automatically and declaratively secure your services, your developers benefit from more productivity, and the organization benefits from the delivery of more features faster. At the same time, you are protected from shipping features that go against company policies or government regulations.

Google Cloud supports buildpacks—an open-source technology that makes it fast and easy for you to create secure, production-ready container images from source code and without a Dockerfile. Artifact Registry lets you set up secure private-build artifact storage on Google Cloud so you can maintain control over who can access, view, or download artifacts. Container Analysis provides vulnerability scanning on images in Artifact Registry and Container Registry.

4. Kubernetes will continue to expand to the edge, leverage coprocessors, and run effectively across public and private clouds.

In our final episode of the series, we collected questions from the field where a few themes emerged, including Kubernetes at the edge, Kubernetes on coprocessors, and finding the right balance between public and private clouds.

Kubernetes at the edge

We’re already seeing the potential of Kubernetes being realized at the edge. For example, Kubernetes is being used at the edge in telecommunications and retail spaces. In response to edge security as a concern, Eric explained that Kubernetes can be effectively secured, but it comes down to the full stack. Security can be strengthened through securing hardware through the root of trust, all the way up the stack running on it.

This is an area Google Cloud continues to invest in. At Next 2021, we announced Google Distributed Cloud, a portfolio of fully managed hardware and software solutions that extends Google Cloud’s infrastructure and services to the edge. It’s enabled by Anthos, which GKE is a major component of, and is ideal for local data processing, edge computing, on-premises modernization, and meeting requirements for sovereignty, strict data security, and privacy. To use Kubernetes at the edge securely, Distributed Cloud provides centralized configuration and control over clusters at Google’s edge network, the operator edge (5G and LTE services offered by our communication service provider partners), or your own edge like retail stores, factory floors, or branch offices.

Kubernetes running on coprocessors

We are also partnering with NVIDIA to deliver GPU-accelerated computing and networking solutions for running Anthos at the edge. This speaks to the potential of coprocessors for Kubernetes. Eric believes that coprocessors are an important part of the computing future. We’re reaching the end of Moore’s Law, and to make up for it, the industry is adopting domain-specific hardware accelerated for use cases like graphics processing (with GPUs) or machine learning (with TPUs).

The right balance between public vs. private clouds

Even with all this rapid innovation, companies still face difficult questions in balancing operating in the public cloud versus private sovereign clouds. Eric lays out clear reasons why a public cloud can offer more advantages:

“You’d be better off with an open public cloud pretty much all the time if you can use one, because it will have better cost efficiency. It will have a higher rate of innovation. It can do more things over time.” (Eric Brewer)

That being said, using a public cloud provider means you must trust both your cloud provider and the government of the country in which that provider is based (today, this is usually the US). If you don’t trust them or think they are too risky, you may want to run in your own country, on a private, sovereign cloud. The great thing is that Kubernetes is well suited to run on a private cloud. Anthos (which Eric helped build) lets you run Kubernetes on GKE for hybrid and multicloud environments, and on bare metal. If you are worried about vendor lock-in, you can move off Anthos and continue to run your applications on Kubernetes on-premises.

To hear more about Eric’s predictions on the next externalized product from Google Cloud and what the future of cloud computing looks like, check out the videos above. You can stay up-to-date with Eric’s research at Google by following him on Twitter @eric_brewer.

And, you can stay up-to-date with my latest content at @stephr_wong.

Read More for the details.

2021 12 07

GCP – How Vuclip safeguards its cloud environment across 100+ projects with Security Command Center

Entertainment has never been more accessible. As our phones are now an inextricable part of our lives, there’s an increasing appetite for mobile video content, and that is what Vuclip delivers. Vuclip is a leading video-on-demand service for mobile devices with more than 41 million monthly active users across more than 22 countries.

Speed is critical to the viewing experience, and delivering crisp, no-buffer video streaming was one of the reasons we decided to migrate to Google Cloud in 2017. Now we have replaced our monolithic on-prem infrastructure with a microservices-based production environment that’s almost fully on Google Cloud. Most services run on Google Kubernetes Engine, which delivers effortless scalability and quick time to market for new features and updates.

With a huge footprint in the cloud across multiple companies, we’re a big target for attacks, from data breaches to hackers trying to access our systems illegally. We must prepare for these attacks proactively and mitigate them quickly when they happen. That’s why we decided to use Google Cloud’s Security Command Center (SCC) Premium to protect our technology environment across our complex microservices-based architecture.

Increasing security and speeding time-to-market with Security Command Center

Before signing up to SCC Premium, we conducted a proof-of-concept with help from the Google Cloud team to experience its capabilities firsthand. What stood out to us was that SCC wouldn’t just help us mitigate attacks, it would strengthen our entire security apparatus by continuously identifying the weaknesses of our system and giving us recommendations on how to improve it.

In the past, we had quite a traditional security model. Business units were responsible for their own security setup and received support from Group Risk, our company’s internal security audit team, which reviewed developed applications before they could go into production. With SCC, it’s easier for us to detect findings and build the right security configurations into new services as we build them. We can configure policy based on SCC recommendations and act on suggestions quickly, unlike earlier, when everything was reported back to the Group Risk team for review. This has really reduced our time to market: going into production used to take at least a month, and now we can do it in a week.

Centralizing visibility for continuous insights

With SCC Premium, we now streamline many security processes that used to require a lot of manual effort. In the past, we had to conduct regular vulnerability scans of our most critical systems, but with microservices running across more than 100 projects it was difficult to deliver constant security checks on all of them. With centralized visibility, SCC enables us to monitor all of these projects continuously to discover misconfigurations and threats quickly, while making sure we’re adhering to our compliance standards.

Here’s what it looks like day to day: for every new and existing project, when new services are added to the system, our policies require the SRE team to configure SCC into the setup from the beginning. That’s how we make sure that every surface and every application stack uses the platform to help us detect all alerts and suggestions. We integrate all of these notifications into our Pub/Sub alerting system, giving us centralized visibility over our security posture across multiple projects.
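As a rough illustration of what the receiving end of such a Pub/Sub integration might look like, the Python sketch below decodes one notification payload and decides whether it deserves immediate attention. It is not Vuclip’s implementation. The JSON field names (finding, category, severity, resourceName) are assumed to follow the SCC notification format, the sample payload is invented, and the helpers project_of, summarize, and needs_attention are hypothetical names chosen for this example.

```python
import json

# Assumed ordering of severity labels; lower rank means more urgent.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def project_of(resource_name: str) -> str:
    """Pull the project ID out of a full resource name, if one is present."""
    parts = resource_name.split("/")
    if "projects" in parts:
        idx = parts.index("projects")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return "unknown"


def summarize(payload: bytes) -> dict:
    """Flatten one notification payload (JSON bytes) into the fields we triage on."""
    message = json.loads(payload.decode("utf-8"))
    finding = message.get("finding", {})
    resource_name = finding.get("resourceName", "")
    return {
        "category": finding.get("category", "UNKNOWN"),
        "severity": finding.get("severity", "SEVERITY_UNSPECIFIED"),
        "resource": resource_name,
        "project": project_of(resource_name),
    }


def needs_attention(summary: dict, threshold: str = "HIGH") -> bool:
    """True when the finding is at least as severe as the threshold."""
    # Unknown or unspecified severities rank below LOW.
    rank = SEVERITY_RANK.get(summary["severity"], len(SEVERITY_RANK))
    return rank <= SEVERITY_RANK[threshold]


# Invented sample payload, shaped like the assumed notification format.
SAMPLE = json.dumps({
    "notificationConfigName": "organizations/123/notificationConfigs/demo",
    "finding": {
        "category": "OPEN_FIREWALL",
        "severity": "HIGH",
        "resourceName": "//compute.googleapis.com/projects/demo-project/global/firewalls/allow-all",
    },
}).encode("utf-8")

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    if needs_attention(summary):
        print(f"[{summary['severity']}] {summary['category']} in {summary['project']}")
```

In a real subscriber, the payload passed to summarize would be the data of each received Pub/Sub message; keeping the parsing separate from the subscription code makes the triage logic easy to test on its own.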

Every misconfiguration revealed with comprehensive alerts

Improved visibility enables us to keep an eye on our systems proactively. Let’s take IP addresses, for example. Whenever we set up a new system, we must configure a new public-facing IP address from the GKE endpoint. When that happens, we get an alert from SCC, informing us that a new public IP address is being set up. Right away, SCC identifies any vulnerabilities or misconfigurations, such as missing firewall rules. With that constant visibility, as opposed to the spaced-out vulnerability scans of the past, we achieve a continuous level of security that improves our overall posture.

Mitigating threats in ¼ of the time

This comprehensive security posture has the side effect of increasing the number of alerts from SCC. Not all of them relate to serious attacks that need to be mitigated right away. That’s why we have dedicated team members, on a rotating basis, who go through the alerts to identify the most pressing threats and decide on further actions. If there’s a problem we need to mitigate, we can do it in about a quarter of the time it used to take without SCC. This is because we no longer have to identify issues and search for solutions ourselves. Instead, the issue is pointed out immediately in the alert.

A great side effect of these detailed alerts and recommendations is that our employees learn more about security-related matters. This experience trains them on how to improve our systems in the future and helps them prepare for more serious attacks.

Strengthening compliance for faster approval

Another area where SCC is helpful is compliance. Our baseline for new and existing services is the CIS Google Cloud Computing Foundations Benchmark, and SCC enables us to meet its requirements more efficiently with targeted suggestions. This also eases approval from the Group Risk team before we launch a service: they can see exactly how compliant we are with the CIS standard, which further reduces our time-to-market and strengthens our overall security posture.

Entertaining the world securely with Security Command Center

With SCC Premium, we’ve moved from a traditional security model reliant on intermittent vulnerability scans to a much more agile security strategy with continuous monitoring and centralized visibility and control. We’re excited to explore more of SCC’s features in the future, such as the ability to mute findings, which will help us to disable certain alerts we don’t need to be reminded of.

Our evolution with SCC hasn’t just made Vuclip more secure and compliant, it’s helped us to reduce our time-to-market, delivering our services faster without compromising on security. In a fast-paced media world, that’s exactly what we need to remain the video-on-demand service provider of choice and entertain people around the world.

Read More for the details.

2021 12 07

GCP – Postmortems at Loon: a guiding force for rapid development

Because Loon’s Production Engineering/SRE team was founded by Google SRE alumni, it is no surprise that the team instituted a culture of blameless postmortems that became a key feature of Loon’s approach to incident response. Blameless postmortems originated as an aerospace practice in the mid-20th century, so it was particularly fitting that they came full circle to be used at a company that melded cutting edge aerospace work with the development of a communications platform and the world’s first stratospheric temporospatial software defined network. The use of postmortems became a standardizing factor across Loon’s teams, from avionics and manufacturing, to flight operations, to software platforms and network service. This blog post discusses how Loon moved from a heterogeneous approach to postmortems to a standardized practice shared across the organization, a shift that helped the company move from R&D to commercial service in 2020.

Background

Postmortems

Many industries have adopted the use of postmortems— they are fairly common in high-risk fields where mistakes can be fatal or extremely expensive. Postmortems are also widespread in industries and projects where bad processes or assumptions can incur expensive project development costs and avoiding repeat mistakes is a priority. Individual industries and organizations often develop their own postmortem standards or templates so that postmortems are easier to create and digest across teams.

Blameless postmortems likely originated in the healthcare and aerospace industries in the mid-20th century. Because of the high cost of failure, these industries needed to create a culture of transparency and continuous improvement that could only come from openly discussing failure. As the original SRE book states, blameless postmortems are key to “an environment where every ‘mistake’ is seen as an opportunity to strengthen the system.”

The goal of a postmortem is to document an incident or event in order to foster learning from it, both among the affected teams and beyond. The postmortem usually includes a timeline of what happened, the solutions implemented, the incident’s impact, the investigation into root causes, and changes or follow-ups to stop it from happening again. To facilitate learning, SRE’s postmortem format includes both what went well— acknowledging the successes that should be maintained and expanded— and what went poorly and needs to be changed. In this way, postmortem action items are key to prioritizing work that ensures the same failures don’t happen again.
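The sections listed above map naturally onto a small data structure. The Python sketch below is one possible shape for such a template, not the SRE book’s or Loon’s actual format; the class and field names (Postmortem, ActionItem, went_well, went_poorly, and so on) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple


@dataclass
class ActionItem:
    """A follow-up intended to stop the incident from recurring."""
    description: str
    owner: str
    done: bool = False


@dataclass
class Postmortem:
    """Hypothetical template holding the sections described in the text."""
    title: str
    impact: str
    timeline: List[Tuple[datetime, str]] = field(default_factory=list)
    root_causes: List[str] = field(default_factory=list)
    went_well: List[str] = field(default_factory=list)
    went_poorly: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)

    def log(self, when: datetime, event: str) -> None:
        """Record a timeline entry, keeping the timeline in chronological order."""
        self.timeline.append((when, event))
        self.timeline.sort(key=lambda entry: entry[0])

    def open_action_items(self) -> List[ActionItem]:
        """Action items that still need to be prioritized and completed."""
        return [item for item in self.action_items if not item.done]

    def missing_sections(self) -> List[str]:
        """Names of list-valued sections that have not been filled in yet."""
        sections = ("timeline", "root_causes", "went_well",
                    "went_poorly", "action_items")
        return [name for name in sections if not getattr(self, name)]
```

A review tool could call missing_sections before a postmortem is circulated, and open_action_items afterwards to see which follow-ups are still outstanding.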

Loon

Loon aimed to supply internet access to unserved and underserved populations around the world by providing connectivity via stratospheric balloons. These high altitude “flying cell towers” covered a much wider footprint than a terrestrial tower, and could be deployed (and repositioned) into the most remote corners of the earth without expensive overland transportation and installation. As the first company to attempt anything like this, Loon dealt with a number of systems that were complex, challenging, or novel: superpressure balloons designed to stay aloft for hundreds of days, wind-dependent steering, a software defined network consisting of constantly moving nodes, and extremes of temperature and weather at 20km above Earth’s surface.

Prod Team

The initial high-risk operations of Loon’s mission were avionic: could we launch and steer balloons carrying a networking payload long enough to reach and serve the targeted region? As such, the earliest failure reports within Loon (which weren’t officially called “postmortems” at the time) mostly involved balloon construction or flight, and drew on the experience of team members who had worked in the Avionics, Reliability Engineering, and/or Flight Safety fields. As Loon’s systems evolved and matured, they started to require operational reliability, as well. Just before graduating from a purely R&D project in Google’s “moonshot factory” incubator X to a company with commercial goals, Loon started building a Site Reliability Engineering (SRE) team known internally as Prod Team.

In order to effectively offer internet connectivity to users, Loon had to solve network serving failures with the same rigor as hardware failures. Prod Team took the lead on a number of practices to improve network reliability. The Prod Team had three primary goals:

Ensure that the fleet’s automation, management, and safety-critical systems were built and operated to meet the high safety bar of the aviation industry.

Lead the integration of the communications services (e.g., LTE) end to end.

Own the mission of fielding and providing a reliable commercial service (Loon Library) in the real world.

Postmortems at Loon

The Early Days

Postmortems were one tool for reaching Prod Team’s (SRE’s) goals. Prod Team often interacted with SREs in other infrastructure support teams that the Loon service connected to, such as the team developing the Evolved Packet Core (EPC), our telco partner counterparts, and teams that handle edge network connectivity. Postmortems provided a common tool for sharing incident information across all these teams, and could even span multiple companies when upstream problems impacted customers.

At Loon, postmortems served the following goals:

Document and transcribe the events, actions, and remedies related to an incident.

Provide a feedback loop to rectify problems.

Indicate where to build better safeguards and alerts.

Break down silos between teams in order to facilitate cross-functional knowledge sharing and accelerate development.

Identify macro themes and blind spots over the longer term.

The combination of aerospace and high tech brought two strong practices of writing postmortems, but also the challenge of how to own, investigate, or follow up on problems that crossed those boundaries, or when it wasn’t clear where the system fault lay.

Loon’s teams across hardware, software, and operations orgs used postmortems, as was standard practice in their fields for incident response. The Flight Operations Team, which handled the day-to-day operations of steering launched balloons, captured in-flight issues in a tracking system. The tracking system was part of the anomaly resolution system devised to identify and resolve root cause problems. Seeking to complement the anomaly resolution system, the Flight Operations Team incorporated the SRE software team’s postmortem format for incidents that needed further investigation— for example, failure to avoid a storm system, deviations from the simulated (expected) flight path that led to an incident, and flight operator actions that directly or indirectly caused an incident. Given that most incidents spanned multiple teams (e.g., when automation failed to catch an incorrect command sent by a flight operator, which resulted in a hardware failure), utilizing a consistent postmortem format across teams simplified collaboration.

The Aviation and Systems Safety Team, which focused on safety related to the flight system and flight process, also brought their own tradition and best practices of postmortems. Their motto, “Own our Safety”, brought a commitment to continually improving safety performance and building a positive safety culture across the company. This was one of the strengths of Loon’s culture: all the organizations were aligned not just on our audacious vision to “connect people everywhere”, but also on doing so safely and effectively. However, because industry standards for postmortems and how to handle different types of problems varied across teams, there was some divergence in process. We proactively encouraged teams to share postmortems between teams, between orgs, and across the company so that anyone could provide feedback and insight into an incident. In that way, anyone at Loon could contribute to a postmortem, see how an incident was handled, and learn about the breadth of challenges that Loon was solving.

Challenges

While everyone agreed that postmortems were an important practice, in a fast moving start-up culture, it was a struggle to comprehensively follow through on action items. This probably comes as no surprise to developers in similar environments— when the platform or services that require investment are rapidly changing or being replaced, it’s hard to spend resources on not repeating the same mistakes. Ideally, we would have prioritized postmortems that focused on best practices and learnings that were applicable to multiple generations of the platform, but those weren’t easy to identify at the time of each incident.

Even though the company was not especially large, the novelty of Loon’s platform and the interconnectedness of its operations made it difficult to determine which team was responsible for writing a postmortem and investigating root causes. For example, a 20-minute service disruption on the ground might be caused by a loss of connectivity from the balloon to the backhaul network, a pointing error with the antennae on the payload, insufficient battery levels, or wind that temporarily blew the balloon out of range. Actual causes could be quite nuanced, and often were attributable to interactions between multiple sub-systems. Thus, we had a chicken-and-egg problem: which team should start the postmortem and investigation, and when should they hand off the postmortem to the teams that likely owned the faulty system or process? Not all teams had a culture of postmortems, so the process could stall depending on the system where the root cause originated. For that reason, Loon’s Prod Team/SREs advocated for a company-wide blameless postmortem culture.

Much of how Loon used postmortems, especially in software development and Prod Team, was in line with SRE industry standards. In the early days of Loon, however, there were no service level objectives or agreements (SLO/As). As Loon was an R&D project, we wrote postmortems when a test network failed to boot after launch, or when performance didn’t meet the team’s predictions, rather than for “service outages”. Later on, when Loon supplied commercial service in disaster relief areas in Peru and Kenya, the Prod Team could more clearly identify the types of user-facing incidents that required postmortems due to failure to meet SLAs.

Improving and Standardizing Loon’s Postmortem Processes

Moving Loon from an R&D model to the model of reliability and safety necessary for a commercial offering required more than simply performing postmortems. Sharing the postmortems openly and widely across Loon was critical to building a culture of continuous improvement and addressing root causes.

To increase cross-team awareness of incidents, in 2019 we instituted a Postmortem Working Group. In addition to reading and discussing recent postmortems from across the company, the goals of the working group were to make it easier to write postmortems, promote the practice of writing postmortems, increase sharing across teams, and discuss the findings of these incidents in order to learn the patterns of failure. Its founding goal was to “Cultivate a postmortem culture in Loon to encourage thoughtful risk taking, to take advantage of mistakes, and to provide structure to support improvement over time.” While the volume of postmortems could ebb and flow across weeks and months, over multiple years of commercial service we expected to be able to identify macro-trends that needed to be addressed with the cooperation of multiple teams.

In addition to the Postmortem Working Group, we also created a postmortem mailing list and a repository of all postmortems, and presented a “Lunch & Learn” on blameless postmortems (see example slide below). Prod Team and several other teams’ meetings had a standing agenda item to review postmortems of interest from across the company, and we sent a semi-annual email celebrating Loon’s “best-of” recent incidents: the most interesting or educational outages.

Once we had a standardized postmortem template in place, we could adopt and reuse it to document commercial service field tests. By recording a timeline and incidents, defining a process and space to determine root causes of problems, recording measurements and metrics, and providing the structure for action item tracking, we brought the benefits of postmortem retrospectives to prospective tasks.

When Loon began commercial trials in countries like Peru and Kenya, we conducted numerous field tests. These tests required engineers from Loon and/or the telco partner to travel to remote locations to measure the strength of the LTE signal on the ground. Prod Team proactively used the postmortem template to document the field tests. It provided a useful format to record the log of test events, results that did and did not match expectations, and links to further investigations into those failures. As a cutting edge project in a highly variable operating environment, using the postmortem template as our default testing template was an acknowledgement that we were in a state of constant and rapid iteration and improvement. These trials took place in early to mid 2020, under the sudden specter of Covid and the subsequent shift towards working from home. The structured communications at the core of Loon’s postmortem structure were particularly helpful as we moved from in-person coordination rooms to WFH.

What Loon Learned from Standardizing Postmortems

Postmortems are widely used in various industries because they are effective. At Loon, we saw that even fast moving startups and R&D projects should invest early in a transparent and blameless postmortem culture. That culture should include a clear process for writing postmortems, clear guidelines for when to conduct a postmortem, and a staffed commitment to follow up on action items.

Meta-reviews across postmortems and outages revealed several trends.

The many points of failure we observed across the range of postmortems were indicative of both the complexity of Loon’s systems and the complexity of some of its supporting infrastructure. Postmortems are as adept at finding flaky tests and fragile processes as they are at finding hardware failures or satellite network outages. These are complexities familiar to many startups, where postmortems can help manage the tradeoff between making changes safely and moving quickly while trying many new things.

Loon was still operating a superhero culture: across a wide range of issues, a small set of experts was repeatedly called upon to fix the system. This dynamic is common in startups, and not meant as a pejorative, but was markedly different from the system maturity that many of Prod Team/SRE were used to. Once we identified this pattern, our plan for commercial service was to staff a 24×7 oncall rotation, complemented by Program Managers driving intentional processes to de-risk production.

Postmortems provided a space to ask questions like, “What other issues could pop up in this realm?”, which prompted us to solve for the broader case of problems rather than specific problems we’d already seen. This practice also stopped people from brushing off problems in the name of development speed, or from dismissing issues because they “just concerned a prototype”.

Tips and Takeaways

While the specifics of Loon’s journey to standardize postmortems tell the story of one company, we have some tips and takeaways that should be applicable at most organizations.

Tip 1: Adopting a blameless postmortem culture requires everyone to participate

Although the initiative of writing postmortems often originates with a software team, if you want every team to adopt the practice, we suggest trying the following:

Give a talk about postmortems and how and why they could benefit all.

Form a postmortem working group.

Invite people representing different teams to be part of the postmortem working group. They will give insights into what could work better for their respective teams.

Don’t make the postmortem working group responsible for writing the postmortems— this approach doesn’t scale. Reviewing and consulting on postmortems may be in scope of their duties, especially while new teams are adopting this practice.

Tip 2: Define a lightweight postmortem process

Especially during adoption, you want teams to see the benefits of postmortems, not the burden of writing them. Creating a postmortem template with minimum requirements can be helpful.

Tip 3: Define a clear owner for postmortems

Who should write a postmortem and when? For software teams with an oncall rotation, the answer is clear: the person who was oncall during the incident is the owner, and we write postmortems when a service interruption breached SLOs. But when the service has no SLOs, or when a team doesn’t have an oncall rotation, you need defined criteria. Bonus points if the outage involves multiple systems and teams. The following exercises can help in this area:

Reflect on these topics from the point of view of each team, and from the point of view of the interaction between teams.

For each team, define what type of incident(s) should trigger a postmortem.

Within the team, define who should own writing each postmortem. Avoid putting the entire burden on the same person frequently; consider forming a rotation.
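The last two exercises can be made concrete with a few lines of Python. This is only a sketch under stated assumptions: the team names and trigger labels in TRIGGERS are invented for illustration (loosely echoing the incident types mentioned earlier), and needs_postmortem and assign_owners are hypothetical helpers, not part of any Loon tooling.

```python
from itertools import cycle
from typing import Dict, List, Set, Tuple

# Hypothetical per-team criteria: which incident types trigger a postmortem.
TRIGGERS: Dict[str, Set[str]] = {
    "network": {"slo_breach", "customer_outage"},
    "flight_ops": {"storm_avoidance_failure", "flight_path_deviation"},
}


def needs_postmortem(team: str, incident_type: str,
                     triggers: Dict[str, Set[str]] = TRIGGERS) -> bool:
    """True if this team has agreed that this incident type needs a postmortem."""
    return incident_type in triggers.get(team, set())


def assign_owners(incidents: List[str],
                  rotation: List[str]) -> List[Tuple[str, str]]:
    """Pair each incident with the next person in a round-robin rotation."""
    if not rotation:
        raise ValueError("rotation must contain at least one person")
    owners = cycle(rotation)
    return [(incident, next(owners)) for incident in incidents]
```

For example, assign_owners(["inc-1", "inc-2", "inc-3"], ["ana", "ben"]) hands the first and third incidents to ana and the second to ben, so no single person carries every postmortem. A real rotation would also need to account for who was actually involved in the incident.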

Tip 4: Encourage blameless postmortems and make people proud of them

Consider some activities that can help foster the blameless postmortem culture:

Write a report of the best postmortems over a given period and circulate them broadly.

Conduct training on how to write postmortems.

Train managers and encourage them to prioritize postmortems on their teams.

Conclusion

When Loon shut down, addressing all these points was still a work in progress. We don’t have a teachable moment of “this postmortem process will solve your failures”, because postmortems don’t do that. However, we could see where postmortems stopped us from needing to deal with the same failures repeatedly… and where sometimes we did experience repeat incidents because the action items from the first postmortem weren’t prioritized enough. And so this piece of writing (effectively, a postmortem on Loon’s postmortems) serves up a familiar lesson: postmortems work, but only as well as they are widely accepted and adhered to.

Read More for the details.

2021 12 07

GCP – Sustainability starts locally in Kingston and Sutton with Chrome OS, Acer, Px3, and Citrix

Editor’s note: Today’s post is by David Grasty, the Corporate Head of Digital Transformation at Kingston council in Southwest London. The council chose Chrome OS and Acer Chromebooks to accelerate progress on local sustainability goals.

To fight climate change on a global level, we have to start in our home boroughs. Both the Royal Borough of Kingston-upon-Thames and the London Borough of Sutton declared a climate emergency in 2019, so bringing sustainability to every daily activity is critical. We need partners that understand our goal. Since Google has committed to operating its business on carbon-free energy 24/7 by 2030, Chrome OS, Android Enterprise and Acer Chromebooks are the perfect partners.

Committing to sustainability is very important, but it has to be done in a way that doesn’t hinder service to our communities. We’ll adopt low-energy devices and resource-saving productivity apps that help keep employees productive.

The devices we chose had to meet our objective to give employees an “any device, anywhere, anytime” experience. They also had to reduce our carbon footprint within our borough offices—for example, reducing energy consumption and sparing us costly and wasteful IT upgrades. We’ve also made a commitment to close one of our data centers by late 2022, and we knew that shifting to the cloud would play a large part in the sustainability program. Our 5,000 council employees were already using Google Workspace apps, and accessing our borough applications through Citrix Virtual Apps and Desktop.

Sustainability and usability

As my colleague Jason Sam-Fat, the borough’s Digital and IT Commercial Manager, points out, we had to be practical as we studied which devices would contribute to both sustainability and usability. Fortunately, we had the numbers to show the expected sustainability improvements of our top choice, Acer Chromebook Spin 513 with LTE. Our partner Px3 produced detailed findings on the devices’ energy savings.

“We received clear metrics from actual energy usage,” Jason says. “That way, we have a level of assurance and confidence that we can validate our sustainability objectives.”

The Acer CP5-417 Chromebook devices met our sustainability objectives. The latest LTE-enabled Acer Chromebook Spin 513 version that we’ve aligned to further supports our objectives while adding flexibility for employees.

“The Acer Chromebooks are 46 percent more energy efficient than the alternatives in the market,” Jason explains, citing findings from sustainable IT consultancy Px3, which partnered with Kingston and Sutton. “And they have 14 hours of battery life.” That’s something employees really enjoy, since they don’t have to search for electric outlets when they’re in the middle of projects.

The sustainability reporting from Px3 benchmarked a 32 percent reduction in energy with the move to Citrix and Acer Chromebooks—building on previous Px3 research about Chrome OS and energy savings. If you combine this with the resulting reduced levels of commuting by employees, you would need 3,700 acres of mature forest, roughly one and a half times the size of London’s Richmond Park, to remove the equivalent amount of pollution from the atmosphere.

The combination of Citrix and Acer Chromebooks will help us make steady progress toward sustainability for years to come. Our previous devices used a lot more computing power than we needed, given our growing reliance on cloud apps. With Chrome OS devices, we can better take advantage of moving the actual computing tasks to the cloud, which is a less energy-intensive way of working.

Chrome OS inspires future sustainability ideas

We’re pleased that we can make progress toward sustainability goals while we also free up employees to travel around the boroughs as needed, Acer Chromebooks in hand. We gave Chromebooks to people who generally do desk work but like the flexibility to work at home. For employees who meet with residents in the community, Acer Chromebooks also allow them to undertake tasks that they need to do there and then, rather than taking notes and then coming back to an office to update records. All of our staff can access their council systems on their own phones using Android Enterprise with work profiles, which keeps their work data separate from their personal data. With work profiles, our IT team can still manage the work data and keep it secure.

The pandemic brought home the flexibility of our Chromebooks and digital architecture. In March 2020, once the decision was made to switch to remote work, we had 90 percent of employees working from home the next day.

“If we hadn’t had the Chromebooks in place, our remote-work situation would have looked very, very different,” says Steve O’Connor, our Chief Digital Information Officer. “Because we seamlessly moved so many people to remote work literally overnight, it meant that meetings scheduled for the next day still took place.”

As Jason explains, seeing the remote-work experience play out has helped us think about other ways to improve sustainability, such as reducing employee commutes. “Having the right technology, like the Acer Chromebooks, is crucial to our entire journey,” Jason says. “It’s helped us make better decisions around IT, and it will definitely help to define how we can better incorporate sustainability. It’s about continuing to build on what we’ve got, and improving it.”

Read More for the details.

2021 12 06

AWS – Amazon S3 File Gateway now supports NFS file share auditing

Read More for the details.

2021 12 06

AWS – Amazon S3 File Gateway enables administrators to force the closing of locked files

Amazon S3 File Gateway now enables you to force-close locked files on SMB file shares by providing access to local security groups. Amazon S3 File Gateway provides on-premises applications with file-based, cached access to virtually unlimited cloud storage using SMB and NFS protocols. End users and applications using files on SMB shares may stop working on those files without closing them, which leaves the files in an open, or locked, state. Until now, gateway administrators did not have permissions to close these files.

Read More for the details.

2021 12 06

Azure – Microsoft Defender for Cloud: Public preview updates for November 2021

Public preview enhancements and updates released for Microsoft Defender for Cloud in November 2021.

Read More for the details.

2021 12 06

Azure – Microsoft Defender for Cloud: General availability updates for November 2021

New enhancements and updates released for general availability (GA) in Microsoft Defender for Cloud in November 2021.

Read More for the details.

2021 12 06

AWS – Amazon Aurora R6g instances, powered by AWS Graviton2 processors, are now available in Europe (Milan), Europe (Paris), and Europe (Stockholm) Regions

AWS Graviton2-based R6g database instances are now available in Europe (Milan), Europe (Paris), and Europe (Stockholm) regions for Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition.

Read More for the details.

2021 12 06

AWS – AWS AppSync now supports custom domain names for AppSync GraphQL endpoints

Today, we are releasing a new feature in AWS AppSync that allows customers to use custom domain names with their AWS AppSync GraphQL APIs.

Read More for the details.

2021 12 06

GCP – AI for all humans: A course to delight and inspire!

Making Friends with Machine Learning was a legendary internal-only Google course specially created to inspire beginners and amuse experts. Today, it is available to everyone! You can now enjoy it by following these links:

Part 1 — Introduction to ML: bit.ly/mfml_part1

Part 2 — Life of a Machine Learning Project: bit.ly/mfml_part2

Part 3 — AI from Prototype to Production: bit.ly/mfml_part3

Part 4 — Opening the Black Box: bit.ly/mfml_part4

About the course

The course is designed to give you the tools you need for effective participation in machine learning for solving business problems and for being a good citizen in an increasingly AI-fueled world. MFML is perfect for all humans; it focuses on conceptual understanding (rather than the mathematical and programming details) and guides you through the ideas that form the basis of successful approaches to machine learning. It has something for everyone!

After completing this course, you will:

gain an intuitive and correct understanding of core machine learning concepts.

understand the flavor of several popular machine learning methods.

avoid common errors in machine learning.

know how machine learning can help your endeavors.

gain insight into the steps involved in leading machine learning projects from conception to launch and beyond.

Chapter by chapter

Part 2 — Life of a Machine Learning Project

Part 3 — AI from Prototype to Production

Part 4 — Opening the Black Box

While the first 3 parts focused on giving you the concepts and roadmaps to lead a successful applied machine learning project, Part 4 indulges your curiosity about what’s going on under the hood. The final chapter covers the intuition behind:

Clustering and k-Means

Lazy learning and k-NN

Perceptron

Maximal Margin Classifier

Support Vector Classifier

Support Vector Machines

Decision Trees

Boosted Aggregation

Random Forests

Ensemble Models

Naive Bayes

Linear Regression

Logistic Regression

Neural Networks / Deep Learning

You’ll gain insights into a whole bunch of algorithms… all without having to study the equations. To be fair, those equations are what’s already in all the textbooks, so the goal of this course was to give you something you can’t get elsewhere. It’s all about intuition and conceptual understanding. Luckily, after you’ve absorbed the intuition, those equations will make much more sense if/when you do choose to study them.

Read More for the details.

2021 12 06

AWS – AWS WAF adds support for CloudWatch Logs and logging directly to an S3 bucket

You can now send AWS WAF logs directly to a CloudWatch Logs log group or to an Amazon S3 bucket. With this launch, we’re adding two new optional destinations for WAF logs in addition to Amazon Kinesis Data Firehose, which was already supported. When you use CloudWatch Logs as your WAF log destination, you can search and analyze WAF logs directly in the WAF console using CloudWatch Logs Insights. Using CloudWatch Logs Insights, you can view individual logs, compile aggregated reports, create visualizations, and construct dashboards.
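As a rough illustration of pointing a web ACL at one of the new destinations, the sketch below builds the request payload for the WAFv2 `PutLoggingConfiguration` API. The helper name `build_logging_configuration` and the example ARNs are hypothetical. The check reflects the AWS requirement that WAF log group and bucket names begin with `aws-waf-logs-`, and it assumes the destination ARN ends in the resource name with no trailing `:*`.

```python
# Required name prefix for WAF logging destinations (log groups and S3 buckets).
WAF_LOG_PREFIX = "aws-waf-logs-"


def build_logging_configuration(web_acl_arn: str, destination_arn: str) -> dict:
    """Build a LoggingConfiguration payload for a single WAF log destination.

    Hypothetical helper for illustration; it only assembles and validates the
    request body and makes no AWS calls.
    """
    # The last ARN segment is the resource name, for example
    # arn:aws:logs:<region>:<account>:log-group:<name> or arn:aws:s3:::<name>.
    resource_name = destination_arn.split(":")[-1]
    if not resource_name.startswith(WAF_LOG_PREFIX):
        raise ValueError(
            f"destination name {resource_name!r} must start with {WAF_LOG_PREFIX!r}"
        )
    return {
        "ResourceArn": web_acl_arn,
        "LogDestinationConfigs": [destination_arn],
    }


# Example values below are placeholders, not real resources.
config = build_logging_configuration(
    "arn:aws:wafv2:us-east-1:123456789012:regional/webacl/example/abcd1234",
    "arn:aws:logs:us-east-1:123456789012:log-group:aws-waf-logs-example",
)

# With boto3 installed and credentials configured, the payload could be sent as:
#   import boto3
#   boto3.client("wafv2").put_logging_configuration(LoggingConfiguration=config)
```

Validating the name locally is a design choice for faster feedback; the service performs its own checks when the request is sent.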

Read More for the details.

2021 12 06

AWS – AWS Systems Manager Fleet Manager now offers console-based viewing and management of instance processes