
10 ways Google Cloud IaaS stands out

October 5, 2021


Sanjay Jacob

Product Management, Google

Gagandeep Singh

Principal Architect, Google

When you choose to run your business on Google Cloud, you benefit from the same planet-scale infrastructure that powers Google’s products such as Maps, YouTube, and Workspace.

Here are 10 ways in which Google Cloud infrastructure services outshine alternatives on the market by simplifying your operations, saving you money, and securing your data.

1. Custom Machine Types means no wasted resources

Compute Engine offers predefined machine types that you can use when you create a VM instance. A predefined machine type has a preset number of vCPUs and a preset amount of memory; each type is billed at a set price as described on the Compute Engine pricing page.

If predefined machine types don't meet your needs, you can create a VM instance with a custom number of vCPUs and a custom amount of memory, effectively building your own machine type. Custom machine types are available only for general-purpose machine families; when you create one, you deploy a custom shape from the E2, N2, N2D, or N1 machine family. No other leading cloud vendor offers custom machine types so extensively.

Custom machine types are a good fit for workloads that don't match any predefined machine type, and for workloads that need more processing power or memory than one predefined type provides but not all of the resources of the next larger type. Because you pay only for the resources you provision, this translates into lower operating costs. Custom machine types are also useful for controlling software licensing costs that are based on the number of underlying compute cores.
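As a rough illustration, the sketch below builds a custom machine type name and shows how much capacity a larger predefined shape would leave idle. It assumes the documented naming pattern (custom-VCPUS-MEMORY_MB for N1, FAMILY-custom-VCPUS-MEMORY_MB for E2, N2, and N2D) and the rule that custom memory is a multiple of 256 MB. The helper names are hypothetical, and per-family vCPU and memory-per-vCPU limits are deliberately not checked, so confirm them in the Compute Engine documentation.

```python
# Hypothetical helpers: build a custom machine type name and compare it
# with a larger predefined shape. Per-family vCPU/memory limits are NOT
# validated here; check the Compute Engine docs for those.

FAMILY_PREFIX = {
    "n1": "custom",        # N1 custom types carry no family prefix
    "n2": "n2-custom",
    "n2d": "n2d-custom",
    "e2": "e2-custom",
}

MEMORY_STEP_MB = 256  # custom memory must be a multiple of 256 MB


def custom_machine_type(family: str, vcpus: int, memory_mb: int) -> str:
    """Return a name such as 'n2-custom-6-20480' (vCPUs, then memory in MB)."""
    if family not in FAMILY_PREFIX:
        raise ValueError(f"custom machine types are not offered for {family!r}")
    if vcpus < 1:
        raise ValueError("vcpus must be positive")
    if memory_mb <= 0 or memory_mb % MEMORY_STEP_MB:
        raise ValueError("memory must be a positive multiple of 256 MB")
    return f"{FAMILY_PREFIX[family]}-{vcpus}-{memory_mb}"


def unused_resources(needed_vcpus: int, needed_gb: int,
                     shape_vcpus: int, shape_gb: int) -> tuple[int, int]:
    """vCPUs and GB you would pay for but not use on a larger predefined shape."""
    return shape_vcpus - needed_vcpus, shape_gb - needed_gb
```

For example, a workload that needs 6 vCPUs and 20 GB maps to n2-custom-6-20480, whereas the smallest predefined N2 shape that fits, n2-standard-8 (8 vCPUs, 32 GB), would leave 2 vCPUs and 12 GB idle.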

Jeremy Lloyd, Infrastructure and Application Modernization Lead at Appsbroker, a Google partner:

“Custom machine types coupled with Google’s StratoZone data center discovery tool provides Appsbroker with the flexibility we need to provide cost efficient virtual machines matched to a virtual machine’s actual utilization. As a result, we are able to keep our customers’ operating costs low while still providing the ability to scale as needed.”

2. Compute Engine Virtual Machines are optimized for scale-out workloads

For scale-out workloads, T2D, the first instance type in the Tau VM family, is based on 3rd Gen AMD EPYC processors and leapfrogs the scale-out VMs of every leading public cloud provider today in both performance and price-performance. Tau VMs offer 56% higher absolute performance and 42% higher price-performance than general-purpose VMs from any leading public cloud vendor (source). Because these AMD EPYC processor-based VMs are x86 compatible, you get market-leading performance improvements and cost savings without having to port your applications to a new processor architecture. Sign up here if you are interested in trying out T2D instances in Preview.

For SAP HANA, Google Cloud has demonstrated with SAP that we can run the world’s largest scale-out SAP HANA system in the public cloud (96 TB). With this level of scale, you are covered even as your business grows exponentially.

3. Largest single node GPU-enabled VM

Google is the only public cloud provider to offer up to 16 NVIDIA A100 GPUs in a single VM, making it possible to train very large AI models. Users can start with one NVIDIA A100 GPU and scale up to 16 GPUs for single-node ML training without configuring multiple VMs or crossing the VM boundary.

Customers can also choose smaller GPU configurations (1, 2, 4, or 8 GPUs per VM), giving them the flexibility to scale their workload as needed.

The A2 VM family was designed for today’s most demanding applications, such as CUDA-enabled machine learning (ML) training and inference. The family is built on the NVIDIA A100 GPU, which offers up to 20x the compute performance of the previous-generation GPU and comes with 40 GB of high-performance HBM2 GPU memory. To speed up multi-GPU workloads, A2 VMs use NVIDIA’s HGX A100 systems, whose NVLink GPU-to-GPU interconnect delivers up to 600 GB/s. A2 VMs come with up to 96 Intel Cascade Lake vCPUs, optional Local SSD for workloads that need faster data feeds into the GPUs, and up to 100 Gbps of networking. They also provide full vNUMA transparency into the architecture of the underlying GPU server platforms, enabling advanced performance tuning. Google Cloud offers these GPUs globally.
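The GPU configurations above map onto a small set of A2 machine types. The sketch below picks the smallest shape that holds a requested number of GPUs and reports its total GPU memory. It assumes the A2 shape names a2-highgpu-1g through a2-highgpu-8g and a2-megagpu-16g, plus the 40 GB per-GPU memory described above; the helper itself is hypothetical, so check current machine type availability per region before relying on it.

```python
# A2 machine types and their attached NVIDIA A100 GPU counts (assumed names;
# confirm availability in your region before use).
A2_SHAPES: list[tuple[str, int]] = [
    ("a2-highgpu-1g", 1),
    ("a2-highgpu-2g", 2),
    ("a2-highgpu-4g", 4),
    ("a2-highgpu-8g", 8),
    ("a2-megagpu-16g", 16),
]
GPU_MEMORY_GB = 40  # HBM2 per A100 GPU, as described in the text


def smallest_a2_shape(gpus_needed: int) -> tuple[str, int, int]:
    """Return (machine type, GPU count, total GPU memory in GB)."""
    if gpus_needed < 1:
        raise ValueError("request at least one GPU")
    for name, gpus in A2_SHAPES:  # shapes are ordered smallest first
        if gpus >= gpus_needed:
            return name, gpus, gpus * GPU_MEMORY_GB
    raise ValueError("more than 16 GPUs needs more than one VM")
```

For instance, a job that needs 3 GPUs lands on a2-highgpu-4g with 160 GB of GPU memory, and the full 16-GPU shape provides 640 GB of GPU memory in a single VM.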

4. Non-disruptive maintenance means you worry less about planned downtime

Compute Engine offers live migration (non-disruptive maintenance) to keep your virtual machine instances running even when a host system event, such as a software or hardware update, occurs. Google’s Compute Engine live migrates your running instances to another host in the same zone without requiring your VMs to be rebooted. Live migration enables Google to perform maintenance that is integral to keeping infrastructure protected and reliable without interrupting any of your VMs. When a VM is scheduled to be live-migrated, Google provides a notification to the guest that a migration is imminent.
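A guest can watch for this notification through the metadata server’s maintenance-event key. The sketch below assumes the documented endpoint, the Metadata-Flavor: Google header, the wait_for_change query parameter, and the NONE, MIGRATE_ON_HOST_MAINTENANCE, and TERMINATE_ON_HOST_MAINTENANCE values. The mapping from event values to actions ("checkpoint", "drain") is a hypothetical policy you would replace with your own, and the polling function only works when run inside a Compute Engine VM.

```python
import urllib.request

# Metadata server key that reports pending host maintenance for this VM.
MAINTENANCE_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"
)


def wait_for_maintenance_event(timeout_s: float = 3600.0) -> str:
    """Block until the maintenance-event value changes, then return it.

    Only reachable from inside a Compute Engine VM.
    """
    req = urllib.request.Request(
        MAINTENANCE_URL + "?wait_for_change=true",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode().strip()


def action_for(event: str) -> str:
    """Hypothetical policy mapping an event value to an application action."""
    if event == "MIGRATE_ON_HOST_MAINTENANCE":
        return "checkpoint"  # VM keeps running; e.g. flush state, pause batch work
    if event == "TERMINATE_ON_HOST_MAINTENANCE":
        return "drain"       # VM will be stopped rather than migrated; drain traffic
    return "none"            # "NONE": no maintenance pending
```

A long-running agent on the VM could call wait_for_maintenance_event() in a loop and pass each returned value to action_for().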

Live migration keeps your instances running during:

Regular infrastructure maintenance and upgrades
Network and power grid maintenance in the data centers
Failed hardware such as memory, CPU, network interface cards, disks, power, and so on. This is done on a best-effort basis; if a hardware component fails completely or otherwise prevents live migration, the VM crashes and restarts automatically and a hostError is logged.
Host OS and BIOS upgrades
Security-related updates
System configuration changes, including resizing the host root partition that stores the host image and packages

Live migration does not change any attributes or properties of the VM itself. The process transfers a running VM from one host machine to another within the same zone, and all VM properties and attributes remain unchanged, including internal and external IP addresses, instance metadata, block storage data and volumes, OS and application state, network settings, network connections, and so on. This reduces operational and maintenance overhead, helps you build a more robust security posture in which infrastructure can be deliberately rebuilt from a known good state, and minimizes the risk from advanced persistent threats.

For more detail, see “Lessons learned from a year of using live migration in production on Google Cloud” from the Google engineering team.

5. Trusted Computing: Shielded VMs guard you against advanced, persistent attacks

Establishing trust in your environment is multifaceted, involving hardware and firmware, as well as host and guest operating systems. Unfortunately, threats like boot malware or firmware rootkits can stay undetected for a long time, and an infected virtual machine can continue to boot in a compromised state even after you’ve installed legitimate software.

Shielded VMs can help you protect your system from attack vectors like:

Malicious guest OS firmware, including malicious UEFI extensions
Boot and kernel vulnerabilities in the guest OS
Malicious insiders within your organization

To guard against these kinds of advanced persistent attacks, Shielded VMs use:

Unified Extensible Firmware Interface (UEFI) BIOS: Helps ensure that firmware is signed and verified
Secure and Measured Boot: Helps ensure that a VM boots an expected, healthy kernel
Virtual Trusted Platform Module (vTPM): Establishes root-of-trust, underpins Measured Boot, and prevents exfiltration of vTPM-sealed secrets
Integrity Monitoring: Provides tamper-evident logging, integrated with Stackdriver (now Cloud Logging and Cloud Monitoring), to help you quickly identify and remediate changes to a known integrity state
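To make the vTPM and Integrity Monitoring items concrete, here is a conceptual model of the TPM "extend" operation that Measured Boot builds on. Each boot component is hashed into a Platform Configuration Register (PCR) chain, so changing, adding, or reordering any component changes the final value, which can then be compared against a known-good baseline. This illustrates the general TPM technique under simplifying assumptions (a single SHA-256 PCR, placeholder component bytes); it is not Google’s implementation.

```python
import hashlib

PCR_SIZE = 32  # a SHA-256 PCR bank holds 32-byte values


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_boot(components: list[bytes]) -> bytes:
    """Fold each boot component, in order, into a PCR that starts at all zeros."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


def matches_baseline(components: list[bytes], baseline: bytes) -> bool:
    """Integrity check: does this boot reproduce the known-good PCR value?"""
    return measure_boot(components) == baseline
```

Because each step hashes the previous value, a tampered bootloader cannot be hidden by a later component: every value after it in the chain changes.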

With Google Cloud, customers can deploy Shielded VMs with a single click, which eases implementation.

6. Confidential Computing encrypts data while in use

Google Cloud was a founding member of the Confidential Computing Consortium. Alongside encryption of data in transit and at rest (including at rest with customer-managed encryption keys (CMEK) and customer-supplied encryption keys (CSEK)), Confidential VM adds a "third pillar" to the end-to-end encryption story by encrypting data while it is in use. Confidential Computing uses processor-based technology to keep data encrypted even while it is being processed in the public cloud. With Confidential VM, you can encrypt memory in use on a Compute Engine VM by checking a single checkbox.

All Confidential VMs support the previously mentioned Shielded VM features under the covers—you can think of Shielded VM as helping to address VM integrity, while Confidential VM addresses the memory encryption aspect which relies on CPU features. With the confidential execution environments provided by Confidential VM and AMD Secure Encrypted Virtualization (SEV), Google Cloud keeps customers' sensitive code and other data encrypted in memory during processing. Google does not have access to the encryption keys. In addition, Confidential VM can help alleviate concerns about risk related to either dependency on Google infrastructure or Google insiders' access to customer data in the clear.

See what Google Cloud partners say about Confidential Computing here.

7. Advanced networking delivers full-stack networking and security services with fast, consistent, and scalable performance

Google Cloud’s network delivers low latency, reduces operational costs and ensures business continuity, enabling organizations to seamlessly scale up or down in any region to meet business needs. Our planet-scale network uses advanced software-defined networking and security with edge caching services to deliver fast, consistent, and scalable performance. With 28 regions, 85 zones, and 146 PoPs connected by 16 subsea fiber cables around the world, Google Cloud’s network offers a full stack of layer 1 to layer 7 services for enterprises to run their workloads anywhere. Enterprises can be assured that they have best-in-class networking and security services connecting their VMs, containers, and bare metal resources in hybrid and multi-cloud environments with simplicity, visibility, and control.

Google Cloud’s network has protected customers from one of the world’s largest DDoS attacks, which peaked at 2.54 Tbps. Thanks to our multi-layer security architecture and products such as Cloud Armor, our customers ran their businesses without disruption. Our recent integration of Cloud Armor with reCAPTCHA Enterprise adds best-in-class bot and fraud management to help prevent volumetric attacks. Cloud Armor is deployed with Cloud Load Balancing and Cloud CDN, extending its protection to the network edge for traffic coming into Google Cloud, so customers get security, performance, and reliability built in. We are also excited to offer Cloud IDS in preview; co-developed with security industry leader Palo Alto Networks, it runs natively in Google Cloud.

Our advanced networking capabilities also extend to GKE and Anthos networking. With the GKE Gateway controller, customers can manage internal and external HTTPS load balancing for a GKE cluster or a fleet of GKE clusters with multi-tenancy, while maintaining centralized admin policy and control. Unlike other Kubernetes offerings, GKE offers an eBPF dataplane, which brings powerful tooling such as Kubernetes network policy enforcement and logging to GKE. Kernel engineers describe eBPF as a “superpower” because it lets sandboxed programs be loaded into and unloaded from kernel space at run time without changing kernel source code or loading kernel modules, and this capability is now built into Google Cloud networking.

For observability and monitoring, our customers deploy Network Intelligence Center, Google Cloud’s comprehensive network monitoring, verification and optimization platform. With four key modules in Network Intelligence Center, and several more to come, we are working towards realizing our vision of proactive network operations that can predict and heal network failures, driven by AI/ML recommendations and remediation. Network Intelligence Center provides unmatched visibility into your network in the cloud along with proactive network verification. Centralized monitoring cuts down troubleshooting time and effort, increases network security and improves the overall user experience.

8. Regional Persistent Disk for High Availability

Regional Persistent Disk is a storage option that synchronously replicates data between two zones in the same region. Because it offers cost-effective, durable storage with cross-zone replication, Regional Persistent Disk is a strong building block when you need to ensure high availability for critical applications.

Regional Persistent Disks are also easy to set up within the Google Cloud Console. If you are designing robust systems or high availability services on Compute Engine, Regional Persistent Disks combined with other best practices such as backing up your data using snapshots enable you to build an infrastructure that is highly available and recoverable in a disaster. Regional Persistent Disks are also designed to work with regional managed instance groups. In the unlikely event of a zonal outage, Regional Persistent Disks allow continued I/O through failover of your workloads to another zone. Regional Persistent Disks can help meet zero RPO and near-zero RTO requirements and other stringent SLAs that your critical applications might require by maximizing application availability and protection of data during events such as host/VM failures and zonal outages.
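The zero-RPO property follows from synchronous replication: a write is acknowledged only after every healthy replica has applied it, so losing one zone loses no acknowledged data. The toy model below illustrates that idea with two in-memory replicas. It is a hypothetical sketch, not how Regional Persistent Disk is implemented; the zone names are examples, and resynchronization after a zone recovers is not modeled.

```python
class SyncReplicatedDisk:
    """Toy model: one logical disk, synchronously replicated to two zones."""

    def __init__(self, zone_a: str, zone_b: str) -> None:
        self.replicas: dict[str, dict[int, bytes]] = {zone_a: {}, zone_b: {}}
        self.healthy: dict[str, bool] = {zone_a: True, zone_b: True}

    def _live_zones(self) -> list[str]:
        return [zone for zone, ok in self.healthy.items() if ok]

    def write(self, block: int, data: bytes) -> None:
        """Apply the write to every healthy replica; returning is the ack."""
        live = self._live_zones()
        if not live:
            raise OSError("no healthy replica available")
        for zone in live:
            self.replicas[zone][block] = data

    def fail_zone(self, zone: str) -> None:
        """Simulate a zonal outage."""
        self.healthy[zone] = False

    def read(self, block: int) -> bytes:
        """Serve reads from any surviving replica (failover)."""
        live = self._live_zones()
        if not live:
            raise OSError("no healthy replica available")
        return self.replicas[live[0]][block]
```

After fail_zone() on one zone, every write that was acknowledged before the outage is still readable from the other zone, and new writes continue against the survivor.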

9. Cloud Storage’s single namespace for dual-region and multi-region means managing regional replication is incredibly simple

Just as Regional Persistent Disk makes data more available by replicating it across zones, Cloud Storage provides similar benefits for object storage. Cloud Storage within a region is cross-zone by definition, which reduces the risk that a zonal outage takes down your application. Cloud Storage goes further with cross-region options that protect against a regional outage and bring your data closer to distributed users: the dual-region and multi-region location settings for a bucket. These are the simplest cross-region replication offerings to implement in the industry; enabling them takes a single button click or API call. Beyond ease of setup, they offer the added advantage of a single bucket name that spans regions.

This is unique in the industry. Competing offerings currently require setting up and managing two distinct buckets, one in each region, and they don’t offer the strong consistency across regions that Cloud Storage provides. This design burdens both operations and application development. Google’s single-namespace approach dramatically simplifies application development (the app runs against a single-region, dual-region, or multi-region bucket without any changes) and makes application restarts and DR testing simpler.

10. Predictive autoscaling

Customers use predictive autoscaling to improve response times for applications with long initialization times or for applications with workloads that vary predictably with daily or weekly cycles. When you enable predictive autoscaling, Compute Engine forecasts future load based on your managed instance group’s (MIG’s) history and scales out the MIG in advance of predicted load, so that new instances are ready to serve when the load arrives. Without predictive autoscaling, an autoscaler can only scale a group reactively, based on changes in load observed in real time.

With predictive autoscaling enabled, the autoscaler works with real-time data as well as with historical data to cover both the current and forecasted load. Forecasts are refreshed every few minutes (faster than competing clouds) and consider daily and weekly seasonality, leading to more accurate forecasts of load patterns.
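A heavily simplified sketch of the idea: forecast the load at the moment new instances would become ready by looking at the same time one day and one week earlier, then size the group for the larger of the current and forecasted load. The hourly sampling, the max-of-two-lags seasonal forecast, and the per-VM capacity figure are all assumptions for illustration; this is not Compute Engine’s actual forecasting model.

```python
import math

SAMPLES_PER_DAY = 24                 # assumption: one load sample per hour
DAILY_LAG = SAMPLES_PER_DAY
WEEKLY_LAG = 7 * SAMPLES_PER_DAY


def seasonal_forecast(history: list[float], target_step: int) -> float:
    """Forecast load at target_step as the larger of the samples one day and
    one week earlier (whichever exist); 0.0 if neither is available."""
    candidates = [
        history[target_step - lag]
        for lag in (DAILY_LAG, WEEKLY_LAG)
        if 0 <= target_step - lag < len(history)
    ]
    return max(candidates, default=0.0)


def recommended_size(history: list[float], current_load: float,
                     lead_steps: int, capacity_per_vm: float) -> int:
    """Instances needed to cover both the current load and the load forecast
    lead_steps ahead (roughly, the instance initialization time)."""
    now = len(history) - 1
    predicted = seasonal_forecast(history, now + lead_steps)
    needed = max(current_load, predicted)  # cover real-time AND forecasted load
    return max(1, math.ceil(needed / capacity_per_vm))
```

With two weeks of hourly history showing a daily peak at 09:00 (900 units versus 100 otherwise) and 100 units of capacity per VM, asking at 23:00 for capacity 10 hours ahead sizes the group at 9 instances before the peak arrives, while a purely reactive calculation would still see only the overnight load and keep 1 instance.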

For more information, see “How predictive autoscaling works” and “Checking if predictive autoscaling is suitable for your workload.”

These are just a few examples of customer-centric innovation that set Google Cloud infrastructure apart. Bring your applications and let the platform work for you.

Get started by learning about your options for migration, or talk to our sales team to join the thousands of customers who have embarked upon this journey.

Acknowledgement

Special thanks to Dheeraj Konidena (Google) for contributing to this article.
