
Accelerating MPI applications using Google Virtual NIC (gVNIC)

October 20, 2021
Jian Yang

Software Engineer

Pavan Kumar

Product Manager


At Google, we are constantly improving the performance of our network infrastructure. We recently introduced Google Virtual NIC (gVNIC), a virtual network interface designed specifically for Compute Engine. gVNIC is an alternative to the VirtIO-based Ethernet driver. It is tightly integrated with our high-performance, flexible Andromeda virtual network stack and is required to enable high-bandwidth network configurations (50 to 100 Gbps).

Using gVNIC improves communication performance by delivering traffic among your VM instances more efficiently. This improvement is valuable for high-performance computing (HPC) users because MPI communication performance is critical to the scalability of workloads such as weather modeling, computational fluid dynamics, and computer-aided engineering.

To simplify using gVNIC for HPC workloads, our CentOS 7-based HPC VM image now supports gVNIC and includes the latest gve driver (gve-1.2.3) by default. Continue reading for more details on gVNIC performance, or skip ahead to our quickstart guide to get started today!
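To confirm that an instance is actually using the gve driver, one quick check (a sketch, assuming the primary interface is named eth0) is:

    # The "driver" and "version" fields should report gve and the installed driver version.
    sudo ethtool -i eth0

    # Alternatively, verify that the gve kernel module is loaded.
    lsmod | grep gve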

Performance results using HPC benchmarks

We compared the performance of gVNIC and VirtIO-Net across the Intel MPI Benchmarks and several application benchmarks, including finite element analysis (ANSYS LS-DYNA), computational fluid dynamics (ANSYS Fluent), and weather modeling (WRF).

Intel MPI Benchmark (IMB) PingPong

IMB PingPong measures the average one-way time to send a fixed-size message between two MPI ranks running on a pair of VMs. We can see from the results below that gVNIC provides lower latency on medium and large message sizes (2 KB to 4 MB). For these messages, gVNIC improves latency by 26%, on average, compared to VirtIO-Net.
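For reference, a minimal way to run this benchmark with Intel MPI on two instances looks like the sketch below; the hostnames hpc-vm-1 and hpc-vm-2 are placeholders, and the IMB-MPI1 binary is assumed to be built and on the PATH.

    # One rank per VM; PingPong reports latency and bandwidth for each message size.
    mpirun -n 2 -ppn 1 -hosts hpc-vm-1,hpc-vm-2 IMB-MPI1 PingPong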

Benchmark setup:

Results

[Figure: IMB PingPong latency by message size, gVNIC vs. VirtIO-Net (https://storage.googleapis.com/gweb-cloudblog-publish/images/intel_mpi_benchmark.max-1200x1200.jpg)]

OSU Micro Benchmark (OMB) Multiple Bandwidth

OMB Multiple Bandwidth measures the aggregate uni-directional bandwidth between multiple pairs of processes across VMs. For this benchmark, we used 30 processes per node (PPN=30) on each of 2 VMs. gVNIC is required for 100 Gbps networking. This benchmark demonstrates that gVNIC with 100 Gbps networking unlocks 57% higher throughput, on average, compared to VirtIO-Net.
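A minimal invocation of this test with Intel MPI might look like the following sketch; the hostnames are placeholders, and the osu_mbw_mr binary from the OSU Micro Benchmarks is assumed to be built and on the PATH.

    # 30 ranks per VM (60 total); osu_mbw_mr pairs each rank on the first VM with a
    # peer on the second VM and reports the aggregate uni-directional bandwidth.
    mpirun -n 60 -ppn 30 -hosts hpc-vm-1,hpc-vm-2 osu_mbw_mr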

Benchmark setup:

Results

[Figure: OMB Multiple Bandwidth throughput, gVNIC (100 Gbps) vs. VirtIO-Net (https://storage.googleapis.com/gweb-cloudblog-publish/images/omb_multiple_bandwidth.max-1200x1200.jpg)]

HPC application benchmarks: WRF, ANSYS LS-DYNA, ANSYS Fluent

The latency and bandwidth gains from gVNIC translate into shorter runtimes for HPC application benchmarks. Using gVNIC with the HPC VM image yields a 51% performance improvement on the WRFv3 CONUS 12 km benchmark when running on 720 MPI ranks across 24 Intel Xeon processor-based C2 instances. With ANSYS Fluent and LS-DYNA, we observed performance improvements of 13% and 11%, respectively, using gVNIC compared with VirtIO-Net.

Benchmark setup

  • ANSYS LS-DYNA (“3-cars” model): 8 c2-standard-60 VMs with a compact placement policy (see the example gcloud commands after this list), using the LS-DYNA MPP binary compiled with AVX2

  • ANSYS Fluent (“aircraft_wing_14m” model): 16 c2-standard-60 VMs with compact placement policy

  • WRF V3 Parallel Benchmark (12 KM CONUS): 24 c2-standard-60 VMs with compact placement policy

  • HPC VM image: hpc-centos-7-v20210925 
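All three benchmarks use a compact placement policy so that the VMs are located close to each other. A rough sketch of how such a policy can be created and attached with gcloud is shown below; the policy name and region are placeholders, and the exact flags may vary by gcloud version.

    # Create a compact (collocated) placement policy in the region of the VMs.
    gcloud compute resource-policies create group-placement my-compact-policy \
        --collocation=COLLOCATED --region=us-central1

    # Attach the policy when creating each VM with --resource-policies=my-compact-policy.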

Results

[Figure: HPC application benchmark performance, gVNIC vs. VirtIO-Net (https://storage.googleapis.com/gweb-cloudblog-publish/images/hpc_application_benchmarks.max-1200x1200.jpg)]

Get started today!

Starting today, you can use the latest HPC VM image with gVNIC support via Google Cloud Marketplace or the gcloud command-line tool. Check out our quickstart guide for details on creating instances using gVNIC and the HPC VM image.
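As a minimal sketch of creating an instance that uses both gVNIC and the HPC VM image (the zone, instance name, and machine type are placeholders, and the image family and project shown are assumptions based on the public HPC VM image documentation):

    gcloud compute instances create my-hpc-vm-1 \
        --zone=us-central1-a \
        --machine-type=c2-standard-60 \
        --image-family=hpc-centos-7 \
        --image-project=cloud-hpc-image-public \
        --network-interface=nic-type=GVNIC

Once the instance is up, the ethtool check shown earlier can be used to confirm that the gve driver is active.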


Special thanks to Jiuxing Liu, Tanner Love, Mansoor Alicherry and Pallavi Phene for their contributions.
