Cloud Operations

The Ops Agent is now GA and it leverages OpenTelemetry

July 20, 2021
Rahul Harpalani

Product Manager

Joe Lynch

Engineering Manager

Running and troubleshooting production services requires deep visibility into your applications and infrastructure. While basic logs and metrics are available out of the box with Google Compute Engine (GCE), capturing advanced data used to require installing both a metrics agent and a logging agent. Today, we’re happy to announce the General Availability of the new Ops Agent, which replaces both the Logging and Monitoring agents and simplifies installation, management, and configuration across the board. This combined agent approach breaks down boundaries between Cloud Logging and Cloud Monitoring, as well as between operating systems.

What’s new with the Ops Agent?

  • It is now our recommended agent and will ultimately replace our legacy agents.

  • It is largely backwards compatible.

  • It offers dramatically higher logging throughput, which helps you avoid OutOfMemory errors and prevent data loss.

  • It features simple YAML-based configuration for both logs and metrics, bringing greater consistency between logging and monitoring tools and a consistent feature set across Linux distributions and Windows.

  • The combined approach means just one agent to download, install, and maintain instead of two.

  • It is open source and can leverage the advancements of the rapidly blossoming OpenTelemetry community.

 Learn more about the Ops Agent in our documentation.
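
To make the single-configuration idea concrete, here is a minimal Python sketch that builds one structure covering both logs and metrics and checks that every pipeline refers to a receiver that is actually defined. The key names follow our understanding of the general layout of the Ops Agent configuration file, so please verify them against the documentation. The receiver name app_logs, the pipeline name app_pipeline, the log path, and both helper functions are hypothetical and exist only for illustration.

```python
import json


def build_ops_agent_config(log_paths, metrics_interval="60s"):
    """Return a dict shaped like a combined logging + metrics agent config.

    Assumption: the key layout mirrors the documented Ops Agent YAML file;
    the receiver and pipeline names below are made up for this example.
    """
    return {
        "logging": {
            "receivers": {
                "app_logs": {"type": "files", "include_paths": list(log_paths)},
            },
            "service": {
                "pipelines": {"app_pipeline": {"receivers": ["app_logs"]}},
            },
        },
        "metrics": {
            "receivers": {
                "hostmetrics": {
                    "type": "hostmetrics",
                    "collection_interval": metrics_interval,
                },
            },
            "service": {
                "pipelines": {"default_pipeline": {"receivers": ["hostmetrics"]}},
            },
        },
    }


def undefined_receivers(config):
    """List (section, pipeline, receiver) triples that name a missing receiver."""
    problems = []
    for section in ("logging", "metrics"):
        body = config.get(section, {})
        defined = set(body.get("receivers", {}))
        pipelines = body.get("service", {}).get("pipelines", {})
        for pipeline_name, pipeline in pipelines.items():
            for receiver in pipeline.get("receivers", []):
                if receiver not in defined:
                    problems.append((section, pipeline_name, receiver))
    return problems


config = build_ops_agent_config(["/var/log/myapp/*.log"])
# Rendered as JSON here only for inspection; use a YAML library to write
# the file the agent actually reads.
rendered = json.dumps(config, indent=2)
```

Because logs and metrics live in one structure with the same receivers-and-pipelines shape, a single check such as undefined_receivers(config) can cover both halves of the configuration.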

Built on OpenTelemetry 

As organizations expand into cloud, hybrid cloud, and multi-cloud, operators and developers have to juggle too many agents (including our own), too many protocols, and too many ways to capture metrics, logs, and traces. We wanted to simplify the story for you, so we built the Ops Agent on OpenTelemetry. This CNCF-supported, open source, vendor-neutral technology is at the forefront of unifying operations. It has garnered the support of many vendors in the operations industry, and the Ops Agent underscores Google Cloud’s commitment to openness. The OpenTelemetry community is guiding telemetry to a place that is optimized for the user, and we are proud of our involvement and contributions to the project.

Higher throughput and improved resource efficiency with Fluent Bit

As our customers continued to build larger and more complex services on Google Cloud, we heard the feedback that you need a VM logging agent that can support higher throughput. Higher throughput helps you avoid data loss and OutOfMemory errors.

To do this, we turned to Fluent Bit, an open source log processor and forwarder, and a great complement to OpenTelemetry. In internal tests, the new Ops Agent has supported 15x higher throughput than our legacy Logging agent. Plus, this was achieved with better efficiency for the underlying VM resources, so running the agent on even the smallest VMs is far more economical.

Automatic integration with Cloud Logging and Cloud Monitoring

As detailed in a previous post about the deep integration between Google Cloud infrastructure and our observability tools, the Ops Agent is pre-integrated with the observability tools available in the Google Cloud console. After you have installed the agent on your VM, host metrics, process metrics, and logs will automatically be routed to Cloud Logging and Cloud Monitoring without any action needed on your part.
https://storage.googleapis.com/gweb-cloudblog-publish/images/Dashboard_examples.max-900x900.png

Some examples of out-of-the-box VM dashboards that do not require user setup

Installation options

Administrators, developers and IT managers alike spend enough time learning new tools. Therefore, if your organization already uses the configuration management/automation capabilities of the open source tool Ansible, we’ve made sure you can use it to install the Ops Agent. Using the Ansible Role, you can install and configure the agent across your fleet of Linux and Windows VMs. For more information, refer to the Ansible Role for Cloud Ops documentation. In addition to Ansible, Puppet support is coming by the end of this month and Chef will be supported within the next quarter!

If you are already using Terraform, the open source infrastructure-as-code provisioning tool, you can use the Terraform module to install and configure the Ops Agent on your VMs. For more information, refer to the Terraform Agent Policy documentation.

For those who prefer a managed solution, we provide Agent Policies, a mechanism that automatically manages the installation of the Ops Agent and is currently in preview. With as little as one command, you can create a policy that governs new and existing VMs to ensure proper installation and optional auto-upgrade of the Ops Agent on VMs that meet your specified criteria.
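
To illustrate what "VMs that meet your specified criteria" can mean in practice, here is a small Python sketch of criteria-based selection. It is not how Agent Policies is implemented. The VM and AgentPolicy classes, their fields, and the idea of matching on OS name and labels are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Hypothetical, simplified view of a VM for this example."""
    name: str
    os_short_name: str
    labels: dict = field(default_factory=dict)
    agent_installed: bool = False


@dataclass
class AgentPolicy:
    """Hypothetical policy: select VMs by OS name and required labels."""
    os_short_names: set
    required_labels: dict

    def matches(self, vm):
        # A VM matches when its OS is listed and every required label
        # is present with the expected value.
        os_ok = vm.os_short_name in self.os_short_names
        labels_ok = all(
            vm.labels.get(key) == value
            for key, value in self.required_labels.items()
        )
        return os_ok and labels_ok


def vms_needing_install(policy, fleet):
    """Names of matching VMs that do not have the agent yet."""
    return [
        vm.name
        for vm in fleet
        if policy.matches(vm) and not vm.agent_installed
    ]


fleet = [
    VM("web-1", "debian", {"env": "prod"}),
    VM("web-2", "debian", {"env": "prod"}, agent_installed=True),
    VM("batch-1", "centos", {"env": "prod"}),
    VM("dev-1", "debian", {"env": "dev"}),
]
policy = AgentPolicy(os_short_names={"debian"}, required_labels={"env": "prod"})
pending = vms_needing_install(policy, fleet)
```

Re-running the same selection whenever a VM is created is what would let a policy of this kind cover new machines as well as existing ones.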

Get started today

We hope that you find the improved throughput, resource efficiency, and consolidated feature set of the Ops Agent useful. Please be sure to check out our blog page, as well as our release notes page for updates, as we are bringing new features to the Ops Agent constantly. If you have any specific questions about the Ops Agent, please join the discussion on our Google Cloud Community, Cloud Operations page.