7 Open Source Cloud-Native Tools For Observability and Analysis – Container Journal

In 2021, ‘observability’ is close to gaining buzzword status. This is perhaps because, for years, monitoring was not well standardized in software development. Tracing was given little forethought, and applications produced logs in varying formats and styles. Without a unifying layer to analyze a growing number of services, the result was a jumble of incompatible application data that was hard to analyze.
Now, with cloud-native technology, engineers are trying not to repeat these mistakes. With rising user expectations and demands for digital innovation, there is also more focus on maintaining overall stability, performance and availability. This has given rise to a growing set of observability and analysis tools. These open source projects are making logs more actionable, tracing events with detailed metadata, and exposing valuable metrics from Kubernetes environments. Such insights can inform business metrics, help pinpoint bugs and spur quick recovery measures. For these reasons, deep observability across the cloud-native application stack is a must.
So, below we’ll explore six well-established CNCF projects related to observability, telemetry and analysis. Many of these projects help collect and manage observability data such as metrics, logs and traces.
Prometheus
The popular monitoring system and time series database
Prometheus is the most popular graduated CNCF project related to observability and likely needs no introduction. Large companies such as Amadeus, SoundCloud and Ericsson already use Prometheus to power their monitoring and alerting systems.
Prometheus has built-in service discovery and collects data via a pull model over HTTP. It then stores metrics as time series, each identified by a metric name and a set of key-value label pairs. These metrics can be customized to the application at hand and set to trigger alerts. For example, an e-commerce site may need to identify slow load times to stay competitive. Prometheus also has strong querying abilities: the PromQL query language can be used to search data and generate visualizations.
A Prometheus environment consists of the main Prometheus server, client libraries, a push gateway, special-purpose exporters, an alert manager and various support tools. To get started, developers can review the getting started guide in the Prometheus documentation.
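The pull model can be sketched with nothing but the Python standard library. The example below is a minimal illustration and does not use the official Prometheus client library: the metric name `shop_http_requests_total`, the `path` label and the sample counts are hypothetical. It renders a counter in the Prometheus text exposition format and serves it at `/metrics`, the kind of endpoint a Prometheus server would scrape over HTTP.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter: request path -> number of requests served.
REQUESTS = {"/cart": 3, "/checkout": 1}


def render_metrics(counts):
    """Render one counter, labeled by path, in the text exposition format."""
    lines = [
        "# HELP shop_http_requests_total Requests served, by path.",
        "# TYPE shop_http_requests_total counter",
    ]
    for path, value in sorted(counts.items()):
        lines.append(f'shop_http_requests_total{{path="{path}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counter values at /metrics and 404s elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def scrape_once():
    """Start the server on a free local port, pull /metrics once, return the text."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/metrics"
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

Calling `scrape_once()` plays the role of a single scrape. Once a real Prometheus server had collected this counter over time, a PromQL expression such as `rate(shop_http_requests_total[5m])` would give the per-second request rate for each path.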
Jaeger
Open source, end-to-end distributed tracing
With the move toward distributed systems, debugging, networking and supporting observability across many components has become far more challenging. Jaeger is one project that aims to solve this dilemma; it’s designed to “monitor and troubleshoot transactions in complex distributed systems.” According to the documentation, it can be used for distributed context propagation, distributed transaction monitoring, root cause analysis, service dependency analysis, and performance and latency optimization.
Jaeger works by implementing various APIs for retrieving data. This data follows the OpenTracing standard, which organizes traces into spans; each span records granular details such as the operation name, a start timestamp, a finish timestamp and other metadata. Jaeger backend modules can export Prometheus metrics, and logs are structured using zap, a logging library.
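To make the span structure concrete, here is a minimal sketch of a trace built from nested spans. It is a toy model and not the Jaeger or OpenTracing client API: the `Span` and `Tracer` classes and the operation names are hypothetical, chosen only to mirror the fields described above (operation name, start timestamp, finish timestamp and metadata).

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation within a trace."""
    operation_name: str
    start_time: float
    finish_time: Optional[float] = None
    tags: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Span"] = None

    @property
    def duration(self) -> float:
        if self.finish_time is None:
            raise ValueError("span is not finished")
        return self.finish_time - self.start_time


class Tracer:
    """Toy tracer that keeps finished spans in memory instead of exporting them."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._active: Optional[Span] = None

    @contextmanager
    def start_span(self, operation_name: str, **tags: str):
        # The currently active span, if any, becomes the parent of the new one.
        span = Span(operation_name, time.time(), tags=dict(tags), parent=self._active)
        previous, self._active = self._active, span
        try:
            yield span
        finally:
            span.finish_time = time.time()
            self._active = previous
            self.finished.append(span)


tracer = Tracer()
with tracer.start_span("checkout", user="42") as root:
    with tracer.start_span("charge-card") as child:
        pass  # the traced work would happen here
```

After this runs, `tracer.finished` holds both spans, and `child.parent` points at the `checkout` span. A tracing backend uses that parent-child relationship to reconstruct the path of a request across services.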
Fluentd
A unified logging layer
Fluentd is a logging layer designed to be decoupled from backend systems. The philosophy is that a Unified Logging Layer can eliminate the chaos of incompatible logging formats and disparate logging routines.
Fluentd can track events from many sources, such as web apps, mobile apps, NGINX logs and others. Fluentd centralizes these logs and can also port them to external systems and database solutions, like Elasticsearch, MongoDB, Hadoop and others. To enable this, Fluentd sports over 500 plugins. Using Fluentd could be helpful if you need to send out alerts in response to certain logs or enable asynchronous, scalable logging for user events.
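The idea of a unified logging layer can be illustrated with a short sketch. This is not Fluentd code or configuration: the log formats, tag names and prefix-based routing below are simplifying assumptions. Two differently formatted sources are normalized into one event shape (a tag, a timestamp and a structured record), and events are then routed to outputs by tag.

```python
import json
import re
from datetime import datetime

# Simplified pattern for an NGINX-style access log line (assumed format).
ACCESS_LINE = re.compile(
    r'(?P<remote>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_access_line(line):
    """Turn one access log line into a normalized event."""
    m = ACCESS_LINE.match(line)
    if m is None:
        raise ValueError("unrecognized access log line")
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    record = {
        "remote": m["remote"],
        "method": m["method"],
        "path": m["path"],
        "status": int(m["status"]),
        "size": int(m["size"]),
    }
    return {"tag": "nginx.access", "time": when.timestamp(), "record": record}


def parse_app_json(line):
    """Turn one JSON application log line (with an ISO 8601 'ts' field) into an event."""
    data = json.loads(line)
    when = datetime.fromisoformat(data.pop("ts"))
    return {"tag": "app.events", "time": when.timestamp(), "record": data}


def route(events, outputs):
    """Append each event to every output whose key is a prefix of the event tag."""
    for event in events:
        for prefix, sink in outputs.items():
            if event["tag"].startswith(prefix):
                sink.append(event)
```

Here `outputs` is a plain dictionary mapping a tag prefix to a list that stands in for a destination such as Elasticsearch or MongoDB. Because every event shares one shape, adding a new source or destination does not require the others to change.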
To get started with Fluentd for logging, one can download it from the Fluentd website for major operating systems or run it as a Docker image. Once installed, Fluentd offers a graphical UI to configure and manage it.
Thanos
Highly available Prometheus setup with long-term storage capabilities
For those who want to get more out of Prometheus, Thanos is an option. It’s framed as a highly available metric system with unlimited storage capacity that can be placed on top of existing Prometheus deployments. Using Thanos to obtain a global view of metrics could be helpful for organizations that use multiple Prometheus servers and clusters. Thanos can also extend retention into the object storage of your choice, making data retention theoretically limitless. As Thanos is designed to work with larger amounts of data, it incorporates downsampling to speed up queries.
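Downsampling can be sketched in a few lines. The function below only illustrates the general technique and is not Thanos’s implementation or storage format: the 300-second window and the aggregate names are assumptions made for the example. Raw samples are grouped into fixed time windows, and each window keeps a handful of aggregates, so a query over a long time range reads far fewer points.

```python
def downsample(samples, window_seconds=300):
    """Group (timestamp, value) samples into fixed windows and keep aggregates.

    Returns a list of (window_start, aggregates) pairs sorted by window start.
    """
    buckets = {}
    for ts, value in samples:
        # Align each sample to the start of its window.
        start = int(ts // window_seconds) * window_seconds
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return [(start, buckets[start]) for start in sorted(buckets)]


raw = [(0, 1.0), (60, 3.0), (299, 2.0), (300, 10.0)]
coarse = downsample(raw)
```

With the sample input above, the first three points fall into the window starting at 0 and the last point into the window starting at 300. Keeping count and sum lets an average be recomputed later, while min and max preserve spikes that a plain average would hide.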
Cortex
Horizontally scalable, highly available, multi-tenant, long-term Prometheus
Cortex is another CNCF project designed to work with multiple Prometheus setups. Using Cortex, teams can collect metrics from various Prometheus servers and perform globally aggregated queries on all the data. Availability is a plus with Cortex, as it can replicate itself and run on multiple machines. Like Thanos, Cortex provides long-term storage capabilities, with integrations for S3, GCS, Swift and Microsoft Azure.
According to the documentation, “Cortex is primarily used as a remote write destination for Prometheus, with a Prometheus-compatible query API.” To begin working with Cortex, check out the getting started guide in the Cortex documentation.
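A globally aggregated query boils down to combining series that arrived from different Prometheus servers. The sketch below is a simplified illustration and not Cortex’s query engine: the server names, labels and values are hypothetical. It sums the latest value of one metric across all servers, grouped by a single label, which is roughly what a PromQL `sum by (job) (...)` expression would return over the combined data.

```python
from collections import defaultdict


def global_sum(per_server_series, by):
    """Sum series values across all servers, grouped by one label.

    per_server_series maps a server name to a list of (labels, value) pairs,
    where labels is a dict of label names to label values.
    """
    totals = defaultdict(float)
    for series in per_server_series.values():
        for labels, value in series:
            # Series that lack the grouping label are collected under "".
            totals[labels.get(by, "")] += value
    return dict(totals)


# Hypothetical latest values of one metric, as written by two Prometheus servers.
written = {
    "prometheus-eu": [({"job": "api", "region": "eu"}, 12.0)],
    "prometheus-us": [
        ({"job": "api", "region": "us"}, 30.0),
        ({"job": "worker", "region": "us"}, 5.0),
    ],
}
totals_by_job = global_sum(written, by="job")
```

Grouping by `job` adds the `api` values from both regions into one total, while grouping by `region` would keep the two servers apart. In a real deployment, the data would reach Cortex through Prometheus remote write and be queried through the Prometheus-compatible API.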
OpenTelemetry
An observability framework for cloud-native software
OpenTelemetry is a project built to collect telemetry data, such as metrics, logs and traces, from various sources and pass it on to many types of analysis tools. The project supports integrations with popular frameworks such as Spring, ASP.NET Core, Express and Quarkus, making it easy to add observability instrumentation to a project. Of note is that OpenTracing and OpenCensus recently merged to form OpenTelemetry, making this one powerhouse of an open source telemetry solution.
In today’s digital age, metrics are the lifeblood of a business. A holistic view of application performance data and end-user activity is vital for analysis. But collecting that data is not the only goal: quality filtering and navigation are just as crucial for turning stale data into actionable insights.
Above, we’ve covered some of the most adopted CNCF projects related to observability, monitoring, and analysis. But these aren’t the only options available — there is a lot more exciting development occurring within CNCF-hosted projects and the surrounding ecosystem.
At the time of writing, CNCF also hosts several observability-related projects in sandbox status. These emerging projects involve more active monitoring, such as chaos engineering and Kubernetes health checks, as well as deeper Kubernetes-first observability.
Bill Doerrfeld is a tech journalist and analyst. His beat is cloud technologies, specifically the web API economy. He began researching APIs as an Associate Editor at ProgrammableWeb, and since 2015 has been the Editor at Nordic APIs, a high-impact blog on API strategy for providers. He loves discovering new trends, interviewing key contributors, and researching new technology. He also gets out into the world to speak occasionally.