Observability, a new buzzword - What concepts hide behind it?
Over the last couple of years, a new buzzword has emerged around logging and metrics: "observability". It stems from the microservice architecture, which has become more and more popular. One of the biggest issues with microservices is that the network and services are widely distributed, which makes it hard to see where, how and why an error or issue occurs. Naturally, there have been technological advances trying to address this, and one of them is "distributed tracing".
A few people might think of "stack tracing" when hearing the word "tracing", and that's actually not far from what distributed tracing is. Distributed tracing lets you see not only at what line an error occurs, for example, but also where the request starts, what services the request "visits" and where it ends, including what errors or response codes are returned to the user (if any).
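To make the idea concrete, here is a minimal sketch of how a trace can follow a request across services. It is not how any particular tool implements it: the service names, the `handle` function and the in-memory `COLLECTED` list are made up for illustration. The header layout follows the W3C `traceparent` format (version, trace id, parent span id, flags).

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique per unit of work
    parent_id: Optional[str]  # span that called us, None for the entry point
    service: str
    status: int = 200


# Hypothetical stand-in for a tracing backend that receives finished spans.
COLLECTED = []


def make_traceparent(trace_id: str, span_id: str) -> str:
    # W3C trace context layout: version-traceid-parentid-flags
    return f"00-{trace_id}-{span_id}-01"


def parse_traceparent(header: str):
    _version, trace_id, parent_id, _flags = header.split("-")
    return trace_id, parent_id


def handle(service: str, headers: dict, status: int = 200) -> dict:
    """Simulate one service handling a request and recording a span."""
    if "traceparent" in headers:
        # Continue the trace started by an upstream service.
        trace_id, parent_id = parse_traceparent(headers["traceparent"])
    else:
        # No incoming context: this service is where the request starts.
        trace_id, parent_id = secrets.token_hex(16), None
    span = Span(trace_id, secrets.token_hex(8), parent_id, service, status)
    COLLECTED.append(span)
    # Headers to send along with any outgoing call to the next service.
    return {"traceparent": make_traceparent(span.trace_id, span.span_id)}


# A request that "visits" three made-up services; the last one fails.
headers = handle("gateway", {})
headers = handle("orders", headers)
handle("payments", headers, status=500)

for span in COLLECTED:
    print(span.service, span.parent_id, span.status)
```

Because every span carries the same trace id and a pointer to its parent, a backend can stitch the spans back together and show where the request started, which services it passed through and where the 500 came from.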
The "observability" stack has therefore become logging, metrics and tracing. Combining metrics with tracing gives you what is called "telemetry" and instrumentation, which lets you see a lot more than just where requests go. Telemetry lets you see your topology, do APM (application performance monitoring), resource provisioning and early issue detection, and generate automated infrastructure diagrams and other visualizations for security and services. And that's just to mention a few things.
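As a rough illustration of why trace data is useful beyond following a single request, the sketch below derives a service topology and a per-service error rate from a handful of spans. The span tuples, service names and helper functions are all hypothetical; real platforms do this at much larger scale and with richer data.

```python
from collections import Counter

# Hypothetical spans: (span_id, parent_id, service, duration_ms, status)
SPANS = [
    ("a1", None, "gateway", 120, 200),
    ("b2", "a1", "orders", 95, 200),
    ("c3", "b2", "payments", 60, 500),
    ("d4", "b2", "inventory", 20, 200),
]


def service_edges(spans):
    """Count caller -> callee edges between services (the topology)."""
    service_of = {span_id: service for span_id, _, service, _, _ in spans}
    edges = Counter()
    for _, parent_id, service, _, _ in spans:
        if parent_id is not None and service_of[parent_id] != service:
            edges[(service_of[parent_id], service)] += 1
    return edges


def error_rate(spans):
    """Fraction of spans per service that ended with a 5xx status."""
    totals, errors = Counter(), Counter()
    for _, _, service, _, status in spans:
        totals[service] += 1
        if status >= 500:
            errors[service] += 1
    return {service: errors[service] / totals[service] for service in totals}


print(dict(service_edges(SPANS)))
print(error_rate(SPANS))
```

The edge counts are enough to draw a simple infrastructure diagram, and the error rates are the kind of signal that early issue detection could be built on.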
Sentry and other commercial tools
Most people have heard of, or even tried, Sentry. I believe Sentry originally started as a front end error reporting tool (i.e. how do we actually know what errors a user gets on the front end?), but it has actually become one of the first tracing tools. Sentry has some basic "tracing" mechanics (what endpoint was called, how the data was handled etc.), which are configured by simply dropping in an SDK. That's actually a concept most observability tools follow: you simply drop them into your project and they automatically collect what is required. Since the introduction of Sentry, a lot of similar commercial projects have popped up. Building observability tools has become one of the new "cool" things in tech. Pretty much every cloud provider has developed its own proprietary solution, either as a standalone tool or built into its dashboards.
A few commercial tools are:
The open source community
The open source community was a bit delayed, but in 2022 it made some huge progress. I remember looking at these tools just a year ago, and they have gone from babies to fully matured systems.
To mention some of the open source community's strides:
- OpenTelemetry - "is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior." It is an open standard, or "collection of tools", that has made a lot of progress over the last 12 months and supports a lot of client libraries. OpenTelemetry is already in use by a lot of both commercial and open source projects.
- Apache SkyWalking, a fully fledged observability platform with support for a lot of languages and frameworks. It is used by Alibaba Cloud and, at the moment, a lot of other Chinese companies. SkyWalking is under the Apache Foundation and you can find SkyWalking on GitHub.
- Opstrace, another mature observability platform written in Go. It was recently acquired by GitLab and is now the system GitLab uses for its observability needs. Opstrace on GitLab.
- Grafana also added more support for tracing this year. Here's their article: https://grafana.com/traces/
- Jaeger - Platform for distributed tracing, "Monitor and troubleshoot transactions in complex distributed systems".
Some more reading material:
Here's a descriptive article from Google on what observability is:
DevOps measurement: Monitoring and observability