Logs or Metrics - A Conceptual Decision

Maintaining a cloud production environment is not an easy task. Just ask Amazon, WhatsApp, or Waze.

There are endless suggestions, best practices, and tips on how to keep production environments stable and prevent service outages. But let’s face it, there will always be problems that must be detected early, handled correctly and speedily, and learned from for the future.

To achieve these objectives, production environments must be monitored closely and every event must be recorded and studied.

I bet the first thing that came to mind when you read the last sentence was, “Wow, that’s a lot of data!” The infrastructure of even a basic cloud-based application consists of multiple possible points of failure—potentially involving services, containers, UIs, and integrations.

Figure 1: Cloud application architecture

Source: Dustin’s Blog

Why Metrics?

As explained above and shown in Figure 1, even a simple cloud-based application relies on several components that are all deployed in an environment that DevOps teams find very hard to control. If one of these components fails to function as expected, the whole application might be in jeopardy.

Metrics help measure how components function and define the thresholds at which usage requires attention. They give DevOps engineers the ability to assess service value over time and provide a continuous view of the whole environment. The number of metrics that can be used to evaluate an application is practically unlimited, so it is important to identify the business-critical functionality and build the metrics plan accordingly.

Basic metrics such as transaction throughput and response time are applicable for all applications, while clicks-per-second or new users per month are used for more sophisticated use cases. Metrics are not only relevant for the code, but can also be applied to the containers hosting the services. Metrics such as tasks/consumption/memory and network throughput help DevOps teams to understand the velocity and efficiency of a system and determine the level of readiness for traffic spikes or continuous load.

For serverless applications, metrics are absolutely crucial—container startup time, response time, and average container execution time reflect the application usage and the platform’s ability to satisfy the application’s needs.

Metrics are relatively easy to implement, but once in place, they can pose a scaling challenge as the data and required infrastructure grow. There are, however, several tools that monitor cloud services and gather the information the metrics are built from. When service load requires scaling, these monitoring tools collect the same data for the new instances, so the metrics automatically include the new data with no manual intervention.

Using metrics has its disadvantages. To obtain data for each metric, an event must be generated for each occurrence of the activities being measured. Designing and implementing these events is an extra task in every development assignment, and the service overhead — including memory usage and service uptime — should also be taken into account.
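
To make the per-occurrence cost concrete, here is a minimal sketch of what "an event for each occurrence" can look like in code. Everything in it is an assumption made for illustration: `MetricsRegistry`, `measured`, and the `checkout` function are hypothetical names, and the in-memory store stands in for whatever monitoring backend a team actually uses.

```python
import time
from collections import defaultdict
from functools import wraps


class MetricsRegistry:
    """Hypothetical in-memory store; a real setup would ship data to a monitoring backend."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.durations = defaultdict(list)

    def increment(self, name, value=1):
        self.counters[name] += value

    def observe(self, name, seconds):
        self.durations[name].append(seconds)

    def average(self, name):
        samples = self.durations[name]
        return sum(samples) / len(samples) if samples else 0.0


registry = MetricsRegistry()


def measured(name):
    """Emit one event per call: a call count, an error count, and a response time."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                registry.increment(f"{name}.errors")
                raise
            finally:
                # Runs on success and on failure, so every occurrence is counted.
                registry.increment(f"{name}.calls")
                registry.observe(f"{name}.seconds", time.perf_counter() - start)
        return wrapper
    return decorator


@measured("checkout")  # hypothetical business-critical operation
def checkout(cart_total):
    if cart_total < 0:
        raise ValueError("cart total cannot be negative")
    return round(cart_total * 1.2, 2)


checkout(10.0)
checkout(25.0)
try:
    checkout(-1.0)
except ValueError:
    pass

print(dict(registry.counters))
print(registry.average("checkout.seconds"))
```

The decorator keeps the instrumentation out of the business logic, but the extra work on every call is still there, which is the overhead the paragraph above describes.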

Also, as implied above, metrics are easy to create and store, so inexperienced teams might make the mistake of creating too many and may not be able to choose the metrics relevant to them. Metrics are good for identifying trends, relating application behavior to groups of events, and foreseeing system deficiencies — an action that helps to avoid customer-facing issues, particularly around performance.

Why Logs?

Metrics are critical for getting an overview of how cloud-deployed software behaves over time, and they inform decisions on improving deployment and maintenance processes.

But many developers find metrics to be insufficient and sometimes not even useful. While metrics show the tendencies and propensities of a service or an application, logs focus on specific events. The purpose of logs is to preserve as much information (mostly technical) as possible on a specific occurrence. The information in logs can be used to investigate incidents and to help with root-cause analysis of faults or defects, but also for a growing number of additional use cases.

Another way metrics differ from logs is that logs can be unique to each R&D team (application logs, for example), and are structured according to either the needs of the incident investigation team or the system that collects and analyzes them. Logs also serve other aspects of monitoring: identifying security breach attempts and misuse of the application’s functionality, and maintaining records for legal compliance.

But logs aren’t easy to use either. They require more storage and more complicated processing than metrics. Implemented incorrectly, they contain large amounts of unusable data that conceal the pieces of information actually required for analysis.

When logs look like this, it is not clear where the error is, when it happened, what caused it, and how to understand its origins:

20170330 19-13-01.654 LicenseManager - check license mode
20170330 19-13-01.738 TrayIconManager - IconManager - init
20170330 19-13-01.738 TrayIconManager - No icon UI mode
20170330 19-13-01.745 TrayIconManager - No icon UI mode
20170330 19-13-01.768 TrayIconManager - No icon UI mode
20170330 19-13-01.800 ProcessWatchDog - starting to watch process: 5716 on platform: win32
20170330 19-13-02.843 DirectChannel - Connect called for direct channel client of tunnel LWE-PMR
20170330 19-13-02.845 DirectChannel - Connect called for direct channel client of tunnel LWE-PMR
20170330 19-13-02.848 DirectChannel - init direct channel client for tunnel LWE-PMR
20170330 19-13-02.850 Engine.ChannelManager - onListening: listening to: SDK
20170330 19-13-02.850 ERROR LightWeight.Dispatcher - onListening: no listening event from { target: 'SDK' }
20170330 19-13-02.850 DirectChannelListener - DirectChannelListener.clientConnected : Client has connect
20170330 19-13-02.850 Engine.ChannelManager - onConnect: got connection from PackageManager with id: 1
20170330 19-13-02.850 LightWeight.Dispatcher - onConnect: Got connection from { target: 'PackageManager_1' }
20170330 19-13-02.850 PackagesManager.ChannelManager - onConnect: got connection from lwe with id: undefined
20170330 19-13-02.850 Dispatcher - onConnect: Got connection from { target: ’?????’}
20170330 19-13-02.850 Dispatcher - connected: { target: ’?????’}
20170330 19-13-02.850 Dispatcher - onConnect: Got connection to the parent dispatcher going to send registration message
20170330 19-13-02.850 LightWeight.Dispatcher - registerDispatcher: Got registration from PackageManager_1
20170330 19-13-02.850 ERROR SessionManager - packageManagerConnected: Failed connection to { target: 'PackageManager_1' }
20170330 19-13-02.850 LightWeight.Dispatcher - registerDispatcher:

Log output should be planned and tested like any other application functionality, so that when push comes to shove the necessary information is available, clear, and useful. To be effective, logs should meet specific standards, such as using human-readable language, showing the date and time in a clear format, highlighting errors, and providing context for each record.
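
As one possible illustration of those standards, the sketch below uses Python's standard `logging` module to emit one JSON record per event, with an ISO 8601 UTC timestamp, an explicit level, the component name, and a context object. The `JsonFormatter` and `build_logger` names, the `context` key, and the field values are assumptions made for this example, not a prescribed format.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object with a clear timestamp and level."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical attribute supplied through the `extra` argument.
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Create a logger that writes JSON lines to `stream` (stderr when omitted)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


log = build_logger("SessionManager")
log.error(
    "Failed to connect to package manager",
    extra={"context": {"target": "PackageManager_1", "tunnel": "LWE-PMR", "attempt": 1}},
)
```

With records shaped like this, the component, the time, the severity, and the surrounding context can each be filtered on directly instead of being picked out of free text.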

What About Tracing?

Tracing is another way to keep track of the environment status, allowing developer-level logging.

When logs are configured to trace level, all communications, events, and data are recorded, creating many different types of records. Most of these are not localized, meaning they are not readable, and some might even expose sensitive information.

Some approaches hold that tracing is the right way to log all activity in the ecosystem, but only when done right. If tracing does not follow a clear set of rules, the immense amount of data in the logs obscures what is important and requires deeper examination to collect the relevant information. Incorrect implementation of tracing can also affect system performance, because a huge amount of data is written to the logs and every action is documented thoroughly.
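
A minimal sketch of two such rules, written with Python's standard `logging` module: trace output stays off unless explicitly enabled, and sensitive fields are masked before anything is written. Note the assumptions: the standard library has no built-in TRACE level, so the value 5 is a custom level chosen here, and `SENSITIVE_KEYS`, `redact`, and `trace` are hypothetical helpers.

```python
import logging

# Assumption: a custom level below DEBUG (10); the standard library defines no TRACE level.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# Hypothetical field names that should never reach the logs in clear text.
SENSITIVE_KEYS = {"password", "token", "license_key"}


def redact(payload):
    """Return a copy of the payload with sensitive values masked."""
    return {key: ("***" if key in SENSITIVE_KEYS else value) for key, value in payload.items()}


def trace(logger, message, payload):
    """Log at trace level, skipping the redaction work when tracing is disabled."""
    if logger.isEnabledFor(TRACE):
        logger.log(TRACE, "%s %s", message, redact(payload))


logger = logging.getLogger("DirectChannel")
logger.addHandler(logging.StreamHandler())

logger.setLevel(logging.INFO)  # normal operation: the trace call below writes nothing
trace(logger, "connect called", {"tunnel": "LWE-PMR", "token": "abc123"})

logger.setLevel(TRACE)  # enabled deliberately while investigating an issue
trace(logger, "connect called", {"tunnel": "LWE-PMR", "token": "abc123"})
```

The level check keeps the cost of a disabled trace call close to zero, which addresses the performance concern, while the redaction step addresses the risk of exposing sensitive information.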

Tracing is recommended only for power users with a genuine need for the low-level data.

So, What Method Should I Use?

As explained above, metrics and logs address two different needs of cloud applications and are both critical to the business.

Metrics can be used to monitor performance, recognize events of importance, and facilitate prediction of future lapses. Logs are usually used for troubleshooting issues, but also for analyzing user behavior, application metrics and a growing variety of additional use cases.
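
One way logs can feed application metrics is by counting the entries that match a filter. The sketch below does this for a few lines taken from the sample log shown earlier, counting ERROR entries per component. The regular expression is an assumption about that sample's layout (date, time, optional `ERROR` marker, component, message), and `count_errors` is a hypothetical helper, not part of any specific tool.

```python
import re
from collections import Counter

# Assumed layout: "<date> <time> [ERROR ]<component> - <message>"
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{2}-\d{2}-\d{2}\.\d{3}) "
    r"(?:(?P<level>ERROR) )?(?P<component>\S+) - (?P<message>.*)$"
)

SAMPLE_LOG = """\
20170330 19-13-02.850 Engine.ChannelManager - onListening: listening to: SDK
20170330 19-13-02.850 ERROR LightWeight.Dispatcher - onListening: no listening event from { target: 'SDK' }
20170330 19-13-02.850 LightWeight.Dispatcher - registerDispatcher: Got registration from PackageManager_1
20170330 19-13-02.850 ERROR SessionManager - packageManagerConnected: Failed connection to { target: 'PackageManager_1' }
"""


def count_errors(lines):
    """Count ERROR entries per component; lines that do not match the pattern are ignored."""
    errors = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match and match.group("level") == "ERROR":
            errors[match.group("component")] += 1
    return errors


error_counts = count_errors(SAMPLE_LOG.splitlines())
print(error_counts)
```

Tracked over time, a counter like this behaves as a metric, while the matching log entries remain available for the detailed investigation.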

Metrics help point out where processes can be improved and give a bird's-eye view of the application. Logs are especially useful in practice when the application faces many functional problems that constantly require deep examination.

The good news is that DevOps teams do not necessarily need to choose one method over the other. Logs and metrics can be used in tandem. The bad news is that mastering both monitoring methods requires handling a huge amount of data and the ability to filter out the insubstantial information and focus on what is meaningful and relevant for application maintenance.

There are several tools designed to solve these problems, overseeing the monitoring process and extracting the significant data. These tools implement different mechanisms for collecting, analyzing, and displaying the data in a manner that helps with investigating problems and understanding their consequences.

FAQs

What are logs and metrics?

While metrics show the tendencies and propensities of a service or an application, logs focus on specific events. The purpose of logs is to preserve as much information—mostly technical—as possible on a specific occurrence.

What is Logz.io used for?

Logz.io is a scalable, end-to-end cloud monitoring service that combines the best open-source tools with a fully managed SaaS platform. It provides unified log, metric, and trace collection with AI/ML-enhanced features for improved troubleshooting, faster response times, and cost management.

What does Logz.io do?

Logz.io's Cloud-Native Observability Platform centralizes log, metric, and tracing analytics in one place, so you can monitor the health and performance of your Azure environment.

What is the difference between logs, metrics, and traces?

Logs chronicle events, providing a detailed narrative. Metrics quantify system health, offering performance insights. Traces identify bottlenecks and system component relationships, facilitating issue diagnosis. Together, they form a comprehensive framework for observability.

What are the three types of metrics?

There are three types of metrics that an organization should collect: technology metrics, process metrics, and service metrics.

Is Logz.io free?

Logz.io offers a free Community plan with 1 day of log retention, a 1 GB log limit, 10 alerts, and ML-powered analytics. Their pricing depends on two variables.

Is Logz.io open source?

Logz.io is based on open source. Our architecture relies on a variety of projects that enable us to offer a robust, reliable and scalable log analysis solution.

Who is the CEO of Logz.io?

Tomer Levy is the CEO and co-founder of Logz.io, according to his Crunchbase person profile.

What are log metrics?

Log-based metrics can extract data from logs to create metrics of the following types: Counter: these metrics count the number of log entries that match a specified filter within a specific period. Use counters when you want to keep track of the number of times a value or string appears in your logs.

What are the four pillars of observability?

When it comes to understanding data observability, one must understand the four key pillars that comprise the concept, which are: metrics, metadata, lineage, and logs. Here we describe each pillar and the importance of each when it comes to mitigating data uncertainty.

What is the best description of the difference between logs and metrics?

While logs are about a specific event, metrics are a measurement at a point in time for the system. This unit of measure can have the value, timestamp, and identifier of what that value applies to (like a source or a tag).

What are logs in data?

Log data is the record of all the events occurring in a system, in an application, or on a network device. When logging is enabled, logs are automatically generated by the system and timestamped. Log data gives detailed information, such as who was part of the event, when it occurred, where, and how.

What are log files?

Log files are the primary data source for network observability. A log file is a computer-generated data file that contains information about usage patterns, activities and operations within an operating system, application, server or another device.

Logs or Metrics - A Conceptual Decision | Logz.io (2024)
