Deciding what to log is one of the most challenging aspects of application development since it's difficult to foresee which pieces of information will prove critical during troubleshooting.

Many developers resort to logging everything, generating a tremendous amount of log data, which can be cumbersome to manage and expensive to store and process.

To maximize the effectiveness of your logging efforts and prevent excessive logging, it's crucial to follow well-established logging best practices.

These guidelines are designed not only to improve the quality of your log data but also to minimize the impact of logging on system performance.

By implementing the following logging strategies, you'll ensure that your logs are both informative and manageable, leading to quicker issue resolution and lower costs!

1. Do establish clear objectives for your logging

To prevent noisy logs that don't add any value, it's crucial to define the objectives of your logging strategy. Ask yourself: what are the overarching business or operational goals? What function is your application designed to perform?

Once you've pinpointed these objectives, you can determine the key performance indicators (KPIs) that will help you track your advancement towards these goals.

With a clear understanding of your aims and KPIs, you'll be in a better position to make informed decisions about which events to log and which ones are best left to track through other means (such as metrics and traces) instead of trying to do everything through your logs.

It's hard to get this stuff right from the get-go, so you'll want to err on the side of over-logging and establish a regular review process to assess and adjust your log levels to balance noise, identifying and rectifying overly verbose logs or missing metrics.

Take error logging, for instance: the objective is not just to record errors, but to enable their resolution. You should log the error details and the events leading up to the error, providing a narrative that helps diagnose the underlying issues.

The fix is usually straightforward once you know what's broken and why.

2. Do use log levels correctly

Log levels are the most basic signal for indicating the severity of the event being logged. They let you distinguish routine events from those that require further scrutiny.

Here's a summary of common levels and how they're typically used:

  • INFO: Significant and noteworthy business events.
  • WARN: Abnormal situations that may indicate future problems.
  • ERROR: Unrecoverable errors that affect a specific operation.
  • FATAL: Unrecoverable errors that affect the entire program.

Other common levels like TRACE and DEBUG aren't really about event severity but the level of detail that the application should produce.

Most production environments typically default to INFO to prevent noisy logs, but the log will often not have enough detail to troubleshoot some kinds of problems. You must plan to log at an increased verbosity temporarily while investigating issues.

Modifying log verbosity is typically done through static config files or environmental variables, but a more agile solution involves implementing a mechanism to adjust log levels on the fly. This can be done at the host level, for specific clients, or service-wide.

Some logging frameworks also provide the flexibility to alter log levels for specific components or modules within an application rather than globally. This approach allows for more granular control and minimizes unnecessary log output even when logging at the DEBUG or TRACE level.

Remember to adjust the log level once you're done troubleshooting.

Learn more: Log Levels Explained and How to Use Them

3. Do structure your logs

Historical logging practices were oriented toward creating logs that arereadable by humans, often resulting in entries like these:


[2023-11-03 08:45:33,123] ERROR: Database connection failed: Timeout exceeded.
Nov 3 08:45:10 myserver kernel: USB device 3-2: new high-speed USB device number 4 using ehci_hcd
ERROR: relation "custome" does not exist at character 15

These types of logs lack a uniform format that machines can parse efficiently, which can hinder automated analysis and extend the time needed for diagnosing issues.

To streamline this process, consider the following steps:

Firstly, adopt a logging framework that allows you to log in a structured format like JSON if your language does not provide such capabilities in its standard library.

Secondly, configure your application dependencies to output structured data where possible. For example, PostgreSQL produces plaintext logs by default but as of version 15, it can be configured to emit logs in JSON format.

Thirdly, you can use log shippers to parse and transform unstructured logs into structured formats before they are shipped to long-term storage.

As an example, consider Nginx error logs. At the time of writing, they don't support native structuring but with a tool like Vector, you can convert an unstructured error log from this:

- alice [01/Apr/2021:12:02:31 +0000] "POST /not-found HTTP/1.1" 404 153 "http://localhost/somewhere" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36" "2.75"

To structured JSON like this:


{
  "agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36",
  "client": "",
  "compression": "2.75",
  "referer": "http://localhost/somewhere",
  "request": "POST /not-found HTTP/1.1",
  "size": 153,
  "status": 404,
  "timestamp": "2021-04-01T12:02:31Z",
  "user": "alice"
}
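To make the transformation concrete, here is a rough Go sketch of the kind of parsing a log shipper performs for you. It is not how Vector is implemented or configured; the accessEntry type, the parseLine function, and the regular expression are all assumptions written to fit the sample line above, and the client address is treated as optional because it is empty in that sample.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// accessEntry mirrors the fields of the structured example above.
type accessEntry struct {
	Agent       string `json:"agent"`
	Client      string `json:"client"`
	Compression string `json:"compression"`
	Referer     string `json:"referer"`
	Request     string `json:"request"`
	Size        int    `json:"size"`
	Status      int    `json:"status"`
	Timestamp   string `json:"timestamp"`
	User        string `json:"user"`
}

// linePattern is an assumed pattern for the sample line, not an official
// Nginx grammar. Groups: client, user, time, request, status, size, referer,
// agent, compression.
var linePattern = regexp.MustCompile(
	`^(\S*) ?- (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+) "([^"]*)" "([^"]*)" "([^"]*)"$`)

// parseLine converts one unstructured line into a structured entry.
func parseLine(line string) (accessEntry, error) {
	m := linePattern.FindStringSubmatch(line)
	if m == nil {
		return accessEntry{}, fmt.Errorf("line does not match the expected format")
	}
	ts, err := time.Parse("02/Jan/2006:15:04:05 -0700", m[3])
	if err != nil {
		return accessEntry{}, err
	}
	status, err := strconv.Atoi(m[5])
	if err != nil {
		return accessEntry{}, err
	}
	size, err := strconv.Atoi(m[6])
	if err != nil {
		return accessEntry{}, err
	}
	return accessEntry{
		Agent:       m[8],
		Client:      m[1],
		Compression: m[9],
		Referer:     m[7],
		Request:     m[4],
		Size:        size,
		Status:      status,
		Timestamp:   ts.UTC().Format(time.RFC3339),
		User:        m[2],
	}, nil
}

func main() {
	line := `- alice [01/Apr/2021:12:02:31 +0000] "POST /not-found HTTP/1.1" 404 153 "http://localhost/somewhere" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36" "2.75"`

	entry, err := parseLine(line)
	if err != nil {
		fmt.Println("skipping line:", err)
		return
	}
	out, err := json.MarshalIndent(entry, "", "  ")
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Hand-written patterns like this are brittle, which is a good reason to prefer native structured output or a maintained parser in your log shipper whenever one is available.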

With your logs in a structured format, it becomes significantly easier to set up custom parsing rules for monitoring, alerting, and visualization using log management tools like Better Stack.

For those times in development when you prefer to read logs that are easier on the eyes, you can use tools designed to colorize and prettify logs, such as humanlog, or check if your framework offers built-in solutions for log beautification.

4. Do write meaningful log entries

The utility of logs is directly tied to the quality of the information they contain. Entries filled with irrelevant or unclear information will inevitably be ignored, undermining the entire purpose of logging.

Approach log message creation with consideration for the reader, who might be your future self. Write clear and informative messages that precisely document the event being captured.

Including ample contextual fields within each log entry helps you understand the context in which the record was captured, and lets you link related entries to see the bigger picture. It also lets you quickly identify issues when a customer reaches out with problems.

Essential details can include:

  • Request or correlation IDs
  • User IDs
  • Database table names and metadata
  • Stack traces for errors

Here's an example of a log entry without sufficient context:


{
  "timestamp": "2023-11-06T14:52:43.123Z",
  "level": "INFO",
  "message": "Login attempt failed"
}

And here's one with just enough details to piece together who performed the action, why the failure occurred, and other meaningful contextual data.


{
  "timestamp": "2023-11-06T14:52:43.123Z",
  "level": "INFO",
  "message": "Login attempt failed due to incorrect password",
  "user_id": "12345",
  "source_ip": "",
  "attempt_num": 3,
  "request_id": "xyz-request-456",
  "service": "user-authentication",
  "device_info": "iPhone 12; iOS 16.1",
  "location": "New York, NY"
}

Do explore the Open Web Application Security Project’s (OWASP) compilation of recommended event attributes for additional insights into enriching your log entries.

Learn more: Log Formatting Best Practices

5. Do sample your logs

For systems that generate voluminous amounts of data, reaching into hundreds of gigabytes or terabytes per day, log sampling is an invaluable cost-control strategy that involves selectively capturing a subset of logs that are representative of the whole, allowing the remainder to be safely omitted without affecting analysis needs.

This targeted retention significantly lowers the demands on log storage and processing, yielding a far more cost-effective logging process.

A basic log sampling approach is capturing a predetermined proportion of logs at set intervals. For instance, with a sampling rate of 20%, out of 10 occurrences of an identical event within one second, only two would be recorded, and the rest discarded.


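package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	// BasicSampler{N: 5} keeps every fifth event, a 20% sampling rate.
	log := zerolog.New(os.Stdout).
		With().
		Timestamp().
		Logger().
		Sample(&zerolog.BasicSampler{N: 5})

	for i := 1; i <= 10; i++ {
		// Msgf formats the message; Msg accepts a single string only.
		log.Info().Msgf("a log message: %d", i)
	}
}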

For more nuanced control, advanced sampling methods can be employed, such as adjusting sampling rates based on the content within the logs, varying rates according to the severity of log levels, or selectively bypassing sampling for certain categories of logs.

For example, if your application is experiencing a run of errors, it might start writing out several identical log entries with large stack traces, which could be quite expensive, especially for a high-traffic service. To guard against this, you can apply sampling to drop the majority of the logs without compromising your ability to debug the issue.

Sampling is most efficiently implemented directly within the application, provided that the logging framework accommodates such a feature. Alternatively, the sampling process can be incorporated into your logging pipeline when the logs are aggregated and centralized.

It's crucial to introduce log sampling in your logging process sooner rather than later before costs become an issue.

6. Do employ canonical log lines per request

A canonical log line is a single, comprehensive log entry that is created at the end of each request to your service. This record is designed to be a condensed summary that includes all the essential information about the request, making it easier to understand what happened without needing to piece together information from multiple log entries.

This way, when you troubleshoot a failed request, you only have a single log entry to look at. This entry will have all the necessary details, including the request's input parameters, the caller's identity and authentication method, the number of database queries made, timing information, rate limit count, and any other data you see fit to add.

Here's an example in JSON format:


{
  "http_verb": "POST",
  "path": "/user/login",
  "source_ip": "",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
  "request_id": "req_98765",
  "response_status": 500,
  "error_id": "ERR500",
  "error_message": "Internal Server Error",
  "oauth_application": "AuthApp_123",
  "oauth_scope": "read",
  "user_id": "user_789",
  "service_name": "AuthService",
  "git_revision": "7f8ff286cda761c340719191e218fb22f3d0a72",
  "request_duration_ms": 320,
  "database_time_ms": 120,
  "rate_limit_remaining": 99,
  "rate_limit_total": 100
}

7. Do aggregate and centralize your logs

Most modern applications are composed of various services dispersed across numerous servers and cloud environments, with each one contributing to a massive, multifaceted stream of log data.

In such systems, aggregating and centralizing logs is not just a necessity but a strategic approach to gaining holistic insight into your application's performance and health.

By funneling all logs into a centralized log management system, you'll create a singular, searchable source of truth that simplifies monitoring, analysis, and debugging efforts across your entire infrastructure.

Implementing a robust log aggregation and management system allows you to correlate events across services, accelerate incident response by enabling quicker root cause analysis, and ensure regulatory compliance in data handling and retention, all while reducing storage and infrastructure costs by consolidating multiple logging systems.

With the right tools and strategies, proper log management turns a deluge of log data into actionable insights, promoting a more resilient and performant application ecosystem.

Learn more: What is Log Aggregation? Getting Started and Best Practices

8. Do configure a retention policy

When aggregating and centralizing your logs, a crucial cost-controlling measure is configuring a retention policy.

Log management platforms often set their pricing structures based on the volume of log data ingested and its retention period.

Without periodically expiring or archiving your logs, costs can quickly spiral, especially when dealing with hundreds of gigabytes or terabytes of data. To mitigate this, establish a retention policy that aligns with your organizational needs and regulatory requirements.

This policy should specify how long logs must be kept active for immediate analysis and at what point they can be compressed and moved to long-term, cost-effective storage solutions or purged entirely.

You can apply different policies to different categories of logs. The most important thing is to consider the value of the logs over time and ensure that your policy reflects the balance between accessibility, compliance, and cost.
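To show what a per-category policy might look like once written down, here is a small Go sketch that decides whether a log should stay searchable, be archived, or be purged based on its age. The category names and durations are placeholders, not recommendations; in practice this logic usually lives in your log management platform's settings rather than in application code.

```go
package main

import (
	"fmt"
	"time"
)

type action string

const (
	keepHot action = "keep"
	archive action = "archive"
	purge   action = "purge"
)

// policy says how long a category of logs stays searchable (hot) and how
// long it is retained in total before being purged.
type policy struct {
	hot   time.Duration
	total time.Duration
}

const day = 24 * time.Hour

// policies holds placeholder values for three hypothetical categories.
var policies = map[string]policy{
	"debug": {hot: 3 * day, total: 3 * day},
	"app":   {hot: 30 * day, total: 90 * day},
	"audit": {hot: 30 * day, total: 365 * day},
}

// decide returns what should happen to a log of the given category and age.
func decide(category string, age time.Duration) action {
	p, ok := policies[category]
	if !ok {
		p = policies["app"] // fall back to the general-purpose policy
	}
	switch {
	case age < p.hot:
		return keepHot
	case age < p.total:
		return archive
	default:
		return purge
	}
}

func main() {
	for _, category := range []string{"debug", "app", "audit"} {
		fmt.Printf("%s logs after 45 days: %s\n", category, decide(category, 45*day))
	}
}
```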

Remember also to set up an appropriate log rotation strategy to keep log file sizes in check on your application hosts.

9. Do protect logs with access control and encryption

Certain logs, such as database logs, tend to contain some degree of sensitive information. Therefore, you must take steps to protect and secure the collected data to ensure that it can only be accessed by personnel who genuinely need to use it (such as for debugging problems).

Some measures include encrypting the logs at rest and in transit using strong algorithms and keys so that the data is unreadable by unauthorized parties, even if it is intercepted or compromised.

Choose a compliant log management provider that provides access control and audit logging so that only authorized personnel can access sensitive log data, and all interactions with the logs are tracked for security and compliance purposes.

Additionally, you should verify the provider's practices regarding the handling, storage, access, and disposal of your log data when it is no longer needed.

10. Don't log overly sensitive information

The mishandling of sensitive information in logs can have severe repercussions, as exemplified by the incidents at Twitter and GitHub in 2018.

Twitter inadvertently stored plaintext passwords in internal logs, leading to a massive password reset initiative. GitHub also encountered a less extensive but similar issue where user passwords were exposed in internal logs.

Although there was no indication of exploitation or unauthorized access in these cases, they underscore the critical importance of ensuring sensitive information is never logged.

A practical approach to preventing the accidental inclusion of sensitive data in your logs is to hide sensitive information at the application level such that even if an object containing sensitive fields is logged, the confidential information is either omitted or anonymized.

For instance, in Go's Slog package, this is achievable by implementing the LogValuer interface to control which struct fields are included in logs:


package main

import (
	"log/slog"
	"os"
)

type User struct {
	ID        string `json:"id"`
	FirstName string `json:"first_name"`
	LastName  string `json:"last_name"`
	Email     string `json:"email"`
	Password  string `json:"password"`
}

// LogValue implements slog.LogValuer so that only the ID is ever logged.
func (u *User) LogValue() slog.Value {
	return slog.StringValue(u.ID)
}

func main() {
	handler := slog.NewJSONHandler(os.Stdout, nil)
	logger := slog.New(handler)

	u := &User{
		ID:        "user-12234",
		FirstName: "Jan",
		LastName:  "Doe",
		Email:     "jan@example.com",
		Password:  "pass-12334",
	}

	logger.Info("info", "user", u)
}

Implementing the LogValuer interface above prevents all the fields of the User struct from being logged. Instead, only the ID field is logged, so the entry carries "user":"user-12234" in place of the full struct.



It's a good practice to always implement such interfaces for any custom objects you create. Even if an object doesn't contain sensitive fields today, they may be introduced in the future, resulting in leaks if it ends up being logged somewhere.

Redacting sensitive data can also be done outside the application through your logging pipeline to address cases that slip through initial filters.

You can catch a broader variety of patterns and establish a unified redaction strategy for all your applications, even if they're developed in different programming languages.

The main disadvantage here is that you will face a performance penalty since pattern matching can be pretty expensive, especially when done through regular expressions.
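To give a feel for what pattern-based redaction involves, here is a small Go sketch that scrubs email addresses and the values of JSON fields named password or token from a log line. The redact function, the field names, and both patterns are assumptions for illustration; a real pipeline would normally rely on the redaction features of its log shipper and a more carefully vetted set of patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// emailPattern is a deliberately simple approximation of an email address.
	emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

	// secretField matches compact JSON string fields named password or token.
	// It does not handle escaped quotes inside the value.
	secretField = regexp.MustCompile(`"(password|token)":"[^"]*"`)
)

// redact masks secrets first, then any remaining email addresses.
func redact(line string) string {
	line = secretField.ReplaceAllString(line, `"$1":"[REDACTED]"`)
	return emailPattern.ReplaceAllString(line, "[REDACTED_EMAIL]")
}

func main() {
	line := `{"msg":"signup","email":"jan@example.com","password":"pass-12334"}`
	fmt.Println(redact(line))
}
```

Each additional pattern is another pass over every log line, which is where the performance penalty mentioned above comes from.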

Learn more: Best Logging Practices for Safeguarding Sensitive Data

11. Don't ignore the performance cost of logging

It's important to recognize that logging always incurs a performance cost on your application. This cost can be exacerbated by excessive logging, using an inefficient framework, or maintaining a suboptimal pipeline.

To illustrate, let's consider a basic Go application server:


package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "Login successful")
	})

	fmt.Println("Starting server at port 8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

This server, when tested without logging, handles around 192k requests per second on my machine:


wrk -t 1 -c 10 -d 10s --latency http://localhost:8080/login


. . .
Requests/sec: 191534.19
Transfer/sec: 23.75MB

However, introducing logging with Logrus, a popular Go logging library, leads to a 20% performance drop:


func main() {
	l := logrus.New()
	l.Out = io.Discard
	l.Level = logrus.InfoLevel
	l.SetFormatter(&logrus.JSONFormatter{})

	http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		l.WithFields(logrus.Fields{
			"user_id":    42,
			"event":      "login",
			"ip_address": "",
		}).Info("User login event recorded")

		fmt.Fprintf(w, "Login successful")
	})

	. . .
}


. . .
Requests/sec: 152572.85
Transfer/sec: 18.92MB

In contrast, adopting the newly introduced Slog package results in a much more modest performance reduction of just over 2%:


func main() {
	l := slog.New(slog.NewJSONHandler(io.Discard, nil))

	http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		l.Info(
			"User login event recorded",
			slog.Int("user_id", 42),
			slog.String("event", "login"),
			slog.String("ip_address", ""),
		)

		fmt.Fprintf(w, "Login successful")
	})

	. . .
}


. . .
Requests/sec: 187070.78
Transfer/sec: 23.19MB

These examples highlight the importance of choosing efficient logging tools to balance between logging needs and maintaining optimal application performance.

Another thing you should do is conduct load tests on your services at maximum sustained load to verify their ability to manage log offloading during peak traffic and avoid disk overflow.

You can also mitigate performance issues through other techniques like sampling, offloading serialization and flushing to a separate thread, or logging to a separate partition. Regardless of your approach, proper testing and resource monitoring are crucial.

12. Don't rely on logs for monitoring

While logs are crucial for observing and troubleshooting system behavior, they shouldn't be used for monitoring. Since they only capture predefined events and errors, they aren't suitable for trend analysis or anomaly detection.

Metrics, on the other hand, excel in areas where logs fall short. They provide a continuous and efficient stream of data regarding various application behaviors and help define thresholds that necessitate intervention or further scrutiny.

They can help you answer questions like:

  • What is the request rate of my service?
  • What is the error rate of my service?
  • What is the latency of my service?

Metrics allow for the tracking of these parameters over time, presenting a dynamic and evolving picture of system behavior, health, and performance.

Their inherent structure and lightweight nature make them ideal for aggregation and real-time analysis. This quality is crucial for creating dashboards that help identify trends and patterns in system behavior.
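To illustrate how lightweight a metric is compared to a log entry, here is a minimal Go sketch that answers the request-count and error-rate questions with two atomic counters. The serviceMetrics type is hypothetical; a production service would more likely use a dedicated metrics library and export the values to a monitoring system.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// serviceMetrics tracks request and error counts with atomic counters, so
// recording an observation is a single cheap increment.
type serviceMetrics struct {
	requests atomic.Int64
	errors   atomic.Int64
}

// observe records one completed request; 5xx responses count as errors.
func (m *serviceMetrics) observe(status int) {
	m.requests.Add(1)
	if status >= 500 {
		m.errors.Add(1)
	}
}

// errorRate returns the fraction of observed requests that failed.
func (m *serviceMetrics) errorRate() float64 {
	total := m.requests.Load()
	if total == 0 {
		return 0
	}
	return float64(m.errors.Load()) / float64(total)
}

func main() {
	var m serviceMetrics
	for _, status := range []int{200, 200, 500, 200} {
		m.observe(status)
	}
	fmt.Printf("requests=%d error_rate=%.2f\n", m.requests.Load(), m.errorRate())
}
```

Sampling these counters at a fixed interval yields the request rate and error rate over time, which is the kind of series a dashboard or alert threshold is built on.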

Final thoughts

Starting with these 12 logging practices is a solid step towards better application logs. However, continual monitoring and periodic reviews are essential to ensure that your logs continue to fulfill ever-evolving business needs.

Thanks for reading, and happy logging!

