Log files can be a great place to find out what went wrong in our applications. They should contain all the information necessary to understand the state of the application before and while the error happened. Instead, most log files are giant collections of data without much structure and even less meaning. What is unfortunate in a single application becomes a real problem as soon as multiple applications need to work together.
Every developer has their own idea of how a log message should be structured, what is important, and how it should be represented. Combining log files from multiple applications is therefore a lot of work, and the chance of missing the important insight is high.
This post is part of the Improve Your Log Messages series. You can find the other parts here:
- Part 1: The Missed Opportunities of Log Files
- Part 2: Structured Logging with Serilog
- Part 3: RavenDB as a Sink for Serilog
- Part 4: Seq as a Sink for Serilog
- Part 5: How to Influence the Output of Serilog
- Part 6: Monitor your Application with Seq
- Part 7: Debugging Serilog
- Part 8: Elasticsearch as a Sink for Serilog
- Part 9: Monitor your Applications with Kibana
- Part 10: Closing the Feedback Loop from Log Messages to Knowledge
- Part 11: How To Analyse IIS Log Files
- Part 12: Using Logstash to Analyse IIS Log Files with Kibana
- Part 13: Analysing Apache Log Files with Logstash and Kibana
- Part 14: How to Analyse SharePoint Log Files
Everything goes
On top of all that, there are tremendous differences in what is logged and how readable the entries are. Log files tend to be more understandable when the developer used them to figure out what the application does. In that case, the log may look like this:
```
20:46:29 INFO ataraxis.gui.AtaraxisStarter - ##### Entering splash screen at: Tue Jul 29 20:46:29 CEST 2014 #####
20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - createSecretKey() - start
20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - Size of AES-Key: 256
20:46:31 DEBUG ataraxis.crypt.AESKeyCreator - createSecretKey() - end
20:46:31 DEBUG ataraxis.crypt.AtaraxisCrypter - JuristicationTest was ok
20:46:31 INFO ataraxis.gui.AtaraxisStarter - Splashscreen: policyIsOK => true
20:46:31 DEBUG ataraxis.gui.AtaraxisStarter - ##### Exiting splash screen at: Tue Jul 29 20:46:29 CEST 2014 #####
```
Having a great debugger and therefore no need to read the produced log messages may easily result in something like this:
```
07/27/2014 22:23:25.33 OWSTIMER.EXE (0x1328) 0x19EC SharePoint Foundation Monitoring b4ly Medium Leaving Monitored Scope (Timer Job Health Statistics Updating). Execution Time=31.6552103689156 81eb6a9c-f96d-47e6-a7ba-0de19dd2c868
07/27/2014 22:23:25.37 wsstracing.exe (0x0384) 0x1EA4 SharePoint Foundation Unified Logging Service b9wt High Log retention limit reached. Log file 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS\SRV104-20140713-2153.log' has been deleted.
07/27/2014 22:23:25.37 wsstracing.exe (0x0384) 0x1EA4 SharePoint Foundation Tracing Controller Service 8096 Information Usage log retention limit reached. Some old usage log files have been deleted.
07/27/2014 22:23:25.50 OWSTIMER.EXE (0x1328) 0x1650 SharePoint Foundation Usage Infrastructure a5rv High
```
Even messages like those can be understood. All it takes is a lot of time, a good tool and a high threshold for frustration.
Implicit structure…
When you know more about the different log formats you can start to understand the structure behind them. Understanding the structure will give you insights on the purpose of the entries and from there you may get a good enough picture of what’s going on.
Unfortunately every developer has different ideas on how things should work. Therefore you will have to start from scratch as soon as you open the log files of another application. Even when they do the same things and should print out a similar set of entries, the implicit structure may result in completely different messages.
With more training you will need less time to understand another application, but it may still be too much time; especially when the application is down and you quickly need to understand the reason for the current crash. There must be a better solution than starting over and over again.
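To make the fragility of such implicit structure concrete, here is a minimal sketch: a regular expression that recovers the fields from the first log format shown above. The field names (`time`, `level`, `logger`, `message`) are my own labels, not anything the applications define; the point is that this hand-written parser matches exactly one application's format and fails on the next one.

```python
import re

# One hand-written pattern for the first format above, e.g.:
# "20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - Size of AES-Key: 256"
LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>\w+)\s+"
    r"(?P<logger>\S+)\s+-\s+"
    r"(?P<message>.*)$"
)

def parse(line):
    """Return the fields of one log line, or None if the format differs."""
    m = LINE.match(line)
    return m.groupdict() if m else None

entry = parse("20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - Size of AES-Key: 256")
print(entry["level"], entry["message"])

# The same parser silently fails on the SharePoint format -
# every new application means a new pattern written from scratch.
other = parse("07/27/2014 22:23:25.37 wsstracing.exe (0x0384) 0x1EA4 ...")
print(other)
```

Multiply this by every application in your portfolio and every format change between versions, and the cost of implicit structure becomes obvious.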
…made explicit
It would all be simpler if the implicit structure of the log messages were made explicit. We wouldn’t have to guess what a field represents – we would simply read the description and know. Not only what it should contain, but what it actually contains.
With this information we could get all error codes from all log files that occurred near the time of the crash and we would instantly know which system was the source of the error.
To reach a state like this we not only need to have a description for all fields, but a structure that contains the data in a parsable format. Only then will it be possible to combine the different sources automatically and extract the knowledge in one single place. Adding another system to our application portfolio would only need an initial configuration for the mapping. From then on it would deliver its messages in the same useful way all the other applications already do. Would that not be great?
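As a small sketch of what that could look like: suppose two applications emit their entries as JSON with a shared timestamp and level convention (the field names and the `billing`/`gateway` applications below are invented for illustration, not a fixed standard). One generic parser then suffices to pull the error codes from all sources around the time of a crash.

```python
import json
from datetime import datetime, timedelta

# Hypothetical structured entries from two different applications.
# Same parsable format, so no per-application parser is needed.
app_a = '{"timestamp": "2014-07-29T20:46:31", "level": "Error", "app": "billing", "errorCode": 4021, "message": "Payment failed"}'
app_b = '{"timestamp": "2014-07-29T20:46:32", "level": "Error", "app": "gateway", "errorCode": 502, "message": "Upstream timeout"}'

def errors_near(crash_time, raw_lines, window=timedelta(minutes=5)):
    """Collect (app, errorCode) pairs from any source close to the crash."""
    result = []
    for line in raw_lines:
        entry = json.loads(line)          # one parser for every application
        ts = datetime.fromisoformat(entry["timestamp"])
        if entry["level"] == "Error" and abs(ts - crash_time) <= window:
            result.append((entry["app"], entry["errorCode"]))
    return result

crash = datetime(2014, 7, 29, 20, 46, 33)
codes = errors_near(crash, [app_a, app_b])
print(codes)
```

With the structure explicit, "which system was the source of the error" becomes a simple query instead of an archaeology session.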
And imagine what else we could get out of this data. We could even go so far as to use real data collected on our own systems to optimize our applications. Our log files would no longer be a useless collection of garbage; instead, they could be a valuable source for fact-based decisions.
Next
Over the next weeks I will show you different ways to write better log files (using Serilog), collect and integrate them (with RavenDB and Elasticsearch), and finally use tools to understand what’s going on in your application (using Seq or Kibana). The journey to this new level of understanding starts next week with Serilog.