The Missed Opportunities of Log Files

Log files could be a great place to find the answer on what went wrong in our applications. They should contain all the necessary information to understand the state of the application before and while the error happened. Instead most log files are giant collections of data without much structure and even less meaning. What is unfortunate in one single application is a problem as soon as multiple applications need to work together.

Every developer has its own idea how a log message should be structured, what is important and how it should be represented. Trying to combine log files from multiple applications is therefore a lot of work and the chances to miss the important insight is high.

This post is part of the Improve Your Log Messages series. You can find the other parts here:

Everything goes

On top of all that there are tremendous differences in what is logged and how readable they are. Log files tend to be more understandable when the developer used them to figure out what his application does. In this case a log message may look like this one:

20:46:29 INFO  ataraxis.gui.AtaraxisStarter - ##### Entering splash screen at: Tue Jul 29 20:46:29 CEST 2014 #####
20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - createSecretKey() - start
20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - Size of AES-Key: 256
20:46:31 DEBUG ataraxis.crypt.AESKeyCreator - createSecretKey() - end
20:46:31 DEBUG ataraxis.crypt.AtaraxisCrypter - JuristicationTest was ok
20:46:31 INFO  ataraxis.gui.AtaraxisStarter - Splashscreen: policyIsOK => true
20:46:31 DEBUG ataraxis.gui.AtaraxisStarter - ##### Exiting splash screen at: Tue Jul 29 20:46:29 CEST 2014 #####

20:46:29 INFO ataraxis.gui.AtaraxisStarter - ##### Entering splash screen at: Tue Jul 29 20:46:29 CEST 2014 #####

20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - createSecretKey() - start

20:46:30 DEBUG ataraxis.crypt.AESKeyCreator - Size of AES-Key: 256

20:46:31 DEBUG ataraxis.crypt.AESKeyCreator - createSecretKey() - end

20:46:31 DEBUG ataraxis.crypt.AtaraxisCrypter - JuristicationTest was ok

20:46:31 INFO ataraxis.gui.AtaraxisStarter - Splashscreen: policyIsOK => true

20:46:31 DEBUG ataraxis.gui.AtaraxisStarter - ##### Exiting splash screen at: Tue Jul 29 20:46:29 CEST 2014 #####

Having a great debugger and therefore no need to read the produced log messages may easily result in something like this:

07/27/2014 22:23:25.33 	OWSTIMER.EXE (0x1328)                   	0x19EC	SharePoint Foundation         	Monitoring                    	b4ly	Medium  	Leaving Monitored Scope (Timer Job Health Statistics Updating). Execution Time=31.6552103689156	81eb6a9c-f96d-47e6-a7ba-0de19dd2c868
07/27/2014 22:23:25.37 	wsstracing.exe (0x0384)                 	0x1EA4	SharePoint Foundation         	Unified Logging Service       	b9wt	High    	Log retention limit reached.  Log file 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS\SRV104-20140713-2153.log' has been deleted.	 
07/27/2014 22:23:25.37 	wsstracing.exe (0x0384)                 	0x1EA4	SharePoint Foundation         	Tracing Controller Service    	8096	Information	Usage log retention limit reached.  Some old usage log files have been deleted.	 
07/27/2014 22:23:25.50 	OWSTIMER.EXE (0x1328)                   	0x1650	SharePoint Foundation         	Usage Infrastructure          	a5rv	High

07/27/2014 22:23:25.33 OWSTIMER.EXE (0x1328) 0x19EC SharePoint Foundation Monitoring b4ly Medium Leaving Monitored Scope (Timer Job Health Statistics Updating). Execution Time=31.6552103689156 81eb6a9c-f96d-47e6-a7ba-0de19dd2c868

07/27/2014 22:23:25.37 wsstracing.exe (0x0384) 0x1EA4 SharePoint Foundation Unified Logging Service b9wt High Log retention limit reached. Log file 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS\SRV104-20140713-2153.log' has been deleted.

07/27/2014 22:23:25.37 wsstracing.exe (0x0384) 0x1EA4 SharePoint Foundation Tracing Controller Service 8096 Information Usage log retention limit reached. Some old usage log files have been deleted.

07/27/2014 22:23:25.50 OWSTIMER.EXE (0x1328) 0x1650 SharePoint Foundation Usage Infrastructure a5rv High

Even messages like those can be understood. All it takes is a lot of time, a good tool and a high threshold for frustration.

Implicit structure…

When you know more about the different log formats you can start to understand the structure behind them. Understanding the structure will give you insights on the purpose of the entries and from there you may get a good enough picture of what’s going on.
Unfortunately every developer has different ideas on how things should work. Therefore you will have to start from scratch as soon as you open the log files of another application. Even when they do the same things and should print out a similar set of entries, the implicit structure may result in completely different messages.

With more training you will use less time to understand another application, but it still may be too much time. Especially when the application is down and you quickly need to understand the reason for the current crash. There must be a better solution than starting over and over again.

…made explicit

It would all be simpler if the implicit structure of the log messages would be explicit. We wouldn’t have to guess what the field represents – we would simply read the description and know what it does. Not only what it should do but what it actually does.
With this information we could get all error codes from all log files that occurred near the time of the crash and we would instantly know which system was the source of the error.

To reach a state like this we not only need to have a description for all fields, but a structure that contains the data in a parsable format. Only then will it be possible to combine the different sources automatically and extract the knowledge in one single place. Adding another system to our application portfolio would only need an initial configuration for the mapping. From then on it would deliver its messages in the same useful way all the other applications already do. Would that not be great?

And imagine what else we could get out of this data. We could even go that far and use real data collected on our own systems to optimize our applications. Our log files would no longer be a useless collection of garbage, instead they could be a useful source for fact based decisions.

Over the next weeks I will show you different ways to write better log files (using Serilog), collect and integrate them (with RavenDB and Elastic Search) and finally use tools to understand what’s going on in your application (using Seq or Kibana). The journey to this new level of understanding starts next week with Serilog.

The Missed Opportunities of Log Files

Everything goes

Implicit structure…

…made explicit

Next

Like this:

Related

1 thought on “The Missed Opportunities of Log Files”

Leave a Comment Cancel reply

Everything goes

Implicit structure…

…made explicit

Next

Share this:

Like this:

Related

1 thought on “The Missed Opportunities of Log Files”

Leave a Comment Cancel reply