Closing the Feedback Loop from Log Messages to Knowledge

Over the last few weeks I posted a series of articles on log messages: starting with the missed opportunities, continuing with the power of structured logging with Serilog and its different sinks, and ending with the helpful dashboards of Kibana. With that infrastructure in place we are now able to turn log messages into knowledge. Even better, this doesn’t have to be a one-time action.

This post is part of the Improve Your Log Messages series. You can find the other parts here:


Assumptions and Validated Learning

Assumptions are the one thing no software project has a shortage of. We guess a lot and hope it will come true. But as long as we don’t explicitly state that these are guesses and not facts, no one will bother to check them. If they turn out to be true, no harm is done. But what happens if we are wrong? Not in a simple detail, but at a fundamental level? Don’t we want to know as early as possible that we are running in the wrong direction?

More and more ideas from the Lean Startup movement are gaining traction in agile software development. One idea I’m particularly interested in is validated learning. To go from an assumption to a proven statement we can use these steps (according to Wikipedia):

  1. Specify a goal
  2. Specify a metric that represents the goal
  3. Act to achieve the goal
  4. Analyse the metric – did you get closer to the goal?
  5. Improve and try again


The Challenge

It’s easy to write log messages. It’s harder to write meaningful messages that can be used for validated learning. But it’s even harder to come up with a goal and a metric that will prove (or disprove) your assumption. This is not just the mechanical act of slapping code together. You have to think about the goal you want to achieve. Why do you build a new feature? Is it just to enable new functionality? Or does it serve a bigger plan, like acquiring more users or reaching a higher conversion rate?

We may have the assumption that a faster load time of our page will increase sales. We can measure the load time and the sales numbers and then try to put them together. Did the faster page really generate more sales? Or was it another factor we don’t know about that pushed the numbers in the desired direction? Is one the cause of the other, or did they only happen at the same time? This brings us to the difference between correlation and causation and opens up a whole new field of challenges.
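
The load-time half of such a measurement only needs a log event with the elapsed time as a structured property. A minimal sketch with Serilog; the page name and RenderPage() call are stand-ins for whatever your application actually does:

```csharp
using System.Diagnostics;
using Serilog;

// Hypothetical sketch: wrap the page rendering and log the elapsed
// time as a structured property, so Kibana can later chart it next
// to the sales numbers. RenderPage() stands in for the real call.
var stopwatch = Stopwatch.StartNew();
RenderPage();
stopwatch.Stop();

Log.Information("Page {Page} rendered in {ElapsedMs} ms",
    "Checkout", stopwatch.ElapsedMilliseconds);
```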


A Sufficient Approximation

For the most part we will not need proof at a scientific level. We can get far with a sufficient approximation that pushes the numbers in the desired direction. As long as we use iterations to refine our assumptions and validate them against our application, we can start simple. We will not end up with an overall theory that can be applied universally, but we can build a better application for our users.


A Simpler Experiment

Let’s assume we (the developers) want to use new browser features but can’t because the number of Windows XP users is too significant. We may align this with a business goal: for security reasons, our users should be on a supported operating system.

The measurement in this case would be the number of unique users on Windows XP, and we expect that number to go down. To collect the data we can add a single log statement in our authentication module.
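
A minimal sketch of that statement with Serilog could look like this; user and request depend on the surrounding authentication code, and the helper only needs to distinguish the versions we care about:

```csharp
using Serilog;

// On every successful login, record who signed in and from which
// operating system; user and request come from the surrounding
// authentication code.
Log.Information("User {UserName} logged in using {OperatingSystem}",
    user.Name, GetOperationSystem(request.UserAgent));

// Sketch of the helper: map the User-Agent string to a friendly OS
// name. "Windows NT 5.1" is the token Windows XP sends; extend the
// chain for every OS you want to distinguish.
static string GetOperationSystem(string userAgent) =>
    userAgent.Contains("Windows NT 5.1") ? "Windows XP"
    : userAgent.Contains("Windows NT") ? "Other Windows"
    : "Other";
```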

Depending on the implementation of GetOperationSystem() we will not only know which users are still on Windows XP but also get an overall view of the operating systems in use.

Once the code is in place it can go live with the next release. Before we start influencing the behaviour of our users we should collect a baseline of data points; otherwise we will be unable to measure any difference caused by our actions.

As soon as we have enough data we can start an information campaign. It may be a general mailing to all our users, or we may be more direct and display a (warning) message whenever we detect a Windows XP system. Whatever we try, we should always let enough time pass to get a useful measurement. Operating systems are not easily changed, so our experiment may take months before we see a change in behaviour.
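
For the direct route, the detection can reuse the same User-Agent token as the logging code. A hypothetical ASP.NET Core middleware sketch; the ShowXpWarning flag and how the view layer renders it are assumptions:

```csharp
// Hypothetical middleware: flag Windows XP sessions so the view
// layer can render an upgrade warning on every page.
app.Use(async (context, next) =>
{
    var userAgent = context.Request.Headers["User-Agent"].ToString();
    if (userAgent.Contains("Windows NT 5.1"))     // Windows XP
    {
        context.Items["ShowXpWarning"] = true;    // checked by the layout view
    }
    await next();
});
```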


Analyse the Data

We can easily run multiple experiments at the same time; as long as they target different behaviour changes this should not be a problem. We may even find an experiment that we can run with already collected data. While our Windows XP experiment runs over several months, we will have collected a lot of other data points. If we are interested in how fast our users install a new browser or update their iPhones, we may already have the data to prove or disprove our assumptions.
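
Because that data already sits in Elasticsearch, such questions don’t require new code in the application. Here is a sketch with the NEST client; the fields.* names assume the default mapping of the Serilog Elasticsearch sink and have to match your actual index template:

```csharp
using System;
using Nest;

var client = new ElasticClient(
    new ConnectionSettings(new Uri("http://localhost:9200")));

// How many distinct users still log in from Windows XP?
var response = client.Search<object>(s => s
    .Index("logstash-*")                 // default index pattern of the sink
    .Size(0)                             // we only need the aggregation
    .Query(q => q
        .Term(t => t
            .Field("fields.OperatingSystem.keyword")
            .Value("Windows XP")))
    .Aggregations(a => a
        .Cardinality("unique_xp_users", c => c
            .Field("fields.UserName.keyword"))));

var uniqueXpUsers = response.Aggregations.Cardinality("unique_xp_users").Value;
```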


Gained Knowledge

It’s hard to predict what knowledge you will gain from your data. In the case of the Windows XP users you may find out that only a few power users are still on that operating system and that they all work at the same place. Contacting them directly may be far more efficient than sending a message to all users.
Or you may find out that you won’t lose much money when you cut support for old browsers. Should users with current browsers spend significantly more money than those with older versions, your business may force you to focus on new features to improve the overall sales numbers.

All that could happen when you know more about what’s really going on in your application (and in your user base).


Next

When we design our experiments and collect data we may miss some important parts. As long as those parts live inside our applications we can easily add more log messages and collect them in our dashboard. However, if they are in a third-party application we don’t have that flexibility. We are back where we started: implicit meaning inside flat files.

Over the next weeks we will look into different applications and how we can analyse their log files in a simple and useful manner. Stay tuned.

3 thoughts on “Closing the Feedback Loop from Log Messages to Knowledge”

  1. Hi Johnny,

    Great series of blog posts on structured logging with Serilog. I have thoroughly enjoyed reading them all.
    Just a quick query: are you considering writing any blog posts on logging strategies? I know this is going to be highly influenced by both your application and your organisation’s needs, but it would be nice to have some general guidelines to consider.
    Your series of blog posts has got me very excited about using Serilog so any advice / blog posts on best practice or your own experiences with writing logging code would be greatly appreciated.

    Thanks,

    Jon

    • Hi Jon,
      Thank you for your support and I hope the posts will make it easy for you to start with Serilog.

      So far I haven’t planned to write about logging strategies. However, I think that would be a great addition. I will run some more experiments and when I have something good to share I will blog about it.

      Thanks again,
      Johnny

