When we write code to solve a tricky problem, we often end up with methods that show us the whole journey of how we got to the solution: there is code to load data, followed by transformation logic intertwined with data clean-up hacks to prevent the creation of a report from exploding.
There is nothing wrong with writing code that way and iterating until we reach the goal. The problem is that we keep everything in the order in which we initially wrote it. The developer who needs to fix a bug does not care whether we found a missing default value at the beginning of the implementation or at the end. They would much rather have code that separates the different phases (load, clean-up, transform, report) and guides them to the place where they should implement the fix.
The clear separation of the different phases makes the code simpler to understand and thus reduces the risk of breaking existing features when we fix bugs. Let’s look at how we can refactor code into distinct phases.
The mechanics
The split phase refactoring (https://refactoring.com/catalog/splitPhase.html) helps us separate our methods into different phases. We can use this process to separate two phases:
- Use Extract Method for the second phase in your method
- Test
- Introduce an intermediate data structure as an additional argument to the extracted method.
- Test
- Examine each parameter of the extracted method of the second phase. If it is created by the first phase, move it to the intermediate data structure. Test after each move.
- Use Extract Method for the first phase and return the intermediate data structure.
If you have three or more phases, start with the last phase and work your way up to the first phase using the process from above. You may need more than one intermediate data structure depending on what your code does.
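Before we turn to the real example, here is a minimal sketch of what the end state of these steps can look like for two phases. The names ParsedOrder, OrderPricing, Parse and CalculateTotal and the input format "product;quantity;unitPrice" are hypothetical and exist only for this illustration. The point is the shape: the first phase returns an intermediate data structure, and the second phase works on nothing else.

```csharp
using System;
using System.Globalization;

// Hypothetical intermediate data structure: everything the first phase
// produces and the second phase needs.
public sealed class ParsedOrder
{
    public string Product { get; set; } = string.Empty;
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}

public static class OrderPricing
{
    // The top-level method only wires the two phases together.
    public static decimal PriceOrder(string orderLine)
    {
        ParsedOrder order = Parse(orderLine);   // first phase
        return CalculateTotal(order);           // second phase
    }

    // First phase: turn the raw input ("product;quantity;unitPrice")
    // into the intermediate data structure.
    public static ParsedOrder Parse(string orderLine)
    {
        string[] parts = orderLine.Split(';');
        if (parts.Length != 3)
        {
            throw new FormatException($"Expected 3 fields but got {parts.Length}.");
        }

        return new ParsedOrder
        {
            Product = parts[0].Trim(),
            Quantity = int.Parse(parts[1], CultureInfo.InvariantCulture),
            UnitPrice = decimal.Parse(parts[2], CultureInfo.InvariantCulture)
        };
    }

    // Second phase: depends only on the intermediate data structure,
    // never on the raw input.
    public static decimal CalculateTotal(ParsedOrder order)
    {
        return order.Quantity * order.UnitPrice;
    }
}
```

With this sketch, OrderPricing.PriceOrder("Widget;3;2.50") returns 7.50m. A bug in the parsing can be fixed in Parse without touching the calculation, and the other way around.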
A small but realistic example
This little birthday calendar creator loads all employees from the Adventure Works database and orders them by the month and day of their birthday:
    public string Create()
    {
        var connection = new SqlConnection(_connectionString);

        // Load data
        var sql = @"SELECT Title, FirstName, MiddleName, LastName, e.BirthDate
                    FROM Person.Person p
                    INNER JOIN HumanResources.Employee e
                        ON e.BusinessEntityID = p.BusinessEntityID
                    ORDER BY MONTH(e.BirthDate), DAY(e.BirthDate)";
        var rows = connection.Query(sql);

        // Process data
        var report = new StringBuilder();
        foreach (var row in rows)
        {
            // Cleanup
            DateTime birthday = ((DateTime)row.BirthDate).Date;

            string middleName = (string)row.MiddleName;
            if (!string.IsNullOrEmpty(middleName) && middleName.Length == 1)
            {
                middleName = $"{middleName}.";
            }

            string title = (string)row.Title;
            if (string.IsNullOrEmpty(title))
            {
                title = " ";
            }
            else
            {
                title = $"({title}) ";
            }

            // Print report
            report.Append($"{title} {row.FirstName} {middleName} ");
            report.Append($"{row.LastName}: {birthday.ToShortDateString()}\n");
        }

        return report.ToString();
    }
There is a phase to load the employees, a clean-up phase and the report creation that gives us this result:
(Mr.) Brian S. Welcker: 06/06/1977
Suchitra O. Mohan: 10/06/1987
Don L. Hall: 13/06/1971
Ryan L. Cornelsen: 13/06/1972
Michael T. Entin: 15/06/1989
Michael I. Sullivan: 16/06/1979
(Ms.) Jill A. Williams: 18/06/1979
David M. Barber: 21/06/1964
Extract the report creation
The last phase in our example is the report creation. We extract the two lines that print the report into their own method and introduce the intermediate data structure (the class ReportEmployee):
        public string Create()
        {
            // ...

                // Print report
                var reportEmployee = new ReportEmployee();
                PrintReportLine(reportEmployee, report, title, row, middleName, birthday);
            }

            return report.ToString();
        }

        private static void PrintReportLine(ReportEmployee reportEmployee,
            StringBuilder report, string title, dynamic row, string middleName,
            DateTime birthday)
        {
            report.Append($"{title} {row.FirstName} {middleName} ");
            report.Append($"{row.LastName}: {birthday.ToShortDateString()}\n");
        }
    }

    public class ReportEmployee
    {
    }
It is a small start, but we need to start somewhere.
Move parameters
Next we move the parameters birthday, middleName and title to the ReportEmployee class (they are all created in the clean-up phase). For each of them we add a property to the ReportEmployee class, assign the value in the clean-up phase and use it in the report phase. As a final change we remove the parameter from the method signature:
        public string Create()
        {
            // ...

            // Process data
            var report = new StringBuilder();
            foreach (var row in rows)
            {
                // Cleanup
                var reportEmployee = new ReportEmployee();
                reportEmployee.Birthday = ((DateTime)row.BirthDate).Date;

                string middleName = (string)row.MiddleName;
                if (!string.IsNullOrEmpty(middleName) && middleName.Length == 1)
                {
                    middleName = $"{middleName}.";
                }
                // Keep longer middle names unchanged, as the original method did
                reportEmployee.MiddleName = middleName ?? string.Empty;

                string title = (string)row.Title;
                if (string.IsNullOrEmpty(title))
                {
                    reportEmployee.Title = " ";
                }
                else
                {
                    reportEmployee.Title = $"({title}) ";
                }

                // Print report
                PrintReportLine(reportEmployee, report, row);
            }

            return report.ToString();
        }

        private static void PrintReportLine(ReportEmployee reportEmployee,
            StringBuilder report, dynamic row)
        {
            report.Append($"{reportEmployee.Title} {row.FirstName} {reportEmployee.MiddleName} ");
            report.Append($"{row.LastName}: {reportEmployee.Birthday.ToShortDateString()}\n");
        }
    }

    public class ReportEmployee
    {
        public string Title { get; set; }
        public string MiddleName { get; set; }
        public DateTime Birthday { get; set; }

        public ReportEmployee()
        {
            MiddleName = string.Empty;
        }
    }
Extract the clean-up phase
Next we extract the clean-up phase and return the ReportEmployee:
    public string Create()
    {
        // ...

        // Process data
        var report = new StringBuilder();
        foreach (var row in rows)
        {
            // Cleanup
            ReportEmployee reportEmployee = Cleanup(row);

            // Print report
            PrintReportLine(reportEmployee, report, row);
        }

        return report.ToString();
    }

    private ReportEmployee Cleanup(dynamic row)
    {
        var reportEmployee = new ReportEmployee();
        reportEmployee.Birthday = ((DateTime)row.BirthDate).Date;

        string middleName = (string)row.MiddleName;
        if (!string.IsNullOrEmpty(middleName) && middleName.Length == 1)
        {
            middleName = $"{middleName}.";
        }
        // Keep longer middle names unchanged, as the original method did
        reportEmployee.MiddleName = middleName ?? string.Empty;

        string title = (string)row.Title;
        if (string.IsNullOrEmpty(title))
        {
            reportEmployee.Title = " ";
        }
        else
        {
            reportEmployee.Title = $"({title}) ";
        }

        return reportEmployee;
    }
For the moment we keep the dynamic row parameter as it is.
Extract the load phase
The remaining phase in our Create() method is the database access. We can extract everything related to the employee query into its own method:
        public string Create()
        {
            var rows = LoadEmployees();

            // ...
        }

        private IEnumerable<dynamic> LoadEmployees()
        {
            var connection = new SqlConnection(_connectionString);

            // Load data
            var sql = @"SELECT Title, FirstName, MiddleName, LastName, e.BirthDate
                        FROM Person.Person p
                        INNER JOIN HumanResources.Employee e
                            ON e.BusinessEntityID = p.BusinessEntityID
                        ORDER BY MONTH(e.BirthDate), DAY(e.BirthDate)";
            var rows = connection.Query(sql);
            return rows;
        }
    }
The result
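Our Create() method is now down to this algorithm, with all the concrete work extracted to the phase-specific methods. FirstName and LastName have moved into ReportEmployee as well, so PrintReportLine no longer needs the row parameter: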
        public string Create()
        {
            var rows = LoadEmployees();

            var report = new StringBuilder();
            foreach (var row in rows)
            {
                ReportEmployee reportEmployee = Cleanup(row);
                PrintReportLine(reportEmployee, report);
            }

            return report.ToString();
        }

        private IEnumerable<dynamic> LoadEmployees()
        {
            var connection = new SqlConnection(_connectionString);

            // Load data
            var sql = @"SELECT Title, FirstName, MiddleName, LastName, e.BirthDate
                        FROM Person.Person p
                        INNER JOIN HumanResources.Employee e
                            ON e.BusinessEntityID = p.BusinessEntityID
                        ORDER BY MONTH(e.BirthDate), DAY(e.BirthDate)";
            var rows = connection.Query(sql);
            return rows;
        }

        private ReportEmployee Cleanup(dynamic row)
        {
            var reportEmployee = new ReportEmployee();
            reportEmployee.FirstName = row.FirstName;
            reportEmployee.LastName = row.LastName;
            reportEmployee.Birthday = ((DateTime)row.BirthDate).Date;

            string middleName = (string)row.MiddleName;
            if (!string.IsNullOrEmpty(middleName) && middleName.Length == 1)
            {
                middleName = $"{middleName}.";
            }
            // Keep longer middle names unchanged, as the original method did
            reportEmployee.MiddleName = middleName ?? string.Empty;

            string title = (string)row.Title;
            if (string.IsNullOrEmpty(title))
            {
                reportEmployee.Title = " ";
            }
            else
            {
                reportEmployee.Title = $"({title}) ";
            }

            return reportEmployee;
        }

        private static void PrintReportLine(ReportEmployee reportEmployee,
            StringBuilder report)
        {
            report.Append($"{reportEmployee.Title} {reportEmployee.FirstName} {reportEmployee.MiddleName} ");
            report.Append($"{reportEmployee.LastName}: {reportEmployee.Birthday.ToShortDateString()}\n");
        }
    }

    public class ReportEmployee
    {
        public string Title { get; set; }
        public string MiddleName { get; set; }
        public DateTime Birthday { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }

        public ReportEmployee()
        {
            MiddleName = string.Empty;
        }
    }
We can now write tests for the extracted methods and address the code smell with the dynamic object for the row parameter. With everything clearly separated, we could move the clean-up behaviour to the ReportEmployee class and turn it into a full object with data and behaviour. The better we understand the code, the more options we have to clean it up.
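One possible next step could look like the sketch below. It is only an illustration and not the code from the repository: the class EmployeeRow and the factory method FromRow are hypothetical names. EmployeeRow stands in for a typed replacement of the dynamic row, and FromRow moves the clean-up rules into ReportEmployee. Middle names longer than one character are kept unchanged here, as in the original Create() method.

```csharp
using System;

// Hypothetical typed replacement for the dynamic row returned by the query.
public sealed class EmployeeRow
{
    public string Title { get; set; }
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }
}

public sealed class ReportEmployee
{
    public string Title { get; private set; }
    public string FirstName { get; private set; }
    public string MiddleName { get; private set; }
    public string LastName { get; private set; }
    public DateTime Birthday { get; private set; }

    // Instances are only created through FromRow, so every
    // ReportEmployee has already been cleaned up.
    private ReportEmployee() { }

    // Hypothetical factory method that takes over the clean-up phase.
    public static ReportEmployee FromRow(EmployeeRow row)
    {
        return new ReportEmployee
        {
            Title = string.IsNullOrEmpty(row.Title) ? " " : $"({row.Title}) ",
            FirstName = row.FirstName,
            MiddleName = CleanMiddleName(row.MiddleName),
            LastName = row.LastName,
            Birthday = row.BirthDate.Date
        };
    }

    // A single initial gets a trailing dot, a missing middle name becomes
    // an empty string, and anything longer stays as it is.
    private static string CleanMiddleName(string middleName)
    {
        if (string.IsNullOrEmpty(middleName))
        {
            return string.Empty;
        }

        return middleName.Length == 1 ? $"{middleName}." : middleName;
    }
}
```

Because FromRow needs no database, a test can build an EmployeeRow by hand (for example with Title "Mr." and MiddleName "S") and check that the resulting ReportEmployee has the Title "(Mr.) " and the MiddleName "S.". If the Query call in LoadEmployees is Dapper's, which the article does not state, the generic Query<EmployeeRow>(sql) overload could fill the typed rows directly.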
If you want to try it on your own, you can find the code on GitHub.
[Figure: the structure of the code before and after the split phase refactoring]
Conclusion
The split phase refactoring offers us a way to improve the readability of our code. The tiny steps may look boring, but that’s where the safety of this refactoring lies. This safety aspect is of great importance when we need to add automated tests to existing code that everyone considers “untestable”. Try it with a method that needs clarification in your project and let me know how it went.