When we write code to solve a tricky problem, we often end up with methods that show us the whole journey of how we got to the solution: there is code to load data, followed by transformation logic intertwined with data clean-up hacks to prevent the creation of a report from exploding.
There is nothing wrong with writing code that way and iterating until we reach the goal. The problem is that we keep everything in the order in which we initially wrote it. The developer who needs to fix a bug does not care whether we found a missing default value at the beginning of the implementation or at the end. They would much rather have code that separates the different phases (load, clean-up, transform, report) and guides them towards the place where they should implement the fix.
The clear separation of the different phases makes the code simpler to understand and thus reduces the risk of breaking existing features when we fix bugs. Let’s look at how we can refactor code into distinct phases.
The mechanics
The split phase refactoring (https://refactoring.com/catalog/splitPhase.html) helps us to separate our methods into different phases. We can use this process to separate two phases:
1. Use Extract Method for the second phase in your method.
2. Test.
3. Introduce an intermediate data structure as an additional argument to the extracted method.
4. Test.
5. Examine each parameter of the extracted method of the second phase. If it is created by the first phase, move it to the intermediate data structure. Test after each move.
6. Use Extract Method for the first phase and return the intermediate data structure.
If you have three or more phases, start with the last phase and work your way up to the first phase using the process from above. You may need more than one intermediate data structure depending on what your code does.
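To illustrate the end state these mechanics aim for, here is a minimal sketch of a method that has been split into two phases. All names (ParsedOrder, OrderPricing, the semicolon-separated input format) are hypothetical and only serve the illustration:

```csharp
using System;
using System.Globalization;

// Intermediate data structure: everything the second phase needs from the first.
public class ParsedOrder
{
    public string Product { get; set; }
    public int Quantity { get; set; }
    public decimal UnitPrice { get; set; }
}

public static class OrderPricing
{
    // After the refactoring this method only coordinates the two phases.
    public static decimal PriceOrder(string orderLine)
    {
        ParsedOrder order = ParseOrder(orderLine);
        return CalculateTotal(order);
    }

    // Phase 1: turn the raw input into the intermediate data structure.
    public static ParsedOrder ParseOrder(string orderLine)
    {
        string[] parts = orderLine.Split(';');
        return new ParsedOrder
        {
            Product = parts[0].Trim(),
            Quantity = int.Parse(parts[1], CultureInfo.InvariantCulture),
            UnitPrice = decimal.Parse(parts[2], CultureInfo.InvariantCulture)
        };
    }

    // Phase 2: works only on the intermediate data structure.
    public static decimal CalculateTotal(ParsedOrder order)
    {
        return order.Quantity * order.UnitPrice;
    }
}
```

A call such as OrderPricing.PriceOrder("Widget;3;2.50") returns 7.50. The second phase no longer knows anything about the input format, so a bug in the parsing can be fixed without touching the calculation.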
A small but realistic example
This little birthday calendar creator uses the Adventure Works database to get a list of all employees and sorts them by the month and day of their birthday:
    public string Create()
    {
        var connection = new SqlConnection(_connectionString);

        // Load data
        var sql = @"SELECT Title, FirstName, MiddleName, LastName, e.BirthDate
                    FROM Person.Person p
                    INNER JOIN HumanResources.Employee e
                        ON e.BusinessEntityID = p.BusinessEntityID
                    ORDER BY MONTH(e.BirthDate), DAY(e.BirthDate)";
        var rows = connection.Query(sql);

        // Process data
        var report = new StringBuilder();
        foreach (var row in rows)
        {
            // Cleanup
            DateTime birthday = ((DateTime)row.BirthDate).Date;
            string middleName = (string)row.MiddleName;
            if (!string.IsNullOrEmpty(middleName) && middleName.Length == 1)
            {
                middleName = $"{middleName}.";
            }
            string title = (string)row.Title;
            if (string.IsNullOrEmpty(title))
            {
                title = " ";
            }
            else
            {
                title = $"({title}) ";
            }

            // Print report
            report.Append($"{title} {row.FirstName} {middleName} ");
            report.Append($"{row.LastName}: {birthday.ToShortDateString()}\n");
        }
        return report.ToString();
    }
There is a phase to load the employees, a clean-up phase and the report creation that gives us this result:
The last phase in our example is the report creation. We extract the two lines that print the report into their own method and introduce the intermediate data structure (the class ReportEmployee):
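The following is a minimal sketch of how the code could look after this step; it is an illustration under assumptions, not the listing from the repository. To keep it self-contained, the first and last name are passed as plain strings instead of the dynamic row, and the wrapper class BirthdayCalendarStep1 is a hypothetical name:

```csharp
using System;
using System.Text;

// Intermediate data structure; it starts out empty and is filled step by step.
public class ReportEmployee
{
}

public static class BirthdayCalendarStep1
{
    // Extracted report phase. The clean-up results are still passed as
    // individual parameters; reportEmployee is not used yet.
    public static void PrintReportLine(
        StringBuilder report,
        string firstName,   // stand-in for row.FirstName (assumption)
        string lastName,    // stand-in for row.LastName (assumption)
        DateTime birthday,
        string middleName,
        string title,
        ReportEmployee reportEmployee)
    {
        report.Append($"{title} {firstName} {middleName} ");
        report.Append($"{lastName}: {birthday.ToShortDateString()}\n");
    }
}
```

Because the new class carries no data yet, the extracted method behaves exactly like the two lines it replaced, which is what makes this step safe.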
It is a small start, but we need to start somewhere.
Move parameters
Next we move the parameters birthday, middleName and title to the ReportEmployee class (they are all created in the clean-up phase). For each parameter we add a property to the ReportEmployee class, assign the value in the clean-up phase and use it in the report phase. As a final change we remove the parameter from the method signature and run the tests before we move on to the next one.
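A minimal sketch of the state after all three parameters have been moved could look like this. As before, firstName and lastName are plain-string stand-ins for the dynamic row, and BirthdayCalendarStep2 is a hypothetical wrapper name:

```csharp
using System;
using System.Text;

// The intermediate data structure now carries the results of the clean-up phase.
public class ReportEmployee
{
    public string Title { get; set; }
    public string MiddleName { get; set; }
    public DateTime Birthday { get; set; }
}

public static class BirthdayCalendarStep2
{
    // birthday, middleName and title are gone from the signature;
    // the report phase reads them from reportEmployee instead.
    public static void PrintReportLine(
        StringBuilder report,
        string firstName,   // stand-in for row.FirstName (assumption)
        string lastName,    // stand-in for row.LastName (assumption)
        ReportEmployee reportEmployee)
    {
        report.Append($"{reportEmployee.Title} {firstName} {reportEmployee.MiddleName} ");
        report.Append($"{lastName}: {reportEmployee.Birthday.ToShortDateString()}\n");
    }
}
```

In the clean-up phase the former local variables turn into assignments such as reportEmployee.Title = $"({title}) ", so the loop body shrinks with every parameter that moves.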
We then repeat the process for the remaining phases. Using Extract Method on the first phase, the loading of the employees, gives us:
    public string Create()
    {
        var rows = LoadEmployees();
        // ...
    }

    private IEnumerable<dynamic> LoadEmployees()
    {
        var connection = new SqlConnection(_connectionString);

        // Load data
        var sql = @"SELECT Title, FirstName, MiddleName, LastName, e.BirthDate
                    FROM Person.Person p
                    INNER JOIN HumanResources.Employee e
                        ON e.BusinessEntityID = p.BusinessEntityID
                    ORDER BY MONTH(e.BirthDate), DAY(e.BirthDate)";
        var rows = connection.Query(sql);
        return rows;
    }
The result
Our Create() method is now down to this algorithm with all the concrete work extracted to the phase-specific methods:
    public string Create()
    {
        var rows = LoadEmployees();

        var report = new StringBuilder();
        foreach (var row in rows)
        {
            ReportEmployee reportEmployee = Cleanup(row);
            PrintReportLine(reportEmployee, report);
        }
        return report.ToString();
    }

    private IEnumerable<dynamic> LoadEmployees()
    {
        var connection = new SqlConnection(_connectionString);

        // Load data
        var sql = @"SELECT Title, FirstName, MiddleName, LastName, e.BirthDate
                    FROM Person.Person p
                    INNER JOIN HumanResources.Employee e
                        ON e.BusinessEntityID = p.BusinessEntityID
                    ORDER BY MONTH(e.BirthDate), DAY(e.BirthDate)";
        var rows = connection.Query(sql);
        return rows;
    }

    private ReportEmployee Cleanup(dynamic row)
    {
        var reportEmployee = new ReportEmployee();
        reportEmployee.FirstName = row.FirstName;
        reportEmployee.LastName = row.LastName;
        reportEmployee.Birthday = ((DateTime)row.BirthDate).Date;

        string middleName = (string)row.MiddleName;
        if (!string.IsNullOrEmpty(middleName) && middleName.Length == 1)
        {
            middleName = $"{middleName}.";
        }
        // Assign outside the if so that longer middle names are kept,
        // exactly as in the original method
        reportEmployee.MiddleName = middleName ?? string.Empty;

        string title = (string)row.Title;
        if (string.IsNullOrEmpty(title))
        {
            reportEmployee.Title = " ";
        }
        else
        {
            reportEmployee.Title = $"({title}) ";
        }
        return reportEmployee;
    }

    private static void PrintReportLine(ReportEmployee reportEmployee, StringBuilder report)
    {
        report.Append($"{reportEmployee.Title} {reportEmployee.FirstName} {reportEmployee.MiddleName} ");
        report.Append($"{reportEmployee.LastName}: {reportEmployee.Birthday.ToShortDateString()}\n");
    }

    public class ReportEmployee
    {
        public string Title { get; set; }
        public string MiddleName { get; set; }
        public DateTime Birthday { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }

        public ReportEmployee()
        {
            MiddleName = string.Empty;
        }
    }
We can now write tests for the extracted methods and address the code smell with the dynamic object for the row parameter. With everything clearly separated, we could move the clean-up behaviour to the ReportEmployee class and turn it into a full object with data and behaviour. The better we understand the code, the more options we have to clean it up.
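One way this next clean-up could look is sketched below, as an assumption about a possible direction rather than the code from the repository. EmployeeRow is a hypothetical typed replacement for the dynamic row, and FromRow is a hypothetical factory method that moves the clean-up behaviour into ReportEmployee:

```csharp
using System;

// Hypothetical typed replacement for the dynamic row returned by the query.
public class EmployeeRow
{
    public string Title { get; set; }
    public string FirstName { get; set; }
    public string MiddleName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }
}

public class ReportEmployee
{
    public string Title { get; private set; }
    public string FirstName { get; private set; }
    public string MiddleName { get; private set; }
    public string LastName { get; private set; }
    public DateTime Birthday { get; private set; }

    // The clean-up phase as behaviour of the class (hypothetical factory method).
    public static ReportEmployee FromRow(EmployeeRow row)
    {
        string middleName = row.MiddleName ?? string.Empty;
        if (middleName.Length == 1)
        {
            // A single initial gets a trailing dot
            middleName = $"{middleName}.";
        }

        return new ReportEmployee
        {
            Title = string.IsNullOrEmpty(row.Title) ? " " : $"({row.Title}) ",
            FirstName = row.FirstName,
            MiddleName = middleName,
            LastName = row.LastName,
            Birthday = row.BirthDate.Date
        };
    }
}
```

With a typed row, the clean-up rules can be checked without a database: a row with the middle name "J" and no title should produce "J." and a single blank. If the Query call in the example comes from Dapper, its generic Query<T> overload could fill such a class directly, but that is an assumption about the project setup.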
If you want to try it on your own, you can find the code on GitHub.
A graphical representation of the structure before and after the split phase refactoring can look like this:
Conclusion
The split phase refactoring offers us a way to improve the readability of our code. The tiny steps may look boring, but that's where the safety of this refactoring lies. This safety aspect is of great importance when we need to add automated tests to existing code that everyone considers "untestable". Try it with a method that needs clarification in your project and let me know how it went.