I had an interesting problem to solve: How long did a task take to complete? The challenge was that our task history table contains one row for the start and another one for the end of a task:
Id | TaskId | Action | Timestamp |
---|---|---|---|
1 | 100 | Start | 2022-08-01 08:00:00.000 |
2 | 101 | Start | 2022-08-02 07:30:00.000 |
3 | 100 | Stop | 2022-08-03 17:00:00.000 |
4 | 102 | Start | 2022-08-04 08:00:00.000 |
5 | 102 | Stop | 2022-08-04 17:00:00.000 |
6 | 101 | Stop | 2022-08-05 18:00:00.000 |
I needed to combine the two matching rows to create an output like this:
TaskId | Duration |
---|---|
100 | 3 |
101 | 4 |
102 | 1 |
OUTER APPLY
In SQL Server we can use the OUTER APPLY operator to combine a table with itself. It is basically a LEFT OUTER JOIN, but with a bit more functionality.
This statement takes the start times of the tasks and combines them with the end times of the same task, calculates the difference in days and adds 1 to catch the partial days:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
SELECT [TaskId], CONVERT(DATE, taskStart.[Timestamp]) AS 'Start', CONVERT(DATE, taskEnd.[Timestamp]) AS 'End', DATEDIFF(day, taskStart.[Timestamp], taskEnd.[Timestamp]) + 1 as 'Duration' FROM #TaskDuration taskStart OUTER APPLY ( SELECT TOP(1) taskEnd.[Timestamp] FROM #TaskDuration taskEnd WHERE taskEnd.[Timestamp] > taskStart.[Timestamp] AND taskEnd.[Action] = 'Stop' AND taskEnd.TaskId = taskStart.TaskId ORDER BY [Timestamp] ) taskEnd WHERE taskStart.Action = 'Start' |
When I run this query against the example data from above, it produces the result I need:
TaskId | Start | End | Duration |
---|---|---|---|
100 | 2022-08-01 | 2022-08-03 | 3 |
101 | 2022-08-02 | 2022-08-05 | 4 |
102 | 2022-08-04 | 2022-08-04 | 1 |
This query works even when a task needs multiple runs, because the ORDER BY [Timestamp] gives us the end times in ascending order (and so the SELECT TOP(1) finds the closest end time to the specific start time of the task).
Conclusion
SQL Server saved me a lot of programming time with its OUTER APPLY function. I got the result I needed in a few minutes and the query runs so fast that I can run it whenever I need an update. There are other solutions, but I liked this one and did not bother to try something else.