Fixing Unicode Characters When Using DbUp

DbUp is a great tool to manage your database migrations. As I wrote here, it’s simple and easy to use. However, when it comes to edge cases like German umlauts, that simplicity can have its downside. With neither a flag to change the encoding nor a way to interfere with the execution, it seems as if DbUp isn’t up to the task.

But don’t jump to that conclusion too early. It’s wrong: the source of the error is SQL Server, not DbUp.

 

The Problem

Suppose we have two scripts: one that creates a table with NVARCHAR fields and another that adds some test data:

DbUp_umlauts_CREATE

DbUp umlauts script

When DbUp executes those scripts we end up with this mess:

DbUp umlauts in UTF8
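The kind of garbling in that screenshot can be reproduced outside the database. The following is a minimal sketch, assuming the script file is stored as UTF-8 without a BOM and its bytes are interpreted as Windows-1252 somewhere along the way; the function name and the sample words are made up for illustration and are not taken from the original scripts.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as Windows-1252.

    Assumption: this mimics a reader that ignores the real file encoding.
    """
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    # Hypothetical sample values, not the data from the article's scripts.
    for word in ("Müller", "Käse", "schön"):
        print(word, "->", misread_utf8_as_cp1252(word))
```

Each umlaut takes two bytes in UTF-8, so every one of them turns into two unrelated characters when the bytes are read with a single-byte code page.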

When the INSERT script is executed directly in SQL Server Management Studio, everything works as expected – with or without the letter N in front of the string:

DbUp_umlauts_ManagementStudio

(The letter N may matter in older versions of SQL Server. In 2008 and 2014, this often-given advice did not fix the problem for me.)

 

A Solution

My first guess was that DbUp is the source of the problem. However, a quick search on Google shows many similar errors in scenarios where DbUp isn’t involved. It is therefore safe to say that DbUp is innocent and only the unfortunate messenger.

The source of the problem is the handling of Unicode characters in SQL Server. While most systems use UTF-8 to encode special characters (from German umlauts to Chinese symbols), SQL Server uses UCS-2.
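To make the difference between the two encodings concrete, here is a small sketch that prints the bytes of a single character in each. It uses Python's `utf-16-be` codec as a stand-in for UCS-2, which is an assumption that holds for characters in the Basic Multilingual Plane such as umlauts; the helper name is invented for this example.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")


if __name__ == "__main__":
    for char in ("a", "ü", "中"):
        print(char, "| UTF-8:", hex_bytes(char, "utf-8"),
              "| UTF-16 BE:", hex_bytes(char, "utf-16-be"))
```

The same character produces different byte sequences of different lengths, so a reader that guesses the wrong encoding cannot recover the original text.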

DbUp umlauts in Notepad++

When I open my script in Notepad++, it shows that the encoding is UTF-8. After changing the encoding to UCS-2 BE BOM, saving the file and running it again with DbUp, the umlauts arrive correctly in the database:

DbUp_umlauts_fixed
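With many script files, doing this conversion by hand in Notepad++ gets tedious. The following is a rough sketch of the same step as a script, assuming the files are currently UTF-8 (with or without a BOM) and that Notepad++'s "UCS-2 BE BOM" corresponds to UTF-16 big-endian with a leading byte order mark. The function names and the file name in the usage comment are hypothetical.

```python
import codecs
from pathlib import Path


def utf8_to_utf16_be_bom(data: bytes) -> bytes:
    """Convert UTF-8 bytes (optional BOM) to UTF-16 BE bytes with a BOM."""
    text = data.decode("utf-8-sig")  # strips a UTF-8 BOM if one is present
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def convert_file(path: Path) -> None:
    """Re-encode one script file in place; line endings stay untouched."""
    path.write_bytes(utf8_to_utf16_be_bom(path.read_bytes()))


# Usage (hypothetical file name):
# convert_file(Path("Script0002_InsertTestData.sql"))
```

Working on raw bytes rather than in text mode avoids any newline translation, so the only thing that changes in the file is its encoding.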

 

Another one

The UCS-2 story is nice and works, but is it really the only way? To try it out I changed the encoding to UTF-8 BOM and it worked as well. For once the byte order mark (BOM) helped and didn’t cause more problems. However, since I have already spent hours fixing BOM-related problems, I go with the answers to this question on StackOverflow and don’t use UTF-8 with BOM. Until I find another solution I will encode the files as UCS-2.
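Since both working variants carry a byte order mark, it can help to check which script files already have one before running a migration. Below is a minimal sketch that inspects the first bytes of a file; it only looks for the UTF-8 and UTF-16 marks and makes no attempt to detect anything else, and the names used are invented for this example.

```python
import codecs
from pathlib import Path
from typing import Optional

# Only these three byte order marks are checked; this is not a full detector.
KNOWN_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def detect_bom(data: bytes) -> Optional[str]:
    """Return a label for the BOM at the start of data, or None."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return None


def report(folder: Path) -> None:
    """Print the detected BOM (or 'no BOM') for each .sql file in folder."""
    for script in sorted(folder.glob("*.sql")):
        print(script.name, "->", detect_bom(script.read_bytes()) or "no BOM")
```

A file reported as "no BOM" is a candidate for the garbled output described above, because nothing in the file announces its encoding.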

 

Conclusion

It’s really not a DbUp problem and with the right encoding even SQL Server gets the umlauts right…
