Python Friday #130: Different File Encodings Between Windows and Linux

While Python works great on Linux, Windows and Mac, there are tiny differences that may have big impacts on your code. I found this out the annoying way with a crashing application.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.

 

A right arrow in Unicode

While checking the links on my blog, my application found a right arrow in a link text:

An arrow pointing towards right

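We can create this right arrow in Python from its Unicode code point, U+2192, which is written as '\u2192' in a string literal.

According to Python, the right arrow is a printable Unicode character: str.isprintable() returns True for it.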
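As a small sketch of both steps (the variable name `arrow` is an assumption of mine, and the escape `\u2192` is taken from the code point that shows up in the traceback further down):

```python
import unicodedata

# U+2192 is the code point of the right arrow.
arrow = "\u2192"

# chr() builds the same character from the numeric code point.
same_arrow = chr(0x2192)

# ascii() prints the escaped form, so this line is safe on any console.
print(ascii(arrow))             # '\u2192'
print(arrow == same_arrow)      # True
print(unicodedata.name(arrow))  # RIGHTWARDS ARROW
print(arrow.isprintable())      # True
```

Printing the escaped form with `ascii()` avoids a second encoding problem on consoles that cannot display the arrow.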

So far, so good.

 

Encoding error when written to a file

I only noticed this sign because I received an exception when writing the right arrow to a file:

Traceback (most recent call last):
  File "", line 2, in
  File "C:\**\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 0: character maps to <undefined>
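The failure can be reproduced on any platform by naming the code page explicitly. This is only a sketch and not the original application code: the helper `try_write` and the file name `arrow_cp1252.txt` are made up, and `cp1252` is assumed because it is the codec named in the traceback.

```python
import os
import tempfile

arrow = "\u2192"


def try_write(text, encoding):
    """Write text to a temporary file; return the error message or None."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "arrow_cp1252.txt")  # hypothetical name
        try:
            with open(path, "w", encoding=encoding) as f:
                f.write(text)
        except UnicodeEncodeError as error:
            return str(error)
    return None


# cp1252 has no slot for U+2192, so this returns an error message.
print(try_write(arrow, "cp1252"))

# UTF-8 can encode every Unicode character, so this returns None.
print(try_write(arrow, "utf-8"))
```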

The exception means that Python cannot encode the right arrow: the character does not exist in cp1252, the encoding Python picked for the file. But should this not be UTF-8? When we set the encoding explicitly to UTF-8, we can write the file without an exception.
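A minimal sketch of that fix, assuming a throwaway file in a temporary folder (the name `arrow_utf8.txt` is hypothetical). The only relevant change is the `encoding` argument of `open()`:

```python
import os
import tempfile

arrow = "\u2192"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "arrow_utf8.txt")  # hypothetical file name

    # An explicit encoding makes the result independent of the platform.
    with open(path, "w", encoding="utf-8") as f:
        f.write(arrow)

    # Read the raw bytes back to see what ended up on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)  # b'\xe2\x86\x92'
```

The three bytes are the UTF-8 representation of U+2192.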

Strange, what exactly does Windows do with the file encodings?

 

Difference between Linux and Windows

To check what is happening in the background, we can use this code:
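A minimal sketch of such a check could look like this. The function name `inspect_default_encoding` and the sample text are assumptions of mine; only the file name encoding_test.txt comes from this post. The file is opened without an `encoding` argument on purpose, so Python has to fall back to its default:

```python
import locale


def inspect_default_encoding(path="encoding_test.txt"):
    """Write a file without an explicit encoding and report what Python used."""
    # ASCII-only text, so the write succeeds with any default encoding.
    with open(path, "w") as f:
        f.write("Hello encoding test\n")
        used = f.encoding
    # The locale's preferred encoding is where the default comes from.
    return used, locale.getpreferredencoding(False)


if __name__ == "__main__":
    used, preferred = inspect_default_encoding()
    print("encoding used by open():", used)
    print("preferred encoding of the locale:", preferred)
```

The printed names depend on the machine and on how Python was started, so I would not rely on a specific value here.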

When we run this code on Windows, Python creates a file encoding_test.txt with an ANSI character set:

Our file uses ANSI encoding on Windows

But if we run the same code on Linux or in WSL, Python uses UTF-8 to encode the content of the file:

Our file uses UTF-8 encoding on Linux

That difference is the problem. ANSI, which on my machine is the code page cp1252 named in the traceback, does not contain the right arrow, so our Python application crashes on Windows when we try to write a character outside that code page.
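To see which characters are affected, we can ask the codec directly. This is a sketch with a hypothetical helper `can_encode`; it assumes cp1252 as the ANSI code page, which may differ on other Windows installations:

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# e, e with acute accent, euro sign, right arrow
samples = ("e", "\u00e9", "\u20ac", "\u2192")

for sample in samples:
    # ascii() keeps the output printable on every console.
    print(ascii(sample), can_encode(sample, "cp1252"), can_encode(sample, "utf-8"))
```

Only the right arrow fails for cp1252, while UTF-8 accepts all four samples. That explains why the problem can stay hidden for a long time: most Western European text fits into the code page.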

 

Conclusion

Be aware that Python on Windows does not use UTF-8 for your files by default. If your code should run on Windows, set the encoding explicitly to UTF-8 whenever you open a file, for reading as well as for writing. That way the result no longer depends on the default of the platform.
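One way to follow this advice is to route all file access through two small helpers, so the encoding is written down in a single place. The names `save_text`, `load_text` and `links.txt` are hypothetical; this is a sketch, not the code of my application:

```python
import tempfile
from pathlib import Path

ENCODING = "utf-8"  # the one place where the encoding is decided


def save_text(path, text):
    """Write text to path, always as UTF-8."""
    Path(path).write_text(text, encoding=ENCODING)


def load_text(path):
    """Read text from path, always as UTF-8."""
    return Path(path).read_text(encoding=ENCODING)


with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "links.txt"  # hypothetical file name
    save_text(target, "next page \u2192")
    restored = load_text(target)

print(restored == "next page \u2192")  # True
```

Reading needs the same care as writing: a UTF-8 file that is read with the ANSI default may not raise an exception at all, but return garbled characters instead.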
