Why Python 3 doesn’t write the Unicode BOM

I’ve been using Python scripts to automatically edit and output Windows Resource files (*.rc) for C++ projects in Visual Studio 2013. When handling Unicode, Windows and Visual Studio expect little-endian UTF-16 encoding, and the resource file should always start with the Unicode BOM (Byte Order Mark). However, despite what the documentation seemed to promise, I was finding that Python wasn’t writing the BOM automatically.

I nearly resorted to outputting it manually, but as is often the case with Python, it turns out the correct approach is simpler than it seems.
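As a rough illustration of the codec behaviour involved, here is a minimal sketch. It isn't necessarily the fix described in the full post. Python's documented codecs treat the BOM differently: `utf-16-le` fixes the byte order explicitly and writes no BOM, while `utf-16` prepends a BOM and uses the machine's native byte order, which is little-endian on the x86/x64 machines Windows normally runs on. The helper `write_and_read_back` and the sample text are hypothetical, made up for this example.

```python
import codecs
import os
import tempfile


def write_and_read_back(text: str, encoding: str) -> bytes:
    """Hypothetical helper: write text with the given encoding, return the raw file bytes."""
    fd, path = tempfile.mkstemp(suffix=".rc")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


sample = "// résumé\n"

# 'utf-16-le' names the byte order explicitly, so no BOM is written.
le_bytes = write_and_read_back(sample, "utf-16-le")
print(le_bytes[:2] == codecs.BOM_UTF16_LE)  # False: the file starts with '/' (b'/\x00')

# 'utf-16' prepends a BOM in the native byte order (little-endian on x86/x64).
utf16_bytes = write_and_read_back(sample, "utf-16")
print(utf16_bytes.startswith(codecs.BOM_UTF16))  # True
```

On a big-endian machine, `utf-16` would produce big-endian output. So if little-endian output must be guaranteed regardless of platform, the fallback is `utf-16-le` plus a manually written `"\ufeff"` at the start of the file.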

Output file encoding in Python 3

Unicode is becoming very widespread now (for good reason), and one of the great benefits of Python 3.x is that it handles Unicode natively. There are several ways to encode Unicode text, though (UTF-8, UTF-16 and so on), so how do you set the file encoding in Python when you’re writing out to a file?

It’s not immediately obvious from some of the official documentation, but thankfully it’s not complicated. I’ll give a quick overview in this post.
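As a quick sketch, here is the `encoding` argument to Python 3's built-in `open()`, which controls how a `str` is turned into bytes on write. The helper `encoded_size` and the sample string are hypothetical, for illustration only. The same ten-character string takes a different number of bytes depending on the encoding chosen, and `utf-8-sig` and `utf-16` each add a BOM. If `encoding` is omitted, Python falls back to a platform-dependent default taken from the locale settings, which is why specifying it explicitly is usually the safer choice.

```python
import os
import tempfile

text = "naïve café"  # 10 characters; 'ï' and 'é' are outside ASCII


def encoded_size(encoding: str) -> int:
    """Hypothetical helper: write text with the given encoding and return the file size in bytes."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # The encoding argument decides how the str is converted to bytes.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        return os.path.getsize(path)
    finally:
        os.remove(path)


for enc in ("utf-8", "utf-8-sig", "utf-16-le", "utf-16"):
    print(enc, encoded_size(enc))
# utf-8 12       (two non-ASCII characters take 2 bytes each)
# utf-8-sig 15   (UTF-8 plus a 3-byte BOM)
# utf-16-le 20   (2 bytes per character, no BOM)
# utf-16 22      (2-byte BOM plus 2 bytes per character)
```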