White Shining Rock Logo

Porting: UTF-8

January 3, 2015 | Dukus | 30 Comments

With the mod kit and Steam Workshop out there, I've been working on porting the game to OSX and Linux, cleaning up code, and writing some new code. I'll still be fixing bugs and making small changes, but my focus now is going to be on ports and on building prototypes for new games.

I've built a new machine for developing on Linux, and bought some Macs, so I'm all set with hardware. It's been a while since I used makefiles, other IDEs, gcc/clang, or did any sort of *nix development, but I have done it, so I'm not starting from ground zero.

But before I can actually go about working on the new hardware and compiling things, there are a few issues in the Banished code base that need fixing. I planned on porting the code base one day, so things are nicely set up into common code and platform specific code, but there are still some issues I didn't properly account for.

There are code portability issues. I want to make the code as portable as possible - there's a chance I'll be making games for more than just Windows, Mac, and Linux one day, so I might as well try to fix them now.

The first issue is with text.

When I wrote the code I assumed one day it would support more than just English. Back when I made console games I never worked on any of the text code, so all I really knew is that the system used two bytes per character in a string of text, and this was fine for all the languages that the game was translated into - generally EFIGS and maybe Japanese. Since the Windows API takes wchar_t for all filenames and text, that's what I used, and I happily and naively coded away using wide strings. This probably would be okay if the game stayed on Windows.

But it's sort of a mistake for cross-platform code. I didn't even use it correctly. The released game doesn't currently use UTF-16, it's just UCS-2, so any character outside the Basic Multilingual Plane (above U+FFFF) is unrepresentable, and some languages have such characters.
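To make the UCS-2 versus UTF-16 difference concrete, here is a minimal sketch (not code from the game) of encoding one code point as UTF-16. The function name EncodeUtf16 is hypothetical. UCS-2 only covers the first branch; anything that needs the second branch has no UCS-2 representation.

```cpp
#include <cstdint>

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// Returns the number of 16-bit units written (1 or 2), or 0 if the
// code point is not encodable.
int EncodeUtf16(uint32_t codePoint, uint16_t out[2])
{
    if (codePoint < 0x10000)
    {
        // The surrogate range is reserved and is not a valid code point.
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
            return 0;
        out[0] = static_cast<uint16_t>(codePoint);  // this is all UCS-2 can do
        return 1;
    }
    if (codePoint > 0x10FFFF)
        return 0;

    // Above U+FFFF: split the remaining 20 bits into a surrogate pair.
    codePoint -= 0x10000;
    out[0] = static_cast<uint16_t>(0xD800 + (codePoint >> 10));    // high surrogate
    out[1] = static_cast<uint16_t>(0xDC00 + (codePoint & 0x3FF));  // low surrogate
    return 2;
}
```

The Euro sign (U+20AC) fits in one unit, while something like U+1F600 needs two, which is exactly the case a two-bytes-per-character assumption gets wrong.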

Not only that, but API calls on other systems generally don't take wchar_t*, they take char* instead. Certainly I could make conversion functions and convert UCS-2 to UTF-8 as needed, but that's not really ideal.

The final problem is that the size of wchar_t on Windows is 2 bytes, but this isn't guaranteed to be the case on other platforms. The size issue shouldn't really matter - but it's possible somewhere I multiplied a string length by 2, instead of multiplying by sizeof(wchar_t). That would cause problems.
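As an illustration of that size bug, here is a small sketch with a hypothetical helper name (WideStringBytes). The commented-out line is the kind of code that works on Windows and silently breaks where wchar_t is 4 bytes.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: bytes needed to store a wide string,
// not counting the null terminator.
std::size_t WideStringBytes(const wchar_t* text)
{
    // Non-portable version, correct only where wchar_t is 2 bytes:
    //   return std::wcslen(text) * 2;
    return std::wcslen(text) * sizeof(wchar_t);
}
```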

I've known about these issues for a while, but I generally code something that works first, and don't refactor until I have to. And now I have to.

So my big fix recently was to remove the use of wchar_t and use char instead. (And make sure there are no multiplies by 2.) Not only that but all strings need to use UTF8 properly - when printing text, or reading text from resource files, the right thing needs to be done to properly decode and show the right character.

The first step of doing this is easy. You can find and replace a bunch of stuff using a text editor.

  • wchar_t becomes char
  • L"string" becomes "string"
  • wsprintf(buffer, "%ls", param) becomes sprintf(buffer, "%s", param)
  • wcscpy, wcslen, wcscat become strcpy, strlen, strcat

Next I fixed my String class to properly use char. No big deal.

All the serialization code that was previously written to serialize wchar_t became serialization of char. This conflicted with serialization of single byte signed values, which were already typed as char.

For example

void Serialize(char v);             // serialize a signed 8 bit value
void Serialize(unsigned char v);    // serialize an unsigned 8 bit value
void Serialize(wchar_t v);          // serialize a character

Became

void Serialize(signed char v);     // this looks ambiguous to me, because i think 
void Serialize(unsigned char v);   // of 'char' as signed, even though it's
void Serialize(char v);            // compiler dependent and compiles ok...

This prompted me to redeclare all integers with typedefs and make a distinction between a character and an 8-bit integer.

  • signed int became int32
  • unsigned int became uint32
  • signed short became int16
  • unsigned short became uint16
  • signed char became int8
  • unsigned char became uint8
  • char stays the same, and is only used for characters and strings.

int64 and uint64 were already typedef'd. These typedefs are made in platform code, so per platform types can be declared using the right sizes, regardless of their names on each platform.
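A platform header along these lines might look like the sketch below. The typedef names come from the list above, but the underlying types are an assumption that holds for typical desktop compilers, and the static_assert checks (C++11) are an addition, not something the post describes.

```cpp
// Hypothetical per-platform header. The underlying types are assumptions
// for a typical Windows/Mac/Linux desktop compiler; another platform
// would supply its own version of this file.
typedef signed char        int8;
typedef unsigned char      uint8;
typedef signed short       int16;
typedef unsigned short     uint16;
typedef signed int         int32;
typedef unsigned int       uint32;
typedef signed long long   int64;
typedef unsigned long long uint64;

// Fail the build if a platform's types don't have the expected sizes.
static_assert(sizeof(int8)  == 1 && sizeof(uint8)  == 1, "8 bit types must be 1 byte");
static_assert(sizeof(int16) == 2 && sizeof(uint16) == 2, "16 bit types must be 2 bytes");
static_assert(sizeof(int32) == 4 && sizeof(uint32) == 4, "32 bit types must be 4 bytes");
static_assert(sizeof(int64) == 8 && sizeof(uint64) == 8, "64 bit types must be 8 bytes");
```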

So the overloads then became

void Serialize(int8 v);     // serialize a signed 8 bit value
void Serialize(uint8 v);    // serialize an unsigned 8 bit value
void Serialize(char v);     // serialize a character

This took care of the type conflict since the compiler treats 'signed char' as a different type than 'char'. It also clarifies what the types are used for in the code. If I see a char in code, it means character, or char* means null terminated string. If you see int8 or uint8, it's a number stored as 8 bits. This makes things a bit more clear.
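The fact that char, signed char, and unsigned char are three distinct types can be checked directly. In this sketch the Serialize overloads just return a tag (a made-up Kind enum) so the chosen overload is visible; the real functions would write data instead.

```cpp
#include <type_traits>

typedef signed char   int8;
typedef unsigned char uint8;

// Hypothetical tag, only here to show which overload was picked.
enum Kind { kSigned8, kUnsigned8, kCharacter };

Kind Serialize(int8)  { return kSigned8; }    // 8 bit signed number
Kind Serialize(uint8) { return kUnsigned8; }  // 8 bit unsigned number
Kind Serialize(char)  { return kCharacter; }  // a character

// char is never the same type as signed char or unsigned char,
// whether or not plain char is signed on a given compiler.
static_assert(!std::is_same<char, signed char>::value,   "char and signed char are distinct");
static_assert(!std::is_same<char, unsigned char>::value, "char and unsigned char are distinct");
```

Calling Serialize('a') picks the char overload, while Serialize(int8(-1)) picks the int8 one, so the three overloads never collide.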

This was also a big find and replace, except for char, which I had to go through and determine if they were int8, or actually ASCII characters. This wasn't too hard though, as most previous uses of text used wchar_t - so most chars became int8. I'll probably take a few weeks getting used to typing int32 instead of just int.

One issue that came up with the type name change was the existing source data. If you've been modding the game, in the data you might see

int _value = 400;

with the new code, it would become

int32 _value = 400;

While I could version all the text data, I don't really want to break people's mods that they've already set up and make them do the find and replace on their own, so there's some versioning code that reads the old typename. For the next game I'll get rid of that versioning code, but it's going to stay in Banished for now.
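One way such a compatibility layer could work is a small lookup from old type names to new ones. This is only a sketch: the function name UpgradeTypeName is hypothetical, and apart from int, which is the one case shown above, the table entries are assumed from the typedef list rather than taken from the game's data format.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical: maps a legacy type name found in mod data to its new
// name, and returns the input unchanged if it is already a new name.
const char* UpgradeTypeName(const char* name)
{
    struct Entry { const char* oldName; const char* newName; };
    static const Entry table[] = {
        { "int",            "int32"  },
        { "unsigned int",   "uint32" },  // assumed, by analogy with the typedef list
        { "short",          "int16"  },  // assumed
        { "unsigned short", "uint16" },  // assumed
    };

    for (std::size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i)
    {
        if (std::strcmp(name, table[i].oldName) == 0)
            return table[i].newName;
    }
    return name;
}
```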

Next came the hard part. Input data could be in any format - generally text files would be in ASCII if I create them, or UTF8 if I used a symbol like the Euro. But mod creators and translated strings could be in UTF8, UTF8 with a byte order mark, or UTF16 big-endian, or UTF16 little-endian. Really it shouldn't matter what a mod creator uses as a text format - I just want the game to load it and do the right thing.

While there are libraries for dealing with this, I tend to write my own code since I don't like different code styles mixed in my code, type conflicts, and dealing with crazy code licenses. So 400 or so lines of code later and lots of debugging I wrote two great functions. One detects character encoding using byte order marks and checking for valid UTF8. The other can convert any one text encoding to another. There are also support functions for decoding strings one character at a time.
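A stripped-down sketch of the detection half might look like this. It is not the game's code: the names Encoding, IsValidUtf8, and DetectEncoding are made up, and the validity check is structural only (it confirms lead and continuation bytes line up, but does not reject every overlong or surrogate form).

```cpp
#include <cstddef>

enum Encoding { kEncodingUtf8, kEncodingUtf8Bom, kEncodingUtf16LE, kEncodingUtf16BE, kEncodingUnknown };

// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes (10xxxxxx). Plain ASCII passes trivially.
bool IsValidUtf8(const unsigned char* data, std::size_t size)
{
    std::size_t i = 0;
    while (i < size)
    {
        const unsigned char lead = data[i];
        std::size_t extra = 0;
        if (lead < 0x80)                       extra = 0;  // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;  // 2 byte sequence
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;  // 3 byte sequence
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;  // 4 byte sequence
        else return false;                     // stray continuation or invalid lead

        if (size - i - 1 < extra)
            return false;                      // sequence runs off the end
        for (std::size_t k = 1; k <= extra; ++k)
        {
            if ((data[i + k] & 0xC0) != 0x80)
                return false;
        }
        i += extra + 1;
    }
    return true;
}

// Byte order marks are checked first, then the data is tested as UTF-8.
Encoding DetectEncoding(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return kEncodingUtf8Bom;
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return kEncodingUtf16BE;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return kEncodingUtf16LE;
    return IsValidUtf8(data, size) ? kEncodingUtf8 : kEncodingUnknown;
}
```

UTF16 files without a byte order mark fall through to the UTF8 check in this sketch and would need a heuristic of their own.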

After that it was a simple matter to convert all text files to UTF8 as they are opened by the game engine.

Once I had all UTF8 strings in memory, they have to be decoded when used as output - the font rendering code now decodes UTF8 into actual characters using 1 to 4 bytes of a string at a time before looking up each glyph in the font texture.
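The decode step can be sketched as a function that pulls one code point off the front of a string and advances the pointer. The name DecodeUtf8 and the choice to return U+FFFD on malformed input are assumptions for illustration, not necessarily what the game does.

```cpp
#include <cstdint>

// Hypothetical decoder: reads one code point (1 to 4 bytes) from a null
// terminated UTF-8 string and advances *text past it. Malformed input
// yields U+FFFD, the Unicode replacement character. The caller is
// expected to stop when **text is the null terminator.
uint32_t DecodeUtf8(const char** text)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(*text);
    uint32_t c = *p;
    int extra = 0;

    if (c < 0x80)                { extra = 0; }             // 0xxxxxxx
    else if ((c & 0xE0) == 0xC0) { extra = 1; c &= 0x1F; }  // 110xxxxx
    else if ((c & 0xF0) == 0xE0) { extra = 2; c &= 0x0F; }  // 1110xxxx
    else if ((c & 0xF8) == 0xF0) { extra = 3; c &= 0x07; }  // 11110xxx
    else
    {
        *text += 1;  // skip a stray continuation or invalid byte
        return 0xFFFD;
    }

    ++p;
    for (int i = 0; i < extra; ++i, ++p)
    {
        if ((*p & 0xC0) != 0x80)  // not 10xxxxxx (this also stops at the terminator)
        {
            *text = reinterpret_cast<const char*>(p);
            return 0xFFFD;
        }
        c = (c << 6) | (*p & 0x3F);
    }

    *text = reinterpret_cast<const char*>(p);
    return c;
}
```

A text renderer would call this in a loop, using each returned code point as the key for the glyph lookup.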

The Windows API can be compiled to use either wide strings or not, but I left them as wide and made a WideString class to deal with the conversion to and from the internal UTF8 String format. The WideString class only exists on Windows compilations, and is only used in files that would be rewritten per platform anyway.

After all that, the game compiled and ran just fine, but it wouldn't load old data or save games. This is bad. I can't go breaking save games when a new version comes out. And I'd like to keep all existing mods working.

So then I had to version old strings that are stored in saves and data on disk - strings are just written as an int32 length, followed by all the bytes of data. So new strings set the high bit on the length to mark it as the new version. I doubt that the game will have a string of text 4 gigabytes long so this will be okay. If an old string is detected, with the high bit unset, it converts it from UCS-2 to UTF-8 on load, and the game happily continues loading older data.
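The length trick and the load-time conversion can be sketched roughly as follows. All names here (kNewStringFlag, StringHeader, AppendUcs2AsUtf8 and so on) are hypothetical, the length is treated as unsigned to keep the bit operations simple, and std::string stands in for the game's own String class.

```cpp
#include <cstdint>
#include <string>

const uint32_t kNewStringFlag = 0x80000000u;  // high bit of the stored length

// Writer side: new (UTF-8) strings store their length with the high bit set.
uint32_t EncodeLength(uint32_t length)
{
    return length | kNewStringFlag;
}

struct StringHeader
{
    uint32_t length;  // length with the flag stripped
    bool     isUtf8;  // false means an old UCS-2 string
};

// Reader side: the flag tells old and new strings apart.
StringHeader DecodeLength(uint32_t stored)
{
    StringHeader header;
    header.isUtf8 = (stored & kNewStringFlag) != 0;
    header.length = stored & ~kNewStringFlag;
    return header;
}

// Converts one UCS-2 unit from an old string to 1, 2, or 3 bytes of UTF-8.
void AppendUcs2AsUtf8(uint16_t unit, std::string& out)
{
    if (unit < 0x80)
    {
        out += static_cast<char>(unit);
    }
    else if (unit < 0x800)
    {
        out += static_cast<char>(0xC0 | (unit >> 6));
        out += static_cast<char>(0x80 | (unit & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (unit >> 12));
        out += static_cast<char>(0x80 | ((unit >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (unit & 0x3F));
    }
}
```

Because UCS-2 has no surrogate pairs, a single unit never needs more than three UTF-8 bytes, so the four byte case does not come up when upgrading old data.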

Again this versioning code will hopefully go away when I make another game, since it won't be needed.

So now the only wchar_t that exists is in platform code on Windows that won't be compiled on OSX or Linux, and the game properly supports various character encodings.

Text encodings are one of those things I never really want to think about - and now that I've spent a while dealing with it, hopefully I never have to deal with it again. Phew.

30 comments on “Porting: UTF-8”

  1. Thanks for putting in the effort to support other platforms! I've already bought your game and play it via Wine, but it's still very much appreciated.

  2. Awesome news that you're getting started on the Linux port.
    It runs perfectly fine in Wine, but playing native games is just much less hassle.
    Hope for your sake that it won't be too hard.

    Thanks and happy coding!

  3. Just wondering, why are you typedef-ing all integer instead of using stdint.h? It will define correct size on all platform iirc.

  4. Hello could you add some benches and lightposts and other decorative items like that? The base game is very good indeed and i have had lots of fun with it but it feels a bit....basic, lacking variation. Citizens feel a bit lifeless as well, maybe some different animations of them doing something on their free time? Kind of different "scenarios" that can happen in different environments to breathe some life into the game. And a closer zoom with better looking details. It's not the Sims but i believe these "extra" things are the difference between a good game and a great one. I realize you're a one-man band with a lot on his plate. These are simple suggestions and im not expecting anything. You've already done a good job. I hope this is a proper place for this, i wrote in the heat of the moment.

  5. This was very interesting Luke, thanks for sharing!
    It reminds me of your blogs from when the game wasn't released (a year ago). Very interesting.
    I'm looking forward to what ideas you have on new games.

    Also something I wondered: iirc the engine you built for Banished supports lights (originally for the zombie game), although Banished doesn't use that feature. Would it be possible for modders to use lights?

    Good luck with everything!

  6. Ugh, I feel so sorry for you to have to deal with that stuff. I hated it when it came up in my university and mostly ignored it as well. Good luck with the bit pushing!

  7. This makes me so happy - I can't wait for the Steam Machines to come out (which run on Linux) and I was so sad to see that Banished was the only game in my library that didn't support it - until now! 😀

  8. Thanks for all your hard work Luke! Looking forward to Linux version. Playing on Windows box right now. Great game! 🙂

  9. A Linux port will be MUCH appreciated! I will be sure to make it shine in the linux gaming community

  10. Thanks for all your work on this! I couldn't wait to play Banished, so I followed a helpful guide to get the game to run via a Wineskin in OSX. It's playable, but not without its issues, or making me feel like my computer is going to explode with the fan noise. I'm very eager to play a native OSX build, and will happily re-purchase the game for your efforts.

  11. I appreciate your detailed post on the language coding for porting your game. It's interesting and gives insight to other developers.

    I would like to say that as someone who purchased 3 copies of your game for Windows that I'm disappointed that you're choosing to cash in on cross platform before there's more content in the game. Like others have said. It has a ton of potential and the base game is great but it's lackluster and bland after a few hours of gameplay. Relying on devs and players in the mod community and steam workshop to add new content to your game for you so you can make money on other platforms isn't ideal for the players. As someone who supported your early access game, i don't think i'll be purchasing any more from you based on your game release ethics and your idea on what a finished game is.

    In your opening sentence you said that you're done adding content to this game. Bug fixes, ports, and then on to different games. I'm not writing this to be hurtful, i'm not bitter. I just want you to know that not all of your customers are titillated by this news. Best of luck anyway.

  12. That's great news and I always like it when developers share a little about the developing process. I try to always show my support to everyone, who cares about linux. Unfortunately I bought the game already on release date. Now the distributors and the developer can't have my money and it's sad, because I have no way to show how much I appreciate this port. Good ports and great native linux games are close to my heart. For more linux games it's very important to show how much they are loved.

  13. @Ruzzik

    The game has always been presented as what it is. No promises have ever been made about extra content. In fact I seem to remember a dev post from quite some time ago stating once this is complete then it's finished.

    And quite the opposite of "cashing in" on other platforms the dev is fulfilling a promise to make it available on other platforms.

    The Mod features are a bonus that allows additions to what is already a complete game. You may find the vanilla bland, but it is what it is, it does what we were told it would do, no more, no less.

    I'm not really understanding the logic, may be I'm misunderstanding, but it seems you are complaining that a product that you bought (3 times apparently), that you were told would do "A", does "A", but you're annoyed it doesn't do "B"

  14. I discovered Banished watching videos and when I found out it has been made by only one person, I didn't believe it... I can't play Banished because I've got a new Mac and I didn't want to run Windows on it. That's why I appreciate a lot your investment in developing an OS X version. You've got all my support, my admiration and my wishes from France for 2014!!! Good luck!

  15. I just want to add my 2 cents as another mac user who eagerly awaits the day that Banished can be played on OSX. I used to bootcamp windows on my older macs, but now that you can't use your older windows license on the new machines, I've no interest in paying microsoft more money just to be able to run a few games. So, the sooner the better as far as porting to OSX!

  16. thanks for this great insight on your development process and porting the game to more platforms.
    i would like to know if you're planning a beta version for linux and osx clients?

  17. thanks for working on the OSX release. I previously purchased the PC game and played with a crosstie, however as far as I have been able to figure out this won't support mods. Not sure if GOG will allow me to download a mac version if I already purchased the PC version, but I'd happily buy it again if it was mac native w/ mod support!

  18. I will happily re-purchase the Linux version. Every day I'm amazed at the effort you put into Banished. It's really a work of genius.

  19. Very interesting read! I just bought another copy of the game for my fiancee not too long ago so that we both could play simultaneously and yell, "MORE BABIES!". I look forward to the Linux release so that I can sink another large set of hours into it!

  20. Looking forward to the Linux version. The game works under Wine, but for some reason it runs the CPU at 100%.

  21. Thanks for porting to linux!

    I already got the game at release and played through steam in-home streaming, but look forward to playing it natively for the changes since release.

© Copyright 2021 Shining Rock Software