This is my version.txt, created last night:
Thank you. Not having delved into it yet, I can already tell that it's almost assuredly an issue with one or more of the SWMH culture files' name lists. Now, the problem is fixing them properly, as the incorrectly-encoded characters in the file may very well not have ANSI equivalents. Also, this can come from corrupted character history (probably also somewhere in SWMH), since character history directly assigns the first name of a given historical character and thus could easily exhibit the same problem. These problems usually result from not using a text editor that's set up to encode only in ANSI; even then, it can happen when copy/pasting a Unicode character from, say, a browser window into the text editor. It only needs to happen once to set a bunch of things off.
Unfortunately, there's no clear, automatic way to detect when a name has improper characters in it, because ANSI and UTF-8 are ambiguous for a portion of their overlapping character sets (i.e., a script wouldn't be able to tell whether what was actually a Yen sign in ANSI but some Norwegian character in UTF-8 was an error; it's subjective as far as a dumb script is concerned). However, some ANSI character sequences are completely invalid in UTF-8, so those can be regarded as correct ANSI characters (since there's no other plausible encoding for them).
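To make that heuristic concrete, here's a rough Python sketch (the function name `suspect_lines` is my own invention, not part of any tool). It assumes the files are meant to be Windows-1252 and only flags lines whose non-ASCII bytes happen to form valid UTF-8; lines that fail UTF-8 decoding are presumed to be genuine ANSI. A flagged line is only a suspect, for the reason given above, so it still needs a human to look at it.

```python
def suspect_lines(data: bytes):
    """Return (line_number, utf8_text) for lines that look like pasted UTF-8.

    Assumption: the file is supposed to be ANSI (Windows-1252).
    """
    hits = []
    for number, line in enumerate(data.splitlines(), start=1):
        if line.isascii():
            continue  # pure ASCII is identical in both encodings
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            continue  # invalid as UTF-8, so presumably correct ANSI
        hits.append((number, text))  # valid UTF-8: possible infection
    return hits


if __name__ == "__main__":
    sample = (
        "Håkon".encode("cp1252") + b"\n"   # correct ANSI
        + "Håkon".encode("utf-8") + b"\n"  # pasted as UTF-8
        + b"Harald\n"                      # plain ASCII
    )
    for number, text in suspect_lines(sample):
        print(number, text)
```

Running it over each culture and history file (read in binary mode) would at least narrow down which files and lines to open by hand.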
We might need some manual help identifying the UTF-8 "infected" text files, and beyond that, it will require manual selection of an alternate character (that's ANSI-correct) for every broken UTF-8 instance that's found (and then re-encoding the file). Sometimes, this is as easy as opening the file in Notepad++, using its Encoding menu to convert the whole file to ANSI (Windows-1252), making sure the menu option "Encode in ANSI" is enabled, and then just saving the file again. Notepad++ can do a decent job of selecting an appropriate alternate character, because it has a big table of common conversions (as does the platform-independent tool iconv, which will convert text files between arbitrary character sets and encodings automatically).
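For anyone who'd rather script the re-encoding, here's a minimal Python sketch of the same idea. The helper name `to_ansi` and the fallback rule are my own assumptions, not how Notepad++ or iconv actually choose substitutes: characters that exist in Windows-1252 are kept, accented characters that don't are reduced to their base letter, and anything else becomes `?` so it's easy to find and replace by hand.

```python
import unicodedata


def to_ansi(text: str) -> bytes:
    """Encode text as Windows-1252, substituting characters it lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Hypothetical fallback: decompose the character and keep
            # only its ASCII base letter, dropping the accent marks.
            base = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore")
            out += base or b"?"  # '?' marks a spot needing a manual choice
    return bytes(out)


if __name__ == "__main__":
    print(to_ansi("Håkon"))          # å exists in Windows-1252
    print(to_ansi("Drago\u015f"))    # s-cedilla does not; falls back to s
    print(to_ansi("Boles\u0142aw"))  # l-stroke has no decomposition
```

The input would be the file's contents decoded as UTF-8; writing the returned bytes back in binary mode gives an ANSI file. Any `?` left in the output is exactly the kind of case that needs a manually selected alternate.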
So, as far as help goes, are you noticing this with historical characters (i.e., most non-baron startup characters) or ones that were born / randomly-generated after the game started? The distinction narrows things down between either character history or the culture name lists (though it may be both).