I've created a .mod version of Clear Combat in this post. Since it only shares a modified version of one file with what's in the HIP folder structure, I'm thinking it should work fine. I tried it in my game and there were no problems. I'm using all of HIP except VIET Immersion.
 
Sounds great.
Saturday release seems likely then. Send the files to me as soon as they're ready.
Would that be compatible with saves from the existing version? :D
 
Unless VIET Events or Immersion touches it, no.

Shoot. I was hoping it'd add Satanism as a religion.

VIET doesn't touch that at all, so no. And there are no plans to add Satanism; that is way outside the scope of all HIP mods.

Everything but SWMH and VIET Immersion. If you just do the "default" install by pressing "y" or Enter all the way through, it'll automatically skip VIET Immersion.

In the next update we are changing things slightly: VIET Immersion will no longer be skipped in the installer per se, though obviously you'd still have to choose one or the other.
 
In the next update we are changing things slightly: VIET Immersion will no longer be skipped in the installer per se, though obviously you'd still have to choose one or the other.
More precisely, the installer says that SWMH and VIET Immersion are incompatible before making you choose. The first choice is SWMH, though, so pressing Enter all the way through will still install SWMH rather than VIET Immersion.
 
I've created a .mod version of Clear Combat in this post. Since it only shares a modified version of one file with what's in the HIP folder structure, I'm thinking it should work fine. I tried it in my game and there were no problems. I'm using all of HIP except VIET Immersion.

IIRC, ClearCombat overrides defines.lua (for one setting, which PB already makes standard). Does it still? That would be a problem if so. Otherwise, overriding combat_tactics.txt is perfectly compatible with HIP.
 
Just to be pedantic: ASCII (and even earlier codes) defined character 0 as NUL, a valid character, years before C was invented. While C uses 0-terminated strings, not all languages do. Even in C, 0 is a valid character, in a binary file for instance.

NUL-terminated strings were a common convention before Dennis Ritchie created C at Bell Labs in the early 1970s; C has nothing to do with this. Most languages DO use NUL-terminated strings but simply do not expose that implementation detail to the programmer. In fact, I cannot think of one that doesn't. Whether it's a zero or some other funny binary sequence, efficient string algorithms need a sentinel sequence to terminate the string; otherwise the underlying implementation must do unnecessary arithmetic for bounds checking upon each character access in any algorithm that potentially traverses all the characters in a string sequentially. For regular ASCII or ANSI strings, that is. Once you get to UTF-8 strings and string buffers (i.e., very large strings, sometimes implemented as ropes, as in the rope container template from the SGI C++ STL, which had O(log n) insertion of a new substring at any arbitrary character index), anything goes, but that's outside the scope of the lexicographic-ordering discussion you reference.
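To make that concrete, here's a rough sketch of the two traversal styles in C++ (the function names are mine, invented for illustration):

```cpp
#include <cstddef>

// Sentinel-based find: the terminating 0 doubles as the loop condition,
// so each step needs only a load and a compare.
const char* find_sentinel(const char* s, char needle) {
    for (; *s != '\0'; ++s)
        if (*s == needle) return s;
    return nullptr;
}

// Length-based find: the same traversal, but every step also compares
// the running index against the separately stored length.
const char* find_bounded(const char* s, std::size_t len, char needle) {
    for (std::size_t i = 0; i < len; ++i)
        if (s[i] == needle) return s + i;
    return nullptr;
}
```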

Also, since we're being pedantic and all, obviously '\0' is a valid character. How else could it be part of the string? :D
 
IIRC, ClearCombat overrides defines.lua (for one setting, which PB already makes standard). Does it still? That would be a problem if so. Otherwise, overriding combat_tactics.txt is perfectly compatible with HIP.

There was no defines.lua in the files I downloaded. And the file that is shared with HIP basically just substitutes some values for others; the files themselves have the same size and structure.
 
The opposite of "C string" is generally "Pascal string": length header, no terminator. Apple made a funky hybrid with both, to enable easy interoperation of C & Pascal code. :wacko:

C strings seem to be on their way out, but I expect it will still take some years...

On their way out? What, in favor of a "Pascal string"?

They're never going to be on their way out under the hood so long as characters are commonly handled as fixed-width 1-byte sequences terminated by a sentinel character or sequence (UTF-16 string implementations do this too; they just use two null bytes).

Since UTF-8 is pretty much the way of the future, C strings are only likely to continue their predominance. A single null byte signifies end-of-string in UTF-8, so it keeps all the performance advantages C strings introduced for string scanning and general string operations, plus the advantages of being simple, backward-compatible, and able to represent the entire Unicode repertoire.
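That compatibility is easy to demonstrate: because no multi-byte UTF-8 sequence ever contains a 0x00 byte, plain byte-oriented C routines work on UTF-8 text unchanged (note strlen() counts bytes, not code points):

```cpp
#include <cstdio>
#include <cstring>

int main() {
    const char* s = "na\xC3\xAFve";  // UTF-8 for "naïve"; ï is the byte pair 0xC3 0xAF
    std::printf("%zu bytes\n", std::strlen(s));  // prints 6: byte count, not 5 code points
    return 0;
}
```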

The other style of string is, however, predominant in network protocols. Not protocols like HTTP, unfortunately, but multi-layer binary protocols that are usually packet-oriented. In that case, it's very handy not to have to assume anything about the binary object encapsulated in a protocol field and merely encode the variable length of the field at the beginning (often with some type identifier relevant to the unrelated piece of code that actually unpacks and decodes the encapsulated binary object). That way, you always know how many more bytes to expect, can ensure in advance that there's enough network buffer space for the field, and can treat the field as an opaque structure that could be anything (like a string of ASCII characters).
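What's being described is the usual type-length-value (TLV) field layout. A bare-bones sketch of the writer side (the 1-byte tag, 2-byte big-endian length, and function name are all just choices made for the example):

```cpp
#include <cstdint>
#include <vector>

// Append one length-prefixed field: 1-byte type tag, 2-byte big-endian
// length, then the opaque payload. A reader can reserve buffer space
// and skip unknown fields without ever inspecting the payload bytes.
void append_tlv(std::vector<std::uint8_t>& out, std::uint8_t type,
                const std::uint8_t* payload, std::uint16_t len) {
    out.push_back(type);
    out.push_back(static_cast<std::uint8_t>(len >> 8));    // length, high byte
    out.push_back(static_cast<std::uint8_t>(len & 0xFF));  // length, low byte
    out.insert(out.end(), payload, payload + len);         // opaque bytes
}
```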

In general, the same principle applies to generic nested object serialization/deserialization to files (like a binary JSON encoding) or database blobs, although there it usually has nothing to do with performance or reliability: knowing the length of a string in advance matters far less for file I/O than it does for packet-oriented network protocols.

In the end, though, everything generally still gets converted to a C-string representation once it's finally in the target application's memory address space, simply because of the sheer reduction in the number of instructions and CPU registers in use. Take a lexicographic comparison: the sentinel version can rely on the elegant property that the character at the end of the string simply equals zero when it reaches that point, blindly incrementing a memory address for the rest of the characters. The bounds-checked version instead has to compare its current offset from the start against the known length on every character evaluated, requiring a branch prediction each time and filling the CPU pipeline with needlessly redundant alternate instruction traces, reducing the IPC the CPU can effectively handle.
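That's essentially the classic strcmp() inner loop. Sketched here, the terminator falls out of the same test that compares the characters, so there's no separate bounds check at all:

```cpp
// Sketch of sentinel-based lexicographic comparison: two pointers are
// incremented blindly, and hitting the terminating 0 ends the loop via
// the same equality test used for ordinary characters.
int compare_cstrings(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) { ++a; ++b; }
    return static_cast<unsigned char>(*a) - static_cast<unsigned char>(*b);
}
```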

EDIT:

There are ways to optimize Pascal-style string scanning that are fairly trivial with pointer arithmetic, avoiding most of the redundant bounds checking. However, you still can't get away from the branch prediction the CPU needs for each character reference, so those optimizations are mostly moot in practice.
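The pointer-arithmetic trick is just to compute the end address once, outside the loop, so the per-character bounds check shrinks to a single pointer comparison (function name mine):

```cpp
#include <cstddef>

// Pascal-style scan with the bounds check reduced to one pointer
// compare per step; the end address is computed once up front. The
// per-character branch remains, which is the point above.
const char* find_in_counted(const char* s, std::size_t len, char needle) {
    for (const char* end = s + len; s != end; ++s)
        if (*s == needle) return s;
    return nullptr;
}
```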
 
Both UTF-8 and UTF-16 have the same problem as ASCII in null-terminated strings: NUL is still a valid character that can't be represented. Those implementations that use null-termination aren't technically following the Unicode spec, even those UTF-8 ones that represent NUL as (0xC0, 0x80). UTF-8 has its own problems, in that 'characters' and 'bytes' are disconnected. This makes proper UTF-8 implementation anything but simple, and prone to bugs. It's a neat compression hack, but to be useful in memory, you have to go back to UTF-16.
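(For reference, that (0xC0, 0x80) trick is the "Modified UTF-8" overlong encoding, used e.g. by Java's JNI. A sketch of the escaping step, with a made-up function name:)

```cpp
#include <string>

// Re-encode every embedded U+0000 as the overlong pair 0xC0 0x80 so
// the byte 0x00 never appears in the output and stays free to act as
// the terminator. Standard UTF-8 forbids overlong forms, which is
// exactly why this isn't spec-conformant.
std::string escape_nuls(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == '\0') {       // embedded NUL: emit the overlong pair
            out += '\xC0';
            out += '\x80';
        } else {
            out += c;          // every other byte passes through untouched
        }
    }
    return out;
}
```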

Both the C++ and Objective-C standard libraries use string objects with a (at least) 32-bit length counter. This makes the length function O(1) instead of O(n). With modern memory sizes, few are worried about the size of the counter. And the whole "I don't need to check bounds" idea has produced a lot of buggy code when people have forgotten to put the null in, when it got overwritten, or when the null overwrote something in adjacent memory.
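The asymmetry in a nutshell (since C++11 the standard even guarantees size() is constant time):

```cpp
#include <cstring>
#include <string>

void length_costs(const std::string& s, const char* cs) {
    std::size_t a = s.size();        // O(1): returns the cached counter
    std::size_t b = std::strlen(cs); // O(n): walks the bytes to find the 0
    (void)a; (void)b;
}
```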

You can process UTF-16 in modified C strings, working in words instead of bytes, but as stated above NUL is still a legal character (in fact a fairly common one when serializing objects that have references). So now all code that makes strings has to be complicated by guarding against an actual NUL being in its input, and possibly valid.
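i.e., every boundary that feeds external data into NUL-terminated machinery ends up needing a guard roughly like this (function name invented for the sketch):

```cpp
#include <algorithm>
#include <vector>

// Before handing externally supplied data to anything that treats 0 as
// a terminator, verify that no legitimate NUL character is embedded in
// it (the alternative being to escape it, as discussed above).
bool safe_as_cstring(const std::vector<char>& input) {
    return std::find(input.begin(), input.end(), '\0') == input.end();
}
```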

Just as processors started including specialized opcodes for handling Pascal and C strings, I expect they will do so for UTF-16 strings (preferably Pascal-style). Then all the pipelining issues will go away. I'm no expert on x86 assembly, but a quick glance at Google shows instructions that either could do this already or are a small step away from it.
 
Installed the latest HIP, but I'm not sure if everything went well. It's about SWMH and NBRT: the tooltips show steppes where there should be marshes, all the vast real steppe areas are shown as hills (the whole of Khazaria, Pecheneg territory, etc.), and in Norway the most southern coastal province is listed as arctic even though provinces much further north are plains. Are these 2 mods incompatible, or is this a wrong installation?
 
Installed the latest HIP, but I'm not sure if everything went well. It's about SWMH and NBRT: the tooltips show steppes where there should be marshes, all the vast real steppe areas are shown as hills (the whole of Khazaria, Pecheneg territory, etc.), and in Norway the most southern coastal province is listed as arctic even though provinces much further north are plains. Are these 2 mods incompatible, or is this a wrong installation?

Sadly, I think you probably have the install correct. The terrain map of NBRT needs some more synchronization with SWMH. Please post in both the SWMH and NBRT threads (don't worry about double-posting) with your complaints, because they definitely need attention from both teams and will hopefully be addressed by the next release.

Both UTF-8 and UTF-16 have the same problem as ASCII in null-terminated strings: NUL is still a valid character that can't be represented. Those implementations that use null-termination aren't technically following the Unicode spec, even those UTF-8 ones that represent NUL as (0xC0, 0x80). UTF-8 has its own problems, in that 'characters' and 'bytes' are disconnected. This makes proper UTF-8 implementation anything but simple, and prone to bugs. It's a neat compression hack, but to be useful in memory, you have to go back to UTF-16.

Yes, if one cares about a theoretical NUL character that isn't used as a sentinel (to presumably represent... well, not undefined in practice, since you'd just not define something to achieve that effect more cleanly... and not the empty string in a mathematical sense, because that already has a perfectly good representation... so, frankly, I've no idea what the real-world need for a semantic NUL character really is now that character-based serial line protocols are basically all dead and wouldn't be using UTF-8 or UTF-16 anyway), I guess that'd be a disadvantage. [...]

Both the C++ and Objective-C standard libraries use string objects with a (at least) 32-bit length counter. This makes the length function O(1) instead of O(n). With modern memory sizes, few are worried about the size of the counter. And the whole "I don't need to check bounds" idea has produced a lot of buggy code when people have forgotten to put the null in, when it got overwritten, or when the null overwrote something in adjacent memory.

Yep. In fact, nobody's worried about the extra 4 bytes to cache the length. Of course, the C++ and Objective-C libraries still use NUL-terminated strings in their implementations (and cache the length, so that you have all of the advantages). And, of course, they make full use of the "I don't need to check bounds if I verify my code, especially since it's going to be reused by millions of implementations someday, so we're going to get it right once" license in their find/replace/compare/etc. algorithm implementations. If you don't believe me, here's a question for you: how come, for any std::string instance, I can call str.c_str() and it magically returns a C-style string in one instruction? It just returns the pointer to the internal C string, and the method call gets inlined, meaning std::string is a syntactic wrapper around an underlying C-string implementation. Read the GNU C++ standard library implementation (or STLport); it's very educational.
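The behavior I mean, observable from the outside (hedging a bit: exact internals vary by library and version, and only since C++11 is the terminator required to live inline in the buffer):

```cpp
#include <cassert>
#include <cstring>
#include <string>

int main() {
    std::string s = "hello";
    const char* p = s.c_str();  // pointer into the string's own buffer; no copy
    assert(std::strlen(p) == s.size());
    assert(p[s.size()] == '\0');  // the terminator is stored with the data
    return 0;
}
```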

Yeah, lots of people have made errors with C strings. So? With either approach to directly accessing memory, you need to properly check your bounds via the appropriate mechanism. If you screw this up either way, you're going to end up with the same consequences. One approach's bounds-checking convention is just intrinsically more efficient for the CPU.

Are you arguing with me just to argue? I'm not arguing any of these points that you're bringing up. That's a shame, because I come across C++ programmers in Paradox Land, well, almost never, and I'd much rather make a friend out of one than go through an introduction that feels like I'm talking to an ill-tempered programming geek with something to prove on a lot of really bad speed. With deadly vipers interlocked alarmingly in a crown over his bare skull. And an army of rabid pedantic monkeys with typewriters strapped to their chests in tow. :p

Hopefully that gets my point across without offending you. Guess I'll go ahead and respond to the other stuff I didn't bring up that has somehow become a point of contention too, for consistency's sake... [Plus, often enough I'm the one sounding to someone else the way you sound to me, in a slightly different way I imagine, so hey, why not.]

You can process UTF-16 in modified C strings, working in words instead of bytes, but as stated above NUL is still a legal character (in fact a fairly common one when serializing objects that have references). So now all code that makes strings has to be complicated by guarding against an actual NUL being in its input, and possibly valid.

Working in half-words, you mean. You must be a Windows guy. All the UTF-8/UTF-16 lexers and codecs whose implementation I know (sadly, quite a few) do convert to 16-bit fixed-width 'character' strings with a sentinel marker at the end after the decoding phase (unless UCS-4 is the selected Unicode variant, in which case they unpack into 32-bit fixed-width 'characters' internally). Your statement about the extra case for NUL realistically has no bearing: since it's a legal character with an encoded value other than zero, it's not exactly in the critical path. And why does this matter again? Unicode codecs have to jump through about 7,000 other hoops to (de)serialize too. Is that a problem?

Just as processors started including specialized opcodes for handling Pascal and C strings, I expect they will do so for UTF-16 strings (preferably Pascal-style). Then all the pipelining issues will go away. I'm no expert on x86 assembly, but a quick glance at Google shows instructions that either could do this already or are a small step away from it.

Those specialized opcodes have existed for a long time. They're basically implemented in software, as their implementation is far too complex for the CPU to alter its micro-RISC architectural implementation to accommodate. There are also some newer ones now aimed at speeding up cryptographic and checksumming algorithms. Sadly, things are getting a little CISCy again due to applications' inability to scale as well as hoped to many cores. Since vendors can't push clock rates much at all and adding more cores isn't helping in the consumer sector, we've basically only gotten ISA extensions that accelerate certain specialized CPU-intensive operations to improve IPC. As I most commonly see it, this is more a limitation of the way programmers for the consumer market know how to program (i.e., not scalably, in terms of lockless multithreading), as there is still far more performance to be had from proper paradigms that exploit thread-level parallelism.

So, anyway, hi: I'm ziji. I may not be who you expected me to be. Likewise. I like systems coding and am heavily experienced with it, although I don't get to actually do much of it currently, due to my present role in HIP and being in a bit of a weird spot in terms of unexpected turns in life. I keep hoping to meet somebody in this community who shares those interests and qualifications, though, not least because I might be able to put them to work on projects shaping how people game, but also because I'm passionate about both grand strategy / grand modding and deep coding, and unfortunately the two seem to so rarely intersect.
 
Installed the latest HIP, but I'm not sure if everything went well. It's about SWMH and NBRT: the tooltips show steppes where there should be marshes, all the vast real steppe areas are shown as hills (the whole of Khazaria, Pecheneg territory, etc.), and in Norway the most southern coastal province is listed as arctic even though provinces much further north are plains. Are these 2 mods incompatible, or is this a wrong installation?

Known bug, will be fixed ASAP, sorry for that! :eek: