That's a false equivalence. A task that is completely impossible to do in parallel (one woman growing a baby) cannot be lumped together with a task that is conceptually easy to do in parallel (calculating the various aspects in a PDS game).
So really, a more accurate example (if you insist on using childbirth as your analogy) would be that sixteen children are born, and currently one woman is giving birth nine times in a row while seven other women each give birth once, then wait around for roughly 6 years before repeating the cycle. Some people look at this situation and think that maybe there's a more efficient distribution.
 
No, it is actually a correct equivalence to how the code works in practice. Most tasks that you program a game to do are not naturally parallel-friendly; we have to make them so and squeeze parallelism in wherever we can. For instance, the second the character code interacts with another character's data, it immediately stops being able to run in parallel.

Also, it isn't a joke; it is a phrase often used when you talk about man-hours.
 
Isn't it somewhat true, though? While it may not double the speed, you still get a faster game, just not twice as fast. So while the saying makes sense, there is still a speed increase, correct?
 

Yes, which is why our games do run faster with more cores.
 
Yes ... as long as the app's structure is suited to multithreading, and it has been written accordingly.

Writing apps that run multiple parallel threads is hard, more liable to timing conflicts, and harder to diagnose when things go wrong.

It is NOT the magic answer to all performance problems some people seem to think it is. If you are bottlenecked on writing a save file, for instance, you can't split that job over two processes or you get corrupted save files. You can have one process write out a save file while others do other things ... as long as they don't change a part of the game you haven't yet saved. If you forget that, again you get corrupted save files.
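A minimal sketch of that "copy first, write in the background" idea (the names GameState and save_in_background are made up for illustration; this is not actual Clausewitz code):

```cpp
#include <fstream>
#include <string>
#include <thread>
#include <vector>

struct GameState {
    std::vector<std::string> provinces;  // stand-in for the real game data
};

std::thread save_in_background(const GameState& live, const std::string& path) {
    GameState snapshot = live;                 // deep copy, taken while nothing else writes to `live`
    return std::thread([snapshot, path]() {    // the writer only ever touches its private copy
        std::ofstream out(path);
        for (const auto& p : snapshot.provinces)
            out << p << '\n';
    });
}

int main() {
    GameState state{{"Stockholm", "Uppland", "Bergslagen"}};
    std::thread saver = save_in_background(state, "autosave.txt");
    // The simulation could keep modifying `state` here without corrupting the file,
    // because the writer works from its own snapshot, not from the moving target.
    saver.join();  // wait for the save to finish before exiting
}
```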
 
Note that utilizing multiple cores covers a wider area than just how many threads an application has that can run in parallel. A system running a game consists of more than just the game that's running (unlike in the good old days of MS-DOS). The OS itself runs multiple threads, as do all the services it has active. Also, some API calls an application makes can actually run in background threads, spawned by the OS itself, unbeknownst to the application. All these benefit from having multiple cores, even if the application itself would be single thread/single core.

By offloading all these background processes to the other cores, even a single thread/single core application benefits, as it doesn't have to share that core with background processes.

Writing contents to a file is one such task. When an application writes contents to a file, it doesn't need to wait until all the writing is completed, and actually present in the file on disk. The OS intervenes, copies all to-be-written data to internal buffers, and then uses a background thread (possibly running on a different core) to handle the actual writing of this buffered data to disk, while the application's thread is already doing something else, in parallel.
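A tiny illustration of that buffered behaviour, using nothing but standard C++ (file name and size are arbitrary): the write call returns once the data has been copied into buffers, and the physical disk write typically completes later, in the background.

```cpp
#include <fstream>
#include <vector>

int main() {
    std::vector<char> save_data(4 * 1024 * 1024, 'x');  // pretend this is a finished save file
    std::ofstream out("autosave.tmp", std::ios::binary);

    // Returns as soon as the bytes have been copied into buffers; nothing guarantees
    // they are physically on disk yet.
    out.write(save_data.data(), static_cast<std::streamsize>(save_data.size()));

    out.flush();  // hands the buffered data to the OS...
    // ...which typically finishes the actual disk write on its own schedule,
    // possibly on another core, while this thread is already free to do other work.
}
```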

Having said that, there is of course low hanging fruit that applications can take advantage of, such as handing the playing of (background) music off to a separate thread, which could then run on a separate core. And I'm pretty sure the Clausewitz engine does that. Other tasks that don't rely on sharing memory with the game engine are also obvious candidates for background threads running on other cores.

But as soon as multiple threads need access to the same pool of shared data, then you need to regulate that access, or risk corruption of data (as AndrewT pointed out). This means that such access needs to be serialized, which, as the name suggests, allows only one thread access at any given time. All the others have to wait. And if this happens a lot (which you can assume in a game engine), you end up in a situation where pretty much only one of multiple parallel threads is actually executing at any given moment in time. If that's the case, then it's more efficient to run that code in one single thread, and eliminate the overhead of this serialization.
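A minimal sketch of that degenerate case (hypothetical names, deliberately exaggerated contention): four threads all hammer one mutex-protected pool, so they effectively run one at a time and only add locking overhead.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> shared_pool(1000, 0);  // the one pool of data every worker needs
std::mutex pool_mutex;

void worker() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(pool_mutex);  // every iteration queues up here
        shared_pool[i % shared_pool.size()] += 1;      // the actual work is tiny
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t)
        threads.emplace_back(worker);
    for (auto& th : threads)
        th.join();
    // With contention this high, effectively one thread runs at a time, and a plain
    // single-threaded loop doing the same 400,000 increments is usually faster,
    // because it pays no locking overhead at all.
}
```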

There is no magic bullet here. No golden solution. Each case has to be reviewed, analyzed and profiled to see what the best solution is. And realize that splitting up your code over multiple, parallel-running threads introduces its own additional overhead that isn't needed if you keep an algorithm single-threaded.

An example where multithreading works great is a web server. Each user connected to the server is serviced via its own thread, running in parallel with the threads for the other connected users. Same thing if the web server hosts multiple web sites: each web site is serviced by its own listener thread, running in parallel with all the others. And this works because neither the various web sites, nor the various users connecting to them, make use of (much) shared data. Which means that each of these threads can run unhindered, without (much) need for serialization.
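A much simplified sketch of that thread-per-request idea (no real sockets, hypothetical names): each request is handled entirely with thread-local data, so nothing needs to be serialized.

```cpp
#include <string>
#include <thread>
#include <vector>

// Each request is handled entirely with data that belongs to its own thread,
// so there is no shared state and therefore nothing to lock.
std::string handle_request(const std::string& request) {
    return "HTTP/1.1 200 OK\r\n\r\nYou asked for: " + request;
}

int main() {
    std::vector<std::string> incoming = {"/index.html", "/about.html", "/news.html"};
    std::vector<std::thread> workers;
    for (const auto& req : incoming)
        workers.emplace_back([req]() { handle_request(req); });  // one thread per request
    for (auto& w : workers)
        w.join();
}
```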
 
Isn't it more a matter of budget/need though?

Building a Paradox game/engine from the ground up with maximum multicore use in mind looks easier than some big, chaotic 3D game.
Restricting entanglement should be fairly manageable, considering the static nature of much of the games?
It is way harder than it looks from a layman's perspective. Anywhere anything interacts, there's a potential conflict for multithreading. And everything interacts.
 
You guys just have to read this. Apparently wogudwkd12 found a way to increase the game speed by 30%. I don't know if this solution could apply to vanilla as well.


link removed - Had a dad
 
That relying on pure MTTH events is a performance hog is something we, and modders, have known for ages, which is why we have the on_actions nowadays.
 


There is nothing there that should be unknown to either the MEIOU team or to us :) The quoted part was common knowledge even back when I modded EU3.
The effect of optimizing script will in general be much larger on MEIOU than in vanilla though, as the entire performance difference between the two can more or less be attributed to script. Triggered modifiers are another major resource hog script-side, btw (but I believe the MEIOU team is aware of this as well).
 
I see, too bad. :( Would have been nice to have a 'blitz' in performance. (Lol blitz, how equivocate)
 
100% usage of one core is not a Windows idle process, but some thread in a game that is not/cannot be parallelized.
Unfortunately this is an issue for Stellaris late game, for example, where it starts to slow down to unplayable speed.

At the same time, parallelizing existing code is a tough task which requires significant resource investment. In modern days this means both expert programmers and a huge salary that you have to pay them.
A common view of business/project management on investing in technical debt is "How are we going to sell this?". And the answer to that question is not good enough unless the game becomes totally unplayable due to lag and sales go down no matter how many good, functional DLCs are released.

From my personal experience with PDX games, it is not a huge problem for EU, CK or HOI (I'm playing solo only). But Stellaris is half-alive now due to lag, so probably someday PDX devs will be able to convince the business side to invest in engine refactoring as well :)
 
100 % load on a core, as displayed in the Task Manager, only means that no time is spent in the System Idle thread. It still does not mean that the application/game engine is busy doing calculations.

I can, quite easily in fact, create an application that does absolutely nothing but wait for keys to be pressed and echo the typed characters in a window, and still have it show up in the Task Manager as a 100% CPU load. Simply by having my own idle loop.
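Roughly what the difference looks like in code (a toy sketch, not from any real game): a thread that polls in a tight loop pegs a core at 100% while doing nothing useful, whereas a thread that blocks on a condition variable sleeps inside the OS at ~0% load until it is woken.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::atomic<bool> work_ready{false};

void busy_wait_worker() {
    while (!work_ready.load()) {
        // spins flat out: Task Manager shows 100% on this core, zero useful work done
    }
}

std::mutex m;
std::condition_variable cv;
bool work_ready_cv = false;

void blocking_worker() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return work_ready_cv; });  // thread sleeps in the OS, ~0% CPU load
}

int main() {
    std::thread spinner(busy_wait_worker);
    std::thread sleeper(blocking_worker);
    std::this_thread::sleep_for(std::chrono::seconds(2));  // watch the CPU load here
    work_ready = true;
    { std::lock_guard<std::mutex> lk(m); work_ready_cv = true; }
    cv.notify_one();
    spinner.join();
    sleeper.join();
}
```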

Another way would be to update the UI while having nothing else to calculate. That yanks up the FPS rate immensely and shows up as 100% CPU load in the Task Manager, while still doing basically no (game-engine-related) calculations at all.

As people always seem to like high frame rate values, game developers give them just that by replacing the System Idle thread with one of their own, which constantly repaints the graphics display. It's still an idle loop (as usually nothing, or very little happens between two successive frames), but not one that shows up in the Task Manager. In implementations like this, you must not look at the CPU load in the task manager to gauge if the application is doing calculations, but at the value of the frame rate. In such an implementation, you would only see less than 100% CPU load in the Task Manager, if you limit the FPS rate to a preset maximum. For example, by tying the frame rate to the VSync of the monitor.
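A conceptual sketch of the two loop styles (render_frame and game_has_work are hypothetical stand-ins): the first repaints as fast as the core allows, the second caps the frame rate the way vsync does and gives the core back to the OS when there is nothing to compute.

```cpp
#include <chrono>
#include <thread>

// Hypothetical stand-ins: a real engine would repaint the UI and check for pending
// simulation work here.
void render_frame() { /* pretend to repaint the screen */ }
bool game_has_work() { return false; }  // nothing to simulate in this toy example

// The "repaint forever" idle loop: FPS and CPU load on this core max out even
// though nothing changes between frames.
void uncapped_loop() {
    for (;;) {
        render_frame();
    }
}

// Capping the frame rate (the way vsync does) gives the core back to the OS when
// there is nothing to compute, so Task Manager shows less than 100% load.
void capped_loop() {
    using clock = std::chrono::steady_clock;
    const auto frame_budget = std::chrono::milliseconds(16);  // roughly 60 FPS
    for (;;) {
        const auto start = clock::now();
        render_frame();
        if (!game_has_work())
            std::this_thread::sleep_until(start + frame_budget);
    }
}

int main() {
    capped_loop();  // swap in uncapped_loop() to watch one core peg at 100%
}
```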

The latter is something I do with World of Warcraft, Star Craft and Diablo 3.
 
100 % load on a core, as displayed in the Task Manager, only means that no time is spent in the System Idle thread. It still does not mean that the application/game engine is busy doing calculations.
...
I'd call both examples above bad coding practice, and I doubt that this is the case with Clausewitz.
Also, taking into account the amount of performance issues with games, I'd bet that any unnecessary loops have already been removed/remediated, if they ever existed.

The game is just calculation-heavy (that's why we like it) and has already outgrown the engine architecture.
The inability to load all available cores while lagging is a clear symptom of this (as long as there is no evidence that the lag may be caused by other things like memory, graphics, or disk issues, or anything like that).
 
You were stating that 100% CPU load automatically means that an app/game is choking on its load of calculations. All I was pointing out is that you cannot leap to that conclusion. There are various types of idle loops. One that repaints the user interface on every pass of the loop is one of them. It still produces 100% CPU load while doing absolutely nothing in the form of back-end calculations, and gives the warm, fuzzy feeling of seeing a high FPS value.

Now you pull a new fact out of the hat (which you did not mention before): a lagging UI (either permanent or in spike form, which you don't specify). That's an indication of the (above-mentioned) main loop not completing one iteration fast enough to respond to UI input in a timely fashion.

A lagging UI does not automatically mean the game engine is choked with back-end calculations which could (somehow) be alleviated by using more cores more extensively than it does now. Maybe your graphics subsystem just isn't fast enough to render a single frame quickly enough that your keyboard and mouse input can be processed without noticeable delay (which, for humans, is in the ballpark of 0.1 - 0.2 seconds). In which case you can optimize the game engine with as many cores as you like until pigs learn to fly, but it won't solve the lag.

Also, even if you have lag spikes (which are likely caused by back-end calculations) the noticeable lag won't necessarily be removed or reduced by using more cores. And that comes back to how easy it is to split the calculations over multiple threads without them having to be serialized frequently to get access to shared data without data corruption or data inconsistencies. And how fast these cores actually run. If you have an Intel CPU, you have TurboBoost. Meaning, if only one core is used, the others will be slowed down. The thermal room this creates inside the CPU package is then used up by overclocking the core that is under load. The net effect is that having the back-end calculations in a single thread, without any syncing/serialisation overhead, may very well perform better than having the same calculations in multiple threads, on multiple cores, with all the syncing/serialisation overhead added, all running on cores that are now clocked at a significantly lower clock speed.
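For contrast, a minimal sketch of the case where splitting the work does pay off (a hypothetical example, not engine code): each thread accumulates into its own private partial result, so the only synchronization is the join at the end and no lock sits in the hot loop.

```cpp
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

std::uint64_t parallel_sum(const std::vector<std::uint64_t>& data, unsigned num_threads) {
    std::vector<std::uint64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = (t + 1 == num_threads) ? data.size() : begin + chunk;
        workers.emplace_back([&, t, begin, end]() {
            // Private accumulator: no shared writes, no serialization inside the loop.
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end,
                                         std::uint64_t{0});
        });
    }
    for (auto& w : workers)
        w.join();  // the only sync point
    return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}

int main() {
    std::vector<std::uint64_t> data(10000000, 1);
    return parallel_sum(data, 4) == data.size() ? 0 : 1;
}
```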
 
An idle loop that redraws the screen when nothing has changed, just to inflate FPS numbers, is simply bad coding that radiates extra power into the atmosphere. I am pretty sure that this is not the case; it is below the level of professional game studios.

I was also describing constantly lagging late-game Stellaris. It has nothing to do with UI redraw and is not an indication of the graphics-engine approach that you described before (it just makes no sense to trigger additional redraws if you can't draw the previous one in time).

Also, my GPU subsystem is GTX 1080 SLI, so I doubt it is the choke point. ROFL

Not sure why you are rejecting the simple idea that if the only component OS monitoring shows to be near its limit is a single CPU core out of 4 physical ones, the problem is a lack of calculation speed in a mostly single-threaded game design? A speed which could effectively be increased by utilizing parallelism.

Using Occam's razor, it is the simplest explanation for the lag in Stellaris's case, while you can of course press on other possible issues, up to a Putin hacker playing on my computer while I play the game.
 
Will the engine be moved over from DX11 to DX12? That API is specifically designed to address poor load balancing across cores, and not just CPU cores but GPU cores as well.

I would love to know why Paradox hasn't jumped at the chance to add this, as it's a big boost for games with complex calculations; Ashes of the Singularity has built their game around DX12 to allow more units on screen.

This could address the big slowdown that happens late game in Stellaris.
 
If you want some fun and have an i7, launch 2 PDS games at the same time and run them.
Wouldn't that give quite a slowdown?

As far as most threads ending up on one core instead of being evenly spread. That, too, is due to the Windows thread scheduler. It prefers threads belonging to the same process to be run on the same core, as that places a lesser burden on the cache controllers inside the CPU. Each core has a private level 1 cache, and all cores share the level 2/3 caches. Windows (rightly) assumes that threads of the same process use the same memory pool. When these run on multiple cores, the CPU needs to put in extra time to maintain the integrity of these level 1 caches, slowing the cores down.
I assume that Linux does something similar?

Edit: I don't know if the Windows Scheduler actively supports it, but keeping most threads on one core when possible utilises Intel's Turbo Boost feature to the max. With this feature, cores that aren't doing very much get their clock speed reduced (thus drawing less power and emanating less heat), so that the room this frees up in power draw and heat generation can be given to the one core that's busy, by cranking the clock speed of that core over the maximum. In short, a workload that puts two cores at 50% actually runs slower than putting all that workload on one single core.
I've always wondered. Does the turbo boost start automatically?

And this works because neither the various web sites, nor the various users connecting to them, make use of (much) shared data. Which means that each of these threads can run unhindered, without (much) need for serialization.
What is this small amount of shared data?

But Stellaris is half-alive now due to lag
In fact the late game lag is the reason I don't really play Stellaris anymore.

As people always seem to like high frame rate values, game developers give them just that by replacing the System Idle thread with one of their own, which constantly repaints the graphics display. It's still an idle loop (as usually nothing, or very little happens between two successive frames), but not one that shows up in the Task Manager. In implementations like this, you must not look at the CPU load in the task manager to gauge if the application is doing calculations, but at the value of the frame rate. In such an implementation, you would only see less than 100% CPU load in the Task Manager, if you limit the FPS rate to a preset maximum. For example, by tying the frame rate to the VSync of the monitor.
That would explain why I get 150+ fps when paused in PDS games.

Also, even if you have lag spikes (which are likely caused by back-end calculations) the noticeable lag won't necessarily be removed or reduced by using more cores.
By lag spike, do you then mean how PDS games late game can end up lagging when you move around the map while the game is running, but with no lag moving around the map if the game is paused? (I've only really experienced it in CKII, and it pretty much only happens when I make my monster dynasties (dynasties with 10k+ total members and 4k+ living members). It's so tempting and beautiful to have most/all landowners in the world be of the same dynasty, but it gives huge slowdowns, unfortunately.)

An idle loop that redraws the screen when nothing has changed, just to inflate FPS numbers, is simply bad coding that radiates extra power into the atmosphere. I am pretty sure that this is not the case; it is below the level of professional game studios.
I assume that PI does use those idle loops given how you get 150+ fps when the game's paused.
 
Will the engine be moved over from DX11 to DX12? That API is specifically designed to address poor load balancing across cores, and not just CPU cores but GPU cores as well.
All current PDS games use DirectX 9.0c - solely, so far as I'm aware.
 
And presumably OpenGL for CKII and on, given I can play them natively on Linux. ;)

Though that makes me wonder, have you guys thought about using Vulkan for future versions of Clausewitz?