
Victoria 3 - Dev Diary #76 - Performance


Hello and welcome to this week’s Victoria 3 dev diary. This time we will be talking a bit about performance and how the game works under the hood. It will get somewhat detailed along the way, so if you are mostly interested in what has improved in 1.2, you can find that towards the end.

For those of you who don’t know me, my name is Emil and I’ve been at Paradox since 2018. I joined the Victoria 3 team as Tech Lead back in 2020, having previously worked in the same role on other projects.

What is performance

It’s hard to talk about performance without first having some understanding of what we mean by it. For many games it is mostly about how high a frame rate (fps) you can get without having to turn the graphics settings down too far. But with simulation-heavy games like the ones we make at PDS, another aspect comes into play: tick speed. This metric is not as consistently named across the games industry as fps is, but you might be familiar with the names Ticks Per Second or Updates Per Second from other games. Here I will instead be using the inverse metric: how long a tick takes on average to complete, in either seconds or milliseconds. Some graphs are from debug builds and some from release builds, so the numbers are not always directly comparable.

What exactly a tick means in terms of in-game time varies a bit. In CK3 and EU4 a tick is a single day, while in HOI4 it’s just one hour. In Victoria 3 a tick is six hours, or a quarter of a day. Not all ticks are equal, though. Some work doesn’t need to happen as often as other work, so we divide the ticks into categories. In Victoria 3 we have yearly, monthly, weekly, daily, and (regular) ticks.
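To make the nesting of tick categories concrete, here is a minimal Python sketch that works out which categories fire on a given tick index. It makes several assumptions: day-level work runs on the first of each day’s four ticks, weeks are counted from the first day of the campaign, and dates come from Python’s own calendar, which has leap years and may differ from the game’s. `categories_for_tick` is a hypothetical name, not engine code.

```python
from datetime import date, timedelta

TICKS_PER_DAY = 4  # one tick is six in-game hours
START_DATE = date(1836, 1, 1)  # assumed campaign start for this sketch


def categories_for_tick(tick: int, start: date = START_DATE) -> list[str]:
    """Return the tick categories that run on the given tick index.

    Larger categories include the smaller ones: a weekly tick also runs
    the daily and regular tick tasks.
    """
    categories = ["tick"]
    # Assumption: day-level work runs on the first of the four ticks in a day.
    if tick % TICKS_PER_DAY != 0:
        return categories
    day_index = tick // TICKS_PER_DAY
    today = start + timedelta(days=day_index)
    categories.append("daily")
    if day_index % 7 == 0:  # assumption: weeks counted from the start date
        categories.append("weekly")
    if today.day == 1:
        categories.append("monthly")
        if today.month == 1:
            categories.append("yearly")
    return categories
```

Tick 0 runs everything at once, while the three ticks after it only run the regular tick tasks. This is why the larger categories show up as spikes in the tick time.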

If you thought 1.1 was slow, you should have seen the game a year before release…

Content of a tick

Victoria 3 is very simulation-driven, and as such there is a lot of work that needs to happen in the tick. To keep the code organized, we have broken our tick down into what we call tick tasks. A tick task is a distinct set of operations to perform on the gamestate, along with information on how often it should happen and which other tick tasks it depends on before it is allowed to run.
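A tick task as described above is essentially a node in a dependency graph that carries a frequency. The following is a minimal Python sketch, assuming that a dependency on a task that isn’t due this tick is simply ignored. `TickTask`, `tasks_to_run` and the task name `UpdateMarket` are hypothetical. The names UpdateEmployment and RecalculateModifierNodes appear in this diary, but the real dependencies between these tasks aren’t described here.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TickTask:
    name: str
    frequency: str  # "tick", "daily", "weekly", "monthly" or "yearly"
    depends_on: list[str] = field(default_factory=list)


def tasks_to_run(tasks: list[TickTask], active: set[str]) -> list[str]:
    """Order the tasks whose frequency is active on this tick so that every
    task comes after the tasks it depends on."""
    due = {task.name: task for task in tasks if task.frequency in active}
    # Assumption: dependencies on tasks that are not due this tick are ignored.
    graph = {
        name: [dep for dep in task.depends_on if dep in due]
        for name, task in due.items()
    }
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

A real scheduler would also run independent tasks concurrently instead of producing a single linear order.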

An overview of some of the tick tasks in the game. Available with the console command TickTask.Graph.

Many of the tick tasks are small things that just update one or a few values. Others, on the other hand, are quite massive. Depending on how often they run and which game objects they operate on, their impact on game speed varies. One of the most expensive things in the game is the employment update, followed by the pop need cache update and the modifier update.

Top ten most expensive tick tasks in our nightly tests as of Feb 15. Numbers in seconds averaged from multiple runs using a debug build.

As you can see from the graph above, many of our most expensive tick tasks run on a weekly basis. This, combined with the fact that a weekly tick also includes all daily and tickly tick tasks, means that a weekly tick usually ends up taking quite long. So let’s dive a bit deeper into what’s going on during a weekly tick. To do this we can use a profiler. One of the profilers we use here at PDS is Optick, an open-source profiler targeted mainly at game development.

Optick capture of a weekly tick around 1890 in a release build.

There’s a lot going on in the screenshot above, so let’s break it down a bit. On the left you see the names of the threads we are looking at. First there is the Network/Session thread, which is the main thread for the game logic. It’s responsible for running the simulation and acting on player commands. Then we have the primary task threads. Their number varies from machine to machine, because the engine creates a different number of task threads depending on how many cores your CPU has. Here I have artificially limited it to eight to make things more readable. Task threads are responsible for doing work that can be parallelized. Next is the Main Thread. This is the initial thread created by the operating system when the game starts, and it is responsible for handling the interface and graphics updates. Then we have the Render Thread, which does the actual rendering, and finally the secondary task threads. These are similar to the primary ones, but they are generally responsible for non-game-logic work like helping out with the graphics update or with saving the game.

All the colored boxes with text in them are different parts of the code that we’ve deemed interesting enough to show up in the profiler. If we want an even more in-depth view, we can instead use a different profiler like Superluminal or VTune, which lets us look directly at the function level or even at assembly.
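Those named boxes come from instrumentation: the programmer marks a region of code, and the profiler records when it starts and ends. The following is a rough Python analogue, purely illustrative and not Optick’s API. `profile_zone` is a hypothetical name, and the module-level state makes it single-threaded only.

```python
import time
from contextlib import contextmanager

# (zone name, nesting depth, duration in seconds); single-threaded sketch only
zones: list[tuple[str, int, float]] = []
_depth = 0


@contextmanager
def profile_zone(name: str):
    """Record how long the enclosed block takes, like a named box on a profiler timeline."""
    global _depth
    start = time.perf_counter()
    _depth += 1
    try:
        yield
    finally:
        _depth -= 1
        # Zones are appended when they end, so inner zones are recorded first.
        zones.append((name, _depth, time.perf_counter() - start))
```

Wrapping, say, a weekly tick in one zone and each tick task inside it in another produces the nested layout seen in the screenshot. Sampling profilers like the ones mentioned above need no such markers, because they periodically inspect where each thread is executing.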

The pink bars indicate a thread is waiting for something. For the task threads this usually means they are waiting for more work, while for the session thread it usually means it is blocked from modifying the game state because the interface or graphics updates need to read from it.

When looking at tick speed we are mostly interested in the session thread and the primary task threads. I’ve expanded the session thread here so we can see what is going on in the weekly tick. There are some things that stand out here.

First we have the commonly occurring red CScopedGameStateRelease blocks. These mark where we take a break from updating to let the interface and graphics read the data they need, in order to keep rendering as close to 60 fps as possible. This can’t happen just anywhere, though; it’s limited to the points between tick tasks or between certain steps inside the tick tasks. This guarantees data consistency, so the interface doesn’t fetch data when, say, just half of the country budget has been updated.
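The idea of only releasing the game state at well-defined points can be sketched with a single lock. The game-logic thread holds the lock while a task runs and lets go between tasks, so a reader can never observe a half-finished update. This is a simplified Python model. A plain `threading.Lock` stands in for whatever synchronization Clausewitz actually uses, and `GameState`, `update_budget`, `run_ticks` and `read_for_interface` are hypothetical names.

```python
import threading


class GameState:
    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.income = 0
        self.expenses = 0  # invariant for this sketch: always equal to income


def update_budget(state: GameState) -> None:
    # Two separate writes; a reader must never see one without the other.
    state.income += 10
    state.expenses += 10


def run_ticks(state: GameState, tasks, ticks: int) -> None:
    """Game-logic thread: holds the state while a task runs, releases it between tasks."""
    for _ in range(ticks):
        for task in tasks:
            with state.lock:
                task(state)
            # The lock is free here: the only point where the interface can read.


def read_for_interface(state: GameState) -> tuple[int, int]:
    """Interface thread: takes a consistent snapshot when the logic thread lets go."""
    with state.lock:
        return state.income, state.expenses
```

Placing the release between tasks, rather than at arbitrary points, is what keeps the snapshot consistent. The trade-off is that the session thread sits idle while the interface holds the lock, which matches the waiting time visible in the profiler.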

The next thing that stands out is, again, the UpdateEmployment tick task, just as in the graph above. Here we get a bit more information, though. At a glance we can see that it’s split into (at least) two parts: one parallel and one serial. Ideally we want all work to be done in parallel, because that allows us to better utilize modern CPUs. Unfortunately, not everything that happens during employment can be done in parallel, because it needs to perform global operations like creating and destroying pop objects and executing script. So we’ve moved as much as possible into a parallel pre-step to keep the serial part small. There is actually a third step in between that can’t be seen because it’s too quick: a sorting step, which ensures that differences in parallel execution order don’t cause out-of-syncs between game clients in multiplayer games.
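The three-step shape described above (a parallel pre-step, a sort, then a serial step) can be sketched in Python. Threads finish in an unpredictable order, so their results are sorted by a stable key before the serial step applies them. Every client then applies changes in the same order. The function names and the "vacancies/unemployed" model are hypothetical and far simpler than the real employment logic. Python threads also mainly illustrate the structure here, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def propose_hires(state_id: int, vacancies: int, unemployed: int) -> tuple[int, int]:
    """Parallel pre-step: read-only work that can run for every state at once."""
    return state_id, min(vacancies, unemployed)


def update_employment(states: dict[int, tuple[int, int]]) -> list[str]:
    """states maps a state id to (vacancies, unemployed workers)."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(propose_hires, sid, v, u) for sid, (v, u) in states.items()]
        # Results arrive in whatever order the threads finish: not deterministic.
        proposals = [f.result() for f in as_completed(futures)]
    # Sorting step: every client ends up with the same order regardless of timing.
    proposals.sort(key=lambda proposal: proposal[0])
    # Serial step: global changes (here just a log) are applied on one thread, in order.
    log = []
    for state_id, hires in proposals:
        log.append(f"state {state_id}: hire {hires}")
    return log
```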

Closer look at the UpdateEmployment tick task.

Modifiers are slow

One concept that’s common throughout PDS games is modifiers, and Victoria 3 is no exception. Quite the opposite: compared to CK3, our modifier setup is about an order of magnitude more complex. To manage this we use a system similar to the one in Stellaris, which we call modifier nodes. In essence it’s a dependency management system that allows us to flag modifiers as dirty and only recalculate those, along with the other modifiers that depend on them. This is quite beneficial, as recalculating a modifier is somewhat expensive.
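Dirty flagging can be shown with a toy dependency graph. Changing one node marks it and everything downstream of it as dirty, and only dirty nodes are recomputed. The following is a minimal Python sketch under heavy simplifying assumptions: a node’s value is just its base value plus the values of its inputs, and recalculation happens lazily when a value is read, whereas the game has a dedicated tick task for it. `ModifierGraph` and the node names are hypothetical.

```python
from collections import defaultdict


class ModifierGraph:
    """Toy modifier graph: value = base value + sum of the input nodes' values."""

    def __init__(self) -> None:
        self.base: dict[str, float] = {}
        self.inputs: dict[str, list[str]] = {}
        self.dependents: dict[str, list[str]] = defaultdict(list)
        self.value: dict[str, float] = {}
        self.dirty: set[str] = set()
        self.recalculations = 0  # counts how much work was actually done

    def add(self, name: str, base: float, inputs=()) -> None:
        self.base[name] = base
        self.inputs[name] = list(inputs)
        for source in inputs:
            self.dependents[source].append(name)
        self.mark_dirty(name)

    def mark_dirty(self, name: str) -> None:
        # Flag this node and everything downstream of it; nothing else is touched.
        stack = [name]
        while stack:
            node = stack.pop()
            if node not in self.dirty:
                self.dirty.add(node)
                stack.extend(self.dependents[node])

    def set_base(self, name: str, base: float) -> None:
        self.base[name] = base
        self.mark_dirty(name)

    def get(self, name: str) -> float:
        if name in self.dirty:
            self.value[name] = self.base[name] + sum(self.get(s) for s in self.inputs[name])
            self.dirty.discard(name)
            self.recalculations += 1
        return self.value[name]
```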

However, this system used to be very single-threaded, which meant that a large part of our tick was still spent updating modifiers. If you look at the graph at the top of this dev diary, you can see that performance improved quite rapidly during early 2022. One of the main contributors to this was the parallelization of the modifier node calculations. Since we know which nodes depend on which, we can divide the nodes into batches where each batch only depends on previous batches, so all the nodes within a batch can be calculated in parallel.
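This batching is a level-by-level topological sort. Python’s standard `graphlib` can produce it directly, because `get_ready()` returns every node whose dependencies have all been marked done. The following is a sketch of the idea only. The node names are hypothetical, and a real engine would hand each batch to its task threads before marking it done.

```python
from graphlib import TopologicalSorter


def modifier_batches(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Split nodes into batches where each batch only depends on earlier batches.

    Nodes inside one batch are independent of each other, so a whole batch
    could be computed in parallel.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(batch)
        sorter.done(*batch)  # in an engine: only after the whole batch has finished
    return batches
```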

Closer look at the RecalculateModifierNodes tick task.

Countries come in all sizes

A lot of the work going on in a tick needs to be done for every country in the world. But with the massive difference in scale between a small country like Luxembourg and a large one like Russia, some operations can take more than a hundred times as long for one country as for another. When you do things serially this doesn’t really matter, because all the work needs to happen anyway and the order doesn’t change the total. But when we start parallelizing things, we can run into an issue where too many of the larger countries end up on the same thread. This means that after all the other threads are done with their work, we still have to wait for this last thread to finish. To get around this, we came up with a system where tick tasks can specify a heuristic cost for each part of the update. This allows us to identify the parts that stand out, by checking the standard deviation of the expected computation time, and to schedule them separately.
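The outlier detection and separate scheduling described above can be sketched in Python. Jobs whose estimated cost exceeds the mean by more than `k` standard deviations are treated as "expensive" and placed first, each on the least-loaded thread. The remaining jobs fill in around them, which is the classic greedy longest-processing-time heuristic. The threshold `k`, the greedy placement and all the numbers are assumptions for illustration, not the engine’s actual scheduler.

```python
import heapq
from statistics import mean, pstdev


def split_by_cost(costs: dict[str, float], k: float = 1.0) -> tuple[list[str], list[str]]:
    """Separate jobs whose estimated cost is more than k standard deviations above the mean."""
    values = list(costs.values())
    threshold = mean(values) + k * pstdev(values)
    expensive = sorted((j for j, c in costs.items() if c > threshold),
                       key=lambda j: costs[j], reverse=True)
    affordable = [j for j, c in costs.items() if c <= threshold]
    return expensive, affordable


def assign_to_threads(costs: dict[str, float], expensive: list[str],
                      affordable: list[str], threads: int) -> list[list[str]]:
    """Greedy placement: expensive jobs first, each on the least-loaded thread,
    then the affordable jobs (largest first) in the same way."""
    loads = [(0.0, t) for t in range(threads)]  # min-heap of (current load, thread index)
    buckets: list[list[str]] = [[] for _ in range(threads)]
    for job in expensive + sorted(affordable, key=lambda j: costs[j], reverse=True):
        load, t = heapq.heappop(loads)
        buckets[t].append(job)
        heapq.heappush(loads, (load + costs[job], t))
    return buckets
```

With three large countries and a handful of small ones on three threads, each large country lands on its own thread instead of two of them possibly sharing one.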

One place where this makes a large difference is the country budget update. Not having, say, China, Russia, and Great Britain all update on the same thread significantly reduces the time needed for the budget update.

(And this is also why the game runs slower during your world conquest playthroughs!)

Closer look at the WeeklyCountryBudgetUpdateParallel tick task. Note the Expensive vs Affordable jobs.

Improvements in 1.2

I’m going to guess that this is the part most of you are interested in. There have been many improvements both large and small.

If you’ve paid attention to the open beta so far, you might have noticed some interface changes relating to the construction queue. Given the way many people play the game, the queue can end up quite large. Unfortunately, the old interface here used a widget type that needs to compute the size of all its elements to lay them out properly, including the elements that are not visible on screen.
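One common way to avoid measuring off-screen elements is a "virtualized" list. The diary doesn’t say whether the new queue uses this technique. If every row has the same height, the rows intersecting the viewport can be computed directly, and only those need to be created and laid out. Here is a minimal Python sketch. `visible_rows` is a hypothetical helper, not a Jomini or Clausewitz widget.

```python
def visible_rows(scroll_offset: float, viewport_height: float,
                 row_height: float, row_count: int) -> range:
    """Indices of the rows that intersect the viewport, assuming equal row heights.

    The total content height is simply row_count * row_height, so rows that
    are off screen never have to be measured.
    """
    if row_count == 0:
        return range(0)
    first = max(0, int(scroll_offset // row_height))
    last = min(row_count, int((scroll_offset + viewport_height) // row_height) + 1)
    return range(first, last)
```

The cost of a layout pass then depends on the height of the viewport, not on the length of the queue.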

New construction queue interface.

To compound this issue even further, the queued constructions had a lot of dependencies on each other in order to compute things like time until completion. This too has been addressed and should be available in today’s beta build.
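Chained dependencies like this get expensive when every item re-walks all the items ahead of it in the queue, which is quadratic in the queue length. The following Python sketch uses a deliberately simplified model to show how a single running total makes the same computation linear. The model is an assumption, not the game’s construction rules: items are built strictly in order from a fixed weekly budget. `weeks_until_done` is a hypothetical name, and the sketch does not claim to be how the fix was actually done.

```python
from itertools import accumulate
from math import ceil


def weeks_until_done(costs: list[float], points_per_week: float) -> list[int]:
    """Completion week for each queued item, computed in one pass.

    Simplified model: item i finishes once the cumulative cost of
    items 0..i has been paid for at a fixed weekly rate.
    """
    return [ceil(total / points_per_week) for total in accumulate(costs)]
```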

Side-by-side comparison of old vs new construction queue.

One big improvement to tick speed is a consequence of changes we’ve made to our graphics update. Later in the game, updating the map could sometimes take a lot of time, which in turn meant the game logic had to wait a lot for the graphics update. There have been both engine improvements and changes to our game-side code here to reduce the time needed for the graphics update. These include improving the threading of the map name update, optimizing the air entity update, and reducing the work needed to find out where buildings should show up in the city graphics.

Graphics update before/after optimization.

As we talked about above, the employment update has a significant impact on performance. Its cost is very strongly correlated with the number of pops in the game, meaning the number of pop objects, not the total population. Especially in the late game, you could end up with large numbers of tiny pops, which would make the employment update extremely slow. To alleviate this, the design team has tweaked how aggressively the game merges small pops, which should improve late game performance. For modders, this can be changed with the POP_MERGE_MAX_WORKFORCE and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION defines.
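Because the employment cost scales with the number of pop objects, merging is a direct performance lever. The diary doesn’t spell out exactly how the two defines are applied, so the following Python sketch uses one plausible reading, labeled as an assumption. Pops that share profession and culture form a group, and if the group has at least `min_same_profession` pops, all of its pops at or below `max_workforce` are combined into one. `Pop`, `merge_small_pops` and its parameters are hypothetical, and real pops also differ in religion, location and more.

```python
from collections import defaultdict
from typing import NamedTuple


class Pop(NamedTuple):
    profession: str
    culture: str
    workforce: int


def merge_small_pops(pops: list[Pop], max_workforce: int, min_same_profession: int) -> list[Pop]:
    """Merge small, otherwise identical pops into a single pop object (assumed semantics)."""
    groups: dict[tuple[str, str], list[Pop]] = defaultdict(list)
    for pop in pops:
        groups[(pop.profession, pop.culture)].append(pop)
    merged: list[Pop] = []
    for (profession, culture), members in groups.items():
        small = [p for p in members if p.workforce <= max_workforce]
        merged.extend(p for p in members if p.workforce > max_workforce)
        if len(members) >= min_same_profession and len(small) > 1:
            # Same workforce in total, but fewer objects for employment to iterate over.
            merged.append(Pop(profession, culture, sum(p.workforce for p in small)))
        else:
            merged.extend(small)
    return merged
```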

Another improvement we’ve made for 1.2 is replacing how we do memory allocation in Clausewitz. While we’ve always had dedicated allocators for special cases (pool allocators, game object “databases”, etc.), a lot of allocations still ended up with the default allocator, which just deferred to the operating system. Especially on Windows, this can be slow. To solve this we now use a library called mimalloc. It’s a very performant memory allocator library and essentially a drop-in replacement for the functionality provided by the operating system. It’s already used by other large engines such as Unreal Engine. While not as significant as the two things above, it made the game run around 4% faster when measured over a year about two thirds of the way into the timeline. And since it’s an engine improvement, you will likely see it in CK3 as well at some point in the future.

In addition to these larger changes, there have also been many small improvements that together add up to a lot. All in all, the game should be noticeably faster in 1.2 than in 1.1, as you can see in the graph below. Unfortunately, the 1.1 overnight tests weren’t as stable as the 1.2 ones, so for the sake of clarity I cut the graph off at 1871, but in general the performance improvements in 1.2 are even more noticeable in the late game.

Year by year comparison of tick times between 1.1 and 1.2 with 1.2 being much faster. Numbers are yearly averages from multiple nightly tests over several weeks using debug builds.

That’s all from me for this week. Next week Nik will present the various improvements done to warfare mechanics in 1.2, including the new Strategic Objectives feature.
 
I have been reading PDX dev diaries for many years, and I have to say this ranks as one of the most interesting I have seen. Honestly!
Thanks for helping explain what happens in the engine and why this game differs from others in the stable.
 
The first would lead to inaccurate results. For example, the market prices update on the weekly ticks, but if you update only part of the market on Monday and another part on Wednesday, then on Tuesday the market will do something incorrect.
It would depend on the weekly task and how other tasks interact with the data. If the results are only needed at the end of the week, don't interact with anything else, and are computed for 700 actors, then it might be possible to do 100 actors a day or 25 actors a tick and still have valid results. They did mention that they do some things like that, but they call these split-up tasks tickly tasks.

The second doesn't really make sense. You have ticks every six hours but no ticks between them. Every four ticks you get a daily tick no matter what you do, even if you offset the hour values.
I guess what I mean is: do weekly tasks fall on a tick that includes a daily task? Since there are 4 ticks per day and only one of them includes the daily tasks, on days that also include the weekly tasks you could use one of the three ticks that don't include the daily ones: TD-TW-T-T instead of TDW-T-T-T. The same could be done for monthly and yearly tasks, so if all of them occurred on the same day it could be TD-TW-TM-TY instead of TDWMY-T-T-T.
 
Even though the GPU does GPU things, GPU things require CPU oversight, so to speak. So how much do graphical effects, such as particles and the "building profiting" or "SoL decreasing" effects we see popping in and out on top of cities, impact CPU load? Do they make a tangible impact on the CPU, and if so, can we get a graphics setting to turn them off?

I’ve never been a big fan of some of those popping fx anyway :D
 
Oh great Mr. Dev, do you perhaps have a fancy graph that tracks the number of pops over time? I have a general sense that the number of pops grows, but no idea of the extent.

I don't have a graph right now, but in general the game tends to start out at between 20k and 25k pops and usually end up somewhere around 100k pops towards the end in our automated tests. Since this is in observer mode the numbers will of course be somewhat different with a human player.

Even though the GPU does GPU things, GPU things require CPU oversight, so to speak. So how much do graphical effects, such as particles and the "building profiting" or "SoL decreasing" effects we see popping in and out on top of cities, impact CPU load? Do they make a tangible impact on the CPU, and if so, can we get a graphics setting to turn them off?

I’ve never been a big fan of some of those popping fx anyway :D
The vfx (currently) don't affect the CPU side that much; they mostly put strain on the GPU. The most CPU-intensive parts of the graphics update are changes to the dynamic terrain and to the city layouts in reaction to what happens in the game. Both are somewhat expensive, and we do some things both to parallelize them and to limit how much is allowed to update visually in a single frame. You can notice this, for example, when starting a new game: if you are very quick to zoom in, you'll see cities updating for a short while.
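Limiting how much is allowed to update visually in a single frame can be pictured as a queue that drains only a fixed amount per frame. The following is a minimal Python sketch. It assumes a simple count-based budget, although the real limit could just as well be time-based. `VisualUpdateQueue` and its methods are hypothetical names.

```python
from collections import deque


class VisualUpdateQueue:
    """Apply at most `budget` pending visual updates per frame; the rest wait."""

    def __init__(self, budget: int) -> None:
        self.budget = budget
        self.pending: deque[str] = deque()

    def request(self, city: str) -> None:
        # Coalesce repeated requests: a city queued twice is rebuilt only once.
        if city not in self.pending:
            self.pending.append(city)

    def run_frame(self) -> list[str]:
        done = []
        while self.pending and len(done) < self.budget:
            done.append(self.pending.popleft())  # a real engine would rebuild the layout here
        return done
```

Spreading the work over frames keeps the frame time bounded, at the cost of some cities visibly catching up over the next few frames, which is the effect described when zooming in right after starting a game.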
 
If employment is such an expensive calculation, would it make sense to shift employment (and perhaps pop need as well) to a monthly calculation? Most employment related factors don't shift that often on a weekly basis. You lose granularity but the improvement in performance would be noticeable from cutting the number of times it needs to run by 4.
In addition to @egladil's answer I'll say I'm not generally a fan of the solution of putting heavier tasks on a less frequent update cycle. It effectively "hides" the performance problem by just making an annoying freeze happen more rarely. The trick is to get the execution time of the most frequent tick interval down low enough that we can seamlessly update frames in between every tick. So the more common strategy (like Emil also points out in another answer) is to increase the frequency of the updates but process fewer entities on each update.

For example, pops only process their growth once per month, but since this is a pretty heavy calculation that can also involve a lot of pop split/merge operations, we spread it out such that we only process 1/120th of all pops each tick. I'd love to do the same for the employment update, but it wouldn't be possible to break it up by pop since every possible pop in a state could potentially take any job offered in that same state, and we have to evaluate who would be eligible for every job in some priority sequence. We could potentially split it up by state, but if state employment updates on a different schedule than the market then potentially changes in the economy might not have been done in time to properly inform employment, etcetera. So it's possible to make further optimizations there, but it's complex and potentially bug-prone, and as much a design challenge as a technical challenge.
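Processing 1/120th of all pops each tick amounts to giving every pop a fixed slot in a 120-tick cycle. Presumably 120 comes from 30 days times 4 ticks. Assigning slots once up front avoids scanning every pop on every tick. The following is a minimal Python sketch, assuming pops are identified by integer ids and the slot is simply the id modulo 120. `build_slots` and `pops_due` are hypothetical names, and pops being split or merged mid-cycle is ignored.

```python
TICKS_PER_CYCLE = 120  # the "1/120th of all pops each tick" from the post


def build_slots(pop_ids) -> list[list[int]]:
    """Assign every pop to one of the 120 tick slots, once."""
    slots: list[list[int]] = [[] for _ in range(TICKS_PER_CYCLE)]
    for pop_id in pop_ids:
        slots[pop_id % TICKS_PER_CYCLE].append(pop_id)
    return slots


def pops_due(slots: list[list[int]], tick: int) -> list[int]:
    """Pops whose monthly growth update runs on this tick."""
    return slots[tick % TICKS_PER_CYCLE]
```

Over any 120 consecutive ticks, every pop is processed exactly once, and the per-tick cost stays roughly constant instead of spiking once a month.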
 
Could employment be sliced by nations or markets and then spread over multiple ticks (assuming this isn't already done to spread out the weight of the task)?
 
I’ll once again ask you to repost these as new threads so they properly appear on the news feed at the top of the forum. Changing visibility/moving to a different forum section makes it too easy to miss the updates for Vic 3 in particular.
 
Could employment be sliced by nations or markets and then spread over multiple ticks (assuming this isn't already done to spread out the weight of the task)?
The issue would be syncing it up with the economy update, and since that needs* to happen on the same tick everywhere in the world in order to get trade to work properly, it's tricky.

* "needs" is questionable here - it's more a matter of it being safer to do it at the same time, to pre-empt bugs and side effects. We could absolutely look into spreading out employment updates on a different schedule and account for any side effects that may arise from it, but it's a task of unknown complexity at the moment.
 
I’ll once again ask you to repost these as new threads so they properly appear on the news feed at the top of the forum. Changing visibility/moving to a different forum section makes it too easy to miss the updates for Vic 3 in particular.
We're aware and looking into it! There are some workflow kinks we have to sort out first.
 
I can confirm the public beta has seen some changes for the better. As I have mentioned before, I have a good PC, so I wasn't affected as badly as many, but the effects were still there. In my current Belgium game (up to the 1920s now) the performance is only slightly down, and the problem is not tick speed. At all. Sometimes the UI is slow for some reason, but that is not consistent. :)
 
There can still be individual UI panels or such that need to be tweaked, and we do this as we go along, but generally tick speed is favoured :)
 
I’m going to guess that this is the part most of you are interested in. There have been many improvements both large and small.
TBH, as a software engineer myself, I'm too busy being entranced by these flame graphs. :D

With that said, I'm glad to hear about what looks like squeezing around two to three seconds out of any week after pops start migrating. (I'm also fascinated that the improvement doesn't quite increase proportionally to the overall increase in tick time, but that's veering into the realm of me telling you how to do your job, which neither of us should want.)
 
Oh great Mr. Dev, do you perhaps have a fancy graph that tracks the number of pops over time? I have a general sense that the number of pops grows, but no idea of the extent.

I don't have a graph right now, but in general the game tends to start out at between 20k and 25k pops and usually end up somewhere around 100k pops towards the end in our automated tests. Since this is in observer mode the numbers will of course be somewhat different with a human player.


The vfx (currently) don't affect the CPU side that much; they mostly put strain on the GPU. The most CPU-intensive parts of the graphics update are changes to the dynamic terrain and to the city layouts in reaction to what happens in the game. Both are somewhat expensive, and we do some things both to parallelize them and to limit how much is allowed to update visually in a single frame. You can notice this, for example, when starting a new game: if you are very quick to zoom in, you'll see cities updating for a short while.
You know, this is funny, because four months ago I suggested adding statistics showing global population (along with global GDP) to Victoria 3. It would probably not be that helpful, though, because such statistics do not tell you how many pops there are as distinct groups/units. Still, it would be pretty cool to see a graph accompanying the global population number, to see whether it doubles, triples, or even quadruples. :p

Now that I think about it, I somehow attained a population of something like 100 million or more for Great Britain in a pre-1.1 game (played through to the end in 1936), even though in real life it only has 60 (almost 70) million today. :oops: Just to be clear, I came up with that number for the isles alone, excluding all states outside the metropole so as to avoid those states skewing the number.

EDIT: By the way, the population given for Great Britain in 1931-1951 (the period containing the end-game year) is over 46 million, but I should also note that the comparison is somewhat complicated because all the states in what is now the Republic of Ireland are still part of Great Britain in my game (mainly because I am too soft to actively oppress them. ;)).
 
The issue would be syncing it up with the economy update, and since that needs* to happen on the same tick everywhere in the world in order to get trade to work properly, it's tricky.

* "needs" is questionable here - it's more a matter of it being safer to do it at the same time, to pre-empt bugs and side effects. We could absolutely look into spreading out employment updates on a different schedule and account for any side effects that may arise from it, but it's a task of unknown complexity at the moment.
Willfully misunderstanding this comment, I can't help thinking that it would kind of simulate market communication friction for each state to be updated out of phase with the actual market and then have the market produce whatever it does based on the employment at market update time. On the other hand, and I assume this is your actual worry, there are enough ways to get employment to thrash between buildings as it is, and decoupling employment from market signals would almost certainly make this worse. It's fun to talk about simulating bad investments, but a lot of people (possibly myself included) would find it infuriating to actually play around.
 
You know, this is funny, because four months ago I suggested adding statistics showing global population (along with global GDP) to Victoria 3. It would probably not be that helpful, though, because such statistics do not tell you how many pops there are as distinct groups/units. Still, it would be pretty cool to see a graph accompanying the global population number, to see whether it doubles, triples, or even quadruples. :p

Now that I think about it, I somehow attained a population of something like 100 million or more for Great Britain in a pre-1.1 game (played through to the end in 1936), even though in real life it only has 60 (almost 70) million today. :oops: Just to be clear, I came up with that number for the isles alone, excluding all states outside the metropole so as to avoid those states skewing the number.

EDIT: By the way, the population given for Great Britain in 1931-1951 (the period containing the end-game year) is over 46 million, but I should also note that the comparison is somewhat complicated because all the states in what is now the Republic of Ireland are still part of Great Britain in my game (mainly because I am too soft to actively oppress them. ;)).

We do log this for all our automated tests. But since it's after office hours in Sweden I'm not logged in to my work machine and so I can't produce a graph of it right this moment. I'll see if I can dig one out tomorrow :)
 
The most CPU-intensive parts of the graphics update are changes to the dynamic terrain and to the city layouts in reaction to what happens in the game. Both are somewhat expensive

While I appreciate all your sweet graphics I'd love to play without the fancy map if that means I can squeak out a few more UPS... :/
 
The issue would be syncing it up with the economy update, and since that needs* to happen on the same tick everywhere in the world in order to get trade to work properly, it's tricky.

* "needs" is questionable here - it's more a matter of it being safer to do it at the same time, to pre-empt bugs and side effects. We could absolutely look into spreading out employment updates on a different schedule and account for any side effects that may arise from it, but it's a task of unknown complexity at the moment.

On the subject of employment checks: hearing that it is such a heavy calculation is concerning. One big issue I am running into is radicalism from people being fired from buildings. The biggest source of this is Urban Centers, which can grow to gargantuan proportions, so when they start oscillating, a lot of people get hired and fired, since hiring and firing work on fixed percentages. I am therefore guessing that putting in place a system that makes the swings smaller when a building starts oscillating is not feasible?
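The damping idea in this question resembles adaptive step-size schemes such as Rprop, which shrink the step whenever the direction of change flips. The following is a hypothetical Python sketch of that idea. It is not how Victoria 3 handles hiring, and every name and number in it is made up for illustration: a building moves toward its target workforce by a percentage of its capacity, and that percentage shrinks each time it reverses direction.

```python
def damped_hiring(targets: list[int], capacity: int, start: int = 0,
                  step: float = 0.1, shrink: float = 0.5,
                  min_step: float = 0.01) -> list[int]:
    """Workforce after each update when hiring/firing moves step * capacity toward
    the target, and the step shrinks every time the direction flips."""
    workforce = start
    last_direction = 0
    history = []
    for target in targets:
        direction = (target > workforce) - (target < workforce)  # +1 hire, -1 fire, 0 hold
        if direction != 0 and last_direction != 0 and direction != last_direction:
            step = max(min_step, step * shrink)  # oscillation detected: smaller swings
        change = min(abs(target - workforce), round(capacity * step))
        workforce += direction * change
        if direction != 0:
            last_direction = direction
        history.append(workforce)
    return history
```

Because each flip halves the swing, a building that keeps bouncing between hiring and firing settles down, while a building with a steady target still converges at the full rate.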
 
As we talked about above, the employment update has a significant impact on performance. Its cost is very strongly correlated with the number of pops in the game, meaning the number of pop objects, not the total population. Especially in the late game, you could end up with large numbers of tiny pops, which would make the employment update extremely slow. To alleviate this, the design team has tweaked how aggressively the game merges small pops, which should improve late game performance. For modders, this can be changed with the POP_MERGE_MAX_WORKFORCE and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION defines.
What about reducing the culture sprawl on top of this?
I think the V2 [continent] minor were a very viable concept
 
Two questions,

1. Could we have an option to turn off some visual parts of the game, like the queue, that hinder performance?
2. What are some parts of the game that had to be toned down due to the physical constraints on performance? For instance, finding the connected subtrees of the market that are disconnected from the main node seems to be doable in O(f(n)·α(n)) using the union-find data structure, where α(n) is the inverse Ackermann function and f(n) is the cost function of the market calculation. To my theory-oriented brain this seems feasible, but I suspect there will be practical implications.