
Victoria 3 - Dev Diary #76 - Performance


Hello and welcome to this week's Victoria 3 dev diary. This time we will be talking a bit about performance and how the game works under the hood. It will get somewhat detailed along the way and if you are mostly interested in what has improved in 1.2 then you can find that towards the end.

For those of you who don’t know me, my name is Emil and I’ve been at Paradox since 2018. I joined the Victoria 3 team as Tech Lead back in 2020 having previously been working in the same role on other projects.

What is performance

It’s hard to talk about performance without first having some understanding of what we mean by it. For many games it is mostly about how high fps you can get without having to turn the graphics settings down too far. But with simulation heavy games like the ones we make at PDS another aspect comes into play. Namely tick speed. This metric is not as consistently named across the games industry as fps is, but you might be familiar with the names Ticks Per Second or Updates Per Second from some other games. Here I will instead be using the inverse metric, or how long a tick takes on average to complete in either seconds or milliseconds. Some graphs will be from debug builds and some from release builds, so numbers might not always be directly comparable.

What exactly a tick means in terms of in game time varies a bit. In CK3 and EU4 a tick is a single day, while in HOI4 it's just one hour. For Victoria 3 a tick is six hours, or a quarter of a day. Not all ticks are equal though. Some work might not need to happen as often as other work, so we divide the ticks into categories. In Victoria 3 we have yearly, monthly, weekly, daily, and (regular) ticks.

If you thought 1.1 was slow you should have seen the game a year before release…
DD1.png

Content of a tick

Victoria 3 is very simulation driven and as such there is a lot of work that needs to happen in the tick. To keep the code organized we have our tick broken down into what we call tick tasks. A tick task is a distinct set of operations to perform on the gamestate along with information on how often it should happen and what other tick tasks it depends on before it is allowed to run.
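
To make the idea of a tick task concrete, here is a minimal Python sketch. It is not the actual engine code: the class, the frequency names and the three example tasks are all made up for illustration. It only shows a task carrying its frequency and its dependencies, and an ordering step that runs prerequisites first.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical frequency names, most frequent first.
FREQUENCIES = ["tick", "daily", "weekly"]

@dataclass
class TickTask:
    name: str
    frequency: str                                   # how often the task runs
    depends_on: list = field(default_factory=list)   # tasks that must run first

def tasks_due(tick_kind, tasks):
    """Names of the tasks to run on this kind of tick, prerequisites first."""
    level = FREQUENCIES.index(tick_kind)
    # A weekly tick also runs every daily and per-tick task.
    due = {t.name: t for t in tasks if FREQUENCIES.index(t.frequency) <= level}
    # Only keep dependencies that are themselves due on this tick.
    graph = {name: [d for d in t.depends_on if d in due] for name, t in due.items()}
    return list(TopologicalSorter(graph).static_order())

# Made-up tasks, only to show the shape of the data.
TASKS = [
    TickTask("update_budget", "weekly", ["update_needs"]),
    TickTask("update_needs", "daily", ["update_prices"]),
    TickTask("update_prices", "tick"),
]
```

Calling tasks_due("weekly", TASKS) returns all three tasks with update_prices first, while tasks_due("tick", TASKS) returns only update_prices.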

An overview of some of the tick tasks in the game. Available with the console command TickTask.Graph.
DD2.png

Many of the tick tasks are small things just updating one or a few values. On the other hand some of them are quite massive. Depending on how often they run and what game objects they operate on their impact on the game speed will vary. One of the most expensive things in the game is the employment update, followed by the pop need cache update and the modifier update.

Top ten most expensive tick tasks in our nightly tests as of Feb 15. Numbers in seconds averaged from multiple runs using a debug build.
DD3.png

As you can see from the graph above many of our most expensive tick tasks are run on a weekly basis. This combined with the fact that a weekly tick also includes all daily and tickly tick tasks means it usually ends up taking quite long. So let’s dive a bit deeper into what’s going on during a weekly tick. To do this we can use a profiler. One of the profilers we use here at PDS is Optick which is an open source profiler targeted mainly at game development.

Optick capture of a weekly tick around 1890 in a release build.
DD4.png

There’s a lot going on in the screenshot above so let’s break it down a bit. On the left you see the name of the threads we are looking at. First you have the Network/Session thread which is the main thread for the game logic. It’s responsible for running the simulation and acting on player commands. Then we have the primary task threads. The number will vary from machine to machine as the engine will create a different number of task threads depending on how many cores your cpu has. Here I have artificially limited it to eight to make things more readable. Task threads are responsible for doing work that can be parallelized. Then we have the Main Thread. This is the initial thread created by the operating system when the game starts and it is responsible for handling the interface and graphics updates. Then we have the Render Thread which does the actual rendering, and finally we have the secondary task threads. These are similar to the primary ones, but are generally responsible for non game logic things like helping out with the graphics update or with saving the game.

All the colored boxes with text in them are different parts of the code that we’ve deemed interesting enough to show up in the profiler. If we want an even more in-depth look we could instead use a different profiler like Superluminal or VTune, which would allow us to look directly at function level or even assembly.

The pink bars indicate a thread is waiting for something. For the task threads this usually means they are waiting for more work, while for the session thread it usually means it is blocked from modifying the game state because the interface or graphics updates need to read from it.

When looking at tick speed we are mostly interested in the session thread and the primary task threads. I’ve expanded the session thread here so we can see what is going on in the weekly tick. There are some things that stand out here.

First we have the commonly occurring red CScopedGameStateRelease blocks. These are when we need to take a break from updating to let the interface and graphics read the data they need in order to keep rendering at as close to 60 fps as possible. This can’t happen just anywhere though; it’s limited to in between tick tasks or between certain steps inside the tick tasks. This is in order to guarantee data consistency so the interface doesn’t fetch data when, say, just half the country budget has been updated.
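
The pattern can be sketched in Python with a lock and a context manager. This is an illustration under assumptions, not how CScopedGameStateRelease is implemented: the function names and the on_safe_point callback are hypothetical, and a single plain lock stands in for whatever the engine really uses. The point is that the simulation only lets go of the game state at points where it is consistent.

```python
import threading
from contextlib import contextmanager

@contextmanager
def scoped_game_state_release(lock):
    """Temporarily give up the lock so readers (interface, graphics) can take it."""
    lock.release()
    try:
        yield
    finally:
        lock.acquire()  # take the game state back before simulating further

def run_tick(tasks, lock, on_safe_point):
    """Run tick tasks while holding the lock, releasing it only between tasks."""
    with lock:
        for task in tasks:
            task()  # the game state may be half-updated while this runs
            # Safe point: the task is finished, so the state is consistent.
            with scoped_game_state_release(lock):
                on_safe_point()
```

Because the release only happens after a task has returned, a reader never sees a budget where the income has been updated but the expenses have not.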

The next thing that stands out is again the UpdateEmployment tick task just as seen in the graph above. Here we get a bit more information though. Just at a glance we can see it’s split into (at least) two parts. One parallel and one serial. Ideally we want all work to be done in parallel because that allows us to better utilize modern cpus. Unfortunately not all of the things going on during employment can be done in parallel because it needs to do global operations like creating and destroying pop objects and executing script. So we’ve broken out as much as possible into a parallel pre-step to reduce the serial part as much as possible. There is actually a third step in between here that can’t be seen because it’s too quick, but in order to avoid issues with parallel execution order causing out of syncs between game clients in multiplayer games we have a sorting step in between.
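
The three steps (parallel pre-step, sort, serial step) can be sketched as below. This is a simplified Python illustration, not the real employment code: buildings, open jobs and a single pool of unemployed workers are stand-ins chosen to keep the example short. Results are collected in completion order on purpose, since that order can differ from run to run and is what the sort has to neutralize.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def plan_hiring(building_id, open_jobs, unemployed):
    """Pre-step: a pure function of its inputs, so it is safe to run in parallel."""
    return building_id, min(open_jobs, unemployed)

def update_employment(buildings, unemployed, workers=4):
    """buildings maps building id -> open jobs. Returns hires per building."""
    # 1. Parallel pre-step: each building works out what it would like to hire.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(plan_hiring, b, jobs, unemployed)
                   for b, jobs in buildings.items()]
        plans = [f.result() for f in as_completed(futures)]  # arrival order varies
    # 2. Sort so every client applies the plans in the same order.
    plans.sort()
    # 3. Serial step: mutate shared state one plan at a time.
    hires = {}
    for building_id, wanted in plans:
        hired = min(wanted, unemployed)
        unemployed -= hired
        hires[building_id] = hired
    return hires
```

Without the sort, two multiplayer clients could hand the last available workers to different buildings and drift out of sync.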

Closer look at the UpdateEmployment tick task.
DD5.png

Modifiers are slow

One concept that’s common throughout PDS games is modifiers and Victoria 3 is no exception. Quite the opposite. Compared to CK3 our modifier setup is about an order of magnitude more complex. In order to manage this we use a system similar to the one in Stellaris, which we call modifier nodes. In essence it’s a dependency management system that allows us to flag a modifier as dirty and only recalculate it and the other modifiers that depend on it. This is quite beneficial as recalculating a modifier is somewhat expensive.

However, this system used to be very single threaded which meant a large part of our tick was still spent updating modifiers. If you look at the graph at the top of this dev diary you can see that performance improved quite rapidly during early 2022. One of the main contributors to this was the parallelization of the modifier node calculations. Since we know which nodes depend on which we can make sure to divide the nodes into batches where each batch only depends on previous batches.
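
As a rough Python sketch of dirty flagging plus batching (the node names and the function are hypothetical, and the real modifier node system is certainly more involved): starting from the nodes that changed, everything downstream is flagged dirty, and the dirty nodes are then grouped so that each batch only depends on earlier batches. Nodes within one batch could be recalculated in parallel.

```python
def recalculation_batches(depends_on, changed):
    """Return batches of dirty nodes; each batch depends only on earlier batches."""
    # Invert the graph so we can walk from a node to the nodes that use it.
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    # Flag the changed nodes and everything downstream of them as dirty.
    dirty, stack = set(changed), list(changed)
    while stack:
        for user in dependents.get(stack.pop(), ()):
            if user not in dirty:
                dirty.add(user)
                stack.append(user)

    # Peel off nodes whose dirty dependencies have all been recalculated.
    batches, done = [], set()
    while dirty - done:
        ready = sorted(n for n in dirty - done
                       if all(d in done or d not in dirty
                              for d in depends_on.get(n, ())))
        if not ready:
            raise ValueError("cycle in modifier dependencies")
        batches.append(ready)
        done.update(ready)
    return batches

# Made-up dependency graph: node -> nodes it depends on.
DEPENDS_ON = {
    "tax_income": ["tax_rate", "population"],
    "budget": ["tax_income", "upkeep"],
}
```

With this graph, changing tax_rate and upkeep gives three batches: the two changed nodes, then tax_income, then budget. The population node is never touched because nothing it depends on changed.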

Closer look at the RecalculateModifierNodes tick task.
DD6.png

Countries come in all sizes

A lot of the work going on in a tick needs to be done for every country in the world. But with the massive difference in scale between a small country like Luxembourg and a large one like Russia, some operations can take more than a hundred times as long for one country compared to another. When you do things serially this doesn’t really matter because all the work needs to happen and it makes no difference which one you do first. But when we start parallelizing things we can run into an issue where too many of the larger countries end up on the same thread. This means that after all the other threads are done with their work we still have to wait for this last thread to finish. In order to get around this we came up with a system where tick tasks can specify a heuristic cost for each part of the update. This then allows us to identify parts that stand out by checking the standard deviation of the expected computation time and schedule them separately.
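
A minimal Python sketch of that idea follows. The cost numbers, the threshold of one standard deviation above the mean and the chunk size are all assumptions made for the example; the actual heuristic is not described here. Countries whose cost stands out get a job of their own, and the rest are grouped into chunks.

```python
from statistics import mean, pstdev

def split_by_cost(costs, threshold=1.0):
    """Split names into (expensive, affordable) using mean + threshold * std dev."""
    cutoff = mean(costs.values()) + threshold * pstdev(costs.values())
    expensive = sorted((n for n, c in costs.items() if c > cutoff),
                       key=lambda n: -costs[n])  # most expensive first
    affordable = [n for n in costs if n not in expensive]
    return expensive, affordable

def build_jobs(costs, chunk_size=4, threshold=1.0):
    """One job per expensive item, then the affordable items in chunks."""
    expensive, affordable = split_by_cost(costs, threshold)
    jobs = [[name] for name in expensive]
    jobs += [affordable[i:i + chunk_size]
             for i in range(0, len(affordable), chunk_size)]
    return jobs

# Made-up heuristic costs, for illustration only.
COSTS = {
    "Luxembourg": 1, "Belgium": 4, "Sweden": 5, "Denmark": 3, "Greece": 2,
    "Portugal": 6, "Switzerland": 4, "Russia": 120, "China": 150,
    "Great Britain": 130,
}
```

Scheduling the expensive jobs first means the long-running countries start as early as possible and the small chunks fill in the remaining time on the other threads.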

One place where this makes a large difference is the country budget update. Not having say China, Russia, and Great Britain all update on the same thread significantly reduces the time needed for the budget update.

(And this is also why the game runs slower during your world conquest playthroughs!)

Closer look at the WeeklyCountryBudgetUpdateParallel tick task. Note the Expensive vs Affordable jobs.
DD7.png

Improvements in 1.2

I’m going to guess that this is the part most of you are interested in. There have been many improvements both large and small.

If you’ve paid attention to the open beta so far you might have noticed some interface changes relating to the construction queue. With the way many people play the game, the queue can end up quite large. Unfortunately the old interface here was using a widget type that needs to compute the size of all its elements to lay them out properly, including the elements not visible on screen.
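
One common way to avoid measuring off-screen elements is a list with a fixed row height, where the visible rows can be found with plain arithmetic. Whether the new construction queue works exactly like this is an assumption; the Python sketch below only illustrates the general technique, and the function and its parameters are hypothetical.

```python
def visible_rows(scroll_offset, viewport_height, row_height, row_count):
    """Indices of the rows that overlap the viewport, assuming equal row heights."""
    first = max(0, scroll_offset // row_height)
    # Ceiling division: the last row that is at least partly visible.
    last = min(row_count, -(-(scroll_offset + viewport_height) // row_height))
    return range(first, last)
```

For a queue of 1000 entries with 50 pixel rows and a 300 pixel viewport, only six or seven rows ever need to be laid out, no matter how long the queue gets.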

New construction queue interface.
DD8.png

To compound this issue even further the queued constructions had a lot of dependencies on each other in order to compute things like time until completion and similar. This too has been addressed and should be available in today’s beta build.

Side by side comparison of old vs new construction queue.
DD9.gif

DD10.gif

One big improvement to tick speed is a consequence of changes we’ve made to our graphics update. Later in the game updating the map could sometimes end up taking a lot of time, which in turn led to the game logic having to wait a lot for the graphics update. There have been both engine improvements and changes to our game side code here to reduce the time needed for the graphics update. Some things here include improving the threading of the map name update, optimizing the air entity update, and reducing the work needed to find out where buildings should show up in the city graphics.

Graphics update before/after optimization.
DD11.png

As we talked about above, the employment update has a significant impact on performance. This is very strongly correlated with the number of pops in the game, as in the number of objects, not the total population. Especially in the late game you could end up with large amounts of tiny pops which would make the employment update extremely slow. To alleviate this, design has tweaked how aggressively the game merges small pops, which should improve late game performance. For modders this can be changed with the POP_MERGE_MAX_WORKFORCE and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION defines.

Another improvement we’ve made for 1.2 is replacing how we do memory allocation in Clausewitz. While we’ve always had dedicated allocators for special cases (pool allocators, game object “databases”, etc) there were still a lot of allocations ending up with the default allocator, which just deferred to the operating system. Especially on Windows this can be slow. To solve this we now make use of a library called mimalloc. It’s a very performant memory allocator library and basically a drop-in replacement for the functionality provided by the operating system. It’s already being used by other large engines such as Unreal Engine. While not as significant as the two things above, it did make the game run around 4% faster when measured over a year about two thirds into the timeline. And since it’s an engine improvement you can likely see it in CK3 as well some time in the future.

In addition to these larger changes there have also been many small improvements that together add up to a lot. All in all the game should be noticeably faster in 1.2 compared to 1.1 as you can see in the graph below. Unfortunately the 1.1 overnight tests weren’t as stable as 1.2 so for the sake of clarity I cut the graph off at 1871, but in general the performance improvements in 1.2 are even more noticeable in the late game.

Year by year comparison of tick times between 1.1 and 1.2 with 1.2 being much faster. Numbers are yearly averages from multiple nightly tests over several weeks using debug builds.
DD12.png

That’s all from me for this week. Next week Nik will present the various improvements done to warfare mechanics in 1.2, including the new Strategic Objectives feature.
 
Speaking of performance - why does it take so long to shut down the game?

I mean, start up I can see having to unfold a lot of stuff, but after I have saved the game and exit to desktop, why does it take several minutes for the (non-beta) game to actually return memory and disappear? (Last time we're talking ten minutes or more, but that one was an outlier. Earlier it has been fewer minutes, but still several)
 
Are you sure that's not an issue that only you are having, and that the devs are even aware of this? That's bizarre. I have an outdated office laptop, so the performance is a pain, but it does shut down when asked in a matter of seconds.

I'd make an uneducated guess something's wrong with your HDD.
 
Do you have 8 GB of RAM and an HDD?
It seems like the game is using the Windows pagefile in that case, so it takes time to exit completely.
Workaround: first save the game, then force it to shut down.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Is it possible to implement an equivalent of Unity DOTS in this engine?
It claims to be very performant.
 
Just curious but could you take some of the more expensive parallel weekly tasks and start their calculations a few days prior and just pool the results to push on the weekly tick? I imagine the variation in accuracy from the base data being updated wouldn't be that noticeable and then we wouldn't have such a long freeze.
 
24 GB RAM, but yeah, Vicky only uses 8 GB of that. [Edit: Now I'm starting Vicky and it's using 11 GB+. It was just at the latest shutdown that it held on to 8 GB for a very long time, then handed memory back in roughly 2 GB increments, also over a long time.]

Either way, why does it need to pagefile for a century and a day during shutdown, when I can just "force shutdown"? (As I implied, I don't use the 'save on exit' function, since it's just slow anyway.)
 
Is it possible to implement equivalent of Unity DOTS stuff in this engine?
It claims to be very performant.
ECS like Unity DOTS is great for games with lots of entities that don't have that many dependencies on each other. It's the perfect fit for e.g. particle systems, and also fits very well for entities in an FPS or MMO. But in Victoria 3 everything depends on everything. We do make limited use of similar approaches where it makes sense, especially on engine level. But it unfortunately isn't a magic bullet we can apply to everything.

(And since I know the word engine is commonly misunderstood: The engine is responsible for providing basic functionality like rendering, input handling, access to the filesystem, steam integration, the multiplayer layer, the gui system, the script system etc. All game mechanic related things like countries, characters, pops etc are on game level and not engine level.)

Just curious but could you take some of the more expensive parallel weekly tasks and start their calculations a few days prior and just pool the results to push on the weekly tick? I imagine the variation in accuracy from the base data being updated wouldn't be that noticeable and then we wouldn't have such a long freeze.
The parallel tasks are internally parallel, not parallel with each other. In fact most of them depend on the result of one or more of the other tasks so moving them to separate ticks could lead to very different outcomes in behavior visible to the player.

There would also be an issue with letting tasks run across multiple ticks. In order to work, the task needs to operate on a consistent gamestate, and if multiple ticks happen during that time the data used for that task would end up being a mix of things. And taking a snapshot of the data isn't really an option due to the amount of RAM needed. We can't really store two copies of the gamestate. Finally it would also complicate saving the game a lot, as a save can happen between any two ticks. If a task is still in progress at this point you would need to store all the intermediate data somehow.
 
I know you guys have to develop the game for a super wide range of systems, but many of us who love Paradox games will actually change our hardware purchasing decisions if something makes a big difference in these games.

I'm really curious whether the game is aware of asymmetric CPU architectures, meaning Alder/Raptor Lake Intel CPUs (efficiency cores and performance cores) and the upcoming 7900X3D/7950X3D from AMD (different cache/clock on half the cores). Does the game have to do anything to indicate what core is better for a given thread? Or is that distribution completely out of the game's hands, so that it's up to the OS/drivers/etc?

With how well the AMD 5800X3D has performed in Paradox and other simulation games, I've been intending to buy a new 7000X3D after it's released in a few days here. I certainly might wait until we actually see some benchmarks. But I'm super curious whether this asymmetric CPU stuff is something game devs have to account for or if it's abstracted.
 
@egladil How much RAM are we theoretically talking? Something like 32-64GB, so far too expensive on the business side to lean on but still quite obtainable for a regular user, or something ridiculous like 128GB+?

The game itself isn't that big, but it's probably compressed in some form and an uncompressed full gamestate could be bigger.
 
He's had 1, or maybe 20, more cups of coffee than are considered safe for human consumption?
One, because a second would have spared me the typo...28x more ticks, depending on daily or weekly progress. I could even go with monthly for EU series.
 
I could even go with monthly for EU series.
There is a reason I don't play turn-based strategy any more.

(Turn-based tactical games? yepperoni. Turn-based strategy? Nope.)
 
Maybe I'm thinking about this too naively, but is there a reason not to completely separate the UI from the simulation in the background by maintaining a display copy and a work copy of the game data, running the simulation only on the work copy and updating the display copy after each tick? This way, it should be pretty hard for the UI and the simulation to affect each other's performance. I have a distinct memory from Stellaris where in the late game the UI would become very sluggish unless the game was paused.
 
Maybe I'm thinking about this too naively, but is there a reason not to completely separate the UI from the simulation in the background by maintaining a display and a work copy of the game data
CPU and RAM usage.
 
What is the effect of 3D V-cache on performance? Is Victoria one of those games that benefit from more cache in the CPU?
I'm sure it is. It's similar enough to other PDX games that there's no reason to think it doesn't benefit massively.

I am hoping though that since Vic3 is known to slow down so much late game, and with the new 3D cache Ryzens about to come out, that some people will do some benchmarks specifically of Vic3. So we'll know for sure, hopefully.
 
Sorry if this is a naive question, but why is the country the player plays more computationally intensive than the other countries? (I ask because that is how I understand that lagging might occur if you play a very big country.)
 