
Victoria 3 - Dev Diary #76 - Performance


Hello and welcome to this week's Victoria 3 dev diary. This time we will be talking a bit about performance and how the game works under the hood. It will get somewhat detailed along the way, so if you are mostly interested in what has improved in 1.2, you can find that towards the end.

For those of you who don’t know me, my name is Emil and I’ve been at Paradox since 2018. I joined the Victoria 3 team as Tech Lead back in 2020, having previously worked in the same role on other projects.

What is performance

It’s hard to talk about performance without first having some understanding of what we mean by it. For many games it is mostly about how high an fps you can get without having to turn the graphics settings down too far. But with simulation-heavy games like the ones we make at PDS, another aspect comes into play: tick speed. This metric is not as consistently named across the games industry as fps is, but you might be familiar with the names Ticks Per Second or Updates Per Second from other games. Here I will instead be using the inverse metric: how long a tick takes to complete on average, in either seconds or milliseconds. Some graphs are from debug builds and some from release builds, so the numbers are not always directly comparable.
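Since tick time is just the reciprocal of ticks per second, converting between the two is a one-liner. A minimal sketch (the function names are illustrative, not from the game):

```python
def ms_per_tick(ticks_per_second: float) -> float:
    """Average tick duration in milliseconds for a given ticks-per-second rate."""
    if ticks_per_second <= 0:
        raise ValueError("ticks_per_second must be positive")
    return 1000.0 / ticks_per_second


def ticks_per_second(avg_ms_per_tick: float) -> float:
    """Ticks per second for a given average tick duration in milliseconds."""
    if avg_ms_per_tick <= 0:
        raise ValueError("avg_ms_per_tick must be positive")
    return 1000.0 / avg_ms_per_tick


print(ticks_per_second(50.0))  # 20.0
print(ms_per_tick(20.0))       # 50.0
```

Averaging tick durations rather than rates has a practical upside: the mean duration multiplied by the number of ticks is the total elapsed time, whereas averaging per-tick rates gives extra weight to the fastest ticks.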

What exactly a tick means in terms of in-game time varies a bit. In CK3 and EU4 a tick is a single day, while in HOI4 it's just one hour. In Victoria 3 a tick is six hours, or a quarter of a day. Not all ticks are equal, though. Some work doesn't need to happen as often as other work, so we divide the ticks into categories. In Victoria 3 we have yearly, monthly, weekly, daily, and (regular) ticks.
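To make the categories concrete, here is a small sketch of how one might decide which categories fire on a given tick. It assumes a 6-hour tick, a start date of 1 January 1836, and that weekly work runs every seventh day counted from the start date; the diary doesn't specify how weeks or months are aligned, so treat those rules as illustrative.

```python
from datetime import datetime, timedelta

TICK_LENGTH = timedelta(hours=6)   # one tick = a quarter of a day
START = datetime(1836, 1, 1)       # assumed start date, for illustration only


def tick_categories(tick_index: int) -> list[str]:
    """Return which (hypothetical) tick categories fire on a given tick."""
    now = START + tick_index * TICK_LENGTH
    categories = ["tick"]          # regular per-tick work always runs
    if now.hour != 0:
        return categories          # only the first tick of a day does more
    categories.append("daily")
    if (now - START).days % 7 == 0:
        categories.append("weekly")    # assumed: every 7th day since the start
    if now.day == 1:
        categories.append("monthly")
        if now.month == 1:
            categories.append("yearly")
    return categories


print(tick_categories(0))   # ['tick', 'daily', 'weekly', 'monthly', 'yearly']
print(tick_categories(28))  # ['tick', 'daily', 'weekly']
```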

If you thought 1.1 was slow you should have seen the game a year before release…

Content of a tick

Victoria 3 is very simulation driven and as such there is a lot of work that needs to happen in the tick. To keep the code organized we have our tick broken down into what we call tick tasks. A tick task is a distinct set of operations to perform on the gamestate along with information on how often it should happen and what other tick tasks it depends on before it is allowed to run.
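A tick task as described (a piece of work, a frequency, and dependencies) maps naturally onto a topological sort. The sketch below is hypothetical: the class and the frequencies and dependencies given to the example tasks are made up for illustration, though the task names are borrowed from this diary. It picks the tasks due on a tick and orders them so every task runs after the tasks it depends on, using Python's standard `graphlib` (3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TickTask:
    name: str
    frequency: str                 # "tick", "daily", "weekly", "monthly" or "yearly"
    depends_on: list[str] = field(default_factory=list)


def tasks_to_run(tasks, categories):
    """Names of the tasks due this tick, ordered so dependencies run first."""
    due = {t.name: t for t in tasks if t.frequency in categories}
    sorter = TopologicalSorter()
    for task in due.values():
        # Only order against dependencies that also run on this tick.
        sorter.add(task.name, *(d for d in task.depends_on if d in due))
    return list(sorter.static_order())   # raises graphlib.CycleError on cycles


# Task names borrowed from the diary; frequencies and dependencies are made up.
tasks = [
    TickTask("RecalculateModifierNodes", "tick"),
    TickTask("UpdatePopNeedCache", "daily", ["RecalculateModifierNodes"]),
    TickTask("UpdateEmployment", "weekly", ["UpdatePopNeedCache"]),
    TickTask("WeeklyCountryBudgetUpdateParallel", "weekly", ["UpdateEmployment"]),
]
print(tasks_to_run(tasks, {"tick", "daily"}))
# ['RecalculateModifierNodes', 'UpdatePopNeedCache']
```

A weekly tick would pass {"tick", "daily", "weekly"}, which is how the weekly tick ends up carrying all the daily and per-tick work as well.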

An overview of some of the tick tasks in the game. Available with the console command TickTask.Graph.

Many of the tick tasks are small things that just update one or a few values. Others, however, are quite massive. Depending on how often they run and which game objects they operate on, their impact on game speed will vary. One of the most expensive things in the game is the employment update, followed by the pop need cache update and the modifier update.

Top ten most expensive tick tasks in our nightly tests as of Feb 15. Numbers in seconds averaged from multiple runs using a debug build.

As you can see from the graph above, many of our most expensive tick tasks run on a weekly basis. This, combined with the fact that a weekly tick also includes all daily and tickly (every-tick) tasks, means a weekly tick usually ends up taking quite long. So let’s dive a bit deeper into what’s going on during a weekly tick. To do this we can use a profiler. One of the profilers we use here at PDS is Optick, an open-source profiler aimed mainly at game development.

Optick capture of a weekly tick around 1890 in a release build.

There’s a lot going on in the screenshot above, so let’s break it down a bit. On the left you see the names of the threads we are looking at. First there is the Network/Session thread, which is the main thread for the game logic. It’s responsible for running the simulation and acting on player commands. Next are the primary task threads. Their number varies from machine to machine, as the engine creates a different number of task threads depending on how many cores your CPU has; here I have artificially limited it to eight to make things more readable. Task threads are responsible for doing work that can be parallelized. Then we have the Main Thread. This is the initial thread created by the operating system when the game starts, and it is responsible for handling the interface and graphics updates. Then there is the Render Thread, which does the actual rendering, and finally the secondary task threads. These are similar to the primary ones, but are generally responsible for non-game-logic work like helping out with the graphics update or saving the game.

All the colored boxes with text in them are parts of the code that we’ve deemed interesting enough to show up in the profiler. For an even more in-depth look we could use a different profiler, like Superluminal or VTune, which would let us look directly at individual functions or even assembly.

The pink bars indicate a thread is waiting for something. For the task threads this usually means they are waiting for more work, while for the session thread it usually means it is blocked from modifying the game state because the interface or graphics updates need to read from it.

When looking at tick speed we are mostly interested in the session thread and the primary task threads. I’ve expanded the session thread here so we can see what is going on in the weekly tick. There are some things that stand out here.

First, there are the frequent red CScopedGameStateRelease blocks. These are where we take a break from updating to let the interface and graphics read the data they need in order to keep rendering as close to 60 fps as possible. This can’t happen just anywhere, though; it’s limited to the points between tick tasks or between certain steps inside them. This guarantees data consistency, so the interface doesn’t fetch data when, say, only half of a country’s budget has been updated.
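The point of restricting reads to release points is consistency. The single-threaded sketch below models it: reads requested while a (hypothetical) budget update is in progress are queued and only served at the release point between tasks, so every snapshot satisfies balance == income - expenses. The real CScopedGameStateRelease works across threads and its internals aren't described here; all names below are illustrative.

```python
from collections import deque


class GameState:
    def __init__(self) -> None:
        self.income = 0
        self.expenses = 0
        self.balance = 0            # invariant: balance == income - expenses


pending_reads: deque = deque()      # interface reads queued while updating
snapshots: list[tuple[int, int, int]] = []


def interface_read(state: GameState) -> None:
    """Hypothetical interface read: copy the values it wants to display."""
    snapshots.append((state.income, state.expenses, state.balance))


def release_point(state: GameState) -> None:
    """Stand-in for a scoped game-state release: queued readers run only here."""
    while pending_reads:
        pending_reads.popleft()(state)


def update_budget(state: GameState, income: int, expenses: int) -> None:
    state.income = income
    # A read requested mid-update must not see the new income with a stale
    # balance, so it is only queued and served at the next release point.
    pending_reads.append(interface_read)
    state.expenses = expenses
    state.balance = income - expenses


state = GameState()
for income, expenses in [(100, 40), (120, 90), (80, 10)]:
    update_budget(state, income, expenses)
    release_point(state)            # between tick tasks: safe to read
print(snapshots)  # [(100, 40, 60), (120, 90, 30), (80, 10, 70)]
```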

The next thing that stands out is, again, the UpdateEmployment tick task, just as in the graph above. Here we get a bit more information, though. At a glance we can see it’s split into (at least) two parts: one parallel and one serial. Ideally we want all work to be done in parallel, because that lets us better utilize modern CPUs. Unfortunately, not everything that happens during the employment update can be done in parallel, because it needs to perform global operations like creating and destroying pop objects and executing script. So we’ve moved as much of the work as possible into a parallel pre-step to keep the serial part small. There is actually a third step in between that can’t be seen here because it’s too quick: a sorting step. It ensures that the order in which the parallel work happens to finish can’t cause out-of-syncs between game clients in multiplayer games.
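The parallel/sort/serial pattern can be sketched with Python's `concurrent.futures`. Results from the parallel step arrive in whatever order threads finish, and sorting them by a stable key before the serial step makes the outcome identical on every machine. The pop ids and the `plan_employment` computation are placeholders, and because of Python's GIL this shows the ordering idea, not a speedup.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def plan_employment(pop_id):
    """Parallel pre-step: compute a decision without touching shared state."""
    time.sleep(random.random() / 1000)     # uneven work, so finish order varies
    desired_jobs = (pop_id * 37) % 11      # placeholder for the real computation
    return pop_id, desired_jobs


def update_employment(pop_ids):
    # 1. Parallel step: results arrive in whatever order threads finish.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(plan_employment, pid) for pid in pop_ids]
        results = [f.result() for f in as_completed(futures)]
    # 2. Sorting step: impose one fixed order so every client agrees.
    results.sort(key=lambda r: r[0])
    # 3. Serial step: apply changes that touch global state, in that order.
    applied = []
    for pop_id, desired_jobs in results:
        applied.append((pop_id, desired_jobs))  # stand-in for creating pops, running script
    return applied


print(update_employment(range(50)) == update_employment(range(50)))  # True
```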

Closer look at the UpdateEmployment tick task.

Modifiers are slow

One concept that’s common throughout PDS games is modifiers, and Victoria 3 is no exception. Quite the opposite: compared to CK3, our modifier setup is about an order of magnitude more complex. To manage this we use a system similar to the one in Stellaris, which we call modifier nodes. In essence it’s a dependency management system that lets us flag a modifier as dirty and then recalculate only that modifier and the modifiers that depend on it. This is quite beneficial, as recalculating a modifier is somewhat expensive.
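A minimal sketch of the dirty-flag idea, assuming each node's value is simply its base plus the values of its inputs (real modifier math is far richer, and the class here is hypothetical). Marking a node dirty propagates to everything downstream, and a recalculation pass skips clean nodes:

```python
class ModifierNode:
    """Hypothetical modifier node: its value is a base plus its inputs' values."""

    def __init__(self, name, base, inputs=()):
        self.name = name
        self.base = base
        self.inputs = list(inputs)
        self.dependents = []            # reverse edges, used to propagate dirtiness
        for node in self.inputs:
            node.dependents.append(self)
        self.value = 0
        self.dirty = True               # new nodes start out needing a calculation

    def mark_dirty(self):
        """Flag this node and everything downstream of it for recalculation."""
        if self.dirty:
            return                      # already flagged, so dependents are too
        self.dirty = True
        for dependent in self.dependents:
            dependent.mark_dirty()

    def recalculate(self, log):
        """Recompute only if dirty; clean nodes return their cached value."""
        if self.dirty:
            self.value = self.base + sum(node.recalculate(log) for node in self.inputs)
            self.dirty = False
            log.append(self.name)
        return self.value


def recalculate_all(nodes):
    log = []
    for node in nodes:
        node.recalculate(log)
    return log


a = ModifierNode("a", 1)
b = ModifierNode("b", 2)
c = ModifierNode("c", 0, [a, b])
d = ModifierNode("d", 5)
e = ModifierNode("e", 0, [c])
nodes = [a, b, c, d, e]

recalculate_all(nodes)              # first pass computes everything
a.base = 3
a.mark_dirty()                      # only a, c and e are affected
second_pass = recalculate_all(nodes)
print(second_pass)                  # ['a', 'c', 'e']
print(e.value)                      # 5
```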

However, this system used to be largely single-threaded, which meant a large part of our tick was still spent updating modifiers. If you look at the graph at the top of this dev diary, you can see that performance improved quite rapidly during early 2022. One of the main contributors to this was the parallelization of the modifier node calculations. Since we know which nodes depend on which, we can divide the nodes into batches where each batch only depends on previous batches, so all the nodes within a batch can be calculated in parallel.
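Splitting a dependency graph into such batches amounts to a layered topological sort. A sketch using the standard library's `graphlib`, where `get_ready()` hands back every node whose inputs are finished; the node names are placeholders:

```python
from graphlib import TopologicalSorter


def dependency_batches(depends_on):
    """Split nodes into batches where each batch only depends on earlier batches.

    `depends_on` maps a node to the nodes it reads from. Nodes inside one batch
    don't depend on each other, so they could be recalculated in parallel.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()                        # raises graphlib.CycleError on cycles
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # sorted only to make output stable
        batches.append(batch)
        sorter.done(*batch)                 # unlocks the next layer
    return batches


graph = {"c": {"a", "b"}, "e": {"c"}, "f": {"c", "d"}}
print(dependency_batches(graph))  # [['a', 'b', 'd'], ['c'], ['e', 'f']]
```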

Closer look at the RecalculateModifierNodes tick task.

Countries come in all sizes

A lot of the work in a tick needs to be done for every country in the world. But with the massive difference in scale between a small country like Luxembourg and a large one like Russia, some operations can take more than a hundred times as long for one country as for another. When work runs serially this doesn’t matter: all of it has to happen anyway, and the order doesn’t change the total. But once we start parallelizing, too many of the larger countries can end up on the same thread. Then, after all the other threads are done with their work, we still have to wait for that last thread to finish. To get around this we came up with a system where tick tasks can specify a heuristic cost for each part of the update. This lets us identify the parts that stand out, by checking the standard deviation of the expected computation times, and schedule them separately.

One place where this makes a large difference is the country budget update. Not having, say, China, Russia, and Great Britain all update on the same thread significantly reduces the time needed for the budget update.

(And this is also why the game runs slower during your world conquest playthroughs!)
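One way to sketch this, assuming per-country heuristic costs are already known: flag jobs whose cost exceeds the mean by more than one standard deviation as expensive, spread those across threads first, then fill in the affordable ones, always giving the next job to the least-loaded thread. The diary doesn't say exactly how outliers are scheduled, so the greedy assignment, the threshold `k`, and the cost numbers here are assumptions.

```python
import heapq
import statistics


def schedule(costs, num_threads, k=1.0):
    """Assign jobs to threads, scheduling statistical outliers separately.

    Jobs costing more than mean + k * stdev are "expensive" and placed first;
    "affordable" jobs then fill in. Each job goes to the least-loaded thread.
    """
    values = list(costs.values())
    cutoff = statistics.fmean(values) + k * statistics.pstdev(values)
    expensive = [job for job, cost in costs.items() if cost > cutoff]
    affordable = [job for job, cost in costs.items() if cost <= cutoff]

    threads = [[] for _ in range(num_threads)]
    loads = [(0.0, i) for i in range(num_threads)]   # min-heap of (load, thread index)
    for group in (expensive, affordable):
        for job in sorted(group, key=costs.__getitem__, reverse=True):
            load, i = heapq.heappop(loads)
            threads[i].append(job)
            heapq.heappush(loads, (load + costs[job], i))
    return threads


# Hypothetical heuristic costs: three giants and a dozen small countries.
costs = {"China": 120, "Russia": 110, "Great Britain": 100}
costs.update({f"minor_{i}": 2 for i in range(12)})
for i, jobs in enumerate(schedule(costs, num_threads=4)):
    print(i, jobs)
```

With these numbers China, Russia, and Great Britain each land on their own thread, and the small countries fill the remaining one.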

Closer look at the WeeklyCountryBudgetUpdateParallel tick task. Note the Expensive vs Affordable jobs.

Improvements in 1.2

I’m going to guess that this is the part most of you are interested in. There have been many improvements both large and small.

If you’ve paid attention to the open beta so far, you might have noticed some interface changes relating to the construction queue. With the way many people play the game, the queue can end up quite large. Unfortunately, the old interface here used a widget type that needs to compute the size of all its elements to lay them out properly, including the elements not visible on screen.
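The diary doesn't say which widget the new queue uses. A common remedy for this kind of problem is a virtualized list, which only lays out the rows that intersect the viewport, so the per-frame cost depends on the viewport size rather than the queue length. Assuming fixed-height rows, finding those rows is simple arithmetic (the function is a hypothetical sketch):

```python
def visible_rows(scroll_offset: float, viewport_height: float,
                 row_height: float, row_count: int) -> range:
    """Indices of rows intersecting the viewport, assuming fixed-height rows."""
    if row_count == 0 or row_height <= 0:
        return range(0)
    first = max(0, int(scroll_offset // row_height))
    last = min(row_count, int((scroll_offset + viewport_height) // row_height) + 1)
    return range(first, max(first, last))


# A 1000-entry queue, 40 px rows, 300 px viewport: only 8 rows need layout.
print(list(visible_rows(0, 300, 40, 1000)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```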

New construction queue interface.

To compound the issue even further, the queued constructions had a lot of dependencies on each other in order to compute things like time until completion. This too has been addressed and should be available in today’s beta build.
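As a simplified illustration of why such dependencies hurt: if each queued item's completion time is computed by re-summing everything ahead of it, the total cost grows quadratically with queue length, while a single running total does it in one pass. The model below assumes items are built strictly one after another at a fixed rate, which is much simpler than the game's actual construction rules; the numbers are hypothetical.

```python
from itertools import accumulate


def completion_times_naive(costs, rate):
    """O(n^2): every entry re-sums everything queued ahead of it."""
    return [sum(costs[: i + 1]) / rate for i in range(len(costs))]


def completion_times(costs, rate):
    """O(n): one running total shared by the whole queue."""
    return [total / rate for total in accumulate(costs)]


# Hypothetical queue: construction costs in points, built at 50 points per week.
queue = [100, 50, 200]
print(completion_times(queue, 50))   # [2.0, 3.0, 7.0]  (weeks until each finishes)
```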

Side by side comparison of old vs new construction queue.

One big improvement to tick speed is a consequence of changes we’ve made to our graphics update. Later in the game, updating the map could sometimes take a lot of time, which in turn led to the game logic having to wait a lot for the graphics update. There have been both engine improvements and changes to our game-side code to reduce the time needed for the graphics update. These include improving the threading of the map name update, optimizing the air entity update, and reducing the work needed to find out where buildings should show up in the city graphics.

Graphics update before/after optimization.

As mentioned above, the employment update has a significant impact on performance. Its cost is very strongly correlated with the number of pops in the game, meaning the number of pop objects, not the total population. Especially in the late game you could end up with large numbers of tiny pops, which made the employment update extremely slow. To alleviate this, the design team has tweaked how aggressively the game merges small pops, which should improve late-game performance. For modders, this can be changed with the POP_MERGE_MAX_WORKFORCE and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION defines.
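The diary doesn't spell out the exact merge rules, so the following is only a sketch of the general idea, with hypothetical parameters loosely modelled on the two defines: within a state, if a profession has enough pops, small pops that share culture and religion are folded into one. The object count drops while the total workforce stays the same.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass
class Pop:
    state: str
    profession: str
    culture: str
    religion: str
    workforce: int


def merge_small_pops(pops, max_workforce_to_merge, min_pops_same_profession):
    """Hypothetical merge pass: fewer pop objects, same total workforce."""
    groups = defaultdict(list)
    for pop in pops:
        groups[(pop.state, pop.profession)].append(pop)

    result = []
    for group in groups.values():
        if len(group) < min_pops_same_profession:
            result.extend(group)                 # too few pops to bother merging
            continue
        merged = {}
        for pop in group:
            key = (pop.culture, pop.religion)
            if pop.workforce > max_workforce_to_merge:
                result.append(pop)               # large pops are left alone
            elif key in merged:
                merged[key].workforce += pop.workforce
            else:
                merged[key] = replace(pop)       # copy so input pops aren't mutated
        result.extend(merged.values())
    return result


pops = [Pop("Lombardy", "peasants", "north_italian", "catholic", 10) for _ in range(4)]
pops.append(Pop("Lombardy", "peasants", "north_italian", "catholic", 5000))
merged = merge_small_pops(pops, max_workforce_to_merge=30, min_pops_same_profession=4)
print(len(pops), "->", len(merged))      # 5 -> 2
print(sum(p.workforce for p in merged))  # 5040
```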

Another improvement we’ve made for 1.2 is replacing how we do memory allocation in Clausewitz. While we’ve always had dedicated allocators for special cases (pool allocators, game object “databases”, etc.), a lot of allocations still ended up with the default allocator, which just deferred to the operating system, and that can be slow, especially on Windows. To solve this we now use a library called mimalloc. It’s a very performant memory allocator library and essentially a drop-in replacement for the functionality provided by the operating system. It’s already used by other large engines such as Unreal Engine. While not as significant as the two changes above, it made the game run around 4% faster when measured over a year about two-thirds of the way into the timeline. And since it’s an engine improvement, you will likely see it in CK3 as well at some point in the future.
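mimalloc itself is a C library that replaces the system allocator, so there is nothing meaningful to show of it at the Python level. To illustrate the "dedicated allocators for special cases" mentioned above, here is a sketch of a fixed-size pool allocator's bookkeeping: slots are set aside once and recycled through a free list, so allocating and freeing are constant-time stack operations. Python manages the real memory, so this models only the logic; the class is hypothetical.

```python
class FixedSizePool:
    """Bookkeeping of a fixed-size pool allocator (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.slots = [None] * capacity
        self.in_use = [False] * capacity
        # Stack of free slot indices; popping from the end hands out slot 0 first.
        self.free = list(range(capacity - 1, -1, -1))

    def allocate(self, obj) -> int:
        if not self.free:
            raise MemoryError("pool exhausted")
        index = self.free.pop()
        self.slots[index] = obj
        self.in_use[index] = True
        return index

    def release(self, index: int) -> None:
        if not self.in_use[index]:
            raise ValueError("slot is not allocated")
        self.slots[index] = None
        self.in_use[index] = False
        self.free.append(index)          # reused by the next allocate


pool = FixedSizePool(2)
a = pool.allocate("a")   # slot 0
b = pool.allocate("b")   # slot 1
pool.release(a)
c = pool.allocate("c")   # slot 0 again, recycled from the free list
```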

In addition to these larger changes, there have also been many small improvements that together add up to a lot. All in all, the game should be noticeably faster in 1.2 than in 1.1, as you can see in the graph below. Unfortunately, the 1.1 overnight tests weren’t as stable as those for 1.2, so for the sake of clarity I cut the graph off at 1871, but in general the performance improvements in 1.2 are even more noticeable in the late game.

Year by year comparison of tick times between 1.1 and 1.2 with 1.2 being much faster. Numbers are yearly averages from multiple nightly tests over several weeks using debug builds.

That’s all from me for this week. Next week Nik will present the various improvements done to warfare mechanics in 1.2, including the new Strategic Objectives feature.
 
Fun question: Why after deleting all pops, buildings and nations from map game won't go faster than 100 weeks per second? ;^)
 
This was a good read.

I'm seeing some good performance numbers in that last graph. However, the end of that graph suggests performance still slows down considerably over the course of the game. Visually, it looks like 1870s performance in 1.2 is about the same as 1855 performance in 1.1.

To me that suggests now is not the time to add extra goods, extra countries, extra cultures, extra religions, or go easy on pop merging. Is that your view too? Do you do any metrics on what performance impact would come with design changes like that before they are done? Or is performance a more reactive role?

Two questions,

1. Could there be an option to turn off some visual parts of the game, like the queue, that hinder performance?
I'm interested in this too, but for the map graphics (eg urban centers visually growing). For me having an option to turn this off would be a quick performance win as I don't care about that kind of thing. I've seen mods which turn the actual drawing parts off, but I assume they can't change the way the calculations needed beforehand are done.

Of course, if those kinds of calculations are now put in a quiet thread and done gradually (say 1/360th per tick, so they never end up on the critical path) then it's not important anymore.
 

An overview of some of the tick tasks in the game. Available with the console command TickTask.Graph.
Is this GraphViz using the `dot` layout engine? If yes, how did you get the edges to bend around nodes to avoid overlap?
 
Two questions,

1. Could there be an option to turn off some visual parts of the game, like the queue, that hinder performance?
2. What are some parts of the game that had to be turned down due to the physical constraints on performance? For instance, finding connected subtrees of the market that are disconnected from the main node seems to be doable in O(f(n) α(n)), where α(n) is the inverse Ackermann function (using the union-find data structure) and f(n) is the cost function of the market calculation. To my theory-oriented brain this seems feasible, but I suspect there will be practical implications.
1. If there are UI elements severely lowering the framerate (like the construction queue or specific lenses), we simply need to fix them.
2. Anything pathfinding is dodgy, an old iteration of Markets had each state trace market/infrastructure connections back to its market capital. Was dropped primarily for other reasons but would've been iffy for performance too.
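For reference, the union-find structure mentioned in the question looks like this. The sketch assumes states are numbered 0..n-1 and connections are undirected. One caveat: union-find has no cheap way to split sets, so connections that break would mean rebuilding the structure.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by rank.

    m operations on n elements take O(m * α(n)) time, where α is the
    inverse Ackermann function.
    """

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:                      # path compression
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:     # attach the shallower tree
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


# Hypothetical example: six states, market capital in state 0.
uf = UnionFind(6)
for a, b in [(0, 1), (1, 2), (3, 4)]:
    uf.union(a, b)
capital_root = uf.find(0)
connected = [s for s in range(6) if uf.find(s) == capital_root]
print(connected)  # [0, 1, 2]
```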
 
As a member of the <10 Standard of Living interest group which can only buy standard computers, I appreciate the work on performance very much.

Question: Is the highest tick speed still limited or can it tick as quickly as the computer finishes its calculations?
 
This was a good read.

I'm seeing some good performance numbers in that last graph. However, the end of that graph suggests performance still slows down considerably over the course of the game. Visually, it looks like 1870s performance in 1.2 is about the same as 1855 performance in 1.1.

To me that suggests now is not the time to add extra goods, extra countries, extra cultures, extra religions, or go easy on pop merging. Is that your view too? Do you do any metrics on what performance impact would come with design changes like that before they are done? Or is performance a more reactive role?


I'm interested in this too, but for the map graphics (eg urban centers visually growing). For me having an option to turn this off would be a quick performance win as I don't care about that kind of thing. I've seen mods which turn the actual drawing parts off, but I assume they can't change the way the calculations needed beforehand are done.

Of course, if those kinds of calculations are now put in a quiet thread and done gradually (say 1/360th per tick, so they never end up on the critical path) then it's not important anymore.
We have to consider new features from a performance perspective, yes. If the initial design would be problematic we can offer a compromise code solution that achieves virtually the same thing but at much lower costs. Good dialogue between design and programmers is key!
We can't micromanage everything ahead of time though so we still need to do performance passes after feature implementations etc.

1.2 should already have some hefty improvements to map graphics performance, but there are more things we could do. There are several stakeholders involved, though: artists have to agree to any changes so we don't kneecap the visuals completely for the sake of performance.
 
As a member of the <10 Standard of Living interest group which can only buy standard computers, I appreciate the work on performance very much.

Question: Is the highest tick speed still limited or can it tick as quickly as the computer finishes its calculations?
Speed 5 is unlimited/as fast as your computer goes
 
Setting POP_MERGE_MAX_WORKFORCE to 30000 instead of 30 and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION to 1 instead of 4 makes the game run quite a lot faster from my tests.

1.2.2 takes 6:30 minutes to reach 1838 in observation mode.

These settings bring it down to 5:43 minutes. And setting automatic saves to half a year instead of monthly makes it go down further to 5:26.

That is an improvement of 16.5% overall. Pretty good in my books.

Though setting the first define to 300k instead of 30k seems to reverse speed gains.

Think I'll post this as a mod later.
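A quick check of the timings above, treating the improvement as the reduction in wall-clock time relative to the 6:30 baseline:

```python
def to_seconds(mm_ss: str) -> int:
    """Parse an 'm:ss' duration into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


baseline = to_seconds("6:30")  # 390 s
for label, elapsed in [("merge defines", "5:43"), ("plus half-year autosaves", "5:26")]:
    saved = 1 - to_seconds(elapsed) / baseline
    print(f"{label}: {saved:.1%} less time")
# merge defines: 12.1% less time
# plus half-year autosaves: 16.4% less time
```

Expressed as speed instead, 390 s / 326 s is about 1.2, so the same run covers roughly 20% more game time per real second.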
 
Setting POP_MERGE_MAX_WORKFORCE to 30000 instead of 30 and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION to 1 instead of 4 makes the game run quite a lot faster from my tests.

1.2.2 takes 6:30 minutes to reach 1838 in observation mode.

These settings bring it down to 5:43 minutes. And setting automatic saves to half a year instead of monthly makes it go down further to 5:26.

That is an improvement of 16.5% overall. Pretty good in my books.

Though setting the first define to 300k instead of 30k seems to reverse speed gains.

Think I'll post this as a mod later.
I wonder if by that point there's some issue with that number being higher than the population of a lot of states/countries.
 
Well... it's good to know progress is being made, but there's still no ETA for when v1.2 will arrive? Every week you keep posting these work logs to buy time and patience from us?
Honestly, I don't care much about these reports of what has been done over time; this should have been done before the game's release.

The game runs into serious performance issues at and after 1910-ish; even for people with decent hardware, the game isn't just sluggish, it's unplayable. The problem is so serious that the dev team couldn't have missed it if they had even briefly played through the game. Treating pre-order customers who paid full price as beta testers, making them wait four months (the game released on Oct 24th) with still no fix for the issue: this is bad practice!
 
Am I correct in thinking that some 1.0.N version (1.0.6 maybe?) was faster than the following ones? Could you show curves for more versions than just 1.1 and 1.2?
 
Can we have a brief explanation of how pop merging works?

Also, what do the defines POP_MERGE_MAX_WORKFORCE and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION do?

It would also be nice if there was a modifier to allow assimilation/conversion in unincorporated states. I’ve found a good way to improve performance is to increase assimilation/conversion rates (reducing the number of late-game pops especially post-multiculturalism). However this would be much more effective if it applied to unincorporated states as well.


Thanks!
 
There can still be individual UI panels or the like that need to be tweaked, and we do this as we go along, but generally tick speed is favoured :)
I have two other cases that slowed down my games:
- Lots of provinces in turmoil (lots of UI icons with lots of data). Maybe statically allocate a separate "unrest array" of [province_id, bool_unrest] entries for every province, like [[id1, 0], [id2, 1], ...]; processing that could be faster than going through the whole province data. (I know that would make the data redundant, but I think it could speed up processing.) It would be used only for the UI.

- Lots of war fronts in different parts of the world. (Without looking at any stats, it seems like the UI initializes a lot of data that slows my game down.)

I hope this helps find possible performance issues and makes for a better game :)
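A sketch of the poster's suggestion: since province ids can serve as indices, the [province_id, flag] pairs can become a flat array with one byte per province. Whether this helps depends on where the UI actually spends its time, which the thread doesn't establish; the class is hypothetical.

```python
class UnrestFlags:
    """Hypothetical UI-side cache: one byte per province id, 1 = in turmoil."""

    def __init__(self, province_count: int) -> None:
        self.flags = bytearray(province_count)   # all zero: no turmoil

    def set_turmoil(self, province_id: int, in_turmoil: bool) -> None:
        self.flags[province_id] = 1 if in_turmoil else 0

    def provinces_in_turmoil(self) -> list[int]:
        return [pid for pid, flag in enumerate(self.flags) if flag]


flags = UnrestFlags(5)
flags.set_turmoil(1, True)
flags.set_turmoil(3, True)
flags.set_turmoil(1, False)
print(flags.provinces_in_turmoil())  # [3]
```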
 
Well... it's good to know progress is being made, but there's still no ETA for when v1.2 will arrive?
Planned release date is March 13th. I've seen it posted somewhere on the forums, but found confirmation on the Discord. (Note: planned means if something goes horribly wrong, it could still be delayed.)

Interesting that they're launching a day before the new Stellaris DLC. I thought usually Paradox spread out their releases to one game update per week, but I guess everyone is trying to get their Q1 update out all in March before it's not Q1 anymore.
 
Setting POP_MERGE_MAX_WORKFORCE to 30000 instead of 30 and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION to 1 instead of 4 makes the game run quite a lot faster from my tests.

1.2.2 takes 6:30 minutes to reach 1838 in observation mode.

These settings bring it down to 5:43 minutes. And setting automatic saves to half a year instead of monthly makes it go down further to 5:26.

That is an improvement of 16.5% overall. Pretty good in my books.

Though setting the first define to 300k instead of 30k seems to reverse speed gains.

Think I'll post this as a mod later.
Doing this will likely tank endgame performance as that is where the pop numbers become a real issue.
 
Thanks for the dev diary!
I really like the look behind the curtains of how the optimisations are done. Vic3 is obviously a pretty complex simulation so the process of optimising is definitely necessary. I'm personally also a fan of graphs.
 
Fun question: Why after deleting all pops, buildings and nations from map game won't go faster than 100 weeks per second? ;^)
100 weeks * 7 days/week * 4 ticks/day, that's 2800 ticks per second. I think that should be good enough ;)

Is this GraphViz using the `dot` layout engine? If yes, how did you get the edges to bend around nodes to avoid overlap?
It's our own implementation of the Sugiyama graph layout algorithm, based on a bunch of scientific papers. It's the same thing we use for the tech trees. The bending around nodes is achieved by inserting invisible virtual nodes at every layer along the edges.

This wikipedia article explains some of the concepts used: https://en.wikipedia.org/wiki/Layered_graph_drawing
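A sketch of the virtual-node step, assuming longest-path layering (one common choice; the thread doesn't say which layering Paradox uses). Every edge that spans several layers is split into one-layer hops through invisible nodes, which later stages can route around real nodes. Crossing reduction and coordinate assignment are omitted, and the function names are illustrative.

```python
from graphlib import TopologicalSorter


def assign_layers(edges):
    """Longest-path layering: each node sits one layer below its deepest parent."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(src, set())
        parents.setdefault(dst, set()).add(src)
    layer = {}
    for node in TopologicalSorter(parents).static_order():
        layer[node] = max((layer[p] + 1 for p in parents[node]), default=0)
    return layer


def insert_virtual_nodes(edges, layer):
    """Split edges spanning several layers into one-layer hops via virtual nodes.

    Adds the virtual nodes to `layer` in place and returns the new edge list
    together with the names of the virtual nodes.
    """
    new_edges, virtual = [], []
    for src, dst in edges:
        prev = src
        for depth in range(layer[src] + 1, layer[dst]):
            dummy = f"virtual:{src}->{dst}@{depth}"
            layer[dummy] = depth
            virtual.append(dummy)
            new_edges.append((prev, dummy))
            prev = dummy
        new_edges.append((prev, dst))
    return new_edges, virtual


edges = [("A", "B"), ("B", "C"), ("A", "C")]
layer = assign_layers(edges)
new_edges, virtual = insert_virtual_nodes(edges, layer)
print(virtual)  # ['virtual:A->C@1']
```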

Any plans for a native M1 version? I remember this was discussed in early development...
Yes, this is still on the table. We're still waiting for the engine team for some of the things needed so I can't give you a timeline, but it is being worked on.
 