
Stellaris Dev Diary #149 - Technical improvements

Hi everyone, this is Moah. I’m the tech lead on Stellaris and today I’m here to talk about the free 2.3 "Wolfe" update that will be arriving together with Ancient Relics, and what it brings to the table in terms of tech.

Stellaris is going 64 bits.
People have been clamoring for this for a while now, and various factors have led us to finally do this for this patch. I should temper your expectations though: while many have claimed that this would be a miracle cure for all their issues with Stellaris, the reality is somewhat more tame.

What does it mean?
The one solid benefit is that Stellaris is no longer limited to 4 GB of memory, and won’t crash anymore in situations where it was reaching that limit. For people who play on huge galaxies, with many empires, many mods or well into the 3000s, this will be a boon.

In terms of performance, though, it doesn’t change much. Without drowning you in technical details, let’s just say that some things go faster because you handle more data at once, some things go slower because you have more data to handle. In the end, our measurements have shown no perceptible difference.

Finally, the last effect of switching to 64 bits is that the game will no longer be playable on 32-bit computers or OSes. We don’t think this will affect many people, but there you have it.


What about Performance?
I know that’s everyone’s favourite question, so let’s do our best to talk about it. First, let me dispel some notions floating around in various forums: Stellaris does use multithreading, and we’re always on the lookout for new things to thread. In fact, between 2.2.0 and 2.2.7, a huge effort was made to thread jobs and pops, and it’s one of the main drivers of performance improvement between these versions.

Pops and jobs are indeed what’s consuming most of our CPU time nowadays. We’ve improved on that by reducing the number of jobs each pop evaluates. We’ve also found other areas where we were doing too much work, and cut down on:
  • Ships calculating their daily regeneration when they’re at full health
  • Off-screen icons being updated
  • Uninhabitable planets doing the same evaluations as populated planets
Why do these seemingly pointless things happen? Well, we generally focus on getting gameplay up and working quickly so that our content designers can iterate quickly, and sometimes things fall through the cracks. Some of these systems are also quite complex and the scale of the new code is not so easily apparent. Sometimes, not limiting the number of targets is good enough because you’re not doing much but then, months later, someone adds more calculations or the number of objects explodes for unrelated reasons, and suddenly you’ve got a performance issue.
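The first item in the list above comes down to an early-out check. Here is a minimal sketch in Python, assuming a hypothetical `Ship` record and `daily_regen` function (neither name comes from the game's code); it only illustrates skipping work for ships that have nothing to regenerate.

```python
from dataclasses import dataclass


@dataclass
class Ship:
    # Hypothetical fields, chosen only for this illustration.
    hull: float
    max_hull: float
    regen_per_day: float


def daily_regen(ships):
    """Apply one day of regeneration and return how many ships needed work."""
    evaluated = 0
    for ship in ships:
        if ship.hull >= ship.max_hull:
            continue  # full health: skip the calculation entirely
        evaluated += 1
        ship.hull = min(ship.max_hull, ship.hull + ship.regen_per_day)
    return evaluated


fleet = [Ship(100.0, 100.0, 5.0), Ship(90.0, 100.0, 5.0), Ship(98.0, 100.0, 5.0)]
print(daily_regen(fleet), [s.hull for s in fleet])
```

The saving per ship is tiny, but it is multiplied by every ship in the galaxy on every day tick.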

Modifiers
One thing that sets Stellaris apart from other PDS titles is how much we use (or abuse) modifiers. Everything is a modifier. Modifiers are modified by other modifiers, themselves modified by other modifiers, and sometimes by themselves. It’s quite hard to follow, and leads to every value being able to change at any time without your noticing.

“Why don’t you just compute jobs when a new one appears?” has often been asked around these parts. Well, a short answer to that is it’s really hard to know when a new job appears. You can get jobs from any modifier to: country, planet, pops. Each of these can get modifiers from ethics, traditions, perks, events, buildings, jobs, country, planets, pop, technology, etc.

Until now we were trying to calculate modifiers manually, forced to follow the chain in its entirety: when you recompute a country modifier, you then recalculate its planets’ modifiers, and then each planet recalculates its pops’ modifiers. Some of our freezes were just that tangled ball of yarn trying to sort itself out.

[Image: modifier flow chart]

This is our modifier flow chart. It’s not quite up to date, but it gives you an idea of the complexity of the system (unpolished because it’s a dev tool, and not made for the article).

No More!
For 2.3 “Wolfe” we have switched to a system of modifier nodes, where each node registers which nodes it follows and is recalculated when used, following the chain itself. We have modifiers that are more up to date, and calculated only when needed. This also reduces the number of pointless recalculations.
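To make the idea concrete, here is a minimal sketch of lazily recalculated nodes in Python. It is not the game's implementation: the `ModifierNode` class, the additive combination rule and the integer values are all assumptions made for illustration. A change only marks downstream nodes as stale, and the actual recalculation happens when a value is read.

```python
class ModifierNode:
    """A value that follows other nodes and is recomputed only when read."""

    def __init__(self, name, base=0, parents=()):
        self.name = name
        self.base = base
        self.parents = list(parents)  # nodes this node follows
        self.children = []            # nodes that follow this node
        self._cached = None
        self._dirty = True
        self.recalc_count = 0         # instrumentation for the example
        for parent in self.parents:
            parent.children.append(self)

    def set_base(self, value):
        self.base = value
        self._mark_dirty()

    def _mark_dirty(self):
        # Flag this node and everything downstream as stale; compute nothing.
        if self._dirty:
            return  # descendants of a stale node are already stale
        self._dirty = True
        for child in self.children:
            child._mark_dirty()

    def value(self):
        if self._dirty:
            # Assumed rule: a node's value is its base plus its parents' values.
            self._cached = self.base + sum(p.value() for p in self.parents)
            self._dirty = False
            self.recalc_count += 1
        return self._cached


country = ModifierNode("country", base=10)
planet = ModifierNode("planet", base=5, parents=[country])
pop = ModifierNode("pop", base=2, parents=[planet])

print(pop.value())       # first read walks the chain
country.set_base(20)     # cheap: only flags planet and pop as stale
print(pop.value())       # recomputed on demand
```

In this sketch, changing the country ten times in a row before anyone reads the pop costs ten flag updates and a single recalculation, which is the kind of pointless work the manual approach could not avoid.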

This system has shown remarkable promise, and cut the number of “big freezes” happening around the game (notably after loading, for example). It has some issues, but as we continue working with it, it’ll get better and help both with performance and our programmers’ sanity.

So, what’s the verdict?
In our tests, 2.3 “Wolfe” is between 10% and 30% faster than 2.2.7 right now. Hopefully it’ll stay that way until release, but the nature of the beast is that some of these optimizations break things and fixing the issues negates them, so we can’t promise anything.

[Image: performance measurement graph]


Measurements provided by @sabrenity , using detailed info from the beta build. It’s worth noting the “SHIPS_SERIAL” purple line has since been eliminated.

AI
Another forum favorite: we have made some improvements to the AI. First, with @Glavius ’s permission, we’ve used his job weights to improve general AI job distribution. We’ve also done the usual pass of polish and improvements, and of course taught the AI how to use all our new features.


What else is new?
We’re also getting a new crash reporter that will send your crash reports as soon as they happen rather than the next time you start the game. We’ve also improved our non-Steam network stack to address connectivity issues, etc.


All right, enough of my yammering. This has turned into a GRRM-length novel, and even though there are many more areas we could cover, we’ll just turn this over for your perusal.
 
Actually, in my country and other parts of the world we don't change our computers every time something new is out; we change them when they break. So from personal experience I assure you people are still using 32-bit CPUs more than 17 years old. I was only able to make the change to Windows 7 last year. Because of economic problems as well as import troubles and pay rates, if a family or even one person buys something, it is something they will use for a long time, a very long time.
Even Windows XP had a 64 bit version though.
 
But it would facilitate moving from short to long integers, and from single to double precision floats, right? By itself, using a 64b register won't fix overflows, but moving variables from 32b to 64b should, correct? And a 64b executable is a first step.
Well not really. One issue is, we don't use floating points (much) but have instead our own "decimal fixed point" for various reasons. Switching those to 64 bits is possible, it's been done on projects that are in development, but it would be a small change with massive consequences.
We would replace 3 very visible bugs with potentially hundreds of bugs we don't see immediately. We're also unsure what the performance impact would be.
In short, we are not planning to do this. We plan to fix the overflows instead.
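For readers unfamiliar with fixed-point numbers, here is a rough Python sketch of how a decimal fixed-point value stored in 32 bits can overflow. The scale of 1000 and the wraparound behaviour are assumptions for illustration; the post does not state how the game's type is laid out or what happens when it overflows.

```python
SCALE = 1000              # assumed: three decimal places
INT32_MAX = 2**31 - 1


def wrap_int32(raw):
    """Reduce an integer to the signed 32-bit two's-complement range."""
    return (raw + 2**31) % 2**32 - 2**31


def to_fixed(value):
    """Store a number as an integer count of 1/SCALE units."""
    return wrap_int32(round(value * SCALE))


def fixed_add(a, b):
    """Add two fixed-point values, wrapping like a 32-bit integer would."""
    return wrap_int32(a + b)


def to_number(raw):
    return raw / SCALE


# With this assumed scale, the largest storable value is about 2.1 million.
print(INT32_MAX / SCALE)

a = to_fixed(2_000_000)
b = to_fixed(200_000)
print(to_number(fixed_add(a, b)))  # wraps around to a large negative number
```

Widening the storage to 64 bits would move that ceiling far out of reach, but every piece of code that assumes the old size would have to be revisited, which is the trade-off described above.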
 
@Moah Thanks for the detailed technical info. The changes for 2.3 look really promising. After the bumpy update to 2.2 it seems to be going in the right direction again. And a special praise that you haven't been too vain to include improvements from Glavius in the game. :)
 
You don't need 64bit to use 64bit data types, it's just less efficient than it would be on a 64bit machine. IIRC a 32bit machine just simulates it using two 32bit values.
Sure, you just use PAE, but that, as you said, has its own problems, including performance issues, which runs counter to what the devs are trying for.
 

Reminds me of the song: 99 little bugs in the code, 99 little bugs. Take one down, patch it around, 127 bugs in the code.
 
Thanks for this. It might not be as "shiny shiny" as a feature diary, but it's just as important. Apart from being generally interesting to many, it's also great to see the results of deep dives into performance. You guys have always told us that performance is something you are constantly working on and (most of) us have never doubted that, but it's good to see a breakdown of what you've found, fixed, and achieved, so again, thank you for taking the time to write this up.
 
This looks very good and I'm looking forward to the 64-bit Stellaris and the other performance improvements.

Just one question:
There were other topics which pointed out that L-Gates, wormholes and other gates are also a major cause of performance loss, because of the pathfinding.
Were you also able to look into this one? Is the patch also addressing this?
 
But it would facilitate moving from short to long integers, and from single to double precision floats, right? By itself, using a 64b register won't fix overflows, but moving variables from 32b to 64b should, correct? And a 64b executable is a first step.

You can use short / int / long / long long / whatever regardless of whether you are compiling to a 32 bit or 64 bit executable. Switching to 64 bit will only affect integer sizes that directly depend on the target architecture, which depends on the programming language and compiler used, and whether they are using those types to begin with. (But it seems likely some of them were, thus why I was asking). Same with float / double / long double, which in C are usually sized independent of target architecture and which they apparently are not widely using anyway.
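The distinction between fixed-size and target-dependent types can be checked from Python with the standard `ctypes` module. This only reports the C type sizes for the platform the Python interpreter itself was built for; it says nothing about which types the game actually uses.

```python
import ctypes

# Types whose size is the same on common 32-bit and 64-bit targets.
fixed_size = {
    "int32": ctypes.sizeof(ctypes.c_int32),
    "int64": ctypes.sizeof(ctypes.c_int64),
    "float": ctypes.sizeof(ctypes.c_float),
    "double": ctypes.sizeof(ctypes.c_double),
}

# Types whose size follows the target architecture and data model.
target_dependent = {
    "long": ctypes.sizeof(ctypes.c_long),
    "size_t": ctypes.sizeof(ctypes.c_size_t),
    "void*": ctypes.sizeof(ctypes.c_void_p),
}

print(fixed_size)
print(target_dependent)
```

A 64-bit integer is 8 bytes on either kind of build, so code that already uses explicitly sized types sees no change in range from the switch; only pointer-sized and `long`-style types can differ.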

That said, the diplomacy (and to a lesser extent technology) overflows really need to be fixed one way or another, as they are not just visible but meaningful enough they've caused me to abandon games.
 
My opinion: even if people complain about performance, don't take it too seriously... On my laptop the game runs very smoothly, while Imperator seems like a slow-motion movie. So, congratulations.
 
@Moah The biggest lag I encounter in the game is when a new system is generated (e.g. completing precursor discoveries that make their home system spawn). The game then freezes for a little moment. Will there be improvements regarding this too?
Also, do you plan to try to improve loading times? If it's even possible of course.
 
For example, many hospitals in the US have systems that are still running MS 95 or older because the program can't run on newer OSes, including often things like their databases.
Because of the reliability requirements of that equipment. Some of it keeps people alive, so there's no place for couple-of-years-old-new-shiny-buggy-fashionable-stuff. At least you are not working with Ada ;)

There are pieces of medical equipment still in use that use assembler. Fucking Assembler.
Assembler is required when performance requirements are so strict that you have to count literally every tick in hard real time. This is the case when you need to process measurements from a ton of transducers into digital data.

You don't need 64bit to use 64bit data types, it's just less efficient than it would be on a 64bit machine. IIRC a 32bit machine just simulates it using two 32bit values.
Well, not a machine, but the code running on that machine.
Instead of a native, hardware-atomic "A + B" with a single overflow check on a 64-bit system, you need to sum the higher and lower parts separately, catch overflows from both (if any), handle the cases where an overflow did or did not occur, process possible overflows of that logic, and store somewhere all the data needed to run this emulation. It requires a lot more space for data and takes much longer to execute.
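The two-halves addition described here can be sketched in a few lines of Python, using masks to stand in for 32-bit registers. The helper names are made up for this example. For context, compilers targeting 32-bit x86 commonly lower a 64-bit add to an add followed by an add-with-carry instruction, so the real cost depends heavily on the operation and the hardware.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Split an unsigned 64-bit value into (low, high) 32-bit words."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    return (hi << 32) | lo


def add64_with_32bit_words(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held as pairs of 32-bit words."""
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32
    carry = lo_sum >> 32           # 1 if the low words overflowed
    hi_sum = a_hi + b_hi + carry
    hi = hi_sum & MASK32
    overflow = hi_sum >> 32        # 1 if the full 64-bit result overflowed
    return lo, hi, overflow


a = 0x00000001_FFFFFFFF
b = 1
lo, hi, overflow = add64_with_32bit_words(*split64(a), *split64(b))
print(hex(join64(lo, hi)), overflow)
```

Multiplication and division need noticeably more steps than this, which is where the emulation cost becomes more visible.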
 

Do you mind expanding a bit more on how the AI has been changed? :) Pops being in the wrong jobs for their traits is always frustrating to see. In addition, are there any plans to adjust pop growth based on available jobs? It makes sense that a planet with many science jobs would attract members of species with science traits.
 
There were other topics which pointed out that L-Gates, wormholes and other gates are also a major cause of performance loss, because of the pathfinding.
Were you also able to look into this one? Is the patch also addressing this?
We haven't worked on that (yet). Unfortunately this is a problem for which we don't have a good solution yet.
Our issue is that we have a cache that contains the distance from any system to any other system, and when you add/remove bypasses or systems we need to recalculate this cache. This is further compounded by the fact that not everyone has the same access to every bypass.
We have the "basic" cache which is only for hyperlane distances, and then we have a cache patch that adds distance through gateways accessible to that country. This country-specific cache needs to be emptied whenever a bypass gets added, and towards the end game, every country starts building gateways, leading to mass cache invalidations and reconstructions.
Add to that the fact that, invariably, the pathfinding itself becomes more complicated because you get many more ways to reach the same point.
Until we find a genius idea, I'm not sure we can do much to improve that. I've suggested removing gateways/wormholes/L-Gates but for some reason nobody likes it when I suggest we remove features. Go figure!
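The invalidation pattern described above can be illustrated with a small Python sketch. It is a simplification built on assumptions, not the game's code: the `DistanceCache` class is hypothetical, distances are plain hop counts, a bypass counts as one hop, and each country's table is rebuilt from scratch rather than patched on top of the basic cache.

```python
from collections import deque


class DistanceCache:
    def __init__(self, hyperlanes):
        # hyperlanes: dict mapping each system to its neighbouring systems
        self.hyperlanes = {s: set(n) for s, n in hyperlanes.items()}
        self.bypasses = []      # (system_a, system_b, countries with access)
        self.base = self._all_pairs(self.hyperlanes)  # hyperlanes only
        self.per_country = {}   # country -> table including its bypasses
        self.rebuilds = 0       # instrumentation for the example

    @staticmethod
    def _bfs(graph, start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    def _all_pairs(self, graph):
        return {s: self._bfs(graph, s) for s in graph}

    def add_bypass(self, a, b, countries):
        self.bypasses.append((a, b, frozenset(countries)))
        self.per_country.clear()  # every country-specific table is now stale

    def distance(self, country, src, dst):
        usable = [(a, b) for a, b, who in self.bypasses if country in who]
        if not usable:
            return self.base[src].get(dst)
        if country not in self.per_country:
            graph = {s: set(n) for s, n in self.hyperlanes.items()}
            for a, b in usable:
                graph.setdefault(a, set()).add(b)
                graph.setdefault(b, set()).add(a)
            self.per_country[country] = self._all_pairs(graph)
            self.rebuilds += 1
        return self.per_country[country][src].get(dst)


lanes = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
cache = DistanceCache(lanes)
cache.add_bypass("A", "E", {"empire1"})
print(cache.distance("empire1", "A", "E"), cache.distance("empire2", "A", "E"))
```

Even in this toy version, every new bypass throws away all country tables, and each rebuild is an all-pairs computation over the whole galaxy, which shows why late-game gateway construction gets expensive.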
 
@Moah The biggest lag I encounter in the game is when a new system is generated (e.g. completing precursor discoveries that make their home system spawn). The game then freezes for a little moment. Will there be improvements regarding this too?
I've just explained why this happens, and we don't have a good solution right now.

Also, do you plan to try to improve loading times? If it's even possible of course.
There's some improvement, but we weren't able to dedicate as much time to this as we wanted.
 
replace the pathfinding job application problem with this problem