
Stellaris Dev Diary #149 - Technical improvements

Hi everyone, this is Moah. I’m the tech lead on Stellaris and today I’m here to talk about the free 2.3 "Wolfe" update that will be arriving together with Ancient Relics, and what it brings to the table in terms of tech.

Stellaris is going 64 bits.
People have been clamoring for this for a while now, and various factors have led us to finally do this for this patch. I should temper your expectations though: while many have claimed that this would be a miracle cure for all their issues with Stellaris, the reality is somewhat more tame.

What does it mean?
The one solid benefit is that Stellaris is no longer limited to 4 GB of memory, and won’t crash anymore in situations where it was reaching that limit. For people who play on huge galaxies, with many empires, many mods, or well into the 3000s, this will be a boon.

In terms of performance, though, it doesn’t change much. Without drowning you in technical details, let’s just say that some things go faster because you handle more data at once, some things go slower because you have more data to handle. In the end, our measurements have shown no perceptible difference.

Finally, the last effect of switching to 64-bit is that the game will no longer be playable on 32-bit computers or operating systems. We don’t think this will affect many people, but there you have it.


What about Performance?
I know that’s everyone’s favourite question, so let’s do our best to talk about it. First, let me dispel some notions floating around in various forums: Stellaris does use multithreading, and we’re always on the lookout for new things to thread. In fact, between 2.2.0 and 2.2.7, a huge effort was made to thread jobs and pops, and it’s one of the main drivers of performance improvement between these versions.

Pops and jobs are indeed what’s consuming most of our CPU time nowadays. We’ve improved on that by reducing the number of jobs each pop evaluates. We’ve also found other areas where we were doing too much work, and cut down on:
  • Ships calculating their daily regeneration when they’re at full health
  • Off-screen icons being updated
  • Uninhabitable planets doing the same evaluations as populated planets
Why do these seemingly pointless things happen? Well, we generally focus on getting gameplay up and working quickly so that our content designers can iterate quickly, and sometimes things fall through the cracks. Some of these systems are also quite complex, and the scale of the new code is not so easily apparent. Sometimes, not limiting the number of targets is good enough because you’re not doing much, but then, months later, someone adds more calculations or the number of objects explodes for unrelated reasons, and suddenly you’ve got a performance issue.
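As a purely illustrative sketch of the first item in that list (the actual ship update code isn’t shown here; names like Ship and daily_regeneration are made up), the kind of early-out involved looks roughly like this:

```cpp
#include <algorithm>

// Hypothetical ship regeneration update with an early-out for full-health ships.
struct Ship {
    double hull = 100.0;
    double max_hull = 100.0;
    double regen_per_day = 0.5;

    void daily_regeneration() {
        // Skip the regeneration math entirely when there is nothing to regenerate,
        // instead of computing a delta that would only be clamped back to zero.
        if (hull >= max_hull)
            return;
        hull = std::min(max_hull, hull + regen_per_day);
    }
};
```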

Modifiers
One thing that sets Stellaris apart from other PDS titles is how much we use (or abuse) modifiers. Everything is a modifier. Modifiers are modified by other modifiers, themselves modified by other modifiers, and sometimes by themselves. It’s quite hard to follow, and leads to every value being able to change at any time without your noticing.

“Why don’t you just compute jobs when a new one appears?” has often been asked around these parts. Well, the short answer is that it’s really hard to know when a new job appears. You can get jobs from any modifier applied to a country, a planet, or a pop. Each of these can get modifiers from ethics, traditions, perks, events, buildings, jobs, country, planets, pop, technology, etc.

Until now we were trying to calculate modifiers manually, forced to follow the chain in its entirety: when you recompute a country’s modifiers, you then recalculate its planets’ modifiers, and then each planet recalculates its pops’ modifiers. Some of our freezes were just that tangled ball of yarn trying to sort itself out.
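To make that concrete, here is a heavily simplified, hypothetical picture of what such an eager cascade looks like (these are not the actual engine classes): touching a country forces every planet, and every pop on every planet, to recompute, whether or not anything they depend on actually changed.

```cpp
#include <vector>

// Hypothetical eager cascade: recalculating a country's modifiers walks the
// whole ownership chain down to every pop.
struct Pop {
    void recalc_modifiers() { /* job weights, output, upkeep, ... */ }
};

struct Planet {
    std::vector<Pop> pops;
    void recalc_modifiers() {
        for (Pop& p : pops) p.recalc_modifiers();
    }
};

struct Country {
    std::vector<Planet> planets;
    void recalc_modifiers() {
        for (Planet& pl : planets) pl.recalc_modifiers();
    }
};
```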


This is our modifier flow chart. It’s not quite up to date, but it gives you an idea of the complexity of the system (unpolished because it’s a dev tool, and not made for the article).

No More!
For 2.3 “Wolfe” we have switched to a system of modifier nodes, where each node registers which nodes it follows and is recalculated when used, following the chain itself. We now have modifiers that are more up to date and calculated only when needed. This also reduces the number of pointless recalculations.
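As a rough sketch of what such a node system can look like (a simplified illustration, not the actual engine code; class and member names are invented), each node registers its dependencies, gets marked stale when something upstream changes, and only recomputes when its value is actually read:

```cpp
#include <functional>
#include <vector>

// Simplified lazy modifier node: recalculated on demand, only when stale.
class ModifierNode {
public:
    explicit ModifierNode(std::function<double()> compute)
        : compute_(std::move(compute)) {}

    // Register a node we follow; when it changes, we become stale too.
    void depends_on(ModifierNode& parent) { parent.children_.push_back(this); }

    // Called when this node's inputs change; marks all downstream nodes stale.
    void mark_stale() {
        if (stale_) return;  // already marked, stop the walk early
        stale_ = true;
        for (ModifierNode* child : children_) child->mark_stale();
    }

    // Recalculated only when the value is actually used.
    double value() const {
        if (stale_) {
            cached_ = compute_();
            stale_ = false;
        }
        return cached_;
    }

private:
    std::function<double()> compute_;
    std::vector<ModifierNode*> children_;
    mutable double cached_ = 0.0;
    mutable bool stale_ = true;
};
```

A country-level node would then sit upstream of its planets’ nodes, and those upstream of their pops’ nodes; a change only marks the affected subtree stale instead of recomputing the whole chain on the spot.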

This system has shown remarkable promise, and cut the number of “big freezes” happening around the game (notably after loading). It has some issues, but as we continue working with it, it’ll get better and help both with performance and our programmers’ sanity.

So, what’s the verdict?
In our tests, 2.3 “Wolfe” is between 10% and 30% faster than 2.2.7 right now. Hopefully it’ll stay that way until release, but the nature of the beast is that some of these optimizations break things and fixing the issues negates them, so we can’t promise anything.



Measurements provided by @sabrenity, using detailed info from the beta build. It’s worth noting the “SHIPS_SERIAL” purple line has since been eliminated.

AI
Another forum favourite: we have made some improvements to the AI. First, with @Glavius’s permission, we’ve used his job weights to improve general AI job distribution. We’ve also done the usual pass of polish and improvements, and of course taught the AI how to use all our new features.


What else is new?
We’re also getting a new crash reporter that will send crash reports as soon as they happen, rather than the next time you start the game. We’ve also improved our non-Steam network stack to address connectivity issues, etc.


All right, enough of my yammering. This has turned into a GRRM-length novel, and even though there are many more areas we could cover, we’ll just turn this over for your perusal.
 
Hello
I know it may not have a lot in common with this Dev Diary, but I had an idea to add into Stellaris.
I have been playing Stellaris for a year and a half now, and I think that megastructures, apart from the enormous bonuses they provide, do not offer a lot of gameplay. So I have thought about adding a colony-like view to megastructures. When I thought about them, the first thing that came to my mind was how many people can fit in there, and in the example of a scientific megastructure, where do all the scientists that work in there come from? So (without including ring worlds and habitats), I have thought about transforming all megastructures into special planet types, to return to the example of a scientific megastructure, with a specific job type (in this case, scientists). I think it will add more realism to megastructures and a new range of possibilities with them.
Thanks for considering my idea.
Does a megastructure need a population in the hundreds of millions to run? That's a rough estimate of the pop size for a "mature" planet. If they have multiple pops, then you could be getting on for having billions of people on board the megastructure.
 
Stellaris is going 64 bits.
In terms of performance, though, it doesn’t change much.
Yay!
Though I'm surprised no drastic performance changes happened. Moderate performance shifts are common, though they can go in either direction.

All right, enough of my yammering. This has turned into a GRRM-length novel, and even though there are many more areas we could cover, we’ll just turn this over for your perusal.

There is a glaring question of ... questionable quality of the galaxy generators and their poor moddability. Could you (the dev team) look into it and maybe disclose some data for the modders?
 
Yay!
Though I'm surprised no drastic performance changes happened. Moderate performance shifts are common, though they can go in either direction.
I would imagine there would be more improvements coming as they get the game more optimized for 64-bit later on. But a step is a step, and this is a pretty important one.
 
@Guraan
Could you get a significant boost by ditching the necessity to calculate optimal routes for long distances? If we're sending fleets halfway across the galaxy, most people won't even notice if the calculated route was a couple of jumps longer than an optimal one. To use a real-world map as an example: if you wanted to go from Stockholm to Berlin, according to Google Maps you'd use a route through Jönköping. But if we came up with one through Göteborg instead, it wouldn't be too bad. Only local routes, for example within a cluster, would need to be optimal.
In some cases yes, we can get away with it. Not sure if it will boost performance at large, since it will just be in a few cases that we do a full pathfind, it's just a lot of them lol

Yes it is. To be honest, I still don't know why you need all distances for your pathfinding algorithm.
I understand that distances are needed for the AI to strategize: what systems are of interest, how far away an enemy fleet is, etc.
Well, first of all, you underestimate the number of systems that want to know the distance between them and X; yes, we usually cheat here and do not care about hyperlanes/wormholes at all.

But for pathfinding, all you need is a data structure with all connections and distances. It could be a different (simplified) data structure that is updated less often than the "distance cache", which could then be updated with lower priority.
Nah, you need to check if a connection or node is valid for that jump (in these cases); that check is usually more expensive in reality than the constant C is in theoretical computer science, due to mem lookups and other cache misses lol
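A toy illustration of that point (all names invented for the example): even a simple "cheapest usable neighbour" loop pays for a validity lookup per edge, and in the real game that lookup touches border, bypass and ownership state scattered around memory.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

struct Edge {
    std::uint32_t to;
    float cost;
    bool closed = false;  // stand-in for closed borders, missing bypass access, etc.
};

// Stand-in for the real check, which consults other game state per edge.
bool is_passable(std::uint32_t /*empire*/, const Edge& e) { return !e.closed; }

float cheapest_usable_neighbour(std::uint32_t empire, const std::vector<Edge>& edges) {
    float best = std::numeric_limits<float>::infinity();
    for (const Edge& e : edges) {
        if (!is_passable(empire, e))  // the expensive part in practice: extra memory lookups
            continue;
        best = std::min(best, e.cost);
    }
    return best;
}
```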
 
In some cases yes, we can get away with it. Not sure if it will boost performance at large, since it will just be in a few cases that we do a full pathfind, it's just a lot of them lol
That's not what I meant. From what I understood from the previous dev comments, it's not the pathfinding itself that's taking its toll but frequent and rather expensive cache invalidations.
But if the cache consisted of clusters which received independent invalidations, they would be cheaper to recalculate, with the possible drawback that the global picture would no longer be guaranteed to be optimal.
To stick with my example, if your optimal route goes through the Jönköping cluster and some system on the route becomes blocked, you can either do a detour within the cluster or avoid it completely by going via Göteborg.
But which one of those is actually best is not that important if they are at least close enough. What matters is that you only invalidate part of the cache.
 
Well, first of all, you underestimate the number of systems that want to know the distance between them and X; yes, we usually cheat here and do not care about hyperlanes/wormholes at all.
If we assume a galaxy size <= ~1000 and a number of empires < 50, then a static per-empire distance matrix will take:
(2 bytes per distance in jumps + 2 bytes per star id of the next jump) * 1000^2 (number of star pairs in the galaxy) * 50 (empires) ~ 200 MB. Totally worth storing them if this is actually a bottleneck, isn't it?

This estimate may be improved considerably if you accept some limitations on the topology of the hyperlane graph (fully connected constellations, for example).
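For what it's worth, a quick back-of-the-envelope check of that figure (the 2-byte fields, 1000 stars and 50 empires are the assumptions stated above, not actual engine values):

```cpp
#include <cstdio>

int main() {
    constexpr long long stars           = 1000;   // assumed galaxy size
    constexpr long long empires         = 50;     // assumed number of empires
    constexpr long long bytes_per_entry = 2 + 2;  // distance in jumps + next-hop star id
    constexpr long long total_bytes     = bytes_per_entry * stars * stars * empires;
    std::printf("%lld bytes (~%lld MB)\n", total_bytes, total_bytes / 1000000);  // ~200 MB
    return 0;
}
```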
 
Well, first of all, you underestimate the number of systems that want to know the distance between them and X; yes, we usually cheat here and do not care about hyperlanes/wormholes at all.

OK, if you cheat and only check the distance for most systems that need it, why don't you calculate the distance between star coordinates? Then multiply it by some "variable" to simulate several jumps between star1 and star2 - the bigger the distance, the higher the variable would be.

Distance(Star1, Star2) = sqrt((x2 − x1)^2 + (y2 − y1)^2) * variable

Wouldn't it be faster for those systems to calculate?

I know that in some cases it would be too much of a simplification - for stars very close to each other but not connected. Then you could add a higher multiplier - just check if there is a direct connection between those close stars; if not, give a higher multiplier to simulate 1-2 jumps between those stars.
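Something like the following is presumably what is being proposed (a sketch only; the multiplier values are arbitrary and the real "variable" would need tuning against actual jump counts):

```cpp
#include <cmath>

struct Position { double x, y; };

// Straight-line distance scaled by a fudge factor to approximate jump distance.
double approx_jump_distance(Position a, Position b, bool directly_connected) {
    double straight_line = std::hypot(b.x - a.x, b.y - a.y);
    // Close but unconnected stars get a larger multiplier to fake the extra 1-2 jumps.
    double multiplier = directly_connected ? 1.0 : 1.5;
    return straight_line * multiplier;
}
```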

So what do the other "systems" need other than a simplified distance? The number of jumps, or the true distance calculated with pathfinding? And what are those systems?


Nah, you need to check if a connection or node is valid for that jump (in these cases); that check is usually more expensive in reality than the constant C is in theoretical computer science, due to mem lookups and other cache misses lol

Agreed that a connection or node should be validated to check whether this empire/ship can go through it.
But you have to do it for pathfinding anyway, so for sure you do such a check. I can't imagine you use the simplified distance for path calculation, because it is just a distance, without nodes, without validations.

Only "systems" that need true and accurate distance should use calculation with pathfinding and take into account validations. I can imagine (but might be mistaken) only one systems that need it - when ship sets path to another system.
I hope such checks do not happen for each ship from each empire many times in a game month; that would not be good.
When the AI wants to check if it should send a fleet from Star1 to Star2, it should check the simplified distance; if the result is "ok", then you calculate the true distance with validations, and if it is still ok, the ship is sent to the target.
If such pathfinding calculations are done dozens of times per month per empire, maybe this would be more resource-friendly.
 
That's not what I meant. From what I understood from the previous dev comments, it's not the pathfinding itself that's taking its toll but frequent and rather expensive cache invalidations.
But if the cache consisted of clusters which received independent invalidations, they would be cheaper to recalculate, with the possible drawback that the global picture would no longer be guaranteed to be optimal.
Nah, I do not see the pathfinding as a major culprit in performance right now; there are some conditions that will recalc it excessively, but there we also have some logic for doing a lazy recalc from script.
To stick with my example, if your optimal route goes through the Jönköping cluster and some system on the route becomes blocked, you can either do a detour within the cluster or avoid it completely by going via Göteborg.
But which one of those is actually best is not that important if they are at least close enough. What matters is that you only invalidate part of the cache.
Yes, I love the comparison, but the problem still remains. For a 1000-star galaxy, an optimal heuristic might be worse than none at all.
If we assume a galaxy size <= ~1000 and a number of empires < 50, then a static per-empire distance matrix will take:
(2 bytes per distance in jumps + 2 bytes per star id of the next jump) * 1000^2 (number of star pairs in the galaxy) * 50 (empires) ~ 200 MB. Totally worth storing them if this is actually a bottleneck, isn't it?
Distance is actually 4 bytes (do not ask me why). Storing them is no prob; doing a lookup in that huge-ass table "might" give some expensive cache misses, on the other hand.
 
Distance is actually 4 bytes (do not ask me why). Storing them is no prob; doing a lookup in that huge-ass table "might" give some expensive cache misses, on the other hand.
Potentially one pair of bytes for distance in the plane and one for distance vertically?

Or it's just *really* precise?
 
I can't believe someone okay'd a space 4X game that runs large scenarios on 4 GB. Take a look at Galactic Civilizations 3. It's less performance-intensive than Stellaris but still suggests up to 32 GB for larger maps.
 
Does being turn-based make it use more RAM?

It probably accounts for at least some of the "less resource intensive" nature of the game.

And hey...
Guess what the required and recommended amounts of RAM are when you look at the Steam page?

4 GB required, 6 GB recommended.
So GalCiv 3 is a 4X that runs large maps on... ...wait for it... ...4GB of RAM.
 
We haven't worked on that (yet). Unfortunately this is a problem for which we don't have a good solution yet.
Our issue is that we have a cache that contains the distance from any system to any other system, and when you add/remove bypasses or systems we need to recalculate this cache. This is further compounded by the fact that not everyone has the same access to every bypass.
We have the "basic" cache which is only for hyperlan distances, and then we have a cache patch that adds distance through gateways accessible to that country. This country specific cache needs to be emptied whenever a bypass gets added, and towards the end game, every country starts building gateways, leading to mass cache invalidations and reconstructions.
Add to that that, invariably, the pathfinding itself becomes more complicated because you get many more ways to reach the same point.
Until we find a genius idea, I'm not sure we can do much to improve that. I've suggested removing gateways/wormholes/l-gates, but for some reason nobody likes it when I suggest we remove features. Go figure!
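In outline, the two-layer cache described above might look something like this (class and member names are hypothetical; the point is that a bypass change only throws away one country's overlay, not the shared hyperlane layer):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using SystemId  = std::uint32_t;
using CountryId = std::uint32_t;

struct SystemPair {
    SystemId from, to;
    bool operator==(const SystemPair& o) const { return from == o.from && to == o.to; }
};

struct SystemPairHash {
    std::size_t operator()(const SystemPair& p) const {
        return std::size_t(p.from) * 1000003u ^ p.to;
    }
};

struct DistanceCache {
    // Base layer: hyperlane-only distances, shared by every country.
    std::unordered_map<SystemPair, float, SystemPairHash> hyperlane_distance;

    // Per-country overlay: distances that also account for that country's bypasses.
    std::unordered_map<CountryId,
        std::unordered_map<SystemPair, float, SystemPairHash>> bypass_overlay;

    // When a country gains or loses access to a bypass, only its overlay is
    // invalidated; the hyperlane layer stays valid.
    void on_bypass_changed(CountryId country) { bypass_overlay.erase(country); }
};
```

Even with that split, the problem the post describes remains: towards the end game, every country is triggering overlay rebuilds at roughly the same time.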
So as pathfinding gets harder with gateways, the problem might have a very easy solution: start the calculations from both the starting location AND the target location, and STOP whenever pathing hits a gateway first; as every gateway is connected to every other gateway, there is no point in doing stupid amounts of calculations from then on. Now connect the two and you are done. This does not yet consider the fact that there might be a better way without gateways, but that can apparently be done easily enough anyway, as problems with pathfinding do not appear before there are a lot of gateways: simply do either a second calculation for paths that ignore all gateways, or continue one of the other two calculations already going on while ignoring the gateways.

Edit: worst case, the amount of calculation triples when just doing each of the 3 calculations separately, plus comparing which is shorter and of course checking for open gateways, which should be negligible.
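Reduced to its core, the comparison being suggested is something like this (purely a sketch of the idea, with the three partial results assumed to come from the separate searches described above):

```cpp
#include <algorithm>

// Combine the three proposed searches: source-to-nearest-open-gateway,
// target-to-nearest-open-gateway (the gateway network itself counted as one
// hop), and a plain search that ignores gateways entirely.
float best_route_length(float src_to_gateway, float dst_to_gateway,
                        float gateway_hop_cost, float no_gateway_route) {
    float via_gateways = src_to_gateway + gateway_hop_cost + dst_to_gateway;
    return std::min(via_gateways, no_gateway_route);
}
```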
 
So as pathfinding gets harder with gateways, the problem might have a very easy solution: start the calculations from both the starting location AND the target location, and STOP whenever pathing hits a gateway first; as every gateway is connected to every other gateway, there is no point in doing stupid amounts of calculations from then on. Now connect the two and you are done. This does not yet consider the fact that there might be a better way without gateways, but that can apparently be done easily enough anyway, as problems with pathfinding do not appear before there are a lot of gateways: simply do either a second calculation for paths that ignore all gateways, or continue one of the other two calculations already going on while ignoring the gateways.
Not all gateways are open to you.
Just tracing until you hit gateways is not effective.
 
Not all gateways are open to you.
Just tracing until you hit gateways is not effective.
If the gateway is not open to you, then the system should not be open either, right? Either way, it doesn't matter much: with my system you just need to hit an open gateway, the rest is the same, and as it worked before gateways were built, it will do just the same afterwards; only you do 2 calculations from different starting points instead of 1 from a single point (which will bloat a graph a LOT more, I guess).