Stellaris Dev Diary #149 - Technical improvements

Moah · May 23, 2019

Hi everyone, this is Moah. I’m the tech lead on Stellaris and today I’m here to talk about the free 2.3 "Wolfe" update that will be arriving together with Ancient Relics, and what it brings to the table in terms of tech.

Stellaris is going 64 bits.
People have been clamoring for this for a while now, and various factors have led us to finally do this for this patch. I should temper your expectations though: while many have claimed that this would be a miracle cure for all their issues with Stellaris, the reality is somewhat more tame.

What does it mean?
The one solid benefit is that Stellaris is no longer limited to 4 GB of memory, and won’t crash anymore in situations where it was reaching that limit. For people who play on huge galaxies, with many empires, many mods or well into the 3000s, this will be a boon.

In terms of performance, though, it doesn’t change much. Without drowning you in technical details, let’s just say that some things go faster because you handle more data at once, some things go slower because you have more data to handle. In the end, our measurements have shown no perceptible difference.

Finally, the last effect of switching to 64 bits is that the game will no longer be playable on 32-bit computers or OSes. We don’t think this will affect many people, but there you have it.

What about Performance?
I know that’s everyone’s favourite question, so let’s do our best to talk about it. First, let me dispel some notions floating around in various forums: Stellaris does use multithreading, and we’re always on the lookout for new things to thread. In fact between 2.2.0 and 2.2.7, a huge effort was made to thread jobs and pops, and it’s one of the main drivers of performance improvement between these versions.

Pops and jobs are indeed what’s consuming most of our CPU time nowadays. We’ve improved on that by reducing the number of jobs each pop evaluates. We’ve also found other areas where we were doing too much work, and cut down on:

Ships calculating their daily regeneration when they’re at full health
Off-screen icons being updated
Uninhabitable planets doing the same evaluations as populated planets
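
The first item in that list is the simplest to picture. Here is a minimal sketch of that kind of early-out, assuming a made-up `Ship` record and `daily_regen` function (neither is from the game's code): ships already at full health are skipped before any work is done.

```python
from dataclasses import dataclass


@dataclass
class Ship:
    # Hypothetical fields, chosen only for this illustration.
    hull: float
    max_hull: float
    regen_per_day: float


def daily_regen(ships):
    """Regenerate damaged ships only; return how many ships were updated."""
    updated = 0
    for ship in ships:
        if ship.hull >= ship.max_hull:
            continue  # full health: nothing to compute
        ship.hull = min(ship.max_hull, ship.hull + ship.regen_per_day)
        updated += 1
    return updated
```

The check costs one comparison per ship, which is cheap next to whatever the full regeneration calculation involves.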

Why do these seemingly pointless things happen? Well, we generally focus on getting gameplay up and working quickly so that our content designers can iterate quickly, and sometimes things fall through the cracks. Some of these systems are also quite complex and the scale of the new code is not so easily apparent. Sometimes, not limiting the number of targets is good enough because you’re not doing much but then, months later, someone adds more calculations or the number of objects explodes for unrelated reasons, and suddenly you’ve got a performance issue.

Modifiers
One thing that sets Stellaris apart from other PDS titles is how much we use (or abuse) modifiers. Everything is a modifier. Modifiers are modified by other modifiers themselves modified by other modifiers, and sometimes by themselves. It’s quite hard to follow, and leads to every value being able to change at any time without your noticing.

“Why don’t you just compute jobs when a new one appears?” has often been asked around these parts. Well, a short answer to that is it’s really hard to know when a new job appears. You can get jobs from any modifier to: country, planet, pops. Each of these can get modifiers from ethics, traditions, perks, events, buildings, jobs, country, planets, pop, technology, etc.

Until now we were trying to calculate modifiers manually, forced to follow the chain in its entirety: when you recompute a country's modifiers, you then recalculate its planets' modifiers, and then each planet recalculates its pops' modifiers. Some of our freezes were just that tangled ball of yarn trying to sort itself out.

This is our modifier flow chart. It’s not quite up to date, but it gives you an idea of the complexity of the system (unpolished because it’s a dev tool, and not made for the article).

No More!
For 2.3 “Wolfe” we have switched to a system of modifier nodes, where each node registers which nodes it follows and is recalculated when used, following the chain itself. We have modifiers that are more up to date, and calculated only when needed. This also reduces the number of pointless recalculations.
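
To make the node idea concrete, here is a minimal sketch of lazy, dependency-tracked recalculation. It is not the game's implementation: the `ModifierNode` class, its additive combination rule, and the `recalc_count` counter are all assumptions made for illustration. A change only flags dependents as stale; the actual recomputation happens the next time a value is read.

```python
class ModifierNode:
    """A value that follows other nodes and is recomputed only when read."""

    def __init__(self, name, base=0.0, parents=()):
        self.name = name
        self.base = base
        self.parents = list(parents)  # nodes this node follows
        self.children = []            # nodes that follow this node
        self._cached = None
        self._dirty = True            # nothing computed yet
        self.recalc_count = 0         # for illustration only
        for parent in self.parents:
            parent.children.append(self)

    def set_base(self, value):
        self.base = value
        self._mark_dirty()

    def _mark_dirty(self):
        if self._dirty:
            return  # dependents were already flagged when this node went stale
        self._dirty = True
        for child in self.children:
            child._mark_dirty()

    def value(self):
        if self._dirty:
            # Assumed rule: a node's value is its base plus its parents' values.
            self._cached = self.base + sum(p.value() for p in self.parents)
            self._dirty = False
            self.recalc_count += 1
        return self._cached


country = ModifierNode("country", 1.0)
planet = ModifierNode("planet", 2.0, [country])
pop = ModifierNode("pop", 0.5, [planet])
print(pop.value())     # 3.5
country.set_base(2.0)  # marks planet and pop stale, computes nothing
print(pop.value())     # 4.5
```

Reading `pop.value()` repeatedly without any change in between reuses the cached result, which is where the savings over recalculating the whole chain come from.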

This system has shown remarkable promise, and cut the number of “big freezes” happening around the game (notably after loading, for example). It has some issues, but as we continue working with it, it’ll get better and help both with performance and our programmers’ sanity.

So, what’s the verdict?
In our tests, 2.3 “Wolfe” is between 10% and 30% faster than 2.2.7 right now. Hopefully it’ll stay that way until release, but the nature of the beast is that some of these optimizations break things and fixing the issues negates them, so we can’t promise anything.

Measurements provided by @sabrenity, using detailed info from the beta build. It’s worth noting the “SHIPS_SERIAL” purple line has since been eliminated.

AI
Another forum favorite: we have made some improvements to the AI. First, with @Glavius’s permission, we’ve used his job weights to improve general AI job distribution. We’ve also done the usual pass of polish and improvements, and of course taught the AI how to use all our new features.

What else is new?
We’re also getting a new crash reporter that will send crash reports as soon as crashes happen rather than the next time you start the game. We’ve also improved our non-Steam network stack to address connectivity issues, etc.

All right, enough of my yammering. This has turned into a GRRM-length novel, and even though there are many more areas we could cover, we’ll just turn this over for your perusal.

Jamor · May 23, 2019

Ridixo said:
It will affect more people than you think.

Our stats show less than 1% of users are still on 32 bit systems. The intent is never to abandon anyone, but ultimately tech needs to move on to allow further development, and our other option was to abandon all Mac users, period. That is not going to be done.

Moah · May 23, 2019

Kingman said:
Yes! Please, will the tech overflow bug be automatically solved with this update?
All the overflow issues hamstring the late games I reach.

As a generic answer: 64 bit won't magically fix the overflows we have, but many of those have been fixed independently by the team.

Moah · May 23, 2019

nstgc said:
But it would facilitate moving from short to long integers, and from single to double precision floats, right? By itself, using a 64b register won't fix overflows, but moving variables from 32b to 64b should, correct? And a 64b executable is a first step.

Well not really. One issue is, we don't use floating points (much) but have instead our own "decimal fixed point" for various reasons. Switching those to 64 bits is possible, it's been done on projects that are in development, but it would be a small change with massive consequences.
We would replace 3 very visible bugs with potentially hundreds of bugs we don't see immediately. We're also unsure what the performance impact would be.
In short, we are not planning to do this. We plan to fix the overflows instead.
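
For readers unfamiliar with the term, here is a rough sketch of what a 32-bit decimal fixed-point value looks like and how it overflows. The scale of 1000 (three decimal digits) is an assumption picked for the example; the post does not say what scale or layout the game actually uses. Python integers never overflow, so the wrap-around is simulated explicitly.

```python
SCALE = 1000  # assumed: three decimal digits; the real scale is not stated


def wrap_signed(raw, bits):
    """Wrap an integer into the two's-complement range of the given width."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw >= 1 << (bits - 1) else raw


def to_fixed(x, bits=32):
    """Store x as an integer count of 1/SCALE units."""
    return wrap_signed(round(x * SCALE), bits)


def fixed_mul(a, b, bits=32):
    """Multiply two fixed-point values (floor division; sketch only)."""
    return wrap_signed(a * b // SCALE, bits)


def to_float(raw):
    return raw / SCALE


# Largest value a signed 32-bit storage can hold at this scale.
MAX_32 = (2**31 - 1) / SCALE  # 2147483.647

product_32 = fixed_mul(to_fixed(1500.0), to_fixed(2000.0))
product_64 = fixed_mul(to_fixed(1500.0, 64), to_fixed(2000.0, 64), bits=64)
print(to_float(product_32))  # negative: the result wrapped around
print(to_float(product_64))  # 3000000.0
```

With these assumptions anything above roughly 2.1 million wraps to a negative number in 32 bits. Widening the storage moves that ceiling, but every place that reads, writes or saves the value has to agree on the new width, which is the kind of far-reaching change described above.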

Moah · May 23, 2019

Wurstw4sser said:
There were other topics which pointed out that L-Gates, wormholes and other gates are also a major cause of performance loss, because of the path finding.
Were you also able to look into this one? Is the patch also addressing this?

We haven't worked on that (yet). Unfortunately this is a problem for which we don't have a good solution yet.
Our issue is that we have a cache that contains the distance from any system to any other system, and when you add/remove bypasses or systems we need to recalculate this cache. This is further compounded by the fact not everyone has the same access to every bypass.
We have the "basic" cache which is only for hyperlane distances, and then we have a cache patch that adds distance through gateways accessible to that country. This country-specific cache needs to be emptied whenever a bypass gets added, and towards the end game, every country starts building gateways, leading to mass cache invalidations and reconstructions.
Add to that the fact that, invariably, the pathfinding itself becomes more complicated because you get many more ways to reach the same point.
Until we find a genius idea, I'm not sure we can do much to improve that. I've suggested removing gateways/wormholes/L-Gates but for some reason nobody likes it when I suggest we remove features. Go figure!
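
The two-layer cache described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the game's code: the `DistanceCache` class and its methods are invented names, distances are hop counts, a gateway jump is assumed to cost one hop, and the per-country layer is filled on demand.

```python
from collections import deque

INF = float("inf")


def bfs_distances(lanes, start):
    """Hop counts from `start` to every reachable system over hyperlanes."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in lanes[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


class DistanceCache:
    def __init__(self, lanes):
        # "Basic" cache: hyperlane-only distances, shared by every country.
        self.base = {s: bfs_distances(lanes, s) for s in lanes}
        self.gateways = {}  # country -> systems with a gateway it can use
        self.patches = {}   # country -> {(a, b): distance including gateways}

    def add_gateway(self, country, system):
        self.gateways.setdefault(country, set()).add(system)
        self.patches.pop(country, None)  # empty this country's patch cache

    def distance(self, country, a, b):
        patch = self.patches.setdefault(country, {})
        if (a, b) not in patch:
            best = self.base[a].get(b, INF)
            gates = self.gateways.get(country, ())
            if len(gates) >= 2:
                near_a = min(self.base[a].get(g, INF) for g in gates)
                near_b = min(self.base[b].get(g, INF) for g in gates)
                best = min(best, near_a + 1 + near_b)  # assumed jump cost: 1
            patch[(a, b)] = best
        return patch[(a, b)]
```

Every `add_gateway` call throws away all patched distances for that country, so a late game in which many empires build gateways keeps forcing those entries to be rebuilt.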

Moah · May 23, 2019

Inny said:
@Moah The biggest lag I encounter in the game is when a new system is generated (i.e. completing precursor discoveries that makes their home system spawn). The game then freezes for a little moment. Will there be improvements too regarding this?

I've just explained why this happens, and we don't have a good solution right now.

Also, do you plan to try to improve loading times? If it's even possible of course.

There's some improvement, but we weren't able to dedicate as much time to this as we wanted.

Moah · May 23, 2019

Methone said:
I mean, if you're fine with 75% of your pops dying off.

More seriously it's nice to see some of the magic behind the curtain. I know I could never follow that Modifier chart. I always appreciate the hard work programmers put into this game, but now especially moreso.

Also, I suspect that if we're on 'dev diary about technical issues' the release itself can't be too far off!

Funnily enough, killing 75% of all pops would resolve our performance issues.

Moah · May 23, 2019

Do'tasarr said:
Thats great, but i have to ask you to activate the Steam rich presence mentioned in a previous changelog and dev diary

That's the first time I hear it isn't active. I'll ask QA to check what's up.

Jamor · May 23, 2019

Moah said:
Funnily enough, killing 75% of all pops would resolve our performance issues.

Don't tempt me.

Moah · May 23, 2019

shad321 said:
So we have an idea for next Galactic Crisis! A Space Flu!
Kills 75% of empire population, and reshuffles the cards on the table ( 75% totally, but not equally! )

I've been wanting a space flu kind of crisis, forcing nations to cooperate to solve an intergalactic extinction level event without fighting for a while. It's unfortunately not in the plans right now.

Moah · May 23, 2019

Arcvalons said:
Stellaris: Reaper's Due

This would bring memories back to @Darkrenown and me.

Darkrenown · May 23, 2019

Moah said:
I've been wanting a space flu kind of crisis, forcing nations to cooperate to solve an intergalactic extinction level event without fighting for a while. It's unfortunately not in the plans right now.

It'll be like the good old days on Reaper's Due

Moah · May 23, 2019

Spartakus said:
This right there is not correct. Opening a Gateway only affects the shortest distance calculation for systems in its vicinity. Let me elaborate:
Let's for a moment only use Hyperlanes (and Wormholes which in this regard act like very long Hyperlanes)
Assume we have a table that gives us for each two systems A and B the distance from A to B (and of course the Hyperlane used on the shortest path). Needless to say, this table is symmetric, so it also gives us that for the route from B to A.

Now a Gateway opens. We create a second table, that for every System A gives the distance from A to the Gateway (and the Hyperlane). So far not a lot of calculations necessary.
Whenever a new Gateway opens we start doing a BFS from the System it is in. For each visited system there are two options:

The new Gateway is closer to the visited system than its shortest distance to a previous Gateway. Then we update its entry in the shortest-distance-to-gateway table. Then we add all its neighbours to the BFS-list.

The visited system has already an equally long route to a previous Gateway. Then we don't add its neighbours to the BFS-List.

After this we have an updated table that gives us shortest distances to gateways. Now for every two Systems A and B the shortest distance between them including both Hyperlanes and Gateways is Min(HyperlaneDistance(A,B), GatewayDistance(A) + GatewayDistance(B))
In both cases the tables deliver the next system on the route as well.

Now we do the same with L-Gates. This opens up one problem: It's possible that the shortest route uses both a Gateway and an L-Gate. But in this case we know that we will use the shortest route between a Gateway and an L-Gate and this can be included in the above calculations without much trouble.

Distance(A,B) = Min(HyperlaneDistance(A,B), GatewayDistance(A) + GatewayDistance(B), LGateDistance(A) + LGateDistance(B), LGateAndGatewayDistance(A,B))
With LGateAndGatewayDistance(A,B) = Min(GatewayDistance(A), LGateDistance(A)) + Min(GatewayDistance(B), LGateDistance(B)) + DistanceBetweenLGateAndGateway.
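
The gateway part of this proposal can be sketched as below. This follows the quoted description only (one gateway network, free gateway jumps, hop-count distances) and is not how the game does it; `open_gateway` and `distance` are names invented for the sketch, and the L-Gate table is left out.

```python
from collections import deque

INF = float("inf")


def open_gateway(lanes, gate_dist, system):
    """Update the nearest-gateway table after a gateway opens in `system`.

    Returns the number of table entries that changed.
    """
    updated = 0
    queue = deque([(system, 0)])
    while queue:
        node, d = queue.popleft()
        if d >= gate_dist.get(node, INF):
            continue  # an earlier gateway is at least as close: stop here
        gate_dist[node] = d
        updated += 1
        for nxt in lanes[node]:
            queue.append((nxt, d + 1))
    return updated


def distance(hyper, gate_dist, a, b):
    """Min(HyperlaneDistance(A,B), GatewayDistance(A) + GatewayDistance(B))."""
    return min(hyper[a][b], gate_dist.get(a, INF) + gate_dist.get(b, INF))


# Ten systems in a line: 0 - 1 - ... - 9.
lanes = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 9] for i in range(10)}
hyper = {a: {b: abs(a - b) for b in range(10)} for a in range(10)}
gate_dist = {}
print(open_gateway(lanes, gate_dist, 0))  # 10: every system gets an entry
print(open_gateway(lanes, gate_dist, 9))  # 5: only systems 5..9 get closer
print(distance(hyper, gate_dist, 1, 8))   # 2, versus 7 by hyperlane alone
```

The second call touches only half of the line, which is the point made in the quote: the more gateways exist, the sooner each new search is cut off.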

Spartakus said:
Ok, here's how this one second table works. As I mentioned there will also be a third table for L-Gates but it works exactly the same:
For each of your n systems you store the distance to its nearest Gateway and the Hyperlane leading to it. So exactly n entries and only one table. And this is the table you update. When more and more Gateways open you need to update more frequently, but each update is faster.

If you have enough Gateways that each system is at most 10 jumps from a gateway and you open a new one, the BFS will at most reach a depth of 10 before terminating.

This only works if wormholes, gateways and L-Gates are hardcoded, right? The issue is that all those types are moddable, and there could be dozens of networks of each of them.

Moah · May 23, 2019

Gothbert said:
If the system is hidden from the galaxy map, players won't be able to select it to travel to. You could then set up a flag on the system to block AI empires from traveling to it until the event chain is completed. This might also better "regionalize" the precursor event chains much in the same way that fallen empires are placed on the map.

Most event chains that create a system create it close to the players who finished the chain, though.

Bayushi Tasogare said:
I'm clearly not as good at algorithms as Spartakus, but I definitely believe that the root issue is one of technical debt. If they're still using a code base that has not been updated since removal of the other propulsion systems, their pathfinding is likely using structures that were not optimized for starlanes.

We did rewrite and optimise it for the new system.

Moah · May 23, 2019

Bayushi Tasogare said:
Are you saying that it is possible to mod different types of Gateways, so that they are not interchangeable? The same approach could be used, but you would have n! different checks to make by Spartakus's algorithm where n is the number of different types of Gateways.

And then you're back to a good ol' A* I think.

Guraan · May 23, 2019

Ooh i really love this pathfinder rant going on, but just to raise the level a bit:
- Each fleet has different stances that need to be considered if a system is a valid point or not.
- Each system may or may not give you access to pass through, depending on FTL inhibitors and your diplo standing with that empire.
- Each wormhole has a requirement that it has to have been explored.

Side note: The underlying hyperlane distance cache is ofc just an upper triangle since distance is bidirectional, so space-wise it should be n(n - 1)/2, still too big to be in L1 cache for huge galaxies O;P
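
For anyone wondering what "just an upper triangle" means in practice, here is a small sketch of storing symmetric distances in a flat array of n(n - 1)/2 entries. The `TriangularDistances` class is a made-up illustration, not the game's data structure.

```python
class TriangularDistances:
    """Symmetric distances between n systems in n(n - 1)/2 slots."""

    def __init__(self, n, fill=0):
        self.n = n
        self.data = [fill] * (n * (n - 1) // 2)

    def _index(self, a, b):
        if a == b:
            raise ValueError("the diagonal is not stored")
        i, j = (a, b) if a < b else (b, a)  # always address the i < j half
        return i * self.n - i * (i + 1) // 2 + (j - i - 1)

    def get(self, a, b):
        return 0 if a == b else self.data[self._index(a, b)]

    def set(self, a, b, value):
        self.data[self._index(a, b)] = value


table = TriangularDistances(1000)
print(len(table.data))  # 499500 entries instead of 1000000
table.set(3, 1, 7)
print(table.get(1, 3))  # 7
```

Even at one or two bytes per entry, the 499500 entries for a 1000-system galaxy are far larger than a typical L1 data cache of a few tens of kilobytes.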

Guraan · May 23, 2019

nstgc said:
Is this implying that wormholes are being treated as hyperlanes?

It's a jump to another node just as anything else ofc, just other restrictions if you can use it or not O

Guraan · May 23, 2019

Bayushi Tasogare said:
I'm not sure I'd classify it as a rant, but it is certainly something of interest to many of us.

Matter of perspective, there are a few previous threads about both pathfinder and 64bit here at the Stellaris forum.

Bayushi Tasogare said:
Also, isn't 'lane explored' in the case of wormholes just another case of 'can I use this hyperlane'?

Yes they are, it's the `can i use` that becomes a vital factor, but treat them as the same for now O

Bayushi Tasogare said:
True enough. And thank goodness that hyperlanes are bidirectional... directed cyclic graphs are the worst.

Lol yes they are, but i can think of a lot of fun things to do if they were lol

Tsu Chi said:
I am sorry, I never wanted to start a rant.

never be sorry to start good "rants" O;P

Tsu Chi said:
I am sorry, I never wanted to start a rant.
from what you say, these rules further restrict nodes that are accessible which should speed up the algorithm (fewer nodes, faster calculations)
slow down might be bcs of accessing that additional data (rules, which nodes to eliminate from the graph).

In theory yes, in practice you will have to account for the CPU cache misses, which nowadays are a vital factor. (the rules are quite complex and give a lot of overhead even though they are cached)

Tsu Chi said:
I still don't know why you need a cache with all distances for all stars and why pathfinding is not calculated on the fly.
can you explain it?

Hehehehe will answer that question day after tomorrow, it would spoil a bit of the challenge O

Guraan · May 23, 2019

Spartakus said:
If you just needed a single path for a single fleet you would indeed be right that calculating on the fly would be fastest. However, my best guess is that we need dozens, by the late game hundreds of paths to be calculated, possibly every tick. One for every separate fleet or civilian ship. So an algorithm that takes several milliseconds is out of the question, we're aiming for microseconds. Hence lookup-tables.

You might be on to something there, AI re-evaluating their strategy, planets that want to know how close they are to the capital etc O

The pathfinder itself is not slow, it just gets hammered with calls lol

Spartakus said:
Problem is, that the tables need to be up to date, so you've got an overhead for keeping the tables fresh.

Yep and it also needs to be synchronized between the players in mp as well O

Much cred and 2x golden stars @Spartakus

Guraan · May 23, 2019

Tsu Chi said:
Ok, but ...
this cache (all distances) is recalculated each time new node is added (so for each distance pathfinding algorithms multiplied by huge number), and in later game it is frequent, so some time gained at first is lost later (and you can feel the slow down in late game)

Nah that is the hyperlane cache, that one is only recalculated if you add or remove a hyperlane... Rest is the "patch" cache and that one is not a 1:1 mapping but on demand.

Tsu Chi said:
Additionally if the cache consist of only distances from any star to any star how this data is useful??
it is not a path through selected stars (nodes) only distance.
It is not every path through every node bcs it is "bad idea" ? or not.
and even if it is a path through some nodes, even if there are multiple paths from A>B, some of nodes on those paths could be restricted for this fleet
so no viable paths
then you calculate from scratch.
or maybe it is resolved differently

Yes that is the big question isn't it O
