
Stellaris Dev Diary #170 - Performance and other technical issues

Hello, my friends! This is Moah, Tech Lead of Stellaris typing. I can finally talk about what you’ve all been waiting for: How many new platypi will there be in Federations? After weeks of…

Well, apparently, I should be "more technical." But before we jump into the mysteries of the Stellaris code, I want to take the time to talk a little about the balance between adding new features, improving performance and stability – especially in terms of multiplayer and the dreaded out-of-syncs (dreaded at least by me).

The Delicate Balance
Stellaris, like most decently sized code bases, is like a complex game of Mikado or Jenga: every part is connected in some way to every other part. When you add a feature, you add more connections. If you’re careful, you add only a few; if you’re in a rush, you add a few too many. This generally leads to Unplanned Features (aka bugs). In addition, once we see them perform in the actual game, we tend to expand features in new, unexpected ways, leading to more Unplanned Features(tm).

Once we realize what is happening, we start being more careful. Maybe too careful. Checking too many things, too often, ensuring that this interaction that is supposed to never actually happen is actually not happening. Not now, not later. Not ever.

So you have removed the unplanned features, but the game is a bit, ah… too careful. Some would say slow.

So you remove some of these checks. You realize that you don’t need to loop around the galaxy, you can just loop around this one tiny planet. Then you go one step further, and think “well I can maybe do that check only every three weeks, and this calculation needed by all these checks, I could store it in here and reuse it until the next time it changes.”

So now the game isn’t so careful anymore, we’re back in unplanned feature territory. But if the caching (storing/reusing calculations) happens at different times on different machines, you get slightly different results (like asking a developer for something before and after they had coffee).

Slightly different results are what OOS thrives on! The client and the server disagree on a cost by 0.0001, the difference compounds over time, and eventually that corvette is bought on the server but not on the client.
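To make that concrete, here is a small toy sketch in Python (not actual Stellaris code). Everything in it is an assumption made for illustration: the `simulate` function, the costs, the budget, and the idea that one machine refreshes its cached modifier every day while the other refreshes every third day.

```python
def simulate(refresh_every: int, days: int = 10) -> list[bool]:
    """Return, for each day, whether the empire can afford a corvette.

    Hypothetical model: cost = base_cost * (1 + cached_modifier), where the
    cached modifier is only refreshed every `refresh_every` days.
    """
    base_cost = 100.0
    budget = 104.0
    modifier = 0.03            # current "true" cost modifier (+3%)
    cached_modifier = modifier
    decisions = []
    for day in range(days):
        if day == 4:
            modifier = 0.05    # a new modifier kicks in on day 4
        if day % refresh_every == 0:
            cached_modifier = modifier   # refresh the cache
        cost = base_cost * (1.0 + cached_modifier)
        decisions.append(cost <= budget)
    return decisions


server = simulate(refresh_every=1)   # recalculates every day
client = simulate(refresh_every=3)   # reuses a stale value for up to two days

# First day on which the two machines disagree about the purchase.
first_divergence = next(
    day for day, (s, c) in enumerate(zip(server, client)) if s != c
)
print(first_divergence)
```

Both machines run the same formula; the only difference is when the cache is refreshed. That is enough for one of them to buy the corvette and the other not to.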

So you remove your “smart” algorithm. You replace it with the correct algorithm. You lose half of what you gained in step 2 and reintroduce some bugs. Probably.
Rinse and repeat.

But enough about my morning routine! Let’s talk about…

Performance
Stellaris fans are like C++ programmers: performance is always on their mind. To be fair, it has also been on ours a lot lately. We know that it’s not all that it could be, especially in late game and with the bigger galaxies. With that in mind, we’ve taken time to improve performance in a bit more depth than we usually can. We looked at what was taking the most time, and as everyone knows that is…



Pops.

There are many reasons why pops consume a lot of time in Stellaris, but the main one is that by endgame we have SO MANY of them. SO So so so so many. And they do so much! Pops have to calculate how good they’d be at every job (they do so every 7 days). Then they have to fight every other pop on the planet to get the job they’re best at. They also have to check if they could have a specific ethic. If they could join a specific faction. How happy they are. How happy they could be. How happy they would be on that planet over there.
All these things trigger modifier calculations. If you remember my last dev diary, you know that modifiers are the only thing more numerous than Pops in Stellaris. And they all depend on each other. Calculating them is like pulling on a thread and getting the whole sweater.


OK, but what did we actually do about it?
Well, first, I’ll admit I may have been a bit pigheaded on the whole “we need to do the job distribution every day because we don’t know when new jobs are added.” We reexamined this assumption, and job distribution is now only done on demand. It was also rewritten to iterate over far fewer things.
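For the curious, "on demand" can be pictured as a dirty flag. The sketch below is a toy in Python, not the game's code: the `Planet` class, its method names, and the "first come, first served" assignment rule are all made up for illustration.

```python
class Planet:
    """Toy planet that only reassigns jobs when something relevant changed."""

    def __init__(self, jobs: int, pops: list[str]) -> None:
        self.jobs = jobs
        self.pops = list(pops)
        self.assignment: dict[str, bool] = {}
        self.dirty = True          # nothing has been assigned yet
        self.redistributions = 0   # counts how often the expensive part ran

    def set_jobs(self, jobs: int) -> None:
        # Anything that changes the number of jobs marks the planet dirty.
        if jobs != self.jobs:
            self.jobs = jobs
            self.dirty = True

    def daily_tick(self) -> None:
        # The daily update is now almost free unless something changed.
        if self.dirty:
            self.redistribute()
            self.dirty = False

    def redistribute(self) -> None:
        # Placeholder rule: the first `jobs` pops are employed.
        self.redistributions += 1
        self.assignment = {
            pop: index < self.jobs for index, pop in enumerate(self.pops)
        }


planet = Planet(jobs=2, pops=["miner", "farmer", "clerk"])
for _ in range(30):
    planet.daily_tick()
planet.set_jobs(3)                 # e.g. a new building adds a job
for _ in range(30):
    planet.daily_tick()
print(planet.redistributions)
```

Sixty daily ticks, but the expensive redistribution only runs when the flag is set. The catch is that every place that can change the number of jobs has to remember to set the flag.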

We also noticed a few triggers going through every pop of an empire to check if one or more are enslaved, decadent, or other things that can be tested at the species level. So we made new triggers to test these things at the species level. In the same spirit, we had events going through every ship to find a fleet, so we added triggers at the fleet level.
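A rough Python sketch of the difference, under the assumption that "enslaved" is a property of the species rather than of each individual pop. The `Species`, `Pop` and `Empire` types and both function names are hypothetical and only stand in for the real triggers.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    enslaved: bool = False


@dataclass
class Pop:
    species: Species


@dataclass
class Empire:
    species: list[Species]
    pops: list[Pop]


def any_pop_enslaved(empire: Empire) -> bool:
    """Old style: walk the pops until an enslaved one turns up."""
    return any(pop.species.enslaved for pop in empire.pops)


def any_species_enslaved(empire: Empire) -> bool:
    """New style: ask the (much shorter) species list instead."""
    return any(species.enslaved for species in empire.species)


humans = Species("Human")
blorg = Species("Blorg", enslaved=True)
# 19,000 free pops followed by 1,000 enslaved ones.
pops = [Pop(humans) for _ in range(19_000)] + [Pop(blorg) for _ in range(1_000)]
empire = Empire(species=[humans, blorg], pops=pops)

slow_answer = any_pop_enslaved(empire)      # may scan thousands of pops
fast_answer = any_species_enslaved(empire)  # looks at two species
print(slow_answer, fast_answer)
```

Both checks give the same answer; the second one simply has two things to look at instead of twenty thousand.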

Second, we’ve reworked the approach to checking if pops can change ethics (and also made it work again), or if they can join factions.

Finally, we’ve looked for (and found) opportunities to use more multithreading.
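As a hedged illustration of the kind of work that parallelises cleanly, here is a Python sketch that assumes a calculation reading only one empire's own data. The `monthly_income` function and the dictionary layout are invented for this example, and a thread pool is used only to show the shape of the idea, not as a claim about how the game schedules its work.

```python
from concurrent.futures import ThreadPoolExecutor


def monthly_income(empire: dict) -> int:
    """Hypothetical per-empire calculation that touches no other empire."""
    return sum(planet["output"] for planet in empire["planets"])


empires = [
    {"name": "Empire A", "planets": [{"output": 12}, {"output": 7}]},
    {"name": "Empire B", "planets": [{"output": 30}]},
    {"name": "Empire C", "planets": [{"output": 4}, {"output": 4}, {"output": 9}]},
]

# One task per empire. Executor.map returns results in input order, so the
# outcome does not depend on which thread happens to finish first.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(monthly_income, empires))

sequential = [monthly_income(empire) for empire in empires]
print(parallel == sequential)
```

Keeping the results in a fixed order matters for the same reason as above: if the outcome depended on thread timing, two machines could end up with different results.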

But enough talk! What’s the result? Well, if a picture is worth a thousand words, here’s the answer at 30000 words a second:


The video compares the performance of 2.5.1 “Shelley” to 2.6 “Verne” when running a save game from the community, which can be found attached to this post, with over 20000 pops. It was recorded on my work computer (Intel Core i9-7900X @ 3.30 GHz, 10 cores and 20 threads, and AMD R9 Fury). You won’t necessarily get the same results: the exact difference in performance will vary with your computer and with the exact situation in your own save games, of course. On average, we’ve found something between 15% and 30% improvement in late game situations.
This save is just ideal to showcase the impact of the pops improvement.



What is this average anyway? How do you know?
Well, we have synths playing the game all night, every night. In the morning, we check how far they were able to go. We also ask them how many errors they encountered, what their endgame looked like, whether they got any OOS and then put all of that in tables and graphs, with many colors. Then we wipe the synths, so they don’t ask pesky questions about souls and whatnot.



In conclusion
Although we keep performance in mind and do our best to keep it reasonable, we’re happy we had a chance to take a deeper dive into the issue. Hopefully the changes will spark as much joy for you as they did for us, and we’re looking forward to your feedback!

Next week will feature another dev diary about the other thing you’ve all been waiting for… MORE PLATYPI!

PS: The save file we're using is from the community, from one of the performance threads. We are, however, unsure where we originally got it from. So if you recognize it, or if it's yours, please tell us so we can credit you properly.
 

Attachments

  • perf_massive.sav
    4 MB · Views: 289
Thank you for the answer anyway :). Please go on with the good work! But as a general idea, why not dedicate a thread to every specific game task (as long as enough threads are available, of course)? That would maybe give much better (balanced) CPU usage and reduce lag. But I'm sure you already had that idea and there are reasons why it hasn't happened until now.

Because not all functions can be run in parallel. Many functions rely on a previous function to provide what they need. It isn't as simple as you think it is. The reason why it didn't happen now is because multi-threading doesn't work like that.
 
Would you kindly release the patch as a beta opt-in for people who are dying for the performance improvement? It would allow those who have put their midgame on hold to finish. As an opt-in, it won't break anything for the casual gamer.
 
Because not all functions can be run in parallel. Many functions rely on a previous function to provide what they need. It isn't as simple as you think it is. The reason why it didn't happen now is because multi-threading doesn't work like that.
Of course you can open up a new thread for a specific task. The only tasks that can't really be parallelized are those waiting for input parameters from another thread. And things like pop job assignment or ship movement can be achieved by a thread of their own.
 
Although we calculate how good a pop is at every job every seven days, we redistribute pops to jobs (or jobs to pops) every day.
The reason why this was being done is that job amounts are actually modifiers and modifiers can come from everywhere, so there's no real way to check only when something changes.
HOWEVER in practice, there are only a few places where these things actually happen and they were easy to localize. Which we've done. We missed a few, but then we got a couple of bug reports saying "Jobs don't update when XXX happens" so we fixed that, and now, probably like 99% of cases where the number of jobs changes update the job distribution.
There might be a special case where your asteroids event adds a "Bruce Willis" job where it won't be reflected immediately but only right before the economy update, but otherwise it works well.

This function for "redistribute pops to jobs" should work very well multi threaded, with one thread per empire.
 
And you're right :). I mentioned specific functions which are not dependent on each other. But maybe I wasn't precise enough about what I meant, my bad!

*nod* np. It's the main reason you can't spread across threads evenly. Moah said he's parallelising what he can. S'all good.
 
15% more than the current state is still a long way from acceptable.

Are there plans to continue focussing on performance to the same extent, or will we return to the status quo of "we are always working on performance"?
This is highly unfair. If 15% is unacceptable, then it may be that the settings you are using are beyond your hardware. My Surface Pro can run Stellaris in its current form fine on a small galaxy. Maybe this update won't make a very large galaxy possible on that hardware, but it may make a normal sized one. That is great news.
 
If you want to keep the design of the game but offer speed to those who want it, why not add a slider that halves pop growth but doubles pop production, so those with slower systems can play?
 
So what I'm hearing is Megacorp won't be Lagcorp anymore?

Granted, I'm not one to really talk: I only recently got that DLC and haven't hit end game yet to see the supposed lag everyone speaks of.
 
This is highly unfair. If 15% is unacceptable, then it may be that the settings you are using are beyond your hardware. My Surface Pro can run Stellaris in its current form fine on a small galaxy. Maybe this update won't make a very large galaxy possible on that hardware, but it may make a normal sized one. That is great news.
The problem is that no hardware can handle 1k stars galaxy with 20+ AIs in the endgame.
 
First, I really appreciate the work you have put into increasing performance, and I for my part am glad not to see as many features as expected in the new expansion, for that would probably have meant many more potential sources of bugginess.
One question though I do still have: What about the overflow bugs with long integer values like those for resources (unity etc.)? Did you convert those too?
 
Pops have to calculate how good they’d be at every job (they do so every 7 days). Then they have to fight every other pop on the planet to get the job they’re best at.
Oh jeez, I WISH this worked in practice. As a synth empire, I experience a tremendous amount of incorrect job placement. I don't know what exactly is wrong with robots but you guys just keep assigning them to random jobs regardless of traits. I made several sub-species for the three major worker occupations (energy, minerals, and food) and in the end, the AI just makes the default robots (main species that have +5% to all jobs) and assigns them everywhere, instead of constructing the appropriate models for specific jobs. So you get mining bots working the fields, agri-bots creating electricity, etc. I know this is more of an AI issue, but since you mentioned the algorithm I thought I'd jump in. Really hope this is gonna get addressed someday...

P.S. Love the performance improvements though! Y'all definitely did an amazing job on that, can't wait for the update.
 
Watching that video, my thoughts... "Well, that's a lot better, it still looks annoyingly laggy though, I wonder how many pops thi...*notices pop count at bottom* oooohhhhhh....that... is... a lot..."

I'm glad y'all showed that extremes are playable, but I hope the stream shows a more medium/average game with the performance improvements =)
 
Thank you for putting this topic together.
 
While these improvements are more than welcome from a usability perspective, the more pressing question I have is "What about multiplayer sync?" Do these performance optimizations also help reduce or eliminate the need to roll back and rehost the game in multiplayer, or has some other cause been identified? The group I play with has had to abandon our weekly 4-player matches because we can't manage to make it past the mid-game crisis without the game breaking down in near-constant out-of-sync errors.