Stellaris Dev Diary #170 - Performance and other technical issues

Hello, my friends! This is Moah, Tech Lead of Stellaris, typing. I can finally talk about what you’ve all been waiting for: How many new platypi will there be in Federations? After weeks of…

Well, apparently, I should be "more technical." But before we jump into the mysteries of the Stellaris code, I want to take the time to talk a little about the balance between adding new features, improving performance and stability – especially in terms of multiplayer and the dreaded out-of-syncs (dreaded at least by me).

The Delicate Balance
Stellaris, like most decently sized code bases, is like a complex game of Mikado or Jenga: every part is connected in some way to every other part. When you add a feature, you add more connections. If you’re careful, you add only a few; if you’re in a rush, you add a few too many. This generally leads to Unplanned Features (aka bugs). In addition, once we see features perform in the actual game, we tend to expand them in new, unexpected ways, leading to more Unplanned Features(tm).

Once we realize what is happening, we start being more careful. Maybe too careful. Checking too many things, too often, ensuring that this interaction that is supposed to never actually happen is actually not happening. Not now, not later. Not ever.

So you have removed the unplanned features, but the game is a bit, ah… too careful. Some would say slow.

So you remove some of these checks. You realize that you don’t need to loop around the galaxy, you can just loop around this one tiny planet. Then you go one step further, and think “well I can maybe do that check only every three weeks, and this calculation needed by all these checks, I could store it in here and reuse it until the next time it changes.”
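To make the "store it and reuse it until the next time it changes" idea concrete, here is a minimal Python sketch. It is not Stellaris code: the CachedPlanet class, its fields, and the recompute counter are hypothetical names made up for illustration.

```python
class CachedPlanet:
    """Hypothetical planet that caches an expensive derived value."""

    def __init__(self, pop_outputs):
        self._pop_outputs = list(pop_outputs)
        self._cached_total = None  # None means "needs recomputing"
        self.recompute_count = 0   # only here to make the saving visible

    def add_pop(self, output):
        self._pop_outputs.append(output)
        self._cached_total = None  # an input changed, so drop the cached value

    def total_output(self):
        # Recompute only when the cache was invalidated by a change.
        if self._cached_total is None:
            self.recompute_count += 1
            self._cached_total = sum(self._pop_outputs)
        return self._cached_total


planet = CachedPlanet([3, 4, 5])
for _ in range(100):
    planet.total_output()  # 100 reads, but only one real calculation
planet.add_pop(2)
latest_total = planet.total_output()  # second real calculation
```

The whole trick lives in add_pop: every code path that changes an input has to remember to invalidate the cache, and a forgotten invalidation is exactly the kind of Unplanned Feature described above.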

So now the game isn’t so careful anymore, we’re back in unplanned feature territory. But if the caching (storing/reusing calculations) happens at different times on different machines, you get slightly different results (like asking a developer for something before and after they had coffee).

Slightly different results are what OOS thrives on! The client and the server disagree on a cost by 0.0001, the difference compounds over time, and eventually that corvette is bought on the server but not on the client.
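A toy illustration of how this goes wrong, in Python and not taken from the game. Both machines run the same simulate function (a hypothetical name), but one refreshes its cached corvette cost every day and the other only every 30 days. Amounts are integers in thousandths of a mineral, an assumption of this sketch, so that the example itself is exactly reproducible.

```python
def simulate(days, refresh_every):
    """Toy economy: income arrives daily, the true corvette cost creeps up
    daily, but the cost actually used is a cached copy that is refreshed
    only every `refresh_every` days."""
    minerals = 0
    cached_cost = 100_000  # cost as last calculated
    corvettes = 0
    for day in range(1, days + 1):
        minerals += 10_000
        if day % refresh_every == 0:
            cached_cost = 100_000 + 100 * day  # the "true" cost today
        if minerals >= cached_cost:
            minerals -= cached_cost
            corvettes += 1
    return corvettes, minerals


server = simulate(days=10, refresh_every=30)  # still using a stale cost
client = simulate(days=10, refresh_every=1)   # recalculates every day
out_of_sync = server != client
```

After ten days the machine with the stale cost has bought a corvette and the other one has not, which is the situation a checksum comparison between client and server would flag as an out-of-sync.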

So you remove your “smart” algorithm. You replace it with the correct algorithm. You lose half of what you gained in step 2 and reintroduce some bugs. Probably.
Rinse and repeat.

But enough about my morning routine! Let’s talk about…

Performance
Stellaris fans are like C++ programmers: performance is always on their mind. To be fair, it has also been on ours a lot lately. We know that it’s not all that it could be, especially in late game and with the bigger galaxies. With that in mind, we’ve taken time to improve performance in a bit more depth than we usually can. We looked at what was taking the most time, and as everyone knows that is…


Pops.

There are many reasons why pops consume a lot of time in Stellaris, but the main one is that by endgame we have SO MANY of them. SO So so so so many. And they do so much! Pops have to calculate how good they’d be at every job (they do so every 7 days). Then they have to fight every other pop on the planet to get the job they’re best at. They also have to check if they could have a specific ethic. If they could join a specific faction. How happy they are. How happy they could be. How happy they would be on that planet over there.
All these things trigger modifier calculations. If you remember my last dev diary, you know that modifiers are the only thing more numerous than Pops in Stellaris. And they all depend on each other. Calculating them is like pulling on a thread and getting the whole sweater.


OK, but what did we actually do about it?
Well first, I’ll admit I may have been a bit pigheaded on the whole “we need to do the jobs distribution every day because we don’t know when new jobs are added.” We reexamined this assumption, and jobs distribution is now only done on demand. It was also rewritten to iterate over a lot fewer things.
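"On demand" can be sketched with a simple dirty flag. This is only an illustration in Python, assuming that adding a job is the one event that makes a redistribution necessary; JobPlanet, add_job, and daily_tick are hypothetical names, and the pairing logic is a placeholder rather than the real job-matching rules.

```python
class JobPlanet:
    """Hypothetical planet that redistributes jobs only when needed."""

    def __init__(self, pops, jobs):
        self.pops = list(pops)
        self.jobs = list(jobs)
        self.assignment = {}
        self.jobs_dirty = True      # nothing has been distributed yet
        self.distributions_run = 0  # only here to count the real work

    def add_job(self, job):
        self.jobs.append(job)
        self.jobs_dirty = True      # request a redistribution

    def daily_tick(self):
        # The expensive step is skipped on days where nothing changed.
        if self.jobs_dirty:
            self._distribute_jobs()

    def _distribute_jobs(self):
        # Placeholder logic: pair pops with jobs in list order.
        self.assignment = dict(zip(self.pops, self.jobs))
        self.jobs_dirty = False
        self.distributions_run += 1


planet = JobPlanet(pops=["pop_a", "pop_b"], jobs=["farmer"])
for _ in range(30):
    planet.daily_tick()
planet.add_job("miner")
for _ in range(30):
    planet.daily_tick()
```

Sixty daily ticks result in two distributions instead of sixty: one at the start and one after the new job appears.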

We also noticed a few triggers going through every pop of an empire to check if one or more are enslaved, decadent, or other things that can be tested at the species level. So we made new triggers to test these things at the species level. In the same spirit, we had events going through every ship to find a fleet, so we added triggers at the fleet level.
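The difference between the two kinds of trigger can be sketched as follows. This is a Python illustration, not the game's trigger system: it assumes that "enslaved" is a property of a species as a whole, and Species, Empire, and both method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    enslaved: bool
    pop_count: int = 0


class Empire:
    def __init__(self):
        self.species = {}  # species name -> Species
        self.pops = []     # one species name per pop

    def add_pops(self, species, count):
        entry = self.species.setdefault(species.name, species)
        entry.pop_count += count
        self.pops.extend([species.name] * count)

    def any_enslaved_pop_slow(self):
        # Old style: in the worst case this visits every single pop.
        return any(self.species[name].enslaved for name in self.pops)

    def any_enslaved_species(self):
        # New style: visits each species once, however many pops it has.
        return any(s.enslaved and s.pop_count > 0
                   for s in self.species.values())


empire = Empire()
empire.add_pops(Species("Human", enslaved=False), 5000)
empire.add_pops(Species("Blorg", enslaved=True), 3)
```

Both checks give the same answer here, but the first walks past 5000 pops before it finds an enslaved one, while the second looks at two species.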

Second, we’ve also reworked the approach to checking if pops can change ethics (and also made it work again), or if they can join factions.

Finally, we’ve looked for (and found) opportunities to use more multithreading.
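One common shape for such an opportunity, sketched in Python under stated assumptions: each planet's calculation only reads its own data, and the results are merged afterwards in a fixed order so every machine gets the same answer. The planet data and the planet_output function are made up for illustration, and standard CPython threads will not actually speed up a CPU-bound calculation like this one; the sketch only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def planet_output(pops):
    """Pure function: reads one planet's pops, touches no shared state."""
    return sum(skill * 2 for skill in pops)


planets = {
    "Earth": [3, 5, 2],
    "Mars": [1, 1],
    "Titan": [4],
}

names = sorted(planets)  # fixed order, identical on every machine
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order, whichever thread finishes first.
    outputs = list(pool.map(planet_output, (planets[n] for n in names)))

# The merge step stays single-threaded and ordered, which keeps it deterministic.
empire_total = 0
for name, output in zip(names, outputs):
    empire_total += output
```

Keeping the parallel part free of shared writes and doing the merge in a known order is one way to get more threads without feeding the out-of-sync problem described earlier.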

But enough talk! What’s the result? Well, if a picture is worth a thousand words, here’s the answer at 30000 words a second:


The video compares the performance of 2.5.1 “Shelley” to 2.6 “Verne” when running a save game from the community, which can be found attached to this post, with over 20000 pops. It was recorded on my work computer (Intel Core7-7900X @ 3.30GHz, 10 cores and 20 threads, and AMD R9 Fury). You won’t necessarily get the same results: the exact difference in performance will vary with your computer and with the exact situation in your own save games, of course. On average, we’ve found something between 15% and 30% improvement in late game situations.
This save is just ideal to showcase the impact of the pops improvement.


What is this average anyway? How do you know?
Well, we have synths playing the game all night, every night. In the morning, we check how far they were able to go. We also ask them how many errors they encountered, what their endgame looked like, whether they got any OOS and then put all of that in tables and graphs, with many colors. Then we wipe the synths, so they don’t ask pesky questions about souls and whatnot.


In conclusion
Although we keep performance in mind and do our best to keep it reasonable, we’re happy we had a chance to take a deeper dive into the issue. Hopefully the changes will spark as much joy for you as they did for us, and we’re looking forward to your feedback!

Next week will feature another dev diary about the other thing you’ve all been waiting for… MORE PLATYPI!

PS: The save file we're using is from the community, from one of the performance threads. We are, however, unsure where we originally got it from. So if you recognize it, or if it's yours, please tell us so we can credit you properly.
 

Attachments

  • perf_massive.sav
    4 MB · Views: 289
@Keulinchen : I think the issue with threading is how HARD it can be to do threading well. For some of the AAAA engines you likely have super-elite people wringing every single ounce of performance out of the core engine, which includes lots of threading. These companies will then license the engine to others or share it amongst MANY titles within the same company. Thus they can afford the huge investment to make the engine awesome because they can spread the investment out more.

Paradox is likely in a different position, as they likely don't have the sales volume to afford the same level of investment. This is likely why we don't have the same level of multi-threading that other games do. In addition, I suspect 'strategy' games similar to Stellaris have a lot of data dependencies that need to be synchronized first before you can spin a lot of the threads off.
 
Would be nice to have some more insight.

The core issue is that it could easily be a multi-year project to transform the engine and game programming to support this sort of multi-threading paradigm. Multiple years where nothing else gets done, all patches and features get frozen until this project is complete. For what could easily be 10% performance gain.

It might be higher, in all likelihood it would be higher than just 10% with such a total rework, but you'd have no idea until the tail-end of the project. It could have no gains for performance, since new inefficiencies could arise from unexpected places to replace old inefficiencies. Compare that with a few months of directed effort for 200% gain. For a new game or a sequel, having an upgraded engine with this sort of paradigm would make sense, since you could make the game features without worrying about remaking the old ones directly or retaining feature parity. Replacing an old game engine in an existing game would generally be a terrible idea, or at least a time consuming idea, since you would be making a completely new game, practically speaking.

Also, different games need different things. Some games allow for some insane parallelization (Factorio is pure programming wizardry), while others function basically on just a single thread thanks to how the design and logic works. Direct comparisons between games, especially ones with different engines, are basically useless.
 
Thanks for your reply! :)

And don't get me wrong, I am just trying to understand why there is such a difference compared to other games. The load is not always shared evenly, but it is MUCH more balanced across the other cores. For example, here is a snapshot of a running game of Assassin's Creed Odyssey:

View attachment 548636

I am no programmer or developer, so I cannot evaluate the differences between programming styles and techniques, engines, or languages. It is also not clear to me how one spreads workload across cores, but some games seem to manage it and some do not. So wouldn't a redesign be worthwhile for a game like Stellaris, which is supposed to stay live for a long time, considering Paradox's strategy of adding features and evolving games instead of launching new versions every year? If you follow this strategy, you should consider redesigning the game to meet those needs instead of just removing checks or logging entries.

Would be nice to have some more insight.

Cheers
Stefan
Redesigning an engine is such a huge project that you might be better off scrapping everything in favour of a Stellaris 2. More importantly, it probably also requires developers who do this on a regular basis. Paradox probably has a couple of people who work on the Clausewitz engine every day, but they might be occupied supporting the CK3 team or whatever mystery project PDX has in its pipeline right now, and a Stellaris rewrite isn't scheduled to be done anytime soon.

As for why Stellaris looks so unbalanced in its core use compared to your Assassin's Creed picture, my guess is that it used to do pretty well. But then the 2.2 economic rewrite came and, as others have pointed out, you can't just dump the additional workload on any thread you like, so the main thread, which used to do fine, got a ton of new jobs - no pun intended - to handle as well.

Really the only thing that baffles me is how the economy rewrite was greenlighted in the first place without a prototype that showed the problems that still haunt us more than one year later! Either no one screamed STOP at some point or those in charge ignored the ones who did.
 
They could be, and I've been toying with the idea, but I'm not sure the comparison/grouping would be fast enough to be worth it.
We've also talked about approaching this more as a statistics model and not have pops as a single entity. It's a lot of gameplay changes, though. So I'd rather we find a way to make this work and let design decide what they want, rather than force choices on them for better performance.
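As a rough picture of what treating pops "as a statistics model" could mean, here is a small Python sketch. It assumes that pops with the same species and job are interchangeable, which the quoted post does not promise; the pop tuples and the output table are invented for illustration.

```python
from collections import Counter

# Hypothetical per-pop representation: one (species, job) entry per pop.
pops = (
    [("Human", "farmer")] * 3
    + [("Human", "miner")] * 2
    + [("Blorg", "farmer")] * 1
)
output_per_job = {"farmer": 6, "miner": 4}

# Entity model: one calculation per pop.
per_pop_total = sum(output_per_job[job] for _, job in pops)

# Statistical model: one calculation per distinct group, weighted by its size.
groups = Counter(pops)
grouped_total = sum(
    count * output_per_job[job] for (_, job), count in groups.items()
)
```

Both totals agree, but the grouped version does work proportional to the number of distinct groups rather than the number of pops. Anything that makes individual pops differ (happiness, ethics, traits) splits the groups further, which is where the gameplay changes mentioned above would come in.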
I could not agree more with the sentiment of avoiding breaking something that is working (and the current system is working). Keeping the technical side and GD separate is perfectly sensible too.
However, if the game allows unlimited growth, the technical side must handle infinite pops, and this is not something the technical side can do. GD has to either impose restrictions that strongly discourage growing beyond a certain size, or allow handling unlimited population, or come up with some third option. Otherwise, see above: people who find the lag at 2k pops unbearable will have the same experience at 2.5k. And even if the tech side miraculously allows 10k pops, people will complain about 20k, and also about the lack of automation at 5k. That is currently not the top issue, because lag is worse. But "fix" the lag and automation will become the number one grievance.
Kudos to you for your care and attention. I know how taxing it is to discuss your professional area with people who do not know what they are talking about (which is probably me in this case, but anyway).
 
In a democracy, the popup menu for elections just refers to your leader by the default title ("President" usually) instead of your custom ruler title.

SWEET! I noticed that in my play through as well. I assumed it was supposed to be that way. It wasn't just in Democracies, either! I got it in Oligarchies, Dictatorships, and Corporations as well!
 
Whatever the AI and the average player's CPU can handle. Bonus points for reducing micromanagement. I take a playable game over a more complex one any day.
No. Dumbing down the system because it's moderately difficult to make work is not an option.
 
I think the issue with threading is how HARD it can be to do threading well. For some of the AAAA engines you likely have super-elite people wringing every single ounce of performance out of the core engine, which includes lots of threading. These companies will then license the engine to others or share it amongst MANY titles within the same company.
That's only for a handful of engines that actually get licensed around, though. Most games doing their own thing just aren't optimized well.


One of the generally most optimized games is Factorio. They spend an almost absurd amount of time on tiny things. Multi-threading also comes up often and it's super complicated. Turns out the biggest issue at that point isn't CPU speed, but memory access. If your data structures are spread all over the memory it can actually make things slower as cache misses and cache copying increase drastically. So for example standard linked lists can be pretty bad.
They also found that the slowest things simply take some time no matter what you do and that putting them into another thread is not going to make them take zero time. And that they are very reluctant to deal with all the possible overhead and side effects of multi-threading for an uncertain benefit. So they rather focus on other smaller things that are easier. A few percent here and there also add up
 
I'm just hoping that the First President disappears bug and the Pulsating stars event finally get fixed... They bug me.
 
You know what really bothers me? The fact that it is impossible to close your borders to an empire that is a subject, regardless of who their overlord is. Your rival could subjugate one of your neighboring empires and you cannot close your borders to that empire, because they are a subject.
 
Pretty sure you control borders with subjects by controlling borders with their overlords; they have exactly the same border access as their overlords. Closing borders to the overlord closes them to the subjects as well. The opposite is also true; if an overlord closes their borders to you, their subjects are closed to you as well.

That’s been my experience, anyway.
 
Edit: After looking at the game files and running a quick test, it seems that it is no longer the case that you can't close your borders to the subjects of other empires. I am glad that this was apparently silently fixed.

I remember a 2.1 game where I was a Devouring Swarm and I was going insane because a bunch of nearby tiny little empires that had been subjugated by the biggest non-AE empire on the map (other than myself) kept sending their fleets through my territory and I couldn't do anything about it.
 