I would, but the overhead of switching threads was too high...:p
You know, I realize that was a bit rude and I apologize. Since you responded nicely, I'll try to answer your questions as concisely as possible.

1. Are any of these threads dependent on any other?
2. Do the threads switch CPUs at any point?
3. How much of the CPU activity is dedicated to switching processes vs actually running them?
4. How much memory does each of these threads actually use?
5. How many threads the OS can handle is rather beside the point, isn't it? The question is whether you will see any performance improvement from rewriting the entire game to run everything in parallel.

1. Certainly many of them would be, but it's not terribly relevant. The point is that Stellaris planets aren't dependent on other planets, only on a monthly update of the global state of the game (err... galactic state), which makes them ideal for multi-threading. Threading is hard when you need to sync up constantly, on the order of multiple times per frame. Every 10-15 seconds? Fairly trivial for the level of processing that Stellaris is doing (e.g. we aren't working with 100 GB datasets or something).
2. Just not terribly relevant AFAIK. Maybe in some extreme edge cases.
3. Very little goes to switching threads or processes. A significant part of CPU and OS design is about getting good at this. As I stated, and as you can see from my image, the "context switch delta" is how often the CPU is switching threads ("context" refers to the current working environment). It's already happening thousands of times per second as you read this.
4. Threads generally don't require much extra memory. They can directly address the memory of the main thread and don't need to duplicate anything but what they are modifying. Furthermore, the vast, vast majority of any game's memory goes to graphics and sound; the actual simulation data is just a handful of bytes per pop. Stellaris saves in their entirety are a few megabytes in size.
5. Yes, certainly. It's clearly the case that pops are a major component of performance. If you offload those tasks to a CPU core that isn't doing anything then effectively those pops now cause next to no performance hit on the main CPU core and the Stellaris game.
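A minimal sketch of the dependency structure described in answer 1, assuming (hypothetically) that each planet's update reads only its own data plus a read-only snapshot of the galactic state, and that cross-planet effects are applied only at the monthly sync. `GalacticSnapshot`, `Planet`, `update_planet` and `run_month` are illustrative names, not Stellaris internals. Because each update is a pure function of its inputs, the processing order (and so which thread happens to run which planet) cannot change the result.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GalacticSnapshot:
    # Read-only galactic state shared by every planet between monthly syncs.
    market_price: float


@dataclass(frozen=True)
class Planet:
    name: str
    pops: int
    output_per_pop: float


def update_planet(planet: Planet, snapshot: GalacticSnapshot) -> Tuple[str, float]:
    # Pure function: reads one planet plus the shared snapshot, writes nothing shared.
    return planet.name, planet.pops * planet.output_per_pop * snapshot.market_price


def run_month(planets: List[Planet], snapshot: GalacticSnapshot, order_seed: int) -> Dict[str, float]:
    # Visit planets in a seed-dependent order to mimic an unpredictable thread schedule.
    order = planets[:]
    random.Random(order_seed).shuffle(order)
    results = [update_planet(p, snapshot) for p in order]
    # The only synchronization point: merge per-planet results into the galactic state.
    return dict(sorted(results))


planets = [Planet(f"planet_{i}", pops=10 + i, output_per_pop=0.5) for i in range(100)]
snapshot = GalacticSnapshot(market_price=1.2)
# Different visiting orders give identical results because no planet reads another.
assert run_month(planets, snapshot, order_seed=1) == run_month(planets, snapshot, order_seed=2)
```

In a threaded engine, the list comprehension in `run_month` is the part that would be split across worker threads; the merge step stays single-threaded.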
 

Now that's just not being honest. The cost of context switching at a delta of 38,000 in your case can range anywhere from hundreds of millions to a billion cycles per second, ideally a number you can divide by the number of physical cores/threads, of course. It takes thousands of cycles for the hardware to do a single context switch: cache misses occur, and hardware registers must be saved and loaded. It's an expensive process, and the more you multitask, the more things slow down. Switching a thread between physical cores is even more costly, as you also have to move the thread to another core's run queue and migrate the cache lines. This isn't an 'extreme edge case'; it occurs for load-balancing reasons. However, it happens much, much less often than a typical context switch, so I would not actually consider it a critically important performance metric.
 

I'm sorry, but you are the one not being honest here. Stellaris is a very simple game running easy calculations on a very small dataset. You are talking about exactly the extreme edge cases.


"thousands of cycles", yes, on a core with billions of cycles per second. Furthermore most of these context switches essentially don't even count for performance, because they are not being run on the same CPU core that is running the main thread. Wasting any number of cycles on core 2 or higher is essentially meaningless because you were wasting those cycles all the time anyway. The main game thread is just running on core 1 and checking that the work is done. The supposed "cost" of multithreading, while real in some situations (usually extreme edge cases of either work being done or poor programming), should absolutely not be a factor. If anything the only real recognizable performance loss would be the loss of turbo boost frequencies.

Also, for those who care, Stellaris does already use 43 threads and has ~15,000 context switches a second.
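The disagreement above is mostly arithmetic, so here is a back-of-the-envelope sketch. Only the switch rates (38,000/s and ~15,000/s) come from the posts; the per-switch cost (2,000 to 20,000 cycles), the clock speed (4 GHz) and the core count (8) are assumptions chosen for illustration, not measurements.

```python
def switch_overhead_fraction(switches_per_sec: float, cycles_per_switch: float,
                             clock_hz: float, cores: int) -> float:
    """Rough fraction of all available CPU cycles spent on context switches."""
    spent = switches_per_sec * cycles_per_switch
    available = clock_hz * cores
    return spent / available


# Assumed hardware: 8 cores at 4 GHz (illustrative, not from the thread).
CLOCK_HZ = 4e9
CORES = 8

for rate in (38_000, 15_000):        # switch rates quoted in the thread
    for cost in (2_000, 20_000):     # assumed cycles per switch, low and high
        frac = switch_overhead_fraction(rate, cost, CLOCK_HZ, CORES)
        print(f"{rate:>6} switches/s x {cost:>6} cycles = {frac:.2%} of all cycles")
```

Under these assumptions, even the worst case (38,000 × 20,000 cycles) is about 2.4% of total capacity. It would be larger if every switch landed on the one core running the main thread: 38,000 × 20,000 / 4e9 is 19% of that single core.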
 

I have no clue what you guys are debating here. The game slows to a crawl in the late game, even on medium galaxies with low planet settings on beefy PCs. You see a small lag with every day that ticks by. How to fix this isn't our problem.

For many players like me, seeing this kills all the immersion and the fun of playing further. I have no problem sinking time into a game, but sitting there watching items lag on my monitor is simply not fun. Before 2.2 there was always a huge lag when the month ended, and that was it. The game slowed down significantly in the late game as well, but at what settings? I had four times as many planets in my games, always more AI empires, and a huge galaxy.

If you pair this with a dumb AI and nonsensical micromanagement, the game isn't fun anymore. In my last playthrough I needed 2 hours to manage 500 captured and enslaved pops after a war. I can't purge single pops on planets, and pops are checked daily but don't get the right jobs for their traits. You have to close all jobs manually, and then the game puts the right pops on the jobs... Planet management is a mess because it takes ages to finish planets. Sectors don't work at all and have lost their charm. I can't plan them at all like before.
 
The game isn't unplayable, though. Does it lag as time goes on? Yes. Does that prevent it from being played? No, not really; the point where lag becomes too frustrating for someone to keep playing is entirely subjective.

OK, I buy your last point: "the point where lag becomes too frustrating for someone to keep playing is entirely subjective". Hence I will rephrase my point: "from my subjective point of view, the slowdown in the late game is so damn frustrating that it makes the game absolutely unplayable, to me".

Let me just remark that your last point contradicts your starting point: you began by stating (as an unequivocal truth) that the game isn't unplayable. Either you are right when you say that "the game isn't unplayable" or you are right when you say that unplayability is "subjective".

So, what do you think? Am I entitled to say that I find the game galactically unplayable (from the subjective point of view of a person like me, who thinks that life does not last forever)?

Finally, even accepting the "subjectivity of frustration" argument, we could gather some statistics on how many people today think that performance is a frustrating problem. We can just count how many people, here and in all the other forums, comments, and reviews, feel frustrated. Maybe I am wrong, but I get the impression that we are not a small minority.

As for the post above about late-game lag, micromanaging enslaved pops, and broken sectors: applause! This is the point.
 
"The game becoming too frustrating for someone to want to play" =/= "the game is unplayable". Getting frustrated and quitting is an active decision made by the player, not an inherent property of the game.
 
I have decided to run some experiments based on this.

I modified Nexus Districts to give 1k housing and 1k jobs as such:

Code:
    planet_modifier = {
        planet_housing_add = 1000
        job_maintenance_drone_add = 300
        job_technician_drone_add = 220
        job_mining_drone_add = 200
        job_agri_drone_add = 140
        job_alloy_drone_add = 70
        job_calculator_add = 70
        planet_crime_add = -10000
        planet_crime_no_happiness_add = -10000
    }

With this, I ran 3 timed runs for each of 3 scenarios across 5 pop "setups" (45 runs in total).
They all had:
Default Galaxy settings, except no default, fallen and marauder empires
Machine Empire
1 colony
Just over 10k pops
10 ± x Nexus Districts
1 Drone Storage building

All tests were run using:
'ticks_per_turn 3600'
'one_year'
(console commands)

The 3 scenarios:
A - 10 Nexus Districts
B - +2 Nexus Districts
C - -2 Nexus Districts
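To make the three scenarios concrete, the job totals can be worked out from the modifier above. This sketch counts only the jobs in that `planet_modifier` (it ignores jobs from the Drone Storage building, the district's own base values, or any other source, which is an assumption) and uses 10,000 as a round stand-in for "just over 10k pops".

```python
# Jobs added by one modified Nexus District (from the planet_modifier above).
JOBS_PER_DISTRICT = {
    "maintenance_drone": 300,
    "technician_drone": 220,
    "mining_drone": 200,
    "agri_drone": 140,
    "alloy_drone": 70,
    "calculator": 70,
}
HOUSING_PER_DISTRICT = 1000
POPS = 10_000  # round stand-in for "just over 10k pops"


def scenario_summary(districts: int) -> dict:
    # Totals for a planet with the given number of modified districts.
    jobs = districts * sum(JOBS_PER_DISTRICT.values())
    return {
        "districts": districts,
        "jobs": jobs,
        "housing": districts * HOUSING_PER_DISTRICT,
        "free_jobs": max(jobs - POPS, 0),
        "unemployed": max(POPS - jobs, 0),
    }


for label, districts in (("A", 10), ("B", 12), ("C", 8)):
    print(label, scenario_summary(districts))
```

Under this simplification, scenario A roughly matches jobs to pops, B leaves about 2,000 jobs free, and C leaves about 2,000 pops without a job (and, with housing also at 8,000, without a home).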

The 5 pop "setups":
1 - 1 species (1 machine)
2 - 2 species (1 machine, 1 bio)
3 - 3 species (1 machine, 2 bio)
4 - 6 species (2 machine, 4 bio)
5 - 15 species (3 machine, 12 bio)

Setups 2-5 use almost exactly the same save, except that randomly chosen pops have had their species changed to another.
Setup 1 was tested from 2200.02.01 (I believe), while the others start from the beginning of 2201.
However, for reasons I don't know, setups 4 and 5 had a huge lag spike (several minutes long) on 2201.01.02, 1 day after the save I tested from. As such, all tests for 4 and 5 start from 2201.01.02 instead of 2201.01.01.

The results of my tests (three run times per setup and scenario, mean in parentheses):
Code:
1A - 29.089s, 29.761s, 30.59s  (29.813s)   1B - 29.776s, 29.916s, 30.586s (30.093s)   1C - 29.335s, 29.096s, 32.843s (30.425s)
2A - 25.371s, 27.985s, 27.175s (26.844s)   2B - 28.684s, 27.476s, 27.456s (27.872s)   2C - 31.791s, 28.389s, 29.175s (29.785s)
3A - 27.181s, 27.557s, 28.969s (27.902s)   3B - 27.086s, 27.3s,   27.396s (27.261s)   3C - 30.438s, 29.267s, 31.728s (30.478s)
4A - 27.054s, 33.898s, 27.908s (29.62s)    4B - 31.222s, 32.879s, 35.092s (33.064s)   4C - 34.141s, 33.837s, 27.878s (31.952s)
5A - 34.304s, 30.845s, 30.434s (31.861s)   5B - 29.302s, 39.085s, 32.354s (33.58s)    5C - 40.258s, 27.363s, 27.206s (31.609s)

Control (game without any modifications) - 6.876s, 6.462s, 6.493s (6.61s)
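One way to see why the results look inconclusive is to compare the run-to-run noise within each cell against the differences between cells. This sketch just recomputes the means from the numbers above and adds the sample standard deviation and the min-max range; with only 3 runs per cell, the standard deviation is a rough estimate at best.

```python
from statistics import mean, stdev

# Run times in seconds for one in-game year, copied from the results above.
RESULTS = {
    "1A": [29.089, 29.761, 30.59],  "1B": [29.776, 29.916, 30.586], "1C": [29.335, 29.096, 32.843],
    "2A": [25.371, 27.985, 27.175], "2B": [28.684, 27.476, 27.456], "2C": [31.791, 28.389, 29.175],
    "3A": [27.181, 27.557, 28.969], "3B": [27.086, 27.3, 27.396],   "3C": [30.438, 29.267, 31.728],
    "4A": [27.054, 33.898, 27.908], "4B": [31.222, 32.879, 35.092], "4C": [34.141, 33.837, 27.878],
    "5A": [34.304, 30.845, 30.434], "5B": [29.302, 39.085, 32.354], "5C": [40.258, 27.363, 27.206],
}
CONTROL = [6.876, 6.462, 6.493]


def summarize(times):
    # Mean, sample standard deviation, and min-max range relative to the mean.
    m = mean(times)
    return m, stdev(times), (max(times) - min(times)) / m


for name, times in RESULTS.items():
    m, s, rel_range = summarize(times)
    print(f"{name}: mean {m:6.3f}s  stdev {s:5.3f}s  range {rel_range:5.1%} of mean")

print(f"control mean {mean(CONTROL):.3f}s")
```

For example, the 5C runs span 27.206 s to 40.258 s, a wider spread than the gap between any two cell means (26.844 s for 2A to 33.58 s for 5B), while every modded cell averages roughly four to five times the control.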

For me...
these results seem a bit inconclusive.

The only conclusion I can draw from this is that having more pops drastically decreases performance.

Now, this is by no means a perfect test. For one, the exact times vary by a lot in some cases, and also, I'm only using 1 planet.
But it does not seem like having a ton of free jobs decreases performance. At least not in these conditions.

I have included all the saves I tested from (except control, as I haven't done anything special there). Every test was done on the latest save.

Your data set is not correctly set up. You need more pops - try 100 or 200 with many free jobs. As I wrote in my explanation, the cost scales as p × j (pops times jobs).
And please add more planets.
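A hypothetical illustration of the p × j claim: if a job-assignment pass scored every pop against every job, the number of pop/job evaluations per pass would be the product of the two counts. This is only a model of the claim; `naive_assign` and its scoring function are made up for illustration, and the thread does not show how Stellaris actually assigns jobs.

```python
def naive_assign(pops, jobs, score):
    """Greedy assignment that scores every pop against every job.

    Returns the assignment and the number of score() calls, which is
    len(pops) * len(jobs) regardless of how many jobs end up filled.
    """
    evaluations = 0
    assignment = {}
    taken = set()
    for pop in pops:
        best_job, best_score = None, float("-inf")
        for job in jobs:
            evaluations += 1
            s = score(pop, job)
            # Only untaken jobs are candidates, but every job is still scored.
            if job not in taken and s > best_score:
                best_job, best_score = job, s
        if best_job is not None:
            assignment[pop] = best_job
            taken.add(best_job)
    return assignment, evaluations


for p, j in ((100, 100), (200, 200), (200, 2000)):
    _, evals = naive_assign(range(p), range(j), score=lambda pop, job: -abs(pop - job))
    print(p, j, evals)
```

Under this model, doubling both pops and jobs quadruples the work, and a large pool of free jobs inflates the cost even when there are few pops.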
 

My PC idling with ~1.6k threads, with the usual Firefox, Steam, etc. running in the background, performing ~28k context switches per second, completely responsive and normal.
"200 THREADS IZ TOO HARD THE OS CAN'T HANDLE IT!!!" - certified internet expert.

I'm sorry, but I need to point out that it all depends on what the threads are doing. Context switching has gotten faster with each CPU generation, for both Intel and AMD.
But most threads in your list are just asleep or switch to do a minute check and go back to sleep.

With actively working threads that read and produce data changes, you'll quickly discover other bottlenecks like RAM access and I/O; you need only a few "beefy" threads like that, not 200, to saturate your system.

That is not to say that the game won't benefit if such a change is made, if it's possible for the engine at this point. But a new setting/optimization would be needed: planets per thread, with a minimum of 50-100 per thread. A user with an old single-core PC would set this to the maximum, i.e. all planets in a single thread, and have no context-switching overhead. This assumes all of the game mechanics can be separated on a per-planet basis (not true).
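A sketch of the proposed "planets per thread" setting, assuming planets can be processed in any order (which, as noted, is not true of every mechanic). `plan_batches` and its parameters are hypothetical names: it uses as many threads as the minimum batch size allows, capped by the available threads, and falls back to a single batch when there are too few planets.

```python
def plan_batches(num_planets: int, min_per_thread: int, max_threads: int):
    """Split planets into contiguous batches of at least min_per_thread each
    (when there are enough planets), using at most max_threads batches.

    Setting min_per_thread >= num_planets yields a single batch, i.e. no extra threads.
    """
    if min_per_thread < 1 or max_threads < 1:
        raise ValueError("min_per_thread and max_threads must be >= 1")
    if num_planets <= 0:
        return []
    threads = max(1, min(max_threads, num_planets // min_per_thread))
    base, extra = divmod(num_planets, threads)
    batches, start = [], 0
    for t in range(threads):
        # Spread the remainder over the first `extra` batches.
        size = base + (1 if t < extra else 0)
        batches.append(range(start, start + size))
        start += size
    return batches


print([len(b) for b in plan_batches(300, 50, 16)])   # six batches of 50
print([len(b) for b in plan_batches(40, 50, 16)])    # too few planets: one batch
```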
 
Yes, making threads that IDLE is a really big mistake that we programmers are only slowly growing out of. We only recently got tools that allow us to multitask without resorting to spamming threads where they are not needed.
How does that relate to multitasking slowdown?

So where is that pseudocode your superior understanding should be trivially able to produce?

I just upgraded my computer to an 8 core CPU, an M.2 SSD, and an rtx 2080 super (not for Stellaris), and my days STILL crawl at 3x speed on the massive 1000 star map.
30 ticks per in-game day.
30 ticks per second at normal speed.
At 3x speed, you are trying to run 90 per real-life second.

If each tick takes 12 ms, the game will "crawl" once you try to run it faster than your CPU can handle. Since there are only 1000 ms in a second, at best you can run about 83 ticks per second (1000 / 12).
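The same arithmetic as a small sketch. The 30-ticks figures and the 12 ms tick cost are the post's numbers; the function names are illustrative.

```python
TICKS_PER_DAY = 30         # ticks per in-game day, as stated above
TICKS_PER_SEC_NORMAL = 30  # ticks per real second at normal speed, as stated above


def max_ticks_per_second(ms_per_tick: float) -> float:
    # The CPU can finish at most this many ticks in 1000 ms.
    return 1000.0 / ms_per_tick


def achieved_days_per_second(ms_per_tick: float, speed_multiplier: int) -> float:
    # Requested tick rate, capped by what the CPU can actually process.
    requested_tps = TICKS_PER_SEC_NORMAL * speed_multiplier
    actual_tps = min(requested_tps, max_ticks_per_second(ms_per_tick))
    return actual_tps / TICKS_PER_DAY


print(max_ticks_per_second(12))         # about 83.3 ticks/s
print(achieved_days_per_second(12, 3))  # about 2.78 days/s instead of the requested 3
```

At 12 ms per tick, 3x speed asks for 90 ticks per second but gets about 83, so days pass roughly 7% slower than requested; any tick cost above 1000 / 90 ≈ 11.1 ms produces this slowdown.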

Also, why do people still buy multicore systems expecting games to speed up? For the thousand-and-first time: all games require single-core performance!
Multicore only matters for server applications.
 

I'm sorry, what are you even trying to say? You are borderline incomprehensible here.

Your claim is that multi-threading would slow the game down, yet you provide no proof? You have no arguments at all, yet you expect me to do work to prove something so laughably wrong?

I never said they really mattered; I just contested the notion that they use negligible amounts of CPU activity.

I mean, this is just quibbling over definitions. There would be some performance impact for 200 threads, but it would be greatly outweighed by the performance benefit. Also, I just said that 200 was possible. Realistically, as I said, 20-40 would be more than enough even for the dumbest, simplest method of divvying up threads (just throwing random planets at each thread and letting the OS scheduler figure out what to run). Presumably it would be pretty easy to come up with a simple, effective heuristic that predicts the CPU time each planet will use and instead divvies the planets up between (number of available CPU threads - 1) workers.
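One plausible heuristic of the kind described above is longest-processing-time-first (LPT), a standard greedy scheduling rule swapped in here as an example, not something the game is known to use: sort planets by estimated cost, then hand each one to the worker with the least load so far. The per-planet cost estimate (pops times jobs below) and the planet names are hypothetical.

```python
import heapq
import os
from typing import Dict, List


def divvy_planets(cost_estimate: Dict[str, float], workers: int) -> List[List[str]]:
    """LPT: give each planet, most expensive first, to the least-loaded worker."""
    heap = [(0.0, w) for w in range(workers)]  # (current estimated load, worker index)
    buckets: List[List[str]] = [[] for _ in range(workers)]
    for planet, cost in sorted(cost_estimate.items(), key=lambda kv: kv[1], reverse=True):
        load, w = heapq.heappop(heap)          # least-loaded worker so far
        buckets[w].append(planet)
        heapq.heappush(heap, (load + cost, w))
    return buckets


# Leave one hardware thread free for the main game thread, as suggested above.
WORKERS = max(1, (os.cpu_count() or 2) - 1)

# Hypothetical cost estimate per planet: pops multiplied by jobs.
planets = {"Capital": 120 * 130, "Forge": 40 * 50, "Ring": 300 * 320, "Outpost": 10 * 12}
for i, bucket in enumerate(divvy_planets(planets, WORKERS)):
    print(i, bucket)
```

LPT is cheap (one sort plus a heap operation per planet) and is a well-known greedy heuristic for balancing load; a real engine could refresh the estimates each month from measured per-planet times.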
 
As for how useless idle threads relate to multitasking slowdown: they relate only in that, with useless threads, the CPU spends more time on context switches that bring no benefit. As was said above, 20-40 threads would be the maximum, if not overkill.

Multicore is very beneficial when the code is not data heavy, i.e. it executes many, many instructions on little data. If the code is data heavy, as it is in Stellaris, the upper bound is RAM/cache bus performance, not core count. The game is very, very far from reaching that limit with the way it runs now. I pray we'll reach that RAM transfer ceiling, or come close to it, within the next few patches, because it would also mean playing fast up to the year 4000 with a million pops.

People complaining about Stellaris performance, at launch and thereafter, miss a simple fact: you could play a game to its conclusion, defeating the crisis (or not) and wiping out the galaxy, before the slowdowns caught up with you. I should know, because that was how we played back then. Even with a potato PC, you could play on a medium or smaller galaxy and finish a game. The complaints came from people who wished to play past 2500-2600. While I agree in principle, I also believe that, just as other PDX products have limits (e.g. EU4 has no support after 1821), so should Stellaris, because otherwise the product could always be considered broken: "I demand a refund because the game doesn't play well in the year 4000 with 100,000 pops!! Shame on you PDX, I'm calling my lawyer".

Finally, people expect performance from multicore CPUs because they've been on the market for *ages* and only a few engines/games have caught up with that fact. So the complaints are valid, because we are no longer in 2010. Also, the modern graphics stack does use multiple threads, even base engines use 3-4, the OS uses a few, and you need more for background applications like the browser, Discord, music player, streaming, etc. So even if you play a single-core game, you will greatly benefit from a 4-core/8-thread CPU. So no, multicore is not only for server applications.
 
As for the claim that all games only need single-core performance: Space Engineers can make your CPU cry and beg for mercy, no matter how many cores it has. It is the rare exception in being a highly multi-core-intensive game, though.

I want Stellaris not to be unbearably slow on default settings on a roughly one-year-old high-end i7 when playing all the way through the endgame crisis. Right now, the slowdown bores me into playing something else 50-100 years before the crisis. I don't think that my expectations are unreasonable.
 
As for a manual planets-per-thread setting: in practice, application developers don't (or at least shouldn't) work at such a low level. Their job is to define independent segments of work (such as one per planet). It's the job of the parallelization library to create a number of threads suitable for the system it runs on and to schedule the independent segments for execution.
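In Python, for example, `concurrent.futures.ThreadPoolExecutor` follows this model: the code submits one task per planet and, when `max_workers` is omitted, the executor picks a default worker count based on the machine's CPU count. `simulate_planet` below is a hypothetical stand-in for per-planet work. Note that CPython's global interpreter lock keeps CPU-bound Python threads from running truly in parallel, so this sketch shows how the work is decomposed, not a speedup; a game engine would use a native task library (such as oneTBB in C++) for the same pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_planet(planet_id: int) -> int:
    # Hypothetical per-planet work that depends on no other planet.
    return sum((planet_id * k) % 7 for k in range(1000))


def simulate_all(planet_ids):
    # One independent task per planet; the executor chooses the thread count
    # (max_workers omitted) and schedules the tasks onto its worker threads.
    with ThreadPoolExecutor() as pool:
        # map() returns results in input order even if tasks finish out of order.
        return list(pool.map(simulate_planet, planet_ids))


results = simulate_all(range(500))
print(len(results))
```

The application code never mentions a thread count; moving from a 4-thread to a 16-thread machine changes only what the executor does internally.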

Why do you think Stellaris is data heavy? You can describe a pop in under 64 bytes (that was covered much earlier in the thread). Let's say jobs/buildings/districts are about the same (likely they are smaller). That's 8K pop/job pairs per megabyte, and 8K pop/job pairs is already more than enough to meet the victory conditions. Checking Stellaris's CPU utilization also shows very few L1(!) cache misses, which suggests that the data load is very low.
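The size estimate above, spelled out. The 64-byte pop comes from the post; treating a job/building/district entry as the same 64 bytes is the post's own rough assumption, and the 10,000 pops / 10,000 jobs figure is borrowed from the single-planet test earlier in the thread as an illustrative input.

```python
POP_BYTES = 64   # upper bound for one pop, as stated in the post
JOB_BYTES = 64   # assumption: a job/building/district entry is about the same size
MIB = 1024 * 1024


def pairs_per_mib(pop_bytes: int = POP_BYTES, job_bytes: int = JOB_BYTES) -> int:
    # How many pop/job pairs fit in one mebibyte.
    return MIB // (pop_bytes + job_bytes)


def footprint_bytes(pops: int, jobs: int) -> int:
    # Size of the pop and job records alone, ignoring indexing or pointer overhead.
    return pops * POP_BYTES + jobs * JOB_BYTES


print(pairs_per_mib())  # 8192
print(footprint_bytes(10_000, 10_000) / MIB, "MiB")
```

Under these assumptions, even the 10k-pop test planet's records come to about 1.2 MiB, comparable to or smaller than the L3 cache of many current desktop CPUs.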
 
Your data set is not correctly set up. You need more pops - try 100 or 200 with many free jobs.
Pardon?
As stated in my test post, every save already had just over 10k pops on a single colony, and each modified Nexus District adds 1000 housing and 1000 jobs, so:
+2 Nexus Districts = +2000 jobs
-2 Nexus Districts = -2000 jobs

As for adding more colonies...
I will... if and when I decide to do this again...