I don't understand why the Stellaris developers don't use the Factorio developers' experience as a reference, especially since the Factorio developers are kind enough to go into the technical details of the issues they've encountered and the solutions they've found.
Here they describe what the issue was and how they resolved it, and
here somebody else playtested the change and described how effective it was.
As far as I understand, the issue with pops (correct me if I'm wrong) is that all pops are processed globally by the same thread. Why not process the pops of each planet on a separate thread and use something similar to "Wake-up Lists", where the main thread just collects the results of the per-planet pop calculations once they are finished and processes them? And if a pop calculates something for the entire country, just move it to a thread with the other pops that do the same country-wide calculations. This approach can be applied to anything that interacts only with a limited set of objects. And even if something interacts with multiple objects, something similar to "Wake-up Lists" can be used to flag when the calculations are done, so the main thread can collect and process the results.
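To make the idea concrete, here's a rough sketch in Python of what I mean. It is not how Stellaris or Factorio actually work. `Pop`, `Planet`, `process_planet` and `run_tick` are names I made up, and the "done" queue is only my loose stand-in for the wake-up idea. Each planet's pops are handled by a worker that touches nothing outside that planet, and the main thread is the only one that writes country-wide totals. (In standard CPython, threads won't actually speed up pure computation like this, so treat it as an illustration of the structure only.)

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Pop:
    output: int  # hypothetical per-pop production value


@dataclass
class Planet:
    name: str
    country: str
    pops: list


def process_planet(planet, done):
    # Reads only this planet's pops, so no locking is needed here.
    total = sum(pop.output for pop in planet.pops)
    # Flag completion: the main thread wakes up when this entry arrives.
    done.put((planet.country, planet.name, total))


def run_tick(planets, workers=4):
    done = queue.Queue()
    country_totals = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for planet in planets:
            pool.submit(process_planet, planet, done)
        # Collect exactly one result per planet, in completion order.
        for _ in planets:
            country, _name, total = done.get()
            # Country-wide state is written only here, on the main thread.
            country_totals[country] = country_totals.get(country, 0) + total
    return country_totals


planets = [
    Planet("P1", "A", [Pop(3), Pop(4)]),
    Planet("P2", "A", [Pop(2)]),
    Planet("P3", "B", [Pop(5), Pop(1)]),
]
print(run_tick(planets))
```

The point of the design is that workers never share mutable state with each other. Anything that spans planets (here, the per-country sum) is deferred to the single collecting thread.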
I'll leave the most important quote from their blog here just in case:
I feel that this solution (if applied correctly) could resolve a lot of the performance issues Stellaris is facing right now. Pop reduction band-aids aren't really a solution, IMO.