According to Phoronix, Ampere’s new CPUs have so many cores that Linux can’t handle a server with two of Ampere’s 192-core chips installed (384 cores in total). For now, the ARM64 Linux kernel only supports systems with 256 cores or fewer. To fix the issue, Ampere has submitted a patch proposing that the kernel’s core limit be raised to 512.
If you’re already at 384 cores in a dual-processor setup, isn’t raising the limit to 512 too little? Why not just go for 1024 while they’re at it, especially since the method they proposed doesn’t increase the kernel image’s memory footprint?
Well, looking at the patch:
+config NR_CPUS_RANGE_END
+	int
+	default 8192 if SMP && CPUMASK_OFFSTACK
+	default 512 if SMP && !CPUMASK_OFFSTACK
+	default 1 if !SMP
+
It looks like it’s using an end range of 8192, but only with the off-stack flag set. And it seems that…
+	  This is purely to save memory: each supported CPU adds about 8KB
+	  to the kernel image.
Which looks like they’re trying to save memory to avoid TLB stalls on the CPU bitmap. The patch appears to come from Christoph Lameter, who works at Ampere, so if the chip maker is indicating that slab allocation is fine beyond that point for now, it’s best to assume they’ve tested it on their end. Or at least I would think so. If they felt that keeping more on the stack was a fine option, I would think that’s exactly what they would pitch to the LKML. Since they’re saying off-stack is needed past 512, I’m guessing there’s a reason, and the one I can think of is TLB stalls.
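To put rough numbers on that: a CPU mask is a bitmap with one bit per possible CPU, so it grows with the configured limit, and the help text quoted above puts the image cost at about 8KB per supported CPU. Here is a back-of-the-envelope sketch in Python. The function names are made up for illustration, the 64-bit word size is an assumption, and the 8KB figure is taken at face value from the quote rather than measured.

```python
# Back-of-the-envelope only; these helpers are hypothetical, not kernel code.
BITS_PER_LONG = 64  # assumption: 64-bit kernel


def cpumask_bytes(nr_cpus: int) -> int:
    """Size of a bitmap with one bit per CPU, rounded up to whole longs."""
    longs = (nr_cpus + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * (BITS_PER_LONG // 8)


def image_overhead_kib(nr_cpus: int, kib_per_cpu: int = 8) -> int:
    """Image growth in KiB, using the ~8KB-per-CPU figure from the help text."""
    return nr_cpus * kib_per_cpu


for n in (256, 512, 8192):
    print(f"{n:>5} CPUs: mask {cpumask_bytes(n):>4} bytes, "
          f"image overhead ~{image_overhead_kib(n) // 1024} MiB")
```

By this estimate a 512-CPU mask is 64 bytes, which is tolerable as a local variable, while an 8192-CPU mask is 1 KiB per copy. As far as I know, that is the usual argument for allocating the big masks dynamically instead of putting them on the kernel stack.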
I dunno, man. 512 cores ought to be enough for anybody.
Edit: guys come on I shouldn’t have to explain the joke
I agree, they are just going to hit the wall again way too fast. If the limit is 256 or 2^8, they should increase it to 65536 or 2^16. Now that’s a limit that feels safer to leave in place for many years to come.
Or do what the IETF did: “We’re running out of 32-bit addresses, should we add some bits and call it an even 48? No! Let’s double the number of addresses 96 fucking times!”
Start using 128-bit for everything.
If you have to solve a problem, do it in a way that solves it for good.
Max value of uint128 is ~340 undecillion (~3.4e38). Is it fair to assume that’s more cores than have ever been made or ever will be?
Honestly, I think so.
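For anyone who wants to check the figure, here is a quick sketch in Python, whose integers are arbitrary precision. It also confirms the “double it 96 times” quip from earlier in the thread: going from a 32-bit to a 128-bit address space is exactly 96 doublings.

```python
# Largest value an unsigned 128-bit integer can hold.
UINT128_MAX = 2**128 - 1

print(UINT128_MAX)           # exact value
print(f"{UINT128_MAX:.2e}")  # scientific notation

# Doubling a 32-bit address space 96 times gives a 128-bit one.
addresses = 2**32
for _ in range(96):
    addresses *= 2
print(addresses == 2**128)
```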
Fine point. I assume it’s because they know there is an entire waterfall of shit they don’t want to mess with regarding memory registers for SMP, and they know this is the limit where they can patch and not have to deal with all of that.
An immediate kludge buys time for a worthwhile general solution.
And if that kludge only buys a few years, we’re less likely to see it Frankensteined into a shitty general solution.
Unfortunately, these kludge solutions that last a few years have a tendency to ripple into more kludge solutions when they run out, because the “proper” fix still wasn’t done. Shit that doesn’t work, but needs to work, gets high priority. Shit that works just well enough usually gets neglected until that shit doesn’t work (again).
This approach is slicing a finite resource. It can only extend so far, and it sounds like they extended it about that far in one step. For the next order of magnitude, the amount of information the kernel keeps about each core has to be drastically reduced, or else cache hardware and behavior will need to change in comically parallel chips.