We're benchmarking the Intel APO tool on Intel's new Core i9-14900K CPU. APO is compatible ...
Not sure why there's so much pessimism about future support after seeing all the effort Intel has put into Arc drivers, which are obviously manual tuning too. APO will never ever come to every game, most games won't even benefit much from it, and it would be far too much work, but all they would need to do is look at like the top 100 games played every year, quickly go through them and see which are underperforming due to scheduling issues (not hard to do), then hand tune the ones where they expect to find performance left on the table.
Finding an additional 30% performance and lower power consumption is definitely worth the effort; it's far cheaper for Intel to go down this route than it is to get these gains in silicon. And it's not like Intel has any plans to move away from heterogeneous designs anytime soon. Even AMD is now doing them, and they have their own scheduler issues (X3D on 1/2 CCDs and Zen4+Zen4c).
I’d obviously like to see support on 13th gen and the midrange SKUs too, and ideally not have a separate APO app.
but all they would need to do is look at like the top 100 games played every year
My main hypothesis on this subject: perhaps they already did, and out of the top 100 games only 2 could be accelerated via this method, even after exhaustively checking all possible affinities and scheduling schemes, and only on CPUs with 2 or more 4-core clusters of E-cores.
The hypothesis is supported by the following hints:
how temporally stable the thread behaviors might need to be, probably disqualifying apps with any in-app task scheduling / load balancing
the signal that they possibly didn't find a single game where one 4-core E-cluster is enough (how rarely does this apply if they apparently needed 2+, for… some reason?)
the odd choice of Metro Exodus as pointed out by HUB - it’s a single player game with very high visual fidelity, pretty far down the list of CPU limited games (nothing else benefited?)
the fact that none of the games supported (Metro and Rainbow 6) are based on either of the two most popular game engines (Unity and Unreal), possibly reducing how many apps could be hoped to have similar behavior and possibly benefit.
Now, perhaps the longer list of games they show in their screenshot is actually the list of games that benefit, and we only got 2 for now because those are the only ones where they have figured out (so far) how to detect thread identities (possibly not too far off from as curiously as this), or maybe that list is something else entirely and not indicative of anything. Who knows.
And then there comes the discussion you’re having, re implementation, scaling, and maintenance with its own can of worms.
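To put a rough number on "exhaustively checking all possible affinities": here's a minimal sketch that just enumerates where each game thread could be placed. Everything in it is made up for illustration (the thread names, the three placement classes "P", "E0" and "E1", the capacity limits), and it says nothing about how Intel actually searched, if they did at all. It only shows how quickly the option count grows.

```python
from itertools import product

# Hypothetical placement classes: the P-cores plus two 4-core E-clusters.
CORE_CLASSES = ("P", "E0", "E1")

def placements(thread_names, classes=CORE_CLASSES, capacity=None):
    """Yield every thread -> core-class mapping that respects the capacity limits.

    capacity maps a class name to the maximum number of threads allowed on it;
    classes missing from the dict are treated as unlimited.
    """
    capacity = capacity or {}
    n = len(thread_names)
    for combo in product(classes, repeat=n):
        if all(combo.count(c) <= capacity.get(c, n) for c in classes):
            yield dict(zip(thread_names, combo))

# Made-up thread names; a real game has far more threads than this.
threads = ["render", "audio", "physics", "io"]

all_options = list(placements(threads))
limited = list(placements(threads, capacity={"P": 2}))

print(len(all_options))  # 3 classes ** 4 threads = 81
print(len(limited))      # 72 once at most 2 threads may sit on "P"
```

With 3 placement classes the count is 3^N for N threads, so even a modest 20 identifiable threads gives about 3.5 billion combinations, each of which would need a benchmark run. That's before you even get to the problem of threads changing their behavior over time.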
And it's not like Intel has any plans to move away from heterogeneous designs anytime soon, even AMD is now doing them and they have their own scheduler issues (X3D on 1/2 CCDs and Zen4+Zen4c).
AMD isn’t really doing anything heterogeneous, pal.
Correct me if I’m wrong here, but apart from the different clock-frequency properties, Zen4c-Cores are in fact *identical* to the usual full-size Zen4-Cores. A Zen4c-Core is little more than a compactly built and neatly rearranged Zen4-Core, without the micro-bumps for the 3D-Cache. The only downside is the lower max clocks, and that’s literally it.
The main reason AMD introduced Zen4c-Cores at all was their increased density (server space; muh, racks!), so purely for space-saving reasons and overall efficiency, and that’s it.
Even the L2-cache is identical, isn’t it?
→ A Zen4c-Core is not an E-Core, as it’s architecturally identical to any Zen4-Core, same IPC.
Same story for the X3D-enabled Cores/Chiplets. Identical apart from a larger cache.
So I don’t really know what you’re actually talking about when erroneously claiming AMD has also jumped on the heterogeneous hype-train. That statement of yours is utter nonsense.
On AMD there’s no heterogeneous mixing of cores with different IPC or architecture that would need to be scheduled accordingly to run properly. Only Intel has to rely on a heterogeneity-aware (and capable!) scheduler, and depends on proper scheduling to NOT kill performance.
Meanwhile, for any mix-and-match AMD Zen4/Zen4c CPU, it’s fundamentally irrelevant which core a thread is running on. In fact, the scheduler doesn’t even need to know which core is a regular Zen4 and which is a Zen4c.
AMD’s designs are heterogeneous in terms of different chiplets/configs, yes.
The heterogeneity you are talking about isn’t even remotely the same as heterogeneity in the sense of heterogeneous computing (a system [on a chip] that uses multiple types of computing cores), meaning different architectures, as Intel uses in its hybrid SoCs. So no, no heterogeneity for you!
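For anyone wondering what "scheduled accordingly" boils down to at the lowest level: it's about which logical CPUs a thread is allowed to run on, usually expressed as an affinity bitmask. Below is a rough sketch that builds such masks for a 14900K-like layout. The CPU numbering (logical CPUs 0-15 for the P-core threads, 16-31 for the E-cores) is an assumption on my part, so check your own system's enumeration. The `pin_current_process` helper is a hypothetical convenience wrapper around Python's Linux-only `os.sched_setaffinity`. This is not how APO works internally (Intel hasn't published that), it's only an illustration of the mechanism.

```python
import os

# ASSUMPTION: logical CPUs 0-15 are the P-core threads (8 cores with HT)
# and 16-31 are the 16 E-cores. Verify this on your own machine.
P_CORE_CPUS = range(0, 16)
E_CORE_CPUS = range(16, 32)

def affinity_mask(cpus):
    """Build a bitmask with one bit set per allowed logical CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

def pin_current_process(cpus):
    """Restrict the current process to the given CPUs (Linux only).

    Only CPUs that the process is currently allowed to use are applied.
    Returns the set that was applied, or an empty set if none matched.
    """
    available = os.sched_getaffinity(0)
    wanted = set(cpus) & available
    if wanted:
        os.sched_setaffinity(0, wanted)
    return wanted

print(hex(affinity_mask(P_CORE_CPUS)))  # 0xffff
print(hex(affinity_mask(E_CORE_CPUS)))  # 0xffff0000
```

On a homogeneous-IPC design a mask like this is mostly a cache/latency question. On a hybrid design, picking the wrong one for a latency-critical thread is exactly the kind of thing that costs performance.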
And this is why AMD’s 3D + normal chiplet CPUs aren’t having as hard a time as Intel’s mess. Heck, even if AMD wants to go big-little, they can have a big chiplet and a little chiplet to avoid many of these problems.
that makes a lot of sense…