The single-CCD cache stack was a huge problem on the original dual-CCD X3D chips (7900X3D/12-core and 7950X3D/16-core) because OSes were not “CCD aware”: they did not know which chiplet had the extra cache. There is a big latency penalty when a core has to reach into the other chiplet’s cache, since the data travels across the Infinity Fabric instead of staying inside the CCD. Because the OS only saw threads and spread them across both CCDs, you never got the full benefit of X3D: loads landed on cores without the stacked cache and paid the traversal penalty. This has largely been fixed in Windows and Linux by now, but it still limits the true potential of the non-X3D CCD.
Dual cache stacks would let both CCDs get the full benefit of a massive cache, rather than just one of them.
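If you want to see which cores sit on the big-cache CCD and pin a process there yourself on Linux, something like the rough sketch below might work. It assumes the kernel exposes the usual sysfs cache files (`level`, `size`, `shared_cpu_list`) and that the CCD with the stacked cache simply reports the largest L3. The helper names (`parse_cpu_list`, `largest_l3_group`, `pin_to_cache_ccd`) are made up for illustration and are not part of any AMD or kernel tooling.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a sysfs CPU list such as '0-7,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_size_kib(text):
    """Parse a sysfs cache size such as '98304K' or '96M' into KiB."""
    text = text.strip()
    units = {"K": 1, "M": 1024}
    return int(text[:-1]) * units[text[-1]]


def largest_l3_group(l3_by_cpus):
    """Given {frozenset(cpus): size_kib}, return the CPU set with the largest L3.

    If every group reports the same size (no stacked cache), the first
    group found is returned, so the result is only meaningful on X3D parts.
    """
    return set(max(l3_by_cpus, key=l3_by_cpus.get))


def read_l3_groups(root="/sys/devices/system/cpu"):
    """Collect every distinct L3 domain and its size from sysfs."""
    groups = {}
    for idx in Path(root).glob("cpu[0-9]*/cache/index*"):
        try:
            if (idx / "level").read_text().strip() != "3":
                continue
            cpus = parse_cpu_list((idx / "shared_cpu_list").read_text())
            groups[frozenset(cpus)] = parse_size_kib((idx / "size").read_text())
        except (OSError, ValueError, KeyError, IndexError):
            continue  # skip entries that are unreadable or oddly formatted
    return groups


def pin_to_cache_ccd(pid=0):
    """Restrict a process (0 = the current one) to the largest-L3 cores."""
    groups = read_l3_groups()
    if not groups:
        raise RuntimeError("no L3 information found in sysfs")
    cpus = largest_l3_group(groups)
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Calling `pin_to_cache_ccd()` at the start of a script would keep that process on the big-cache cores, which is roughly what the scheduler fixes try to do automatically for games. Whether manual pinning actually helps depends on the workload, so treat it as something to benchmark rather than a guaranteed win.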
Exactly what I said in my comment at exactly the same time, but you worded it better 😅