They’d have to actually spend chip real estate on DL/RT.
So far they’ve been talking about using DL for gameplay instead of graphics. So no dedicated tensor units.
And their RT has mostly been there just to keep up with NV feature-wise. They did enhance it somewhat in RDNA3, apparently. But NV isn’t waiting for them either.
Second-gen AMD HW ray tracing still has a worse performance impact than Intel’s first-gen HW ray tracing. No need to talk about Nvidia here, as they are miles ahead. Either AMD is unwilling to spend more resources on RT, or they aren’t able to improve performance.
I think it’s the latter. AMD has spent years trying to make their GPU serve their CPU. It was apparently one of their big incentives for buying ATI in the first place, as illustrated by the AMD Fusion effort that gave us those weird modules with two integer units sharing a FP unit. They were never able to make it work particularly well, and since then they’ve seemed stuck, incapable of getting actual performance out of their GPUs in line with their theoretical performance.
You seem to know more than me, so forgive me if this is out of place, but how are they incapable of getting performance out of their GPUs?
I mean, they basically have an answer for everything except Nvidia’s absolute top of the line, and their RT is one generation behind. Don’t they also have slightly better raster performance at similar price points?
Excuse me if you were just being hyperbolic, but the notion that AMD can’t answer one card from Nvidia and is therefore complete shit seems a bit hyperbolic to me.
The A750 beats the RTX 3060 in Cyberpunk, Control, and Metro Exodus, all Nvidia-sponsored titles. In Metro it even beats the 3060 Ti. The main reason is probably that their cards have RT hardware that was originally meant to compete with GPUs one tier above where they ended up: the A770 currently competes with the 3060 Ti, but its RT hardware was originally meant to compete with a 3070 Ti.
But I don’t think it matters what generation AMD, Nvidia, or any of them are on. It’s not that AMD couldn’t have built hardware from day 1 that could compete in RT; it’s that they viewed it as a waste of space, so they did the bare minimum with RDNA2 to be compatible. Spending 10% more on a die is going to cut into your margins a lot, unless you also increase the price of the GPU.
It’s been a conscious decision for years, not a failed effort.
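For intuition, here’s a quick back-of-envelope on that margin point. All the numbers below (wafer cost, die count, price) are made up purely for illustration, not real figures:

```python
# Toy numbers, purely for illustration -- not real AMD figures.
wafer_cost = 10_000                    # $ per wafer
dies_per_wafer = 200                   # baseline die count
sell_price = 100                       # $ per die to the board partner

die_cost = wafer_cost / dies_per_wafer             # $50
margin = (sell_price - die_cost) / sell_price      # 0.50

# A ~10% bigger die means roughly 10% fewer dies per wafer
# (ignoring yield effects, which make it worse in practice).
bigger_die_cost = wafer_cost / (dies_per_wafer * 0.9)      # ~$55.56
new_margin = (sell_price - bigger_die_cost) / sell_price   # ~0.444
```

With these assumed numbers, 10% more area at the same price knocks the gross margin from 50% down to about 44%, which is why the area either gets skipped or passed on in the price.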
AMD’s RT hardware is intrinsically tied to the texture unit, which was probably a good decision at the start since Nvidia kinda caught them with their pants down and they needed something fast to implement (especially with consoles looming overhead, wouldn’t want the entire generation to lack any form of RT).
Now, though, I think it’s giving them a lot of problems because it’s really not a scalable design. I hope they eventually implement a proper dedicated unit like Nvidia and Intel have.
I am pretty sure they already have something in the pipeline; it’s just that it can take half a decade from low-level concept to customer sales…
That’s what RDNA4 will introduce.
There’s not really anything intrinsically /wrong/ with tying it to the same latency-hiding mechanisms as the texture unit (there’s nothing in the ISA that /requires/ it to be implemented in the texture unit, more likely that’s already the biggest read bandwidth connection to the memory bus so may as well piggyback off it). I honestly wouldn’t be surprised if the nvidia units were implemented in a similar place - as it needs to be heavily integrated to the shader units, while also having a direct fast path to memory reads.
One big difference is that Nvidia’s unit can do a whole tree traversal with no shader interaction, while the AMD one just does a single node test and expansion, then needs the shader to queue the next level. This means AMD’s implementation is great for hardware simplicity, and if there’s always a shader scheduled that is doing a good mix of RT and non-RT instructions, it’s not really much slower.
But that doesn’t really happen in the real world: the BVH lookups are normally all concentrated in an RT pass, not spread over all shaders across the frame. And that batch tends not to have enough other work to fill the pipeline while waiting for the BVH lookups. If you’re just waiting on a tight loop of BVH lookups, the trip back to the shader just to submit the next lookup breaks whatever pipelining or prefetching you might otherwise be able to do.
But it might also be more flexible: anything that looks a bit like a BVH might be able to do fun things with the BVH lookup/triangle-ray intersection instructions, not just raytracing. There simply doesn’t seem to be a glut of use cases for that as-is, though, and unused flexibility is just inefficiency, after all.
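The shader-round-trip difference described above can be sketched roughly like this. Everything here (node layout, function names) is invented for illustration and doesn’t correspond to any real ISA or API:

```python
def intersect_node(node, ray):
    """Stands in for the hardware box/triangle test instruction:
    returns the children (or leaf hits) the ray touches."""
    return [c for c in node.get("children", []) if c["hit"]]

def traverse_shader_driven(root, ray):
    """AMD-style: hardware tests one node at a time, and the shader
    keeps the stack and submits the next lookup -- one round trip
    per tree level, which is what breaks pipelining in a tight loop."""
    stack, hits = [root], []
    while stack:
        node = stack.pop()
        for child in intersect_node(node, ray):   # HW instruction
            if "children" in child:
                stack.append(child)               # shader re-queues it
            else:
                hits.append(child)                # leaf hit
    return hits

def traverse_fixed_function(root, ray):
    """Nvidia-style: the shader submits the ray once and the whole
    traversal (stack included) runs inside the dedicated unit."""
    return traverse_shader_driven(root, ray)      # same result, one submit
```

The point of the sketch is the `while` loop: in the shader-driven style, each level of the tree is a separate hardware submit from the shader, while the fixed-function style hides the whole loop behind one call.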
I doubt they were surprised at all. Isn’t RDNA3 very similar to RDNA2? They could have fixed it there, and they decided on minor improvements instead.
Wasn’t RDNA2 designed with Sony and Microsoft having an input on its features? I’m sure Sony and MS knew what was coming from Nvidia years in advance. I think Mark Cerny said developers even wanted a 16 core originally, and they were talked out of it, because they had die area restrictions. RT hardware area on those consoles probably would have equaled an extra 8 CPU cores in area if they wanted Nvidia-like RT. All just seems like cost optimization to me.
Microsoft told Digital Foundry that they had locked the specs of the Xbox Series consoles in 2016. In 2016 they knew the console would have an SSD, RT capabilities etc.
They could already have the next console in mind.
RDNA3 is up to 60% faster than the equivalent RDNA2 part in path tracing.
Isn’t that only because RDNA3 is available with that many more CUs? Afaik the per-unit and per-clock RT performance of RDNA3 is barely ahead of RDNA2.
I misread the 6700XT specs, thinking it was the 60CU comparison point for the 7800XT, but the actual comparison is 6800 vs 7800XT.
That being said
The 7600 is 20% faster in Alan Wake and 10% faster in Cyberpunk than the 6600XT, with the same CU count, at 1080p.
The 7800XT is 37% faster than the 6800 at 1440p in Alan Wake. There’s no 6800 in the Cyberpunk data, so you have to infer from the 6700XT, which shows a good 35% improvement for the 7800XT vs the 6800, assuming the 6800 is 20% faster than the 6700XT like in Alan Wake, with the same 60CU.
https://www.techpowerup.com/review/alan-wake-2-performance-benchmark/7.html
https://www.techpowerup.com/review/cyberpunk-2077-phantom-liberty-benchmark-test-performance-analysis/6.html
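The Cyberpunk inference above is just chained ratios. A tiny worked example (the 1.62 measured ratio below is a placeholder, not a number from the linked charts; only the method matters):

```python
# No 6800 in the Cyberpunk chart, so chain ratios: take the
# measured 7800XT-vs-6700XT gain and divide by an assumed
# 6800-vs-6700XT gap of 20% (carried over from Alan Wake 2).
gain_7800xt_vs_6700xt = 1.62    # placeholder measured ratio
assumed_6800_vs_6700xt = 1.20   # assumption carried over from Alan Wake 2

gain_7800xt_vs_6800 = gain_7800xt_vs_6700xt / assumed_6800_vs_6700xt
print(f"{gain_7800xt_vs_6800 - 1:.0%}")   # ~35% faster
```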
Can they make dedicated tensor units, or is that patented?
“Tensor Units” are just low-precision matrix multiplication units.
AMD got a lot of AI-related IP when they made the acquisition of Xilinx. It’s just a matter of them dedicating the die space to it.
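To make “low-precision matrix multiplication unit” concrete: one tensor-core-style op is essentially D = A x B + C on a small tile, with low-precision inputs and a wider accumulator. A toy sketch (tile size and names are illustrative; plain Python floats stand in for the fp16/fp32 precisions):

```python
def mma_tile(a, b, c):
    """One tensor-core-style step: D = A @ B + C on a small tile.
    Real units take e.g. fp16 inputs and accumulate in fp32; plain
    Python floats stand in for both precisions here."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j]
             for j in range(n)]
            for i in range(n)]

# A 4x4 tile of ones: each output entry is 1*1 summed 4 times = 4.0.
a = [[1.0] * 4 for _ in range(4)]
b = [[1.0] * 4 for _ in range(4)]
c = [[0.0] * 4 for _ in range(4)]
d = mma_tile(a, b, c)
```

The hardware value is doing that whole tile in one instruction instead of one multiply-add per lane per instruction.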
The die space is only one part of the puzzle. The other - AMD’s achilles heel no less - is software support. I mean, Phoenix has XDNA already, but from everything I’ve read, it’s a PITA to actually use and rather limited by its currently available driver API, and as a consequence, barely any ML library/framework support as of now.
They don’t even need to make dedicated tensor units, since programmable shaders already have the necessary ALU functionality.
The main issue for AMD is their software, not their hardware per se.
Nah, the throughput of tensor cores is far too high to compete against.
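To put rough numbers on that gap (the unit counts below are made up for illustration, not the spec of any real GPU): a shader lane retires one FMA per instruction, while one MMA instruction retires a whole 16x16x16 tile of FMAs:

```python
# Rough per-instruction throughput comparison -- made-up unit
# counts, not the spec of any real GPU.
shader_lanes = 128                       # FMA lanes per SM/CU
shader_flops = shader_lanes * 2          # 1 FMA = 2 FLOPs per lane

mma_units = 4                            # tensor units per SM/CU
tile_flops = 16 * 16 * 16 * 2            # FLOPs in one 16x16x16 MMA op
tensor_flops = mma_units * tile_flops

speedup = tensor_flops / shader_flops
print(speedup)   # 128.0 with these assumed numbers
```

Even if the real ratios differ per chip, an order of magnitude or more is plausible, which is why emulating matmul on shader ALUs can’t close the gap.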
Well, sure, application-specific IP is always going to be more performant. But in a pinch, shader ALUs can do tensor processing just fine. Without a proper software stack, though, the presence of tensor cores is irrelevant ;-)
This. AMD struggles with making drivers that don’t crash or get you VAC banned. They’re going to have to clear that bar before they can really start competing
Those VAC bans really kinda sum up AMD’s lack of software ability. AMD can’t even ship fluid frames without literally getting you banned.
Stop for a moment and think about this: AMD can’t even catch up to Nvidia/Intel, much less be at the forefront.
Really, AMD only exists so Nvidia doesn’t charge $2k for a 4090… so, uh, thanks AMD for being a joke of a competitor but saving me $400.
AMD is better than Intel on both gpu and cpu front lol. Not sure what you are on.
Indeed, I think AMD has had solid GPU products over the last decade. I’ve had several AMD GPUs, just as I’ve had Nvidia ones. Just because Nvidia has been ahead for the last 3 years doesn’t invalidate AMD. It’s competition, and as long as they offer decent performance for the price, people will buy it. RDNA2/3 were definitely not bad architectures; the main gap atm is upscalers and frame gen, but that is also reflected in the price Nvidia sells for.
Lol, this sub has just had it out for RTG for a few months now. The most ridiculous takes get upvoted.
WTF are you talking about? What Intel GPU is better than AMD’s? No one is buying Intel’s trash video cards. Also, the 7800X3D is the fastest gaming chip.
They have their own equivalent in the CDNA line of compute products.
They absolutely could bring matrix multiplication units to their consumer cards, they just refuse to do so.
Just like they refuse to support consumer cards officially
Of course they could. Intel does on their graphics cards. Apple does on its latest silicon.
Question is: do they have the people who could develop this, can they and do they want to spend the money on it, and can they and do they want to spend the money on the software side of this as well?
Currently, it seems like they looked at it, did the math, and decided to try to get by without the effort. And to a degree, that’s doable. FSR2 isn’t as good as DLSS, but it saves them the effort of having AI cores on chip. They did the same with frame generation. Generally, they seem to be able to be slightly worse for a lot less R&D budget.
Of course, they will never leave nVidia’s shadow this way, and should Intel or nVidia ever manage to offer Microsoft and Sony an APU with more features to power the next generation of consoles, their graphics division might be well and truly fucked.
Keep in mind, too, that if they didn’t already make these decisions to innovate and invest 4+ years ago, then any solution they come up with is still years away. Chip development is a 5+ year cycle from concept to implementation.