• intelshill@lemmy.ca
    9 months ago

    Frankly, I think you’re missing the forest for the trees.

    Yes, SMIC is behind the big players, but newer node shrinks are delivering increasingly small iterative gains. Meanwhile, leakage power is becoming an increasingly important concern at smaller nodes, and many of the technologies these advanced nodes are delivering (e.g., backside power delivery, 3D stacking) are not coupled to the node itself. Indeed, there’s no reason GAAFET (or whatever people are calling it these days) needs a smaller node; it just doesn’t make much financial sense on a larger one. It would make sense if, say, you wanted to develop the technical capability and had idle engineering cycles while waiting for machines. So, what does this mean? SMIC can progress at will on everything except the node itself. It’ll remain behind on sheer density, but Intel showed that you can be stuck on a single node for the better part of a decade and still be a market leader. SMIC already knows how to scale a fab (the Huawei Mate 60 moved more than 30 million units in less than a year).

    On the process side, SMIC will be behind, but not obscenely so. What about the architecture level? Here lies a big problem. Most of the world is locked into the CUDA ecosystem. They’re tied to Nvidia for any massively parallel computing needs, and Nvidia can charge whatever price it wants. This is, unsurprisingly, a problem, because Nvidia’s hardware is, believe it or not, not the best-optimized hardware for a wide variety of applications. There’s been an abundance of research into specialized accelerators for applications like machine learning and scientific computing (and indeed, many real-world designs) that deliver up to an order-of-magnitude increase in perf/W but, perhaps more importantly, a substantially reduced TCO because you don’t have to pay Nvidia’s obscene prices. Indeed, this can be seen in the fact that Huawei’s Ascend accelerators are actually gaining traction in the Chinese market. So, at scale, China will be hogging more electricity for data centers with low capital cost but higher operating cost… Fortunately, China has an obscene amount of basically free green energy coming online in the next few years.

    So, what’s all this concern about? It’s pretty simple: mobile applications are constrained by energy-efficiency, and you can’t get around key energy-efficiency limits without shrinking the transistor. Fortunately, advanced nodes aren’t used in most military or space applications due to reliability concerns… So, the main application for them will be consumer-focused applications like smartphones, autonomous vehicles, and drones. This is a rather annoying problem given that China is betting their next decade of growth on the EV transition, but it’s not insurmountable in the near-term given the sheer cost advantage China has in shipping non-autonomous EVs due to efficiencies in the rest of the supply chain. The same holds for drones, where DJI is the undisputed market leader. For smartphones, Huawei has the largest captive market in the world and Apple is rapidly losing market share.

    Ok, so, having established the challenges, how far away is SMIC from a solution?

    As discussed, SMIC already operates fabs at scale for 7nm, and SMEE has indeed demonstrated the ability to ship a DUV lithography machine. Intel 7 is also entirely DUV, so this isn’t all that surprising. NAND processes still use DUV as well. Intel’s TSMC N7-equivalent process started shipping with Ice Lake in late 2019, while SMIC’s 7nm process started shipping with the Kirin 9000S in mid-2023. SMIC is expected to start shipping 5nm at volume later this year (how? I’ll never know, but the yields are clearly alright, because they’re going to be supplying Huawei again).

    I’ll stress this again and again: yields are, for the most part, an engineering problem. Getting high yield with multi-patterning requires some absurdly complex engineering, but it’s not inherently impossible with tight enough tolerances. It wouldn’t be economically viable in an open market where EUV is available, but for SMIC, EUV isn’t available.
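    To make the tolerance point concrete, here is a toy compounding-yield model. It is a hypothetical sketch, not SMIC's process data: it assumes each patterning step succeeds independently with the same per-step yield, so the yield of a stack of steps is the product of the per-step yields. The layer count, per-step yield and target yield are made-up numbers, and modeling quad-patterning as four steps per critical layer is a simplification (self-aligned quadruple patterning, for instance, relies on spacer deposition and etch rather than four separate exposures).

```python
# Toy compounding-yield model. All numbers are hypothetical; this is NOT fab data.
# Assumption: every patterning step on a critical layer succeeds independently
# with probability `step_yield`, so the yield of the whole stack is the product.


def stack_yield(step_yield: float, steps: int) -> float:
    """Probability that all `steps` patterning steps succeed."""
    return step_yield ** steps


def required_step_yield(target_yield: float, steps: int) -> float:
    """Per-step yield needed for `steps` steps to reach `target_yield` overall."""
    return target_yield ** (1.0 / steps)


if __name__ == "__main__":
    critical_layers = 10  # hypothetical number of critical layers
    # Simplification: 1 step per layer for single exposure, 4 for quad-patterning.
    for label, steps_per_layer in [("single exposure", 1), ("quad-patterning", 4)]:
        steps = critical_layers * steps_per_layer
        y = stack_yield(0.99, steps)             # hypothetical 99% per step
        need = required_step_yield(0.80, steps)  # hypothetical 80% target
        print(f"{label}: {steps} steps -> {y:.1%} stack yield at 99%/step; "
              f"needs {need:.3%} per step for 80% overall")
```

With these made-up inputs, forty steps at 99% each leave only about two-thirds of dies intact, whereas ten steps keep roughly 90%. Reaching the same overall yield with quad-patterning means pushing every step closer to perfect, which is the "tight enough tolerances" engineering problem. Real yield models (defect density times critical area, overlay budgets) are more involved.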

    The big concern right now is, surprisingly, not on the manufacturing side but on the software side. There is no equivalent anywhere in the world to American EDA tools. None. The minute sanctions extend to EDA, China will be set back by more than a decade, because China lacks domestic EDA tools for newer nodes.

    • Varyk
      9 months ago

      I’m not sure what you think “missing the forest for the trees” means, given that you’re combatively agreeing with my broad-strokes analysis of the situation (the forest) by listing a couple dozen of your own corroborating technical details (the trees).

      It was an interesting read, though.

      Did you send it to the wrong person?

      • intelshill@lemmy.ca
        9 months ago

        Your claims are that China is 20 years behind and lacks the technological know-how to catch up. My claims are that China is 5 years behind hardware-wise and already knows how to scale fabrication for near-bleeding-edge nodes, that those 5 years are only marginally important because of the end of Moore’s Law, and that Nvidia’s effective monopoly makes that 5-year lead very tenuous indeed. In short, the West’s process-node lead is basically irrelevant.

        This entire semiconductor war operates under the assumption that China is bound by power and production constraints (due to cost) and thus cannot outpace US AI development. But military/space applications don’t care, Huawei’s already fucked by sanctions, consumer electronics OEMs are untouched by sanctions, and edge devices consider 28nm the shiny new node…

        Basically, it pins the entire issue on the back of Nvidia. Of course, Nvidia is now charging $40k per H100, Facebook holds the bulk of them, and the cost of building a GPU cluster is skyrocketing.

        Meanwhile, construction is cheap in China due to oversupply in the construction industry, electricity is cheap due to oversupply in the green energy industry, and basically everything 14nm+ is also cheap simply because the equipment to build it already exists. Nvidia’s lead in efficiency cannot offset $40k of capex per card, which is what the next few years will show.
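        A back-of-the-envelope cost sketch shows the shape of this argument. The only figure taken from the comment is the $40k H100 price; the cheaper card, both power draws (an assumed ~700 W each), the relative throughput, the electricity prices and the four-year lifetime are hypothetical placeholders. It divides lifetime capex plus energy cost by lifetime work, so a slower, less efficient but cheaper chip can be compared against a faster one.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass
class Accelerator:
    name: str
    price_usd: float            # capex per card
    power_kw: float             # average draw per card, kW (assumed)
    relative_throughput: float  # work per hour relative to the baseline card


def cost_per_unit_work(acc: Accelerator, usd_per_kwh: float, years: float) -> float:
    """(Capex + electricity) / work over the card's life.
    Ignores cooling, networking, floor space and utilization."""
    hours = years * HOURS_PER_YEAR
    energy_cost = acc.power_kw * hours * usd_per_kwh
    work = acc.relative_throughput * hours
    return (acc.price_usd + energy_cost) / work


# Hypothetical cards: the second has half the throughput at the same power,
# i.e. half the perf/W, but a quarter of the price.
H100 = Accelerator("H100", price_usd=40_000, power_kw=0.7, relative_throughput=1.0)
DOMESTIC = Accelerator("hypothetical domestic card", price_usd=10_000,
                       power_kw=0.7, relative_throughput=0.5)

if __name__ == "__main__":
    for acc, usd_per_kwh in [(H100, 0.10), (DOMESTIC, 0.05), (DOMESTIC, 0.10)]:
        cost = cost_per_unit_work(acc, usd_per_kwh, years=4)
        print(f"{acc.name} @ ${usd_per_kwh:.2f}/kWh: ${cost:.2f} per unit of work")
```

With these placeholder numbers the cheaper card comes out ahead per unit of work even at the same electricity price, because capex dominates the H100's lifetime cost. Raise the electricity price far enough (with these inputs, above roughly $0.80/kWh) and the comparison flips, which is exactly the sensitivity the comment is betting on.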

        So, surprisingly, my conclusion is that China’s lack of bleeding-edge node capability does not have significant immediate geopolitical implications. China is perfectly capable of building 28nm chips, and from there 14nm is rather trivial, using either domestic supply chains or unsanctioned foreign ones. Should China languish for another decade, we may run into problems, but SMIC is on track to release 5nm this year (widely estimated to be the peak economically achievable with DUV quad-patterning), and SMEE has already announced a domestic ArFi DUV lithography machine. The supply chain is there. The gap to EUV is twofold: the high-power light source and the photoresist. As a reminder, neither TSMC’s original 7nm (N7) nor Intel 7 uses EUV, and their yields are perfectly acceptable (both nodes took some time to scale, I’ll admit). Everything else is a matter of throwing manpower at the problem.

        Let’s take a step back and assume that China is stuck in its current state indefinitely. That is, it can produce DUV machines absurdly cheaply but will never have an EUV machine. Even if China is stuck on 7nm while Intel rolls out 18A, so what? $/transistor isn’t scaling with node anymore, so the main driver is perf/W. Given that China’s green energy transition will pump absurd amounts of electricity into the grid in the near term, the more accurate driver is work/J. This matters solely, and I mean solely, for mobile applications where reliability isn’t a concern.

        • Varyk
          9 months ago

          Again, a strangely confrontational, near-total agreement with my conclusions that China will be able to easily produce consumer-centered microchips and will have difficulty closing the gap to cutting-edge microchips.

          I can see two differences of opinion, though:

          1. You argue that progressively more advanced microchips don’t matter. I cannot see how more advanced thinking machines will be any less important to automation, AI, and national security going forward than they already are.

          2. You believe it’s only a matter of “throwing manpower at the problem” for the Chinese to catch up to TSMC, which isn’t borne out.

          If restaurant A has one cook who has created the most popular omelette through a set of interdependent recipes and complex, creative cooking methods, and restaurant B next door hires a new cook every week, instructs them how to make an adequate omelette, and asks them to create a better omelette than the cook at A, restaurant B could easily be stuck with 100 cooks who have learned how to make an adequate omelette and keep repeating that process without ever finding a better recipe.

          It does not automatically follow that having more cooks is going to result in a better omelette.

          Culturally, Taiwan thrives on innovation and creativity. China survives by hierarchy, tradition and established processes.

          Again, it isn’t impossible that those one hundred cooks will come up with the perfect omelette, but it’s illogical to think that all it takes to create a better omelette than the cook at restaurant A is hiring more cooks and teaching them the recipe for an adequate one.

          • intelshill@lemmy.ca
            9 months ago

            More efficient chips do not have emergent behaviours (outside of, say, mobile and autonomous vehicles). More efficient chips make things more economical. Total compute capability is a function of manufacturing capability (which reflects in capital cost), electricity (which reflects in operating cost), and efficiency (which also reflects in operating cost). If your manufacturing capability is obscene and your electricity output is obscene, then you can handwave a lot of efficiency concerns by just scaling the number of chips you have in a system. In terms of aggregate computing capability, 5nm is more than sufficient to keep pace given enough scale.

            There’s an interesting figure I saw a while ago: the share of China’s electricity generation dedicated to data centers is lower than in both the US and the EU, and due to top-line growth in electricity generation, this proportion is basically not expected to move in the next decade. China has a LOT of freedom to absorb efficiency losses that other regions simply do not.

            There’s a small caveat here: scaling usually incurs some losses, but for LLM training they’re basically non-existent, and for supercomputing they’re supposed to be around 10% due to networking, etc.
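            The scaling argument can be sketched as a small calculation: to reach a fixed aggregate throughput, the chip count is the target divided by per-chip throughput times a scaling efficiency, and cluster power follows from the chip count. The per-chip throughput and wattage figures below are hypothetical placeholders for a leading-edge chip and an older-node chip; the 0.9 scaling efficiency mirrors the roughly 10% networking loss the comment cites for supercomputing.

```python
import math


def chips_needed(target_tflops: float, per_chip_tflops: float, scaling_eff: float) -> int:
    """Smallest chip count whose delivered throughput meets the target.
    scaling_eff < 1.0 models interconnect/networking losses (1.0 = perfect)."""
    return math.ceil(target_tflops / (per_chip_tflops * scaling_eff))


def cluster_power_mw(n_chips: int, watts_per_chip: float) -> float:
    """Accelerator power only (no cooling or host overhead), in megawatts."""
    return n_chips * watts_per_chip / 1e6


if __name__ == "__main__":
    target = 1_000_000  # hypothetical target: 1,000,000 TFLOPS (1 EFLOPS)
    # Hypothetical chips: (name, TFLOPS per chip, watts per chip)
    for name, tflops, watts in [("leading-edge chip", 1000, 700),
                                ("older-node chip", 400, 500)]:
        n = chips_needed(target, tflops, scaling_eff=0.9)
        print(f"{name}: {n} chips, {cluster_power_mw(n, watts):.2f} MW")
```

With these inputs the older-node cluster needs about 2.5 times as many chips and about 1.8 times the power (2,778 chips and 1.39 MW versus 1,112 chips and 0.78 MW) to deliver the same throughput, which is the trade the comment argues cheap manufacturing and abundant electricity can absorb.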

            • Varyk
              9 months ago

              That is interesting, do you recall where you saw that data about electricity generation growth in different countries?