[Update:Solution]

It was my router, which had STP enabled by default. Switching it off (fine in smaller networks) or using RSTP made the delays go away.

[/Update]
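For anyone landing here later, a rough sketch of where such a delay comes from. It assumes classic IEEE 802.1D STP with its default timers (forward delay 15 s, max age 20 s); the actual timer values on my device are not stated in this thread, so treat the numbers as illustrative. A port that just got link sits in Listening and then Learning, one forward delay each, before it forwards any frames.

```python
# Default classic STP (IEEE 802.1D) timers in seconds.
# Assumption: the device in this thread used these defaults.
MAX_AGE = 20
FORWARD_DELAY = 15


def stp_port_delay(forward_delay=FORWARD_DELAY, max_age=0):
    """Seconds before a port running classic STP starts forwarding.

    The port spends one forward_delay in Listening and one in Learning.
    max_age is only added when stale topology information has to age
    out first (reconvergence after an indirect failure).
    """
    listening = forward_delay
    learning = forward_delay
    return max_age + listening + learning


typical = stp_port_delay()                 # fresh link-up at boot
worst = stp_port_delay(max_age=MAX_AGE)    # reconvergence case
print(f"typical link-up delay: {typical} s")
print(f"worst-case delay:      {worst} s")
```

So a wired machine can sit for about 30 seconds with link but no traffic, which plausibly accounts for a good part of the delay, with the rest presumably being Windows retrying. RSTP avoids this on edge ports, which can go to forwarding right away.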

Hello!

For a long time I’ve had this horribly annoying problem: on bootup, ANY domain machine on the wired LAN (no problems with wireless) sits idle with “there’s no network!” for about 1-2 minutes until it discovers the network. BUT only Windows machines. Linux boxes get network instantly, also on the LAN.

Setup: 2 domain controllers, Server 2019. Both are DNS; one is also DHCP and NPS for Wi-Fi. All machines have fixed IPs; the DHCP is just for wireless clients.

I have tried everything I could think of, like NIC-Drivers, OpenDHCP, temporarily changed the switch from a managed one to a dumb one, changed the NIC in the server, let only one DC be alive at a time, rejoined the domain, the usual sfc/dism-approach and whatnot.

I asked once on Reddit, but everyone just told me “that’s DHCP!”, yet it’s (seemingly at least) not. All machines have fixed IPs, and switching them to DHCP doesn’t change a thing.

So I’m clueless again, hoping for some nerd that’s nerdier than me to have an idea :)

  • Sailing7@lemmy.ml
    11 months ago

    I know this is stupid to ask, but can you test setting up servers fresh from an .iso? No template, no domain join, nothing that would create any predefined settings. If the issue doesn’t persist, maybe there is a legacy GPO or something that forces domain recognition before allowing other network traffic. Or something completely different, but we’ve got to corner the problem with troubleshooting.

    Also, maybe create a script that’s fired at bootup. The script could write a timestamp plus the output of “ipconfig /all” and “route print” into a text file every few milliseconds.

    This would create large log files but might help: if you can’t even ping local IPv4 addresses, maybe the network stack simply doesn’t load fast enough.
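A minimal sketch of such a boot-time logger, in Python rather than a batch file. The function name `log_network_state`, the sample count and the half-second interval are assumptions for illustration (spawning `ipconfig` takes longer than a few milliseconds anyway), and any log path is up to you.

```python
import subprocess
import time
from datetime import datetime

# The two Windows commands suggested above.
DEFAULT_COMMANDS = [["ipconfig", "/all"], ["route", "print"]]


def log_network_state(logfile, commands=DEFAULT_COMMANDS, samples=240, interval=0.5):
    """Append the timestamped output of each command to logfile, `samples` times."""
    with open(logfile, "a", encoding="utf-8") as log:
        for _ in range(samples):
            stamp = datetime.now().isoformat(timespec="milliseconds")
            for cmd in commands:
                try:
                    result = subprocess.run(
                        cmd, capture_output=True, text=True,
                        errors="replace", timeout=10,
                    )
                    output = result.stdout + result.stderr
                except (OSError, subprocess.TimeoutExpired) as exc:
                    # Command missing or hung: record that instead of crashing.
                    output = f"failed to run: {exc}\n"
                log.write(f"=== {stamp} {' '.join(cmd)} ===\n{output}\n")
            # Flush after every sample so a reboot loses at most one entry.
            log.flush()
            time.sleep(interval)
```

Calling something like `log_network_state(r"C:\netboot.log")` (hypothetical path) from a Task Scheduler job with an "at startup" trigger would cover roughly the first two minutes after boot with the defaults above. Comparing the timestamps against the moment the network comes up shows whether the adapter already has its address and routes while traffic is still blocked.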

    Some additional info might also help to corner it, such as:

    • is it only occurring on virtualized machines?
    • what hypervisor is being used?
    • is there more than one brand of hypervisor? (e.g. VMware and Hyper-V)
    • is the problem also occurring on bare-metal servers? (Windows Server OS installed directly on the server without virtualization)
    • is your domain forest an old one that you didn’t create initially? Or, another way of asking: could there be GPOs or templates with settings in them that you don’t know about?
    • did you already try connecting two servers directly to each other and sniffing the NIC output with Wireshark? You could run this in parallel with the bootup script and compare it against the routing tables and IP settings. Maybe something sticks out enough to catch your attention?
    • Dyskolos@lemmy.zipOP
      11 months ago

      NVM, I finally found the culprit by accident…my switch enabled STP (slow) by default. Switching it off or using RSTP fixed the delays. Thanks for helping anyway man!

      • Sailing7@lemmy.ml
        11 months ago

        Holy moly, networking class… I’m getting flashbacks to the time we tried out STP in the simulated Cisco environment, and yes, you are right. It takes a short but nonetheless weird amount of time for it to time out.

        Thanks for giving me the updates. If I or somebody else ever has similar symptoms maybe they will find this thread :D

        I gotta say, I think I would never have targeted STP as the culprit. Though to be fair I only use dumb switches in my homelab, and at the corp the networking department gatekeeps the nice stuff a bit :3

        Anyway, I’m happy you found out and were able to fix it. <3

        • Dyskolos@lemmy.zipOP
          11 months ago

          If I told you that I’ve been trying to fix that shit for over a year now and gave up 4 times already…

          Yeah, totally. Would never have suspected the culprit there. But it started to make total sense. Only LAN. Only physical machines. Even when switching the NIC off and on again. But not in a VM. There was only one common denominator here. The effing switch.

          Well, if you use pro stuff at home, better be a pro lol. Thanks anyway man. It nudged me in the right direction.

          At this point I was willing to try sacrificing sheep or reading a manual.

          • Sailing7@lemmy.ml
            11 months ago

            You were ready for reading the manual. Darn good that you’ve made it without passing that line. Once you pass it you never come back to being sane again, you know?

            :D

            • Dyskolos@lemmy.zipOP
              11 months ago

              I knoooow. That’s what I feared most. Luckily I lacked the balls to cross the final frontier 😁

    • Dyskolos@lemmy.zipOP
      11 months ago
      • no. Also physical machines.

      • hypervisor is Proxmox. But it only runs Linux machines, which all have no problems.

      • yes, also bare-metal servers. Both DCs are.

      • the forest is old (2003 or so) and has been migrated a lot. I created it. I already tried disabling all GPOs and returning to defaults.

      Will try the wiresharking approach. Good hint. Didn’t even think of it. The bootup-log-script is also a good idea. Will do that. Thanks man!