I tried it out over the past couple of days to manage k8s volumes and backups to S3, and it works surprisingly well out of the box. Context: k3s running on multiple Raspberry Pis.

  • notfromhere@lemmy.one
    1 year ago

    Which Pi hardware and OS are you running? I’m fixing to stand up two Pi 4s and about four Pi 3Bs. I was planning to run Ubuntu Server 22.04 arm64.

    • justpassingby (OP)
      1 year ago

      Two Pi 4 Model B 8GB and one Pi 4 Model B 4GB. For the OS I am using the latest 64-bit Ubuntu. A word of warning: I am having some issues with Longhorn breaking the filesystem of my Pi-hole pod, making it “read only”. I work on this stuff as a job and I still haven’t found a good explanation of what went wrong. To be honest, the Longhorn GitHub issues are not very helpful, nor are there good logs about it. I am starting to think there are too many microservices working behind the scenes and too many gRPC calls. tl;dr: it is hell to debug. I still haven’t found a good alternative, however.
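
      For anyone trying to confirm the same “read only” symptom, here is a minimal sketch, not an official Longhorn tool. It lists the mount points that carry the `ro` option in a /proc/mounts-style file. The function name `list_ro_mounts` is made up for illustration, and some mounts are legitimately read-only, so the output needs reading with care.

```shell
# List mount points that are mounted read-only.
# Input is a /proc/mounts-style file: device mountpoint fstype options dump pass
# Pass "-" as the argument to read from stdin instead of a file.
list_ro_mounts() {
    mounts_file="${1:-/proc/mounts}"
    awk '{
        # The fourth field is a comma-separated option list; look for an exact "ro".
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "$mounts_file"
}

# Example (placeholders, not real names):
#   kubectl exec -n <namespace> <pod> -- cat /proc/mounts | list_ro_mounts -
```

      If the volume's mount point shows up here, the kernel log on the node (dmesg) is a reasonable next place to look, since a filesystem that gets remounted read-only after an error usually leaves a message there.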

      • notfromhere@lemmy.one
        1 year ago

        Can you elaborate on what you mean when you say the fs gets broken? What are the symptoms, what have you tried, and what are any errors or log messages?

        • justpassingby (OP)
          1 year ago

          Eh, I will have to find my notes on the issue with the Pi-hole. I can see if I can dig them out this weekend and send them to you (I wonder if you can send PMs on Lemmy ^^).

          To stay on the point of this discussion: just this afternoon, and I am not joking, I got hit by this: https://longhorn.io/kb/troubleshooting-volume-with-multipath/ The pod (in this case WireGuard) was crashing because it could not mount the drive, and the error was something like “already mounted or mount point busy”. I had to dig and dig, but I found out the problem was the one above and I fixed it. I will now add that setting to my Ansible and configure all three Pis.

          However, this should not happen with a mature-ish system like Longhorn, which may cater to a userbase that does not know enough to dig into /dev. I think there should be a better way to alert users to such an issue. Just to be clear, the Longhorn UI and logs were nice and dandy, all good on the western front, but everything was broken. The Longhorn reconciler could have a check: if something should be mounted and is not, and the error is “already mounted” when it is not actually mounted, check for known bugs.

          However, I think the issue is what I said above. It is too fragmented and works with a myriad of other microservices, so Longhorn is like “I gave the order, now whatever”. I will share what is in my longhorn-system ns. There is no secret in here, but I want to give an idea (PS: I do nothing fancy with Longhorn at home. Obviously some are DaemonSets, so you see 3 pods because I have 3 nodes):

          k get pods -n longhorn-system | cut -d' ' -f1
          NAME
          engine-image-ei-f9e7c473-5pdjx
          engine-image-ei-f9e7c473-xq4hn
          instance-manager-e-fa08a5ebf4663f1e9fb894f865362d65
          engine-image-ei-f9e7c473-gdp6n
          instance-manager-e-567b6ba176274fe20a001eec63ce3564
          instance-manager-r-567b6ba176274fe20a001eec63ce3564
          instance-manager-r-b1d285dd9205d1ba992836073c48db8a
          instance-manager-e-b1d285dd9205d1ba992836073c48db8a
          daily-keep-for-a-week-28144800-pppw8
          longhorn-manager-xqwld
          longhorn-ui-f574474c8-n847h
          longhorn-manager-cgqvm
          longhorn-driver-deployer-6c7bd5bd9b-8skh4
          longhorn-manager-tjzvz
          instance-manager-d3c9343a8637e4ef197ad6da68b3ed2d
          instance-manager-cf746b18d51f6426b74d6c6652f01afc
          engine-image-ei-d911131c-wwfwz
          engine-image-ei-d911131c-qcn26
          instance-manager-e7d92f3ca0455cde2158bebdbb33ea16
          engine-image-ei-d911131c-mgb2k
          csi-attacher-785fd6545b-bn9lp
          csi-attacher-785fd6545b-4nfxz
          csi-provisioner-8658f9bd9c-2bq7v
          csi-provisioner-8658f9bd9c-q6ctq
          csi-attacher-785fd6545b-rx7r9
          csi-resizer-68c4c75bf5-tmw2f
          csi-resizer-68c4c75bf5-n9dxm
          csi-snapshotter-7c466dd68f-7r2x6
          csi-snapshotter-7c466dd68f-cd8pm
          longhorn-csi-plugin-vgqh5
          longhorn-csi-plugin-mnskk
          csi-provisioner-8658f9bd9c-kcb8f
          csi-resizer-68c4c75bf5-gccfg
          csi-snapshotter-7c466dd68f-wsltq
          longhorn-csi-plugin-9q9kj
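
          To make a listing like this easier to read, a small sketch that groups the pod names by their first two dash-separated tokens and counts each group. The helper name `summarize_pods` is invented for illustration, and the two-token grouping is only a rough heuristic (for example, `longhorn-driver-deployer` is shortened to `longhorn-driver`).

```shell
# Count pods per component from a list of pod names on stdin.
# The "NAME" header and blank lines are skipped because they contain no dash.
summarize_pods() {
    awk -F- 'NF >= 2 { count[$1 "-" $2]++ }
             END { for (k in count) print count[k], k }' | sort -k2
}

# Example, using the same command as above with kubectl spelled out:
#   kubectl get pods -n longhorn-system | cut -d' ' -f1 | summarize_pods
```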
          

          Dependency on the csi-* ecosystem sort of allows the errors to get lost in translation.
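
          On the multipath problem mentioned earlier in this comment: as far as I understand the linked KB article, the fix is to tell multipathd to ignore the `sd*` devices through a `blacklist` section in /etc/multipath.conf and then restart the service. Below is a rough sketch of that as an idempotent shell helper. The function name is hypothetical, and the `devnode` pattern is an assumption based on the article, so check it against the KB page before applying it to real nodes.

```shell
# Append a blacklist stanza to a multipath config file unless one already exists.
# Hypothetical helper; the devnode pattern is an assumption taken from the KB article.
ensure_multipath_blacklist() {
    conf="$1"
    # Create the file if it does not exist yet.
    [ -f "$conf" ] || : > "$conf"
    # Leave the file alone if it already has a blacklist section, so reruns are safe.
    if grep -q '^[[:space:]]*blacklist[[:space:]]*{' "$conf"; then
        return 0
    fi
    printf 'blacklist {\n    devnode "^sd[a-z0-9]+"\n}\n' >> "$conf"
}

# Example, as root on each node:
#   ensure_multipath_blacklist /etc/multipath.conf
#   systemctl restart multipathd.service
```

          Note that the helper does nothing when a blacklist section is already present, even if its contents differ, so an existing config still needs a manual look.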

          • notfromhere@lemmy.one
            1 year ago

            I definitely don’t have the expertise or experience digging around the device tree, nor the inclination 😅. I’m really not sure what I even want the distributed FS for. I guess I wanted to have redundancy on the pod long-term storage, but I have other ways to achieve that. I’ve run Docker containers in production and in the lab, but I’m trying to dip my toes into the k3s waters and see what I can do.

            • justpassingby (OP)
              1 year ago

              Please do not let my bad experience stop you! Longhorn is a nice tool and, as you can read online and in other posts, it works very well. I may have been unlucky, have a bad configuration, or have had my Pis under too much pressure. Who knows! My advice is to try something new: k3s, Longhorn, etc. That is what I use the Pis for. I would not use Longhorn at work :D

              I’m really not sure what I even want the distributed FS for. I guess I wanted to have redundancy on the pod long term storage, but I have other ways to achieve that.

              I am not using replicas :) I use longhorn for the clean/integrated backup mechanism instead of using something external. Maybe one day when I have the same-ish disk speed on all 3 PIs I will enable replicas but for now I am good like this.

              For backups of important stuff, maybe use something else, or ALSO something else. I was personally thinking of using another backup tool for the Longhorn devices as well, like https://github.com/backube/volsync or Velero, to have a secondary source in case something happens. Also, Longhorn is always getting better. This is hot off the press: https://github.com/longhorn/longhorn/releases/tag/v1.5.0

              My advice? Try it out! Even if it does not work out, it will still be a source of learning and fun (but I am strange, I like to debug stuff).

              • notfromhere@lemmy.one
                1 year ago

                I’m having enough trouble trying to get k3s running. I keep getting apiserver unavailable, connection refused, etc. kubectl works about one time in ten when I call it. I loaded the OS onto an external HDD to see if that was the problem, but no dice.

                • justpassingby (OP)
                  1 year ago

                  Sorry to hear that. If you had NO connection with kubectl, I would have advised you to check the ports; but if it replies sometimes and most of the time does not, it must be something else. Good luck with the debugging, and if you have any specific problem you could also try creating a post in any of the self-hosted communities here on Lemmy. In my experience, people here are friendlier and more technical than what we used to have on Reddit.
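
                  One thing that might help when describing the problem in a post is an actual failure rate rather than “about one in ten”. Here is a minimal sketch of a generic probe that runs any command a number of times and reports how many attempts failed. The helper name `probe` is made up for illustration, and it does not pause between attempts, so add a `sleep` if you want to spread the calls out.

```shell
# Run a command repeatedly and report how many attempts failed.
# Usage: probe <attempts> <command> [args...]
probe() {
    attempts="$1"; shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # A non-zero exit status counts as a failure; output is discarded.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$attempts failed"
}

# Example: ask the API server for its readiness endpoint 20 times.
#   probe 20 kubectl get --raw /readyz
```

                  Running it on the server node and again from another machine should at least show whether the failures follow the network or the node itself.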