So, I’m selfhosting immich, the issue is we tend to take a lot of pictures of the same scene/thing to later pick the best, and well, we can have 5~10 photos which are basically duplicates but not quite.
Some duplicate finding programs put those images at 95% or more similarity.

I’m wondering if there’s any way, probably at file system level, for the same images to be compressed together.
Maybe deduplication?
Have any of you guys handled a similar situation?

  • @[email protected]
    link
    fedilink
    English
    -514 days ago

    The problem is that OP is asking for something to automatically make decisions for him. Computers don’t make decisions, they follow instructions.

    If you have 10 similar images and want a script to delete 9 you don’t want, then how would it know what to delete and keep?

    If it doesn’t matter, or if you’ve already chosen the one out of the set you want, just go delete the rest. Easy.

    As far as identifying similar images, this is high school level programming at best with a CV model. You just run a pass through something with Yolo or whatever and have it output similarities in confidence of a set of images. The problem is you need a source image to compare it to. If you’re running through thousands of files comprising dozens or hundreds of sets of similar images, you need a source for comparison.

    • @[email protected]
      link
      fedilink
      English
      514 days ago

      OP didn’t want to delete anything, but to compress them all, exploiting the fact they’re similar to gain efficiency.

        • @WhyJiffie
          link
          English
          013 days ago

          No, not really.

          The problem is that OP is asking for something to automatically make decisions for him. Computers don’t make decisions, they follow instructions.

          The computer is not asked to make decisions like “pick the best image”. The computer is asked to optimize, like with lossless compression.

            • @WhyJiffie
              link
              English
              112 days ago

              yes, they are. reread the post, I just did so and I’m still confident

    • @[email protected]
      link
      fedilink
      English
      113 days ago

      computers make decisions all the time. For example, how to route my packets from my instance to your instance. Classification functions are well understood in computer science in general, and, while stochastic, can be constructed to be arbitrarily precise.

      https://en.wikipedia.org/wiki/Probably_approximately_correct_learning?wprov=sfla1

      Human facial detection has been at 99% accuracy since the 90s and OPs task I’d likely a lot easier since we can exploit time and location proximity data and know in advance that 10 pictures taken of Alice or Bob at one single party are probably a lot less variant than 10 pictures taken in different contexts over many years.

      What OP is asking to do isn’t at all impossible-- I’m just not sure you’ll save any money on power and GPU time compared to buying another HDD.