Does anyone know of any off-the-shelf tool (online or offline) to find duplicates in several DNS blocklists and merge them into one?
Context: I am running AdGuard on a GL.iNet router with ~10 blocklists, some of them pretty huge. Most of the time, when the lists are updated, the router grinds to a halt, and I often have to reboot it the old way by powering it off and on.
I would rather download the lists myself from time to time and merge them into one file, with the duplicates removed somehow.
If I’m understanding you correctly, you could use a shell script for this. Use wget to download the lists, combine them into a single large file, and finally create a new file with no duplicates using awk '!visited[$0]++':
wget URL1 URL2 URL3
cat *.txt > all.txt (This overwrites all.txt)
awk '!visited[$0]++' all.txt > no_duplicates.txt
When no tool is available, bash to the rescue. Thank you for this, it actually seems simpler than I thought :)
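For anyone who wants to reuse the approach above, here is a minimal sketch of the same steps as two shell functions. The names download_lists and merge_lists, the numbered file names, and the directory layout are all made up for illustration. It assumes wget and awk are installed. Each download gets its own numbered file because wget, by default, saves a second file with the same name as name.1, which a *.txt glob would miss. The merged output should be written somewhere the *.txt glob does not match, so it is not read back in on the next run.

```shell
#!/bin/sh
# Sketch only: function names and file layout are hypothetical.

download_lists() {
    # Usage: download_lists DIR URL...
    # Saves each URL as DIR/list1.txt, DIR/list2.txt, ... so that two
    # lists with the same remote file name do not collide.
    dir=$1
    shift
    mkdir -p "$dir"
    n=0
    for url in "$@"; do
        n=$((n + 1))
        wget -q -O "$dir/list$n.txt" "$url"
    done
}

merge_lists() {
    # Usage: merge_lists DIR OUTPUT
    # Concatenates DIR/*.txt, strips carriage returns and trailing
    # whitespace so identical entries compare equal, skips blank lines,
    # and keeps only the first occurrence of each line.
    cat "$1"/*.txt |
        awk '{ sub(/\r$/, ""); sub(/[ \t]+$/, "") } NF && !seen[$0]++' > "$2"
}
```

Usage would be something like: download_lists lists URL1 URL2 URL3, followed by merge_lists lists merged.lst. The awk filter keeps the original line order, which sort -u would not.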
Isn't there a tool developed by the AdGuard team to handle exactly this?
Just looked through my files. Look into this tool, it does exactly what you want: https://github.com/AdguardTeam/HostlistCompiler
Thank you, this looks promising.
I’ve used this for similar tasks before. Might need a few steps if the formats vary to get them all together but it should be possible.
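On the point about formats varying: a plain line-by-line dedupe will not catch the same domain written as "0.0.0.0 ads.example" in one list and "||ads.example^" in another. Below is a rough sketch of one possible normalization step. The function name normalize_domains is made up. It assumes the lists contain only hosts-style lines, simple ||domain^ rules, or bare domains. Anything else, such as rules with modifiers, is silently dropped, so check the output before relying on it.

```shell
#!/bin/sh
# Sketch only: normalize_domains is a hypothetical helper.
# Reads blocklist lines on stdin and prints one lowercase domain per line.

normalize_domains() {
    tr -d '\r' | awk '
        # Print d in lowercase if it looks like a plain domain name.
        function emit(d) { if (d ~ /^[A-Za-z0-9._-]+$/) print tolower(d) }

        # Strip hosts-style comments (full-line and inline).
        { sub(/[ \t]*#.*$/, "") }

        # Skip blank lines and adblock-style comments starting with "!".
        NF == 0 || substr($1, 1, 1) == "!" { next }

        # Hosts format: "0.0.0.0 domain" or "127.0.0.1 domain".
        NF == 2 && ($1 == "0.0.0.0" || $1 == "127.0.0.1") { emit($2); next }

        # Adblock format: "||domain^" with no modifiers.
        NF == 1 && substr($1, 1, 2) == "||" && substr($1, length($1)) == "^" {
            emit(substr($1, 3, length($1) - 3)); next
        }

        # Bare domain on its own line.
        NF == 1 { emit($1) }
    '
}
```

Usage would be something like: cat lists/*.txt | normalize_domains | awk '!seen[$0]++' > merged.lst. Note that the result is a plain domain list, so the importing tool has to accept that format.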
This is very helpful, thank you :)
AFAIK Pi-hole parses the lists and then merges them into a single blocklist.
Update: Never mind. They do it by design (assuming this statement is still correct): https://github.com/pi-hole/pi-hole/issues/2013#issuecomment-817901839
What you could do is manually combine the text files in a text editor such as Notepad++ and deduplicate from there (Notepad++ can do that natively).