Recently I was wondering whether some person or group is preserving, collecting, organizing, and publishing all the knowledge mankind has created throughout its existence, so that if mankind ever faces a sixth mass extinction we don’t have to reinvent the wheel and can kick-start our new post-apocalyptic civilization.

  • JohnDClay
    1 year ago

    It’s never going to be all knowledge, since a lot of stuff is just lost or was never recorded. A ton of stuff (like this thread) is probably low on the priority list for recording as well. But the closest you’d probably get to a full catalog of human knowledge (at least text-based) is the huge data sets of nearly all the text on the internet that are used for training LLMs. I wouldn’t be surprised if there are soon ones that include video and pictures as well, since newer AI models are starting to be able to interpret those too.

    I believe this is a collection of some of those data sets: https://github.com/yaodongC/awesome-instruction-dataset

    Edit: here’s a big data set that made up a lot of GPT-3’s training data: https://commoncrawl.org/