Is anyone aware of an existing project that can do something like this:

  • Access an RSS feed.
  • Parse the contents of the items in the feed, and fetch linked images.
  • Take the new feed elements and add them to previously fetched elements.
  • Store all of the content in a merged RSS/XML file, or something like a SQLite DB.

Context: I’d like to archive the Mastodon posts of an account automatically. I’d prefer a script or binary I could run on Linux, as I’d likely throw it in a GitHub Action and save the resulting output in the git repo.

I could probably whip something together but I’m lazy and I’d prefer to use something that already exists.

  • Paradox
    1 year ago

    https://github.com/mreid/feed-bag

    Not sure if it does all you want, but the basics are there, and it wouldn’t be beyond the pale to make something like this do what you want. The code is pretty clean.

    • @bogo (OP)
      1 year ago

      Thanks. This has potential and would force me to finally learn Ruby if I want to tweak it.

      • Paradox
        1 year ago

        The best way to learn is to dive in and try to accomplish something you want to do.

  • @[email protected]
    1 year ago

    I don’t know of a project that does this, but if I were to tackle it I would convert the RSS to the Activity Streams standard: https://www.w3.org/TR/activitystreams-core/.

    Activity Streams is basically the new RSS, and it’s a lot better than RSS.

    Mastodon is built on ActivityPub, which is built on Activity Streams, so you shouldn’t even need to touch RSS at all. The Activity Streams data already exists, and you can access it via the API.
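
    As a rough illustration of skipping RSS, here is a minimal Python sketch that reads one page of an account's Activity Streams outbox. It assumes the instance serves the outbox publicly at a URL of the form `https://<instance>/users/<name>/outbox?page=true` when asked for `application/activity+json`, and that each page lists activities under `orderedItems`; instances that require signed requests will refuse it. The function names are made up for this example.

```python
import json
import urllib.request


def fetch_outbox_page(url: str) -> dict:
    """Download one outbox page as JSON (assumes the outbox is publicly readable)."""
    req = urllib.request.Request(url, headers={"Accept": "application/activity+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def extract_notes(page: dict) -> list[dict]:
    """Pull the posts out of an outbox page, keeping only the fields worth archiving."""
    notes = []
    for activity in page.get("orderedItems", []):
        obj = activity.get("object")
        # Boosts ("Announce") carry only a URL string as their object; skip them.
        if activity.get("type") != "Create" or not isinstance(obj, dict):
            continue
        notes.append({
            "id": obj.get("id"),
            "published": obj.get("published"),
            "content": obj.get("content"),
            # Attachment URLs could be downloaded separately to archive images.
            "attachments": [a["url"] for a in obj.get("attachment", []) if "url" in a],
        })
    return notes
```

    Usage would look something like `extract_notes(fetch_outbox_page("https://example.social/users/alice/outbox?page=true"))`, where the instance and account are placeholders. Older posts would need the page's `next` link followed in a loop.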

    Under European law, all services are required to give you a copy of all data associated with your account if you ask for it. Mastodon, being a European product, is of course fully compliant. Just go to your profile and hit the “Request your Archive” button. You could do that once a month or so.

    • @bogo (OP)
      1 year ago

      Yes, the “Request Archive” method may be the “don’t over-engineer this, stupid” option I go with.

  • @[email protected]
    1 year ago

    No, but I have an indirect answer (a method?) for you. There are many open source projects that do this type of work, for example NewsBlur. Maybe you can find a few of these projects in the language you want to use and see how they’re handling it. I would expect there to be some common libraries used between them.

  • @[email protected]
    11 months ago

    I use Miniflux, and you can configure it to modify feed items. As far as I know, it does not purge anything by default.

    Really, pulling an RSS feed, parsing it, and storing the results is probably 50 lines of bash, and fewer in a general-purpose scripting language.
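
    To give a sense of scale, here is a minimal Python sketch of the fetch, parse, and merge steps using only the standard library. It assumes an RSS 2.0 feed with `<item>` elements and stores them in a SQLite table keyed on the item's `<guid>`, so re-running it only adds items that were not seen before. The table layout and function names are invented for this example, and fetching linked images is left out.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Extract guid, link, date, and body from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        items.append({
            # Fall back to the link when a feed omits <guid>.
            "guid": item.findtext("guid", default=link),
            "link": link,
            "published": item.findtext("pubDate", default=""),
            "body": item.findtext("description", default=""),
        })
    return items


def merge_items(db, items):
    """Insert items not already stored; return how many were new."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "guid TEXT PRIMARY KEY, link TEXT, published TEXT, body TEXT)"
    )
    before = db.total_changes
    # INSERT OR IGNORE skips rows whose guid is already in the table.
    db.executemany(
        "INSERT OR IGNORE INTO items VALUES (:guid, :link, :published, :body)",
        items,
    )
    db.commit()
    return db.total_changes - before


def archive(feed_url, db_path):
    """Fetch the feed and merge its items into the SQLite file at db_path."""
    with urllib.request.urlopen(feed_url, timeout=30) as resp:
        raw = resp.read()
    db = sqlite3.connect(db_path)
    try:
        return merge_items(db, parse_items(raw))
    finally:
        db.close()
```

    Calling `archive(feed_url, "archive.db")` from a scheduled job would then grow the database over time, and the resulting file could be committed to the repo.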