OK, so I want to download this embedded PDF/document so I can print a physical copy. The website lets me view it as much as I want, but asks me to fork over $25 + tax (USD) to download it.

Obviously, I don’t want to do that, so I tried to download the embedded document via Inspect Element. But the weird thing is that it’s not actually loading a PDF, just really small pictures of each page.

So my question is basically: how can I download this document so I can print it?

PDF link: https://www.sbcaplanroom.com/jobs/2477/plans/goleta-sanitary-district-biosolids-and-energy-phase-1-project/?preview=200647

  • Андрей Быдло
    8 months ago

    FOR LINUX, COMPLETE AND WORKING

    1. Install xdotool and AutoKey.
    2. In Firefox, install the Save Screenshot add-on: https://addons.mozilla.org/en-US/firefox/addon/savescreenshot/ Then, in Firefox’s extension shortcuts settings, assign Ctrl+1 as the hotkey to capture the visible page.
    3. Create a Python script for AutoKey. Mine is:
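    ```python
    # AutoKey script: `dialog` is provided by AutoKey's scripting API, no import needed.
    import subprocess
    import time

    pages = int(dialog.input_dialog(title='', message='Number of pages:', default='5').data)

    time.sleep(1)  # give the browser a moment to regain focus
    # Screenshot + click "next" for every page except the last (pages - 1 times).
    for _ in range(1, pages):
        subprocess.run(["xdotool", "key", "ctrl+1"])  # Save Screenshot's hotkey
        time.sleep(2)  # let the screenshot finish saving
        subprocess.run(["xdotool", "click", "1"])  # left-click the next-page button under the pointer
        time.sleep(2)  # let the next page render

    subprocess.run(["xdotool", "key", "ctrl+1"])  # screenshot the last page
    ```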
    
    4. At the bottom of AutoKey’s script editor, set a hotkey to launch the script (I set it to Home).
    5. Open OP’s page and use Inspect Element to find the embed link. It’s https://www.sbcaplanroom.com/preview/2477/12610/200647
    6. Press F11 and make the whole page fit on screen.
    7. Place the mouse pointer over the next-page button, so it gets clicked each time.
    8. Launch my AutoKey script with the Home key.
    9. Enter the number of pages.
    10. Watch it do its thing.
    11. Open the screenshots directory in XnView and select them all. Locate its Batch Convert tool, and in the Actions tab select a crop action and adjust it to the pages’ margins. WARNING: the last one has to be done differently; open it in XnView and crop it on its own.
    12. Use any tool to stitch them back together into a PDF. I used PDF Arranger: https://github.com/pdfarranger/pdfarranger, though a user further down said it crashed on a 600-something-page document.
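A variant of the AutoKey script above that runs from a plain terminal, as a minimal sketch: it assumes `xdotool` is installed and that the Save Screenshot hotkey is Ctrl+1 as in step 2. Instead of AutoKey's `dialog`, the page count is a command-line argument, and the start delay (5 seconds, an arbitrary choice) gives you time to switch to the browser and park the pointer on the next-page button. `capture_commands` builds the same sequence as the original loop: one screenshot per page and a click between pages, so N pages means N screenshots and N - 1 clicks.

```python
import subprocess
import sys
import time

SCREENSHOT_KEY = "ctrl+1"  # assumed: the Save Screenshot hotkey set in step 2


def capture_commands(pages: int, key: str = SCREENSHOT_KEY) -> list[list[str]]:
    """xdotool calls for `pages` pages: screenshot, click, screenshot, ..., screenshot."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    cmds: list[list[str]] = []
    for _ in range(pages - 1):
        cmds.append(["xdotool", "key", key])    # capture the current page
        cmds.append(["xdotool", "click", "1"])  # left-click the next-page button under the pointer
    cmds.append(["xdotool", "key", key])        # capture the last page, no click after it
    return cmds


def run(pages: int, delay: float = 2.0, start_delay: float = 5.0) -> None:
    time.sleep(start_delay)  # time to switch to the browser and park the pointer
    for cmd in capture_commands(pages):
        subprocess.run(cmd, check=True)
        time.sleep(delay)  # let the screenshot save / the next page render


if __name__ == "__main__" and len(sys.argv) > 1:
    run(int(sys.argv[1]))
```

Run it as `python3 capture.py 40` (the filename is just an example) and keep your hands off the mouse until it finishes.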

    Result: https://files.catbox.moe/iivoga.pdf

  • Андрей Быдло
    8 months ago

    OP, I did it: https://files.catbox.moe/6eofj6.pdf

    I’ll edit my reply with the Linux specifics.

    I’ve updated the link with a slightly better PDF. Comparison at max zoom: https://files.catbox.moe/5q3v4b.png Someone with a 4K display could do better, but that’s what my screen is capable of.

    Either way, it was a fun puzzle for my entry-level knowledge of Linux/Python/macros, and I think I’ll use this method a couple of times myself. Hope someone else finds it useful.

  • Norah - She/They
    8 months ago

    Okay so, PDF documents (especially scanned ones) are often basically “a collection of images” already. This website is clearly trying to make it an extra step harder by loading the images individually as you browse the document. You could manually save/download all the images and use a tool to turn them back into a PDF. I haven’t heard of a tool that does this automatically, but it should be possible for a web scraper to make the GET requests sequentially and then stitch the PDF back together.
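For the "use a tool to turn them back into a PDF" step, the container side is simpler than it sounds: a PDF can embed a JPEG file unchanged (the `DCTDecode` filter), so each saved page only needs to be wrapped in its own page object. The sketch below is a stdlib-only illustration of that idea, not something anyone in this thread used; it assumes the pages were saved as ordinary RGB or grayscale JPEGs and sizes each PDF page to the image's pixel dimensions in points. Dedicated tools such as img2pdf do the same thing more robustly.

```python
import struct

# SOF (start-of-frame) markers, which carry the image dimensions.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7, 0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
COLOR_SPACES = {1: "/DeviceGray", 3: "/DeviceRGB"}  # CMYK JPEGs are not handled


def jpeg_size(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, components) read from a JPEG's SOF segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt JPEG marker")
        marker = data[i + 1]
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if marker in SOF_MARKERS:
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + length  # skip this segment (marker + length-prefixed payload)
    raise ValueError("no SOF segment found")


def jpegs_to_pdf(jpegs: list[bytes]) -> bytes:
    """Build a PDF with one page per JPEG, embedding each file unchanged (DCTDecode)."""
    # Object numbers: 1 = catalog, 2 = page tree, then (page, image, content) per JPEG.
    page_ids = [3 + 3 * k for k in range(len(jpegs))]
    kids = " ".join(f"{p} 0 R" for p in page_ids)
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        f"<< /Type /Pages /Kids [{kids}] /Count {len(jpegs)} >>".encode(),
    ]
    for page_id, data in zip(page_ids, jpegs):
        w, h, comps = jpeg_size(data)
        if comps not in COLOR_SPACES:
            raise ValueError(f"unsupported component count: {comps}")
        img_id, content_id = page_id + 1, page_id + 2
        # Page size in points = image size in pixels (1 px = 1/72 inch).
        objects.append(
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w} {h}] "
            f"/Resources << /XObject << /Im0 {img_id} 0 R >> >> /Contents {content_id} 0 R >>".encode()
        )
        objects.append(
            f"<< /Type /XObject /Subtype /Image /Width {w} /Height {h} "
            f"/ColorSpace {COLOR_SPACES[comps]} /BitsPerComponent 8 /Filter /DCTDecode "
            f"/Length {len(data)} >>\nstream\n".encode() + data + b"\nendstream"
        )
        draw = f"q {w} 0 0 {h} 0 0 cm /Im0 Do Q".encode()  # scale the unit-square image to the page
        objects.append(b"<< /Length %d >>\nstream\n" % len(draw) + draw + b"\nendstream")

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (len(objects) + 1, xref_pos)
    return bytes(out)
```

For example, `Path("out.pdf").write_bytes(jpegs_to_pdf([p.read_bytes() for p in sorted(Path("pages").glob("*.jpg"))]))`, with `Path` from `pathlib` and `pages/` a hypothetical folder of saved images. Plain `sorted` only gives the right page order if the filenames are zero-padded.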

    • @[email protected]
      8 months ago

      I would go this route as well. As a developer, this sounds easy enough. If you don’t get a vertical sequence of images but a grid of tiles instead, then I would apply traditional image stitching techniques. There are tons of libraries for that on GitHub.
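If the viewer does serve a grid of tiles that butt up against each other without overlap, you may not need feature matching at all: each tile's position follows from its row and column. A minimal sketch of that layout step, assuming a hypothetical filename scheme like `tile_r2_c5.png` that encodes the row and column (check what the site actually sends). For tiles that overlap, real feature-based stitching, for example OpenCV's stitching module, is the better fit.

```python
import re
from itertools import accumulate

# Hypothetical tile naming scheme, e.g. "tile_r2_c5.png"; adjust to whatever the viewer uses.
TILE_NAME = re.compile(r"r(\d+)_c(\d+)")


def tile_position(filename: str) -> tuple[int, int]:
    """Extract (row, col) from a tile's filename."""
    m = TILE_NAME.search(filename)
    if m is None:
        raise ValueError(f"no row/col in {filename!r}")
    return int(m.group(1)), int(m.group(2))


def grid_layout(tiles: dict[tuple[int, int], tuple[int, int]]):
    """Given {(row, col): (width, height)}, return the canvas size and each tile's (x, y) offset.

    Assumes non-overlapping tiles; each column is as wide as its widest tile
    and each row as tall as its tallest tile.
    """
    rows = 1 + max(r for r, _ in tiles)
    cols = 1 + max(c for _, c in tiles)
    col_w, row_h = [0] * cols, [0] * rows
    for (r, c), (w, h) in tiles.items():
        col_w[c] = max(col_w[c], w)
        row_h[r] = max(row_h[r], h)
    xs = [0, *accumulate(col_w[:-1])]  # left edge of each column
    ys = [0, *accumulate(row_h[:-1])]  # top edge of each row
    offsets = {(r, c): (xs[c], ys[r]) for (r, c) in tiles}
    return (sum(col_w), sum(row_h)), offsets
```

With Pillow, for instance, you could then create a blank canvas of the returned size and `paste` each tile at its offset.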

  • @[email protected]
    8 months ago

    ImageMagick can convert a series of images to a single PDF: `convert page*.png mydoc.pdf`
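One gotcha with `page*.png`: the shell sorts the glob alphabetically, not numerically, so `page10.png` lands before `page2.png` unless the numbers are zero-padded, and the PDF pages come out shuffled. A small sketch that builds the same command with the files in numeric order; the `page*.png` naming comes from the comment above, and `mydoc.pdf` is just the default output name.

```python
import re
from pathlib import Path


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers: page2 < page10."""
    # re.split with a capture group alternates text and digit runs:
    # "page10.png" -> ["page", "10", ".png"], so odd indices are always digits.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(re.split(r"(\d+)", name))]


def images_to_pdf_cmd(files, output: str = "mydoc.pdf", program: str = "convert") -> list[str]:
    """Build an ImageMagick argv with the input images in natural page order."""
    ordered = sorted((Path(f) for f in files), key=lambda p: natural_key(p.name))
    return [program, *(str(p) for p in ordered), output]
```

Pass the result to `subprocess.run(..., check=True)`, e.g. with `Path(".").glob("page*.png")` as the file list.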

    • Daniel
      8 months ago

      I thought the convert command didn’t do this, and that it was the magick one?
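Both commands can do this: `magick` is the main command in ImageMagick 7, while `convert` is the ImageMagick 6 name that IM7 still accepts for compatibility (recent IM7 releases print a deprecation warning for it). Both take the same `inputs... output.pdf` form for this job, and on Windows a bare `convert` can even resolve to an unrelated system utility. A small sketch that prefers `magick` and falls back to `convert`; the `which` parameter only exists so the lookup can be swapped out for testing.

```python
import shutil
from typing import Callable, Optional


def imagemagick_command(which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the ImageMagick entry point to use: IM7's `magick`, else IM6's `convert`."""
    if which("magick"):
        return ["magick"]
    if which("convert"):
        return ["convert"]
    raise FileNotFoundError("ImageMagick not found: neither 'magick' nor 'convert' is on PATH")
```

Usage: `subprocess.run(imagemagick_command() + ["page1.png", "page2.png", "mydoc.pdf"], check=True)`.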

  • @[email protected]
    8 months ago

    I’ve run into this before on archive.org, incredibly annoying.

    I believe there are utilities that can capture JPEGs and join them together into a PDF, but it seems they purposely uploaded a very low-res version to prevent that.

    Hate to say, but I don’t see a way around it.

  • @[email protected]
    8 months ago

    You could write a script to scroll through the document at defined intervals, take screenshots, then have the script edit them together.
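For the "edit them together" part, a common trick with scrolled screenshots is to find how much each new shot overlaps the previous one and append only the new rows. A minimal sketch of that overlap search, assuming each screenshot has already been turned into a list of pixel rows (for example one `bytes` object per row) and that the scrolled content is pixel-identical between shots, i.e. fixed toolbars or headers have been cropped off first. Long stretches of identical blank rows can make the overlap ambiguous, so it helps to scroll by less than a full screen so that each overlap contains distinctive content.

```python
from typing import Sequence


def find_overlap(prev: Sequence, nxt: Sequence) -> int:
    """Largest k such that the last k rows of `prev` equal the first k rows of `nxt` (0 if none)."""
    for k in range(min(len(prev), len(nxt)), 0, -1):
        if list(prev[len(prev) - k:]) == list(nxt[:k]):
            return k
    return 0


def stitch(shots: Sequence[Sequence]) -> list:
    """Concatenate consecutive screenshots (lists of rows), dropping each overlapping band."""
    if not shots:
        return []
    out = list(shots[0])
    for shot in shots[1:]:
        k = find_overlap(out, shot)
        out.extend(shot[k:])  # keep only the rows earlier shots have not shown yet
    return out
```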

    Of course by then, the time you’d have spent would be worth more than $25

    • @[email protected]
      8 months ago

      > Of course by then, the time you’d have spent would be worth more than $25

      Yes, but you’d now have a script that can be used in the future as well. Automation is a magical thing, my friend.

  • @[email protected]
    8 months ago

    Think you’ll have to take screenshots. That’s a pretty good way of stopping you from downloading it.

    Or just email and ask them for a copy. It’s not Harry Potter or anything; there’s no reason it shouldn’t be free if you ask the right person.