My old man has a bunch of .dox stuff saved. He has large, complicated files that none of the FOSS conversion tools support. I’ve tried LibreOffice, AbiWord, and every command-line tool and converter I can find. These are entire book-sized files.

I have a W10 machine with Word. Is extracting the .exe and running it with Wine feasible without making an epic mess or a massive project of this?

  • j4k3@lemmy.world (OP)
    3 months ago

    Yes .docx.

    It appears as though the encoding is missing in such a way that nothing in Linux recognizes the file. The underlying CLI tools don’t have a way of converting the file. I tried with Python’s docx tool and with iconv. It has to be encoding related because some tools initially load the file with several sets of Asian characters instead of English. However, there are no stretches of hexadecimal or entirely binary-looking data. Archiving tools do not open up the file to reveal anything else like a metafile or header. Neovim shows garbled nonsense throughout. Bat warns of binary. Python won’t load the file, nor will OnlyOffice. LibreOffice and AbiWord load initially with Asian characters before crashing.

    The only option is likely going to be setting up the W10 machine and converting a bunch of files on it.

    Ultimately, my old man thinks he can be an author all of a sudden and is trying to write. He’s not very capable of learning. I’m not confident that he can learn to use FOSS to do the same thing he has been doing. This post was just to see if there are options I am not already aware of that might actually work in practice. I can easily do everything I need in FOSS. I can do everything he needs to do. I’m more concerned about becoming his tech support when he forgets how to copy and paste. He already fails to separate the internet hardware connectivity from the web browser and operating system within his mental model of technology.

    • flubba86@lemmy.world
      3 months ago

      Sounds like it’s actually a .doc file that has been renamed to a .docx for some reason. Real MS Word would probably still open it fine, but open source tools would fall over hard.

      You mentioned you can’t decompress it either. If it was a real .docx you could rename the extension to .zip and unzip it with any archiver to see the contents. If the archiver complains about the format, then it’s not a real docx.
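
      The same check can be scripted instead of renaming files by hand. This is only a rough sketch using Python's standard zipfile module; the function name looks_like_docx is made up for illustration, and it assumes a genuine .docx carries its main body in a part called word/document.xml, as Word-produced files normally do.

```python
import zipfile


def looks_like_docx(source):
    """Return True if `source` (a path or a binary file object) is a ZIP
    archive that contains the main Word document part.

    `looks_like_docx` is a hypothetical helper name, not a library function.
    """
    # A real .docx is a ZIP container; anything else fails this first check.
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as archive:
        # Assumption: the main body lives in word/document.xml.
        return "word/document.xml" in archive.namelist()
```

      If this returns False for a file named something.docx, the extension is probably lying about the contents.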

      • nyan
        3 months ago

        If it really is a .doc file and written in an ASCII-compatible encoding as most English-language documents are, opening it in a hex editor (or a non-codepage-aware text editor like the Notepad on a W10 or earlier Windows machine) will show an indecipherable proprietary header followed by the text in the file, possibly with a single space or “junk” character between each letter depending on the exact version of Word and system encoding it was written with. There may be occasional additional stretches of markup junk. At the end, there will be a footer with occasional decipherable text strings like “MSWordDoc” and font names.
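
        If eyeballing a hex editor gets tedious, a crude "strings"-style scan can pull out the readable runs described above. This is a salvage heuristic under stated assumptions, not a Word parser: it assumes the text is either plain single-byte ASCII or ASCII with a zero byte after each letter (which is what UTF-16LE looks like), and the function name extract_text_runs and the default minimum run length are invented for illustration. Formatting is lost and junk runs may still slip through.

```python
import re


def extract_text_runs(data: bytes, min_len: int = 6) -> list[str]:
    """Return printable text runs found in raw bytes, in file order.

    Hypothetical helper: looks for runs of printable ASCII, and for runs of
    printable ASCII characters each followed by a zero byte (UTF-16LE).
    """
    n = str(min_len).encode()
    narrow = re.compile(rb"[\x20-\x7e]{" + n + rb",}")
    wide = re.compile(rb"(?:[\x20-\x7e]\x00){" + n + rb",}")

    runs = [(m.start(), m.group().decode("ascii")) for m in narrow.finditer(data)]
    runs += [(m.start(), m.group().decode("utf-16-le")) for m in wide.finditer(data)]

    # Sort by byte offset so the output follows the order in the file.
    return [text for _, text in sorted(runs)]
```

        Reading the file with open(path, "rb").read() and passing the bytes in should be enough to tell whether there is recoverable English text inside.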

        If you open a .docx file in such a program, you should get a typical zipfile signature: the letters “PK” at the beginning of the file, followed by a lot of gobbledegook. If you don’t get that “PK”, it probably isn’t a .docx.
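
        The signature check is easy to automate. A minimal sketch, assuming the two usual cases: a ZIP local-file header ("PK" followed by bytes 03 04) for .docx, and the OLE2 compound-file signature (D0 CF 11 E0 A1 B1 1A E1) that legacy .doc files normally start with. The OLE2 bytes are not mentioned above and the function names are made up for illustration; a match only identifies the container, not whether Word content is actually inside.

```python
ZIP_MAGIC = b"PK\x03\x04"                         # ZIP local file header
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")    # OLE2 compound file


def sniff_word_format(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if header.startswith(ZIP_MAGIC):
        return "zip"      # possibly a real .docx
    if header.startswith(OLE2_MAGIC):
        return "ole2"     # possibly a legacy .doc
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff_word_format(handle.read(8))
```

        A result of "ole2" for a file named .docx would fit the renamed-.doc theory; "unknown" means it is something else entirely.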

        (I’ve looked at a lot of MS file guts, for both curiosity and information extraction purposes.)

    • thayer@lemmy.ca
      3 months ago

      Thanks for clarifying, and I can appreciate your overall concerns as I face the same dilemma with my aging relatives.

      Just to confirm, have you opened these files in Word yourself (or witnessed them being opened) to verify they are in fact valid documents? If valid, are they meant to be in English?

      It wouldn’t be the first time I’ve seen “other” files renamed with an incorrect file extension.

    • MonkderVierte@lemmy.ml
      3 months ago

      Sure it’s not .doc? Earlier .docx files were rather more standards-compliant than newer ones. .doc is the old proprietary MS Word format, while .docx follows the OOXML standard (though with all the proprietary extensions, which make the standard useless).