Murdo Maclachlan@lemmy.world

The following is an FAQ covering why I transcribe, along with questions I have been asked here or was often asked on the other site. It’s adapted from an FAQ I posted over there, but with site-specific details removed. I may add more questions to it in the future.

1. Why do you do transcriptions?

Transcriptions help improve the accessibility of posts. Lemmy doesn’t, at the moment, provide a native way to add alt-text to images, so transcriptions are an attempt to fill that space. The following is a (non-exhaustive) list of some of the ways transcriptions improve accessibility:

They help blind or otherwise visually-impaired people who rely on screen readers, technology that reads out what’s on the screen. That technology can’t read the text in an image or video, and obviously it cannot describe non-textual images at all.
Audio transcriptions are necessary for deaf or otherwise hearing-impaired people.
They help people who have trouble reading small, blurry or oddly formatted text.
In some cases, they may be helpful for people with colour deficiencies, if there is low contrast between text and background colours.
They help people with bad internet connections, who as a result may not be able to load the image at high quality or at all.
They can provide context or note small details that people missed when first viewing the post, potentially aiding their understanding and/or appreciation of it.
They are useful for search engine indexing and the preservation of images, videos or audio that may at some point get deleted.
They provide data for improving OCR (Optical Character Recognition) technology. See below for the reasons why OCR isn’t yet adequate.

2. Why don’t you just use OCR or AI?

OCR (Optical Character Recognition) is technology that detects and transcribes text in an image. However, relying on it for transcriptions is currently infeasible for three simple reasons:

It can, and does, easily get a lot wrong. It’s most accurate on simple images of plain text, such as screenshots from social media posts, but even there will have errors from time to time. Since this is an accessibility service, as close to 100% accuracy as possible is required. OCR’s work simply isn’t reliable enough for that yet.
Even were OCR able to 100%-accurately describe the text, there are certain parts of posts I don’t always transcribe if they are not considered relevant (this being derived from r/TranscribersOfReddit’s original guidelines, created with the aid of moderators of r/Blind), and certain parts should be placed in specific markdown formatting and so on. Sometimes things that aren’t normally relevant become relevant depending on the context of the post. Working out what is and isn’t relevant isn’t possible for computers right now.
Finally, for posts without text, or where a large portion of the post is not text, OCR is useless. Other AI such as ChatGPT can sometimes describe these, but here is where it’s important to understand what these types of AI, that is LLMs (Large Language Models), actually are. They’re generative. You give them a prompt and they generate a statistically likely response. It doesn’t matter to the LLM whether the response is correct or contains errors or complete nonsense, and it doesn’t, and can’t, know if it does. This will always be the case because that’s what LLMs are: for this reason, AI is not remotely suitable for transcriptions.

FAQ: Why Transcribe?
