Why is OCR for handwritten content still that bad?

hinterlufer@lemmy.world · edited 3 months ago


litchralee · edited 3 months ago

My German is non-existent, but it seems to me that those two references can agree with this form for the lowercase d:

lowercase d handwriting

Of course, your second reference shows an initial stroke towards the top of the circle, but the rest of the stroke is one motion where the ascender doubles back on itself, completing the circle in a counterclockwise move that also starts the ascender. That is to say, the circle and ascender are naturally attached.

I could find only one reference which explicitly starts a new stroke for the ascender after completing the circle, but this example is from cursive, not from standard form:

cursive d with separate ascender stroke

If I had to guess, the impetus for not doubling back is to prevent the ascender from becoming messy, since writing over the same part of the page can cause smudging. And perhaps in hurried writing, this form lends itself to detaching the circle from the ascender. But I personally draw my cursive d with the ascender more akin to how cursive l is drawn, with a looping ascender, which preserves the attachment:

cursive l with looping ascender stroke

There is no ambiguity in cursive doing it this way, and for standard form, it saves a lift from the paper.

Seeing as drawing the d with its circle separated from the ascender requires a lift, and also makes it ambiguous with an O followed by an L, I’m not entirely sure how that form would be clearer to read. The context of the language means there’s usually no confusion between a D and an OL, but that doesn’t necessarily mean the drawn form is clear to read, which is going to mess up any OCR system prior to performing spell checking.

But some pathological examples might include “olay” vs “day” vs “0 day”.
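To make that ambiguity concrete, here is a rough Python sketch of what a post-OCR step might do. It is not how any particular OCR engine works. The `PAIRS` set, the toy `WORDS` list, and both function names are made up for illustration. It enumerates every reading where a circle-plus-stroke pair could have been a detached d, then keeps the ones a dictionary accepts.

```python
# Hypothetical glyph pairs an OCR engine might emit for a "d" whose
# circle is detached from its ascender (assumption, not measured data).
PAIRS = {"ol", "0l"}

# Toy word list standing in for a real spell-check dictionary.
WORDS = {"day", "olay", "gold"}


def candidate_readings(raw: str) -> set[str]:
    """Return every reading of `raw` where each pair in PAIRS is
    either kept as-is or merged into a single 'd'."""
    if not raw:
        return {""}
    results = set()
    # Option 1: keep the first character unchanged.
    for rest in candidate_readings(raw[1:]):
        results.add(raw[0] + rest)
    # Option 2: read a leading circle + stroke pair as a detached "d".
    if raw[:2] in PAIRS:
        for rest in candidate_readings(raw[2:]):
            results.add("d" + rest)
    return results


def dictionary_matches(raw: str) -> list[str]:
    """Keep only the candidate readings that appear in WORDS."""
    return sorted(r for r in candidate_readings(raw) if r in WORDS)


print(dictionary_matches("golol"))  # only one reading is a word
print(dictionary_matches("olay"))   # two readings are both words
```

For "golol" the dictionary rescues the reading, since only "gold" is a word. For "olay" both "olay" and "day" pass, so spell checking alone cannot tell which one was written.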

hinterlufer@lemmy.world · 3 months ago

That’s all very interesting. I might even consider re-learning the d (and the b for that matter).