The original post: /r/datahoarder by /u/alexlazar98 on 2025-03-14 10:37:22.
I want to start a personal project where I scan old books, OCR them, and index the resulting markdown. This is a book with ALL of Romania’s roads as of 1974. It has tables, maps, and all sorts of other interesting historical data points.
I already have some data engineering background. I’m a software engineer and I’ve built a project that helps with RAG, search, and indexing of markdown files (even very big ones). My problem is the OCR part. Any tips?