
---
title: pdf to txt
tags: [Documentation, data, pdf, ocr]
---

How to convert the page images of a scanned PDF book to plain text with OCR. The results are poor and will need a lot of manual correction.

Dependencies

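Install Tesseract together with the language data for the book's language: search your package manager for 'tesseract english' (or whichever language you need). You also need poppler, which provides pdftoppm for turning PDF pages into images.

Arch: tesseract, tesseract-data-eng and poppler (on Debian/Ubuntu the packages are tesseract-ocr, tesseract-ocr-eng and poppler-utils).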
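Before running the script it can help to confirm that everything is installed. The function below is a hypothetical helper (`check_ocr_deps` is not part of either package), sketched for a POSIX shell; it assumes only `command -v` and `tesseract --list-langs`, which prints the installed language codes one per line after a header line.

```sh
# Hypothetical helper (not part of tesseract or poppler):
#   check_ocr_deps [LANG]
# Succeeds if pdftoppm and tesseract are on PATH and tesseract has
# language data for LANG (default: eng); otherwise reports what is
# missing on stderr and fails. The ( ) body keeps variables local.
check_ocr_deps() (
    lang=${1:-eng}
    status=0
    for tool in pdftoppm tesseract; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing program: $tool" >&2
            status=1
        fi
    done
    # --list-langs prints a header line, then one language code per line.
    # 2>&1 is a precaution in case a version writes the list to stderr.
    if [ "$status" -eq 0 ] &&
        ! tesseract --list-langs 2>&1 | grep -Fqx "$lang"; then
        echo "missing tesseract language data: $lang" >&2
        status=1
    fi
    exit "$status"
)
```

For example, `check_ocr_deps eng && echo ready` prints `ready` only when both programs and the English language data are present.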

Script

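```sh
# Render each page of file.pdf to a PNG image named test-<page>.png
pdftoppm -png file.pdf test
# OCR every page image with English language data and append the text to out.txt
for x in test-*.png; do tesseract -l eng "$x" - >> out.txt; done
```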
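The two commands can also be wrapped in a reusable function. This is a sketch under stated assumptions rather than a tested recipe: `pdf_to_txt` is a hypothetical name, page images are written to a temporary directory instead of the current one, the output file is overwritten rather than appended to, and pages are rendered at 300 DPI (pdftoppm's `-r` option; its default is 150) because Tesseract's documentation recommends images of at least 300 DPI. Whether the higher resolution helps depends on the scan. Like the loop above, it relies on the shell glob listing the images in page order, which holds when pdftoppm zero-pads page numbers, as recent poppler versions do.

```sh
# Hypothetical helper, not part of poppler or tesseract:
#   pdf_to_txt INPUT.pdf OUTPUT.txt [LANG] [DPI]
# Renders every page of INPUT.pdf to PNG, OCRs each page with tesseract
# and writes the combined text to OUTPUT.txt (overwriting it).
pdf_to_txt() (
    pdf=$1
    out=$2
    lang=${3:-eng}   # tesseract language code, e.g. eng, deu, fra
    dpi=${4:-300}    # rendering resolution in DPI

    # Page images go into a temporary directory that is removed on exit.
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT

    # Produces $tmp/page-<N>.png, one file per page.
    pdftoppm -r "$dpi" -png "$pdf" "$tmp/page" || exit 1

    # Start from an empty file so re-running does not duplicate text.
    : > "$out" || exit 1
    for img in "$tmp"/page-*.png; do
        # "-" makes tesseract print the recognised text to stdout.
        tesseract -l "$lang" "$img" - >> "$out" || exit 1
    done
)
```

Example: `pdf_to_txt file.pdf out.txt eng` does the same job as the script above, and `check`-style failures (a missing program or a bad PDF) make the function return a non-zero status.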