Malin Freeborn
title | tags
---|---
pdf to txt |
How to convert scanned PDF book pages to text with OCR (results are very poor and will need lots of corrections).
Dependencies
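Search your package manager for 'tesseract english' (or whichever language you need).
Arch: tesseract-data-eng and poppler (which provides pdftoppm; Debian-based systems call the package poppler-utils)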
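A quick way to confirm that both tools and the language data are installed is sketched below. The function name `check_ocr_deps` is made up for this note; it relies only on `command -v` and `tesseract --list-langs`.

```bash
# Hypothetical helper: check_ocr_deps [LANG]
# Prints "ok" if pdftoppm, tesseract and the language data are all present.
check_ocr_deps() {
    lang="${1:-eng}"
    for tool in pdftoppm tesseract; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            return 1
        fi
    done
    # --list-langs prints a header line, then one installed language code per line.
    if ! tesseract --list-langs 2>&1 | grep -qx "$lang"; then
        echo "missing tesseract language data: $lang" >&2
        return 1
    fi
    echo "ok"
}
```

Run it as `check_ocr_deps` for English, or pass another language code such as `check_ocr_deps deu`.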
Script
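# Render each page of the PDF to test-1.png, test-2.png, ...
pdftoppm -png file.pdf test
# OCR each page image and append the text to out.txt
for x in test-*.png; do
    tesseract -l eng "$x" - >> out.txt
done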
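The same steps can be wrapped in a small function so that the page images land in a scratch directory and are cleaned up afterwards. This is only a sketch: the name `pdf_to_txt`, its argument order, and the `page` prefix are invented here, and it assumes `pdftoppm` and `tesseract` are on the PATH.

```bash
# Hypothetical wrapper: pdf_to_txt INPUT.pdf OUTPUT.txt [LANG]
pdf_to_txt() {
    pdf="$1"
    out="$2"
    lang="${3:-eng}"
    tmp="$(mktemp -d)" || return 1

    # One PNG per page, written as page-<number>.png in the scratch directory.
    pdftoppm -png "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }

    : > "$out"    # start with an empty output file
    for img in "$tmp"/page-*.png; do
        [ -e "$img" ] || continue    # no pages were rendered
        # "-" as the output base makes tesseract write to stdout.
        tesseract -l "$lang" "$img" - >> "$out"
    done

    rm -rf "$tmp"
}
```

Called as `pdf_to_txt book.pdf book.txt`, this overwrites `book.txt` rather than appending to it; a third argument such as `deu` selects another language.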