lk/data/pdf-to-txt.md
Malin Freeborn ba8026e0c3
2023-06-17 21:28:20 +02:00

title: pdf to txt
tags: Documentation, data, pdf, ocr

How to convert the scanned page images of a PDF book to text with OCR. Results are very poor and will need lots of corrections.

Dependencies

Search your package manager for 'tesseract english' (or whichever language the book is written in) to find the matching OCR language data.

Arch: tesseract-data-eng and poppler (the package that provides pdftoppm; Debian-based distributions call it poppler-utils)
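
Before running anything, it may be worth confirming that both tools are actually on the PATH. The following is a minimal sketch; `check_deps` is a hypothetical helper name, not part of either package.

```bash
# Sketch: report which of the given commands are missing from PATH.
# check_deps is a hypothetical helper, named here for illustration only.
check_deps() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    # Returns 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

Usage: `check_deps pdftoppm tesseract && tesseract --list-langs`. If the language data is installed, the list should include `eng` (or the language chosen above).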

Script

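# Replace file.pdf and out.txt with your own file names.
# Render each page of the PDF as an image: test-1.png, test-2.png, ...
pdftoppm -png file.pdf test
# OCR each page image in order and append the text to out.txt.
for x in test-*.png; do
    tesseract -l eng "$x" - >> out.txt
done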
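
The same two steps can be wrapped in a function so that the page images land in a temporary directory instead of the working directory. This is only a sketch under stated assumptions: `pdf_to_txt`, its argument order, and the 300 DPI default are illustrative choices, not something the original script prescribes. `-r` is the pdftoppm option for resolution (the default is 150 DPI); a higher value may or may not improve the OCR output.

```bash
# Sketch: OCR a whole PDF into one text file.
# pdf_to_txt, its arguments and the 300 DPI default are assumptions for illustration.
pdf_to_txt() {
    local pdf="$1" out="$2" lang="${3:-eng}" dpi="${4:-300}"
    local tmp page
    tmp="$(mktemp -d)" || return 1

    # Render every page as $tmp/page-N.png at the requested resolution.
    pdftoppm -png -r "$dpi" "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }

    # Start from an empty output file, then append one page at a time.
    : > "$out"
    for page in "$tmp"/page-*.png; do
        [ -e "$page" ] || continue   # the PDF produced no page images
        tesseract -l "$lang" "$page" - >> "$out"
    done

    rm -rf "$tmp"
}
```

Usage: `pdf_to_txt book.pdf book.txt eng 300`. Unlike the loop above, this overwrites the output file on each run rather than appending to it.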