As far as I understand, it takes a PDF that contains only image data (e.g. a scan of a piece of paper) and uses OCR to recognize the text, then overlays the text on top of the image in the output PDF.
It would allow you to take a physically scanned document and create a PDF with selectable text you could copy+paste, search over, etc.
It calls a series of open source tools to produce a PDF with the text embedded behind an image overlay, where the image overlay is the original PDF. It's been a while since I really looked into this, but to name a few:
ImageMagick to convert the pages to images
Tesseract-ocr by Google to transcribe the text in the images, which puts its output into single-page PDF files
pdfunite to stitch the PDFs back together into a whole file
I’m sure I’m missing a few; iirc it can also call a tool that straightens the pages.
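For what it's worth, here is a rough sketch of how those three steps could be chained in one shell function. This is my own guess at the glue, not the tool's actual code: the function name ocr_pdf, the temp-file layout, and the -density 300 setting are all assumptions, and it skips the page-straightening step.

```shell
#!/usr/bin/env bash
# Sketch only: OCR an image-only PDF using ImageMagick, Tesseract and pdfunite.
# ocr_pdf, src and dst are hypothetical names chosen for illustration.
ocr_pdf() {
    local src="$1" dst="$2"
    local tmp img
    tmp="$(mktemp -d)" || return 1

    # 1. Rasterize every page to page-000.png, page-001.png, ...
    #    -density 300 is an assumed resolution, not something the tool mandates.
    convert -density 300 "$src" "$tmp/page-%03d.png" || return 1

    # 2. OCR each page image into a one-page PDF (page-000.pdf, ...).
    for img in "$tmp"/page-*.png; do
        tesseract "$img" "${img%.png}" pdf || return 1
    done

    # 3. Merge the per-page PDFs into one file; the zero-padded names
    #    keep the glob in page order.
    pdfunite "$tmp"/page-*.pdf "$dst" || return 1

    rm -rf "$tmp"
}

# Example call (file names are placeholders):
# ocr_pdf scan.pdf scan_ocr.pdf
```

The zero-padded page numbers are there so that page 10 does not sort ahead of page 2 when the shell expands the glob.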
EDIT: Messed around and remembered the stuff:
Where a.pdf is a 2-page PDF:
>convert a.pdf a.png
makes a-0.png and a-1.png
OCR each image (the config name is lowercase pdf, and each run writes a_ocr-N.pdf):
>for x in {0..1} ; do tesseract "a-$x.png" "a_ocr-$x" pdf ; done