For those curious about how it works (this isn't mentioned in the README): it uses PyMuPDF and a precise mapping of every piece of information to area coordinates, so the document layout is hard-coded.
This breaks when the layout changes, but layout changes in this sort of document do not happen often (I think). Also, the code is very clean and it seems straightforward to fix.
Maybe this kind of code is something that could be generated by an LLM/agent? (It would be easy to write checks.)
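
To make the idea concrete, here is a rough sketch of coordinate-based extraction with PyMuPDF, plus the kind of cheap check I have in mind. This is not the project's code: the field names, the rectangle coordinates, and the helper functions are all made up for illustration, and I'm assuming a recent PyMuPDF that can be imported as `pymupdf` (older releases use `import fitz`).

```python
# Hypothetical layout: field name -> (x0, y0, x1, y1) in PDF points.
# These coordinates are invented for illustration, not taken from the project.
FIELD_AREAS = {
    "name": (50, 100, 300, 120),
    "total": (400, 700, 550, 720),
}


def extract_fields(page, areas=FIELD_AREAS):
    """Read the text inside each hard-coded rectangle of a PyMuPDF page."""
    return {
        field: page.get_text("text", clip=rect).strip()
        for field, rect in areas.items()
    }


def check_fields(fields):
    """Return the names of fields that came back empty.

    An empty field is a cheap hint that the layout has moved.
    """
    return [name for name, value in fields.items() if not value]


def extract_from_file(path, page_number=0):
    """Open a PDF and extract the mapped fields from one page."""
    import pymupdf  # PyMuPDF; imported lazily so the helpers above stay dependency-free

    with pymupdf.open(path) as doc:
        return extract_fields(doc[page_number])
```

When the layout shifts, fixing it should mostly mean updating the rectangles in the mapping, and a check like `check_fields` (or stricter per-field format checks) would flag the breakage early.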
Besides the practical value for those who might need it, I think it is possibly interesting for others to look at this approach.
Neat project, thanks for sharing!