Improving PyPDF2 with PDFtk
PyPDF2 (forked from pyPdf) is wonderful. I use it a fair bit in my job, mainly for chopping up PDFs and re-assembling the pages in a different order. It does sometimes struggle with non-standard PDFs, though, even ones that open fine in other programs. This can be frustrating.
The error I’ve been battling with today, on some PDFs provided by a client, was:
PyPDF2.utils.PdfReadError: EOF marker not found
I managed to find a workaround using PDFtk to fix the PDF in memory at the first sign of any trouble. It works well so far, so in case anyone else is having similar issues I thought I’d write it up.
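To give a rough idea of the shape of the workaround, here is a minimal sketch rather than the exact code. It assumes the `pdftk` binary is on the PATH and relies on PDFtk accepting `-` for both stdin and stdout, so nothing touches the disk. The names `repair_with_pdftk` and `open_pdf` are made up for illustration, and the reader class and error type are passed in so the sketch is not tied to one PyPDF2 version.

```python
import io
import subprocess


def repair_with_pdftk(data: bytes) -> bytes:
    """Round-trip PDF bytes through pdftk (stdin to stdout) so it rewrites the file."""
    result = subprocess.run(
        ["pdftk", "-", "output", "-"],  # "-" means stdin for input, stdout for output
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if pdftk itself fails
    )
    return result.stdout


def open_pdf(data, reader_factory, read_errors=(Exception,), repair=repair_with_pdftk):
    """Try to read the PDF as-is; on a read error, repair it in memory and retry once."""
    try:
        return reader_factory(io.BytesIO(data))
    except read_errors:
        return reader_factory(io.BytesIO(repair(data)))


# Example wiring (assumes the older PyPDF2 API named in the traceback above):
#   from PyPDF2 import PdfFileReader
#   from PyPDF2.utils import PdfReadError
#   with open("client.pdf", "rb") as f:   # hypothetical file name
#       reader = open_pdf(f.read(), PdfFileReader, read_errors=(PdfReadError,))
```

Only retrying after a failure means well-formed PDFs never pay the cost of spawning PDFtk. If the repaired bytes still fail to parse, the second error propagates to the caller as normal.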