This article shows a few simple examples of editing PDF files with Python and the PyPDF2 library.
Copy And Encrypt PDF
This example copies every page of an existing PDF into a new file and encrypts the new file with a password.
import PyPDF2

fileName = "/Users/weiyang/Desktop/Test.pdf"
newFileName = "/Users/weiyang/Desktop/NewTest.pdf"

file = open( fileName, 'rb' )
reader = PyPDF2.PdfFileReader( file )
writer = PyPDF2.PdfFileWriter()

# Copy every page of the source PDF into the writer
for pageIndex in range( reader.numPages ):
    writer.addPage( reader.getPage( pageIndex ) )

# Encrypt the output; 'bell' is the password
writer.encrypt( 'bell' )

newFile = open( newFileName, "wb" )
writer.write( newFile )
newFile.close()
file.close()
If you only want to encrypt the original PDF in place, import the os module and add os.rename( newFileName, fileName ) at the end of the snippet above. This replaces the original file with the encrypted copy.
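The rename step can be tried on its own. The following is a minimal sketch under stated assumptions: the helper name replace_original is hypothetical and not part of PyPDF2, the two files are throwaway stand-ins for the real PDFs, and os.replace (Python 3.3+) is used instead of os.rename because os.rename raises an error on Windows when the destination already exists. Both files should be closed before the call.

```python
import os
import tempfile


def replace_original(new_path, original_path):
    """Move new_path over original_path, overwriting the original file."""
    # os.replace overwrites an existing destination on every platform
    os.replace(new_path, original_path)


# Small demonstration with throwaway files standing in for the two PDFs
work_dir = tempfile.mkdtemp()
original = os.path.join(work_dir, "Test.pdf")
encrypted = os.path.join(work_dir, "NewTest.pdf")
with open(original, "wb") as f:
    f.write(b"original bytes")
with open(encrypted, "wb") as f:
    f.write(b"encrypted bytes")

replace_original(encrypted, original)
```

After the call only Test.pdf remains, and it holds the content that was written to NewTest.pdf.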
A PdfFileReader object can also read an encrypted PDF: call its decrypt method with the password before accessing the pages.
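Decryption might look like the sketch below. It assumes the same older PyPDF2 API used in this article, where PdfFileReader has an isEncrypted attribute and a decrypt method that returns 0 when the password does not match. The function names open_for_reading and count_pages are hypothetical helpers introduced only for illustration.

```python
def open_for_reading(reader, password):
    """Decrypt reader if needed; return True when its pages can be read."""
    if not reader.isEncrypted:
        return True
    # decrypt() returns 0 when the password does not match
    return reader.decrypt(password) != 0


def count_pages(path, password):
    """Hypothetical usage: page count of a possibly encrypted PDF."""
    import PyPDF2  # imported here so the helper above has no dependency

    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        if not open_for_reading(reader, password):
            raise ValueError("wrong password for " + path)
        return reader.numPages
```

For the file created above, count_pages( newFileName, 'bell' ) would be one way to check that the password works.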
Extract The String Content From PDF
I use the PDF file from the previous example as a test. PyPDF2 can extract only the text content of a page.
The extraction is not perfect: in my test, one line of text was missing from the output.
import PyPDF2

fileName = "/Users/weiyang/Desktop/Test.pdf"

file = open( fileName, 'rb' )
reader = PyPDF2.PdfFileReader( file )

# Extract the text of the first page only
page = reader.getPage( 0 )
content = page.extractText()
print( content )

file.close()
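The example above reads only the first page. A minimal sketch for collecting the text of every page follows. It relies on the same numPages, getPage and extractText calls, and the names extract_all_text and print_all_text are hypothetical helpers, not part of PyPDF2.

```python
def extract_all_text(reader):
    """Return the text of every page, in order, as a list of strings."""
    pages = []
    for page_index in range(reader.numPages):
        page = reader.getPage(page_index)
        pages.append(page.extractText())
    return pages


def print_all_text(path):
    """Hypothetical usage: print the text of each page of the PDF at path."""
    import PyPDF2  # imported here so the helper above has no dependency

    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        for page_text in extract_all_text(reader):
            print(page_text)
```

Returning one string per page keeps the page boundaries visible, which makes it easier to spot on which page a line went missing.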
Combine Two Different PDF Files
Sometimes we want to combine several PDFs into a single file. This is easy to do with PyPDF2.
In the following example, I combine two files, Test1.pdf and Test2.pdf, into a new file, NewTest.pdf.
import PyPDF2

fileName1 = "/Users/weiyang/Desktop/Test1.pdf"
fileName2 = "/Users/weiyang/Desktop/Test2.pdf"
newFileName = "/Users/weiyang/Desktop/NewTest.pdf"

file1 = open( fileName1, 'rb' )
file2 = open( fileName2, 'rb' )
reader1 = PyPDF2.PdfFileReader( file1 )
reader2 = PyPDF2.PdfFileReader( file2 )
writer = PyPDF2.PdfFileWriter()

# Add the pages of the first file, then the pages of the second file
for pageIndex in range( reader1.numPages ):
    writer.addPage( reader1.getPage( pageIndex ) )
for pageIndex in range( reader2.numPages ):
    writer.addPage( reader2.getPage( pageIndex ) )

newFile = open( newFileName, "wb" )
writer.write( newFile )
newFile.close()
file1.close()
file2.close()
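The same pattern extends to any number of input files. The sketch below is one possible generalization, assuming the same PdfFileReader and PdfFileWriter calls as the example above. The names append_all_pages and merge_files are hypothetical helpers introduced for illustration.

```python
def append_all_pages(writer, readers):
    """Add every page of each reader to writer; return the pages added."""
    added = 0
    for reader in readers:
        for page_index in range(reader.numPages):
            writer.addPage(reader.getPage(page_index))
            added += 1
    return added


def merge_files(paths, new_path):
    """Hypothetical usage: merge the PDFs in paths into new_path."""
    import PyPDF2  # imported here so the helper above has no dependency

    writer = PyPDF2.PdfFileWriter()
    files = [open(path, "rb") for path in paths]
    try:
        readers = [PyPDF2.PdfFileReader(f) for f in files]
        append_all_pages(writer, readers)
        # As in the example above, the source files stay open until
        # the writer has finished writing the output
        with open(new_path, "wb") as new_file:
            writer.write(new_file)
    finally:
        for f in files:
            f.close()
```

Calling merge_files( [ fileName1, fileName2 ], newFileName ) would be the equivalent of the two-file example.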
Add Watermark For PDF
I created a watermark PDF file with a Microsoft Office tool, then used PyPDF2 to add it to every page of NewTest.pdf.
import PyPDF2

fileName = "/Users/weiyang/Desktop/NewTest.pdf"
fileName2 = "/Users/weiyang/Desktop/WaterMark.pdf"
fileName3 = "/Users/weiyang/Desktop/Result.pdf"

file = open( fileName, 'rb' )
waterMarkFile = open( fileName2, 'rb' )
reader = PyPDF2.PdfFileReader( file )
waterMarkReader = PyPDF2.PdfFileReader( waterMarkFile )
writer = PyPDF2.PdfFileWriter()

# Merge the first page of the watermark PDF onto every page
for pageIndex in range( reader.numPages ):
    pageObj = reader.getPage( pageIndex )
    pageObj.mergePage( waterMarkReader.getPage( 0 ) )
    writer.addPage( pageObj )

resultFile = open( fileName3, "wb" )
writer.write( resultFile )
resultFile.close()
waterMarkFile.close()
file.close()
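If only some pages should carry the watermark, the loop can be wrapped in a small helper. This is a sketch under assumptions: add_watermark and its page_indexes parameter are hypothetical names, and the helper uses only the getPage, mergePage and addPage calls shown above.

```python
def add_watermark(reader, watermark_page, writer, page_indexes=None):
    """Copy all pages to writer, merging watermark_page onto the chosen ones.

    page_indexes is an iterable of zero-based page numbers; None means
    every page. Returns the number of pages that received the watermark.
    """
    if page_indexes is None:
        page_indexes = range(reader.numPages)
    chosen = set(page_indexes)
    stamped = 0
    for page_index in range(reader.numPages):
        page = reader.getPage(page_index)
        if page_index in chosen:
            # mergePage draws the watermark on top of the existing content
            page.mergePage(watermark_page)
            stamped += 1
        writer.addPage(page)
    return stamped
```

With the objects from the example above, add_watermark( reader, waterMarkReader.getPage( 0 ), writer, [ 0 ] ) would mark only the first page and copy the rest unchanged.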