Automating PDFs in Python
Modules used:
In this script, we will use the PyPDF2 module, which provides functions to read a PDF file, extract its data, split it, and write a new file.
Download PyPDF2:
General way: pip install "PyPDF2<3" (the names used in this article, such as PdfFileReader, were removed in PyPDF2 3.0)
PyCharm users: Go to the Python project interpreter settings and install it from there.
Various functions provided by PyPDF2:
PyPDF2.PdfFileReader(): Reads our PDF and returns a reader object, which we store in a variable (here, Pdf_Data).
Pdf_Data.isEncrypted: This property tells us whether the PDF file is encrypted.
Pdf_Data.decrypt("<password>"): Decrypts the PDF file using the password passed as the argument.
Pdf_Data.numPages: This property holds the number of pages the PDF contains.
Pdf_Data.getPage(0): Returns the first page. Pages are indexed from zero, as in Python, so 0 is the first page and 1 is the second.
Pdf_Writer=PyPDF2.PdfFileWriter(): Creates a writer object that we use to build a new PDF file.
Pdf_Writer.addPage(<The Page Data>): Adds a page to the newly created PDF file.
Note: Text extraction works only on PDF files that contain actual text, not on pages that are just scanned images.
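As a rough sketch of how isEncrypted and decrypt() fit together, the helper below unlocks a reader before any page is accessed. The names unlock_reader and FakeReader are hypothetical and exist only for illustration. The sketch assumes the legacy PyPDF2 interface listed above, in which decrypt() returns 0 when the password is rejected.

```python
def unlock_reader(reader, password=None):
    """Return the reader ready for page access, decrypting it first if needed.

    `reader` is assumed to behave like a legacy PyPDF2.PdfFileReader:
    it exposes an `isEncrypted` attribute and a `decrypt(password)`
    method that returns 0 when the password is rejected.
    """
    if not reader.isEncrypted:
        return reader  # nothing to do for an unprotected file
    if password is None:
        raise ValueError("PDF is encrypted; a password is required")
    if reader.decrypt(password) == 0:
        raise ValueError("wrong password for encrypted PDF")
    return reader


class FakeReader:
    """Hypothetical stand-in so the sketch runs without a real PDF."""

    def __init__(self, encrypted, secret=""):
        self.isEncrypted = encrypted
        self._secret = secret

    def decrypt(self, password):
        # Mimic the legacy return convention: 0 = failed, 1 = matched.
        return 1 if password == self._secret else 0


plain = unlock_reader(FakeReader(encrypted=False))
locked = unlock_reader(FakeReader(encrypted=True, secret="pw"), "pw")
```

With a real file, the call would look like unlock_reader(PyPDF2.PdfFileReader(Pdf_Open), "<password>").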
Python code to read the file and extract the text
# import the module
import PyPDF2
# open the file in binary mode
Pdf_Open = open("/home/abhinav/Downloads/CS_Defination-converted.pdf", "rb")
# read the file and store the content
Pdf_Data = PyPDF2.PdfFileReader(Pdf_Open)
# get the number of pages
print(Pdf_Data.numPages)
# extract the first page with getPage; index 0 is the first page
First_page = Pdf_Data.getPage(0)
# print the text of the first page
print(First_page.extractText())
# close the file
Pdf_Open.close()
Output:
The script prints the number of pages, followed by the text extracted from the first page of the input PDF. In this way, we can extract text from a PDF.
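The same calls extend from one page to the whole document. The sketch below loops over every page and also flags pages whose extraction comes back empty, which is what usually happens when a page holds only an image. The function names and the StubPage/StubReader classes are hypothetical and used only so the example runs without a real PDF; a legacy PdfFileReader is assumed to offer the same numPages, getPage() and extractText() members.

```python
def extract_all_text(reader):
    """Return a list with the extracted text of every page, in order."""
    pages_text = []
    for index in range(reader.numPages):  # indices run from 0 to numPages - 1
        page = reader.getPage(index)
        pages_text.append(page.extractText())
    return pages_text


def pages_without_text(pages_text):
    """Return 1-based page numbers whose extraction came back empty."""
    return [n for n, text in enumerate(pages_text, start=1) if not text.strip()]


class StubPage:
    """Hypothetical page object that returns a fixed string."""

    def __init__(self, text):
        self._text = text

    def extractText(self):
        return self._text


class StubReader:
    """Hypothetical reader exposing numPages and getPage like PdfFileReader."""

    def __init__(self, texts):
        self._pages = [StubPage(t) for t in texts]
        self.numPages = len(self._pages)

    def getPage(self, index):
        return self._pages[index]


texts = extract_all_text(StubReader(["Hello", "   ", "World"]))
print(pages_without_text(texts))  # → [2]
```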
Now we will create a new PDF file and add the first and last pages of an existing PDF to it.
Let's see the code:
# import the module
import PyPDF2
# open the existing file in binary mode
Pdf_Open = open("/home/abhinav/Downloads/Abhinav_Gangrade.pdf", "rb")
# read the file and store the content
Pdf_Data = PyPDF2.PdfFileReader(Pdf_Open)
# get the number of pages
print(Pdf_Data.numPages)
# create a pdf writer
pdf_writer = PyPDF2.PdfFileWriter()
# take the first page of the above pdf
first_page = Pdf_Data.getPage(0)
# take the last page of the above pdf;
# pages are zero-indexed, so its index is the total number of pages - 1
last_page = Pdf_Data.getPage(Pdf_Data.numPages - 1)
# add both pages to the new pdf
pdf_writer.addPage(first_page)
pdf_writer.addPage(last_page)
# create a blank file
New_pdf = open("/home/abhinav/Downloads/Hello.pdf", "wb")
# write the content to the blank file
pdf_writer.write(New_pdf)
# now close both files
New_pdf.close()
Pdf_Open.close()
The code above builds a new PDF from an existing one: it takes the first and last pages of the existing PDF, adds them to a writer, and writes the result to a new file. In this way, we can create a PDF from the pages of existing PDFs.
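One possible variation, sketched under the assumption that the legacy PyPDF2 names used in this article are available: the index selection is pulled into a small helper so that a one-page PDF does not contribute the same page twice, and a with statement closes both files even if an error occurs part-way. The function names first_and_last_indices and copy_first_and_last are hypothetical.

```python
def first_and_last_indices(num_pages):
    """Return zero-based indices of the first and last page, without repeats."""
    if num_pages < 1:
        raise ValueError("the PDF has no pages")
    if num_pages == 1:
        return [0]  # first and last page are the same page
    return [0, num_pages - 1]


def copy_first_and_last(source_path, target_path):
    """Write the first and last page of source_path to target_path."""
    # Imported here so the index helper above works without PyPDF2 installed.
    import PyPDF2

    with open(source_path, "rb") as source, open(target_path, "wb") as target:
        reader = PyPDF2.PdfFileReader(source)
        writer = PyPDF2.PdfFileWriter()
        for index in first_and_last_indices(reader.numPages):
            writer.addPage(reader.getPage(index))
        # The source file must still be open while the pages are written.
        writer.write(target)
```

A call such as copy_first_and_last("/home/abhinav/Downloads/Abhinav_Gangrade.pdf", "/home/abhinav/Downloads/Hello.pdf") would then reproduce the script above.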
Translated from: https://www.includehelp.com/python/automating-pdfs.aspx