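Author: fyupeng
Tech column: ☞ https://github.com/fyupeng
Project: ☞ https://github.com/fyupeng/distributed-blog-system-api
A note to readers
We meet again! Read on to see what this installment brings; it's all practical, hands-on content!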
1. Introduction
Given the binary content of a PDF file, return the binary content of the same PDF with its blank pages removed.
2. Code
Add the dependency (Apache PDFBox 2.0.x):

```xml
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.21</version>
</dependency>
```
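Code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ArchivElecFileService {

    public static void main(String[] args) throws IOException {
        // Read the whole source PDF into memory
        // (a single FileInputStream.read call is not guaranteed to fill the buffer)
        byte[] bytes = Files.readAllBytes(Paths.get("d:/hztzs.pdf"));
        bytes = new ArchivElecFileService().removeEmptyPages(bytes);
        // Write the cleaned PDF; Files.write opens and closes the file itself
        Files.write(Paths.get("d:/out.pdf"), bytes);
    }

    /**
     * Removes every page that has no extractable text and returns the resulting PDF bytes.
     */
    public byte[] removeEmptyPages(byte[] fileBytes) throws IOException {
        // Load the PDF document; try-with-resources closes it when we are done
        try (PDDocument document = PDDocument.load(fileBytes)) {
            int pageCount = document.getNumberOfPages();
            // Walk the pages from last to first so that removing a page does not
            // shift the indexes of the pages that still have to be checked
            for (int i = pageCount - 1; i >= 0; i--) {
                // Extract the text of page i only
                PDFTextStripper stripper = new PDFTextStripper();
                stripper.setStartPage(i + 1); // PDFTextStripper page numbers are 1-based
                stripper.setEndPage(i + 1);
                String text = stripper.getText(document);
                // Treat a page without text as empty and remove it (removePage is 0-based).
                // Caution: a page holding only images or scanned content also has no
                // extractable text, so it is removed as well.
                if (text.trim().isEmpty()) {
                    document.removePage(i);
                }
            }
            // Save the result into a byte array
            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            document.save(outputStream);
            return outputStream.toByteArray();
        }
    }
}
```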
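The loop in `removeEmptyPages` runs from the last page down to the first. The sketch below illustrates why, under a simplifying assumption: a `List<String>` of page texts stands in for the `PDDocument`, and the class `BlankPageFilter` and its methods are hypothetical, not part of PDFBox or the original project. Removing an element shifts every later element one position to the left, so a forward loop skips the page that follows each deleted one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a PDF: each element is the extracted text of one page.
final class BlankPageFilter {

    // Mirrors the loop in removeEmptyPages: walk from the last index down to 0.
    // Removing index i only shifts elements after i, which were already checked.
    static List<String> removeBlankBackward(List<String> pages) {
        List<String> result = new ArrayList<>(pages);
        for (int i = result.size() - 1; i >= 0; i--) {
            if (result.get(i).trim().isEmpty()) {
                result.remove(i); // int argument, so this is remove-by-index
            }
        }
        return result;
    }

    // Naive forward loop: after remove(i) the next page slides into index i,
    // but the loop advances to i + 1, so consecutive blank pages can survive.
    static List<String> removeBlankForwardNaive(List<String> pages) {
        List<String> result = new ArrayList<>(pages);
        for (int i = 0; i < result.size(); i++) {
            if (result.get(i).trim().isEmpty()) {
                result.remove(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> pages = List.of("cover", " ", "\n", "chapter 1", "");
        System.out.println(removeBlankBackward(pages));            // [cover, chapter 1]
        System.out.println(removeBlankForwardNaive(pages).size()); // 3
    }
}
```

In the forward version, the `"\n"` page moves into index 1 right after `" "` is removed, and the loop never looks at index 1 again, so one blank page is left behind.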
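3. Summary
Easy to use, efficient, and lightweight: the whole job takes one PDFBox dependency and a single method that accepts PDF bytes and returns PDF bytes.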