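This article introduces readLink, an AIGC utility that takes a URL and extracts the title and content from the PDF or HTML page it points to.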
1. Design Approach
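- Determine whether the URL points to a PDF or a web page
- Web page handling: use cheerio to parse the page
- PDF handling: use pdf-parse to parse the PDF file
- Use custom functions to extract the title and content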
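The first step above, deciding whether a response is a PDF or a web page, can be sketched as a small pure function. This is only an illustration under stated assumptions: the name `detectKind` and its return values are hypothetical and do not come from the original code. It checks the `Content-Type` header first and falls back to the `%PDF-` signature that PDF files begin with, since some servers return a generic content type.

```javascript
// Hypothetical helper (not part of the original article's code).
// Classifies a downloaded response as "pdf", "html", or "unknown".
function detectKind(contentType, body) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const type = String(contentType || "").split(";")[0].trim().toLowerCase();

  if (type === "application/pdf") return "pdf";
  if (type === "text/html" || type === "application/xhtml+xml") return "html";

  // Fallback: PDF files start with the bytes "%PDF-".
  if (Buffer.isBuffer(body) && body.subarray(0, 5).toString("latin1") === "%PDF-") {
    return "pdf";
  }
  return "unknown";
}
```

With a response fetched as an `arraybuffer`, a call such as `detectKind(response.headers["content-type"], Buffer.from(response.data))` would give the branch to take: "pdf" would go to pdf-parse and "html" to cheerio.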
2. Runnable Core Code
const express = require("express");
const axios = require("axios");
const ytSearch = require("yt-search");
const cheerio = require("cheerio");
const { PDFDocument } = require("pdf-lib");
const pdfParser = require("pdf-parse");

const app = express();
const port = 3000;

app.get("/read-link", async (req, res) => {
  const url = req.query.url;
  if (!url) {
    return res.status(400).send("URL is required");
  }
  try {
    const response = await axios.get(url, { responseType: "arraybuffer" });
    const contentType = response.headers["conte