Problem
You have an HTML document that contains relative URLs, which you need to resolve to absolute URLs.
Solution
Make sure you specify a base URI when parsing the document (this is implicit when loading from a URL), and use the abs: attribute prefix to resolve an absolute URL from an attribute:
Document doc = Jsoup.connect("http://jsoup.org").get(); // the fetched URL is used as the base URI
Element link = doc.select("a").first();

String relHref = link.attr("href"); // == "/"
String absHref = link.attr("abs:href"); // == "http://jsoup.org/"
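If the HTML is already in a string, or you want to try this without a network call, you can supply the base URI yourself with Jsoup.parse(String html, String baseUri). The following is a minimal sketch that assumes jsoup is on the classpath. The class name AbsHrefFromString, the helper hrefs and the sample markup are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsHrefFromString {
    // Parses HTML with an explicit base URI and returns {raw href, absolute href}
    // of the first <a> element, or two empty strings if there is no link.
    static String[] hrefs(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri); // baseUri: where the HTML notionally came from
        Element link = doc.select("a").first();    // null if the document has no <a>
        if (link == null) {
            return new String[] {"", ""};
        }
        return new String[] {link.attr("href"), link.attr("abs:href")};
    }

    public static void main(String[] args) {
        String html = "<p><a href='/download'>Download</a></p>";
        String[] h = hrefs(html, "http://jsoup.org/");
        System.out.println(h[0]); // /download
        System.out.println(h[1]); // http://jsoup.org/download
    }
}
```

The raw value comes back exactly as written in the markup. Only the abs: lookup uses the base URI.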
Description
In HTML elements, URLs are often written relative to the document's location: <a href="/download">...</a>. When you use the Node.attr(String key) method to get an href attribute, it is returned exactly as specified in the source HTML.
If you want an absolute URL instead, the attribute key prefix abs: causes the attribute value to be resolved against the document's base URI (its original location): attr("abs:href").
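"Resolved against the base URI" means applying the standard relative-reference rules (RFC 3986). The following standalone illustration of those rules uses the JDK's java.net.URI instead of jsoup, so it is not jsoup's own resolver. The base URL and the class name ResolveAgainstBase are hypothetical.

```java
import java.net.URI;

public class ResolveAgainstBase {
    // Resolves a relative reference against a base URI using java.net.URI
    // (RFC 3986-style resolution, with dot segments normalized).
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/guide/page.html";
        System.out.println(resolve(base, "/download"));          // http://example.com/download
        System.out.println(resolve(base, "other.html"));         // http://example.com/docs/guide/other.html
        System.out.println(resolve(base, "../img/logo.png"));    // http://example.com/docs/img/logo.png
        System.out.println(resolve(base, "https://jsoup.org/")); // https://jsoup.org/
    }
}
```

A leading slash replaces the whole path. A bare name replaces only the last path segment. An already-absolute URL is returned unchanged.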
For abs: to work, it is important to specify the base URI when parsing the document. Without one, a relative URL has nothing to be resolved against, and abs: returns an empty string.
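To see why the base URI matters, the following sketch parses the same markup with and without one. It assumes jsoup is on the classpath. The class name MissingBaseUri and the helper firstAbsHref are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MissingBaseUri {
    // Returns the abs:href of the first <a> element, or "" if there is no link.
    static String firstAbsHref(Document doc) {
        Element link = doc.select("a").first(); // null if the document has no <a>
        return link == null ? "" : link.attr("abs:href");
    }

    public static void main(String[] args) {
        String html = "<a href='/download'>Download</a>";
        Document noBase = Jsoup.parse(html);                         // no base URI supplied
        Document withBase = Jsoup.parse(html, "http://jsoup.org/"); // base URI supplied
        System.out.println("[" + firstAbsHref(noBase) + "]"); // [] : nothing to resolve against
        System.out.println(firstAbsHref(withBase));           // http://jsoup.org/download
    }
}
```

An href that is already absolute still comes back intact without a base URI. Only relative values are lost.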
If you don't want to use the abs: prefix, the method Node.absUrl(String key) does the same thing but takes the plain attribute key: absUrl("href").
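The following sketch collects every link's absolute URL both ways and checks that they agree. It assumes jsoup is on the classpath. The class name AbsUrlVsPrefix, its two helpers and the example.com base are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlVsPrefix {
    // Absolute URLs of every <a> that has an href, via Node.absUrl with the plain key.
    static List<String> viaAbsUrl(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element a : doc.select("a[href]")) { // only links that carry an href
            urls.add(a.absUrl("href"));
        }
        return urls;
    }

    // The same links, resolved through the abs: attribute prefix instead.
    static List<String> viaPrefix(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            urls.add(a.attr("abs:href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href='/download'>Download</a> <a href='news'>News</a>";
        String base = "http://example.com/docs/index.html";
        System.out.println(viaAbsUrl(html, base)); // [http://example.com/download, http://example.com/docs/news]
        System.out.println(viaAbsUrl(html, base).equals(viaPrefix(html, base))); // true
    }
}
```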