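I'm a .NET developer, so the code is in C#, but you should be able to translate the following easily.
iText is a PDF-first library, and [X]HTML parsing is quite complex, so it isn't full-featured in that regard. Whenever you parse [X]HTML and a specific tag doesn't behave the way you expect, the basic steps to follow are:
> Verify that XML Worker supports the tag: see the Tags class.
> If the tag is supported (true in this case), look at the default implementation. Here it's handled by the HorizontalRule class. However, we can see that your use case isn't supported, so one way to go is to use that code as a blueprint (see below). You can also inherit from the specific tag class and override its End() method. Either way, all you're doing is implementing a custom tag processor.
> If the tag is not supported, you need to roll your own custom tag processor by inheriting from AbstractTagProcessor.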
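As a rough illustration of the second option (inheriting from the existing tag class and overriding End()), here is a minimal sketch. The class name ColoredHorizontalRule is hypothetical. It assumes that HorizontalRule.End() has the signature shown, and that the default processor returns a Paragraph containing the LineSeparator as a direct child. Check both against the XML Worker source for your version before relying on it.

```csharp
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf.draw;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.css;
using iTextSharp.tool.xml.html;

// Hypothetical subclass: lets the default <hr> processor build its elements,
// then recolors any LineSeparator it finds.
public class ColoredHorizontalRule : HorizontalRule
{
    public override IList<IElement> End(
        IWorkerContext ctx, Tag tag, IList<IElement> currentContent)
    {
        // Default behavior first.
        IList<IElement> elements = base.End(ctx, tag, currentContent);

        string color;
        if (!tag.CSS.TryGetValue(CSS.Property.BORDER_BOTTOM_COLOR, out color)
            || color == null)
        {
            return elements; // nothing to customize
        }

        foreach (IElement element in elements)
        {
            var paragraph = element as Paragraph;
            if (paragraph == null) continue;

            // Assumption: the separator is a direct child of the paragraph.
            foreach (IElement child in paragraph)
            {
                var line = child as LineSeparator;
                if (line != null)
                {
                    line.LineColor = WebColors.GetRGBColor(color);
                }
            }
        }
        return elements;
    }
}
```

This variant only adjusts the color. Swapping in a DottedLineSeparator would mean replacing the element rather than tweaking it, which is why copying the default implementation as a blueprint can be simpler.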
Anyway, here's a simple example to get you started. First, the custom tag processor:
// Namespaces assume iTextSharp 5.x with the XML Worker add-on.
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf.draw;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.css;
using iTextSharp.tool.xml.exceptions;
using iTextSharp.tool.xml.html;
using iTextSharp.tool.xml.pipeline.html;

public class CustomHorizontalRule : AbstractTagProcessor
{
    public override IList<IElement> Start(IWorkerContext ctx, Tag tag)
    {
        IList<IElement> result;
        LineSeparator lineSeparator;
        var cssUtil = CssUtils.GetInstance();
        try
        {
            IList<IElement> list = new List<IElement>();
            HtmlPipelineContext htmlPipelineContext = this.GetHtmlPipelineContext(ctx);
            Paragraph paragraph = new Paragraph();
            IDictionary<string, string> css = tag.CSS;

            // Base size used to resolve relative units such as em
            float baseValue = 12f;
            if (css.ContainsKey("font-size"))
            {
                baseValue = cssUtil.ParsePxInCmMmPcToPt(css["font-size"]);
            }

            // Vertical margins default to 0.5em when not specified
            string marginTop;
            css.TryGetValue("margin-top", out marginTop);
            if (marginTop == null) marginTop = "0.5em";
            string marginBottom;
            css.TryGetValue("margin-bottom", out marginBottom);
            if (marginBottom == null) marginBottom = "0.5em";

            // Custom part: pick a dotted separator when the inline style asks for it
            string border;
            css.TryGetValue(CSS.Property.BORDER_BOTTOM_STYLE, out border);
            lineSeparator = border == "dotted"
                ? new DottedLineSeparator()
                : new LineSeparator();
            var element = (LineSeparator)this.GetCssAppliers().Apply(
                lineSeparator, tag, htmlPipelineContext
            );

            // Custom part: apply the inline border color, if any
            string color;
            css.TryGetValue(CSS.Property.BORDER_BOTTOM_COLOR, out color);
            if (color != null)
            {
                // WebColors deprecated, but docs don't state replacement
                element.LineColor = WebColors.GetRGBColor(color);
            }

            paragraph.SpacingBefore += cssUtil.ParseValueToPt(marginTop, baseValue);
            paragraph.SpacingAfter += cssUtil.ParseValueToPt(marginBottom, baseValue);
            paragraph.Leading = 0f;
            paragraph.Add(element);
            list.Add(paragraph);
            result = list;
        }
        catch (NoCustomContextException cause)
        {
            throw new RuntimeWorkerException(
                LocaleMessages.GetInstance().GetMessage("customcontext.404"),
                cause
            );
        }
        return result;
    }
}
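Most of the code is taken directly from the existing source. The exceptions are the checks for CSS.Property.BORDER_BOTTOM_STYLE and CSS.Property.BORDER_BOTTOM_COLOR, which set the border style and color when they are inlined in the <hr> style attribute.
Then add the custom tag processor above to the XML Worker TagProcessorFactory: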
using (var stream = new FileStream(OUTPUT_FILE, FileMode.Create))
{
    using (var document = new Document())
    {
        var writer = PdfWriter.GetInstance(document, stream);
        document.Open();

        // Start from the default tag processors, then replace the <hr> handler
        var tagProcessorFactory = Tags.GetHtmlTagProcessorFactory();
        // custom tag processor above
        tagProcessorFactory.AddProcessor(
            new CustomHorizontalRule(),
            new string[] { HTML.Tag.HR }
        );

        var htmlPipelineContext = new HtmlPipelineContext(null);
        htmlPipelineContext.SetTagFactory(tagProcessorFactory);

        // Pipeline order: CSS resolver -> HTML -> PDF writer
        var pdfWriterPipeline = new PdfWriterPipeline(document, writer);
        var htmlPipeline = new HtmlPipeline(htmlPipelineContext, pdfWriterPipeline);
        var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);
        var cssResolverPipeline = new CssResolverPipeline(
            cssResolver, htmlPipeline
        );

        var worker = new XMLWorker(cssResolverPipeline, true);
        var parser = new XMLParser(worker);

        // Example input: an <hr> styled with the shorthand inline border property
        var xHtml = "<hr style='border:1px dotted red' />";
        using (var stringReader = new StringReader(xHtml))
        {
            parser.Parse(stringReader);
        }
    }
}
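One thing to note: even though we're using the shorthand border inline style, iText's CSS parser appears to set all of the individual styles internally. In other words, you can check any of the four per-side properties (top, right, bottom, left). I just happened to use CSS.Property.BORDER_BOTTOM_STYLE and CSS.Property.BORDER_BOTTOM_COLOR.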
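If you'd rather not depend on one particular side, a small helper along these lines could check all four. This is only a sketch: BorderCss and FirstBorderValue are hypothetical names, and it assumes the CSS dictionary is keyed by the standard property names (for example "border-bottom-style"), which is what the CSS.Property constants appear to hold.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of iText.
public static class BorderCss
{
    private static readonly string[] Sides = { "top", "right", "bottom", "left" };

    // Returns the first non-empty per-side value for the given suffix
    // ("style", "color" or "width"), or null when none is present.
    public static string FirstBorderValue(
        IDictionary<string, string> css, string suffix)
    {
        foreach (var side in Sides)
        {
            string value;
            var key = "border-" + side + "-" + suffix;
            if (css.TryGetValue(key, out value) && !string.IsNullOrEmpty(value))
            {
                return value;
            }
        }
        return null;
    }
}
```

Inside the tag processor it would be called as BorderCss.FirstBorderValue(tag.CSS, "style") in place of the direct TryGetValue lookup.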
The resulting PDF: