I think POI is using too much memory! What can I do?This one comes up quite a lot, but often the reason isn't what you might initially think. So, the first thing to check is - what's the source of the problem? Your file? Your code? Your environment? Or Apache POI?
Next, use the example program ToCSV to try reading the a file in with HSSF or XSSF. Related is XLSX2CSV, which uses SAX parsing for .xlsx
我们看到了POI已经提出了针对这个问题的回答,而且正好在XLS2CSV中找到了很好的解决读取大容量的方法.
在XLS2CSV中,即使是读取300M的2007excel文件也没有什么问题,占用的内存也是可以接受的。在这段代码中的注释中写出了下面一段话:
*Data sheets are read using a SAX parser to keep thememory footprint relatively small, so this should be
able to read enormous workbooks. The styles table and
the shared-string table must be kept in memory. The
standard POI styles table class is used, but a custom
(read-only) class is used for the shared string table
because the standard POI SharedStringsTable grows
very quickly with the number of unique strings.
大概的意思就是说通用的处理方法会快速消耗掉内存,所以采用了 SAX parser方法来处理大容量的文件.public void processSheet(
StylesTable styles,
ReadOnlySharedStringsTable strings,
InputStream sheetInputStream)
throws IOException, ParserConfigurationException, SAXException {
InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
ContentHandler handler = new MyXSSFSheetHandler(styles, strings, this.minColumns, this.output);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
}上面的一段代码是比较核心的代码,通过SAXParser解析xml文件,尽量不占用内存资源,从而达到读取大文件的目的。