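This article looks at the reclaimable garbage records (Garbage/Deleted Records) in a FIL_PAGE_INDEX page. When a record is deleted (delete from …), InnoDB usually does not remove it from physical storage right away; it sets a delete flag on the record instead, and we call such rows garbage records. The delete flag lives in the record's Extra Bytes. As with normal records (User Records), InnoDB keeps a singly linked list inside the page that chains the reclaimable garbage records together. The user record list starts at the Infimum Record and ends at the Supremum Record, whereas the garbage record list starts at PAGE_FREE (First Garbage Record Offset) in the "INDEX Header" of the "FIL_PAGE_INDEX" page and ends at the last garbage record (next_offset=0, i.e. it points to itself). Both lists are traversed in the same way.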
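To make the Extra Bytes concrete, here is a minimal sketch (not the parser used later in this article) of reading the two fields that matter for garbage records. It assumes the COMPACT/DYNAMIC row format, where the record header is the 5 bytes immediately before the record origin, the delete flag is bit 0x20 of the first header byte, and the next-record pointer is a signed 16-bit big-endian offset relative to the current record. The class name `RecHeaderSketch`, the origin 3000, and the synthetic page are made up for illustration.

```java
// Hypothetical helper, for illustration only: decodes the delete flag and the
// relative next-record offset from a COMPACT-format record header.
final class RecHeaderSketch {
    // Bit of the first header byte that marks a record as deleted.
    static final int DELETED_FLAG = 0x20;

    // The header occupies the 5 bytes before the record origin.
    static boolean isDeleted(byte[] page, int origin) {
        return (page[origin - 5] & DELETED_FLAG) != 0;
    }

    // Signed 16-bit big-endian value stored in the last 2 header bytes.
    static int nextRelativeOffset(byte[] page, int origin) {
        int hi = page[origin - 2];        // keeps the sign of the high byte
        int lo = page[origin - 1] & 0xFF; // low byte is unsigned
        return (hi << 8) | lo;
    }

    // Absolute position of the next record origin, or 0 at the end of the list.
    static int nextOrigin(byte[] page, int origin) {
        int rel = nextRelativeOffset(page, origin);
        return rel == 0 ? 0 : Math.floorMod(origin + rel, page.length);
    }

    public static void main(String[] args) {
        byte[] page = new byte[16384];   // synthetic, all-zero 16KB page
        int origin = 3000;               // made-up record origin
        page[origin - 5] = DELETED_FLAG; // mark the record as deleted
        int rel = -1432;                 // relative offset seen in the output below
        page[origin - 2] = (byte) (rel >> 8);
        page[origin - 1] = (byte) rel;
        System.out.println("deleted=" + isDeleted(page, origin)
                + " rel=" + nextRelativeOffset(page, origin)
                + " next=" + nextOrigin(page, origin));
        // prints: deleted=true rel=-1432 next=1568
    }
}
```

A negative relative offset simply means that the next record on the list sits at a lower address in the page than the current one.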
sakila is MySQL's official sample database. To get it, open https://dev.mysql.com/doc/index-other.html and find the "Example Databases" section, which has the download and documentation links.
Now for the code, which parses the Garbage Records:
/*
Run the delete in the database:
root@localhost [sakila]> set foreign_key_checks = 0; --> sakila has foreign key constraints; to keep the delete simple, disable foreign key checks for this session first;
Query OK, 0 rows affected (0.00 sec)

root@localhost [sakila]> delete from sakila.film where film_id in (646, 656, 666); --> delete 3 records;
Query OK, 3 rows affected (0.01 sec)
*/
import java.util.List;

public class IdxPage5 {
    public static void main(String[] args) throws Exception {
        String fileName = "D:\\Data\\mysql\\8.0.18\\data\\sakila\\film.ibd";
        try (IbdFileParser parser = new IbdFileParser(fileName)) {
            // From the earlier article "InnoDB File Physical Structure Analysis 4",
            // we know that page(14) holds the rows with film_id between 565 and 668.
            Page page = parser.getPage(14);
            // System.out.println(ParserHelper.hexDump(page.getPageRaw()));
            ClusteredKeyLeafPage leafPage = new ClusteredKeyLeafPage(page.getPageRaw(), page.getPageSize());
            // Walk the garbage list (starting at PAGE_FREE), not the user record list.
            List<ClusteredKeyLeafRecord> garbageRecords = leafPage.getGarbageRecords(IdxPage3.getFilmTableMeta());
            StringBuilder buff = new StringBuilder();
            for (ClusteredKeyLeafRecord record : garbageRecords) {
                List<RecordField> fields = record.getRecordFields();
                // Print the Extra Bytes fields of interest: delete flag and next-record offset.
                buff.append("\n ==> Extra: deleted = ").append(record.getDeletedFlag())
                    .append("; next offset = ").append(record.getNextRecordOffset())
                    .append(" <==\n");
                for (RecordField field : fields) {
                    buff.append(String.format("%20s", field.getName()))
                        .append(": ").append(field.getContent()).append("\n");
                }
            }
            System.out.println(buff);
        }
    }
}
/*
Program output:

 ==> Extra: deleted = true; next offset = -1432 <==
             film_id: 666
           DB_TRX_ID: 00000000843c
         DB_ROLL_PTR: 01000001321526
               title: PAYCHECK WAIT
         description: A Awe-Inspiring Reflection of a Boy And a Man who must Discover a Moose in The Sahara Desert
        release_year: 2006
         language_id: 1
original_language_id: null
     rental_duration: 4
         rental_rate: 4.99
              length: 145
    replacement_cost: 27.99
              rating: PG-13
    special_features: [Commentaries, Deleted Scenes]
         last_update: 2006-02-15T05:03:42

 ==> Extra: deleted = true; next offset = -1489 <==
             film_id: 656
           DB_TRX_ID: 00000000843c
         DB_ROLL_PTR: 010000013214c7
               title: PAPI NECKLACE
         description: A Fanciful Display of a Car And a Monkey who must Escape a Squirrel in Ancient Japan
        release_year: 2006
         language_id: 1
original_language_id: null
     rental_duration: 3
         rental_rate: 0.99
              length: 128
    replacement_cost: 9.99
              rating: PG
    special_features: [Trailers, Deleted Scenes, Behind the Scenes]
         last_update: 2006-02-15T05:03:42

 ==> Extra: deleted = true; next offset = 0 <==
             film_id: 646
           DB_TRX_ID: 00000000843c
         DB_ROLL_PTR: 01000001321466
               title: OUTBREAK DIVINE
         description: A Unbelieveable Yarn of a Database Administrator And a Woman who must Succumb a A Shark in A U-Boat
        release_year: 2006
         language_id: 1
original_language_id: null
     rental_duration: 6
         rental_rate: 0.99
              length: 169
    replacement_cost: 12.99
              rating: NC-17
    special_features: [Trailers, Deleted Scenes, Behind the Scenes]
         last_update: 2006-02-15T05:03:42
*/
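The output matches what we expected: although the delete statement removed the records, their data is still kept in the page. Only the delete flag in the Extra Bytes has been set to true, and the last deleted record points not to the "Supremum Record" but to itself (next offset = 0).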
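The example fetches the deleted data with ClusteredKeyLeafPage's getGarbageRecords() method. Its only difference from getUserRecords(), which fetches ordinary user records, is the position where the traversal starts: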
public class ClusteredKeyLeafPage {
    public List<ClusteredKeyLeafRecord> getUserRecords(TableMeta tableMeta) {
        // User records: start from the record that the Infimum Record points to.
        int pos = getSystemRecords().getInfimumNextRecordPos();
        return iterateRecordInPage(tableMeta, pos);
    }

    public List<ClusteredKeyLeafRecord> getGarbageRecords(TableMeta tableMeta) {
        // Garbage records: start from PAGE_FREE in the INDEX Header.
        int pos = getIndexHeader().getFirstGarbageOffset();
        return iterateRecordInPage(tableMeta, pos);
    }

    private List<ClusteredKeyLeafRecord> iterateRecordInPage(TableMeta tableMeta, int firstRecordPos) {
        // ...
        // Condition that ends the traversal
        if (recCount > maxRecs || nextOffset == 0 || nextRecord == SUPREMUM_EXTRA_END_POS) {
            break;
        }
        // ...
    }
}
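Since the body of iterateRecordInPage is elided above, the following self-contained sketch shows one way such a shared traversal could look. It is not the author's implementation. It assumes the COMPACT/DYNAMIC page layout (PAGE_FREE stored as a 2-byte value at byte 44 of the page, Infimum origin at 99, Supremum origin at 112) and collects record origins only, without decoding fields. `PageListWalkSketch`, `demoPage()` and the record positions in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one traversal routine shared by both in-page lists.
final class PageListWalkSketch {
    static final int PAGE_FREE_POS = 44; // assumed: 38-byte FIL header + 6
    static final int INFIMUM = 99;       // assumed COMPACT Infimum origin
    static final int SUPREMUM = 112;     // assumed COMPACT Supremum origin

    static int readU16(byte[] p, int pos) {
        return ((p[pos] & 0xFF) << 8) | (p[pos + 1] & 0xFF);
    }

    // Relative next-record offset: signed 16-bit value in the 2 bytes before the origin.
    static int nextRel(byte[] p, int origin) {
        return (short) readU16(p, origin - 2);
    }

    // Follows next pointers from 'first' until the list ends, Supremum is
    // reached, or maxRecs records were seen (guards against corrupt, cyclic pages).
    static List<Integer> walk(byte[] page, int first, int maxRecs) {
        List<Integer> origins = new ArrayList<>();
        int pos = first;
        while (pos != 0 && pos != SUPREMUM && origins.size() < maxRecs) {
            origins.add(pos);
            int rel = nextRel(page, pos);
            if (rel == 0) {
                break; // the record points to itself: end of the garbage list
            }
            pos = Math.floorMod(pos + rel, page.length);
        }
        return origins;
    }

    static List<Integer> userRecords(byte[] page, int maxRecs) {
        int rel = nextRel(page, INFIMUM);
        if (rel == 0) {
            return new ArrayList<>();
        }
        return walk(page, Math.floorMod(INFIMUM + rel, page.length), maxRecs);
    }

    static List<Integer> garbageRecords(byte[] page, int maxRecs) {
        return walk(page, readU16(page, PAGE_FREE_POS), maxRecs);
    }

    static void setNext(byte[] page, int origin, int rel) {
        page[origin - 2] = (byte) (rel >> 8);
        page[origin - 1] = (byte) rel;
    }

    // Synthetic page: user list 200 -> 300 -> Supremum, garbage list 500 -> 400.
    static byte[] demoPage() {
        byte[] page = new byte[16384];
        setNext(page, INFIMUM, 200 - INFIMUM);
        setNext(page, 200, 300 - 200);
        setNext(page, 300, SUPREMUM - 300);
        page[PAGE_FREE_POS] = (byte) (500 >> 8);
        page[PAGE_FREE_POS + 1] = (byte) 500;
        setNext(page, 500, 400 - 500);
        setNext(page, 400, 0);
        return page;
    }

    public static void main(String[] args) {
        byte[] page = demoPage();
        System.out.println("user=" + userRecords(page, 100));       // prints: user=[200, 300]
        System.out.println("garbage=" + garbageRecords(page, 100)); // prints: garbage=[500, 400]
    }
}
```

The two public entry points differ only in how they compute the first position, which mirrors the getUserRecords()/getGarbageRecords() pair above.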
More test cases:
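- If the statement executed is "truncate sakila.film", this method does not work, because the storage of the whole ibd file is "reset" (the file shrinks and page(14) no longer exists). A full-table delete ("delete from sakila.film") usually does not do this, but there is a special case: when the table is very small (the index tree has no non-leaf level, so all rows sit in a single leaf page), a full-table delete was observed to behave like truncate, with the record data of the whole page wiped (set to 00). This can be confirmed with hexdump.
- When all records in a page are deleted, the record data is not wiped, but some of the deleted records were observed to remain on the User Records list. A plausible explanation is that those delete-marked records had not yet been processed by InnoDB's purge at the time of the test.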
Finally, here is a way to inspect the contents of a page with the hexdump command:
# Suppose we want to look at page(4), and the page size is 16KB (16384 bytes);
# then page(4) starts at 4 * 16384 = 65536 and the length to read is 16384;
# so the hexdump command is:
[think@TP-T470 sakila]$ hexdump --skip 65536 --length 16384 -C -v film2.ibd
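The same offset arithmetic can be done from Java when hexdump is not available. The sketch below uses only the standard library; `PageDumpSketch` and its methods are hypothetical names, the page size is passed in by the caller rather than read from the file, and, unlike `hexdump --skip`, the printed offsets are relative to the start of the page rather than to the start of the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch: read one page of a tablespace file by page number.
final class PageDumpSketch {
    // Byte offset of a page inside the file: pageNo * pageSize.
    static long pageOffset(int pageNo, int pageSize) {
        return (long) pageNo * pageSize;
    }

    static byte[] readPage(String file, int pageNo, int pageSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long offset = pageOffset(pageNo, pageSize);
            if (offset + pageSize > raf.length()) {
                // e.g. after a truncate the file may no longer contain this page
                throw new IOException("page " + pageNo + " is beyond the end of the file");
            }
            byte[] buf = new byte[pageSize];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        }
    }

    // One line of up to 16 bytes: page-relative offset followed by hex values.
    static String hexLine(byte[] page, int start) {
        StringBuilder sb = new StringBuilder(String.format("%08x ", start));
        for (int i = start; i < Math.min(start + 16, page.length); i++) {
            sb.append(String.format(" %02x", page[i] & 0xFF));
        }
        return sb.toString();
    }

    // Usage: java PageDumpSketch <file.ibd> <pageNo>   (assumes 16KB pages)
    public static void main(String[] args) throws IOException {
        byte[] page = readPage(args[0], Integer.parseInt(args[1]), 16384);
        for (int i = 0; i < page.length; i += 16) {
            System.out.println(hexLine(page, i));
        }
    }
}
```

Checking the requested offset against the file length gives a clear error when a page no longer exists, as in the truncate case described above.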