High Level Design
Blocks
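ext4 allocates storage space in units of blocks. A block is a group of sectors between 1 KiB and 64 KiB in size, and the number of sectors must be an integral power of 2. Blocks are in turn grouped into larger units called block groups. By default a file system can contain 2^32 blocks; if the 64bit feature is enabled, it can have 2^64 blocks.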
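As a rough illustration of these limits, the sketch below checks whether a block size is acceptable and computes the upper bound on volume size implied by the block count. The function names are hypothetical and not part of any ext4 tool; the bound ignores every other constraint a real file system has.

```python
def is_valid_block_size(size_bytes: int) -> bool:
    """True if size_bytes is a power of two between 1 KiB and 64 KiB."""
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    return is_power_of_two and 1024 <= size_bytes <= 64 * 1024


def max_volume_bytes(block_size: int, has_64bit: bool = False) -> int:
    """Upper bound on volume size implied by the block-count limit alone."""
    if not is_valid_block_size(block_size):
        raise ValueError(f"invalid block size: {block_size}")
    # 2^32 blocks by default, 2^64 blocks with the 64bit feature.
    max_blocks = 2**64 if has_64bit else 2**32
    return max_blocks * block_size
```

For example, with 4 KiB blocks and without the 64bit feature, `max_volume_bytes(4096)` gives 2^44 bytes, that is 16 TiB.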
Layout
The layout of a standard block group:
Group 0 Padding | ext4 Super Block | Group Descriptors | Reserved GDT Blocks | Data Block Bitmap | inode Bitmap | inode Table | Data Blocks |
---|---|---|---|---|---|---|---|
1024 bytes | 1 block | many blocks | many blocks | 1 block | 1 block | many blocks | many more blocks |
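For the special case of block group 0, the first 1024 bytes are unused, which leaves room for an x86 boot sector. The superblock starts at byte offset 1024. If the block size is 1024, block 0 is marked as in use and the superblock lives in block 1.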
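The relationship between block size and the superblock's position can be sketched as follows. The helper name is hypothetical; it only restates the arithmetic described above.

```python
from typing import Tuple

SUPERBLOCK_OFFSET = 1024  # bytes from the start of the volume


def superblock_location(block_size: int) -> Tuple[int, int]:
    """Return (block number, byte offset inside that block) of the primary superblock."""
    return divmod(SUPERBLOCK_OFFSET, block_size)
```

With 1 KiB blocks this yields `(1, 0)`, so the superblock fills block 1; with 4 KiB blocks it yields `(0, 1024)`, so the superblock sits inside block 0 after the padding.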
Flexible Block Groups
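Starting with ext4, flex_bg ties several block groups together as one logical block group. The bitmap space and inode table space in the first block group of the flex_bg are expanded to hold the bitmaps and inode tables of all other block groups in the flex_bg.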
Meta Block Groups
Special inodes
ext4 reserves some inodes for special features, as follows:
inode number | Purpose |
---|---|
0 | Doesn't exist; there is no inode 0. |
1 | List of defective blocks. |
2 | Root directory |
3 | User quota |
4 | Group quota |
5 | Boot loader |
6 | Undelete directory |
7 | Reserved group descriptors inode ("resize inode") |
8 | Journal inode |
9 | The "exclude" inode, used for snapshots |
10 | Replica inode |
11 | lost+found |
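A tool that walks inodes might encode the table above as an enumeration. This is a minimal sketch; the member names are illustrative and are not the identifiers used by the kernel or e2fsprogs.

```python
from enum import IntEnum


class SpecialInode(IntEnum):
    """Reserved inode numbers from the table above (names are illustrative)."""
    BAD_BLOCKS = 1
    ROOT_DIR = 2
    USER_QUOTA = 3
    GROUP_QUOTA = 4
    BOOT_LOADER = 5
    UNDELETE_DIR = 6
    RESIZE = 7
    JOURNAL = 8
    EXCLUDE = 9
    REPLICA = 10
    LOST_FOUND = 11


def describe_inode(ino: int) -> str:
    """Map an inode number to a short label."""
    if ino == 0:
        return "invalid (there is no inode 0)"
    try:
        return SpecialInode(ino).name
    except ValueError:
        # Anything outside the table is treated as an ordinary inode here.
        return "regular"
```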
Checksums
Inline Data
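- If a file is smaller than 60 bytes, its data is stored inline in inode.i_block.
- If the rest of the file fits inside the extended attribute space, it may be found as the extended attribute "system.data" within the inode body ("ibody EA"). This, of course, limits the number of extended attributes that can be attached to an inode.
- If the data size grows beyond i_block + ibody EA, a regular block is allocated and the contents are moved to that block.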
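The three cases above can be sketched as a small decision function. The function and the `ibody_ea_capacity` parameter (the bytes available for the "system.data" value in a given inode) are hypothetical. The sketch also assumes the i_block threshold means "fits in the 60 bytes of i_block", so a file of exactly 60 bytes is treated as fitting.

```python
I_BLOCK_SIZE = 60  # bytes available in inode.i_block


def inline_data_placement(file_size: int, ibody_ea_capacity: int) -> str:
    """Say where a file's data would live under the rules above.

    ibody_ea_capacity is an assumed input: how many bytes the inode body
    can spare for the "system.data" extended attribute value.
    """
    if file_size <= I_BLOCK_SIZE:
        return "i_block"
    if file_size <= I_BLOCK_SIZE + ibody_ea_capacity:
        # The first 60 bytes stay in i_block, the rest goes into the EA.
        return "i_block + ibody EA"
    # Too large to stay inline: contents move to a regular block.
    return "regular block"
```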
Inline Directories
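The first four bytes of i_block hold the inode number of the parent directory. They are followed by a 56-byte space for an array of directory entries; see struct ext4_dir_entry. If the inode body has a "system.data" attribute, the EA value is also an array of struct ext4_dir_entry. Note that for inline directories, i_block and the EA space are treated as separate dirent blocks; a directory entry cannot span the two. Inline directory entries are not checksummed, because the inode checksum should protect all inline data contents.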
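As a minimal sketch of the i_block layout just described, the helper below splits a 60-byte i_block into the parent inode number and the 56-byte entry area. The function name is hypothetical, the value is read as little-endian like other ext4 on-disk fields, and the directory entries themselves are left undecoded.

```python
import struct
from typing import Tuple


def split_inline_dir(i_block: bytes) -> Tuple[int, bytes]:
    """Split an inline directory's i_block into (parent inode, entry area)."""
    if len(i_block) != 60:
        raise ValueError("i_block must be exactly 60 bytes")
    # First four bytes: inode number of the parent directory (little-endian).
    (parent_ino,) = struct.unpack_from("<I", i_block, 0)
    # Remaining 56 bytes: space for the directory entry array.
    return parent_ino, i_block[4:]
```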
Global Structures
Super Block
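Before the field-by-field table that follows, here is a hedged sketch of how a handful of those fields could be decoded from a raw superblock buffer. `parse_superblock` is a hypothetical helper, not an e2fsprogs API; it reads only a few fields at the offsets listed in the table and derives the block size from s_log_block_size.

```python
import struct

EXT4_MAGIC = 0xEF53


def parse_superblock(buf: bytes) -> dict:
    """Decode a few superblock fields from a buffer that starts at the superblock."""
    if len(buf) < 0x88:
        raise ValueError("buffer too short for the fields read here")
    (magic,) = struct.unpack_from("<H", buf, 0x38)
    if magic != EXT4_MAGIC:
        raise ValueError(f"bad magic: {magic:#x}")
    (log_block_size,) = struct.unpack_from("<I", buf, 0x18)
    return {
        "s_inodes_count": struct.unpack_from("<I", buf, 0x0)[0],
        "s_blocks_count_lo": struct.unpack_from("<I", buf, 0x4)[0],
        "s_blocks_per_group": struct.unpack_from("<I", buf, 0x20)[0],
        "s_inodes_per_group": struct.unpack_from("<I", buf, 0x28)[0],
        # Block size is 2 ^ (10 + s_log_block_size).
        "block_size": 2 ** (10 + log_block_size),
        # The label is a fixed 16-byte field, NUL-padded when shorter.
        "s_volume_name": buf[0x78:0x88].split(b"\0", 1)[0].decode("utf-8", "replace"),
    }
```

To try it on a file system image, one would read 1024 bytes starting at byte offset 1024 of the image and pass them to the function.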
Offset | Size | Name | Description |
---|---|---|---|
0x0 | __le32 | s_inodes_count | Total inode count. |
0x4 | __le32 | s_blocks_count_lo | Total block count. |
0x8 | __le32 | s_r_blocks_count_lo | This number of blocks can only be allocated by the super-user. |
0xC | __le32 | s_free_blocks_count_lo | Free block count. |
0x10 | __le32 | s_free_inodes_count | Free inode count. |
0x14 | __le32 | s_first_data_block | First data block. This must be at least 1 for 1k-block filesystems and is typically 0 for all other block sizes. |
0x18 | __le32 | s_log_block_size | Block size is 2 ^ (10 + s_log_block_size). |
0x1C | __le32 | s_log_cluster_size | Cluster size is 2 ^ (10 + s_log_cluster_size) blocks if bigalloc is enabled. Otherwise s_log_cluster_size must equal s_log_block_size. |
0x20 | __le32 | s_blocks_per_group | Blocks per group. |
0x24 | __le32 | s_clusters_per_group | Clusters per group, if bigalloc is enabled. Otherwise s_clusters_per_group must equal s_blocks_per_group. |
0x28 | __le32 | s_inodes_per_group | Inodes per group. |
0x2C | __le32 | s_mtime | Mount time, in seconds since the epoch. |
0x30 | __le32 | s_wtime | Write time, in seconds since the epoch. |
0x34 | __le16 | s_mnt_count | Number of mounts since the last fsck. |
0x36 | __le16 | s_max_mnt_count | Number of mounts beyond which a fsck is needed. |
0x38 | __le16 | s_magic | Magic signature, 0xEF53. |
0x3A | __le16 | s_state | File system state. See super_state for more info. |
0x3C | __le16 | s_errors | Behaviour when detecting errors. See super_errors for more info. |
0x3E | __le16 | s_minor_rev_level | Minor revision level. |
0x40 | __le32 | s_lastcheck | Time of last check, in seconds since the epoch. |
0x44 | __le32 | s_checkinterval | Maximum time between checks, in seconds. |
0x48 | __le32 | s_creator_os | Creator OS. See the table super_creator for more info. |
0x4C | __le32 | s_rev_level | Revision level. See the table super_revision for more info. |
0x50 | __le16 | s_def_resuid | Default uid for reserved blocks. |
0x52 | __le16 | s_def_resgid | Default gid for reserved blocks. |
These fields are for EXT4_DYNAMIC_REV superblocks only. | |||
0x54 | __le32 | s_first_ino | First non-reserved inode. |
0x58 | __le16 | s_inode_size | Size of the inode structure, in bytes. |
0x5A | __le16 | s_block_group_nr | Block group # of this superblock. |
0x5C | __le32 | s_feature_compat | Compatible feature set flags. Kernel can still read/write this fs even if it doesn’t understand a flag; fsck should not do that. See the super_compat table for more info. |
0x60 | __le32 | s_feature_incompat | Incompatible feature set. If the kernel or fsck doesn’t understand one of these bits, it should stop. See the super_incompat table for more info. |
0x64 | __le32 | s_feature_ro_compat | Readonly-compatible feature set. If the kernel doesn’t understand one of these bits, it can still mount read-only. See the super_rocompat table for more info. |
0x68 | __u8 | s_uuid[16] | 128-bit UUID for volume. |
0x78 | char | s_volume_name[16] | Volume label. |
0x88 | char | s_last_mounted[64] | Directory where filesystem was last mounted. |
0xC8 | __le32 | s_algorithm_usage_bitmap | For compression (Not used in e2fsprogs/Linux) |
Performance hints. Directory preallocation should only happen if the EXT4_FEATURE_COMPAT_DIR_PREALLOC flag is on. | |||
0xCC | __u8 | s_prealloc_blocks | #. of blocks to try to preallocate for … files? (Not used in e2fsprogs/Linux) |
0xCD | __u8 | s_prealloc_dir_blocks | #. of blocks to preallocate for directories. (Not used in e2fsprogs/Linux) |
0xCE | __le16 | s_reserved_gdt_blocks | Number of reserved GDT entries for future filesystem expansion. |
Journalling support is valid only if EXT4_FEATURE_COMPAT_HAS_JOURNAL is set. | |||
0xD0 | __u8 | s_journal_uuid[16] | UUID of journal superblock |
0xE0 | __le32 | s_journal_inum | inode number of journal file. |
0xE4 | __le32 | s_journal_dev | Device number of journal file, if the external journal feature flag is set. |
0xE8 | __le32 | s_last_orphan | Start of list of orphaned inodes to delete. |
0xEC | __le32 | s_hash_seed[4] | HTREE hash seed. |
0xFC | __u8 | s_def_hash_version | Default hash algorithm to use for directory hashes. See super_def_hash for more info. |
0xFD | __u8 | s_jnl_backup_type | If this value is 0 or EXT3_JNL_BACKUP_BLOCKS (1), then the s_jnl_blocks field contains a duplicate copy of the inode’s i_block[] array and i_size. |
0xFE | __le16 | s_desc_size | Size of group descriptors, in bytes, if the 64bit incompat feature flag is set. |
0x100 | __le32 | s_default_mount_opts | Default mount options. See the super_mountopts table for more info. |
0x104 | __le32 | s_first_meta_bg | First metablock block group, if the meta_bg feature is enabled. |
0x108 | __le32 | s_mkfs_time | When the filesystem was created, in seconds since the epoch. |
0x10C | __le32 | s_jnl_blocks[17] |