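This article continues from the previous one in the series.
It draws on the following references:
趣谈Linux操作系统 (A Fun Introduction to the Linux Operating System), Liu Chao, Geek Time
QEMU/KVM源码解析与应用 (QEMU/KVM Source Code Analysis and Application), Li Qiang, China Machine Press
QEMU内存管理模型 (The QEMU Memory Management Model)
浅谈QEMU Memory Region 与 Address Space (A Brief Look at QEMU Memory Region and Address Space)
With thanks to the authors.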
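QEMU Memory Initialization
1. Basic Structures
The previous installment covered MemoryRegion, the second of QEMU's memory-related data structures, in depth. This installment covers the third, RAMBlock.
(3) RAMBlock
The most basic part of memory management is physical memory itself, followed by the MMIO space and the I/O port address space. The RAMBlock structure represents a memory module (a stick of RAM): one RAMBlock corresponds to one memory module in the virtual machine. The type name RAMBlock is introduced by a typedef in include/qemu/typedefs.h: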
typedef struct RAMBlock RAMBlock;
The definition of struct RAMBlock itself is in include/exec/ramblock.h:
struct RAMBlock {
    struct rcu_head rcu;
    struct MemoryRegion *mr;
    uint8_t *host;
    uint8_t *colo_cache; /* For colo, VM's ram cache */
    ram_addr_t offset;
    ram_addr_t used_length;
    ram_addr_t max_length;
    void (*resized)(const char*, uint64_t length, void *host);
    uint32_t flags;
    /* Protected by iothread lock. */
    char idstr[256];
    /* RCU-enabled, writes protected by the ramlist lock */
    QLIST_ENTRY(RAMBlock) next;
    QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
    int fd;
    uint64_t fd_offset;
    size_t page_size;
    /* dirty bitmap used during migration */
    unsigned long *bmap;
    /* bitmap of already received pages in postcopy */
    unsigned long *receivedmap;

    /*
     * bitmap to track already cleared dirty bitmap. When the bit is
     * set, it means the corresponding memory chunk needs a log-clear.
     * Set this up to non-NULL to enable the capability to postpone
     * and split clearing of dirty bitmap on the remote node (e.g.,
     * KVM). The bitmap will be set only when doing global sync.
     *
     * It is only used during src side of ram migration, and it is
     * protected by the global ram_state.bitmap_mutex.
     *
     * NOTE: this bitmap is different comparing to the other bitmaps
     * in that one bit can represent multiple guest pages (which is
     * decided by the `clear_bmap_shift' variable below). On
     * destination side, this should always be NULL, and the variable
     * `clear_bmap_shift' is meaningless.
     */
    unsigned long *clear_bmap;
    uint8_t clear_bmap_shift;

    /*
     * RAM block length that corresponds to the used_length on the migration
     * source (after RAM block sizes were synchronized). Especially, after
     * starting to run the guest, used_length and postcopy_length can differ.
     * Used to register/unregister uffd handlers and as the size of the received
     * bitmap. Receiving any page beyond this length will bail out, as it
     * could not have been valid on the source.
     */
    ram_addr_t postcopy_length;
};
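As noted above, a RAMBlock represents a memory module in the virtual machine, one RAMBlock per module. It records basic information about that module: the MemoryRegion it belongs to (struct MemoryRegion *mr); the file descriptor of the backing file, if there is one (int fd); the system page size (size_t page_size); the length currently in use (ram_addr_t used_length); and the module's offset within the virtual machine's RAM as a whole (ram_addr_t offset).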
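The comment on clear_bmap in the structure notes that one of its bits can cover several guest pages, as decided by clear_bmap_shift, whereas bmap uses one bit per page. The following is a minimal sketch of that index arithmetic, not QEMU's own code: the toy_ names and the 4 KiB page size are assumptions made only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_BITS 12 /* assumption: 4 KiB guest pages */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit index in a one-bit-per-page map for a byte offset inside a block. */
static size_t toy_page_index(uint64_t offset_in_block)
{
    return (size_t)(offset_in_block >> TOY_PAGE_BITS);
}

/* Bit index in a coarser map where one bit covers (1 << shift) pages. */
static size_t toy_chunk_index(uint64_t offset_in_block, uint8_t shift)
{
    return toy_page_index(offset_in_block) >> shift;
}

/* Set bit nr in a bitmap stored as an array of unsigned long. */
static void toy_set_bit(unsigned long *map, size_t nr)
{
    assert(map != NULL);
    map[nr / TOY_BITS_PER_LONG] |= 1UL << (nr % TOY_BITS_PER_LONG);
}

/* Return 1 if bit nr is set, 0 otherwise. */
static int toy_test_bit(const unsigned long *map, size_t nr)
{
    assert(map != NULL);
    return (int)((map[nr / TOY_BITS_PER_LONG] >> (nr % TOY_BITS_PER_LONG)) & 1UL);
}
```

With a shift of 6, for example, 64 consecutive pages share a single bit in the coarse map, so the coarse map is 64 times smaller than the per-page one.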
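Every MemoryRegion contains a RAMBlock pointer, but not every MemoryRegion is backed by a RAMBlock. For physical memory, the concrete MemoryRegion points to a concrete RAMBlock. As a reminder, here is the RAMBlock-related member of struct MemoryRegion, in include/exec/memory.h: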
/** MemoryRegion:
 *
 * A struct representing a memory region.
 */
struct MemoryRegion {
    Object parent_obj;

    /* private: */

    /* The following fields should fit in a cache line */
    bool romd_mode;
    bool ram;
    bool subpage;
    bool readonly; /* For RAM regions */
    bool nonvolatile;
    bool rom_device;
    bool flush_coalesced_mmio;
    uint8_t dirty_log_mask;
    bool is_iommu;
    RAMBlock *ram_block;
    ……
};
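- RAMBlock *ram_block
ram_block points to the physical memory actually allocated for the region.
Back to struct RAMBlock. Its two central fields are offset (ram_addr_t offset) and host (uint8_t *host). offset locates the block on the guest side (loosely the GPA side, although a ram_addr_t is an address within guest RAM rather than a guest physical address proper), and host is the host virtual address (HVA) at which the block is mapped. bmap (unsigned long *bmap) and receivedmap (unsigned long *receivedmap) are used by live migration: per the comments in the structure, bmap is the dirty-page bitmap and receivedmap records the pages already received during postcopy.
In addition, all RAMBlocks are linked into a list through the next field (QLIST_ENTRY(RAMBlock) next); the head of the list is the global ram_list.blocks.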
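To show how offset, used_length, host and the block list fit together, here is a minimal sketch of translating an address in the ram_addr_t space into a host pointer by walking a list of blocks. It is a simplified illustration under stated assumptions: ToyRAMBlock and the toy_ functions are hypothetical stand-ins, and a plain next pointer replaces the RCU-protected QLIST used by QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t toy_ram_addr_t;

/* Hypothetical, heavily reduced stand-in for struct RAMBlock. */
struct ToyRAMBlock {
    uint8_t *host;              /* HVA of the start of the block */
    toy_ram_addr_t offset;      /* start of the block in ram_addr_t space */
    toy_ram_addr_t used_length; /* bytes currently in use */
    struct ToyRAMBlock *next;   /* plain pointer instead of QLIST_ENTRY */
};

/* Walk the list and return the block that contains addr, or NULL. */
static struct ToyRAMBlock *toy_find_block(struct ToyRAMBlock *head,
                                          toy_ram_addr_t addr)
{
    for (struct ToyRAMBlock *b = head; b != NULL; b = b->next) {
        if (addr >= b->offset && addr - b->offset < b->used_length) {
            return b;
        }
    }
    return NULL;
}

/* Translate addr into a host pointer, or NULL if no block covers it. */
static uint8_t *toy_host_ptr(struct ToyRAMBlock *head, toy_ram_addr_t addr)
{
    struct ToyRAMBlock *b = toy_find_block(head, addr);

    if (b == NULL) {
        return NULL;
    }
    assert(b->host != NULL);
    return b->host + (addr - b->offset);
}
```

The translation is just host + (addr - offset) once the right block is found; the range check uses used_length so that addresses past the in-use part of a block are rejected.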
While we are here, a note on the ram_addr_t type used in struct RAMBlock. It is defined in include/exec/cpu-common.h:
/* address in the RAM (different from a physical address) */
#if defined(CONFIG_XEN_BACKEND)
typedef uint64_t ram_addr_t;
# define RAM_ADDR_MAX UINT64_MAX
# define RAM_ADDR_FMT "%" PRIx64
#else
typedef uintptr_t ram_addr_t;
# define RAM_ADDR_MAX UINTPTR_MAX
# define RAM_ADDR_FMT "%" PRIxPTR
#endif
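The RAM_ADDR_FMT macro exists so that a ram_addr_t can be printed portably whichever branch is selected. A small sketch of how such a format macro is used follows; it copies the non-Xen branch locally so that it compiles on its own, and toy_format_ram_addr is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local copy of the non-Xen branch of the definition. */
typedef uintptr_t ram_addr_t;
#define RAM_ADDR_FMT "%" PRIxPTR

/*
 * Format addr as lowercase hex into buf.
 * Returns whatever snprintf() returns (the length that was needed).
 */
static int toy_format_ram_addr(char *buf, size_t len, ram_addr_t addr)
{
    assert(buf != NULL && len > 0);
    /* Adjacent string literals are concatenated into one format string. */
    return snprintf(buf, len, "offset=" RAM_ADDR_FMT, addr);
}
```

Because the macro expands to a string literal, it can be spliced into a larger format string, which keeps call sites correct if the underlying type changes between builds.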
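uint64_t and uintptr_t are plain integer typedefs; in standard C they are provided by <stdint.h>. The definitions quoted here are selected by __riscv_xlen, so they apply to a RISC-V build: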
#if __riscv_xlen == 64
typedef long s64;
typedef unsigned long u64;
typedef long int64_t;
typedef unsigned long uint64_t;
#define PRILX "016lx"
#elif __riscv_xlen == 32
typedef long long s64;
typedef unsigned long long u64;
typedef long long int64_t;
typedef unsigned long long uint64_t;
#define PRILX "08lx"
#else
#error "Unexpected __riscv_xlen"
#endif
typedef unsigned long uintptr_t;
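That completes the three basic memory-related structures in QEMU: struct AddressSpace, struct MemoryRegion, and struct RAMBlock.
To review the three structures:
- AddressSpace (struct AddressSpace)
An AddressSpace represents all the physical addresses that a virtual machine or a virtual CPU can access. struct AddressSpace is defined in include/exec/memory.h: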
/**
 * struct AddressSpace: describes a mapping of addresses to #MemoryRegion objects
 */
struct AddressSpace {
    /* private: */
    struct rcu_head rcu;
    char *name;
    MemoryRegion *root;

    /* Accessed via RCU. */
    struct FlatView *current_map;

    int ioeventfd_nb;
    struct MemoryRegionIoeventfd *ioeventfds;
    QTAILQ_HEAD(, MemoryListener) listeners;
    QTAILQ_ENTRY(AddressSpace) address_spaces_link;
};
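- MemoryRegion (struct MemoryRegion)
A MemoryRegion represents a region of memory in the virtual machine. It is the core structure of memory emulation: all of guest memory is modeled as an acyclic graph of MemoryRegions. The leaves of the graph are the physical memory actually allocated to the virtual machine, or MMIO; the intermediate nodes represent memory buses, memory controllers, or aliases of other MemoryRegions.
struct MemoryRegion is also defined in include/exec/memory.h: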
/** MemoryRegion:
 *
 * A struct representing a memory region.
 */
struct MemoryRegion {
    Object parent_obj;

    /* private: */

    /* The following fields should fit in a cache line */
    bool romd_mode;
    bool ram;
    bool subpage;
    bool readonly; /* For RAM regions */
    bool nonvolatile;
    bool rom_device;
    bool flush_coalesced_mmio;
    uint8_t dirty_log_mask;
    bool is_iommu;
    RAMBlock *ram_block;
    Object *owner;
    /* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath */
    DeviceState *dev;

    const MemoryRegionOps *ops;
    void *opaque;
    MemoryRegion *container;
    int mapped_via_alias; /* Mapped via an alias, container might be NULL */
    Int128 size;
    hwaddr addr;
    void (*destructor)(MemoryRegion *mr);
    uint64_t align;
    bool terminates;
    bool ram_device;
    bool enabled;
    bool warning_printed; /* For reservations */
    uint8_t vga_logging_count;
    MemoryRegion *alias;
    hwaddr alias_offset;
    int32_t priority;
    QTAILQ_HEAD(, MemoryRegion) subregions;
    QTAILQ_ENTRY(MemoryRegion) subregions_link;
    QTAILQ_HEAD(, CoalescedMemoryRange) coalesced;
    const char *name;
    unsigned ioeventfd_nb;
    MemoryRegionIoeventfd *ioeventfds;
    RamDiscardManager *rdm; /* Only for RAM */

    /* For devices designed to perform re-entrant IO into their own IO MRs */
    bool disable_reentrancy_guard;
};
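The graph described above, with containers in the middle, RAM or MMIO at the leaves, and aliases that forward to other regions, can be illustrated with a small toy model. This is a sketch under stated assumptions rather than QEMU's dispatch logic: ToyRegion and toy_resolve are hypothetical, subregions are assumed not to overlap, and the priority field is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SUBREGIONS 4

/* Hypothetical, reduced stand-in for struct MemoryRegion. */
struct ToyRegion {
    const char *name;
    uint64_t addr;             /* offset inside the container */
    uint64_t size;
    struct ToyRegion *alias;   /* non-NULL: forwards to another region */
    uint64_t alias_offset;     /* offset into the alias target */
    struct ToyRegion *subregions[TOY_MAX_SUBREGIONS];
    size_t nr_subregions;      /* 0 for leaves (RAM or MMIO) */
};

/*
 * Resolve addr (relative to mr) to the leaf region that covers it.
 * On success, stores the offset inside the leaf in *leaf_off.
 * Returns NULL if addr falls into a hole.
 */
static const struct ToyRegion *toy_resolve(const struct ToyRegion *mr,
                                           uint64_t addr, uint64_t *leaf_off)
{
    assert(mr != NULL && leaf_off != NULL);
    if (addr >= mr->size) {
        return NULL;
    }
    if (mr->alias != NULL) {
        /* An alias is a window onto part of another region. */
        return toy_resolve(mr->alias, addr + mr->alias_offset, leaf_off);
    }
    if (mr->nr_subregions == 0) {
        *leaf_off = addr;
        return mr;
    }
    for (size_t i = 0; i < mr->nr_subregions; i++) {
        const struct ToyRegion *sub = mr->subregions[i];

        if (addr >= sub->addr && addr - sub->addr < sub->size) {
            return toy_resolve(sub, addr - sub->addr, leaf_off);
        }
    }
    return NULL;
}
```

Each step down the tree subtracts the child's addr, so the offset that reaches a leaf is relative to that leaf; an alias instead adds alias_offset and continues in its target.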
- RAMBlock (struct RAMBlock)
A RAMBlock represents a memory module in the virtual machine, one RAMBlock per module, and records basic information about that module. struct RAMBlock is defined in include/exec/ramblock.h:
struct RAMBlock {
    struct rcu_head rcu;
    struct MemoryRegion *mr;
    uint8_t *host;
    uint8_t *colo_cache; /* For colo, VM's ram cache */
    ram_addr_t offset;
    ram_addr_t used_length;
    ram_addr_t max_length;
    void (*resized)(const char*, uint64_t length, void *host);
    uint32_t flags;
    /* Protected by iothread lock. */
    char idstr[256];
    /* RCU-enabled, writes protected by the ramlist lock */
    QLIST_ENTRY(RAMBlock) next;
    QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
    int fd;
    uint64_t fd_offset;
    size_t page_size;
    /* dirty bitmap used during migration */
    unsigned long *bmap;
    /* bitmap of already received pages in postcopy */
    unsigned long *receivedmap;

    /*
     * bitmap to track already cleared dirty bitmap. When the bit is
     * set, it means the corresponding memory chunk needs a log-clear.
     * Set this up to non-NULL to enable the capability to postpone
     * and split clearing of dirty bitmap on the remote node (e.g.,
     * KVM). The bitmap will be set only when doing global sync.
     *
     * It is only used during src side of ram migration, and it is
     * protected by the global ram_state.bitmap_mutex.
     *
     * NOTE: this bitmap is different comparing to the other bitmaps
     * in that one bit can represent multiple guest pages (which is
     * decided by the `clear_bmap_shift' variable below). On
     * destination side, this should always be NULL, and the variable
     * `clear_bmap_shift' is meaningless.
     */
    unsigned long *clear_bmap;
    uint8_t clear_bmap_shift;

    /*
     * RAM block length that corresponds to the used_length on the migration
     * source (after RAM block sizes were synchronized). Especially, after
     * starting to run the guest, used_length and postcopy_length can differ.
     * Used to register/unregister uffd handlers and as the size of the received
     * bitmap. Receiving any page beyond this length will bail out, as it
     * could not have been valid on the source.
     */
    ram_addr_t postcopy_length;
};
With the groundwork in place, the next installment goes into more detail.