Memory management and page mapping in Linux

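1 The simplest place to start is what memory looks like to a running process. Everything above 3 GB is reserved for the kernel's own code and its dynamically allocated memory. Because this is the process's view, all the addresses below are virtual addresses.
The layout is as follows: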

2 Next comes the Linux physical memory layout.
The passage below explains why Linux cannot use all of the RAM:

Why isn't the kernel loaded starting with the first available megabyte of RAM? Well, the PC architecture has
several peculiarities that must be taken into account.

For example:
1 Page frame 0 is used by BIOS to store the system hardware configuration detected during the
Power-On Self-Test(POST); the BIOS of many laptops, moreover, writes data on this page frame
even after the system is initialized.

2 Physical addresses ranging from 0x000a0000 to 0x000fffff are usually reserved to BIOS
routines and to map the internal memory of ISA graphics cards. This area is the well-known hole from
640 KB to 1 MB in all IBM-compatible PCs: the physical addresses exist but they are reserved, and
the corresponding page frames cannot be used by the operating system.

3 Additional page frames within the first megabyte may be reserved by specific computer models. For
example, the IBM ThinkPad maps the 0xa0 page frame into the 0x9f one.

In short: the first 1 MB of memory holds the BIOS and other hardware data, so the kernel image starts at physical address 1 MB.

Ignoring virtual addresses for now, that is, before page tables come into play, the kernel occupies physical memory as shown in the figure below:


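The meaning of each segment in the figure is self-explanatory. The kernel occupies the physical range [_text, _end].
There is no need to dwell on the exact values: they vary with the architecture and with how the kernel was built.
For example, in the System.map produced by my build of linux-2.6.38.8, _text and _end are at: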

  • 0xc0400000 --- _text
  • 0xc0cc5000 --- _end

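Note first that these are virtual addresses, as seen once the kernel is using page tables.
_text sits at virtual address 3 GB + 4 MB.
_end sits at virtual address 3 GB + a little over 12 MB (0xcc5000 bytes).

So the kernel I built is somewhat larger than the one in the figure above.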
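The offsets quoted above can be checked with a little arithmetic. This is only a sketch, assuming the usual 32-bit x86 split in which kernel virtual addresses start at 3 GB (0xC0000000); the helper names offset_from_3g and image_size are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Assumption: 3 GB / 1 GB split, so kernel linear addresses start here. */
#define PAGE_OFFSET 0xC0000000u

/* Offset of a kernel symbol above the 3 GB boundary, in bytes. */
uint32_t offset_from_3g(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

/* Size in bytes of the kernel image [_text, _end). */
uint32_t image_size(uint32_t text, uint32_t end)
{
    return end - text;
}
```

With the values from my System.map, offset_from_3g(0xc0400000) is exactly 4 MB, offset_from_3g(0xc0cc5000) is 0xcc5000 bytes, and image_size() gives 0x8c5000 bytes, a little under 9 MB.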

3 Kernel Page Tables
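Once the kernel has been loaded and initialized it runs in protected mode, so before going further you need to understand protected mode and how Linux uses page tables. Linux page tables take the form shown below; every process, and the kernel itself, has its own page table:

So how do a process's page tables relate to the kernel's? See this quote: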

1 The kernel maintains a set of page tables for its own use, rooted at a so-called master kernel Page Global Directory.

2 After system initialization, this set of page tables is never directly used by any process or kernel thread;

3 rather, the highest entries of the master kernel Page Global Directory are the reference model for the corresponding entries of the Page Global Directories of every regular process in the system.

Sentence 3 is worth repeating for emphasis:

the highest entries of the master kernel Page Global Directory are 
the reference model for the corresponding entries of the Page Global Directories of
every regular process in the system.

--------------------------
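4 With page tables in the picture, there are two questions to cover.
First: what does the kernel page table map? Second: how does the kernel set up these mappings?

First: what does the kernel page table map? This is the page table the kernel uses while it runs.

Going through the regions in order: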

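  • Physical memory mapping ---- the most basic mapping.
    • Assume first that RAM is no larger than 896 MB (1 GB - 128 MB). At initialization, physical addresses 0x0 to 896 MB are mapped to linear addresses (3 GB + 0x0) to (3 GB + 896 MB). The addresses of kernel functions and variables are fixed at compile time as virtual addresses above 3 GB, so the kernel assumes it has 1 GB of virtual memory to work with; when it runs short of pages it swaps [swapping is complicated; for now take it as given, or assume there is enough memory].
    • If RAM is larger than 896 MB, the high addresses are remapped dynamically when they are accessed [discussed in a later section].
  • Fix-mapped linear addresses ---- all I know so far is that this region can be mapped to any physical memory [I am not sure what it is used for; set it aside for now].
  • Persistent kernel mappings ----- Starting from PKMAP_BASE we find the linear addresses used for the persistent kernel mapping of high-memory page frames.
  • vmalloc area ----- Linux provides a mechanism via vmalloc() where physically non-contiguous memory can be used that is contiguous in virtual memory [see the non-contiguous memory allocation section below].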
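The direct mapping in the first bullet is just a constant offset. Here is a rough sketch of it, assuming a 3 GB PAGE_OFFSET and an 896 MB low-memory limit as described above. The function names are hypothetical stand-ins for what the kernel's __va() and __pa() macros do.

```c
#include <stdint.h>

#define PAGE_OFFSET  0xC0000000u            /* assumed 3 GB split */
#define LOWMEM_LIMIT (896u * 1024 * 1024)   /* end of the directly mapped range */

/* Physical -> linear for directly mapped memory, in the spirit of __va(). */
uint32_t phys_to_virt_sketch(uint32_t paddr)
{
    return paddr + PAGE_OFFSET;
}

/* Linear -> physical, in the spirit of __pa(). */
uint32_t virt_to_phys_sketch(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET;
}

/* 1 if the physical address is covered by the direct mapping, else 0. */
int is_directly_mapped(uint32_t paddr)
{
    return paddr < LOWMEM_LIMIT;
}
```

For example, the physical address 1 MB (0x00100000) maps to linear address 0xC0100000, while anything at or above 896 MB falls outside the direct mapping and needs one of the dynamic mappings described later.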

 ------------------------------------------------------------------------------------------------
 ------------------------------------------------------------------------------------------------

Kernel Mappings of High-Memory page Frames
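I want to use this dynamic kernel mapping to understand how linear addresses correspond to physical addresses, and how the kernel keeps track of physical page frames in both low memory and high memory.

1 A few quoted passages explain why kernel mappings are needed.

1 Where the direct mapping ends (this is also shown in the figure above)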
The linear address that corresponds to the end of the directly mapped physical memory, and thus to the
beginning of the high memory, is stored in the high_memory variable, which is set to 896 MB.

2 Page frames above the 896 MB boundary are not generally mapped in the fourth gigabyte of
the kernel linear address spaces, so the kernel is unable to directly access them.

3 This implies that each page allocator function that returns the linear address of the
assigned page frame doesn't work for high-memory page frames, that is, for page frames in
the ZONE_HIGHMEM memory zone

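So low memory is already mapped and never needs remapping. High memory is not covered by the page tables, so a mapping has to be requested dynamically whenever it is needed.

2 The first method: permanent kernel mappings (the persistent kernel mappings region in the figure above)
The basic variables and data structures used for the mapping: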

  • pkmap_page_table ------- stores the address of this Page Table
  • LAST_PKMAP ------  macro yields the number of Page Table entries.
  • pkmap_count ------ an array in the kernel, declared as: int pkmap_count[LAST_PKMAP].
    The pkmap_count array includes LAST_PKMAP counters, one for each entry of the pkmap_page_table Page Table.
    Each counter records the state of its entry:
    1 The counter is 0
    The corresponding Page Table entry does not map any high-memory page frame and is usable.

    2 The counter is 1
    The corresponding Page Table entry does not map any high-memory page frame, but it cannot be
    used because the corresponding TLB entry has not been flushed since its last usage.
    In other words, this linear address was mapped earlier but nothing is using it now; it is an idle resource that can be reclaimed when entries run short.

    3 The counter is n (greater than 1)
    The corresponding Page Table entry maps a high-memory page frame, which is used by exactly n - 1
    kernel components.


  • page_address_htable ----- This table contains one page_address_map data structure for each page frame in high memory that is currently mapped.
  • page_address_map ----- defined as follows:
    struct page_address_map {
        struct page *page;
        void *virtual;
        struct list_head list;
    };
  • page_address( ) function ----- returns the linear address associated with the page frame, or NULL if the
    page frame is in high memory and is not mapped.
  • struct page ----- State information of a page frame is kept in a page descriptor of type page. All page descriptors are stored in the mem_map array. That is, every physical page frame has a corresponding struct page set up during kernel initialization, and the kernel manages page frames and stores their state through these page descriptors, just as a process's basic information lives in struct task_struct. In short, struct page is the kernel's representation of each page of physical RAM, as the following passage says:
    The kernel must keep track of the current status of each page frame. For instance, it must
    be able to distinguish the page frames that are used to contain pages that belong to
    processes from those that contain kernel code or kernel data structures. Similarly, it must
    be able to determine whether a page frame in dynamic memory is free. A page frame in
    dynamic memory is free if it does not contain any useful data. It is not free when the page
    frame contains data of a User Mode process, data of a software cache, dynamically
    allocated kernel data structures, buffered data of a device driver, code of a kernel module,
    and so on

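The kernel refers to a page like this:
Suppose the kernel is working with a struct page and wants its linear (virtual) address. It calls page_address(page), which returns the linear address provided the page is in low memory, or is in high memory and already mapped.
For low memory the linear address is computed as: __va((unsigned long)(page - mem_map) << 12).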
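To make the low-memory formula concrete, here is a small user-space sketch. It assumes 4 KB pages and a 3 GB PAGE_OFFSET; struct page_sketch and mem_map_sketch are simplified stand-ins I made up for struct page and mem_map, not the kernel's definitions.

```c
#include <stdint.h>

#define PAGE_SHIFT  12            /* 4 KB pages */
#define PAGE_OFFSET 0xC0000000u   /* assumed 3 GB split */

/* Stand-in for struct page: one descriptor per physical page frame. */
struct page_sketch { unsigned long flags; };

/* Stand-in for mem_map: the array holding all page descriptors. */
struct page_sketch mem_map_sketch[1024];

/* Linear address of a low-memory page frame, given its descriptor. */
uint32_t lowmem_page_address(const struct page_sketch *page)
{
    uint32_t pfn = (uint32_t)(page - mem_map_sketch);   /* page frame number */
    return (pfn << PAGE_SHIFT) + PAGE_OFFSET;           /* the offset __va() adds */
}
```

The descriptor's index in the array is the page frame number, shifting it by 12 gives the physical address, and adding PAGE_OFFSET gives the linear address. So descriptor 3 corresponds to linear address 0xC0003000.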

The pseudocode below shows how the remapping is done. I will not explain it here; see the book <Understanding the Linux Kernel> for details:

void *kmap(struct page *page)
{
    /* Low-memory pages are always mapped; just return their linear address. */
    if (!PageHighMem(page))
        return page_address(page);
    return kmap_high(page);
}

void *kmap_high(struct page *page)
{
    unsigned long vaddr;

    spin_lock(&kmap_lock);
    vaddr = (unsigned long) page_address(page);
    if (!vaddr)                         /* not mapped yet */
        vaddr = map_new_virtual(page);
    pkmap_count[(vaddr - PKMAP_BASE) >> PAGE_SHIFT]++;   /* one more user */
    spin_unlock(&kmap_lock);
    return (void *) vaddr;
}
/* Body of map_new_virtual(page); kmap_high() calls it with kmap_lock held. */
for (;;) {
    int count;
    DECLARE_WAITQUEUE(wait, current);

    /* Scan the counters, starting just after the last slot handed out. */
    for (count = LAST_PKMAP; count > 0; --count) {
        last_pkmap_nr = (last_pkmap_nr + 1) & (LAST_PKMAP - 1);
        if (!last_pkmap_nr) {
            /* Wrapped around: reclaim the entries whose counter is 1. */
            flush_all_zero_pkmaps();
            count = LAST_PKMAP;
        }
        if (!pkmap_count[last_pkmap_nr]) {
            unsigned long vaddr = PKMAP_BASE +
                                  (last_pkmap_nr << PAGE_SHIFT);
            set_pte(&(pkmap_page_table[last_pkmap_nr]),
                    mk_pte(page, __pgprot(0x63)));
            pkmap_count[last_pkmap_nr] = 1;
            set_page_address(page, (void *) vaddr);
            return vaddr;
        }
    }
    /* No free entry: sleep until another component releases one. */
    current->state = TASK_UNINTERRUPTIBLE;
    add_wait_queue(&pkmap_map_wait, &wait);
    spin_unlock(&kmap_lock);
    schedule();
    remove_wait_queue(&pkmap_map_wait, &wait);
    spin_lock(&kmap_lock);
    /* Someone else may have mapped the page while we slept. */
    if (page_address(page))
        return (unsigned long) page_address(page);
}
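The slot search and the three counter states (0, 1, n) are easier to follow in a toy user-space model. This is a sketch only: LAST_PKMAP is shrunk to 8, there are no page tables, TLB or locks, and where the kernel would sleep it simply returns -1. The names kmap_slot, kunmap_slot, find_free_slot and flush_all_zero_pkmaps_sketch are hypothetical.

```c
#define LAST_PKMAP 8   /* tiny table for illustration; must be a power of two */

int pkmap_count[LAST_PKMAP];   /* 0 = free, 1 = mapped but unused, n = n-1 users */
int last_pkmap_nr;             /* last slot handed out */

/* Reclaim every slot whose counter is 1 (the kernel also clears the
 * page table entry and flushes the TLB at this point). */
void flush_all_zero_pkmaps_sketch(void)
{
    for (int i = 0; i < LAST_PKMAP; i++)
        if (pkmap_count[i] == 1)
            pkmap_count[i] = 0;
}

/* Same scan order as the loop above; -1 where the kernel would sleep. */
int find_free_slot(void)
{
    for (int count = LAST_PKMAP; count > 0; --count) {
        last_pkmap_nr = (last_pkmap_nr + 1) & (LAST_PKMAP - 1);
        if (last_pkmap_nr == 0) {
            flush_all_zero_pkmaps_sketch();
            count = LAST_PKMAP;
        }
        if (pkmap_count[last_pkmap_nr] == 0) {
            pkmap_count[last_pkmap_nr] = 1;
            return last_pkmap_nr;
        }
    }
    return -1;
}

/* Like kmap_high(): take a slot and count one user (counter becomes 2). */
int kmap_slot(void)
{
    int slot = find_free_slot();
    if (slot >= 0)
        pkmap_count[slot]++;
    return slot;
}

/* Like kunmap(): drop one user; a counter of 1 means "reclaimable". */
void kunmap_slot(int slot)
{
    pkmap_count[slot]--;
}
```

Starting from an empty table, kmap_slot() hands out slots 1 to 7, then 0, and then fails because every counter is 2. After kunmap_slot(3) the counter of slot 3 drops to 1, and the next kmap_slot() wraps around, reclaims it in the flush step, and returns 3 again.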

3 Temporary Kernel Mappings

Temporary kernel mappings compare with permanent kernel mappings as follows:

1 The temporary mapping of data from highmem into kernel virtual
memory is done using the functions kmap(), kunmap(), kmap_atomic() and kunmap_atomic().

2 The function kmap() gives you a persistent mapping, i.e. one that will
still be there after you schedule and/or move to another CPU.
However, this kind of mapping is allocated under a global lock, which can be a bottleneck on SMP systems.
The kmap() function is discouraged.

3 Good SMP scalability can be obtained by using kmap_atomic(), which is lockless.
The reason kmap_atomic() can run without any locks is that the page is mapped to a fixed address
which is private to the CPU on which you run. Of course, this means that you can not schedule between setting up
such a mapping and using it, since another process running on the same CPU might also need the same address!
This is the highmem mapping type used most in the 2.6 kernel.

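Some data structures used by fix-mapped addresses:

  • enum fixed_addresses ----- mainly used to fix virtual addresses at kernel compile time. It serves many other purposes too, but temporary kernel mappings only use FIX_KMAP_BEGIN and FIX_KMAP_END. Its definition follows: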
     Here we define all the compile-time 'special' virtual
     addresses. The point is to have a constant address at
     compile time, but to set the physical address only
     in the boot process. We allocate these special addresses
     from the end of virtual memory (0xfffff000) backwards.


    enum fixed_addresses {
        ....
    #ifdef CONFIG_HIGHMEM
        FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
        FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
    #endif
        ....
    };
  • enum km_type --- mainly used for remapping when accessing high memory.
    1 Each CPU has its own set of 13 windows, represented by the enum km_type data structure. 

    2 The kernel must ensure that the same window is never used by two kernel control paths at the same time.
    Thus, each symbol in the km_type structure is dedicated to one kernel component and is named after the
    component. The last symbol, KM_TYPE_NR, does not represent a linear address by itself, but yields the
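    number of different windows usable by every CPU.

    What this means: up to 13 kernel control paths (kernel components) may be running at once, so each of them gets its own window
    (that is, one page table entry). This avoids locking and rules out conflicts. On SMP, every CPU has its own 13 windows.

    [I do not yet know why there are 13 control paths, but I expect to understand it later.]

    The code below uses fixed_addresses and km_type to swap the page in: it converts type into the linear address of the current CPU's window, then updates the page table: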
    void *kmap_atomic(struct page *page, enum km_type type)
    {
        enum fixed_addresses idx;
        unsigned long vaddr;

        current_thread_info()->preempt_count++;        /* disable preemption */
        if (!PageHighMem(page))
            return page_address(page);
        idx = type + KM_TYPE_NR * smp_processor_id();  /* this CPU's window for type */
        vaddr = fix_to_virt(FIX_KMAP_BEGIN + idx);
        set_pte(kmap_pte - idx, mk_pte(page, 0x063));
        __flush_tlb_single(vaddr);
        return (void *) vaddr;
    }
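The index arithmetic in kmap_atomic() is what gives each (CPU, window) pair its own address. A rough sketch, assuming 13 windows per CPU as stated above, 4 KB pages, and fix-mapped addresses that grow downward from 0xfffff000 as the quoted comment says. The names fix_to_virt_sketch and kmap_atomic_idx, and the fix_kmap_begin parameter standing in for FIX_KMAP_BEGIN, are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT  12
#define FIXADDR_TOP 0xfffff000u   /* assumed top of the fix-mapped area */
#define KM_TYPE_NR  13            /* windows per CPU, as stated in the text */

/* In the spirit of fix_to_virt(): index 0 is at the top, higher
 * indices sit one page lower each. */
uint32_t fix_to_virt_sketch(unsigned idx)
{
    return FIXADDR_TOP - ((uint32_t)idx << PAGE_SHIFT);
}

/* Fixmap index of window `type` on CPU `cpu`; fix_kmap_begin stands
 * in for the compile-time constant FIX_KMAP_BEGIN. */
unsigned kmap_atomic_idx(unsigned fix_kmap_begin, unsigned type, unsigned cpu)
{
    return fix_kmap_begin + type + KM_TYPE_NR * cpu;
}
```

With fix_kmap_begin set to 0, window 3 on CPU 1 gets index 16 and therefore linear address 0xfffef000, which no other CPU or window shares. That is why no lock is needed, as long as the code using the window is not preempted.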

    ------------------------------------------------------------------------------------------------------
    -----------------------------------------------------------------------------------------------------

 ps:

1 ZONE_DMA
Contains page frames of memory below 16 MB

2 ZONE_NORMAL
Contains page frames of memory at and above 16 MB and below 896 MB

3 ZONE_HIGHMEM
Contains page frames of memory at and above 896 MB
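The three zone boundaries can be written down as a small classifier. This is just an illustration of the limits listed above for 32-bit x86; the enum and the function zone_of are made-up names, not kernel symbols.

```c
#include <stdint.h>

/* Illustrative labels for the three zones listed above. */
enum zone_sketch { ZONE_DMA_S, ZONE_NORMAL_S, ZONE_HIGHMEM_S };

/* Zone that a physical address falls into. */
enum zone_sketch zone_of(uint32_t paddr)
{
    if (paddr < 16u * 1024 * 1024)
        return ZONE_DMA_S;        /* below 16 MB */
    if (paddr < 896u * 1024 * 1024)
        return ZONE_NORMAL_S;     /* 16 MB up to 896 MB */
    return ZONE_HIGHMEM_S;        /* 896 MB and above */
}
```

For instance, the kernel image at physical 1 MB is in ZONE_DMA, while a page frame at 1 GB is in ZONE_HIGHMEM and can only be reached through kmap() or kmap_atomic().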


-----------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------

Linear Addresses of Noncontiguous Memory Areas 

Linux provides a mechanism via vmalloc() where physically non-contiguous memory can be used that is contiguous in virtual memory.
This matters mainly when the system is short of contiguous memory: vmalloc() can pick up scattered pages, including pages from high memory, that are discrete in physical memory and use the page tables to map them to a contiguous range of virtual memory.

get_vm_area() ------ looks for a free range of linear addresses between VMALLOC_START and VMALLOC_END (that is, it allocates a block of virtual addresses). Its main steps are:

  1. Invokes kmalloc( ) to obtain a memory area for the new descriptor of type vm_struct.
  2. Gets the vmlist_lock lock for writing and scans the list of descriptors of type vm_struct looking for a free range of linear addresses that includes at least size + 4096 addresses (4096 is the size of the safety interval between the memory areas).
  3. If such an interval exists, the function initializes the fields of the descriptor, releases the vmlist_lock lock, and terminates by returning the initial address of the noncontiguous memory area.
  4. Otherwise, get_vm_area( ) releases the descriptor obtained previously, releases the vmlist_lock lock, and returns NULL.
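Step 2, the scan for a free range of size + 4096 addresses, can be sketched as a first-fit walk over a sorted list. This is a simplified user-space model with no locking or descriptor allocation; struct vm_area_sketch and find_free_range are hypothetical names, and the list is assumed to be sorted by address with every area inside [start, end).

```c
#include <stddef.h>
#include <stdint.h>

#define GUARD_SIZE 4096u   /* safety interval between areas */

/* Simplified stand-in for vm_struct; the list is kept sorted by addr. */
struct vm_area_sketch {
    uint32_t addr;                  /* first linear address of the area */
    uint32_t size;                  /* size in bytes, guard interval included */
    struct vm_area_sketch *next;
};

/* First-fit search in [start, end) for size + GUARD_SIZE free bytes.
 * Returns the start of the range, or 0 if nothing fits. */
uint32_t find_free_range(const struct vm_area_sketch *list,
                         uint32_t start, uint32_t end, uint32_t size)
{
    uint32_t need = size + GUARD_SIZE;
    uint32_t addr = start;
    const struct vm_area_sketch *p;

    for (p = list; p != NULL; p = p->next) {
        if (p->addr - addr >= need)   /* gap before p is big enough */
            return addr;
        addr = p->addr + p->size;     /* otherwise continue after p */
    }
    /* Check the space between the last area and the end of the region. */
    return (addr <= end && end - addr >= need) ? addr : 0;
}
```

With existing areas at 0x10000 (size 0x3000) and 0x15000 (size 0x2000) in a region from 0x10000 to 0x20000, a 0x1000-byte request fits in the gap at 0x13000, a 0x2000-byte request has to go after the second area at 0x17000, and a 0x9000-byte request fails.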

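The code below requests physical pages and maps them to virtually contiguous pages. Even if some parts are unclear on a first read, this is the general shape of it, and the details can wait.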

void *vmalloc(unsigned long size)
{
    struct vm_struct *area;
    struct page **pages;
    unsigned int array_size, i;

    size = (size + PAGE_SIZE - 1) & PAGE_MASK;
    area = get_vm_area(size, VM_ALLOC);     /* reserve a range of virtual addresses */
    if (!area)
        return NULL;
    area->nr_pages = size >> PAGE_SHIFT;
    array_size = (area->nr_pages * sizeof(struct page *));
    /* array of struct page * pointers, one per page */
    area->pages = pages = kmalloc(array_size, GFP_KERNEL);
    if (!pages) {
        remove_vm_area(area->addr);
        kfree(area);
        return NULL;
    }
    memset(area->pages, 0, array_size);
    for (i = 0; i < area->nr_pages; i++) {
        /* allocate a physical page (high memory allowed); returns a struct page * */
        area->pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
        if (!area->pages[i]) {
            area->nr_pages = i;
fail:       vfree(area->addr);
            return NULL;
        }
    }
    /* map the pages in the page tables, creating missing page-table levels as needed */
    if (map_vm_area(area, __pgprot(0x63), &pages))
        goto fail;
    return area->addr;                      /* the virtual address */
}
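The first line of vmalloc() rounds the request up to a whole number of pages, which also determines how many struct page pointers the array needs. A small sketch of that arithmetic, assuming 4 KB pages; round_up_to_page and pages_needed are names I made up.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ul << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Round size up to the next multiple of PAGE_SIZE. */
unsigned long round_up_to_page(unsigned long size)
{
    return (size + PAGE_SIZE - 1) & PAGE_MASK;
}

/* Number of page frames needed for a request of size bytes. */
unsigned long pages_needed(unsigned long size)
{
    return round_up_to_page(size) >> PAGE_SHIFT;
}
```

A 10000-byte request is rounded up to 12288 bytes, so it takes three page frames, which need not be physically adjacent.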
