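I. Process Address Space, Pt. 2
When the same variable shows the same address in two processes (for example, a parent and its child after fork) yet holds different contents, the address being printed is a virtual address. Both processes use the same virtual address, but their page tables map it to different physical addresses.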
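1. Page tables
Memory protection and page table flag bits
In an operating system, page tables also control access permissions on memory. Each page table entry carries a set of flag bits that determine which operations are allowed on the page. For example, the "rwx" flags stand for read (r), write (w), and execute (x) permission.
The memory region that holds string constants
The string constant "hello" is normally placed in the program's read-only data segment. As the name says, this segment may be read but not written. The page table entries for this region are set to read-only (r--): reads are allowed, writes (w) and execution (x) are not.
Why does char* str = "hello"; *str = 'H'; crash?
After char* str = "hello";, str points at the start of the string inside the read-only segment. *str = 'H'; then attempts a write at that address. Because the write permission bit (w) is not set in the page table entry, the access faults, the kernel sees that the write violates the page's protection, and it terminates the process with a signal, normally SIGSEGV (reported as "Segmentation fault"). This is also why modern C++ requires const char* for a string literal.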
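The permission check can be observed without relying on where a particular toolchain places string literals. The following is a minimal sketch, assuming Linux and a hypothetical helper name signal_from_readonly_write: a child process maps one page with read permission only, writes to it, and the parent reports which signal ended the child.

```cpp
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that writes to a page mapped PROT_READ only.
// Returns the signal that killed the child, 0 if it exited normally,
// or -1 if fork/waitpid failed. (Helper name is illustrative.)
int signal_from_readonly_write() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        const long page_size = sysconf(_SC_PAGESIZE);
        // One anonymous page: readable, but neither writable nor executable.
        void* page = mmap(nullptr, static_cast<std::size_t>(page_size), PROT_READ,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) _exit(2);
        volatile char* p = static_cast<volatile char*>(page);
        char first = p[0];   // reading is permitted
        (void)first;
        p[0] = 'H';          // writing violates the page permission
        _exit(0);            // only reached if the write was not stopped
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux system the returned value is expected to equal SIGSEGV, the same signal the string-literal write ends with.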
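Present bit: indicates whether the page is currently in physical memory.
By checking the present bit, the operating system can decide when a page has to be loaded and when it can be swapped out, adjusting memory use dynamically. This lets the system use limited physical memory more efficiently and avoid keeping pages resident that are not needed.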
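2. Address space: mm_struct
mm_struct is the kernel structure that describes a process's memory management information, including the layout of its virtual address space, its page tables, and its memory mappings. Each process has an mm_struct instance, and the kernel uses it to track that process's memory usage.
The size of each region is already known when the executable is built.
When a new process is created, the kernel allocates an mm_struct and initializes its members, such as the virtual address ranges and the page table. This step is what lets the process use its memory correctly. If the initialization were wrong, the process could fail when accessing its address space, and the whole system could even crash.
An executable is (1) divided into segments and (2) carries attributes for those segments.
So the operating system (process management) is also connected to the compiler, compilation principles, and the executable format.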
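[Q] When a program requests heap memory, is it essentially allocating physical memory?
[A] Not directly. When a program requests heap memory, the operating system first reserves a range in the process's virtual address space and adjusts the size of the heap. The MMU (memory management unit) translates the virtual addresses the program uses into physical addresses. Physical pages are allocated and mapped by the operating system only when the data is actually accessed. Deferring the allocation this way improves how memory is managed and utilized.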
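The deferred allocation can be made visible with a small measurement. This is a sketch under assumptions: it runs on Linux, it uses an anonymous mmap to stand in for a large heap request, and the names resident_pages, LazyAllocReport, and measure_lazy_allocation are made up for illustration. mincore reports, page by page, whether a range is currently backed by physical memory.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>
#include <vector>

// Counts how many pages of [addr, addr + len) are resident in physical memory.
// addr must be page-aligned (addresses returned by mmap are).
std::size_t resident_pages(void* addr, std::size_t len) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<unsigned char> vec((len + page - 1) / page);
    if (mincore(addr, len, vec.data()) != 0) return 0;
    std::size_t count = 0;
    for (unsigned char v : vec) count += (v & 1u);  // low bit set = page is resident
    return count;
}

struct LazyAllocReport {
    std::size_t total_pages;
    std::size_t resident_before_touch;
    std::size_t resident_after_touch;
};

// Reserves `pages` pages of virtual address space, then writes one byte per page.
LazyAllocReport measure_lazy_allocation(std::size_t pages) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t len = pages * page;
    LazyAllocReport report{pages, 0, 0};
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return report;
    report.resident_before_touch = resident_pages(mem, len);
    volatile char* p = static_cast<volatile char*>(mem);
    for (std::size_t i = 0; i < pages; ++i) p[i * page] = 1;  // first write faults the page in
    report.resident_after_touch = resident_pages(mem, len);
    munmap(mem, len);
    return report;
}
```

Calling measure_lazy_allocation(64) and comparing the two counters shows the effect: the virtual range exists right after mmap, while the resident count typically rises only once the pages have been written.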
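3. Virtual address space + page table: protecting memory
[Q] What is a wild pointer? Why does a program with a wild pointer crash?
[A] The pointer holds a virtual address that is not valid. Either the access violates the page's permissions or no valid mapping exists for that address, so the system kills the process.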
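The "no valid mapping" case can be reproduced on purpose. A minimal sketch, assuming Linux and a hypothetical helper name signal_from_dangling_access: the child maps a page, uses it, unmaps it, and then dereferences the stale pointer. Accessing unmapped memory is undefined behavior in C++, so this is strictly a demonstration of what the kernel does, not a pattern to use.

```cpp
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that dereferences a pointer whose mapping has been removed.
// Returns the signal that killed the child, 0 if it exited normally,
// or -1 if fork/waitpid failed. (Helper name is illustrative.)
int signal_from_dangling_access() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        const std::size_t len = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) _exit(2);
        volatile int* p = static_cast<volatile int*>(mem);
        *p = 42;            // valid: the page is mapped and writable
        munmap(mem, len);   // the mapping is gone, p is now a wild pointer
        int value = *p;     // no page table entry for this address any more
        _exit(value == 42 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The parent is expected to see SIGSEGV here as well: the fault is raised because the address has no mapping, not because of a permission bit.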
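4. Other points
(1) Process management and memory management are decoupled at the system level.
Process management creates, schedules, and terminates processes and maintains their state; memory management allocates, reclaims, and protects memory. The two are closely related inside the operating system, yet they can evolve and be optimized independently. Virtual memory gives every process its own address space even though the processes actually share the same physical memory. This improves security and lets process management work independently of how physical memory is actually allocated.
(2) Every process sees physical memory through the same uniform view.
[Q] Can an executable's code and data be loaded anywhere in physical memory, or must they be loaded at specific locations?
[A] Anywhere, apart from a very few special cases.
This turns "disorder" into "order": pages scattered across physical memory appear as an ordered layout in the virtual address space.
An address space is essentially a struct mm_struct.
All of the above is done automatically by the OS: once processes are managed properly, their address spaces are managed as well.
[Q] Global variables and character constants are global and stay valid for the whole run of the program. Why?
[A] They live in the process's address space for as long as the process exists, so the virtual address of a global variable stays valid and visible to all code in the process the whole time.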
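II. Process Control
1. Process creation
Copy-on-write after fork: the shared pages are marked read-only in the page tables ➡️ the child writes ➡️ the write triggers a fault ➡️ the kernel's page fault handler inspects it ➡️ the kernel decides whether copy-on-write is required ➡️ if so, it allocates memory, copies the page, updates the page table, restores read/write permission, and resumes execution (the permissions of both parent and child are changed)
[Q] Why copy on write, instead of just handing out a fresh page?
[A] Because a write is not necessarily a full overwrite of the target region. For example, count++ has to read the old value before it writes the new one, so the new page must start out as a copy of the old one.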
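The effect of copy-on-write can be checked from user space. The sketch below assumes POSIX fork and pipe; the global g_count and the names ChildReport, ForkView, and observe_copy_on_write are illustrative. The child performs count++ style read-modify-write on a global and sends back the address and value it sees, and the parent compares them with its own.

```cpp
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int g_count = 100;  // global variable in the data segment, shared until someone writes

struct ChildReport { std::uintptr_t address; int value; };

struct ForkView {
    bool ok;                    // false if fork/pipe/read failed
    bool same_virtual_address;  // does &g_count look identical in both processes?
    int parent_value;
    int child_value;
};

ForkView observe_copy_on_write() {
    ForkView view{false, false, 0, 0};
    g_count = 100;
    int fds[2];
    if (pipe(fds) != 0) return view;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return view; }
    if (pid == 0) {
        close(fds[0]);
        g_count++;  // read old value, write new one: triggers the private copy
        ChildReport msg{reinterpret_cast<std::uintptr_t>(&g_count), g_count};
        ssize_t n = write(fds[1], &msg, sizeof msg);
        _exit(n == static_cast<ssize_t>(sizeof msg) ? 0 : 1);
    }
    close(fds[1]);
    ChildReport msg{0, 0};
    ssize_t n = read(fds[0], &msg, sizeof msg);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);  // reap the child
    if (n != static_cast<ssize_t>(sizeof msg)) return view;
    view.ok = true;
    view.same_virtual_address =
        (msg.address == reinterpret_cast<std::uintptr_t>(&g_count));
    view.parent_value = g_count;   // untouched by the child's write
    view.child_value = msg.value;
    return view;
}
```

The child sees 101 because its copy began as a duplicate of the parent's page, while the parent still reads 100 at the same virtual address.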
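2. Process termination
(1) Exit codes
The return value of main is handed back to the parent process or the system.
echo $? prints the exit code of the most recently finished command on the command line.
Why does a second echo $? right after the first one print 0?
Because the first echo $? is itself a command that ran, and it succeeded, so $? now holds its exit code, 0.
Exit code: reports the outcome; 0 means success, non-zero means failure.
Why did it fail? Different non-zero numbers are used, by convention, to indicate the reason.
Provided by the C standard library (and usable from C++): perror, strerror, errno
Error numbers on Linux: 0 ~ 133 on the system used for these notes (the exact range depends on the platform)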
errnum = 0, its errstr = Success
errnum = 1, its errstr = Operation not permitted
errnum = 2, its errstr = No such file or directory
errnum = 3, its errstr = No such process
errnum = 4, its errstr = Interrupted system call
errnum = 5, its errstr = Input/output error
errnum = 6, its errstr = No such device or address
errnum = 7, its errstr = Argument list too long
errnum = 8, its errstr = Exec format error
errnum = 9, its errstr = Bad file descriptor
errnum = 10, its errstr = No child processes
errnum = 11, its errstr = Resource temporarily unavailable
errnum = 12, its errstr = Cannot allocate memory
errnum = 13, its errstr = Permission denied
errnum = 14, its errstr = Bad address
errnum = 15, its errstr = Block device required
errnum = 16, its errstr = Device or resource busy
errnum = 17, its errstr = File exists
errnum = 18, its errstr = Invalid cross-device link
errnum = 19, its errstr = No such device
errnum = 20, its errstr = Not a directory
errnum = 21, its errstr = Is a directory
errnum = 22, its errstr = Invalid argument
errnum = 23, its errstr = Too many open files in system
errnum = 24, its errstr = Too many open files
errnum = 25, its errstr = Inappropriate ioctl for device
errnum = 26, its errstr = Text file busy
errnum = 27, its errstr = File too large
errnum = 28, its errstr = No space left on device
errnum = 29, its errstr = Illegal seek
errnum = 30, its errstr = Read-only file system
errnum = 31, its errstr = Too many links
errnum = 32, its errstr = Broken pipe
errnum = 33, its errstr = Numerical argument out of domain
errnum = 34, its errstr = Numerical result out of range
errnum = 35, its errstr = Resource deadlock avoided
errnum = 36, its errstr = File name too long
errnum = 37, its errstr = No locks available
errnum = 38, its errstr = Function not implemented
errnum = 39, its errstr = Directory not empty
errnum = 40, its errstr = Too many levels of symbolic links
errnum = 41, its errstr = Unknown error 41
errnum = 42, its errstr = No message of desired type
errnum = 43, its errstr = Identifier removed
errnum = 44, its errstr = Channel number out of range
errnum = 45, its errstr = Level 2 not synchronized
errnum = 46, its errstr = Level 3 halted
errnum = 47, its errstr = Level 3 reset
errnum = 48, its errstr = Link number out of range
errnum = 49, its errstr = Protocol driver not attached
errnum = 50, its errstr = No CSI structure available
errnum = 51, its errstr = Level 2 halted
errnum = 52, its errstr = Invalid exchange
errnum = 53, its errstr = Invalid request descriptor
errnum = 54, its errstr = Exchange full
errnum = 55, its errstr = No anode
errnum = 56, its errstr = Invalid request code
errnum = 57, its errstr = Invalid slot
errnum = 58, its errstr = Unknown error 58
errnum = 59, its errstr = Bad font file format
errnum = 60, its errstr = Device not a stream
errnum = 61, its errstr = No data available
errnum = 62, its errstr = Timer expired
errnum = 63, its errstr = Out of streams resources
errnum = 64, its errstr = Machine is not on the network
errnum = 65, its errstr = Package not installed
errnum = 66, its errstr = Object is remote
errnum = 67, its errstr = Link has been severed
errnum = 68, its errstr = Advertise error
errnum = 69, its errstr = Srmount error
errnum = 70, its errstr = Communication error on send
errnum = 71, its errstr = Protocol error
errnum = 72, its errstr = Multihop attempted
errnum = 73, its errstr = RFS specific error
errnum = 74, its errstr = Bad message
errnum = 75, its errstr = Value too large for defined data type
errnum = 76, its errstr = Name not unique on network
errnum = 77, its errstr = File descriptor in bad state
errnum = 78, its errstr = Remote address changed
errnum = 79, its errstr = Can not access a needed shared library
errnum = 80, its errstr = Accessing a corrupted shared library
errnum = 81, its errstr = .lib section in a.out corrupted
errnum = 82, its errstr = Attempting to link in too many shared libraries
errnum = 83, its errstr = Cannot exec a shared library directly
errnum = 84, its errstr = Invalid or incomplete multibyte or wide character
errnum = 85, its errstr = Interrupted system call should be restarted
errnum = 86, its errstr = Streams pipe error
errnum = 87, its errstr = Too many users
errnum = 88, its errstr = Socket operation on non-socket
errnum = 89, its errstr = Destination address required
errnum = 90, its errstr = Message too long
errnum = 91, its errstr = Protocol wrong type for socket
errnum = 92, its errstr = Protocol not available
errnum = 93, its errstr = Protocol not supported
errnum = 94, its errstr = Socket type not supported
errnum = 95, its errstr = Operation not supported
errnum = 96, its errstr = Protocol family not supported
errnum = 97, its errstr = Address family not supported by protocol
errnum = 98, its errstr = Address already in use
errnum = 99, its errstr = Cannot assign requested address
errnum = 100, its errstr = Network is down
errnum = 101, its errstr = Network is unreachable
errnum = 102, its errstr = Network dropped connection on reset
errnum = 103, its errstr = Software caused connection abort
errnum = 104, its errstr = Connection reset by peer
errnum = 105, its errstr = No buffer space available
errnum = 106, its errstr = Transport endpoint is already connected
errnum = 107, its errstr = Transport endpoint is not connected
errnum = 108, its errstr = Cannot send after transport endpoint shutdown
errnum = 109, its errstr = Too many references: cannot splice
errnum = 110, its errstr = Connection timed out
errnum = 111, its errstr = Connection refused
errnum = 112, its errstr = Host is down
errnum = 113, its errstr = No route to host
errnum = 114, its errstr = Operation already in progress
errnum = 115, its errstr = Operation now in progress
errnum = 116, its errstr = Stale file handle
errnum = 117, its errstr = Structure needs cleaning
errnum = 118, its errstr = Not a XENIX named type file
errnum = 119, its errstr = No XENIX semaphores available
errnum = 120, its errstr = Is a named type file
errnum = 121, its errstr = Remote I/O error
errnum = 122, its errstr = Disk quota exceeded
errnum = 123, its errstr = No medium found
errnum = 124, its errstr = Wrong medium type
errnum = 125, its errstr = Operation canceled
errnum = 126, its errstr = Required key not available
errnum = 127, its errstr = Key has expired
errnum = 128, its errstr = Key has been revoked
errnum = 129, its errstr = Key was rejected by service
errnum = 130, its errstr = Owner died
errnum = 131, its errstr = State not recoverable
errnum = 132, its errstr = Operation not possible due to RF-kill
errnum = 133, its errstr = Memory page has hardware error
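A listing in this shape can be produced by looping over strerror. The following is a minimal sketch, not necessarily the program that generated the output above; describe_errno and print_errno_table are hypothetical helper names, and the message texts depend on the C library in use.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Builds one line in the same format as the listing:
// "errnum = N, its errstr = <message from strerror>".
std::string describe_errno(int errnum) {
    return "errnum = " + std::to_string(errnum) +
           ", its errstr = " + std::strerror(errnum);
}

// Prints the lines for every error number in [first, last].
void print_errno_table(int first, int last) {
    for (int errnum = first; errnum <= last; ++errnum) {
        std::puts(describe_errno(errnum).c_str());
    }
}
```

Calling print_errno_table(0, 133) covers the range shown above. Numbers without an assigned meaning come back as a generic "Unknown error N" text, as entries 41 and 58 in the listing show.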
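(2) Ways a process terminates
① Returning from main
② exit() is a standard library function declared in <stdlib.h>. It terminates the calling program and makes an exit status available to the parent process. Calling exit() does the following:
Cleanup: runs the registered termination handlers, which are usually registered through atexit().
Buffer flush: flushes the buffers of the standard I/O streams so that pending output reaches the file or device.
Notify the parent: the process then ends; the kernel records the exit status and sends the parent SIGCHLD, so the parent can collect the status with wait or waitpid.
③ _exit() is a system call declared in <unistd.h>. It is lower-level than exit() and terminates the process directly, without the extra cleanup. Specifically:
Immediate termination: ends the process at once, without running any of the cleanup provided by the standard library.
No buffer flush: does not flush the standard I/O buffers, so any output still sitting in them is lost.
Parent is still notified: _exit() reports the end of the process directly to the kernel, and the exit status is still passed to the parent.
[Note] This buffer is therefore not inside the operating system. It is a language-level (C/C++) buffer in user space; if it lived in the kernel, _exit() would not lose its contents.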
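The difference between the two calls can be measured by counting how many buffered bytes actually leave the process. A minimal sketch, assuming POSIX and a hypothetical helper name bytes_delivered: the child writes "hello" into a stdio stream attached to a pipe, then ends with either exit() or _exit(), and the parent counts what arrives.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns how many bytes the parent receives from a child that buffered
// "hello" in a stdio stream and then terminated via exit() or _exit().
std::size_t bytes_delivered(bool use_exit) {
    int fds[2];
    if (pipe(fds) != 0) return 0;
    std::fflush(nullptr);  // flush the parent's own streams so the child inherits no pending output
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return 0; }
    if (pid == 0) {
        close(fds[0]);
        FILE* out = fdopen(fds[1], "w");  // a pipe is not a terminal, so the stream is fully buffered
        if (out == nullptr) _exit(2);
        std::fputs("hello", out);         // stays in the user-space buffer for now
        if (use_exit) std::exit(0);       // flushes open stdio streams, then terminates
        _exit(0);                         // terminates without flushing
    }
    close(fds[1]);
    std::size_t total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        total += static_cast<std::size_t>(n);
    }
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);  // reap the child
    return total;
}
```

With exit() the five bytes reach the pipe; with _exit() the parent reads nothing, because the data never left the child's user-space buffer.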
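3. Process waiting
Waiting reaps a child that is in the zombie state.
wait waits for any one child process and returns that child's pid (greater than 0 on success, -1 on failure).
For the parameters of waitpid:
pid > 0 selects one specific child; pid == -1 means any child.
waitpid(-1, nullptr, 0) is equivalent to wait(nullptr).
wstatus: carries more than just the exit code.
It is an output parameter through which the parent learns how the child ended.
It is a 32-bit int used as a bitmap.
Only the low 16 bits matter. Bits 15 ~ 8 hold the exit code of a normal exit; on Linux, bits 6 ~ 0 hold the number of the signal that killed the child, and bit 7 is the core dump flag.
In general, a parent that creates a child should wait for it. While waiting, if the child has not exited yet, the parent blocks inside wait until the child ends.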
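The bit layout of wstatus can be decoded by hand and compared against the standard macros. The sketch below assumes the Linux layout described above; wait_status_of_child, exit_code_bits, and signal_bits are hypothetical helper names. Portable code should prefer WIFEXITED, WEXITSTATUS, WIFSIGNALED, and WTERMSIG over manual shifts.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that either exits with `exit_code` (when sig == 0) or kills
// itself with signal `sig`. Returns the raw status from waitpid, or -1 on error.
int wait_status_of_child(int exit_code, int sig) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        if (sig != 0) kill(getpid(), sig);  // abnormal termination by signal
        _exit(exit_code);                   // normal termination
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;  // blocks until the child ends
    return status;
}

// Bits 15..8: exit code of a normal exit (Linux layout).
int exit_code_bits(int status) { return (status >> 8) & 0xFF; }

// Bits 6..0: number of the terminating signal, 0 for a normal exit (Linux layout).
int signal_bits(int status) { return status & 0x7F; }
```

For a child that exits with 42, exit_code_bits agrees with WEXITSTATUS; for a child killed by SIGKILL, signal_bits agrees with WTERMSIG.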