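In the previous article we got a basic understanding of the JIT in CoreCLR. This article analyzes the JIT implementation in more detail.
The JIT implementation lives mainly under https://github.com/dotnet/coreclr/tree/master/src/jit.
The best way to analyze the JIT process for a single function in detail is to read its JitDump.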
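Viewing a JitDump requires building a Debug version of CoreCLR yourself; build instructions are available for both Windows and Linux.
After building, set the environment variable COMPlus_JitDump=Main (Main can be replaced with the name of another function), then run the program with that Debug build of CoreCLR.
An example JitDump, including the output for both Debug and Release modes, is linked from the original article.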
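The setup step above can be scripted. The following is a minimal sketch, not part of the original article: the helper names `jitdump_env` and `run_with_jitdump` are hypothetical, and the command to run (the path to your Debug host and your application) is something you have to supply yourself. It assumes the dump is written to standard output.

```python
import os
import subprocess


def jitdump_env(method_name, base_env=None):
    """Return a copy of the environment with COMPlus_JitDump set.

    base_env defaults to the current process environment; it is a
    parameter only so the helper can be exercised in isolation.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["COMPlus_JitDump"] = method_name
    return env


def run_with_jitdump(command, method_name="Main"):
    """Run `command` (a list: Debug host plus application) and return stdout.

    Assumption: the Debug build prints the JitDump to standard output.
    """
    result = subprocess.run(
        command,
        env=jitdump_env(method_name),
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout
```

A call would look like `run_with_jitdump([path_to_debug_host, path_to_app], "Main")`, where both paths are placeholders for your own build output.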
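Next, we walk through each stage of the JIT step by step, alongside the code.
The code below is based on CoreCLR 1.1.0 on x86/x64; newer versions may differ.
(Why 1.1.0? I spent half a year reading the JIT code, and 2.0 had not been released when I started.)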
Triggering the JIT
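As mentioned in the previous article, JIT compilation is triggered the first time a function is called, through a stub.
This is what the JIT stub actually looks like. The state of the Fixup Precode before the function is first called: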
Fixup Precode:
(lldb) di --frame --bytes
-> 0x7fff7c21f5a8: e8 2b 6c fe ff        callq  0x7fff7c2061d8
   0x7fff7c21f5ad: 5e                    popq   %rsi
   0x7fff7c21f5ae: 19 05 e8 23 6c fe     sbbl   %eax, -0x193dc18(%rip)
   0x7fff7c21f5b4: ff 5e a8              lcalll *-0x58(%rsi)
   0x7fff7c21f5b7: 04 e8                 addb   $-0x18, %al
   0x7fff7c21f5b9: 1b 6c fe ff           sbbl   -0x1(%rsi,%rdi,8), %ebp
   0x7fff7c21f5bd: 5e                    popq   %rsi
   0x7fff7c21f5be: 00 03                 addb   %al, (%rbx)
   0x7fff7c21f5c0: e8 13 6c fe ff        callq  0x7fff7c2061d8
   0x7fff7c21f5c5: 5e                    popq   %rsi
   0x7fff7c21f5c6: b0 02                 movb   $0x2, %al
(lldb) di --frame --bytes
-> 0x7fff7c2061d8: e9 13 3f 9d 79                 jmp     0x7ffff5bda0f0 ; PrecodeFixupThunk
   0x7fff7c2061dd: cc                             int3
   0x7fff7c2061de: cc                             int3
   0x7fff7c2061df: cc                             int3
   0x7fff7c2061e0: 49 ba 00 da d0 7b ff 7f 00 00  movabsq $0x7fff7bd0da00, %r10
   0x7fff7c2061ea: 40 e9 e0 ff ff ff              jmp     0x7fff7c2061d0
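In both listings only the first instruction is relevant. Note the bytes 5e 19 05 that follow the callq: they are not instructions but data describing the function, as explained below.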
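To make the split between instruction and data concrete, here is a small sketch that decodes the first eight bytes of the Fixup Precode from the listing above. The function name `decode_fixup_precode` and the dictionary keys are illustrative choices, not CoreCLR identifiers; the meaning of the two index bytes is taken from the PrecodeFixupThunk source comments (m_MethodDescChunkIndex at offset +1 and m_PrecodeChunkIndex at offset +2 from the return address).

```python
import struct


def decode_fixup_precode(address, raw):
    """Split the 8 bytes of one Fixup Precode into instruction and data fields."""
    opcode = raw[0]
    # rel32 is a signed little-endian displacement, relative to the
    # address of the byte that follows the 5-byte instruction.
    (rel32,) = struct.unpack_from("<i", raw, 1)
    next_instruction = address + 5
    return {
        "opcode": opcode,                    # 0xe8 = call rel32, 0xe9 = jmp rel32
        "target": next_instruction + rel32,
        "type": raw[5],                      # first data byte after the instruction
        "methoddesc_chunk_index": raw[6],
        "precode_chunk_index": raw[7],
    }


# Bytes copied from the listing: e8 2b 6c fe ff | 5e 19 05
precode = decode_fixup_precode(0x7fff7c21f5a8, bytes.fromhex("e82b6cfeff5e1905"))
print(hex(precode["target"]))  # 0x7fff7c2061d8
```

The decoded target matches the callq operand that lldb prints, and the three trailing bytes are what the disassembler misreads as popq and sbbl.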
Execution then jumps to the Fixup Precode Chunk. The code from this point on is shared by all functions:
Fixup Precode Chunk:
(lldb) di --frame --bytes
-> 0x7ffff5bda0f0 <PrecodeFixupThunk>:    58              popq   %rax                   ; rax = 0x7fff7c21f5ad
   0x7ffff5bda0f1 <PrecodeFixupThunk+1>:  4c 0f b6 50 02  movzbq 0x2(%rax), %r10        ; r10 = 0x05 (precode chunk index)
   0x7ffff5bda0f6 <PrecodeFixupThunk+6>:  4c 0f b6 58 01  movzbq 0x1(%rax), %r11        ; r11 = 0x19 (methoddesc chunk index)
   0x7ffff5bda0fb <PrecodeFixupThunk+11>: 4a 8b 44 d0 03  movq   0x3(%rax,%r10,8), %rax ; rax = 0x7fff7bdd5040 (methoddesc chunk)
   0x7ffff5bda100 <PrecodeFixupThunk+16>: 4e 8d 14 d8     leaq   (%rax,%r11,8), %r10    ; r10 = 0x7fff7bdd5108 (methoddesc)
   0x7ffff5bda104 <PrecodeFixupThunk+20>: e9 37 ff ff ff  jmp    0x7ffff5bda040         ; ThePreStub
The source for this code is in vm\amd64\unixasmhelpers.S:
LEAF_ENTRY PrecodeFixupThunk, _TEXT

        pop     rax         // Pop the return address. It points right after the call instruction in the precode.

        // Inline computation done by FixupPrecode::GetMethodDesc()
        movzx   r10,byte ptr [rax+2]    // m_PrecodeChunkIndex
        movzx   r11,byte ptr [rax+1]    // m_MethodDescChunkIndex
        mov     rax,qword ptr [rax+r10*8+3]
        lea     METHODDESC_REGISTER,[rax+r11*8]

        // Tail call to prestub
        jmp C_FUNC(ThePreStub)

LEAF_END PrecodeFixupThunk, _TEXT
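The address arithmetic in this thunk can be replayed with the values from the listings. The sketch below is only an illustration: `fixup_precode_method_desc` is a made-up name, memory is modeled with two small dictionaries, and the base pointer value 0x7fff7bdd5040 is taken from the lldb comment because the memory at that location is not shown in the dump.

```python
def fixup_precode_method_desc(read_u8, read_u64, return_address):
    """Mirror the four data instructions of PrecodeFixupThunk."""
    precode_chunk_index = read_u8(return_address + 2)        # movzx r10, byte ptr [rax+2]
    method_desc_chunk_index = read_u8(return_address + 1)    # movzx r11, byte ptr [rax+1]
    # mov rax, qword ptr [rax+r10*8+3]
    chunk_base = read_u64(return_address + precode_chunk_index * 8 + 3)
    # lea METHODDESC_REGISTER, [rax+r11*8]
    return chunk_base + method_desc_chunk_index * 8


# Toy memory holding only the locations the thunk reads.
RETURN_ADDRESS = 0x7fff7c21f5ad                 # address right after the callq
BYTES = {RETURN_ADDRESS + 1: 0x19, RETURN_ADDRESS + 2: 0x05}
QWORDS = {RETURN_ADDRESS + 5 * 8 + 3: 0x7fff7bdd5040}

method_desc = fixup_precode_method_desc(
    BYTES.__getitem__, QWORDS.__getitem__, RETURN_ADDRESS
)
print(hex(method_desc))  # 0x7fff7bdd5108
```

The constant 3 works because the return address is 5 bytes into an 8-byte precode, so adding 3 lands on the next precode. In the first listing the last byte of each successive precode counts down (05, 04, 03, 02), which is consistent with the index being the number of precodes left to skip before the stored base pointer.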
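After popq %rax, rax points to the address right after the callq. The index values stored there are then used to obtain the MethodDesc of the function being compiled. Execution then jumps to ThePreStub: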
ThePreStub:
(lldb) di --frame --bytes
-> 0x7ffff5bda040 <ThePreStub>:     55                       pushq  %rbp
   0x7ffff5bda041 <ThePreStub+1>:   48 89 e5                 movq   %rsp, %rbp
   0x7ffff5bda044 <ThePreStub+4>:   53                       pushq  %rbx
   0x7ffff5bda045 <ThePreStub+5>:   41 57                    pushq  %r15
   0x7ffff5bda047 <ThePreStub+7>:   41 56                    pushq  %r14
   0x7ffff5bda049 <ThePreStub+9>:   41 55                    pushq  %r13
   0x7ffff5bda04b <ThePreStub+11>:  41 54                    pushq  %r12
   0x7ffff5bda04d <ThePreStub+13>:  41 51                    pushq  %r9
   0x7ffff5bda04f <ThePreStub+15>:  41 50                    pushq  %r8
   0x7ffff5bda051 <ThePreStub+17>:  51                       pushq  %rcx
   0x7ffff5bda052 <ThePreStub+18>:  52                       pushq  %rdx
   0x7ffff5bda053 <ThePreStub+19>:  56                       pushq  %rsi
   0x7ffff5bda054 <ThePreStub+20>:  57                       pushq  %rdi
   0x7ffff5bda055 <ThePreStub+21>:  48 8d a4 24 78 ff ff ff  leaq   -0x88(%rsp), %rsp   ; allocate transition block
   0x7ffff5bda05d <ThePreStub+29>:  66 0f 7f 04 24           movdqa %xmm0, (%rsp)       ; fill transition block
   0x7ffff5bda062 <ThePreStub+34>:  66 0f 7f 4c 24 10        movdqa %xmm1, 0x10(%rsp)   ; fill transition block
   0x7ffff5bda068 <ThePreStub+40>:  66 0f 7f 54 24 20        movdqa %xmm2, 0x20(%rsp)   ; fill transition block
   0x7ffff5bda06e <ThePreStub+46>:  66 0f 7f 5c 24 30        movdqa %xmm3, 0x30(%rsp)   ; fill transition block
   0x7ffff5bda074 <ThePreStub+52>:  66 0f 7f 64 24 40        movdqa %xmm4, 0x40(%rsp)   ; fill transition block
   0x7ffff5bda07a <ThePreStub+58>:  66 0f 7f 6c 24 50        movdqa %xmm5, 0x50(%rsp)   ; fill transition block
   0x7ffff5bda080 <ThePreStub+64>:  66 0f 7f 74 24 60        movdqa %xmm6, 0x60(%rsp)   ; fill transition block
   0x7ffff5bda086 <ThePreStub+70>:  66 0f 7f 7c 24 70        movdqa %xmm7, 0x70(%rsp)   ; fill transition block
   0x7ffff5bda08c <ThePreStub+76>:  48 8d bc 24 88 00 00 00  leaq   0x88(%rsp), %rdi    ; arg 1 = transition block*
   0x7ffff5bda094 <ThePreStub+84>:  4c 89 d6                 movq   %r10, %rsi          ; arg 2 = methoddesc
   0x7ffff5bda097 <ThePreStub+87>:  e8 44 7e 11 00           callq  0x7ffff5cf1ee0      ; PreStubWorker at prestub.cpp:958
   0x7ffff5bda09c <ThePreStub+92>:  66 0f 6f 04 24           movdqa (%rsp), %xmm0
   0x7ffff5bda0a1 <ThePreStub+97>:  66 0f 6f 4c 24 10        movdqa 0x10(%rsp), %xmm1
   0x7ffff5bda0a7 <ThePreStub+103>: 66 0f 6f 54 24 20        movdqa 0x20(%rsp), %xmm2
   0x7ffff5bda0ad <ThePreStub+109>: 66 0f 6f 5c 24 30        movdqa 0x30(%rsp), %xmm3
   0x7ffff5bda0b3 <ThePreStub+115>: 66 0f 6f 64 24 40        movdqa 0x40(%rsp), %xmm4
   0x7ffff5bda0b9 <ThePreStub+121>: 66 0f 6f 6c 24 50        movdqa 0x50(%rsp), %xmm5
   0x7ffff5bda0bf <ThePreStub+127>: 66 0f 6f 74 24 60        movdqa 0x60(%rsp), %xmm6
   0x7ffff5bda0c5 <ThePreStub+133>: 66 0f 6f 7c 24 70        movdqa 0x70(%rsp), %xmm7
   0x7ffff5bda0cb <ThePreStub+139>: 48 8d a4 24 88 00 00 00  leaq   0x88(%rsp), %rsp
   0x7ffff5bda0d3 <ThePreStub+147>: 5f                       popq   %rdi
   0x7ffff5bda0d4 <ThePreStub+148>: 5e                       popq   %rsi
   0x7ffff5bda0d5 <ThePreStub+149>: 5a                       popq   %rdx
   0x7ffff5bda0d6 <ThePreStub+150>: 59                       popq   %rcx
   0x7ffff5bda0d7 <ThePreStub+151>: 41 58                    popq   %r8
   0x7ffff5bda0d9 <ThePreStub+153>: 41 59                    popq   %r9
   0x7ffff5bda0db <ThePreStub+155>: 41 5c                    popq   %r12
   0x7ffff5bda0dd <ThePreStub+157>: 41 5d                    popq   %r13
   0x7ffff5bda0df <ThePreStub+159>: 41 5e                    popq   %r14
   0x7ffff5bda0e1 <ThePreStub+161>: 41 5f                    popq   %r15
   0x7ffff5bda0e3 <ThePreStub+163>: 5b                       popq   %rbx
   0x7ffff5bda0e4 <ThePreStub+164>: 5d                       popq   %rbp
   0x7ffff5bda0e5 <ThePreStub+165>: 48 ff e0                 jmpq   *%rax
%rax should be patched fixup precode = 0x7fff7c21f5a8
(%rsp) should be the return address before calling "Fixup Precode"
It looks long, but what it does is simple. Its source is in vm\amd64\theprestubamd64.S:
NESTED_ENTRY ThePreStub, _TEXT, NoHandler
        PROLOG_WITH_TRANSITION_BLOCK 0, 0, 0, 0, 0

        //
        // call PreStubWorker
        //
        lea             rdi, [rsp + __PWTB_TransitionBlock]     // pTransitionBlock*
        mov             rsi, METHODDESC_REGISTER
        call            C_FUNC(PreStubWorker)

        EPILOG_WITH_TRANSITION_BLOCK_TAILCALL
        TAILJMP_RAX

NESTED_END ThePreStub, _TEXT
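It saves the registers to the stack, calls PreStubWorker, restores the registers from the stack once the call returns,
and then jumps to the value PreStubWorker returned, which is the address of the patched Fixup Precode (0x7fff7c21f5a8).
PreStubWorker is a function written in C++ (prestub.cpp). It calls the JIT's compile function and then patches the Fixup Precode.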
Patching reads the 5e byte mentioned earlier; 5e means the precode type is PRECODE_FIXUP. The function that applies the patch is FixupPrecode::SetTargetInterlocked.
After patching, the Fixup Precode looks like this:
Fixup Precode:
(lldb) di --bytes -s 0x7fff7c21f5a8
   0x7fff7c21f5a8: e9 a3 87 3a 00        jmp    0x7fff7c5c7d50
   0x7fff7c21f5ad: 5f                    popq   %rdi
   0x7fff7c21f5ae: 19 05 e8 23 6c fe     sbbl   %eax, -0x193dc18(%rip)
   0x7fff7c21f5b4: ff 5e a8              lcalll *-0x58(%rsi)
   0x7fff7c21f5b7: 04 e8                 addb   $-0x18, %al
   0x7fff7c21f5b9: 1b 6c fe ff           sbbl   -0x1(%rsi,%rdi,8), %ebp
   0x7fff7c21f5bd: 5e                    popq   %rsi
   0x7fff7c21f5be: 00 03                 addb   %al, (%rbx)
   0x7fff7c21f5c0: e8 13 6c fe ff        callq  0x7fff7c2061d8
   0x7fff7c21f5c5: 5e                    popq   %rsi
   0x7fff7c21f5c6: b0 02                 movb   $0x2, %al
The next call to the function can then jmp straight to the compiled code.
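The five instruction bytes of the patched precode can be reproduced from the two addresses in the listing. This sketch only computes the encoding; `encode_jmp_rel32` is a hypothetical helper, and it does not model how FixupPrecode::SetTargetInterlocked writes the bytes into memory.

```python
import struct


def encode_jmp_rel32(address, target):
    """Encode `jmp target` as e9 + rel32 for an instruction located at `address`."""
    # The displacement is measured from the end of the 5-byte instruction.
    rel32 = target - (address + 5)
    # struct.pack raises struct.error if the target is out of rel32 range.
    return b"\xe9" + struct.pack("<i", rel32)


patched = encode_jmp_rel32(0x7fff7c21f5a8, 0x7fff7c5c7d50)
print(patched.hex())  # e9a3873a00
```

Comparing the two listings also shows that the data byte right after the instruction changes from 5e to 5f, while the two index bytes (19 05) stay the same.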
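The JIT stub design lets the runtime compile only the functions that actually run, which greatly reduces program startup time, and the cost of every later call (a single jmp) is very small.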
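Note that the flow for calling a virtual method differs slightly from the one above. The address of a virtual method is stored in the function table,
so patching updates the function table instead of the Precode. On the next call the function table entry points to the compiled code. If you are interested, try analyzing this yourself.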
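The compile-on-first-call idea can be illustrated with a toy model. This is not how CoreCLR is implemented: `ToyRuntime` and all of its members are invented for illustration, Python callables stand in for machine code, and a single dictionary of entry slots plays the role of whichever location gets patched (the Fixup Precode for ordinary calls, the function table entry for virtual calls).

```python
class ToyRuntime:
    """Toy model: every method has an entry slot that starts at a stub and is patched once."""

    def __init__(self, sources):
        self.sources = sources      # name -> callable standing in for uncompiled code
        self.compiled = []          # names in the order they were "compiled"
        # Every slot initially points at a stub that triggers compilation.
        self.slots = {name: self._make_stub(name) for name in sources}

    def _make_stub(self, name):
        def stub(*args):
            code = self._compile(name)
            self.slots[name] = code  # patch the slot: later calls bypass the stub
            return code(*args)
        return stub

    def _compile(self, name):
        self.compiled.append(name)   # stands in for the expensive JIT step
        return self.sources[name]

    def call(self, name, *args):
        return self.slots[name](*args)


rt = ToyRuntime({"add": lambda a, b: a + b, "unused": lambda: None})
first = rt.call("add", 1, 2)    # goes through the stub and compiles "add"
second = rt.call("add", 3, 4)   # goes straight to the patched slot
print(first, second, rt.compiled)  # 3 7 ['add']
```

The method that is never called is never compiled, and the second call does no extra work beyond the slot lookup, which mirrors the startup-time and call-cost argument made above.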
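Next, let's look at what PreStubWorker does internally.
This article analyzes the overall JIT flow in CoreCLR in more detail, but because the JIT contains so much code, I cannot paste all of it as I did when analyzing the GC, and many details cannot be covered. The full text is available in the original article.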
Original article: http://www.cnblogs.com/zkweb/p/7746222.html