While debugging the system's HDMI CEC feature recently, I ran into a strange crash. Here are my notes.
Initial analysis
First, the log:
--------- beginning of crash
03-06 10:48:25.503 1133 1133 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
03-06 10:48:25.503 1133 1133 F DEBUG : Build fingerprint: ':13/TD1A.220804.031/3582:userdebug/release-keys'
03-06 10:48:25.503 1133 1133 F DEBUG : Revision: '0'
03-06 10:48:25.503 1133 1133 F DEBUG : ABI: 'arm64'
03-06 10:48:25.503 1133 1133 F DEBUG : Timestamp: 2024-03-06 10:48:25.490260378-0500
03-06 10:48:25.503 1133 1133 F DEBUG : Process uptime: 6s
03-06 10:48:25.503 1133 1133 F DEBUG : Cmdline: /vendor/bin/hw/android.hardware.tv.cec@1.0-service
03-06 10:48:25.503 1133 1133 F DEBUG : pid: 615, tid: 615, name: cec@1.0-service >>> /vendor/bin/hw/android.hardware.tv.cec@1.0-service <<<
03-06 10:48:25.503 1133 1133 F DEBUG : uid: 1000
03-06 10:48:25.503 1133 1133 F DEBUG : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
03-06 10:48:25.503 1133 1133 F DEBUG : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
03-06 10:48:25.503 1133 1133 F DEBUG : Abort message: 'stack corruption detected (-fstack-protector)'
03-06 10:48:25.503 1133 1133 F DEBUG : x0 0000000000000000 x1 0000000000000267 x2 0000000000000006 x3 0000007fe8d61420
03-06 10:48:25.503 1133 1133 F DEBUG : x4 0000000000808080 x5 0000000000808080 x6 0000000000808080 x7 8080808080808080
03-06 10:48:25.503 1133 1133 F DEBUG : x8 00000000000000f0 x9 00000077cc5b4a00 x10 0000000000000001 x11 00000077cc5f2ce4
03-06 10:48:25.503 1133 1133 F DEBUG : x12 0101010101010101 x13 000000007fffffff x14 0000000000001686 x15 0000000000000030
03-06 10:48:25.503 1133 1133 F DEBUG : x16 00000077cc657d60 x17 00000077cc634b70 x18 00000077d3ae2000 x19 0000000000000267
03-06 10:48:25.504 1133 1133 F DEBUG : x20 0000000000000267 x21 00000000ffffffff x22 0000000000000030 x23 00000077d302a000
03-06 10:48:25.504 1133 1133 F DEBUG : x24 0000000000000004 x25 00000077d302a000 x26 00000077d302a000 x27 b40000763c5972c8
03-06 10:48:25.504 1133 1133 F DEBUG : x28 0000000000000000 x29 0000007fe8d614a0
03-06 10:48:25.504 1133 1133 F DEBUG : lr 00000077cc5e4868 sp 0000007fe8d61400 pc 00000077cc5e4894 pst 0000000000001000
03-06 10:48:25.504 1133 1133 F DEBUG : backtrace:
03-06 10:48:25.504 1133 1133 F DEBUG : #00 pc 0000000000051894 /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
03-06 10:48:25.504 1133 1133 F DEBUG : #01 pc 00000000000664e8 /apex/com.android.runtime/lib64/bionic/libc.so (__stack_chk_fail+20) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
03-06 10:48:25.504 1133 1133 F DEBUG : #02 pc 0000000000006954 /vendor/lib64/hw/android.hardware.tv.cec@1.0-impl.so (android::hardware::tv::cec::V1_0::implementation::HdmiCec::getPortInfo(std::__1::function<void (android::hardware::hidl_vec<android::hardware::tv::cec::V1_0::HdmiPortInfo> const&)>)+376) (BuildId: 647cc2659b38df33f681ae1d58a04c74)
03-06 10:48:25.504 1133 1133 F DEBUG : #03 pc 0000000000016540 /vendor/lib64/android.hardware.tv.cec@1.0.so (android::hardware::tv::cec::V1_0::BnHwHdmiCec::_hidl_getPortInfo(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>)+252) (BuildId: 8ca54579dc40d30a62824bb0a91d98f4)
03-06 10:48:25.504 1133 1133 F DEBUG : #04 pc 0000000000017668 /vendor/lib64/android.hardware.tv.cec@1.0.so (android::hardware::tv::cec::V1_0::BnHwHdmiCec::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+1132) (BuildId: 8ca54579dc40d30a62824bb0a91d98f4)
03-06 10:48:25.504 1133 1133 F DEBUG : #05 pc 000000000008ee40 /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::BHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+156) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
03-06 10:48:25.504 1133 1133 F DEBUG : #06 pc 0000000000093dfc /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::IPCThreadState::executeCommand(int)+2784) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
03-06 10:48:25.504 1133 1133 F DEBUG : #07 pc 00000000000931bc /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::IPCThreadState::getAndExecuteCommand()+224) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
03-06 10:48:25.504 1133 1133 F DEBUG : #08 pc 0000000000094388 /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::IPCThreadState::joinThreadPool(bool)+172) (BuildId: 3fafcf3a9734f0d41045c2b5f828b363)
03-06 10:48:25.504 1133 1133 F DEBUG : #09 pc 00000000000010e4 /vendor/bin/hw/android.hardware.tv.cec@1.0-service (main+144) (BuildId: f6a65dc725b06643501c269fa219b717)
03-06 10:48:25.504 1133 1133 F DEBUG : #10 pc 000000000004a0f4 /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+96) (BuildId: 058e3ec96fa600fb840a6a6956c6b64e)
03-06 10:48:26.344 1267 1267 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
A first look shows the crash is in the android.hardware.tv.cec@1.0-service process. Simple enough, then: time for addr2line.
addr2line
addr2line --help
Usage: addr2line [option(s)] [addr(s)]
 Convert addresses into line number/file name pairs.
 If no addresses are specified on the command line, they will be read from stdin
 The options are:
  @<file>                Read options from <file>
  -a --addresses         Show addresses
  -b --target=<bfdname>  Set the binary file format
  -e --exe=<executable>  Set the input file name (default is a.out)
  -i --inlines           Unwind inlined functions
  -j --section=<name>    Read section-relative offsets instead of addresses
  -p --pretty-print      Make the output easier to read for humans
  -s --basenames         Strip directory names
  -f --functions         Show function names
  -C --demangle[=style]  Demangle function names
  -R --recurse-limit     Enable a limit on recursion whilst demangling.  [Default]
  -r --no-recurse-limit  Disable a limit on recursion whilst demangling
  -h --help              Display this information
  -v --version           Display the program's version

addr2line: supported targets: elf64-x86-64 elf32-i386 elf32-iamcu elf32-x86-64 pei-i386 pe-x86-64 pei-x86-64 elf64-l1om elf64-k1om elf64-little elf64-big elf32-little elf32-big pe-bigobj-x86-64 pe-i386 srec symbolsrec verilog tekhex binary ihex plugin
Report bugs to <https://sourceware.org/bugzilla/>
addr2line -ife out/target/product/aosp/symbols/vendor/lib64/hw/android.hardware.tv.cec@1.0-impl.so 0000000000006954
llvm-addr2line
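Then I remembered that the Android build toolchain has moved to LLVM, so this GNU version really can't be used any more. Switch to the llvm-addr2line tool instead.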
prebuilts/clang/host/linux-x86/llvm-binutils-stable/llvm-addr2line --help
OVERVIEW: llvm-addr2line

USAGE: llvm-addr2line [options] addresses...

OPTIONS:
  --addresses                Show address before line information
  --adjust-vma=<offset>      Add specified offset to object file addresses
  -a                         Alias for --addresses
  --basenames                Strip directory names from paths
  -C                         Alias for --demangle
  --debug-file-directory=<dir>
                             Path to directory where to look for debug files
  -demangle=false            Alias for --no-demangle
  -demangle=true             Alias for --demangle
  --demangle                 Demangle function names
  --dia                      Use the DIA library to access symbols (Windows only)
  --dwp=<file>               Path to DWP file to be use for any split CUs
  -e=<file>                  Alias for --obj
  --exe=<file>               Alias for --obj
  --exe <file>               Alias for --obj
  -e <file>                  Alias for --obj
  -f=<value>                 Alias for --functions=
  --fallback-debug-path=<dir>
                             Fallback path for debug binaries
  --functions=<value>        Print function name for a given address
  --functions                Print function name for a given address
  -f                         Alias for --functions
  --help                     Display this help
  --inlines                  Print all inlined frames for a given address
  --inlining=false           Alias for --no-inlines
  --inlining=true            Alias for --inlines
  --inlining                 Alias for --inlines
  -i                         Alias for --inlines
  --no-demangle              Don't demangle function names
  --no-inlines               Do not print inlined frames
  --no-untag-addresses       Remove memory tags from addresses before symbolization
  --obj=<file>               Path to object file to be symbolized (if not provided, object file should be specified for each input line)
  --output-style=style       Specify print style. Supported styles: LLVM, GNU, JSON
  --pretty-print             Make the output more human friendly
  --print-address            Alias for --addresses
  --print-source-context-lines=<value>
                             Print N lines of source file context
  -p                         Alias for --pretty-print
  --relative-address         Interpret addresses as addresses relative to the image base
  --relativenames            Strip the compilation directory from paths
  -s                         Alias for --basenames
  --verbose                  Print verbose line info
  --version                  Display the version
  -v                         Alias for --version

llvm-symbolizer Mach-O Specific Options:
  --default-arch=<value>     Default architecture (for multi-arch objects)
  --dsym-hint=<dir>          Path to .dSYM bundles to search for debug info for the object files

Pass @FILE as argument to read options from FILE.
So the symbolization command becomes:
prebuilts/clang/host/linux-x86/llvm-binutils-stable/llvm-addr2line -ife out/target/product/aosp/symbols/vendor/lib64/hw/android.hardware.tv.cec@1.0-impl.so 0000000000006954
_ZN7android8hardware2tv3cec4V1_014implementation7HdmiCec11getPortInfoENSt3__18functionIFvRKNS0_8hidl_vecINS3_12HdmiPortInfoEEEEEE
hardware/interfaces/tv/cec/1.0/default/HdmiCec.cpp:0
Line 0 of the source file? How can that be? Now it's my turn to crash...
Background
It seems the usual approach above cannot take us straight to the line of code that crashed.
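So let's start from what we do have: the process name, the function name in the backtrace (getPortInfo), and the matching abort message:
Abort message: 'stack corruption detected (-fstack-protector)'
and see whether they tell us anything.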
Stack corruption detected by -fstack-protector
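The compiler's -fstack-protector option inserts checks into functions with on-stack buffers to guard against buffer overruns. This option is on by default for platform code, but not for apps. When it is enabled, the compiler adds instructions to the function prologue that write a random value just past the last local on the stack, and instructions to the function epilogue that read it back and check that it has not changed. If the value has changed, it was overwritten by a buffer overrun, so the epilogue calls __stack_chk_fail to log a message and abort.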
A typical example from the AOSP documentation (the crasher test tool); note __stack_chk_fail in the backtrace, just like frame #01 in our log:
pid: 26717, tid: 26717, name: crasher  >>> crasher <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'stack corruption detected'
    r0 00000000  r1 0000685d  r2 00000006  r3 00000008
    r4 ffd516d8  r5 0000685d  r6 0000685d  r7 0000010c
    r8 00000000  r9 00000000  sl 00000000  fp ffd518bc
    ip 00000000  sp ffd516c8  lr ee63ece3  pc ee66ef0c  cpsr 000e0010

backtrace:
    #00 pc 00049f0c  /system/lib/libc.so (tgkill+12)
    #01 pc 00019cdf  /system/lib/libc.so (abort+50)
    #02 pc 0001e07d  /system/lib/libc.so (__libc_fatal+24)
    #03 pc 0004863f  /system/lib/libc.so (__stack_chk_fail+6)
    #04 pc 000013ed  /system/xbin/crasher (smash_stack+76)
    #05 pc 00001591  /system/xbin/crasher (do_action+280)
    #06 pc 00002219  /system/xbin/crasher (main+100)
    #07 pc 000177a1  /system/lib/libc.so (__libc_init+48)
    #08 pc 00001144  /system/xbin/crasher (_start+96)
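0x00 Overview
Stack overflow protection is a mitigation against buffer overflow attacks. When a function has a buffer overflow vulnerability, an attacker can overwrite the return address on the stack so that shellcode gets executed. With stack protection enabled, the function first places a cookie on the stack when it starts executing, and checks that the cookie is still valid just before it actually returns; if it is not, the program is aborted. An attacker who overwrites the return address usually overwrites the cookie as well, so the stack check fails and the shellcode never runs. On Linux this cookie is called the canary, and that is the term used from here on.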
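GCC added the -fstack-protector and -fstack-protector-all options in version 4.1 to support stack protection, and version 4.9 added -fstack-protector-strong to widen the protected scope. The difference between them: -fstack-protector only instruments functions that call alloca or have buffers of 8 bytes or more; -fstack-protector-strong additionally instruments functions that define any local array or take the address of a local variable; -fstack-protector-all instruments every function.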
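There are three kinds of stack in a Linux system:
- Application stack: runs in Ring 3 and is maintained by the application itself;
- Kernel process-context stack: runs in Ring 0 and is created by the kernel when a thread is created;
- Kernel interrupt-context stack: runs in Ring 0; the kernel creates one for each CPU core during initialization.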
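So there is probably a memory overrun somewhere. Recall that users never saw this problem before multiple HDMI CEC ports were configured; it only appeared after multiple ports were configured. The Java-layer code is unchanged, so the only thing that changed is the HAL layer.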
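Locating the change
Since the crash was clearly introduced by a specific change, let's start the investigation there.
Here is where the HAL is called, which is also the function the crash points to: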
Return<void> HdmiCec::getPortInfo(getPortInfo_cb _hidl_cb) {
    struct hdmi_port_info* legacyPorts;
    int numPorts;
    hidl_vec<HdmiPortInfo> portInfos;
    mDevice->get_port_info(mDevice, &legacyPorts, &numPorts);
    portInfos.resize(numPorts);
    for (int i = 0; i < numPorts; ++i) {
        portInfos[i] = {
            .type = static_cast<HdmiPortType>(legacyPorts[i].type),
            .portId = static_cast<uint32_t>(legacyPorts[i].port_id),
            .cecSupported = legacyPorts[i].cec_supported != 0,
            .arcSupported = legacyPorts[i].arc_supported != 0,
            .physicalAddress = legacyPorts[i].physical_address
        };
    }
    _hidl_cb(portInfos);
    return Void();
}
Initial version
struct hdmi_cec_context_t {
    hdmi_cec_device_t device;
    /* our private state goes below here */
    event_callback_t event_callback;
    void* cec_arg;
    struct hdmi_port_info port;
    int fd;
    int en_mask;
    bool enable;
    bool system_control;
    int phy_addr;
    bool hotplug;
    bool cec_init;
};
static void hdmi_cec_get_port_info(const struct hdmi_cec_device* dev,struct hdmi_port_info* list[], int* total)
{
    ...
    list[0] = &ctx->port;
    list[0]->type = HDMI_OUTPUT;
    list[0]->port_id = HDMI_CEC_PORT_ID;
    list[0]->cec_supported = support;
    list[0]->arc_supported = 0;
    list[0]->physical_address = val;
    *total = 1;
}
Problematic version
struct hdmi_cec_context_t {
    hdmi_cec_device_t device;
    /* our private state goes below here */
    event_callback_t event_callback;
    void* cec_arg;
    struct hdmi_port_info port[4];
    int fd;
    int en_mask;
    bool enable;
    bool system_control;
    int phy_addr;
    bool hotplug;
    bool cec_init;
};
static void hdmi_cec_get_port_info(const struct hdmi_cec_device* dev,struct hdmi_port_info* list[], int* total)
{
    ...
    list[0] = &ctx->port[0];
    list[0]->type = HDMI_INPUT;
    list[0]->port_id = 1;
    list[0]->cec_supported = support;
    list[0]->arc_supported = 0;
    list[0]->physical_address = 0x1000; //CVT_DEF_ARC_PHYSICAL_ADDRESS;
    list[1] = &ctx->port[1];
    list[1]->type = HDMI_INPUT;
    list[1]->port_id = 2;
    list[1]->cec_supported = support;
    list[1]->arc_supported = 0;
    list[1]->physical_address = 0x3000;
    list[2] = &ctx->port[2];
    list[2]->type = HDMI_INPUT;
    list[2]->port_id = 3;
    list[2]->cec_supported = support;
    list[2]->arc_supported = 0;
    list[2]->physical_address = 0x4000;
    list[3] = &ctx->port[3];
    list[3]->type = HDMI_INPUT;
    list[3]->port_id = 4;
    list[3]->cec_supported = support;
    list[3]->arc_supported = 1;
    list[3]->physical_address = 0x2000;
    *total = 4;
}
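Testing showed that adding only two entries (list[0] and list[1]) does not crash either. It looks like a memory overrun. I checked the related count definitions and limits but found nothing that restricts the count to two. There was also one report that a configuration with three ports worked.
Focus on how the following variables are defined and passed: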
struct hdmi_port_info* legacyPorts;
mDevice->get_port_info(mDevice, &legacyPorts, &numPorts);
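The list parameter of hdmi_cec_get_port_info:
struct hdmi_port_info* list[] reads like an array of pointers, each element pointing to a struct hdmi_port_info. As a function parameter, however, the array declarator is adjusted to a pointer: the compiler treats it exactly as struct hdmi_port_info** list. It carries no length, so how many list[i] slots are actually valid depends entirely on what the caller passed in.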
Fixed version
static void hdmi_cec_get_port_info(const struct hdmi_cec_device* dev,struct hdmi_port_info* list[], int* total)
{
    ...
    ctx->port[0].type = HDMI_INPUT;
    ctx->port[0].port_id = 1;
    ctx->port[0].cec_supported = 1;
    ctx->port[0].arc_supported = 1;
    ctx->port[0].physical_address = 0x1000;
    ctx->port[1].type = HDMI_INPUT;
    ctx->port[1].port_id = 2;
    ctx->port[1].cec_supported = 1;
    ctx->port[1].arc_supported = 0;
    ctx->port[1].physical_address = 0x2000;
    ctx->port[2].type = HDMI_INPUT;
    ctx->port[2].port_id = 3;
    ctx->port[2].cec_supported = 1;
    ctx->port[2].arc_supported = 0;
    ctx->port[2].physical_address = 0x3000;
    ctx->port[3].type = HDMI_INPUT;
    ctx->port[3].port_id = 4;
    ctx->port[3].cec_supported = 1;
    ctx->port[3].arc_supported = 0;
    ctx->port[3].physical_address = 0x4000;
    /* Hand back the contiguous array through the single pointer the caller owns. */
    *list = &ctx->port[0];
    *total = 4;
}
Problem analysis
Let's walk through the steps involved, one by one:
1. Define the `legacyPorts` pointer:
struct hdmi_port_info* legacyPorts;
This line defines a pointer named `legacyPorts` of type `struct hdmi_port_info*`, i.e. a pointer to a `struct hdmi_port_info`. It is storage for exactly one pointer.
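2. Call `hdmi_cec_get_port_info` (through the `mDevice->get_port_info` function pointer):
mDevice->get_port_info(mDevice, &legacyPorts, &numPorts);
Here we pass the address of `legacyPorts` (a pointer to the `legacyPorts` pointer) and the address of `numPorts` (a pointer to the `numPorts` variable) to `hdmi_cec_get_port_info`.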
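3. Inside `hdmi_cec_get_port_info`:
*list = ctx->port;
Here `list` has type `struct hdmi_port_info**` (a `[]` in a parameter declaration is adjusted to a pointer) and holds `&legacyPorts`, while `ctx->port` is an array of `struct hdmi_port_info` that decays to a pointer to its first element, so the statement is equivalent to `*list = &ctx->port[0];` in the fixed version.
`*list = ctx->port;` stores the start address of the `ctx->port` array in the pointer that `list` points to, not in `list` itself.
Since `list` holds the address of `legacyPorts`, the pointer it points to is `legacyPorts` itself, so when the call returns, `legacyPorts` points to the contents of the `ctx->port` array.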
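In short: by passing `&legacyPorts` into `hdmi_cec_get_port_info` and assigning the address of `ctx->port` to `*list` inside the function, `legacyPorts` ends up pointing at the `ctx->port` array. Because the entries are contiguous in that array, `legacyPorts[i]` in the loop in `getPortInfo` can read every filled-in port outside the function, and the callee only ever writes the single pointer the caller owns.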
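Summary
With the fixed version analyzed, the cause of the problem version becomes clear.
In the problem version, we assigned each element of struct hdmi_port_info* list[] in turn, starting with list[0] = &ctx->port[0]: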
list[0] = &ctx->port[0];...list[3]->physical_address = 0x2000;
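But at the call site we defined only one pointer, `legacyPorts`, of type `struct hdmi_port_info*`, and passed its address. So list points at storage for exactly one pointer: list[0] is `legacyPorts` itself, while list[1], list[2] and list[3] are written past it, into whatever else the compiler placed in getPortInfo's stack frame. getPortInfo is protected by a stack canary (the backtrace confirms it: getPortInfo is the frame that called __stack_chk_fail), presumably because it takes the address of a local, which is enough for -fstack-protector-strong. Once the stray writes reach the canary, the check in the function epilogue fails and the process aborts with 'stack corruption detected'. The port data itself still reads back correctly, because all four entries live contiguously in ctx->port; whether two or three entries happen to survive depends only on what sits next to `legacyPorts` in that frame, which explains the inconsistent test results. The fixed version writes only *list, the one pointer the caller actually owns.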
References:
Diagnosing native crashes | Android Open Source Project
Original technical article | Understanding Linux security mechanisms: stack overflow protection - 送码网