The case of the exception that a catch (...) didn't catch - The Old New Thing (microsoft.com)
https://devblogs.microsoft.com/oldnewthing/20240405-00/?p=109621
Raymond Chen, April 5, 2024
A customer thought they had fixed a bug, but they were still crashing because of it.
According to the output of !analyze, the problem came from this stack:
contoso!winrt::hresult_error::hresult_error+0x143
contoso!winrt::throw_hresult+0x132
contoso!winrt::impl::consume_LitWare_IIconProvider<winrt::LitWare::IIconProvider>::LoadIcon+0x3b
contoso!winrt::Contoso::implementation::IconDataModel::ReloadIcon$_ResumeCoro$1+0x214
contoso!winrt::impl::resume_background_callback+0x10
ntdll!TppSimplepExecuteCallback+0xa3
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28
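This was confusing, because "We already fixed that bug!" The file version number and timestamp confirmed that the code in ReloadIcon did catch the exception: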
try
{
    icon = m_provider.LoadIcon(); // ⇐ blamed frame
}
catch (...)
{
    // There was a problem getting the new icon.
    // Just stick with the old one.
    LOG_CAUGHT_EXCEPTION();
    co_return;
}
Let's look at the stack at the time of the crash:
KERNELBASE!RaiseFailFastException+0x152
combase!RoFailFastWithErrorContextInternal2+0x4d9
contoso!wil::details::FailfastWithContextCallback+0xc1
contoso!wil::details::WilFailFast+0x47
contoso!wil::details::ReportFailure_NoReturn<3>+0x2df
contoso!wil::details::ReportFailure_Base<3,0>+0x30
contoso!wil::details::ReportFailure_CaughtExceptionCommonNoReturnBase<3>+0xa7
contoso!wil::details::ReportFailure_CaughtExceptionCommon+0x22
contoso!wil::details::ReportFailure_CaughtException<3>+0x40
contoso!wil::details::in1diag3::FailFast_CaughtException+0x13
contoso!`<lambda_f370031fe3623a0b308de0bbdeb2db76>::operator()'::`1'::catch$2+0x22
ucrtbase!_CallSettingFrame_LookupContinuationIndex+0x20
ucrtbase!__FrameHandler4::CxxCallCatchBlock+0x115
ntdll!RcFrameConsolidation+0x6
contoso!<lambda_f370031fe3623a0b308de0bbdeb2db76>::operator()+0x1a
contoso!std::invoke+0x24
contoso!std::_Invoker_ret<void,1>::_Call+0x24
contoso!std::_Func_impl_no_alloc<<lambda_f370031fe3623a0b308de0bbdeb2db76>,void,Concurrency::task<void> >::_Do_call+0x28
contoso!std::_Func_class<void,Concurrency::task<void> >::operator()+0x31
contoso!Concurrency::details::_MakeTToUnitFunc::__l2::<lambda_64124396551846798083ef48cd389b4a>::operator()+0x46
contoso!std::invoke+0x66
contoso!std::_Invoker_ret<unsigned char,0>::_Call+0x66
contoso!std::_Func_impl_no_alloc<<lambda_64124396551846798083ef48cd389b4a>,unsigned char,Concurrency::task<void> >::_Do_call+0x72
contoso!std::_Func_class<unsigned char,Concurrency::task<void> >::operator()+0x32
contoso!Concurrency::task<void>::_ContinuationTaskHandle<void,void,std::function<void __cdecl(Concurrency::task<void>)>,std::integral_constant<bool,1>,Concurrency::details::_TypeSelectorNoAsync>::_LogWorkItemAndInvokeUserLambda<std::function<unsigned char __cdecl(Concurrency::task<void>)>,Concurrency::task<void> >+0x8b
contoso!Concurrency::task<void>::_ContinuationTaskHandle<void,void,std::function<void __cdecl(Concurrency::task<void>)>,std::integral_constant<bool,1>,Concurrency::details::_TypeSelectorNoAsync>::_Continue+0x8c
contoso!Concurrency::task<void>::_ContinuationTaskHandle<void,void,std::function<void __cdecl(Concurrency::task<void>)>,std::integral_constant<bool,1>,Concurrency::details::_TypeSelectorNoAsync>::_Perform+0x8
contoso!Concurrency::details::_PPLTaskHandle<unsigned char,Concurrency::task<void>::_ContinuationTaskHandle<void,void,std::function<void __cdecl(Concurrency::task<void>)>,std::integral_constant<bool,1>,Concurrency::details::_TypeSelectorNoAsync>,Concurrency::details::_ContinuationTaskHandleBase>::invoke+0x37
contoso!Concurrency::details::_TaskProcHandle::_RunChoreBridge+0x25
contoso!Concurrency::details::_DefaultPPLTaskScheduler::_PPLTaskChore::_Callback+0x26
msvcp140!Concurrency::details::`anonymous namespace'::_Task_scheduler_callback+0x5d
ntdll!TppWorkpExecuteCallback+0x13a
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28
Hey, wait, this looks nothing like the stack that !analyze reported! What's going on?
The !analyze command used the stack of the first stowed exception. You can use the !pde.dse command to dump all of the stowed exceptions.
0:076> !pde.dse
Stowed Exception Array @ 0x000000002b1ef170

Stowed Exception #1 @ 0x000000001ce068e8
0x80070005 (FACILITY_WIN32 - Win32 Undecorated Error Codes): E_ACCESSDENIED - General access denied error

Stack : 0x2b214de0
contoso!winrt::hresult_error::hresult_error+0x143
contoso!winrt::throw_hresult+0x132
contoso!winrt::impl::consume_LitWare_IIconProvider<winrt::LitWare::IIconProvider>::LoadIcon+0x3b
contoso!winrt::Contoso::implementation::IconDataModel::ReloadIcon$_ResumeCoro$1+0x214
contoso!winrt::impl::resume_background_callback+0x10
ntdll!TppSimplepExecuteCallback+0xa3
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28

Stowed Exception #2 @ 0x000000001ce02378
0x80070005 (FACILITY_WIN32 - Win32 Undecorated Error Codes): E_ACCESSDENIED - General access denied error

Stack : 0x12cda890
litware!winrt::hresult_error::hresult_error+0x12c
litware!winrt::throw_hresult+0x83
litware!winrt::LitWare::implementation::IconProvider::LoadIcon+0x90
litware!winrt::impl::produce<winrt::LitWare::implementation::IconProvider,winrt::LitWare::IIconProvider>::LoadIcon+0x1b
contoso!winrt::impl::consume_LitWare_IIconProvider<winrt::LitWare::IIconProvider>::LoadIcon+0x3b
contoso!winrt::Contoso::implementation::IconDataModel::ReloadIcon$_ResumeCoro$1+0x214
contoso!winrt::impl::resume_background_callback+0x10
ntdll!TppSimplepExecuteCallback+0xa3
ntdll!TppWorkerThread+0x8f6
kernel32!BaseThreadInitThunk+0x1d
ntdll!RtlUserThreadStart+0x28

Stowed Exception #3 @ 0x000000001ce04fa8
0x80070005 (FACILITY_WIN32 - Win32 Undecorated Error Codes): E_ACCESSDENIED - General access denied error

Stack : 0x1d94b410
combase!RoOriginateError+0x51
contoso!wil::details::RaiseRoOriginateOnWilExceptions+0x137
contoso!wil::details::ReportFailure_Return<1>+0x1b8
contoso!wil::details::ReportFailure_Win32<1>+0x70
contoso!wil::details::in1diag3::Return_Win32+0x18
contoso!Internal::ContosoSettingsStorage::Save+0xdc729
contoso!Internal::ContosoSettings::SaveToDefaultLocalStorage+0xf1
contoso!Internal::ContosoSettings::Save+0x4ef
contoso!Contoso::AppSettings::save+0x4ef
contoso!std::_Func_impl_no_alloc<<lambda_f4300885c0b58e31cf789c4999ed9d7a>,void>::_Do_call+0x2b
contoso!std::_Func_impl_no_alloc<<lambda_052e919cc0e5399df76dff3972c0cac1>,unsigned char>::_Do_call+0x28
contoso!Concurrency::task<unsigned char>::_InitialTaskHandle<void,<lambda_f4300885c0b58e31cf789c4999ed9d7a>,Concurrency::details::_TypeSelectorNoAsync>::_Init+0xc3
contoso!Concurrency::details::_PPLTaskHandle<unsigned char,Concurrency::task<unsigned char>::_InitialTaskHandle<void,<lambda_f4300885c0b58e31cf789c4999ed9d7a>,Concurrency::details::_TypeSelectorNoAsync>,Concurrency::details::_TaskProcHandle>::invoke+0x55
contoso!Concurrency::details::_TaskProcHandle::_RunChoreBridge+0x25
contoso!Concurrency::details::_DefaultPPLTaskScheduler::_PPLTaskChore::_Callback+0x26
msvcp140!Concurrency::details::`anonymous namespace'::_Task_scheduler_callback+0x5d
ntdll!TppWorkpExecuteCallback+0x13a
ntdll!TppWorkerThread+0x686
kernel32!BaseThreadInitThunk+0x10
ntdll!RtlUserThreadStart+0x2b
Now things are starting to become clear.
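The rule of thumb for throwing Windows Runtime exceptions is that you call RoOriginateError before throwing the exception or returning a failure HRESULT, so that the stack and other context are captured. A common pattern in the Windows Runtime is that an exception is caught and saved ("stowed"), typically in an IAsyncAction or similar interface, and then rethrown later, when the caller performs a co_await or similar operation.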
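By the time the exception is rethrown, the original stack has already been unwound, so there is nothing left on the stack to trace. Calling RoOriginateError captures the stack at the point of failure before it is too late. That information can then be used to "stitch together" the lifetime of the exception, starting with the code that threw it and ending with the code that tried (and failed) to catch it.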
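The stow-and-rethrow pattern described above can be sketched in portable C++ with std::exception_ptr. This is only an illustration, not how the Windows Runtime implements it: StowedResult, run_and_stow, get_result, and consume are hypothetical names standing in for an async operation's stored error state and the later co_await. The point is that when get_result rethrows, the frames that originally threw are long gone, which is why the failure stack has to be captured at the moment of the original throw.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the error state held by an async operation.
struct StowedResult {
    std::exception_ptr error; // empty if the work succeeded
};

// Runs the work and stows any exception instead of letting it escape.
template <typename Work>
StowedResult run_and_stow(Work&& work) {
    StowedResult result;
    try {
        std::forward<Work>(work)();
    } catch (...) {
        // The throwing frames are unwound once we leave this handler;
        // only the exception object itself survives.
        result.error = std::current_exception();
    }
    return result;
}

// Models the later co_await: rethrows the stowed exception, if there is one.
inline void get_result(const StowedResult& result) {
    if (result.error) {
        std::rethrow_exception(result.error);
    }
}

// Consumes the result and returns the error message, or "" on success.
inline std::string consume(const StowedResult& result) {
    try {
        get_result(result);
    } catch (const std::exception& e) {
        return e.what();
    }
    return {};
}
```

Calling consume on a result produced by run_and_stow([] { throw std::runtime_error("access denied"); }) yields the original message, even though the rethrow happens on a completely different call stack.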
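The system performs this stack stitching in three ways: it stores error history in per-thread data, it lets components capture that history and transfer it to another thread when a task's error state moves between threads, and it looks for errors with the same HRESULT. If there is a recently captured stack whose HRESULT matches that of the unhandled exception, the system says, "I bet these two belong together."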
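Usually, all this stack stitching works well, because our API design principles say that exceptions should not be thrown for recoverable errors. That means there is usually not much exception traffic, so the rate of false positives is low.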
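To see why matching on HRESULT alone can produce a false positive, here is a deliberately simplified toy model. It is an assumption-laden sketch, not the system's actual algorithm: ErrorHistory, capture, and guess_origin are hypothetical names, and "pick the most recent entry with the same code" is just one plausible reading of "a recently captured stack with a matching HRESULT."

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr std::uint32_t kAccessDenied = 0x80070005u; // E_ACCESSDENIED

// One captured failure: the error code plus a label for the captured stack.
struct CapturedError {
    std::uint32_t hr;
    std::string origin;
};

// Toy error history (hypothetical; real tracking is per-thread and richer).
class ErrorHistory {
public:
    void capture(std::uint32_t hr, std::string origin) {
        entries_.push_back({hr, std::move(origin)});
    }

    // Heuristic: blame the most recently captured entry with the same code.
    // Returns an empty string if no entry matches.
    std::string guess_origin(std::uint32_t hr) const {
        for (auto it = entries_.rbegin(); it != entries_.rend(); ++it) {
            if (it->hr == hr) {
                return it->origin;
            }
        }
        return {};
    }

private:
    std::vector<CapturedError> entries_;
};
```

If a fatal failure and an unrelated, already-handled failure are both captured with kAccessDenied, guess_origin has no way to tell them apart and may blame the handled one.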
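But in this case, we have a false positive: IconDataModel called IconProvider::LoadIcon(), which failed with E_ACCESSDENIED. That exception was then caught and handled. We can see this in the first two stowed exceptions, using what we just learned about stitching multiple error stacks together to get a more complete picture of what led to a failure.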
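Here, IconProvider::LoadIcon() explicitly threw an exception with throw_hresult (stowed exception #2). At the ABI boundary, the C++ exception was converted to an HRESULT, and on the other side, C++/WinRT converted the HRESULT back into an exception and rethrew it (stowed exception #1). This rethrown exception was then caught by the catch (...), and that was the end of that exception.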
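The round trip across the ABI boundary explains why a single logical failure shows up as two stowed exceptions. The following portable sketch models it with hypothetical stand-ins (HresultError, throw_code, abi_load_icon, consume_load_icon, reload_icon); it is not the C++/WinRT implementation, only the same shape: throw, convert to an error code at the boundary, convert back to an exception on the other side, and finally catch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kAccessDenied = 0x80070005u; // E_ACCESSDENIED
constexpr std::uint32_t kFail = 0x80004005u;         // E_FAIL

// Hypothetical stand-in for winrt::hresult_error.
struct HresultError {
    std::uint32_t code;
};

int g_throw_count = 0; // counts how many separate throw sites ran

[[noreturn]] void throw_code(std::uint32_t code) {
    ++g_throw_count; // each throw is a point where a stack would be captured
    throw HresultError{code};
}

// Provider side: exceptions must not cross the ABI, so they become codes.
std::uint32_t abi_load_icon() noexcept {
    try {
        throw_code(kAccessDenied); // first throw (compare stowed exception #2)
    } catch (const HresultError& e) {
        return e.code;
    } catch (...) {
        return kFail;
    }
}

// Consumer side: a failure code is turned back into an exception.
void consume_load_icon() {
    const std::uint32_t hr = abi_load_icon();
    if (hr & 0x80000000u) {
        throw_code(hr); // second throw (compare stowed exception #1)
    }
}

// Caller: catches the failure and keeps the old icon. Returns false on failure.
bool reload_icon() {
    try {
        consume_load_icon();
        return true;
    } catch (...) {
        return false;
    }
}
```

One call to reload_icon runs two throw sites for one handled failure, which mirrors the pair of related stacks in the debugger output.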
But that's not what caused our crash.
The currently active stack shows that we raised a fail-fast exception from a lambda. The debugger tells us it is this lambda:
void ViewPreferences::SaveChanges()
{
    m_settings.save_async().then([](concurrency::task<void> precedingTask) {
        try
        {
            precedingTask.get();
        }
        CATCH_FAIL_FAST();
    });
}
The code saves the settings and fails fast if the operation fails.
We see this failure in the third stack, the one with ContosoSettingsStorage::Save. That Save operation failed with E_ACCESSDENIED and was recorded in the failure history.
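What happened is that two E_ACCESSDENIED errors occurred at about the same time. !analyze tried to figure out which stack belonged to which sequence, didn't quite get it right, and concluded that the current failure matched the m_provider.LoadIcon() failure. But we can use our human brains to see that the m_provider.LoadIcon() exception was handled, and that the real culprit is stowed exception #3.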
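If your code receives one error code and returns a different one, you can call the RoTransformError function. This tells COM error tracking that the two error sequences should be stitched together into one large error sequence.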
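A minimal sketch of that last point, assuming the RoTransformError(oldError, newError, message) function declared in roerrorapi.h and passing a null message so that the default text is used. The helper report_storage_failure and the constant kStorageUnavailable are hypothetical names chosen for illustration. The #else branch contains stand-ins that exist only so the sketch compiles off Windows; they are not the real API and do no error tracking.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#include <roerrorapi.h> // declares RoTransformError
#pragma comment(lib, "runtimeobject.lib")
#else
// Stand-ins so the sketch builds off Windows. These are NOT the real API.
using HRESULT = std::int32_t;
using HSTRING = void*;
inline int RoTransformError(HRESULT, HRESULT, HSTRING) { return 1; }
#endif

// Hypothetical: the code this component chooses to report (value of E_FAIL).
constexpr HRESULT kStorageUnavailable = static_cast<HRESULT>(0x80004005);

// Hypothetical helper: replaces a low-level failure with the component's
// own error code, and tells error tracking that the two are related.
inline HRESULT report_storage_failure(HRESULT lowLevelError) {
    if (lowLevelError >= 0) {
        return lowLevelError; // success codes pass through untouched
    }
    // Link the old error's history to the new error before returning it.
    RoTransformError(lowLevelError, kStorageUnavailable, nullptr);
    return kStorageUnavailable;
}
```

A caller that receives E_ACCESSDENIED from a lower layer would return report_storage_failure(hr) instead of returning a new code with no connection to the original failure.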