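1. Introduction
In the article "Analysis of the Service startup flow via StartService()", I walked through the source code behind Context#startService(). Another important aspect of Service startup is the Service startup ANR. I am covering it here because we saw more than a hundred occurrences in production of the exception whose message is "executing service " + service.shortName.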
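2. How Service ANRs work
2.1 Overview of the Service startup ANR mechanism
A Service ANR is triggered as follows. Before the Service is started, a delayed Message is sent through a Handler (planting the bomb). Once the Service has finished starting, that Message is removed (defusing the bomb). If the Message is not removed within the delay, the ANR fires (a bomb that is not defused in time explodes) and the AppNotResponding dialog is shown.
This is a similar interaction design to the "application not responding" prompts on Windows and macOS.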
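The plant/defuse pattern described above can be sketched outside Android. The following is a minimal illustration, not framework code: the class TimeoutBomb and its methods are hypothetical names, and a ScheduledExecutorService stands in for the Handler's delayed Message.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the plant/defuse pattern; not Android framework code. */
public class TimeoutBomb {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean exploded = new AtomicBoolean(false);
    private ScheduledFuture<?> pending;

    /** Plant the bomb: schedule the timeout action before the work starts. */
    public synchronized void plant(long timeoutMillis) {
        pending = scheduler.schedule(() -> exploded.set(true),
                timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Defuse the bomb: cancel the pending timeout once the work is done. */
    public synchronized void defuse() {
        if (pending != null) {
            pending.cancel(false);
        }
    }

    public boolean hasExploded() {
        return exploded.get();
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }

    /** Runs simulated work under a timeout and reports whether the timeout fired. */
    public static boolean timedOut(long timeoutMillis, long workMillis) {
        TimeoutBomb bomb = new TimeoutBomb();
        try {
            bomb.plant(timeoutMillis);
            Thread.sleep(workMillis); // simulated Service startup work
            bomb.defuse();
            return bomb.hasExploded();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while simulating work", e);
        } finally {
            bomb.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast start timed out: " + timedOut(2000, 10));
        System.out.println("slow start timed out: " + timedOut(20, 500));
    }
}
```

In the real framework the timeout action reports an ANR rather than setting a flag, and the cancellation corresponds to Handler#removeMessages.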
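2.2 Foreground Service vs. background Service
2.2.1 Foreground Service
A foreground Service shows a persistent notification in the notification bar. It is typically used for tasks the user is clearly aware of, such as music playback or location tracking. The system treats a foreground Service as a component the user is actively using, so it gets a higher priority and fewer system resource restrictions. A foreground Service must show a notification telling the user that a Service is running, and that notification should usually carry some useful information about the Service.
2.2.2 Background Service
A background Service does not show a notification in the notification bar. It is used for tasks that need no direct user interaction or attention, such as data synchronization or network requests. A background Service has a lower system priority, and the system may terminate it to free resources when they run short.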
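2.3 Source walkthrough of the Service startup ANR
PS: the analysis is again based on the Android SDK 28 source.
Building on the summary in the article "Analysis of the Service startup flow via StartService()", the source path for starting a Service through startService is already clear. This article therefore starts directly from com.android.server.am.ActiveServices#bringUpServiceLocked. If you are not familiar with the earlier part of the startup flow, refer to my previous article and open Android Studio (AS) to follow along in the code.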
2.3.1 ActiveServices#bringUpServiceLocked
private String bringUpServiceLocked(ServiceRecord r, int intentFlags, boolean execInFg,
        boolean whileRestarting, boolean permissionsReviewRequired)
        throws TransactionTooLargeException {
    if (r.app != null && r.app.thread != null) {
        sendServiceArgsLocked(r, execInFg, false);
        return null;
    }
    // ... (the ProcessRecord `app` is looked up in code elided from this excerpt)
    realStartServiceLocked(r, app, execInFg);
    // ...
}
2.3.2 ActiveServices#realStartServiceLocked
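private final void realStartServiceLocked(ServiceRecord r,
        ProcessRecord app, boolean execInFg) throws RemoteException {
    // ...
    // Delegates to scheduleServiceTimeoutLocked(r.app), which plants the bomb.
    bumpServiceExecutingLocked(r, execInFg, "create");
    try {
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                app.repProcState);
    } catch (DeadObjectException e) {
        mAm.appDiedLocked(app);
        throw e;
    } finally {
        // serviceDoneExecutingLocked is the method that defuses the bomb.
        // Note: in the SDK 28 source this call sits inside an `if (!created)` guard
        // (elided here, along with the declaration of inDestroying), so this branch
        // covers the failure path. On success the app process reports back and the
        // same method runs.
        serviceDoneExecutingLocked(r, inDestroying, inDestroying);
        // ...
    }
}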
2.3.3 Planting the bomb: ActiveServices#bumpServiceExecutingLocked
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    // ...
    scheduleServiceTimeoutLocked(r.app);
    // ...
}
- com.android.server.am.ActiveServices#scheduleServiceTimeoutLocked
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    mAm.mHandler.sendMessageDelayed(msg,
            proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
}
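- Note that ActiveServices defines three fuse lengths for the bomb:
// Timeout for a Service executing in the foreground (proc.execServicesFg is true): 20 s
static final int SERVICE_TIMEOUT = 20*1000;
// Timeout for a Service executing in the background: 200 s, 10x the foreground timeout
static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;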
// How long the startForegroundService() grace period is to get around to
// calling startForeground() before we ANR + stop it.
static final int SERVICE_START_FOREGROUND_TIMEOUT = 10*1000;
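As a quick worked check of the arithmetic, the sketch below copies the two execution timeouts quoted above and mirrors the ternary used in scheduleServiceTimeoutLocked. The class ServiceTimeouts and the helper timeoutFor are hypothetical names introduced only for illustration.

```java
/** Hypothetical helper that mirrors the timeout selection; not framework code. */
public class ServiceTimeouts {
    // Values copied from the ActiveServices constants quoted in this article.
    public static final int SERVICE_TIMEOUT = 20 * 1000;
    public static final int SERVICE_BACKGROUND_TIMEOUT = SERVICE_TIMEOUT * 10;

    /** Same choice as the ternary passed to sendMessageDelayed. */
    public static int timeoutFor(boolean execServicesFg) {
        return execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT;
    }

    public static void main(String[] args) {
        System.out.println("foreground execution: " + timeoutFor(true) + " ms");
        System.out.println("background execution: " + timeoutFor(false) + " ms");
    }
}
```

The two results are 20000 ms and 200000 ms, i.e. 20 s and 200 s.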
2.3.4 Defusing the bomb: ActiveServices#serviceDoneExecutingLocked
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing) {
    // ...
    mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
    // ...
}
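2.3.5 Detonation: how the ANR dialog is triggered
As analyzed above, the ANR dialog is driven by a Message sent with sendMessageDelayed(). To see how the ANR bomb goes off, trace how the message whose what value is ActivityManagerService.SERVICE_TIMEOUT_MSG is handled.
The handleMessage implementation for this message's Handler lives in AMS.java (ActivityManagerService).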
- com.android.server.am.ActivityManagerService.MainHandler#handleMessage
final class MainHandler extends Handler {
    public MainHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            // ...
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
            case SERVICE_FOREGROUND_TIMEOUT_MSG: {
                mServices.serviceForegroundTimeout((ServiceRecord)msg.obj);
            } break;
            case SERVICE_FOREGROUND_CRASH_MSG: {
                mServices.serviceForegroundCrash((ProcessRecord) msg.obj,
                        msg.getData().getCharSequence(SERVICE_RECORD_KEY));
            } break;
            // ...
        }
    }
}
- com.android.server.am.ActiveServices#serviceTimeout
void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime = now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, " ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            // This line matters most when diagnosing the problem: for a Service ANR,
            // the log contains the prefix "executing service " followed by the
            // Service's shortName.
            anrMessage = "executing service " + timeout.shortName;
        } else {
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }
    // This is where the ANR dialog is actually triggered.
    if (anrMessage != null) {
        mAm.mAppErrors.appNotResponding(proc, null, null, false, anrMessage);
    }
}
Remember the format of anrMessage: executing service Service.shortName. It indicates a Service startup timeout.
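The scan inside serviceTimeout can be isolated as a small pure function: find a Service whose executingStart is older than now minus the timeout, and otherwise compute when to check again. The sketch below is a simplified illustration. The class ServiceTimeoutScan, the Result type and the use of a plain long[] in place of proc.executingServices are all assumptions made for the example.

```java
/** Simplified, hypothetical model of the loop in ActiveServices#serviceTimeout. */
public class ServiceTimeoutScan {

    /** Outcome of one scan over the executing services. */
    public static final class Result {
        /** Index of the timed-out service, or -1 if none has timed out. */
        public final int timedOutIndex;
        /** Uptime (ms) at which to re-check, or -1 if a service timed out. */
        public final long recheckAt;

        Result(int timedOutIndex, long recheckAt) {
            this.timedOutIndex = timedOutIndex;
            this.recheckAt = recheckAt;
        }
    }

    /**
     * @param executingStart start time (ms) of each executing service
     * @param now            current uptime in ms
     * @param timeout        SERVICE_TIMEOUT or SERVICE_BACKGROUND_TIMEOUT in ms
     */
    public static Result scan(long[] executingStart, long now, long timeout) {
        long maxTime = now - timeout; // anything started before this has timed out
        long nextTime = 0;            // latest start time seen among healthy services
        for (int i = executingStart.length - 1; i >= 0; i--) {
            if (executingStart[i] < maxTime) {
                return new Result(i, -1);
            }
            if (executingStart[i] > nextTime) {
                nextTime = executingStart[i];
            }
        }
        // Nothing timed out: the framework re-posts the message for this time.
        return new Result(-1, nextTime + timeout);
    }

    public static void main(String[] args) {
        Result late = scan(new long[] {1_000L, 30_000L}, 40_000L, 20_000L);
        System.out.println("timed-out index: " + late.timedOutIndex);
        Result ok = scan(new long[] {25_000L, 30_000L}, 40_000L, 20_000L);
        System.out.println("re-check at: " + ok.recheckAt);
    }
}
```

Unlike the original method, this sketch does not return early for an empty list and does not check mLruProcesses. It only models the time comparison.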
- com.android.server.am.AppErrors#appNotResponding
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
        ActivityRecord parent, boolean aboveSystem, final String annotation) {
    if (mService.mController != null) {
        try {
            // 0 == continue, -1 = kill process immediately
            // Key point: the implementation class of mService.mController is
            // com.android.server.am.ActivityManagerShellCommand.MyActivityController
            int res = mService.mController.appEarlyNotResponding(
                    app.processName, app.pid, annotation);
            if (res < 0 && app.pid != MY_PID) {
                app.kill("anr", true);
            }
        } catch (RemoteException e) {
            mService.mController = null;
            Watchdog.getInstance().setActivityController(null);
        }
    }
    long anrTime = SystemClock.uptimeMillis();
    if (ActivityManagerService.MONITOR_CPU_USAGE) {
        mService.updateCpuStatsNow();
    }
    // Unless configured otherwise, swallow ANRs in background processes
    // & kill the process.
    // Reads the "Show background ANRs" switch from Developer options.
    boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
            Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
    boolean isSilentANR;
    // Don't dump other PIDs if it's a background ANR
    isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
    // Log the ANR to the main log.
    StringBuilder info = new StringBuilder();
    info.setLength(0);
    // The entry written to the log has the form "ANR in processName".
    // For example, if the app's package name is com.techmix.myapp, searching for
    // "ANR in com.techmix.myapp" locates the ANR trace much faster.
    info.append("ANR in ").append(app.processName);
    if (activity != null && activity.shortComponentName != null) {
        info.append(" (").append(activity.shortComponentName).append(")");
    }
    info.append("\n");
    info.append("PID: ").append(app.pid).append("\n");
    if (annotation != null) {
        // The reason is still the anrMessage string built in serviceTimeout for the
        // Service startup: "executing service " + shortName. There is no finer
        // classification telling which step caused the ANR.
        info.append("Reason: ").append(annotation).append("\n");
    }
    if (parent != null && parent != activity) {
        info.append("Parent: ").append(parent.shortComponentName).append("\n");
    }
    ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
    // For background ANRs, don't pass the ProcessCpuTracker to
    // avoid spending 1/2 second collecting stats to rank lastPids.
    // (firstPids, lastPids and nativePids are collected in code elided from this excerpt.)
    File tracesFile = ActivityManagerService.dumpStackTraces(true, firstPids,
            (isSilentANR) ? null : processCpuTracker,
            (isSilentANR) ? null : lastPids, nativePids);
    // Write the CPU usage information into the ANR log.
    String cpuInfo = null;
    if (ActivityManagerService.MONITOR_CPU_USAGE) {
        mService.updateCpuStatsNow();
        synchronized (mService.mProcessCpuTracker) {
            cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
        }
        info.append(processCpuTracker.printCurrentLoad());
        info.append(cpuInfo);
    }
    info.append(processCpuTracker.printCurrentState(anrTime));
    // The ANR log is written to the dropbox folder. The annotation variable is the
    // anrMessage passed in from com.android.server.am.ActiveServices#serviceTimeout.
    mService.addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
            cpuInfo, tracesFile, null);
    synchronized (mService) {
        // Silent ANR: see how isSilentANR is computed above.
        if (isSilentANR) {
            app.kill("bg anr", true);
            return;
        }
        // Post the ANR dialog through a Handler. To continue, follow the handleMessage
        // logic for ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG.
        Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
        msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);
        // Note that mService here is AMS.
        mService.mUiHandler.sendMessage(msg);
    }
}
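To make the log format concrete, the header that appNotResponding assembles can be reproduced with a plain StringBuilder. This is a sketch only: the class AnrHeader and its build method are hypothetical, and the PID and Service name in main are made-up sample values.

```java
/** Hypothetical re-creation of the ANR log header assembled in appNotResponding. */
public class AnrHeader {

    /**
     * @param processName        process that is not responding
     * @param shortComponentName activity component, or null for a Service ANR
     * @param pid                process id
     * @param annotation         reason string, e.g. "executing service " + shortName
     */
    public static String build(String processName, String shortComponentName,
            int pid, String annotation) {
        StringBuilder info = new StringBuilder();
        info.append("ANR in ").append(processName);
        if (shortComponentName != null) {
            info.append(" (").append(shortComponentName).append(")");
        }
        info.append("\n");
        info.append("PID: ").append(pid).append("\n");
        if (annotation != null) {
            info.append("Reason: ").append(annotation).append("\n");
        }
        return info.toString();
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration.
        System.out.print(build("com.techmix.myapp", null, 1234,
                "executing service com.techmix.myapp/.SyncService"));
    }
}
```

For a Service ANR the activity argument is null (serviceTimeout passes null), so the first line is just "ANR in " followed by the process name.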
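- What counts as a silent ANR? Going by the code above, an ANR is silent when the "show background ANRs" setting (Settings.Secure.ANR_SHOW_BACKGROUND) is off and isInterestingForBackgroundTraces(app) returns false. In that case the process is killed with the reason "bg anr" and no dialog is shown.
- Code that shows the ANR dialog: com.android.server.am.ActivityManagerService.UiHandler#handleMessage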
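case SHOW_NOT_RESPONDING_UI_MSG: {
    // This again delegates to a method in AppErrors, so AMS's UiHandler is only used
    // here to switch threads. The original startService call goes from the app's main
    // thread through ContextImpl#startService to AMS#startService(), and the latter
    // runs on a binder thread-pool thread, i.e. a worker thread. The message is
    // therefore used to hop over to the UI thread.
    mAppErrors.handleShowAnrUi(msg);
    ensureBootCompleted();
} break;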
- com.android.server.am.AppErrors#handleShowAnrUi
// The ANR-related fields are stored directly in ProcessRecord.java,
// so each process keeps its own copy:
boolean notResponding;      // does the app have a not responding dialog?
Dialog anrDialog;           // dialog being displayed due to app not resp.

void handleShowAnrUi(Message msg) {
    Dialog dialogToShow = null;
    synchronized (mService) {
        AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
        final ProcessRecord proc = data.proc;
        if (proc == null) {
            Slog.e(TAG, "handleShowAnrUi: proc is null");
            return;
        }
        if (proc.anrDialog != null) {
            Slog.e(TAG, "App already has anr dialog: " + proc);
            MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                    AppNotRespondingDialog.ALREADY_SHOWING);
            return;
        }
        // Open question: can an app process receive this ANR broadcast if it registers for it?
        Intent intent = new Intent("android.intent.action.ANR");
        if (!mService.mProcessesReady) {
            intent.addFlags(Intent.FLAG_RECEIVER_REGISTERED_ONLY
                    | Intent.FLAG_RECEIVER_FOREGROUND);
        }
        mService.broadcastIntentLocked(null, null, intent,
                null, null, 0, null, null, null, AppOpsManager.OP_NONE,
                null, false, false, MY_PID, Process.SYSTEM_UID, 0 /* TODO: Verify */);
        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
        if (mService.canShowErrorDialogs() || showBackground) {
            dialogToShow = new AppNotRespondingDialog(mService, mContext, data);
            proc.anrDialog = dialogToShow;
        } else {
            MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                    AppNotRespondingDialog.CANT_SHOW);
            // If ANR dialogs cannot be shown, the app process is killed right away.
            // Just like tapping "Close app" in the ANR dialog, this calls
            // AMS#killAppAtUsersRequest to terminate the process.
            mService.killAppAtUsersRequest(proc, null);
        }
    }
    // If we've created a crash dialog, show it without the lock held
    if (dialogToShow != null) {
        dialogToShow.show();
    }
}
- com.android.server.am.AppErrors#killAppAtUserRequestLocked
void killAppAtUserRequestLocked(ProcessRecord app, Dialog fromDialog) {
    app.crashing = false;
    app.crashingReport = null;
    app.notResponding = false;
    app.notRespondingReport = null;
    if (app.anrDialog == fromDialog) {
        app.anrDialog = null;
    }
    if (app.waitDialog == fromDialog) {
        app.waitDialog = null;
    }
    // MY_PID is defined in AMS: static final int MY_PID = myPid();
    // so any valid application pid enters this if branch.
    if (app.pid > 0 && app.pid != MY_PID) {
        handleAppCrashLocked(app, "user-terminated" /*reason*/,
                null /*shortMsg*/, null /*longMsg*/, null /*stackTrace*/, null /*data*/);
        app.kill("user request after error", true);
    }
}
app.kill() here is ProcessRecord#kill(), which ultimately delegates to AMS#killProcessGroup(). Processes killed by AMS this way can be found in Android's logcat by searching for the ActivityManager "Killing" entries.
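When triaging many reports, the two markers discussed in this article ("ANR in " plus the process name, and "Reason: executing service ") can be matched mechanically. The sketch below assumes the log has already been read into a list of lines. The class AnrLogFilter, its method and the sample lines are hypothetical, and real logcat output carries timestamps and tags that this simple matching merely tolerates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that extracts Service ANR reasons for one package from log lines. */
public class AnrLogFilter {

    /**
     * Returns the "executing service ..." reasons that follow an "ANR in" header
     * whose process name starts with packageName (so pkg:remote processes match too).
     */
    public static List<String> serviceAnrReasons(List<String> lines, String packageName) {
        List<String> reasons = new ArrayList<>();
        boolean inBlock = false; // true while inside an ANR entry for packageName
        for (String line : lines) {
            if (line.contains("ANR in ")) {
                inBlock = line.contains("ANR in " + packageName);
                continue;
            }
            int idx = line.indexOf("Reason: executing service ");
            if (inBlock && idx >= 0) {
                reasons.add(line.substring(idx + "Reason: ".length()).trim());
                inBlock = false;
            }
        }
        return reasons;
    }

    public static void main(String[] args) {
        // Sample lines are invented for illustration.
        List<String> lines = Arrays.asList(
                "ANR in com.techmix.myapp",
                "PID: 1234",
                "Reason: executing service com.techmix.myapp/.SyncService",
                "ANR in com.other.app",
                "PID: 99",
                "Reason: executing service com.other.app/.Foo");
        System.out.println(serviceAnrReasons(lines, "com.techmix.myapp"));
    }
}
```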