Arthas 是一款线上监控诊断产品,通过全局视角实时查看应用 load、内存、gc、线程的状态信息,并能在不修改应用代码的情况下,对业务问题进行诊断,包括查看方法调用的出入参、异常,监测方法执行耗时,类加载信息等,大大提升线上问题排查效率
1.问题背景
今天一到公司开发人员就因线上的性能问题寻求帮助,售后反馈系统升级后物理产品更换业务处理十分缓慢,有时卡主10几秒。
我先尝试从开发环境复现问题,发现开发环境处理并不慢。考虑到线上环境存在性能问题而开发环境没有这个问题,一般情况下是两边数据量差引起的。我们的开发环境中的数据比现场环境低几个数量级。可能某些SQL在线上执行比较慢引发了问题。我尝试打印慢SQL,发现日志中没有执行比较慢的SQL以及可疑日志。 这个问题变得十分棘手。
2.问题分析
在毫无头绪时,只能借助一些工具来协助分析。本问题主要用到了Arthas 的trace命令
trace :获取方法内部调用路径,并输出方法路径上的每个节点上耗时
参考: trace | arthas
排查过程
1.运行arthas ,并选择进程20408,20408是系统客户端使用的进程
D:\staibossCliente\arthas-bin>
D:\staibossCliente\arthas-bin>java -jar arthas-boot.jar
[INFO] arthas-boot version: 3.5.4
[INFO] Process 20408 already using port 3658
[INFO] Process 20408 already using port 8563
[INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 20408 D:\staibossCliente\staribossclient\stariboss-new-client.exe
1
[INFO] arthas home: D:\staibossCliente\arthas-bin
[INFO] The target process already listen port 3658, skip attach.
[INFO] arthas-client connect 127.0.0.1 3658,---. ,------. ,--------.,--. ,--. ,---. ,---./ O \ | .--. ''--. .--'| '--' | / O \ ' .-'
| .-. || '--'.' | | | .--. || .-. |`. `-.
| | | || |\ \ | | | | | || | | |.-' |
`--' `--'`--' '--' `--' `--' `--'`--' `--'`-----'wiki https://arthas.aliyun.com/doc
tutorials https://arthas.aliyun.com/doc/arthas-tutorials.html
version 3.5.4
main_class
pid 20408
time 2023-11-20 17:38:39
2. 根据业务行为查看方法内部的处理时间
[arthas@20408]$ trace -E com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor initOptionalPhysicalProducts
Affect(class count: 1 , method count: 0) cost in 61 ms, listenerId: 10
No class or method is affected, try:
1. Execute `sm CLASS_NAME METHOD_NAME` to make sure the method you are tracing actually exists (it might be in your parent class).
2. Execute `options unsafe true`, if you want to enhance the classes under the `java.*` package.
3. Execute `reset CLASS_NAME` and try again, your method body might be too large.
4. Check arthas log: C:\Users\95225/logs/arthas/arthas.log
5. Visit https://github.com/alibaba/arthas/issues/47 for more details.
[arthas@20408]$
[arthas@20408]$
[arthas@20408]$
[arthas@20408]$ trace -E com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor subscriberChanged
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 3) cost in 152 ms, listenerId: 12
`---ts=2023-11-20 17:57:00;thread_name=AWT-EventQueue-0;id=f;is_daemon=false;priority=6;TCCL=com.star.sms.launch.LaunchURLClassLoader@17a8a02`---[12855.2434ms] com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor:subscriberChanged()+---[0.0065ms] org.apache.commons.logging.Log:debug() #115+---[0.0061ms] com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor:getSelectedSubscriber() #116+---[0.0033ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:setSubscriber() #117+---[min=0.0032ms,max=0.0032ms,total=0.0064ms,count=2] com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor:getCurrentAction() #122+---[0.0031ms] com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor:getCurrentCustomer() #122+---[5947.6627ms] com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor:getPhyProducts() #122 // getPhyProducts消耗了6S+---[6907.4171ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:onSubscriberChanged() #122 // getPhyProducts消耗了7S+---[0.0056ms] com.star.sms.richclient.accept.view.accept.acceptEditor2.ExchangePhysicalProductEditor:getCurrentCustomer() #126`---[0.0048ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:setCurrentCustomer() #126
3.分析日志
从日志中可见, getPhyProducts 消耗了6秒, onSubscriberChanged消耗了7S。定位2个性能慢的方法后,进一步分析
[arthas@20408]$ trace -E com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer onSubscriberChanged|getExchangableInfos|initOptionalPhysicalProducts
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 3) cost in 132 ms, listenerId: 14
`---ts=2023-11-20 17:59:13;thread_name=AWT-EventQueue-0;id=f;is_daemon=false;priority=6;TCCL=com.star.sms.launch.LaunchURLClassLoader@17a8a02`---[0.4848ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:onSubscriberChanged()`---[0.1634ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:clearTable() #195`---ts=2023-11-20 17:59:19;thread_name=AWT-EventQueue-0;id=f;is_daemon=false;priority=6;TCCL=com.star.sms.launch.LaunchURLClassLoader@17a8a02`---[6979.1925ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:onSubscriberChanged()+---[342.3417ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:getExchangableInfos() #198
com.star.sms.model.order.dto.ExchangeInfo:setReturnResourceState() #1082+---[93.7515ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:setTableDatas() #198`---[6543.0383ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:initOptionalPhysicalProducts() #200`---[6542.9974ms] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:initOptionalPhysicalProducts() //initOptionalPhysicalProducts 消耗时间7S+---[0.0147ms] org.apache.commons.logging.Log:debug() #907+---[min=0.0015ms,max=0.0175ms,total=0.7956ms,count=202] com.star.sms.model.product.core.catalog.PhysicalProduct:getResourceDirectory() #911+---[min=0.0017ms,max=0.0306ms,total=0.6809ms,count=202] com.star.sms.model.product.other.ResourceDirectory:getId() #911+---[min=29.9191ms,max=55.5523ms,total=6528.053ms,count=202] com.star.sms.outerInterface.resource.service.IResourceExService:getResCatalogById() #911+---[min=0.0017ms,max=0.0447ms,total=1.0478ms,count=202] com.star.sms.model.resource.core.resparam.ResourceCatalog:getResourceType() #913+---[min=0.0034ms,max=0.0327ms,total=1.4611ms,count=202] com.star.sms.richclient.accept.view.accept2.page.displayer.ExchangePhyProductDisplayer:setPhysicalProductResourceType() #914+---[min=0.0023ms,max=0.0376ms,total=0.9837ms,count=202] org.apache.commons.logging.Log:isDebugEnabled() #929`---[0.0128ms] org.apache.commons.logging.Log:debug() #934
继续查看执行时间,发现程序中initOptionalPhysicalProducts 方法用时最长,是一个程序瓶颈,此方法中使用了循环,循环中IResourceExService:getResCatalogById方法调用了202次。
查看程序代码,这是一个典型的N+1此查询性能问题。 循环中调用了N此业务查询获取ResourceCatalog信息。
private void initOptionalPhysicalProducts() {logger.debug("<== Begin to initialize the optional physical product... ==>");for (PhysicalProduct physicalProduct : physicalProducts) {ResourceCatalog resourceCatalog = resourceExService.getResCatalogById(physicalProduct .getResourceDirectory().getId()); // 这里存在N+1次查询, 其中N=202//省略代码 dosomethingif (logger.isDebugEnabled()) {logger.debug("<-- Get a product,name is:" + physicalProduct.getName()+ "; and its resource type Id is :" + resourceType.getId() + " -->");}}logger.debug("<~~The end of the physical product to initialize the optional~~>");}
3.解决方法
对于N+1此查询可以通过id集合查询+分组方式来处理,修改后的代码
private void initOptionalPhysicalProducts() {logger.debug("<== Begin to initialize the optional physical product... ==>");//1.获取ID集合List<Long> catalogids = new ArrayList<Long>();for (PhysicalProduct physicalProduct : physicalProducts) {catalogids.add(physicalProduct.getResourceDirectory().getId());}//2.只查询1次List<ResourceCatalog> catalogs = resourceExService.getResCatalogByIds(catalogids);//3.分组Map<Long, ResourceCatalog> map = new HashMap<Long, ResourceCatalog>();for (ResourceCatalog resourceCatalog : catalogs) {map.put(resourceCatalog.getId(), resourceCatalog);}for (PhysicalProduct physicalProduct : physicalProducts) {ResourceCatalog resourceCatalog = map.get(physicalProduct.getResourceDirectory().getId()); //内存里获取//省略代码 dosomethingif (logger.isDebugEnabled()) {logger.debug("<-- Get a product,name is:" + physicalProduct.getName()+ "; and its resource type Id is :" + resourceType.getId() + " -->");}}logger.debug("<~~The end of the physical product to initialize the optional~~>");}
按照此处理方式,优化处理getPhyProducts 后问题解决,修正后业务处理时间在1秒内
上一篇:Jboss启动报错Unrecognized VM option PermSize=128m