gprof, Valgrind and gperftools - an evaluation of some tools for application level CPU profiling on


In this post I give an overview of my evaluation of three different CPU profiling tools: gperftools, Valgrind and gprof. I evaluated the three tools on usage, functionality, accuracy and runtime overhead.

The usage of the different profilers is demonstrated with the small demo program cpuload, available via my github repository gklingler/cpuProfilingDemo. The intent of cpuload.cpp is just to generate some CPU load - it does nothing useful. The bash scripts in the same repo (which are also listed below) show how to compile/link the cpuload.cpp appropriately and execute the resulting executable to get the CPU profiling data.

gprof

The GNU profiler gprof uses a hybrid approach of compiler-assisted instrumentation and sampling. Instrumentation is used to gather function call information (e.g. to be able to generate call graphs and count the number of function calls). To gather profiling information at runtime, a sampling process is used: the program counter is probed at regular intervals by interrupting the program with operating system interrupts. As sampling is a statistical process, the resulting profiling data is not exact but rather a statistical approximation (see the gprof documentation on statistical inaccuracy).
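To get a feeling for the size of that approximation error: the gprof manual gives the rule of thumb that a figure worth n sampling periods carries an expected error of about sqrt(n) sampling periods. The sketch below only restates that rule as code. It assumes the 0.01 s sampling period that gprof typically reports, and the function names are mine, not part of gprof.

```cpp
#include <cassert>
#include <cmath>

// Expected absolute error (in seconds) of a sampled time figure, using the
// rule of thumb from the gprof manual: a value worth n sampling periods has
// an expected error of about sqrt(n) sampling periods.
// The default period of 0.01 s is an assumption; check your gprof output.
double expectedErrorSeconds(double measuredSeconds, double samplingPeriod = 0.01) {
    assert(measuredSeconds >= 0.0 && samplingPeriod > 0.0);
    const double n = measuredSeconds / samplingPeriod;  // number of samples
    return std::sqrt(n) * samplingPeriod;
}

// The same error relative to the measured value (0.1 means 10%).
double expectedRelativeError(double measuredSeconds, double samplingPeriod = 0.01) {
    assert(measuredSeconds > 0.0);
    return expectedErrorSeconds(measuredSeconds, samplingPeriod) / measuredSeconds;
}
```

Under these assumptions a function measured at 1 s has an expected error of roughly 0.1 s (10%), while one measured at 100 s is off by roughly 1 s (1%), so longer runs give proportionally more reliable figures.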

Creating a CPU profile of your application with gprof requires the following steps:

  1. compile and link the program with a compatible compiler and profiling enabled (e.g. gcc -pg).
  2. execute your program to generate the profiling data file (default filename: gmon.out)
  3. run gprof to analyze the profiling data

Let’s apply this to our demo application:

#!/bin/bash
# build the program with profiling support (-pg)
g++ -std=c++11 -pg cpuload.cpp -o cpuload
# run the program; generates the profiling data file (gmon.out)
./cpuload
# print the flat profile and the call graph
gprof cpuload

The gprof output consists of two parts: the flat profile and the call graph.

The flat profile reports the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.
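As a rough illustration of how such a flat profile follows from the raw samples, here is a minimal sketch that turns per-function sample counts into rows sorted by share of the total. It is not gprof's implementation: the struct, the function name and the 0.01 s sampling period are assumptions for the example, and call counts are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One row of a simplified flat profile (hypothetical type for this sketch).
struct FlatRow {
    std::string function;
    double seconds;  // samples * sampling period
    double percent;  // share of all samples
};

// Turn per-function sample counts into rows sorted by descending share,
// similar in spirit to gprof's flat profile.
std::vector<FlatRow> buildFlatProfile(const std::map<std::string, std::uint64_t>& samples,
                                      double samplingPeriod = 0.01) {
    assert(samplingPeriod > 0.0);
    std::uint64_t total = 0;
    for (const auto& entry : samples) total += entry.second;

    std::vector<FlatRow> rows;
    if (total == 0) return rows;  // nothing was sampled
    for (const auto& entry : samples) {
        rows.push_back({entry.first,
                        entry.second * samplingPeriod,
                        100.0 * entry.second / total});
    }
    // Hot spots first, as in the gprof output.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const FlatRow& a, const FlatRow& b) { return a.percent > b.percent; });
    return rows;
}
```

With 75 samples in one function and 25 in another, the first row would show the 75% function with 0.75 s, followed by the 25% function with 0.25 s.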

Gprof’s call graph is a textual representation which shows the callers and callees of each function.

For detailed information on how to interpret the call graph, take a look at the official documentation. You can also generate a graphical representation of the call graph with gprof2dot.

The overhead (mainly caused by instrumentation) can be quite high: it is estimated at 30-260% [1][2].

gprof does not support profiling multi-threaded applications and also cannot profile shared libraries. Even though workarounds exist to get threading support [3], the fact that it cannot profile calls into shared libraries makes it totally unsuitable for today’s real-world projects.

valgrind/callgrind

Valgrind [4] is an instrumentation framework for building dynamic analysis tools. Valgrind is basically a virtual machine with just-in-time recompilation of machine code (e.g. x86) to a simpler RISC-like intermediate code (called UCode in early versions, VEX IR in current ones). It does not execute the original machine code directly but “simulates” the intermediate code generated on the fly. There are various Valgrind-based tools for debugging and profiling purposes. Depending on the chosen tool, the intermediate code is instrumented appropriately to record the data of interest. For performance profiling, we are interested in the tool callgrind: a profiling tool that records the function call history as a call graph.

For analyzing the collected profiling data, there is the amazing visualization tool KCachegrind [5]. It presents the collected data in a very nice way, which helps tremendously to get an overview of what’s going on.

Creating a CPU profile of your application with valgrind/callgrind is really simple and requires the following steps:

  1. compile your program with debugging symbols enabled (to get a meaningful call-graph)
  2. execute your program with valgrind --tool=callgrind ./yourprogram to generate the profiling data file
  3. analyze your profiling data with e.g. KCachegrind

Let’s apply this to our demo application (profile_valgrind.sh):

#!/bin/bash
# build the program (no special flags are needed)
g++ -std=c++11 cpuload.cpp -o cpuload
# run the program with callgrind; by default this generates a file callgrind.out.<pid> (e.g. callgrind.out.12345), here we write to profile.callgrind instead
valgrind --tool=callgrind --callgrind-out-file=profile.callgrind ./cpuload
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind

In contrast to gprof, we don’t need to rebuild our application with any special compile flags. We can execute any executable as it is with valgrind. Of course the executed program should contain debugging information to get an expressive call graph with human readable symbol names.

Below you see KCachegrind with the profiling data of our cpuload demo:

[Figure: analyzing CPU profiling data with KCachegrind]

A downside of Valgrind is the enormous slowdown of the profiled application (around a factor of 50), which makes it impractical for larger or longer-running applications. The profiling result itself is not influenced by the measurement.

gperftools

Gperftools from Google provides a set of tools aimed at analyzing and improving the performance of multi-threaded applications. It offers a CPU profiler, a fast thread-aware malloc implementation, a memory leak detector and a heap profiler. We focus on the sampling-based CPU profiler.

Creating a CPU profile of selected parts of your application with gperftools requires the following steps:

  1. compile your program with debugging symbols enabled (to get a meaningful call graph) and link against the gperftools profiler library (-lprofiler)
  2. #include <gperftools/profiler.h> and surround the sections you want to profile with ProfilerStart("nameOfProfile.log"); and ProfilerStop();
  3. execute your program to generate the profiling data file(s)
  4. To analyze the profiling data, use pprof (distributed with gperftools) or convert it to a callgrind compatible format and analyze it with KCachegrind
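Step 2 can be sketched as follows. This is not the author's cpuload.cpp: burnCpu() and profiledSection() are hypothetical stand-ins, and the #ifdef guard merely mirrors the WITHGPERFTOOLS macro the demo uses, so the code also builds when gperftools is not installed.

```cpp
#include <cassert>
#include <cstdint>

#ifdef WITHGPERFTOOLS
#include <gperftools/profiler.h>
#endif

// Hypothetical stand-in for the real workload: burns CPU and returns a
// checksum so that the loop has an observable result.
std::uint64_t burnCpu(std::uint64_t iterations) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc += (i * i) % 7;
    }
    return acc;
}

// Profiles only the call to burnCpu(); code outside the
// ProfilerStart()/ProfilerStop() pair is not sampled.
std::uint64_t profiledSection(std::uint64_t iterations) {
#ifdef WITHGPERFTOOLS
    ProfilerStart("profile.log");  // output file for the collected samples
#endif
    const std::uint64_t result = burnCpu(iterations);
#ifdef WITHGPERFTOOLS
    ProfilerStop();                // stops sampling and writes profile.log
#endif
    return result;
}
```

Calling profiledSection() from main() with a large iteration count and building with -DWITHGPERFTOOLS and -lprofiler should then produce a profile.log covering just that section.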

Let’s apply this to our demo application (profile_gperftools.sh):

#!/bin/bash
# build the program; for our demo program, we specify -DWITHGPERFTOOLS to enable the gperftools specific #ifdefs
# (-lprofiler goes after the source file so the linker can resolve the profiler symbols)
g++ -std=c++11 -DWITHGPERFTOOLS -g ../cpuload.cpp -o cpuload -lprofiler
# run the program; generates the profiling data file (profile.log in our example)
./cpuload
# convert profile.log to callgrind compatible format
pprof --callgrind ./cpuload profile.log > profile.callgrind
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind

Alternatively, the whole application can be profiled without any changes or recompilation/linking, but I will not cover this here as it is not the recommended approach. You can find more about this in the docs.

The gperftools profiler can profile multi-threaded applications. The runtime overhead while profiling is very low and the applications run at “native speed”. We can again use KCachegrind for analyzing the profiling data after converting it to a callgrind compatible format. I also like being able to selectively profile just certain areas of the code, and if you want to, you can easily extend your program to enable/disable profiling at runtime.
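One way to enable/disable profiling at runtime is a small switch object that the application toggles, for example from an admin command or a configuration reload. The ProfilerSwitch class below is a hypothetical helper, not part of gperftools. It only wraps ProfilerStart() and ProfilerStop(), and reuses the WITHGPERFTOOLS guard so it also compiles without the library.

```cpp
#include <cassert>
#include <string>
#include <utility>

#ifdef WITHGPERFTOOLS
#include <gperftools/profiler.h>
#endif

// Hypothetical helper: switches CPU profiling on and off while the
// application is running and keeps track of the current state.
class ProfilerSwitch {
public:
    explicit ProfilerSwitch(std::string outputFile) : outputFile_(std::move(outputFile)) {}
    ~ProfilerSwitch() { disable(); }  // never leave the profiler running

    ProfilerSwitch(const ProfilerSwitch&) = delete;
    ProfilerSwitch& operator=(const ProfilerSwitch&) = delete;

    // Starts profiling; returns false if it was already running.
    bool enable() {
        if (active_) return false;
#ifdef WITHGPERFTOOLS
        ProfilerStart(outputFile_.c_str());
#endif
        active_ = true;
        return true;
    }

    // Stops profiling; returns false if it was not running.
    bool disable() {
        if (!active_) return false;
#ifdef WITHGPERFTOOLS
        ProfilerStop();
#endif
        active_ = false;
        return true;
    }

    bool active() const { return active_; }

private:
    std::string outputFile_;
    bool active_ = false;
};
```

The state flag guards against starting the profiler twice. This sketch is not thread-safe, so toggling from several threads would need additional synchronization.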

Conclusion and comparison

gprof is the dinosaur among the evaluated profilers - its roots go back to the 1980s. It seems to have been widely used and a good solution during the past decades. But its limited support for multi-threaded applications, its inability to profile shared libraries and the need for recompilation with compatible compilers and special flags that produce considerable runtime overhead make it unsuitable for today’s real-world projects.

Valgrind delivers the most accurate results and is well suited for multi-threaded applications. It’s very easy to use and there is KCachegrind for visualization/analysis of the profiling data, but the slow execution of the application under test disqualifies it for larger, longer running applications.

The gperftools CPU profiler has very little runtime overhead, provides some nice features like selectively profiling certain areas of interest and has no problem with multi-threaded applications. KCachegrind can be used to analyze the profiling data. Like all sampling-based profilers, it suffers from statistical inaccuracy and therefore the results are not as accurate as with Valgrind, but in practice that’s usually not a big problem (you can always increase the sampling frequency if you need more accurate results). I’m using this profiler on a large code base and from my personal experience I can definitely recommend it.

I hope you liked this post and as always, if you have questions or any kind of feedback please leave a comment below.

  1. GNU gprof Profiler
  2. Low-Overhead Call Path Profiling of Unmodified, Optimized Code for higher order object oriented programs, Yu Kai Hong, Department of Mathematics at National Taiwan University; July 19, 2008, ACM 1-59593-167/8/06/2005
  3. workaround to use gprof with multithreaded applications
  4. Valgrind
  5. KCachegrind

Reposted from: https://my.oschina.net/wdyoschina/blog/1506757
