gprof, Valgrind and gperftools - an evaluation of some tools for application level CPU profiling on

In this post I give an overview of my evaluation of three different CPU profiling tools: gperftools, Valgrind and gprof. I evaluated the three tools on usage, functionality, accuracy and runtime overhead.

The usage of the different profilers is demonstrated with the small demo program cpuload, available via my github repository gklingler/cpuProfilingDemo. The intent of cpuload.cpp is just to generate some CPU load - it does nothing useful. The bash scripts in the same repo (which are also listed below) show how to compile/link the cpuload.cpp appropriately and execute the resulting executable to get the CPU profiling data.

gprof

The GNU profiler gprof uses a hybrid approach of compiler-assisted instrumentation and sampling. Instrumentation is used to gather function call information (e.g. to be able to generate call graphs and count the number of function calls). To gather profiling information at runtime, a sampling process is used. This means that the program counter is probed at regular intervals by interrupting the program with operating system interrupts. As sampling is a statistical process, the resulting profiling data is not exact but rather a statistical approximation (see the section on statistical inaccuracy in the gprof documentation).
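The sampling idea can be illustrated with a few lines of POSIX code. This is only a minimal sketch and not how gprof itself is implemented: it arms the ITIMER_PROF timer, which delivers SIGPROF at regular intervals of consumed CPU time, and merely counts the ticks where a real profiler would record the interrupted program counter. The names g_samples, onProfTick, setSamplingInterval and burnCpuUntil are made up for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#include <sys/time.h>

// Number of profiling ticks seen so far (hypothetical name, illustration only).
static std::atomic<long> g_samples{0};

// SIGPROF handler: a real profiler would record the interrupted program
// counter here; this sketch only counts the ticks.
extern "C" void onProfTick(int) {
    g_samples.fetch_add(1, std::memory_order_relaxed);
}

// Arm the sampling timer (intervalUsec > 0) or disarm it (intervalUsec == 0).
// ITIMER_PROF counts CPU time used by the process and raises SIGPROF.
bool setSamplingInterval(long intervalUsec) {
    assert(intervalUsec >= 0 && intervalUsec < 1000000);
    struct sigaction sa = {};
    sa.sa_handler = onProfTick;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, nullptr) != 0) return false;

    itimerval timer = {};
    timer.it_interval.tv_usec = intervalUsec;  // period between samples
    timer.it_value.tv_usec = intervalUsec;     // first expiry; 0 disarms
    return setitimer(ITIMER_PROF, &timer, nullptr) == 0;
}

// Burn CPU until at least minSamples ticks arrived or maxIterations is hit.
double burnCpuUntil(long minSamples, long long maxIterations) {
    volatile double x = 0.0;
    for (long long i = 0; i < maxIterations && g_samples.load() < minSamples; ++i) {
        x = x + 1.0;
    }
    return x;
}
```

Calling setSamplingInterval(10000) requests roughly 100 samples per second of CPU time. Because the timer only advances while the process is actually running, a sleeping program collects no samples.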

Creating a CPU profile of your application with gprof requires the following steps:

  1. compile and link the program with a compatible compiler and profiling enabled (e.g. gcc -pg).
  2. execute your program to generate the profiling data file (default filename: gmon.out)
  3. run gprof to analyze the profiling data

Let’s apply this to our demo application:

#!/bin/bash
# build the program with profiling support (-pg)
g++ -std=c++11 -pg cpuload.cpp -o cpuload
# run the program; generates the profiling data file (gmon.out)
./cpuload
# print the flat profile and the callgraph
gprof cpuload

The gprof output consists of two parts: the flat profile and the call graph.

The flat profile reports the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.

gprof's call graph is a textual representation that shows the callers and callees of each function.

For detailed information on how to interpret the callgraph, take a look at the official documentation. You can also generate a graphical representation of the callgraph with gprof2dot.

The overhead (mainly caused by instrumentation) can be quite high: it is estimated at 30-260% [1][2].

gprof does not support profiling multi-threaded applications and also cannot profile shared libraries. Even though there are workarounds to get threading support [3], the fact that it cannot profile calls into shared libraries makes it totally unsuitable for today's real-world projects.

valgrind/callgrind

Valgrind [4] is an instrumentation framework for building dynamic analysis tools. It is basically a virtual machine that uses just-in-time recompilation to translate x86 machine code into a simpler, RISC-like intermediate code: UCode. It does not execute the x86 machine code directly; instead it "simulates" the UCode generated on the fly. There are various Valgrind-based tools for debugging and profiling purposes. Depending on the chosen tool, the UCode is instrumented appropriately to record the data of interest. For performance profiling, we are interested in the tool callgrind: a profiling tool that records the function call history as a call graph.

For analyzing the collected profiling data, there is the amazing visualization tool KCachegrind [5]. It presents the collected data in a very nice way, which helps tremendously to get an overview of what's going on.

Creating a CPU profile of your application with valgrind/callgrind is really simple and requires the following steps:

  1. compile your program with debugging symbols enabled (to get a meaningful call-graph)
  2. execute your program with valgrind --tool=callgrind ./yourprogram to generate the profiling data file
  3. analyze your profiling data with e.g. KCachegrind

Let’s apply this to our demo application (profile_valgrind.sh):

#!/bin/bash
# build the program (no special flags are needed)
g++ -std=c++11 cpuload.cpp -o cpuload
# run the program with callgrind; by default this generates a file like callgrind.out.12345, so we set the output file name explicitly
valgrind --tool=callgrind --callgrind-out-file=profile.callgrind ./cpuload
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind

In contrast to gprof, we don’t need to rebuild our application with any special compile flags. We can execute any executable as it is with valgrind. Of course the executed program should contain debugging information to get an expressive call graph with human readable symbol names.

Below you see KCachegrind with the profiling data of our cpuload demo:

[Screenshot: analyzing CPU profiling data with KCachegrind]

A downside of Valgrind is the enormous slowdown of the profiled application (around a factor of 50), which makes it impractical for larger or longer-running applications. The profiling result itself is not influenced by the measurement.

gperftools

Gperftools from Google provides a set of tools aimed at analyzing and improving the performance of multi-threaded applications. It offers a CPU profiler, a fast, thread-aware malloc implementation, a memory leak detector and a heap profiler. We focus on the sampling-based CPU profiler.

Creating a CPU profile of selected parts of your application with gperftools requires the following steps:

  1. compile your program with debugging symbols enabled (to get a meaningful call graph) and link it against the gperftools profiler library (libprofiler.so, i.e. -lprofiler)
  2. #include <gperftools/profiler.h> and surround the sections you want to profile with ProfilerStart("nameOfProfile.log"); and ProfilerStop();
  3. execute your program to generate the profiling data file(s)
  4. To analyze the profiling data, use pprof (distributed with gperftools) or convert it to a callgrind compatible format and analyze it with KCachegrind
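Step 2 could look roughly like the following sketch. It is not the actual cpuload.cpp from the repository: burnCpu and profiledRun are hypothetical stand-ins, and the WITHGPERFTOOLS guard mirrors the macro mentioned for the demo program, so that the code still builds when gperftools is not installed.

```cpp
#include <cassert>
#include <cstdint>
#ifdef WITHGPERFTOOLS
#include <gperftools/profiler.h>
#endif

// Hypothetical stand-in for the real work done in the demo program.
std::uint64_t burnCpu(std::uint64_t iterations) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc += (i * i) % 7;
    }
    return acc;
}

// Profile only the region between ProfilerStart() and ProfilerStop().
// Without -DWITHGPERFTOOLS the function just runs the workload.
std::uint64_t profiledRun(const char* profileName, std::uint64_t iterations) {
    assert(profileName != nullptr);
    (void)profileName;  // unused when built without gperftools
#ifdef WITHGPERFTOOLS
    ProfilerStart(profileName);  // samples are written to this file
#endif
    const std::uint64_t result = burnCpu(iterations);
#ifdef WITHGPERFTOOLS
    ProfilerStop();  // flushes and closes the profile
#endif
    return result;
}
```

A call such as profiledRun("profile.log", 100000000) would then produce profile.log when the program is built with -DWITHGPERFTOOLS and linked with -lprofiler. Code outside the bracketed region does not show up in the profile.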

Let’s apply this to our demo application (profile_gperftools.sh):

#!/bin/bash
# build the program; for our demo program, we specify -DWITHGPERFTOOLS to enable the gperftools specific #ifdefs
# (-lprofiler goes after the source file; some linkers otherwise drop the library)
g++ -std=c++11 -DWITHGPERFTOOLS -g ../cpuload.cpp -o cpuload -lprofiler
# run the program; generates the profiling data file (profile.log in our example)
./cpuload
# convert profile.log to callgrind compatible format
pprof --callgrind ./cpuload profile.log > profile.callgrind
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind

Alternatively, the whole application can be profiled without any code changes or recompilation/linking. I will not cover this here as it is not the recommended approach, but you can find more about it in the docs.

The gperftools profiler can profile multi-threaded applications. The runtime overhead while profiling is very low and the applications run at “native speed”. We can again use KCachegrind for analyzing the profiling data after converting it to a callgrind compatible format. I also like being able to selectively profile just certain areas of the code, and if you want to, you can easily extend your program to enable/disable profiling at runtime.
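One possible way to enable/disable profiling at runtime is sketched below. It is an assumption of how such an extension might look, not code from the demo repository: a SIGUSR1 handler only sets a flag, and the main loop polls that flag at a safe point and starts or stops the profiler there. All names (g_toggleRequested, installToggleSignal, pollProfilerToggle) are hypothetical, and the gperftools calls are again guarded by WITHGPERFTOOLS.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#ifdef WITHGPERFTOOLS
#include <gperftools/profiler.h>
#endif

// Set by the signal handler, consumed by the main loop.
static std::atomic<bool> g_toggleRequested{false};
// Current profiling state; only touched from the polling thread.
static bool g_profiling = false;

// Keep the handler trivial: only record that a toggle was requested.
extern "C" void onToggleSignal(int) {
    g_toggleRequested.store(true);
}

// Install the SIGUSR1 handler; returns false if sigaction() fails.
bool installToggleSignal() {
    struct sigaction sa = {};
    sa.sa_handler = onToggleSignal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, nullptr) == 0;
}

// Call regularly from the main loop. Starts or stops the profiler when a
// toggle was requested and returns whether profiling is now active.
bool pollProfilerToggle(const char* profileName) {
    assert(profileName != nullptr);
    if (g_toggleRequested.exchange(false)) {
        g_profiling = !g_profiling;
#ifdef WITHGPERFTOOLS
        if (g_profiling) {
            ProfilerStart(profileName);
        } else {
            ProfilerStop();
        }
#endif
    }
    return g_profiling;
}
```

With this in place, sending SIGUSR1 to the process (for example with kill -USR1 and the process id) would switch profiling on, and sending it again would switch it off. The profiler is deliberately not started from inside the signal handler, because very little is safe to call there.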

Conclusion and comparison

gprof is the dinosaur among the evaluated profilers - its roots go back to the 1980s. It seems it was widely used and a good solution during the past decades. But its limited support for multi-threaded applications, the inability to profile shared libraries and the need for recompilation with compatible compilers and special flags that produce a considerable runtime overhead make it unsuitable for today’s real-world projects.

Valgrind delivers the most accurate results and is well suited for multi-threaded applications. It’s very easy to use and there is KCachegrind for visualization/analysis of the profiling data, but the slow execution of the application under test disqualifies it for larger, longer running applications.

The gperftools CPU profiler has very little runtime overhead, provides some nice features like selectively profiling certain areas of interest and has no problem with multi-threaded applications. KCachegrind can be used to analyze the profiling data. Like all sampling-based profilers, it suffers from statistical inaccuracy and therefore the results are not as accurate as with Valgrind, but in practice that’s usually not a big problem (you can always increase the sampling frequency if you need more accurate results). I’m using this profiler on a large code base and from my personal experience I can definitely recommend using it.
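To get a feeling for the statistical inaccuracy of sampling, here is a small idealized simulation. It assumes that every sample independently hits a given function with probability equal to that function's true share of the runtime. Real profiler samples are not perfectly independent, so treat it as a rough model only. estimateShare and expectedStdError are names invented for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Simulate sampleCount profiler samples for a function that truly accounts
// for trueShare of the runtime and return the measured share.
double estimateShare(double trueShare, std::size_t sampleCount, unsigned seed) {
    assert(sampleCount > 0 && trueShare >= 0.0 && trueShare <= 1.0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution hit(trueShare);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < sampleCount; ++i) {
        if (hit(rng)) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(sampleCount);
}

// Standard error of the measured share under the independence assumption:
// sqrt(p * (1 - p) / n). It shrinks with the square root of the sample count.
double expectedStdError(double trueShare, std::size_t sampleCount) {
    assert(sampleCount > 0);
    return std::sqrt(trueShare * (1.0 - trueShare) /
                     static_cast<double>(sampleCount));
}
```

In this model, a function with a 25% share measured with only 100 samples has a standard error of about 4 percentage points, and it takes four times as many samples to halve that. More samples come either from a longer run or from a higher sampling frequency, which gperftools lets you set through the CPUPROFILE_FREQUENCY environment variable.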

I hope you liked this post and as always, if you have questions or any kind of feedback please leave a comment below.

  1. GNU gprof Profiler
  2. Low-Overhead Call Path Profiling of Unmodified, Optimized Code for higher order object oriented programs, Yu Kai Hong, Department of Mathematics at National Taiwan University; July 19, 2008, ACM 1-59593-167/8/06/2005
  3. workaround to use gprof with multithreaded applications
  4. Valgrind
  5. KCachegrind

Reposted from: https://my.oschina.net/wdyoschina/blog/1506757
