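PMU Overview
ARM PMU Summary
The PMU is an optional extension that serves as a non-invasive debug and profiling component.
PMU registers can be accessed either through CP15 coprocessor instructions or through memory-mapped addresses.
Based on the PMUv2 architecture, the Cortex-A7 processor can collect a range of statistics about the processor and the memory system at run time. The events behind these statistics are useful when you debug or profile code.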
For more detail, see:
"Arm CoreSight Performance Monitoring Unit Architecture": describes the PMU architecture, including register descriptions, specifications, and security.
"ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition": describes how the PMU is implemented in Armv7-A/R.
"Chapter C12 The Performance Monitors Extension": introduces the basic PMU functionality.
"Appendix D2 Recommended Memory-mapped and External Debug Interfaces for the Performance Monitors": describes the PMU registers.
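PMU Configuration Flow
There are two mainstream PMU versions, PMUv2 and PMUv3. Most 32-bit Arm processors implement PMUv2, and most 64-bit Arm processors implement PMUv3. The main practical difference between them is the assembly instructions used to access the registers: PMUv2 registers are programmed through CP15 coprocessor instructions (MRC and MCR) or through the external APB interface, whereas PMUv3 registers are accessed directly by register name with the MRS and MSR instructions.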
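To make the difference in access style concrete, the sketch below lists a few PMU registers with both their CP15 encoding and their AArch64 system register name, and formats the read instruction for each. The `pmu_reg` table and the `format_read` helper are hypothetical names introduced only for illustration, and the scratch registers `r0` and `x0` are arbitrary choices. The code only builds strings and does not touch any hardware.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One PMU register as seen from both instruction sets (illustrative subset). */
struct pmu_reg {
    const char *name;        /* architectural name, e.g. "PMCR" */
    unsigned crn, crm, opc2; /* CP15 encoding; opc1 is 0 for these registers */
    const char *a64_name;    /* AArch64 system register name */
};

const struct pmu_reg pmu_regs[] = {
    { "PMCR",       9, 12, 0, "PMCR_EL0" },
    { "PMCNTENSET", 9, 12, 1, "PMCNTENSET_EL0" },
    { "PMSELR",     9, 12, 5, "PMSELR_EL0" },
    { "PMCCNTR",    9, 13, 0, "PMCCNTR_EL0" },
    { "PMXEVCNTR",  9, 13, 2, "PMXEVCNTR_EL0" },
};

/* Write the instruction that reads `name` into buf.
 * aarch64 == 0 selects the CP15 (MRC) form, non-zero selects the MRS form.
 * Returns 0 on success, -1 if the register is not in the table. */
int format_read(const char *name, int aarch64, char *buf, size_t len)
{
    assert(name != NULL && buf != NULL);
    for (size_t i = 0; i < sizeof pmu_regs / sizeof pmu_regs[0]; i++) {
        const struct pmu_reg *r = &pmu_regs[i];
        if (strcmp(r->name, name) != 0)
            continue;
        if (aarch64)
            snprintf(buf, len, "MRS x0, %s", r->a64_name);
        else
            snprintf(buf, len, "MRC p15, 0, r0, c%u, c%u, %u",
                     r->crn, r->crm, r->opc2);
        return 0;
    }
    return -1;
}
```

For example, `format_read("PMCR", 0, buf, sizeof buf)` produces the same `MRC p15, 0, r0, c9, c12, 0` instruction that the assembly driver later in this article uses to read the control register.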
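Setting Up and Using the Event Counters
This section outlines the steps needed to set up and use the event counters on a Cortex-A15 (Armv7-A). The steps on Armv8-A processors are similar, although there may be some minor differences.
You can choose not to activate the cycle counter (the steps marked as optional). This does not affect the event counters, because they are independent of the cycle counter. If you do not need to read the cycle count, leave the cycle counter off to reduce the PMU's performance impact on the system.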
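(Optional) Enable user-mode access to the PMU, that is, allow the PMU registers to be used from EL0. If the code you want to measure runs in user mode, you must set EN, bit [0], to 1 in the Performance Monitors User Enable Register (PMUSERENR). If the PMU is only used in kernel mode (EL1), PMUSERENR does not need to be set.
Enable the PMU - in the Performance Monitors Control Register (PMCR), set E, bit [0], to 1.
Configure an event counter
In the Performance Monitors Event Counter Selection Register (PMSELR), write the number of the counter you want to configure (0-5) to SEL, bits [4:0]. This selects which counter is used.
In the Performance Monitors Event Type Select Register (PMXEVTYPER), write the event number (from the event list) to evtCount, bits [7:0], to select the event that the counter monitors.
Enable the configured event counter - in the Performance Monitors Count Enable Set Register (PMCNTENSET), set Px, bit [x], to 1, where x is the number of the counter to enable (0-5).
(Optional) Enable the cycle counter (CCNT) - in the Performance Monitors Count Enable Set Register (PMCNTENSET), set C, bit [31], to 1.
(Optional) Reset the cycle counter (CCNT) - in the Performance Monitors Control Register (PMCR), set C, bit [2], to 1.
Reset the event counters - in the Performance Monitors Control Register (PMCR), set P, bit [1], to 1.
The counters are now configured and will monitor the events of interest as execution continues.
(Optional) Disable the cycle counter (CCNT) - in the Performance Monitors Count Enable Clear Register (PMCNTENCLR), set C, bit [31], to 1.
Disable the event counter - in the Performance Monitors Count Enable Clear Register (PMCNTENCLR), set Px, bit [x], to 1, where x is the number of the counter to disable (0-5).
Read the value of an event counter
In the Performance Monitors Event Counter Selection Register (PMSELR), write the number of the counter you want to read (0-5) to SEL, bits [4:0].
The value of the selected counter is held in the Performance Monitors Selected Event Count Register (PMXEVCNTR).
(Optional) Read the value of the cycle counter (CCNT) - the cycle counter value is held in the Performance Monitors Cycle Count Register (PMCCNTR).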
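The bit positions named in these steps can be collected into a few constants. The following is a minimal sketch, assuming six event counters (0-5) as in the steps above; the macro and function names (`PMCR_E`, `pmu_counter_mask`, `pmu_pmcr_start_value`) are hypothetical and not part of any Arm header. It only computes the values that would be written to PMCNTENSET, PMCNTENCLR, or PMCR, so it can be compiled and checked on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMCR_E     (1u << 0)   /* PMCR.E, bit [0]: global PMU enable */
#define PMCR_P     (1u << 1)   /* PMCR.P, bit [1]: reset the event counters */
#define PMCR_C     (1u << 2)   /* PMCR.C, bit [2]: reset the cycle counter */
#define PMCNTEN_C  (1u << 31)  /* C, bit [31] of PMCNTENSET / PMCNTENCLR */

/* Build the value to write to PMCNTENSET (to enable) or PMCNTENCLR
 * (to disable) for the listed event counters, optionally including CCNT. */
uint32_t pmu_counter_mask(const unsigned *counters, size_t n, int with_ccnt)
{
    uint32_t mask = with_ccnt ? PMCNTEN_C : 0u;
    for (size_t i = 0; i < n; i++) {
        assert(counters[i] < 31);            /* bits 0..30 are event counters */
        mask |= (uint32_t)1 << counters[i];  /* Px, bit [x] */
    }
    return mask;
}

/* Compute the PMCR value that enables the PMU and resets the event
 * counters, and optionally the cycle counter, keeping the other bits. */
uint32_t pmu_pmcr_start_value(uint32_t old_pmcr, int reset_ccnt)
{
    uint32_t value = old_pmcr | PMCR_E | PMCR_P;
    if (reset_ccnt)
        value |= PMCR_C;
    return value;
}
```

Because PMCNTENSET and PMCNTENCLR are set and clear registers, the same mask enables the counters when written to the first and disables them when written to the second.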
PMU Driver Implementation
Pure assembly version
v7_pmu.S
/* ------------------------------------------------------------ */
/* Performance Monitor Block                                    */
/* ------------------------------------------------------------ */

    .arm                            @ Make sure we are in ARM mode
    .text
    .align 2

    .global getPMN                  @ export this function for the linker
/* Returns the number of programmable counters */
/* uint32_t getPMN(void) */
getPMN:
    MRC     p15, 0, r0, c9, c12, 0  /* Read PMNC Register */
    MOV     r0, r0, LSR #11         /* Shift N field down to bit 0 */
    AND     r0, r0, #0x1F           /* Mask to leave just the 5 N bits */
    BX      lr

    .global pmn_config              @ export this function for the linker
/* Sets the event for a programmable counter to record */
/* void pmn_config(unsigned counter, uint32_t event) */
/* counter = r0 = Which counter to program (e.g. 0 for PMN0, 1 for PMN1) */
/* event   = r1 = The event code */
pmn_config:
    AND     r0, r0, #0x1F           /* Mask to leave only bits 4:0 */
    MCR     p15, 0, r0, c9, c12, 5  /* Write PMNXSEL Register */
    MCR     p15, 0, r1, c9, c13, 1  /* Write EVTSELx Register */
    BX      lr

    .global ccnt_divider            @ export this function for the linker
/* Enables/disables the divider (1/64) on CCNT */
/* void ccnt_divider(int divider) */
/* divider = r0 = If 0 disable divider, else enable divider */
ccnt_divider:
    MRC     p15, 0, r1, c9, c12, 0  /* Read PMNC */
    CMP     r0, #0x0                /* IF (r0 == 0) */
    BICEQ   r1, r1, #0x08           /* THEN: Clear the D bit (disables the divider) */
    ORRNE   r1, r1, #0x08           /* ELSE: Set the D bit (enables the divider) */
    MCR     p15, 0, r1, c9, c12, 0  /* Write PMNC */
    BX      lr

/* --------------------------------------------------------------- */
/* Enable/Disable                                                  */
/* --------------------------------------------------------------- */

    .global enable_pmu              @ export this function for the linker
/* Global PMU enable */
/* void enable_pmu(void) */
enable_pmu:
    MRC     p15, 0, r0, c9, c12, 0  /* Read PMNC */
    ORR     r0, r0, #0x01           /* Set E bit */
    MCR     p15, 0, r0, c9, c12, 0  /* Write PMNC */
    BX      lr

    .global disable_pmu             @ export this function for the linker
/* Global PMU disable */
/* void disable_pmu(void) */
disable_pmu:
    MRC     p15, 0, r0, c9, c12, 0  /* Read PMNC */
    BIC     r0, r0, #0x01           /* Clear E bit */
    MCR     p15, 0, r0, c9, c12, 0  /* Write PMNC */
    BX      lr

    .global enable_ccnt             @ export this function for the linker
/* Enable the CCNT */
/* void enable_ccnt(void) */
enable_ccnt:
    MOV     r0, #0x80000000         /* Select the C bit (bit 31) */
    MCR     p15, 0, r0, c9, c12, 1  /* Write CNTENS Register */
    BX      lr

    .global disable_ccnt            @ export this function for the linker
/* Disable the CCNT */
/* void disable_ccnt(void) */
disable_ccnt:
    MOV     r0, #0x80000000         /* Select the C bit (bit 31) */
    MCR     p15, 0, r0, c9, c12, 2  /* Write CNTENC Register */
    BX      lr

    .global enable_pmn              @ export this function for the linker
/* Enable PMN{n} */
/* void enable_pmn(uint32_t counter) */
/* counter = r0 = The counter to enable (e.g. 0 for PMN0, 1 for PMN1) */
enable_pmn:
    MOV     r1, #0x1                /* Use arg (r0) to set which counter to enable */
    MOV     r1, r1, LSL r0
    MCR     p15, 0, r1, c9, c12, 1  /* Write CNTENS Register */
    BX      lr

    .global disable_pmn             @ export this function for the linker
/* Disable PMN{n} */
/* void disable_pmn(uint32_t counter) */
/* counter = r0 = The counter to disable (e.g. 0 for PMN0, 1 for PMN1) */
disable_pmn:
    MOV     r1, #0x1                /* Use arg (r0) to set which counter to disable */
    MOV     r1, r1, LSL r0
    MCR     p15, 0, r1, c9, c12, 2  /* Write CNTENC Register */
    BX      lr

    .global enable_pmu_user_access  @ export this function for the linker
/* Enables User mode access to the PMU (must be called in a privileged mode) */
/* void enable_pmu_user_access(void) */
enable_pmu_user_access:
    MRC     p15, 0, r0, c9, c14, 0  /* Read PMUSERENR Register */
    ORR     r0, r0, #0x01           /* Set EN bit (bit 0) */
    MCR     p15, 0, r0, c9, c14, 0  /* Write PMUSERENR Register */
    BX      lr

    .global disable_pmu_user_access @ export this function for the linker
/* Disables User mode access to the PMU (must be called in a privileged mode) */
/* void disable_pmu_user_access(void) */
disable_pmu_user_access:
    MRC     p15, 0, r0, c9, c14, 0  /* Read PMUSERENR Register */
    BIC     r0, r0, #0x01           /* Clear EN bit (bit 0) */
    MCR     p15, 0, r0, c9, c14, 0  /* Write PMUSERENR Register */
    BX      lr

/* --------------------------------------------------------------- */
/* Counter read registers                                          */
/* --------------------------------------------------------------- */

    .global read_ccnt               @ export this function for the linker
/* Returns the value of CCNT */
/* uint32_t read_ccnt(void) */
read_ccnt:
    MRC     p15, 0, r0, c9, c13, 0  /* Read CCNT Register */
    BX      lr

    .global read_pmn                @ export this function for the linker
/* Returns the value of PMN{n} */
/* uint32_t read_pmn(uint32_t counter) */
/* counter = r0 = The counter to read (e.g. 0 for PMN0, 1 for PMN1) */
read_pmn:
    AND     r0, r0, #0x1F           /* Mask to leave only bits 4:0 */
    MCR     p15, 0, r0, c9, c12, 5  /* Write PMNXSEL Register */
    MRC     p15, 0, r0, c9, c13, 2  /* Read current PMNx Register */
    BX      lr

/* --------------------------------------------------------------- */
/* Software Increment                                              */
/* --------------------------------------------------------------- */

    .global pmu_software_increment  @ export this function for the linker
/* Writes to software increment register */
/* void pmu_software_increment(uint32_t counter) */
/* counter = r0 = The counter to increment (e.g. 0 for PMN0, 1 for PMN1) */
pmu_software_increment:
    MOV     r1, #0x01
    MOV     r1, r1, LSL r0
    MCR     p15, 0, r1, c9, c12, 4  /* Write SWINCR Register */
    BX      lr

/* --------------------------------------------------------------- */
/* Overflow & Interrupt Generation                                 */
/* --------------------------------------------------------------- */

    .global read_flags              @ export this function for the linker
/* Returns the value of the overflow flags */
/* uint32_t read_flags(void) */
read_flags:
    MRC     p15, 0, r0, c9, c12, 3  /* Read FLAG Register */
    BX      lr

    .global write_flags             @ export this function for the linker
/* Writes the overflow flags */
/* void write_flags(uint32_t flags) */
write_flags:
    MCR     p15, 0, r0, c9, c12, 3  /* Write FLAG Register */
    BX      lr

    .global enable_ccnt_irq         @ export this function for the linker
/* Enables interrupt generation on overflow of the CCNT */
/* void enable_ccnt_irq(void) */
enable_ccnt_irq:
    MOV     r0, #0x80000000
    MCR     p15, 0, r0, c9, c14, 1  /* Write INTENS Register */
    BX      lr

    .global disable_ccnt_irq        @ export this function for the linker
/* Disables interrupt generation on overflow of the CCNT */
/* void disable_ccnt_irq(void) */
disable_ccnt_irq:
    MOV     r0, #0x80000000
    MCR     p15, 0, r0, c9, c14, 2  /* Write INTENC Register */
    BX      lr

    .global enable_pmn_irq          @ export this function for the linker
/* Enables interrupt generation on overflow of PMN{x} */
/* void enable_pmn_irq(uint32_t counter) */
/* counter = r0 = The counter to enable the interrupt for (e.g. 0 for PMN0, 1 for PMN1) */
enable_pmn_irq:
    MOV     r1, #0x1                /* Use arg (r0) to select the counter */
    MOV     r0, r1, LSL r0
    MCR     p15, 0, r0, c9, c14, 1  /* Write INTENS Register */
    BX      lr

    .global disable_pmn_irq         @ export this function for the linker
/* Disables interrupt generation on overflow of PMN{x} */
/* void disable_pmn_irq(uint32_t counter) */
/* counter = r0 = The counter to disable the interrupt for (e.g. 0 for PMN0, 1 for PMN1) */
disable_pmn_irq:
    MOV     r1, #0x1                /* Use arg (r0) to select the counter */
    MOV     r0, r1, LSL r0
    MCR     p15, 0, r0, c9, c14, 2  /* Write INTENC Register */
    BX      lr

/* --------------------------------------------------------------- */
/* Reset Functions                                                 */
/* --------------------------------------------------------------- */

    .global reset_pmn               @ export this function for the linker
/* Resets the programmable counters */
/* void reset_pmn(void) */
reset_pmn:
    MRC     p15, 0, r0, c9, c12, 0  /* Read PMNC */
    ORR     r0, r0, #0x02           /* Set P bit (Event Counter Reset) */
    MCR     p15, 0, r0, c9, c12, 0  /* Write PMNC */
    BX      lr

    .global reset_ccnt              @ export this function for the linker
/* Resets the CCNT */
/* void reset_ccnt(void) */
reset_ccnt:
    MRC     p15, 0, r0, c9, c12, 0  /* Read PMNC */
    ORR     r0, r0, #0x04           /* Set C bit (Cycle Counter Reset) */
    MCR     p15, 0, r0, c9, c12, 0  /* Write PMNC */
    BX      lr

    .end                            @ end of code, this line is optional

/* ------------------------------------------------------------ */
/* End of v7_pmu.S                                              */
/* ------------------------------------------------------------ */
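The field extraction that getPMN performs in assembly can also be expressed in C, which makes it easy to check on a host. This is a small sketch rather than part of the driver: the `pmcr_fields` struct and `pmcr_decode` function are hypothetical names, and the register values in the usage note are made up for illustration, not read from real hardware.

```c
#include <stdint.h>

/* Selected fields of the control register (PMNC in the comments above,
 * PMCR in the architecture manual). */
struct pmcr_fields {
    unsigned n; /* N, bits [15:11]: number of event counters implemented */
    unsigned d; /* D, bit [3]: cycle counter divider (count every 64 cycles) */
    unsigned e; /* E, bit [0]: global enable */
};

/* Decode a raw PMCR value, mirroring the shift and mask used by getPMN. */
struct pmcr_fields pmcr_decode(uint32_t pmcr)
{
    struct pmcr_fields f;
    f.n = (pmcr >> 11) & 0x1Fu; /* same as LSR #11 followed by AND #0x1F */
    f.d = (pmcr >> 3) & 1u;
    f.e = pmcr & 1u;
    return f;
}
```

With the illustrative value 0x3009, `pmcr_decode` reports six event counters with the divider and the global enable both set.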
v7_pmu.h
// ------------------------------------------------------------
// PMU for Cortex-A/R (v7-A/R)
// ------------------------------------------------------------

#ifndef V7_PMU_H
#define V7_PMU_H

// Returns the number of programmable counters
unsigned int getPMN(void);

// Sets the event for a programmable counter to record
// counter = r0 = Which counter to program (e.g. 0 for PMN0, 1 for PMN1)
// event   = r1 = The event code (from appropriate TRM or ARM Architecture Reference Manual)
void pmn_config(unsigned int counter, unsigned int event);

// Enables/disables the divider (1/64) on CCNT
// divider = r0 = If 0 disable divider, else enable divider
void ccnt_divider(int divider);

//
// Enables and disables
//

// Global PMU enable
// On ARM11 this enables the PMU, and the counters start immediately
// On Cortex this enables the PMU, there are individual enables for the counters
void enable_pmu(void);

// Global PMU disable
// On Cortex, this overrides the enable state of the individual counters
void disable_pmu(void);

// Enable the CCNT
void enable_ccnt(void);

// Disable the CCNT
void disable_ccnt(void);

// Enable PMN{n}
// counter = The counter to enable (e.g. 0 for PMN0, 1 for PMN1)
void enable_pmn(unsigned int counter);

// Disable PMN{n}
// counter = The counter to disable (e.g. 0 for PMN0, 1 for PMN1)
void disable_pmn(unsigned int counter);

//
// Read counter values
//

// Returns the value of CCNT
unsigned int read_ccnt(void);

// Returns the value of PMN{n}
// counter = The counter to read (e.g. 0 for PMN0, 1 for PMN1)
unsigned int read_pmn(unsigned int counter);

//
// Overflow and interrupts
//

// Returns the value of the overflow flags
unsigned int read_flags(void);

// Writes the overflow flags
void write_flags(unsigned int flags);

// Enables interrupt generation on overflow of the CCNT
void enable_ccnt_irq(void);

// Disables interrupt generation on overflow of the CCNT
void disable_ccnt_irq(void);

// Enables interrupt generation on overflow of PMN{x}
// counter = The counter to enable the interrupt for (e.g. 0 for PMN0, 1 for PMN1)
void enable_pmn_irq(unsigned int counter);

// Disables interrupt generation on overflow of PMN{x}
// counter = The counter to disable the interrupt for (e.g. 0 for PMN0, 1 for PMN1)
void disable_pmn_irq(unsigned int counter);

//
// Counter reset functions
//

// Resets the programmable counters
void reset_pmn(void);

// Resets the CCNT
void reset_ccnt(void);

//
// Software Increment
//

// Writes to software increment register
// counter = The counter to increment (e.g. 0 for PMN0, 1 for PMN1)
void pmu_software_increment(unsigned int counter);

//
// User mode access
//

// Enables User mode access to the PMU (must be called in a privileged mode)
void enable_pmu_user_access(void);

// Disables User mode access to the PMU (must be called in a privileged mode)
void disable_pmu_user_access(void);

#endif
// ------------------------------------------------------------
// End of v7_pmu.h
// ------------------------------------------------------------
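When ccnt_divider() enables the 1/64 divider, the value returned by read_ccnt() counts units of 64 clock cycles, so it has to be scaled before it is compared with undivided readings. The sketch below shows that scaling and a rough estimate of how long the 32-bit cycle counter runs before it wraps. The helper names `ccnt_to_cycles` and `ccnt_wrap_seconds` are hypothetical, and the 1 GHz clock in the usage note is an assumed figure chosen only to make the arithmetic easy.

```c
#include <stdint.h>

/* Convert a raw CCNT reading into CPU clock cycles.
 * With the divider enabled, each count stands for 64 cycles. */
uint64_t ccnt_to_cycles(uint32_t ccnt, int divider_enabled)
{
    return divider_enabled ? (uint64_t)ccnt * 64u : (uint64_t)ccnt;
}

/* Estimate how many seconds the 32-bit CCNT runs before wrapping,
 * given the CPU clock in Hz (the caller supplies the clock rate). */
double ccnt_wrap_seconds(double cpu_hz, int divider_enabled)
{
    const double counts = 4294967296.0; /* 2^32 counter values */
    return counts * (divider_enabled ? 64.0 : 1.0) / cpu_hz;
}
```

At an assumed 1 GHz clock the counter wraps after about 4.3 seconds without the divider and about 275 seconds with it, which is one reason to enable the divider for long measurements.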
Usage demo
#include "v7_pmu.h"
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

int random_range(int max);
void pmu_start(unsigned int event0, unsigned int event1, unsigned int event2,
               unsigned int event3, unsigned int event4, unsigned int event5);
void pmu_stop(void);

int main(int argc, char *argv[])
{
    int matrix_size;
    int i, j, k, z;

    // To access CPU time
    clock_t start, end;
    double cpu_time_used;

    if (argc != 2) {
        fputs("usage: $prog n\n", stderr);
        exit(EXIT_FAILURE);
    }

    matrix_size = (int)strtol(argv[1], NULL, 10);
    if (matrix_size <= 0) {
        fputs("n must be a positive integer\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("square matrix size = %d\n", matrix_size);

    /* Using time function output for seed value */
    unsigned int seed = (unsigned int)time(NULL);
    srand(seed);

    /* Initialize square matrix with command line input value */
    int a[matrix_size][matrix_size], b[matrix_size][matrix_size], c[matrix_size][matrix_size];

    /* Initialize both A[][] and B[][] with random values between 1 and 5 and set C[][] to zero */
    for (i = 0; i < matrix_size; i++) {
        for (j = 0; j < matrix_size; j++) {
            a[i][j] = random_range(6);
            b[i][j] = random_range(6);
            c[i][j] = 0;
        }
    }

    /* Multiply A[][] and B[][] and store into C[][] */
    /* The multiplication runs seven times, each time with a different set of six events */
    start = clock();
    for (z = 0; z < 7; z++) {
        if (z == 0) pmu_start(0x01, 0x02, 0x03, 0x04, 0x05, 0x06);
        if (z == 1) pmu_start(0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C);
        if (z == 2) pmu_start(0x0D, 0x0E, 0x0F, 0x10, 0x11, 0x12);
        if (z == 3) pmu_start(0x50, 0x51, 0x60, 0x61, 0x62, 0x63);
        if (z == 4) pmu_start(0x64, 0x65, 0x66, 0x67, 0x68, 0x6E);
        if (z == 5) pmu_start(0x70, 0x71, 0x72, 0x73, 0x74, 0x81);
        if (z == 6) pmu_start(0x82, 0x83, 0x84, 0x85, 0x86, 0x8A);

        for (i = 0; i < matrix_size; i++) {
            for (j = 0; j < matrix_size; j++) {
                for (k = 0; k < matrix_size; k++) {
                    /* c[0][0]=a[0][0]*b[0][0]+a[0][1]*b[1][0]+a[0][2]*b[2][0]; */
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        pmu_stop();
    }
    end = clock();

    cpu_time_used = (end - start) / ((double)CLOCKS_PER_SEC);
    printf("CPU time used = %.4lf\n", cpu_time_used);
    printf("square matrix size = %d\n", matrix_size);
    return 0;
}

/* Returns a pseudo-random value in the range 1..max-1 */
int random_range(int max)
{
    return ((rand() % (max - 1)) + 1);
}

void pmu_start(unsigned int event0, unsigned int event1, unsigned int event2,
               unsigned int event3, unsigned int event4, unsigned int event5)
{
    enable_pmu();              // Enable the PMU
    reset_ccnt();              // Reset the CCNT (cycle counter)
    reset_pmn();               // Reset the configurable counters
    pmn_config(0, event0);     // Configure counter 0 to count event0
    pmn_config(1, event1);     // Configure counter 1 to count event1
    pmn_config(2, event2);     // Configure counter 2 to count event2
    pmn_config(3, event3);     // Configure counter 3 to count event3
    pmn_config(4, event4);     // Configure counter 4 to count event4
    pmn_config(5, event5);     // Configure counter 5 to count event5
    enable_ccnt();             // Enable CCNT
    enable_pmn(0);             // Enable counter 0
    enable_pmn(1);             // Enable counter 1
    enable_pmn(2);             // Enable counter 2
    enable_pmn(3);             // Enable counter 3
    enable_pmn(4);             // Enable counter 4
    enable_pmn(5);             // Enable counter 5
    printf("CountEvent0=0x%x,CountEvent1=0x%x,CountEvent2=0x%x,CountEvent3=0x%x,CountEvent4=0x%x,CountEvent5=0x%x\n",
           event0, event1, event2, event3, event4, event5);
}

void pmu_stop(void)
{
    unsigned int cycle_count, overflow, counter0, counter1, counter2, counter3, counter4, counter5;

    disable_ccnt();            // Stop CCNT
    disable_pmn(0);            // Stop counter 0
    disable_pmn(1);            // Stop counter 1
    disable_pmn(2);            // Stop counter 2
    disable_pmn(3);            // Stop counter 3
    disable_pmn(4);            // Stop counter 4
    disable_pmn(5);            // Stop counter 5

    counter0 = read_pmn(0);    // Read counter 0
    counter1 = read_pmn(1);    // Read counter 1
    counter2 = read_pmn(2);    // Read counter 2
    counter3 = read_pmn(3);    // Read counter 3
    counter4 = read_pmn(4);    // Read counter 4
    counter5 = read_pmn(5);    // Read counter 5
    cycle_count = read_ccnt(); // Read CCNT
    overflow = read_flags();   // Check for overflow flag

    printf("Counter0=%u,Counter1=%u,Counter2=%u,Counter3=%u,Counter4=%u,Counter5=%u\n",
           counter0, counter1, counter2, counter3, counter4, counter5);
    printf("Overflow flag: = 0x%08x, Cycle Count: = %u\n\n", overflow, cycle_count);
}
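The demo prints the overflow flags as a single number, which is hard to read at a glance. In the overflow flag register, bit [31] belongs to the cycle counter and bit [x] belongs to event counter x, the same layout as the enable registers. The sketch below decodes a flags value and also shows a wrap-safe way to subtract two 32-bit counter readings. The helper names are hypothetical, and the flags value in the usage note is made up rather than captured from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if event counter `counter` has overflowed, according to a
 * value obtained from read_flags(); returns 0 otherwise. */
int pmn_overflowed(uint32_t flags, unsigned counter)
{
    assert(counter < 31);           /* bits 0..30 are event counters */
    return (int)((flags >> counter) & 1u);
}

/* Returns 1 if the cycle counter (bit 31) has overflowed. */
int ccnt_overflowed(uint32_t flags)
{
    return (int)((flags >> 31) & 1u);
}

/* Difference between two readings of the same 32-bit counter.
 * Unsigned arithmetic is modulo 2^32, so a single wrap between the
 * two readings still gives the right answer. */
uint32_t counter_delta(uint32_t before, uint32_t after)
{
    return after - before;
}
```

For an illustrative flags value of 0x80000005, the helpers report that the cycle counter and event counters 0 and 2 overflowed. On real hardware an overflow flag is cleared by writing 1 to the corresponding bit, which is what write_flags() in the driver can be used for.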
Events supported by ARMv8
Events supported by ARMv7
References
https://blog.csdn.net/qq1798831241/article/details/108188200: ARM PMU explained, with usage
https://blog.csdn.net/chichi123137/article/details/80145914: ARM CPU PMU component (Performance Monitoring Unit)
Arm documentation: Using the PMU and the Event Counters in DS-5
https://zhuanlan.zhihu.com/p/276680818: The perf_event framework: ARM PMU hardware
https://github.com/afrojer/armperf/blob/master/v7_pmu.S: PMU PERF
http://t.csdnimg.cn/g5a8k