How does Nsight Compute calculate the Roofline?

  • 1. References
  • 2. Summary
  • 3. How does Nsight Compute calculate the Roofline?
  • 4. Generating the test program
  • 5. Performance at size 8192
  • 6. Computing the Roofline
  • 7. Metric explanation
  • 8. Performance at size 1024
  • 9. Performance at size 128
  • 10. RTX 3060 baseline capability test
  • 11. sm__inst_executed.avg.pct_of_peak_sustained_active
  • 12. Full analysis

While analyzing PyTorch and Triton kernels with the Roofline model, I found that the Peak Work reported by Nsight Compute did not match the GPU's theoretical peak compute. This post digs into why.

1. References

  • Metrics smsp__sass_thread_inst_executed_op
  • sm__sass_thread_inst_executed_op_ffma_
  • Building a roofline model with Nsight Compute
  • NsightComputeCli
  • Roofline_model
  • Analyzing PyTorch and Triton kernels with the Roofline model
  • H800 baseline capability test
  • nvtx-include

2. Summary

  • Theoretical compute: 3584 × 1.852 × 2 = 13.275 TFLOPS
  • Hardware theoretical arithmetic intensity: 36.87 FLOP/byte
  • Measured compute for this testcase via PyTorch timing: 4147.84 GFLOPS (31.2% of peak); measured bandwidth: 6.07 GB/s. (Treated as a black box, the timed window includes both compute and IO, so these numbers are not precise.)
  • The testcase's arithmetic intensity: 682.66 (> 36.87), so it is compute bound
  • By Nsight Compute's formula, PeakWork (FFMA): 9.46 TFLOPS (independent of problem size)
  • By Nsight Compute's formula, PeakTraffic: 349.92 GB/s
  • By Nsight Compute's formula, AchievedWork: 6.02 TFLOPS, i.e. 63% of PeakWork
  • By Nsight Compute's formula, AchievedTraffic: 42.82 Gbyte/second
  • sgemm single kernels: 9936.39 GFLOPS
  • sgemm N=10 without streams: 10260.6 GFLOPS
  • sgemm N=10 with stream: 10339.9 GFLOPS
  • sgemm N=10 batched: 8482.82 GFLOPS
  • Based on this kernel's occupancy, each scheduler can theoretically issue 4.00 warps, below the hardware maximum of 12. The kernel's theoretical occupancy (33.3%) is limited by the number of registers required.

3. How does Nsight Compute calculate the Roofline?

The formulas come from this section file: C:\Program Files\NVIDIA Corporation\Nsight Compute 2024.1.1\sections\SpeedOfLight_HierarchicalSingleRooflineChart.section
Its relevant content is as follows (the actual file contains more):

MetricDefinitions {
  MetricDefinitions {
    Name: "derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2"
    Expression: "sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained * 2"
  }
  MetricDefinitions {
    Name: "derived__smsp__sass_thread_inst_executed_op_ffma_pred_on_x2"
    Expression: "smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed * 2"
  }
}
Rooflines {
  PeakWork {
    ValueCyclesPerSecondExpression {
      ValuePerCycleMetrics {
        Label: "Theoretical Predicated-On FFMA Operations"
        Name: "derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2"
      }
      CyclesPerSecondMetric {
        Label: "SM Frequency"
        Name: "sm__cycles_elapsed.avg.per_second"
      }
    }
  }
  PeakTraffic {
    ValueCyclesPerSecondExpression {
      ValuePerCycleMetrics {
        Label: "Theoretical DRAM Bytes Accessible"
        Name: "dram__bytes.sum.peak_sustained"
      }
      CyclesPerSecondMetric {
        Label: "DRAM Frequency"
        Name: "dram__cycles_elapsed.avg.per_second"
      }
    }
  }
  Options {
    Label: "DRAM Roofline"
  }
  AchievedValues {
    AchievedWork {
      ValueCyclesPerSecondExpression {
        ValuePerCycleMetrics {
          Label: "Predicated-On FFMA Operations Per Cycle"
          Name: "derived__smsp__sass_thread_inst_executed_op_ffma_pred_on_x2"
        }
        CyclesPerSecondMetric {
          Label: "SM Frequency"
          Name: "smsp__cycles_elapsed.avg.per_second"
        }
      }
    }
    AchievedTraffic {
      Metric {
        Label: "DRAM Bandwidth"
        Name: "dram__bytes.sum.per_second"
        Filter {
          MaxArch: CC_70
        }
      }
    }
  }
}

4. Generating the test program

tee Theoretical_FLOPS.py <<-'EOF'
import sys
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis, ActivationCountAnalysis

# Define a test model: a single bias-free Linear layer
class SimpleModel(nn.Module):
    def __init__(self, input_features, output_features):
        super(SimpleModel, self).__init__()
        self.fc1 = torch.nn.utils.skip_init(nn.Linear, input_features, output_features, bias=False)

    def forward(self, x):
        x = self.fc1(x)
        return x

input_features = int(sys.argv[1])
output_features = input_features
batch_size = input_features

model = SimpleModel(input_features, output_features).cuda()
input_data = torch.ones(batch_size, input_features).cuda()
test_count = 10

# Count FLOPs and activations
flops = FlopCountAnalysis(model, input_data).total() * test_count
activations = ActivationCountAnalysis(model, input_data).total() + input_data.numel()
print("activations:", activations)

# Parameter count
params = sum(p.numel() for p in model.parameters())

# Memory traffic: activations and params times 4 bytes (assuming float32)
activation_memory_access = activations * 4
params_memory_access = params * 4
memory_access = activation_memory_access + params_memory_access
memory_access = memory_access * test_count

# warmup
output = model(input_data)
torch.cuda.synchronize()

start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)
start_event.record()
for i in range(test_count):
    output = model(input_data)
end_event.record()
torch.cuda.synchronize()
total_cuda_time = start_event.elapsed_time(end_event) / 1000  # convert to seconds

# FLOPs in GFLOPs
flops_measured_glops = flops / 1e9

# Measured memory bandwidth
memory_access_gb = memory_access / 1e9
bandwidth_measured = memory_access_gb / total_cuda_time  # GB/s
arithmetic_intensity_measured = flops_measured_glops / memory_access_gb  # GFLOPs/GB (a static property of the algorithm)
flops_measured = arithmetic_intensity_measured * bandwidth_measured

# RTX 3060 peak performance and bandwidth
peak_performance = 13.275136 * 1e3  # GFLOPs
memory_bandwidth = 360.0            # GB/s
print("arithmetic_intensity:", peak_performance / memory_bandwidth)
print("flops_measured:", flops_measured, flops_measured / peak_performance)
print("bandwidth_measured:", bandwidth_measured)
print("total_cuda_time:", total_cuda_time)
print("arithmetic_intensity_measured:", arithmetic_intensity_measured)

# ncu starts collecting metrics from here
import nvtx
with nvtx.annotate("kernel_prof", color="blue"):
    output = model(input_data)
torch.cuda.synchronize()
EOF

5. Performance at size 8192

/usr/local/cuda/bin/ncu --nvtx --nvtx-include "kernel_prof/"  --metrics sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained,smsp__cycles_elapsed.avg.per_second,smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed,sm__cycles_elapsed.avg.per_second,dram__bytes.sum.peak_sustained,dram__bytes.sum.per_second,dram__cycles_elapsed.avg.per_second python Theoretical_FLOPS.py  8192

Output:

activations: 134217728
arithmetic_intensity: 36.87537777777778
flops_measured: 4147.841730822858 0.3124519199519205
bandwidth_measured: 6.075940035385045
total_cuda_time: 1.325402099609375
arithmetic_intensity_measured: 682.6666666666667
==PROF== Profiling "ampere_sgemm_128x128_tn" - 0: 0%....50%....100% - 3 passes
==PROF== Disconnected from process 266138
[266138] python3.10@127.0.0.1
  ampere_sgemm_128x128_tn (64, 64, 1)x(256, 1, 1), Context 1, Stream 7, Device 0, CC 8.6
    NVTX Push/Pop Stack for Thread 266138:
      <default domain>
        <0,kernel_prof>
          RGB: 0xff
          REGISTERED: kernel_prof
    Section: Command line profiler metrics
    --------------------------------------------------------------------- ------------- ------------
    Metric Name                                                             Metric Unit Metric Value
    --------------------------------------------------------------------- ------------- ------------
    dram__bytes.sum.peak_sustained                                           byte/cycle           48
    dram__bytes.sum.per_second                                             Gbyte/second        42.84
    dram__cycles_elapsed.avg.per_second                                   cycle/nsecond         7.29
    sm__cycles_elapsed.avg.per_second                                     cycle/nsecond         1.32
    sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained         inst/cycle        3,584
    smsp__cycles_elapsed.avg.per_second                                   cycle/nsecond         1.32
    smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed    inst/cycle     2,282.58
    --------------------------------------------------------------------- ------------- ------------

6. Computing the Roofline

# Peak performance and bandwidth
PeakWork = sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained * 2 * sm__cycles_elapsed.avg.per_second = 3584 * 2 * 1.32 inst/nsecond = 9.46 TFLOPS
PeakTraffic = dram__bytes.sum.peak_sustained * dram__cycles_elapsed.avg.per_second = 48 * 7.29 byte/nsecond = 349.92 GB/s

# Achieved performance and bandwidth
AchievedWork = smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed * 2 * smsp__cycles_elapsed.avg.per_second = 2282.58 * 2 * 1.32 inst/nsecond = 6.02 TFLOPS
AchievedTraffic = dram__bytes.sum.per_second = 42.82 Gbyte/second
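The same arithmetic in executable form, plugging in the metric values collected at size 8192 (a sketch; the constants are the measured values from section 5):

```python
# Roofline values reconstructed from the ncu metrics at size 8192.
ffma_peak_per_cycle  = 3584      # sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained
ffma_per_cycle       = 2282.58   # smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed
sm_ghz               = 1.32      # sm__cycles_elapsed.avg.per_second (cycle/nsecond == GHz)
dram_bytes_per_cycle = 48        # dram__bytes.sum.peak_sustained
dram_ghz             = 7.29      # dram__cycles_elapsed.avg.per_second

peak_work_tflops     = ffma_peak_per_cycle * 2 * sm_ghz / 1000   # ~9.46
peak_traffic_gbs     = dram_bytes_per_cycle * dram_ghz           # ~349.92
achieved_work_tflops = ffma_per_cycle * 2 * sm_ghz / 1000        # ~6.03
utilization          = ffma_per_cycle / ffma_peak_per_cycle      # ~0.64, the ~63% in the summary

print(peak_work_tflops, peak_traffic_gbs, achieved_work_tflops, utilization)
```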

7. Metric explanation

Compared with sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained, the key difference of smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed is that it measures the rate actually achieved per elapsed clock cycle rather than the sustained peak. The two metrics describe the GPU's FFMA execution from different angles. The names break down as follows:
smsp: the metric is collected per SM sub-partition; each Ampere SM contains four sub-partitions (one per warp scheduler), while sm-prefixed metrics aggregate over the whole Streaming Multiprocessor.
sass: Shader Assembly, NVIDIA's native GPU instruction set — the instructions actually executed on the hardware.
thread_inst_executed: the number of instructions executed by GPU threads.
op_ffma: fused multiply-add (FFMA) operations, which perform a multiply and an add in a single instruction; they dominate floating-point workloads, and each one counts as two FLOPs.
pred_on: only instructions whose predicate (condition) evaluated to true are counted.
sum: the values are summed over the collection units (e.g. all sub-partitions).
per_cycle_elapsed: the sum is normalized per elapsed GPU clock cycle, giving the average number of FFMA thread-instructions executed per cycle — a measure of realized efficiency.
peak_sustained: the maximum rate the hardware can sustain per cycle. This is a hardware constant (the tests below show it does not change with problem size), i.e. the theoretical per-cycle ceiling.
Achieved vs. peak: per_cycle_elapsed is the achieved average rate, while peak_sustained is the ceiling. In other words, smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed tells you how efficiently the code used each cycle, and sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained tells you the most the hardware could do. Their ratio is the FFMA utilization, and multiplying each by the SM frequency (and by 2 FLOPs per FFMA) yields the achieved and peak lines of the roofline, so the two together give a complete picture of GPU performance.

8. Performance at size 1024

/usr/local/cuda/bin/ncu --nvtx --nvtx-include "kernel_prof/"  --metrics sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained,smsp__cycles_elapsed.avg.per_second,smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed,sm__cycles_elapsed.avg.per_second,dram__bytes.sum.peak_sustained,dram__bytes.sum.per_second,dram__cycles_elapsed.avg.per_second python Theoretical_FLOPS.py  1024

Output:

activations: 2097152
arithmetic_intensity: 36.87537777777778
flops_measured: 1470.6536158405618 0.1107825649274374
bandwidth_measured: 17.23422206063158
total_cuda_time: 0.007301119804382325
arithmetic_intensity_measured: 85.33333333333334
==PROF== Profiling "ampere_sgemm_128x64_tn" - 0: 0%....50%....100% - 3 passes
==PROF== Disconnected from process 267209
[267209] python3.10@127.0.0.1
  ampere_sgemm_128x64_tn (8, 16, 3)x(128, 1, 1), Context 1, Stream 7, Device 0, CC 8.6
    NVTX Push/Pop Stack for Thread 267209:
      <default domain>
        <0,kernel_prof>
          RGB: 0xff
          REGISTERED: kernel_prof
    Section: Command line profiler metrics
    --------------------------------------------------------------------- ------------- ------------
    Metric Name                                                             Metric Unit Metric Value
    --------------------------------------------------------------------- ------------- ------------
    dram__bytes.sum.peak_sustained                                           byte/cycle           48
    dram__bytes.sum.per_second                                             Gbyte/second       100.81
    dram__cycles_elapsed.avg.per_second                                   cycle/nsecond         7.29
    sm__cycles_elapsed.avg.per_second                                     cycle/nsecond         1.32
    sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained         inst/cycle        3,584
    smsp__cycles_elapsed.avg.per_second                                   cycle/nsecond         1.32
    smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed    inst/cycle     2,057.05
    --------------------------------------------------------------------- ------------- ------------

9. Performance at size 128

/usr/local/cuda/bin/ncu --nvtx --nvtx-include "kernel_prof/"  --metrics sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained,smsp__cycles_elapsed.avg.per_second,smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed,sm__cycles_elapsed.avg.per_second,dram__bytes.sum.peak_sustained,dram__bytes.sum.per_second,dram__cycles_elapsed.avg.per_second python Theoretical_FLOPS.py  128

Output:

activations: 32768
arithmetic_intensity: 36.87537777777778
flops_measured: 3.9713012060185076 0.0002991533349276804
bandwidth_measured: 0.37230948806423503
total_cuda_time: 0.005280767917633057
arithmetic_intensity_measured: 10.666666666666668
==PROF== Profiling "ampere_sgemm_32x32_sliced1x4_tn" - 0: 0%....50%....100% - 3 passes
==PROF== Disconnected from process 267388
[267388] python3.10@127.0.0.1
  ampere_sgemm_32x32_sliced1x4_tn (4, 4, 1)x(128, 1, 1), Context 1, Stream 7, Device 0, CC 8.6
    NVTX Push/Pop Stack for Thread 267388:
      <default domain>
        <0,kernel_prof>
          RGB: 0xff
          REGISTERED: kernel_prof
    Section: Command line profiler metrics
    --------------------------------------------------------------------- ------------- ------------
    Metric Name                                                             Metric Unit Metric Value
    --------------------------------------------------------------------- ------------- ------------
    dram__bytes.sum.peak_sustained                                           byte/cycle           48
    dram__bytes.sum.per_second                                             Gbyte/second        17.21
    dram__cycles_elapsed.avg.per_second                                   cycle/nsecond         7.24
    sm__cycles_elapsed.avg.per_second                                     cycle/nsecond         1.31
    sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained         inst/cycle        3,584
    smsp__cycles_elapsed.avg.per_second                                   cycle/nsecond         1.31
    smsp__sass_thread_inst_executed_op_ffma_pred_on.sum.per_cycle_elapsed    inst/cycle       185.00
    --------------------------------------------------------------------- ------------- ------------

Testing at different problem sizes shows that dram__bytes.sum.peak_sustained and sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained do not change with size: the peak_sustained metrics are hardware constants, independent of the workload.
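Since the peak metrics are fixed, only the achieved side of the roofline moves with problem size; converting each run's per_cycle_elapsed value makes the drop-off explicit (a sketch using the measured values from sections 5, 8 and 9):

```python
# AchievedWork = per_cycle_elapsed * 2 * SM GHz, for each of the three runs.
runs = {8192: (2282.58, 1.32), 1024: (2057.05, 1.32), 128: (185.00, 1.31)}
ffma_peak_per_cycle = 3584  # hardware constant, identical in all three runs

for n, (ffma_per_cycle, sm_ghz) in runs.items():
    tflops = ffma_per_cycle * 2 * sm_ghz / 1000
    util = ffma_per_cycle / ffma_peak_per_cycle
    print(f"N={n:5d}: {tflops:5.2f} TFLOPS achieved ({util:.1%} of theoretical FFMA issue)")
```

Even at N=1024 the kernel still reaches over half of theoretical FFMA issue, but at N=128 the GPU is nearly idle: the achieved point slides far below the roof while both roofs stay put.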

10. RTX 3060 baseline capability test

git clone https://www.github.com/nvidia/cuda-samples
cd cuda-samples/Samples/1_Utilities/deviceQuery
make clean && make
./deviceQuery
cd ../bandwidthTest/
make clean && make
./bandwidthTest
cd ../../4_CUDA_Libraries/batchCUBLAS/
make clean && make
./batchCUBLAS -m8192 -n8192 -k8192 --device=0
 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          12.2 / 12.1
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12044 MBytes (12629377024 bytes)
  (028) Multiprocessors, (128) CUDA Cores/MP:    3584 CUDA Cores
  GPU Max Clock rate:                            1852 MHz (1.85 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.1, NumDevs = 1
Result = PASS

Running on...
 Device 0: NVIDIA GeForce RTX 3060
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     12.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     13.2

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     326.3

Result = PASS

gpuDeviceInit() CUDA Device [0]: "Ampere

==== Running single kernels ====
Testing sgemm
#### args: ta=0 tb=0 m=8192 n=8192 k=8192  alpha = (0xbf800000, -1) beta= (0x40000000, 2)
#### args: lda=8192 ldb=8192 ldc=8192
^^^^ elapsed = 0.11065507 sec  GFLOPS=9936.39
@@@@ sgemm test OK==== Running N=10 without streams ====
Testing sgemm
#### args: ta=0 tb=0 m=8192 n=8192 k=8192  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=8192 ldb=8192 ldc=8192
^^^^ elapsed = 1.07158208 sec  GFLOPS=10260.6
@@@@ sgemm test OK==== Running N=10 with streams ====
Testing sgemm
#### args: ta=0 tb=0 m=8192 n=8192 k=8192  alpha = (0xbf800000, -1) beta= (0x00000000, 0)
#### args: lda=8192 ldb=8192 ldc=8192
^^^^ elapsed = 1.06336808 sec  GFLOPS=10339.9
@@@@ sgemm test OK==== Running N=10 batched ====
Testing sgemm
#### args: ta=0 tb=0 m=8192 n=8192 k=8192  alpha = (0x40000000, 2) beta= (0x40000000, 2)
#### args: lda=8192 ldb=8192 ldc=8192
^^^^ elapsed = 1.29616284 sec  GFLOPS=8482.82
@@@@ sgemm test OK

FP32 theoretical compute

(028) Multiprocessors, (128) CUDA Cores/MP:    3584 CUDA Cores
GPU Max Clock rate:                            1852 MHz (1.85 GHz)
3584 * 1.85 GHz * 2 (FMA) = 13.26 TFLOPS
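The peak DRAM bandwidth can be derived from the deviceQuery output the same way. A sketch (assuming the reported 7501 MHz memory clock is the double-data-rate base, so the effective rate carries a factor of 2):

```python
# Peak compute and peak DRAM bandwidth from the deviceQuery numbers above.
cuda_cores = 28 * 128                # 28 SMs * 128 FP32 cores/SM = 3584
boost_ghz  = 1.852
fp32_gflops = cuda_cores * boost_ghz * 2          # FMA = 2 FLOPs -> ~13275 GFLOPS

mem_clock_mhz = 7501                 # "Memory Clock rate" from deviceQuery
bus_bytes = 192 // 8                 # 192-bit memory bus
dram_gbs = mem_clock_mhz * 2 * bus_bytes / 1000   # ~360 GB/s, matching the script's assumption

print(fp32_gflops, dram_gbs)
```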

11.sm__inst_executed.avg.pct_of_peak_sustained_active

/usr/local/cuda/bin/ncu --nvtx --nvtx-include "kernel_prof/"  --metrics sm__inst_executed.avg.pct_of_peak_sustained_active python Theoretical_FLOPS.py  8192

Output:

ampere_sgemm_128x128_tn (64, 64, 1)x(256, 1, 1), Context 1, Stream 7, Device 0, CC 8.6
  NVTX Push/Pop Stack for Thread 270100:
    <default domain>
      <0,kernel_prof>
        RGB: 0xff
        REGISTERED: kernel_prof
  Section: Command line profiler metrics
  -------------------------------------------------- ----------- ------------
  Metric Name                                        Metric Unit Metric Value
  -------------------------------------------------- ----------- ------------
  sm__inst_executed.avg.pct_of_peak_sustained_active           %        73.47
  -------------------------------------------------- ----------- ------------

12. Full analysis

/usr/local/cuda/bin/ncu  --nvtx --nvtx-include "kernel_prof/"   -f --set full --export roofline_report python Theoretical_FLOPS.py  8192

Based on this kernel's occupancy, each scheduler can theoretically issue 4.00 warps, below the hardware maximum of 12. The kernel's theoretical occupancy (33.3%) is limited by the number of registers required.
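The 33.3% figure can be reconstructed from the deviceQuery limits (a sketch; the 4 schedulers per SM is an Ampere architectural constant, not shown in the output above):

```python
# Reconstruct ncu's theoretical-occupancy statement for this kernel.
max_threads_per_sm = 1536             # from deviceQuery
warp_size = 32
schedulers_per_sm = 4                 # Ampere SMs have 4 warp schedulers (sub-partitions)

max_warps_per_sm = max_threads_per_sm // warp_size                # 48
max_warps_per_scheduler = max_warps_per_sm // schedulers_per_sm   # 12, the hardware max ncu cites
issuable_warps_per_scheduler = 4.0    # limited by register usage, per ncu

occupancy = issuable_warps_per_scheduler / max_warps_per_scheduler
print(f"{occupancy:.1%}")  # 33.3%
```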
