Table of Contents
1. Overview of the Example Collection
2. Interface
2.1 Aggregation_Disaggregation
2.1.1 aggregation_of_m_axi_ports
2.1.2 aggregation_of_nested_structs
2.1.3 aggregation_of_struct
2.1.4 auto_disaggregation_of_struct
2.1.5 disaggregation_of_axis_port
2.1.6 struct_ii_issue
2.2 Memory
2.2.1 ecc_flags
2.2.2 manual_burst
2.2.3 max_widen_port_width
2.2.4 memory_bottleneck
2.2.5 ram_uram
2.2.6 rom_lookup_table_math
2.2.7 using_axi_master
2.3 Register
2.3.1 using_axi_lite
2.3.2 using_axi_lite_with_user_defined_offset
2.4 Streaming
2.4.1 axi_stream_to_master
2.4.2 using_array_of_streams
2.4.3 using_axi_stream_no_side_channel_data
2.4.4 using_axi_stream_with_side_channel_data
2.4.5 using_axi_stream_with_struct
2.4.6 using_axis_array_stream_no_side_channel_data
3. Pipelining
3.1 Functions
3.1.1 function_instantiate
3.1.2 hier_func
3.2 Loops
3.2.1 imperfect_loop
3.2.2 perfect_loop
3.2.3 pipelined_loop
3.2.4 using_free_running_pipeline
4. Task_Level_Parallelism
4.1 Control_driven
4.1.1 Bypassing
4.1.1.1 input_bypass
4.1.1.2 middle_bypass
4.1.1.3 output_bypass
4.1.2 Channels
4.1.2.1 Vitis
4.1.2.2 merge_split
4.1.2.3 simple_fifos
4.1.2.4 using_fifos
4.1.2.5 using_pipos
4.1.2.6 using_stream_of_blocks
4.2 Data_driven
4.2.1 handling_deadlock
4.2.2 mixed_control_and_data_driven
4.2.3 simple_data_driven
5. Modeling
5.1 Pointers
5.1.1 basic_arithmetic
5.1.2 basic_pointers
5.1.3 multiple_pointers
5.1.4 native_casts
5.1.5 stream_better
5.1.6 stream_good
5.1.7 using_double
5.2 basic_loops_primer
5.3 fixed_point_sqrt
5.4 free_running_kernel_remerge_ii4to1
5.5 using_C++_templates
5.6 using_arbitrary_precision_arith
5.7 using_arbitrary_precision_casting
5.8 using_fixed_point
5.9 using_float_and_double
5.10 using_vectors
5.11 variable_bound_loops
6. Misc
6.1 initialization_and_reset
6.1.1 global_array_RAM
6.1.2 static_array_RAM
6.1.3 static_array_ROM
6.1.4 static_array_of_struct_with_array_RAM
6.1.5 static_struct_with_array_RAM
6.1.6 static_struct_with_array_RAM_Versal
6.2 malloc_removed
6.3 rtl_as_blackbox
7. Study Plan
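1. Overview of the Example Collection
Repository: https://github.com/Xilinx/Vitis-HLS-Introductory-Examples
This example collection complements my earlier post, "Vitis HLS Study Notes: HLS Optimization Directive Examples (Table of Contents)". That post demonstrates HLS optimization directives, while this collection demonstrates HLS features. The examples in the earlier post require compiling both the host code and the PL (programmable logic) code, whereas the examples here run entirely inside the Vitis HLS simulation environment, which makes the results easier to observe. The two sets complement each other and together support a deeper understanding of Vitis HLS.
The collection is organized into the following categories:
- Interface: common examples of the various interface modes and protocols
- Pipelining: common examples of the pipeline pragma applied to loops and functions
- Task_Level_Parallelism: examples of the task-level parallel programming model and its topologies
- Modeling: math and DSP examples plus other commonly used models and algorithms
- Misc: other examples, such as using an RTL black box from C++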
2. Interface
2.1 Aggregation_Disaggregation
2.1.1 aggregation_of_m_axi_ports
#pragma HLS AGGREGATE compact=auto
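As a rough illustration of where this pragma goes, the sketch below applies it to a struct array argument. The struct `Pixel`, the function `sum_pixels`, and the length `N_PIXELS` are hypothetical names chosen for this note and are not taken from the repository. A regular C++ compiler ignores the pragma, so the function can still be run in C simulation.

```cpp
#include <cstdint>

// Hypothetical struct for illustration only (not from the example repo).
struct Pixel {
    uint8_t r;
    uint8_t g;
    uint8_t b;
};

const int N_PIXELS = 8;  // assumed array length

// Sums all channels of all pixels.
int sum_pixels(const Pixel in[N_PIXELS]) {
// Ask the tool to pack the struct fields into a single wide word and
// let it choose the packing mode (compact=auto).
#pragma HLS AGGREGATE variable=in compact=auto
    int total = 0;
    for (int i = 0; i < N_PIXELS; i++) {
        total += in[i].r + in[i].g + in[i].b;
    }
    return total;
}
```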
2.1.2 aggregation_of_nested_structs
Nested structs.
2.1.3 aggregation_of_struct
2.1.4 auto_disaggregation_of_struct
2.1.5 disaggregation_of_axis_port
2.1.6 struct_ii_issue
Initiation interval (II) violation.
2.2 Memory
2.2.1 ecc_flags
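ECC (Error Correction Code).
2.2.2 manual_burst
If automatic burst inference does not occur in a design, the hls::burst_maxi data type can be used to perform bursts manually.
2.2.3 max_widen_port_width
max_widen_bitwidth is an optional parameter, because the compiler already adjusts the port data width automatically according to the data type.
2.2.4 memory_bottleneck
Achieve II=1 by removing redundant memory accesses in the code.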
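The idea can be sketched as follows, assuming a sliding three-element sum over an array. The function names, the length `N`, and the use of `int` are assumptions for this note. The first version reads the same memory three times per iteration; the second reads each element once and carries earlier values in local variables.

```cpp
const int N = 128;  // assumed array length

// Straightforward version: three reads of mem per loop iteration.
int sum_naive(const int mem[N]) {
    int acc = 0;
    for (int i = 2; i < N; i++) {
        acc += mem[i] + mem[i - 1] + mem[i - 2];
    }
    return acc;
}

// Rewritten version: each element is read from memory once and then
// carried in local variables (registers in hardware).
int sum_cached(const int mem[N]) {
    int acc = 0;
    int tmp0 = mem[0];
    int tmp1 = mem[1];
    for (int i = 2; i < N; i++) {
#pragma HLS PIPELINE II=1
        int tmp2 = mem[i];  // the only memory read in this iteration
        acc += tmp2 + tmp1 + tmp0;
        tmp0 = tmp1;
        tmp1 = tmp2;
    }
    return acc;
}
```

Both functions return the same value; only the number of memory reads per iteration differs, and that is what limits the achievable II when the memory has few ports.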
2.2.5 ram_uram
BIND_STORAGE type=ram_2p impl=uram; DEPENDENCE inter WAR false (WAR = Write-After-Read).
2.2.6 rom_lookup_table_math
sin_table[i] = (din1_t)(32768.0 * real_val);
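A minimal sketch of how such a table might be built, assuming a 256-entry table of 16-bit values covering half a sine period. The type `int16_t` stands in for `din1_t`, and the names `TABLE_SIZE`, `PI`, and `sin_lookup` are assumptions for this note. Because the table is only read after it is filled with values that do not depend on the inputs, an HLS tool can typically implement it as a ROM.

```cpp
#include <cmath>
#include <cstdint>

const int TABLE_SIZE = 256;  // assumed table length
const double PI = 3.14159265358979323846;

// Fills the table with sine values scaled to the 16-bit range.
// Index 0 maps to -pi/2 and index 128 maps to 0.
void init_sin_table(int16_t sin_table[TABLE_SIZE]) {
    for (int i = 0; i < TABLE_SIZE; i++) {
        double real_val = std::sin(PI * (i - 128) / 256.0);
        sin_table[i] = (int16_t)(32768.0 * real_val);
    }
}

// Looks up a precomputed value instead of evaluating sin() per call.
int16_t sin_lookup(uint8_t idx) {
    int16_t sin_table[TABLE_SIZE];
    init_sin_table(sin_table);
    return sin_table[idx];
}
```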
2.2.7 using_axi_master
INTERFACE m_axi port=a depth=50
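For context, the directive above is normally written as a pragma inside the top function. The sketch below assumes a function that reads 50 integers through the pointer, adds 100 to each, and writes them back; the function body and the name `DEPTH` are assumptions for this note. The `depth` option tells C/RTL co-simulation how many elements are accessed through the port.

```cpp
#include <cstring>

const int DEPTH = 50;  // matches depth=50 in the pragma

// Reads 50 ints through the pointer, adds 100 to each, writes them back.
void example(int *a) {
// Map the pointer argument to an AXI4 master port.
#pragma HLS INTERFACE m_axi port=a depth=50
    int buff[DEPTH];
    std::memcpy(buff, a, DEPTH * sizeof(int));  // bulk read into a local buffer
    for (int i = 0; i < DEPTH; i++) {
        buff[i] += 100;
    }
    std::memcpy(a, buff, DEPTH * sizeof(int));  // bulk write back
}
```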
2.3 Register
2.3.1 using_axi_lite
2.3.2 using_axi_lite_with_user_defined_offset
2.4 Streaming
2.4.1 axi_stream_to_master
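hls::stream<int,…> count; is used so that the tool can more easily optimize the design into a pipeline automatically.
2.4.2 using_array_of_streams
hls::stream<int> s_in[M]: an array of streams.
2.4.3 using_axi_stream_no_side_channel_data
No side channels: hls::axis<type, 0, 0, 0>, as opposed to a stream that also carries control signals.
2.4.4 using_axi_stream_with_side_channel_data
With side channels: hls::axis<type, WUser, WId, WDest>.
2.4.5 using_axi_stream_with_struct
See the slide deck "HLS - Interface Synthesis: Paradigms".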
2.4.6 using_axis_array_stream_no_side_channel_data
3. Pipelining
3.1 Functions
3.1.1 function_instantiate
Function instantiation.
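A minimal sketch of function instantiation, assuming a small helper that is called with a different constant each time. The names `func` and `top` and the constants are assumptions for this note. The FUNCTION_INSTANTIATE pragma asks the tool to create a specialized copy of the function for each distinct value of the named argument, so each copy can be simplified separately.

```cpp
// Helper whose second argument is a constant at every call site.
char func(char inval, char incr) {
#pragma HLS INLINE OFF
// Create a separate, specialized copy of func per distinct value of incr.
#pragma HLS FUNCTION_INSTANTIATE variable=incr
    return static_cast<char>(inval + incr);
}

// Top function: three calls, each with a different constant increment.
void top(char in1, char in2, char in3, char *out1, char *out2, char *out3) {
    *out1 = func(in1, 0);
    *out2 = func(in2, 1);
    *out3 = func(in3, 100);
}
```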
3.1.2 hier_func
Hierarchical functions; the __SYNTHESIS__ macro.
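The sketch below shows one way the `__SYNTHESIS__` macro is commonly used in a small function hierarchy: code guarded by `#ifndef __SYNTHESIS__` is compiled for C simulation and skipped when the tool synthesizes the design. The functions `add_sub` and `hier_top` are hypothetical and not taken from the repository.

```cpp
#include <cstdio>

// Sub-function: adds or subtracts depending on the flag.
int add_sub(int a, int b, bool subtract) {
    int result = subtract ? (a - b) : (a + b);
#ifndef __SYNTHESIS__
    // Debug-only code: present in C simulation, excluded from synthesis.
    std::printf("add_sub(%d, %d) -> %d\n", a, b, result);
#endif
    return result;
}

// Top-level function built from calls to the sub-function above.
void hier_top(int a, int b, int *sum, int *diff) {
    *sum = add_sub(a, b, false);
    *diff = add_sub(a, b, true);
}
```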
3.2 Loops
3.2.1 imperfect_loop
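The loop bound is a variable, or part of the loop body sits in an outer loop.
3.2.2 perfect_loop
The loop bounds are fixed constants and the loop body appears only in the innermost loop.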
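To make the difference between 3.2.1 and 3.2.2 concrete, here is a small sketch with one nest of each kind. The array sizes and function names are assumptions for this note.

```cpp
const int ROWS = 4;  // assumed dimensions
const int COLS = 5;

// Perfect nest: constant bounds, all work inside the innermost loop.
int sum_perfect(int A[ROWS][COLS]) {
    int acc = 0;
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            acc += A[i][j];
        }
    }
    return acc;
}

// Imperfect nest: statements sit between the two loop headers.
void row_sums_imperfect(int A[ROWS][COLS], int B[ROWS]) {
    for (int i = 0; i < ROWS; i++) {
        int acc = 0;  // work in the outer loop body
        for (int j = 0; j < COLS; j++) {
            acc += A[i][j];
        }
        B[i] = acc;   // more work in the outer loop body
    }
}
```

A perfect nest can generally be flattened by the tool into a single loop, which avoids the extra cycles spent entering and leaving the inner loop.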
3.2.3 pipelined_loop
3.2.4 using_free_running_pipeline
DATAFLOW
4. Task_Level_Parallelism
4.1 Control_driven
4.1.1 Bypassing
4.1.1.1 input_bypass
(a->t2, t2->t3) (b->k1; k1->k2; k2->k3)
4.1.1.2 middle_bypass
(a->t1, t1->t3) (b->k1; k1->k2; k2->k3)
4.1.1.3 output_bypass
(a->t1, t1->t2) (b->k1; k1->k2; k2->k3)
4.1.2 Channels
4.1.2.1 Vitis
use FIFOs instead of the default PIPOs on host
4.1.2.2 merge_split
<hls_np_channel.h> (number of parallel channel)
4.1.2.3 simple_fifos
config_dataflow -default_channel fifo -fifo_depth 2
4.1.2.4 using_fifos
#pragma HLS performance target_ti=32, where ti is the transaction interval.
4.1.2.5 using_pipos
4.1.2.6 using_stream_of_blocks
hls::stream_of_blocks<block_data_t>
4.2 Data_driven
4.2.1 handling_deadlock
hls_thread_local hls::stream<int, 100> s1;
4.2.2 mixed_control_and_data_driven
hls_thread_local hls::task t[4];
4.2.3 simple_data_driven
5. Modeling
5.1 Pointers
5.1.1 basic_arithmetic
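The function has no return value, but it modifies the array data through a pointer, so the pointer effectively serves as the function's output.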
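A minimal sketch of this pattern, assuming a fixed-length buffer that is scaled in place. The name `scale_in_place` and the length `LEN` are assumptions for this note.

```cpp
const int LEN = 4;  // assumed buffer length

// No return value: the result is delivered by writing through the pointer.
void scale_in_place(int *d, int factor) {
    for (int i = 0; i < LEN; i++) {
        *(d + i) = *(d + i) * factor;  // pointer arithmetic, same as d[i]
    }
}
```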
5.1.2 basic_pointers
A static variable is usually implemented in hardware as a register or a memory element, and its value persists across calls.
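The persistence across calls can be seen in a very small sketch. The function `running_sum` is hypothetical.

```cpp
// Running total kept in a static variable: it is initialized once and
// keeps its value from one call to the next.
int running_sum(int x) {
    static int acc = 0;
    acc += x;
    return acc;
}
```

Calling the function with 5, then 7, then -2 returns 5, 12, and 10, because `acc` is not reset between calls.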
5.1.3 multiple_pointers
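For a local static variable, the scope restriction is enforced by the compiler.
5.1.4 native_casts
5.1.5 stream_better
Defaults to ap_hs.
5.1.6 stream_good
ap_fifo can be selected with a Tcl script command or with the #pragma HLS INTERFACE directive.
5.1.7 using_double
Double pointers (pointer to pointer) should be avoided where possible, because the extra level of indirection when accessing data adds logic overhead.
5.2 basic_loops_primer
pipeline off, unroll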
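As a sketch of how these two directives are typically attached to loops, the example below turns pipelining off for one loop and partially unrolls another. The function names, the length `LEN`, and the unroll factor are assumptions for this note.

```cpp
const int LEN = 8;  // assumed array length

// Pipelining disabled: iterations execute strictly one after another.
int sum_rolled(const int in[LEN]) {
    int acc = 0;
    for (int i = 0; i < LEN; i++) {
#pragma HLS PIPELINE off
        acc += in[i];
    }
    return acc;
}

// Partial unroll: the tool replicates the loop body twice.
void add_unrolled(const int a[LEN], const int b[LEN], int out[LEN]) {
    for (int i = 0; i < LEN; i++) {
#pragma HLS UNROLL factor=2
        out[i] = a[i] + b[i];
    }
}
```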
5.3 fixed_point_sqrt
This example uses a custom sqrt function; it is still advisable to prefer <hls_math.h>.
5.4 free_running_kernel_remerge_ii4to1
Initiation Interval (II), ap_ctrl_none
5.5 using_C++_templates
5.6 using_arbitrary_precision_arith
<ap_int.h>
5.7 using_arbitrary_precision_casting
5.8 using_fixed_point
5.9 using_float_and_double
5.10 using_vectors
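hls::vector<T, N>, suited to SIMD (Single Instruction, Multiple Data) operations.
5.11 variable_bound_loops
The variable loop bound problem: the bound is a function argument, so it is unknown at compile time and must be passed in at run time.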
Loop: for (x = 0; x < width; x++) {
  out_accum += A[x];
}
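Two common ways of dealing with a variable bound are sketched below, under the assumption that `width` never exceeds a known maximum (`MAX_WIDTH`, a name introduced for this note). The first keeps the variable bound and only tells the tool the expected range for reporting. The second rewrites the loop with a fixed bound and moves the variable into a condition, which lets the tool unroll or otherwise optimize the loop.

```cpp
#include <cassert>

const int MAX_WIDTH = 32;  // assumed upper limit on width

// Variant 1: keep the variable bound, annotate the expected trip count.
// LOOP_TRIPCOUNT affects reporting only, not the generated hardware.
int accum_tripcount(const int A[MAX_WIDTH], int width) {
    assert(width <= MAX_WIDTH);
    int out_accum = 0;
    for (int x = 0; x < width; x++) {
#pragma HLS LOOP_TRIPCOUNT min=1 max=32
        out_accum += A[x];
    }
    return out_accum;
}

// Variant 2: fixed loop bound, variable moved into a condition.
int accum_fixed_bound(const int A[MAX_WIDTH], int width) {
    int out_accum = 0;
    for (int x = 0; x < MAX_WIDTH; x++) {
        if (x < width) {
            out_accum += A[x];
        }
    }
    return out_accum;
}
```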
6. Misc
6.1 initialization_and_reset
6.1.1 global_array_RAM
A global array is an array defined outside any function, for example: ap_int<10> A[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
6.1.2 static_array_RAM
static ap_int<10> A[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
6.1.3 static_array_ROM
BIND_STORAGE variable=A type=ROM_1P impl=BRAM;
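In source form the directive is usually placed right after the array declaration. The sketch below assumes a read-only lookup function; plain `int` stands in for `ap_int<10>` so that it compiles without the vendor headers, and the name `rom_read` is hypothetical.

```cpp
// Read-only static array bound to a single-port ROM implemented in block RAM.
int rom_read(int idx) {
    static const int A[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
#pragma HLS BIND_STORAGE variable=A type=ROM_1P impl=BRAM
    return A[idx];  // caller must keep idx in the range 0..9
}
```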
6.1.4 static_array_of_struct_with_array_RAM
An array of structs.
6.1.5 static_struct_with_array_RAM
A struct.
6.1.6 static_struct_with_array_RAM_Versal
6.2 malloc_removed
#include "malloc_removed.h"
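Since dynamic allocation cannot be synthesized, the usual approach is to replace each malloc with fixed-size storage while keeping the same pointer names. The sketch below illustrates that approach under assumptions: the macro `NO_SYNTH`, the length `N`, and the computation itself are chosen for this note and may differ from the repository's code.

```cpp
#include <cstdlib>

const int N = 32;  // assumed input length

// Sums the squares of the first `width` inputs.
long long malloc_removed(const int din[N], int width) {
#ifdef NO_SYNTH
    // Original style: dynamic allocation, not synthesizable.
    long long *out_accum = (long long *)std::malloc(sizeof(long long));
    int *array_local = (int *)std::malloc(N * sizeof(int));
#else
    // Synthesizable style: fixed-size storage behind the same pointers.
    long long out_accum_storage;
    long long *out_accum = &out_accum_storage;
    int array_storage[N];
    int *array_local = &array_storage[0];
#endif
    for (int i = 0; i < N; i++) {
        array_local[i] = (i < width) ? din[i] * din[i] : 0;
    }
    *out_accum = 0;
    for (int j = 0; j < N; j++) {
        *out_accum += array_local[j];
    }
    long long result = *out_accum;
#ifdef NO_SYNTH
    std::free(out_accum);
    std::free(array_local);
#endif
    return result;
}
```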
6.3 rtl_as_blackbox
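7. Study Plan
This example collection is rich in content. In future posts I will pick out the most important parts for detailed discussion and add the links here.
This table of contents also lets me look up the relevant topics quickly.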