[SystemVerilog] Common Design Patterns and Practices

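The ultimate goal of RTL design, especially for ASICs, is to produce the smallest and fastest circuit possible. To do so, we need to understand how synthesis tools analyze and optimize a design. We also care about simulation speed, since waiting for tests to run is effectively a waste of engineering effort. Although synthesis and simulation tools have many optimization passes and transformations, a major factor in the final result is the design pattern, that is, whether the code follows the tools' design guidelines. Many optimizations target specific design patterns, which makes the code easier for the tools to understand and therefore to simplify. In addition, some design patterns simplify the code structure and make the code more readable and reusable.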

In this chapter we will look at some common design practices and at how we should write logic and structure source code.

4.1 Compiler Directives and Packages

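Like C/C++, SystemVerilog defines a preprocessing stage in which macros are expanded into the original source code. Compared to C/C++, the SystemVerilog compiler directives are neither Turing-complete nor as versatile, which means that even a fixed-bound recursive computation is difficult to specify. They do, however, allow a certain degree of preprocessing.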

4.1.1 Compiler Directives

The language defines several compiler directives. We will cover some of the most commonly used ones here:

  1. `__FILE__
  2. `__LINE__
  3. `define
  4. `else
  5. `elsif
  6. `ifdef
  7. `ifndef
  8. `endif
  9. `undef
  10. `timescale
  11. `include

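`__FILE__ and `__LINE__ are used the same way as __FILE__ and __LINE__ in C/C++. Users can use them for testbench debugging. During preprocessing, these two compiler directives are replaced with the actual file name and line number.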
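As a minimal sketch of the debugging use described above, the snippet below prints the location of a checkpoint in a testbench. The module name file_line_demo and the message text are made up for illustration.

```systemverilog
module file_line_demo;
    initial begin
        // `__FILE__ expands to a string literal and `__LINE__ to a decimal
        // number, so both can be passed straight to $display.
        $display("checkpoint reached at %s:%0d", `__FILE__, `__LINE__);
        $finish;
    end
endmodule
```

The printed path depends on how the file name was given to the simulator, so the exact text varies between tools and invocations.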

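`define lets you define macros that can be used later in the code. We will show two examples: the first defines a value, and the second defines a function-like code snippet that takes arguments. Note that, unlike C/C++, a macro must be prefixed with ` when it is used in the code.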

`define VALUE 10

module top (input logic clk);

    logic [31:0] a;

    always_ff @(posedge clk)
        a <= `VALUE;

endmodule

In the example above, we define `VALUE to be 10 and use it as the register value. Even though we cover this usage here, please avoid defining constant values as macros in this way, because:

  1. It is difficult to find where the macro is defined, e.g. in a file or in the command-line options.
  2. Macros have no namespace. If two macros share the same name, whichever is parsed later will be used. This may cause unexpected bugs that are difficult to debug, since the compiler may not issue a warning for macro redefinition.

We highly recommend defining constants in a package instead, which will be covered later in this chapter.

Another way to use `define is to define some code snippets which can be re-used later, as shown in the example below (also in code/04/macros_arguments.sv):

`define REGISTER(NAME, WIDTH, VALUE, CLK) \
    logic [WIDTH-1:0] NAME;               \
    always_ff @(posedge CLK) begin        \
        NAME <= VALUE;                    \
    end

module top;

    logic        clk;
    logic [15:0] in;

    // declare 3 registers that are pipelined to signal in, in sequence
    `REGISTER(reg1, 16, in,   clk)
    `REGISTER(reg2, 16, reg1, clk)
    `REGISTER(reg3, 16, reg2, clk)

    // set the clock to 0 at time = 0, then tick the clock every 2 units of time
    initial clk = 0;
    // the delay is placed before the statement so that ~clk is read after the
    // initial block above has run, which avoids a race at time 0
    always #2 clk = ~clk;

    initial begin
        for (int i = 0; i < 3; i++) begin
            in = i;
            // wait for a cycle
            #4;
            // print out the register value
            $display("reg1: %d reg2: %d reg3: %d", reg1, reg2, reg3);
        end
        $finish;
    end

endmodule

We will see the expected output, where x denotes uninitialized register value:

reg1:     0 reg2:     x reg3:     x
reg1:     1 reg2:     0 reg3:     x
reg1:     2 reg2:     1 reg3:     0

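In the code example above, we first define three registers that are fed by the signal in as a pipeline (chained). The macro REGISTER first declares the register with the given name (NAME) and width (WIDTH), then instantiates an always_ff block that assigns the value to the register on every clock cycle. Notice that we have to use \ at the end of each line for a multi-line definition.

Although macros can sometimes save time and make code easier to reuse, it is important to find a balance between repeating code segments and using macros. Keep in mind that macros are replaced during the preprocessing stage, which makes source-level debugging more challenging. Since all macros live in the global namespace, you also need to watch out for macro redefinition.

During macro definition, it is sometimes necessary to undefine a macro name so that it can be used for a different purpose. As in C/C++, you can use `undef to undefine a macro.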

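`ifdef and `ifndef test whether a macro is defined (or not defined). You need to close the directive with `endif, and you can add `else and `elsif to handle the other cases. Note that in header files they can be combined with `define to provide an include guard, which allows the header to be included in multiple places. Their usage mirrors C/C++, so we will not cover the details here.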
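The following sketch combines an include guard with a conditional definition. The guard name MY_DEFS_SVH, the macros WIDE_BUS, NARROW_BUS and BUS_WIDTH, and the module name are all hypothetical; in a real project the guarded part would live in its own header file.

```systemverilog
// Contents of a hypothetical header my_defs.svh.
// The guard makes a second `include of the same file a no-op.
`ifndef MY_DEFS_SVH
`define MY_DEFS_SVH

// Select a width depending on macros that may be set on the command line.
`ifdef WIDE_BUS
    `define BUS_WIDTH 64
`elsif NARROW_BUS
    `define BUS_WIDTH 16
`else
    `define BUS_WIDTH 32
`endif

`endif // MY_DEFS_SVH

module bus_width_demo;
    logic [`BUS_WIDTH-1:0] bus;
    // With neither WIDE_BUS nor NARROW_BUS defined, the `else branch applies.
    initial $display("bus width = %0d", $bits(bus));
endmodule
```

A macro is used for the width here only to exercise the directives; for ordinary constants, a localparam in a package remains the better choice.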

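`timescale is an important compiler directive for simulators. It specifies the unit of time measurement and the time precision for the design elements that follow it. A `timescale directive stays in effect until another `timescale is read, so when source files compiled together carry different directives, the effective timescale of a module can depend on the compilation order. It is therefore safest to specify it in one place. The syntax of `timescale is shown below: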

// general syntax
`timescale time_unit / time_precision
// e.g.
`timescale 1ns / 1ps
`timescale 1ns / 1ns

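The time_unit argument specifies the unit of measurement for times and delays, and time_precision specifies how delay values are rounded before being used in simulation. The units of time_unit and time_precision can be s, ms, us, ns, ps, or fs. The integer part specifies the order of magnitude; in other words, the only valid numbers are 1, 10, and 100.

Timescale is crucial for simulating jitter and timing violations. It is also needed for any power-related analysis. We highly recommend including a timescale in the top-level testbench, even if it is not used.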
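To make the rounding behavior concrete, here is a small sketch that uses a 1 ns unit with a 100 ps precision. The module name and the chosen delays are arbitrary.

```systemverilog
`timescale 1ns / 100ps

module timescale_demo;
    initial begin
        // Print times in ns (10^-9 s) with two decimal places.
        $timeformat(-9, 2, " ns", 10);

        // 1.26 ns is not a multiple of the 100 ps precision,
        // so the simulator rounds this delay to 1.3 ns.
        #1.26;
        $display("after #1.26: %t", $realtime);

        // 2 time units = 2 ns, which needs no rounding.
        #2;
        $display("after #2:    %t", $realtime);
        $finish;
    end
endmodule
```

With a precision of 1 ns instead, the first delay would round to 1 ns, which is why the precision has to be at least as fine as the smallest delay you care about.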

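`include serves the same purpose as #include in C/C++: it pulls in the definitions from another file. We highly recommend providing an include guard for every included file. If the file name is enclosed in quotes, e.g. `include "filename.svh", the compiler first searches the current working directory and then any user-specified locations. If the file name is enclosed in angle brackets, e.g. `include <filename.svh>, the file must be one defined by the language standard. This rule is similar to C/C++.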

4.1.2 Packages

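Although compiler directives give designers a way to share definitions, `include essentially asks the compiler to copy the content of the included file into the source file, which is a legacy feature influenced by C. As modern programming languages have started to use modules/packages to structure source code, e.g. modules in C++20, SystemVerilog introduces a construct called a package that allows designers to reuse definitions, types, and functions. Since packages are synthesizable, we highly recommend using them in both RTL and testbenches. Here is an example of a package: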

package my_def_pkg;

    // local parameters
    localparam VALUE = 42;

    // struct
    typedef struct {
        logic a;
        logic b;
    } my_struct_t;

    // enum
    typedef enum logic { RED, GREEN } color_t;

    // function
    function logic and_op(logic a, logic b);
        return a & b;
    endfunction

endpackage: my_def_pkg

Here is an incomplete list of constructs that are allowed inside a package:

  1. parameter declaration, e.g. parameter and localparam
  2. function declaration, e.g. automatic function
  3. data declaration, e.g., struct and enum
  4. DPI import and export
  5. class declaration
  6. package import declaration

Since a parameter cannot be overridden inside a package, we highly recommend using localparam in lieu of parameter; the two are functionally identical in a package. In other words, localparam carries no visibility restriction in a package.

4.1.2.1 Package Import

To use the package definitions in other modules, we use the import keyword. There are several ways to import the contents of a package, and we will cover two commonly used approaches here:

  1. wildcard import. This is similar to Python’s from pkg_name import *:

    import my_def_pkg::*;
  2. explicit import. This is similar to Python’s from pkg_name import class_name:

    import my_def_pkg::my_struct_t;

After the import, the identifiers (e.g. struct names or enum value names) can be used directly in the module. Note that a package import can be placed in more than one location. Depending on where the package contents are used, there are two standard approaches:

  1. If the identifier is used in the module port definition, the import needs to be placed before the port list:

    module top
        import my_def_pkg::*;
        (input my_struct_t in);
    endmodule: top
  2. Otherwise, we should put the import inside the module:

    module top;
        import my_def_pkg::*;
        my_struct_t a;
    endmodule: top

4.1.2.2 Import Packages within a Package

As in software programming languages, you can import the contents of one package inside another package, and types built on top of the "chained" import remain usable by the consumer. Here is an example (code/04/chained_packages.sv) that illustrates such package imports:

package def1_pkg;
    typedef enum logic[1:0] {ADD, SUB, MULT, DIV} alu_opcode_t;
endpackage: def1_pkg

package def2_pkg;
    // import alu_opcode_t from def1_pkg
    import def1_pkg::alu_opcode_t;
    // define a new struct that includes alu_opcode_t
    typedef struct {
        alu_opcode_t alu_opcode;
        logic[7:0] addr;
    } opcode_t;
endpackage: def2_pkg

module top;
    // alu_opcode_t is NOT accessible from def2_pkg
    // the next line is ILLEGAL
    // import def2_pkg::alu_opcode_t;
    import def2_pkg::*;
    opcode_t opcode;

endmodule: top

Notice that, unlike some software programming languages such as Python, where an imported identifier becomes accessible as part of the importing package, SystemVerilog does not re-export imported identifiers by default (a package has to list them in an explicit export declaration to do so). If you try to import alu_opcode_t from def2_pkg as written above, the compiler will report an import error.

4.1.2.3 Package Usage Caveats

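Because the contents of a package are scoped, naming conflicts can occur when wildcard imports are used. The rule of thumb is to always use an explicit import when names conflict. Some coding styles forbid wildcard imports altogether, which makes the code slightly more verbose but more readable and maintainable. The exact scoping rules are beyond the scope of this book; interested readers can refer to Table 26-1 in IEEE 1800-2017.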
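The rule of thumb can be sketched as follows. The packages pkg_a and pkg_b and their contents are hypothetical; both define WIDTH, so a wildcard import of both followed by a bare reference to WIDTH would be ambiguous.

```systemverilog
package pkg_a;
    localparam int WIDTH = 8;
endpackage

package pkg_b;
    localparam int WIDTH = 16;
    localparam int DEPTH = 4;
endpackage

module import_conflict_demo;
    // An explicit import takes precedence over a wildcard import,
    // so WIDTH below unambiguously means pkg_a::WIDTH.
    import pkg_a::WIDTH;
    // The wildcard import still provides DEPTH, which has no conflict.
    import pkg_b::*;

    logic [WIDTH-1:0]        data_a;  // 8 bits, from pkg_a
    logic [pkg_b::WIDTH-1:0] data_b;  // 16 bits, via the scope operator

    initial begin
        $display("data_a=%0d bits, data_b=%0d bits, DEPTH=%0d",
                 $bits(data_a), $bits(data_b), DEPTH);
    end
endmodule
```

Using the scope operator (pkg_b::WIDTH) needs no import at all, which is why coding styles that ban wildcard imports often rely on it.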

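Another caveat is that a package must be compiled before any module file that depends on it. A systematic approach is to rely on a build tool such as make to enforce the compilation order. A simpler approach is to list the package files before the other source files when passing file names to the tool.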

4.2 Finite State Machines

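Finite state machines (FSMs) are at the core of hardware control logic. How well an FSM is designed directly affects synthesis and verification effort, since the tools place certain restrictions on how an FSM should be written. Although FSM theory is beyond the scope of this book, we will cover as much of it as possible as we go through the major FSM topics.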

4.2.1 Moore and Mealy FSM

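Generally speaking, two types of FSM are commonly used in hardware design: the Moore machine and the Mealy machine. The Moore machine, named after Edward F. Moore, is an FSM whose output values are determined solely by its current state. The Mealy machine, named after George H. Mealy, is an FSM whose output values are determined by both its current state and its current inputs. To distinguish Moore and Mealy machines formally, we can use the following mathematical notation.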

  • A finite set of states S
  • An initial state S0 such that S0∈S
  • A finite input set Σ
  • A finite output set Λ
  • A state transition function T:Σ×S → S
  • An output function G

For Moore machines, the output function is G: S → Λ, whereas for Mealy machines, the output function is G: Σ×S → Λ. Although Moore and Mealy machines are mathematically equivalent, they differ noticeably when drawn as state transition diagrams, as shown in Figures 4 and 5, where both diagrams describe logic that counts consecutive ones and outputs 1 once the count reaches 2. As a notation, in the Moore machine the label on an edge represents the input value and the label on a node represents the output value. In the Mealy machine, the label on an edge follows the input/output notation.

Figure 4: State transition diagram for Moore Machine.

Figure 5: State transition diagram for Mealy Machine.

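Because of this difference, we see different timing and area when we design Moore and Mealy machines in SystemVerilog:

  • To describe the same control logic, a Moore machine tends to have more states than a Mealy machine.
  • Compared to a Mealy machine, the output of a Moore machine tends to have one extra cycle of delay.

Which type of machine to use usually depends on the control logic you are modeling. A Mealy machine can also act as a Moore machine if the inputs are ignored when computing the outputs, so the Mealy machine is the more general of the two. Although nothing prevents you from mixing the two, we highly recommend sticking to one coding style so that the tools can recognize your design easily.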

4.2.2 FSM State Encoding

There are several different ways to encode your states S: one-hot encoding, Gray encoding, and binary encoding. Given |S| = N:

  • One-hot encoding means that exactly one bit is set to 1 for any particular state. The total number of bits required to represent the states is therefore N. The Hamming distance of this encoding is 2, meaning we have to flip 2 bits for a state transition.
  • Gray encoding, named after Frank Gray, is a special encoding scheme that only requires ⌈log2(N)⌉ bits. In addition, the Hamming distance between consecutive codes is designed to be 1, which means only one bit changes when the FSM moves to the next state in the sequence.
  • Binary encoding means the state value is assigned by its index in the states. As a result, it requires ⌈log2(N)⌉ bits. Since a state transition may require flipping all bits, e.g. state 0 transitioning to state 3 for a 2-bit state, its Hamming distance can be as large as the state width, i.e. O(log2(N)).

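Each encoding has its own advantages. For example, one-hot encoding allows smaller multiplexing logic, since only one bit is needed to test the state variable, whereas Gray encoding allows lower switching power and is thus favored in low-power designs. Which encoding to choose is an engineering question that depends on the design requirements. For this reason, many synthesis tools can automatically re-encode FSM states during synthesis, so designers can write the FSM in one encoding scheme and synthesize it in another. However, this also means that the synthesized version of the RTL differs from the original RTL on which all verification was done, so corner-case bugs may appear when the tools re-encode the FSM. In general, we recommend that the design team decide on the encoding scheme early, based on engineering experiments. Doing so ensures consistency between synthesis and verification.

In SystemVerilog, we typically use an enum to define the states. Compared with older approaches such as `define and localparam, an enum lets the compiler perform type checking, which makes the code safer and easier to debug. Below are several examples using one-hot, Gray, and binary encoding.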

// one-hot encoding
typedef enum logic[3:0] {
    IDLE  = 4'b0001,
    READY = 4'b0010,
    BUSY  = 4'b0100,
    ERROR = 4'b1000
} one_hot_state_t;

// Gray encoding: 4 states need only 2 bits
typedef enum logic[1:0] {
    RED    = 2'b00,
    GREEN  = 2'b01,
    BLUE   = 2'b11,
    YELLOW = 2'b10
} gray_state_t;

// binary encoding
typedef enum logic[1:0] {
    STAGE_0 = 2'd0,
    STAGE_1 = 2'd1,
    STAGE_2 = 2'd2,
    STAGE_3 = 2'd3
} binary_state_t;
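The type checking that enums provide can be seen in a short sketch. The type name state_t and the module name are illustrative only; the commented-out assignment is the kind of statement a compiler is expected to reject.

```systemverilog
module enum_type_check_demo;
    typedef enum logic [1:0] {
        IDLE  = 2'b00,
        READY = 2'b01,
        BUSY  = 2'b11
    } state_t;

    state_t     state;
    logic [1:0] raw;

    initial begin
        state = READY;          // OK: assigning an enum label
        raw   = state;          // OK: an enum converts implicitly to its base type
        // state = raw;         // illegal: no implicit conversion into the enum
        state = state_t'(raw);  // OK: the explicit cast states the intent
        // name() returns the label as a string, which is handy in debug logs
        $display("state = %s (%b)", state.name(), state);
        $finish;
    end
endmodule
```

With `define or localparam constants, the rejected assignment would compile silently, which is exactly the class of bug the enum catches.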

4.2.3 General FSM Structure

As indicated by the formal definition of an FSM, we need to design two components: the state transition logic T and the output function G. However, since the FSM needs to hold its state, we need another component that sequentially updates the FSM state. As a result, a typical FSM always has three components, as shown in Figure 6.

Figure 6: General FSM structure for Moore and Mealy machine.

4.2.4 One-, Two-, and Three-Block FSM Coding Style

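Although an FSM has three necessary components, we can sometimes merge some of them into a single process. As a result, there are three popular FSM coding styles, commonly referred to as the one-block, two-block, and three-block FSM coding styles.

In the following subsections we will use counting consecutive ones as the example to show the different coding styles. The definitions of all the states are given below as a SystemVerilog package.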

`ifndef COUNT_ONE_FSM_PKG
`define COUNT_ONE_FSM_PKG

package count_one_fsm_pkg;

    typedef enum logic[1:0] {
        moore_state0,
        moore_state1,
        moore_state2
    } moore_state_t;

    typedef enum logic {
        mealy_state0,
        mealy_state1
    } mealy_state_t;

endpackage
`endif // COUNT_ONE_FSM_PKG

4.2.4.1 Three-Block FSM Coding Style

Three-block FSM coding style is usually implemented as a Moore machine where:

  1. One block is used to update state with next_state.
  2. One block is used to determine next_state based on state and current inputs.
  3. One block is used to compute output based on state.

The complete example of three-block FSM is shown below (code/04/three_block_fsm_moore.sv):

module three_block_fsm_moore (
    input  logic clk,
    input  logic rst_n,
    input  logic in,
    output logic out
);

    import count_one_fsm_pkg::*;

    moore_state_t state, next_state;

    // block 1: state <- next_state
    always_ff @(posedge clk, negedge rst_n) begin
        if (!rst_n) begin
            state <= moore_state0;
        end
        else begin
            state <= next_state;
        end
    end

    // block 2: determine next_state from the current state and the input
    always_comb begin
        case (state)
            moore_state0: begin
                if (in) next_state = moore_state1;
                else next_state = moore_state0;
            end
            moore_state1: begin
                if (in) next_state = moore_state2;
                else next_state = moore_state0;
            end
            moore_state2: begin
                if (in) next_state = moore_state2;
                else next_state = moore_state0;
            end
            default: begin
                next_state = moore_state0;
            end
        endcase
    end

    // block 3: determine output based on state
    always_comb begin
        case (state)
            moore_state0: out = 0;
            moore_state1: out = 0;
            moore_state2: out = 1;
            default: out = 0;
        endcase
    end

endmodule: three_block_fsm_moore

4.2.4.2 Two-Block FSM Coding Style

Two-block FSM is usually implemented as a Mealy machine where:

  1. One block is used to update state with next_state.
  2. One block is used to determine next_state and the outputs, based on state and current inputs.

The complete example of two-block FSM is shown below (code/04/two_block_fsm_mealy.sv):

module two_block_fsm_mealy (
    input  logic clk,
    input  logic rst_n,
    input  logic in,
    output logic out
);

    import count_one_fsm_pkg::*;

    mealy_state_t state, next_state;

    // block 1: state <- next_state
    always_ff @(posedge clk, negedge rst_n) begin
        if (!rst_n) begin
            state <= mealy_state0;
        end
        else begin
            state <= next_state;
        end
    end

    // block 2: determine next_state and output
    always_comb begin
        case (state)
            mealy_state0: begin
                if (in) begin
                    next_state = mealy_state1;
                    out = 0;
                end
                else begin
                    next_state = mealy_state0;
                    out = 0;
                end
            end
            mealy_state1: begin
                if (in) begin
                    next_state = mealy_state1;
                    out = 1;
                end
                else begin
                    next_state = mealy_state0;
                    out = 0;
                end
            end
        endcase
    end

endmodule: two_block_fsm_mealy

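The benefit of a two-block FSM based on a Mealy machine is that the output updates as soon as the input changes, without waiting for the next cycle. However, this also makes maintenance harder. Since the next-state logic and the outputs are coded together, adjusting the FSM may require a major restructuring of the two-block code. Which style to use is up to the design team.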

4.2.4.3 One-Block FSM Coding Style

The one-block style merges all the blocks together. As a result, maintaining and debugging such an FSM is very challenging, and we highly discourage adopting this style unless absolutely necessary. For completeness, however, we show a code example so that readers can recognize this style in practice. Note that out is assigned inside always_ff here, so it is registered and changes one cycle later than the output of the two-block version.

module one_block_fsm_mealy (
    input  logic clk,
    input  logic rst_n,
    input  logic in,
    output logic out
);

    import count_one_fsm_pkg::*;

    mealy_state_t state;

    // one block: state update, next state, and output are in the same always_ff block
    always_ff @(posedge clk, negedge rst_n) begin
        if (!rst_n) begin
            state <= mealy_state0;
            // reset the registered output together with the state
            out <= 0;
        end
        else begin
            case (state)
                mealy_state0: begin
                    if (in) begin
                        state <= mealy_state1;
                        out <= 0;
                    end
                    else begin
                        state <= mealy_state0;
                        out <= 0;
                    end
                end
                mealy_state1: begin
                    if (in) begin
                        state <= mealy_state1;
                        out <= 1;
                    end
                    else begin
                        state <= mealy_state0;
                        out <= 0;
                    end
                end
                default: begin
                    state <= mealy_state0;
                    out <= 0;
                end
            endcase
        end
    end

endmodule: one_block_fsm_mealy

4.2.5 How to Write FSM Effectively

Designing an efficient FSM requires engineering work and experiments. A typical workflow is shown below:

  1. Identify states and state transition logic and turn it into a design specification.
  2. Implement FSM based on the specification
  3. (Optional) Optimize the FSM based on feedback.

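The first step of FSM design involves design exploration: how many states are needed, which coding style and which state encoding to use, and what the output logic is. A common way to visualize an FSM is to draw it as a state transition diagram. Another way to represent an FSM is a table in which each row represents one state transition. Once all the states are identified, we can further optimize the FSM with techniques such as state reduction, which merges states that have exactly the same logic (same outputs and same transitions) into a single state.

Once the specification is settled, turning it into an FSM is straightforward. Each transition arc can be expressed as a case item, as discussed earlier, and so can the output logic. Once the implementation is done, we need to test it thoroughly for common bugs such as deadlock or unreachable states. Some problems may be implementation-related and others specification-related. In either case, we need to revise the design or the specification to meet the design requirements. We will discuss strategies for finding deadlocks and unreachable states when we cover formal verification later in the book.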

4.3 Ready/Valid Handshake

The ready/valid handshake is one of the most widely used design patterns for transferring data in a latency-insensitive manner. It consists of two components, the source and the sink, where data flows from the former to the latter. The source uses the valid signal to indicate whether the data is valid, and the sink uses the ready signal to indicate whether it is ready to receive data, as shown in the figure below.

Figure 7: Ready/Valid block diagram

Because ready/valid is latency-insensitive, each signal has precise semantics at the posedge of the clock (we assume a synchronous circuit):

  • If the valid signal is high @(posedge clk), the data is valid as well.
  • If the ready signal is high @(posedge clk) and the valid signal is high as well, the data transfer completes. The amount of data moved in one transfer is often referred to as one word.
  • If the system wishes to transfer more data, it needs to complete a series of one-word transfers until the entire packet is transferred.
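These semantics can be sketched with a small sink that stores and counts the words it receives. The module name rv_sink, the busy input, and the last_word and num_words outputs are hypothetical and exist only to make the transfer condition observable.

```systemverilog
// Hypothetical sink: stores and counts every word it receives.
module rv_sink #(
    parameter int WIDTH = 8
) (
    input  logic             clk,
    input  logic             rst_n,
    // ready/valid interface from the source
    input  logic             valid,
    input  logic [WIDTH-1:0] data,
    output logic             ready,
    // hypothetical back-pressure input and observation outputs
    input  logic             busy,
    output logic [WIDTH-1:0] last_word,
    output logic [31:0]      num_words
);
    // ready depends only on the sink's own condition, not on valid
    assign ready = !busy;

    always_ff @(posedge clk, negedge rst_n) begin
        if (!rst_n) begin
            last_word <= '0;
            num_words <= '0;
        end
        else if (valid && ready) begin
            // both signals are high at this posedge: one word is transferred
            last_word <= data;
            num_words <= num_words + 1;
        end
        // valid high with ready low (or the reverse) transfers nothing
    end
endmodule
```

A packet of several words simply shows up as several clock edges on which valid && ready holds; the sink needs no notion of packet length.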

The timing diagram below shows cases where a transfer should or should not occur.

Figure 8: No data transfer

Figure 9: No data transfer

Figure 10: One successful ready/valid data transfer

The ready/valid handshake has several design pitfalls that need to be avoided:

  1. If the source waits for the sink's ready before asserting valid and vice versa, there is a chance of deadlock since both parties are waiting for each other. To avoid this, the control signals should be computed independently.
  2. If the ready/valid signals are computed purely by combinational logic, there will be a combinational loop between the source and the sink. To resolve this, either the source or the sink needs to register its control signal, or compute it from some flopped state.
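One way to sidestep both pitfalls is to drive valid from a flop, as in the sketch below. This is only one possible structure, not a canonical implementation; the module name rv_source and the producer-side signals push, push_data, and full are assumptions made for illustration.

```systemverilog
// Hypothetical source holding at most one word in a register.
module rv_source #(
    parameter int WIDTH = 8
) (
    input  logic             clk,
    input  logic             rst_n,
    // hypothetical producer-side interface
    input  logic             push,
    input  logic [WIDTH-1:0] push_data,
    output logic             full,
    // ready/valid interface towards the sink
    output logic             valid,
    output logic [WIDTH-1:0] data,
    input  logic             ready
);
    // The register is occupied only if it holds a word the sink is not taking.
    assign full = valid && !ready;

    // valid is a flop output: it is asserted without waiting for ready
    // (no deadlock) and has no combinational path from ready (no loop).
    always_ff @(posedge clk, negedge rst_n) begin
        if (!rst_n) begin
            valid <= 1'b0;
            data  <= '0;
        end
        else if (push && !full) begin
            // load a new word; this also covers back-to-back transfers
            valid <= 1'b1;
            data  <= push_data;
        end
        else if (ready) begin
            // the held word (if any) was accepted and nothing new arrived
            valid <= 1'b0;
        end
    end
endmodule
```

The full output does depend combinationally on ready, so the sink's ready must not in turn depend on full; in this sketch that path runs only towards the producer.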

4.4 Commonly Used Design Building Blocks

In this section we list some code examples of commonly used design building blocks. These circuits appear in a wide range of designs and are written for high synthesis quality.

4.4.1 Registers

There are various types of registers, such as synchronous-reset and asynchronous-reset registers. Each type has its own benefits. The design team should decide ahead of time which types of registers to use consistently throughout the design. All the code examples here use an active-low reset.

4.4.2 Asynchronous Reset Registers

An asynchronous-reset register has the reset on its sensitivity list.

logic r, value;

always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        r <= 1'b0;
    end
    else begin
        r <= value;
    end
end

4.4.2.1 Synchronous Reset Registers

Unlike asynchronous-reset registers, a synchronous-reset register only resets on a clock edge, hence the name "synchronous".

logic r, value;

// the reset is not on the sensitivity list; it is sampled at the clock edge
always_ff @(posedge clk) begin
    if (!rst_n) begin
        r <= 1'b0;
    end
    else begin
        r <= value;
    end
end

4.4.2.2 Chip-enable Registers

Chip-enable registers have an additional signal that enables or disables the value update (sometimes called clock gating). On ASICs, there are usually specially designed cells to handle such logic. As a result, if you follow the code example below you will get an optimal synthesis result. We use an asynchronous-reset register as the example.

logic r, value;

always_ff @(posedge clk, negedge rst_n) begin
    if (!rst_n) begin
        r <= 1'b0;
    end
    else if (c_en) begin
        r <= value;
    end
end

In general we do not recommend controlling the register update with your own logic, for instance by multiplexing the update value instead of using the syntax above, or by creating your own clock from the enable logic. These kinds of modifications are unlikely to be recognized by the synthesis tools and hence reduce synthesis quality.

4.4.2.3 Power-up Values

Some FPGA tool chains allow initial values to be set along with the declaration, as shown below. Since this approach does not work for ASICs, we do not recommend it if you want your code to be portable.

logic a = 1'b0;
logic value;

always_ff @(posedge clk) begin
    a <= value;
end

4.4.3 Multiplexer

A multiplexer is a hardware circuit that selects its output signal from a list of input signals. There are many ways to implement a multiplexer, and we will cover two common implementations.

4.4.3.1 case-based Multiplexer

The simplest way to implement a multiplexer is with a case statement. It is straightforward to write and also allows synthesis tools to recognize the multiplexer and optimize the netlist. Here is an example of a multiplexer that takes 5 inputs. Notice that the number of inputs does not need to be a power of 2.

module Mux5 #(
    parameter int WIDTH = 1
) (
    input  logic[WIDTH-1:0] I0,
    input  logic[WIDTH-1:0] I1,
    input  logic[WIDTH-1:0] I2,
    input  logic[WIDTH-1:0] I3,
    input  logic[WIDTH-1:0] I4,
    // $clog2(5) = 3 bits are enough to select one of 5 inputs
    input  logic[$clog2(5)-1:0] S,
    output logic[WIDTH-1:0] O
);

always_comb begin
    unique case (S)
        0: O = I0;
        1: O = I1;
        2: O = I2;
        3: O = I3;
        4: O = I4;
        default: O = I0;
    endcase
end

endmodule

Notice that default is used to handle edge cases where the select signal S is out of range or contains x.

A slightly shorter version is to merge all the input signals into an array and use the index operator as the multiplexer, as shown below:

module Mux #(
    parameter int WIDTH=1,
    parameter int NUM_INPUT=2
) (
    input  logic[NUM_INPUT-1:0][WIDTH-1:0] I,
    input  logic[$clog2(NUM_INPUT)-1:0] S,
    output logic[WIDTH-1:0] O
);

assign O = (S < NUM_INPUT) ? I[S] : I[0];

endmodule

In the code example above, we implicitly ask the synthesis tool to create a multiplexer for us. There are several advantages to this approach:

  1. We let the synthesis tool do its job of optimizing the design.
  2. The module works with an arbitrary number of inputs (NUM_INPUT has to be larger than 1) and an arbitrary data width.

4.4.3.2 AOI Multiplexer

In situations where hand-optimization is required, we can implement an AOI mux. AOI stands for AND-OR-Invert, which names the basic logic operations we are going to apply to the inputs. AOI gates are efficient in CMOS technology since they can be constructed from NAND and NOR logic gates.

There are two components in an AOI mux, namely a precoder and the AOI logic. The precoder translates the select signal into a one-hot encoding, and the AOI logic merges the inputs into the output based on the one-hot-encoded select signal. Here is the complete, parameterized implementation of the AOI mux (code/04/aoi_mux.sv).

module aoi_mux #(
    parameter int WIDTH=1,
    parameter int NUM_INPUT=2
) (
    input  logic[NUM_INPUT-1:0][WIDTH-1:0] I,
    input  logic[$clog2(NUM_INPUT)-1:0] S,
    output logic[WIDTH-1:0] O
);

// calculate the ceiling of num_input / 2
localparam NUM_OPS = (NUM_INPUT + 1) >> 1;
localparam MAX_RANGE = NUM_INPUT >> 1;

logic [NUM_INPUT-1:0] sel_one_hot;
// simplified one-hot precoder.
assign sel_one_hot = (S < NUM_INPUT) ? 1 << S : 0;

// intermediate results
logic [NUM_OPS-1:0][WIDTH-1:0] inter_O;

// AOI logic part
always_comb begin
    // working on each bit
    for (int w = 0; w < WIDTH; w++) begin
        // half the tree
        for (int i = 0; i < MAX_RANGE; i++) begin
            inter_O[i][w] = (sel_one_hot[i * 2] & I[i * 2][w]) |
                            (sel_one_hot[i * 2 + 1] & I[i * 2 + 1][w]);
        end
        // need to take care of odd number of inputs
        if (NUM_INPUT % 2) begin
            inter_O[MAX_RANGE][w] = sel_one_hot[MAX_RANGE * 2] & I[MAX_RANGE * 2][w];
        end
    end
end

// compute the final result, i.e. OR the intermediate result together
// notice that |inter_O doesn't work here since it will reduce to 1-bit signal
always_comb begin
    O = 0;
    for (int i = 0; i < NUM_OPS; i++) begin
        O = O | inter_O[i];
    end
end

endmodule

The example above can be explained with matrix operations. After the one-hot encoding transformation, we create a matrix S where S[i] = sel_one_hot for i ∈ {0, 1, …, NUM_INPUT−1}. In other words, all entries in matrix S are zero except for the column indicated by the select signal, whose entries are all ones. The input signals can be expressed as a matrix I where each row of I is one input. We then compute the following result:

O_inter = S × I

Notice that since S only consists of ones and zeros, the multiplication effectively performs an AND operation. Matrix O_inter has a similar characteristic to matrix S due to the property of one-hot encoding. To obtain the final result, we do a row-wise OR reduction. Since CMOS technology is more area efficient when we fuse the AND and OR operations together, instead of computing one row at a time we can compute two rows together; hence the variable NUM_OPS is computed as ⌈NUM_INPUT / 2⌉. Readers are encouraged to work through the process with some simple examples.
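As one such simple example, the sketch below walks through the AND and OR steps by hand for three 2-bit inputs with the select signal equal to 1. It does not instantiate aoi_mux; the module name and the temporary signals pair01 and last are made up for this illustration.

```systemverilog
module aoi_worked_example;
    localparam int WIDTH     = 2;
    localparam int NUM_INPUT = 3;

    logic [NUM_INPUT-1:0][WIDTH-1:0] I;
    logic [NUM_INPUT-1:0]            sel_one_hot;
    logic [WIDTH-1:0]                pair01, last, O;

    initial begin
        I[0] = 2'b01;
        I[1] = 2'b10;
        I[2] = 2'b11;
        sel_one_hot = 3'b010;   // select signal S = 1, one-hot encoded

        // AND each input with its replicated select bit, then OR the pair:
        pair01 = ({WIDTH{sel_one_hot[0]}} & I[0]) |
                 ({WIDTH{sel_one_hot[1]}} & I[1]);   // 00 | 10 = 10
        // the odd input has no partner, so it is only ANDed
        last   =  {WIDTH{sel_one_hot[2]}} & I[2];    // 00
        // final OR across the intermediate results
        O = pair01 | last;                           // 10, i.e. I[1]

        $display("pair01=%b last=%b O=%b", pair01, last, O);
        assert (O == I[1]) else $error("unexpected mux output");
        $finish;
    end
endmodule
```

Here NUM_OPS = ⌈3 / 2⌉ = 2, which matches the two intermediate results pair01 and last.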

The AOI mux is an example of how we can express the same logic in a clever way that is optimized for CMOS technology. This kind of optimization requires keen insight into the logic as well as a deep understanding of logic synthesis. Unless required, we do not recommend hand-optimizing common logic such as adders or multiplexers, since doing so is error-prone and may not achieve a better result than the synthesis tools. Use the syntactic sugar offered by the SystemVerilog language and let the synthesis tools do the heavy lifting. If the code follows the recommended coding style, synthesis tools can recognize it easily and optimize it automatically.

4.5 Wishbone Protocol: A Case Study

A common place for bugs to occur is the interface between components, where each component may have different design assumptions. One approach to limiting such bugs is to adhere to a well-specified protocol that every component follows, which reduces interface errors. In this section we will take a look at a simple yet complete protocol, namely Wishbone, and how we can write RTL code based on the spec.

Unlike protocols such as AXI4, Wishbone is an open-source hardware bus interface, which allows engineers and hobbyists to share public domain designs.

4.5.1 Wishbone Introduction

Wishbone bus consists of two channels: a request channel which can either be read or write, and an acknowledge (ACK) channel. These two channels connect the bus master and slave together, as shown in the figure below.

Figure 11: Wishbone channel diagram

The master has a list of signals defined by the specification. Although the specification explicitly allows IPs to rename the interface signals (PERMISSION 2.0.0), we will use the names from the specification to make it easier to compare with the document. Notice that the specification follows the naming convention that the suffix _O indicates an output port and _I indicates an input port.

The following signals are shared between the master and slave interfaces:

Table 4: Interface signals shared between Wishbone master and slave.

Signal Name   Function
CLK_I         All Wishbone output signals are registered at the rising edge of CLK_I. All Wishbone input signals are stable before the rising edge of CLK_I.
DAT_I         The data input array to pass binary data. Maximum 64-bit.
DAT_O         The data output array to pass binary data. Maximum 64-bit.
RST_I         Reset signal. This signal only resets the Wishbone interface; it is not required to reset the other parts of the IP.
TGD_I         Data tag type, which contains additional information about the data. Must be specified in the IP datasheet.
TGD_O         Data tag type, same as TGD_I.

We’ll ignore TGD_I and TGD_O in this section, but keep in mind that they can transfer very useful metadata information such as error checking code to protect data.

The complete interface ports for the master (excluding the shared ports) are shown below.

Table 5: Wishbone master interface ports.

Signal Name   Function
ACK_I         The acknowledge indicates the normal termination of a bus cycle.
ADR_O         The address used for the read/write request.
CYC_O         The cycle output. When asserted, indicates a valid bus cycle in progress.
STALL_I       When asserted, indicates that the current slave is not able to accept the transfer.
ERR_I         When asserted, indicates an abnormal cycle termination.
LOCK_O        When asserted, indicates the current bus cycle is uninterruptible.
RTY_I         When asserted, indicates that the interface is not ready to accept/send data and the cycle should be retried.
SEL_O         Indicates where valid data is expected on the DAT_I signal array during read cycles, and where it is placed on the DAT_O signal array during write cycles.
STB_O         The strobe output indicates a valid data transfer cycle. It is used to qualify other signals on the interface.
TGA_O         Address tag type, which contains information associated with the address lines and is qualified by STB_O.
TGC_O         Cycle tag type, which contains information associated with bus cycles and is qualified by CYC_O.
WE_O          Write enable output, which indicates whether the current local bus cycle is a read or a write cycle.

Again, we will ignore tag information. Interested readers should check out the specification.

The slave interface is symmetric with the master interface: a port XX_I on the master has a corresponding port XX_O on the slave and vice versa. In general, the Wishbone interface is simpler than other bus interfaces such as the Advanced Microcontroller Bus Architecture (AMBA), which is why we can explain the protocol here without lengthy details.
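To illustrate this symmetry, here is a minimal sketch of a slave that exposes a single register. It is an assumption-laden illustration rather than a reference implementation: the module name wb_slave_reg is made up, the address is ignored, tags, error, retry, and byte select are omitted, and the code has not been checked against the full specification.

```systemverilog
// Hypothetical single-register slave; every master _O port appears as _I here.
module wb_slave_reg #(
    parameter int WIDTH      = 32,
    parameter int ADDR_WIDTH = 16
) (
    input  logic                  CLK_I,
    input  logic                  RST_I,
    input  logic [WIDTH-1:0]      DAT_I,
    output logic [WIDTH-1:0]      DAT_O,
    output logic                  ACK_O,
    input  logic [ADDR_WIDTH-1:0] ADR_I,   // ignored: only one register exists
    input  logic                  CYC_I,
    output logic                  STALL_O,
    input  logic                  STB_I,
    input  logic                  WE_I
);
    logic [WIDTH-1:0] storage;
    logic             request;

    // this sketch can always accept a request
    assign STALL_O = 1'b0;
    // a request is a strobe inside a valid bus cycle; masking with !ACK_O
    // keeps a strobe that is held until the acknowledge from being served twice
    assign request = CYC_I && STB_I && !ACK_O;

    always_ff @(posedge CLK_I) begin
        if (RST_I) begin
            storage <= '0;
            DAT_O   <= '0;
            ACK_O   <= 1'b0;
        end
        else begin
            // acknowledge in the cycle after the request
            ACK_O <= request;
            if (request && WE_I) begin
                storage <= DAT_I;      // write
            end
            else if (request) begin
                DAT_O <= storage;      // read
            end
        end
    end
endmodule
```

All outputs are registered, in line with the CLK_I description in Table 4, so read data and the acknowledge arrive together one cycle after the request.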

4.5.2 Wishbone Master Example

We present here a simplified version of a master module, where the read/write behavior is controlled via a simple interface. In real-world practice, the master would be connected to an IP that directly controls its behavior. We also drop the tag, lock, and byte select interfaces for simplicity, but keep in mind that a real IP interface needs to implement them as well. We focus on register read/write instead of block transfer, and we drop corner-case handling such as error and retry. Interested readers should try to implement block transfer and the other missing features.

First, we need to define the IO ports, where the width of the data is parametrized by WIDTH. We also add parameters for the other control and data signals.


module wb_master #(
    parameter WIDTH=32,
    parameter ADDR_WIDTH=16
) (
    input  logic                 CLK_I,
    input  logic[WIDTH-1:0]      DAT_I,
    output logic[WIDTH-1:0]      DAT_O,
    input  logic                 RST_I,

    input  logic                 ACK_I,
    output logic[ADDR_WIDTH-1:0] ADR_O,
    output logic                 CYC_O,
    input  logic                 STALL_I,
    output logic                 STB_O,
    output logic                 WE_O,

    // external controls
    input  logic                 write,
    input  logic                 enable,
    input  logic[ADDR_WIDTH-1:0] addr,
    input  logic[WIDTH-1:0]      wdata,
    output logic[WIDTH-1:0]      rdata,
    output logic                 ready,
    output logic                 ack
);

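Note that, following the naming convention, STALL_I is essentially the (inverted) ready signal from the slave, while STB_O is the valid signal. With this in mind, we can quickly sketch the logic that sends out commands based on the control signals. Note that in Wishbone every output is registered. Since we need to wait for the slave to acknowledge the transfer, we need an FSM to track the transfer state (we will implement it as a two-block FSM). Since we are only interested in single-register transfers, there is no need to keep track of the number of words transferred.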


typedef enum logic {
    IDLE,
    BUSY
} State;

State state;

Based on the state, we drive the two outputs CYC_O and STB_O:

always_comb begin
    unique case (state)
        IDLE: begin
            CYC_O = 0;
            STB_O = 0;
        end
        BUSY: begin
            CYC_O = 1;
            STB_O = 1;
        end
    endcase
end

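Next, we need to change the state based on the control signals. Since we are only interested in single-word transfers, we start a transaction when the external control signal enable is high and the slave is ready. Depending on whether it is a read or a write request, we set the Wishbone control and data signals differently. After starting the transaction, the master enters the busy state and waits for the slave's acknowledgement. After that, the master signals the end of the transaction to the external client and returns to the idle state.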

always_ff @(posedge CLK_I) begin
    // reset on high
    if (RST_I) begin
        state <= IDLE;
        // reset all registered outputs
        ADR_O <= 0;
        WE_O  <= 0;
        DAT_O <= 0;
        rdata <= 0;
        // external control signals
        ack   <= 0;
        ready <= 1;
    end
    else begin
        unique case (state)
            IDLE: begin
                // only when we are asked to send data
                // and the slave is ready
                if (enable && !STALL_I) begin
                    ADR_O <= addr;
                    // write request
                    if (write) begin
                        DAT_O <= wdata;
                        WE_O  <= 1;
                    end else begin
                        DAT_O <= 0;
                        WE_O  <= 0;
                    end
                    // SEL_O is not driven: byte select is dropped in this simplified master
                    state <= BUSY;
                    // external control signals
                    ready <= 0;
                    ack   <= 0;
                end
                else begin
                    // external control signals
                    ready <= 1;
                    ack   <= 0;
                end
            end
            BUSY: begin
                // wait for slave ack
                if (ACK_I) begin
                    // transfer done
                    state <= IDLE;
                    DAT_O <= 0;
                    // we assume the control client holds enable until the response gets back
                    if (enable) begin
                        ack <= 1;
                        if (!write) begin
                            // if it's a read, return the data to the client
                            rdata <= DAT_I;
                        end
                        else begin
                            rdata <= 0;
                        end
                    end
                end
            end
        endcase
    end
end
