Image Processing Algorithms: White Balance, Dividers, and Multipliers (Notes)

Goal: restore true white.

Concept and examples of color temperature:

Resources involved: divider, multiplier
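The notes do not say which white balance algorithm is used, so the following is only a minimal sketch assuming the common gray-world method. It is meant to show why a divider and a multiplier are the two resources involved: the per-channel gain is a ratio (division), and applying the gain is a multiplication. The function names and the 8-bit fractional gain format (`GAIN_FRAC_BITS`) are assumptions made for illustration.

```python
GAIN_FRAC_BITS = 8  # assumed fixed-point format: gain 1.0 is stored as 256


def gray_world_gains(pixels):
    """Compute fixed-point R and B gains that pull the channel sums toward G.

    pixels is a list of (r, g, b) tuples with 8-bit components.
    This is the step that needs a divider.
    """
    sum_r = sum(p[0] for p in pixels)
    sum_g = sum(p[1] for p in pixels)
    sum_b = sum(p[2] for p in pixels)
    unity = 1 << GAIN_FRAC_BITS
    # Guard against division by zero by falling back to unity gain.
    gain_r = (sum_g << GAIN_FRAC_BITS) // sum_r if sum_r else unity
    gain_b = (sum_g << GAIN_FRAC_BITS) // sum_b if sum_b else unity
    return gain_r, gain_b


def apply_gains(pixel, gain_r, gain_b):
    """Scale R and B by their gains and clamp to 8 bits (the multiplier step)."""
    r, g, b = pixel
    r = min(255, (r * gain_r) >> GAIN_FRAC_BITS)
    b = min(255, (b * gain_b) >> GAIN_FRAC_BITS)
    return r, g, b
```

In this sketch the division happens once per frame, while the multiplication happens once per pixel, which is one reason divider throughput can matter less than divider precision for this use.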

Basic introduction to the divider

LUTMult
A simple lookup estimate of the reciprocal of the divisor, followed by a multiplier. Only the remainder output type is supported because of the bias required in the reciprocal estimate; this bias would introduce an offset (error) if used to create a fractional output. Recommended for operand widths less than or equal to 12 bits. This implementation uses DSP slices, block RAM, and a small amount of FPGA logic primitives (registers and LUTs). For operand widths where either Radix-2 or LUTMult is possible, LUTMult uses fewer FPGA logic resources because it relies on DSP and block RAM primitives.
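The reciprocal-lookup idea in the LUTMult description can be illustrated in software. This is a sketch under stated assumptions, not the core's actual implementation: the table stores the reciprocal rounded up (one possible form of the "bias" the text mentions), scaled by 2^(2·width), and the function names are hypothetical.

```python
def build_reciprocal_table(width):
    """Table of ceil(2**(2*width) / d) for every nonzero divisor d.

    Rounding up is the bias: it keeps the estimate from ever being too small.
    Entry 0 is a placeholder because division by zero is undefined.
    """
    k = 2 * width
    return [0] + [-(-(1 << k) // d) for d in range(1, 1 << width)]


def lut_divide(dividend, divisor, table, width):
    """Unsigned divide: look up the reciprocal, multiply, shift, derive remainder."""
    k = 2 * width
    quotient = (dividend * table[divisor]) >> k   # the multiplier stage
    remainder = dividend - quotient * divisor     # integer remainder output
    return quotient, remainder
```

With the scale factor chosen as 2^(2·width), the rounded-up reciprocal yields the exact integer quotient for all unsigned operands of that width. The table grows as 2^width entries, which fits the recommendation to use this approach only for narrow operands.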
Radix-2
Radix-2 non-restoring integer division using integer operands, allowing either a fractional or an integer remainder to be generated. Recommended for operand widths less than around 16 bits or for applications requiring high throughput. The implementation uses FPGA logic primitives (registers and LUTs). The Radix-2 solution does not use DSP or block RAM primitives, so it is recommended when these primitives are needed elsewhere.
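As an illustration of what "radix-2 non-restoring" means, here is a minimal software sketch for unsigned operands. It produces one quotient bit per iteration, which is why hardware latency scales with the quotient width. The function name and the unsigned-only restriction are assumptions; the core's internal structure is not described in these notes.

```python
def nonrestoring_divide(dividend, divisor, width):
    """Unsigned radix-2 non-restoring division, one quotient bit per step.

    The partial remainder is allowed to go negative; instead of restoring it,
    the next step adds the divisor rather than subtracting it.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            remainder = 2 * remainder + bit - divisor
        else:
            remainder = 2 * remainder + bit + divisor
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += divisor  # single final correction step
    return quotient, remainder
```

Each loop iteration corresponds to one pipeline stage in a fully pipelined design, or to one pass through shared logic when Clocks Per Division is greater than 1.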
High Radix
High Radix division with prescaling. Recommended for operand widths greater than around 16 bits. This implementation uses DSP slices and block RAMs.

Latency estimation

The latency of the divider core is a function of the AXI4-Stream configuration parameters and the latency of the selected algorithm. Latency is constant only when the AXI4-Stream mode is set to Non-Blocking and the core algorithm and throughput are set such that one sample is input per clock cycle. If the core is set to accept data only once in every N cycles, data is accepted only on cycles N, 2N, 3N, and so on. It is not the case that data is accepted immediately as long as N cycles have passed since the previous input. Hence, latency appears to increase if data is presented before the core is able to accept it.

Latency can also vary and increase when full AXI4-Stream behavior is selected, because a FIFO is used to manage data in this mode and the depth of the FIFO adds to the latency. Note, however, that the intention of selecting AXI4-Stream is to replace the need to balance latency with a handshake that manages data flow at run time, so latency should be less of a consideration.

Because latency can vary due to these effects, only the minimum latency can be determined as a constant for any given configuration of the core. In the following sections the latency of the algorithm alone is discussed.
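The "accepted only on cycles N, 2N, 3N" behavior can be made concrete with a small model. This sketch assumes cycles are numbered from 1 and that the algorithm's minimum latency is known; both function names and the `min_latency` parameter are hypothetical and only illustrate why early data sees a longer apparent latency.

```python
def accept_cycle(present_cycle, clocks_per_division):
    """Return the first cycle at or after present_cycle that is a multiple of N."""
    n = clocks_per_division
    return -(-present_cycle // n) * n  # ceiling to the next multiple of N


def apparent_latency(present_cycle, clocks_per_division, min_latency):
    """Cycles from presenting the data to the result: wait time plus minimum latency."""
    wait = accept_cycle(present_cycle, clocks_per_division) - present_cycle
    return wait + min_latency
```

For example, with N = 4, data presented on cycle 5 is not accepted until cycle 8, so it sees three extra cycles on top of the minimum latency, while data presented on cycle 8 sees none.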

LUTMult
The latency of the fully pipelined LUTMult is 8.
Radix-2
The latency (number of enabled clock cycles required before the core generates the first valid output) for a fully pipelined divider is a function of the bit width of the dividend. If fractional output is required, the fully pipelined latency is also a function of the fractional bit width. In general:

• Fully pipelined latency is of the order M for integer remainder dividers, where M is the width of the quotient.
• Fully pipelined latency is of the order M + F for fractional remainder dividers, where F is the width of the fractional output.
Table 2-1 provides the fully pipelined latency formulas for the divider selections. With full pipelining, maximum possible performance is achieved. When clocks per division is 1, latency can be set manually to a figure between 0 and the value shown in Table 2-1. This reduces the latency of the core at the expense of reducing the maximum clock frequency at which the core can be clocked. Reducing the latency reduces the number of registers used, but the LUT count remains approximately the same.
1. M = dividend and quotient width, F = fractional width, A = total latency of the AXI interfaces.
High Radix Solution
Tables 2-2 and 2-3 show latency for the High Radix solution. To this, add 0 for NonBlocking mode, 1 for Blocking mode with no output tready, and 3 for Blocking mode with output tready.

Divider configuration

Common Options

Describes parameters common to all implementations and allows the selection of the divider implementation.

• Algorithm Type: This selects between Radix-2, LUTMult and High Radix division solutions.

Radix-2 Options (throughput)

Clocks Per Division: Determines the throughput of the Radix-2 solution, that is, the interval in clocks between inputs (or outputs). A low value for this parameter gives high throughput but also greater resource use.

High Radix and LUTMult Options

Number of iterations (High Radix only): Read-only text field that reports the number of iterations performed by the High Radix engine for each divide. This sets the maximum throughput of the divider. To achieve this throughput, the operands must be supplied as soon as they are requested by the core's s_axis_dividend_tready and s_axis_divisor_tready outputs.

Throughput (High Radix only): Read-only text field that reports the maximum throughput that can be sustained by the divider when operands are supplied at a constant rate. In AXI blocking modes, throughput might be slightly higher due to buffering. This rate applies when FlowControl is set to NonBlocking and the output channel DOUT has no tready.

Common Options

Detect Divide-by-Zero: Check box. Determines whether the core has a DIVIDE_BY_ZERO field in the output tuser port (m_axis_dout_tuser) to signal when a division by zero has been performed.

AXI4-Stream Options

Flow Control: Blocking or NonBlocking. This is explained more fully in AXI4-Stream Considerations in Chapter 3. NonBlocking mode provides an easier migration path from the previous version of the Divider Generator core. Blocking mode eases data flow management to and from other AXI4-Stream blocking mode cores at the expense of some additional resources and latency.

Optimize Goal: Applies only to blocking mode. When ACLKEN is selected and Optimize Goal is set to Resources, performance might be reduced. See Resource Utilization in Chapter 2.

• Output has TREADY: Selects whether the output channel has a tready signal. This is required to allow back pressure from downstream, for example, if connected to another AXI4-Stream Blocking core. Without tready, downstream circuitry cannot halt dataflow from the divider, but some resources are saved.

Output TLAST Behavior: Selects the source of the output channel tlast signal. When neither or only one input channel has a tlast, the output tlast is either absent or derived from the one input tlast, as appropriate. When both input channels have tlast, the output channel tlast can derive from either input alone, from the logical OR of both inputs, or from the logical AND of both inputs.
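The four choices available when both input channels carry tlast can be written out as a small truth function. This is only a behavioral sketch: the mode strings below are hypothetical labels chosen for illustration, not the option names used by the configuration GUI.

```python
def output_tlast(dividend_tlast, divisor_tlast, mode):
    """Derive the output tlast from the two input tlast bits.

    mode is one of the hypothetical labels:
    "dividend", "divisor", "or", "and".
    """
    if mode == "dividend":
        return dividend_tlast
    if mode == "divisor":
        return divisor_tlast
    if mode == "or":
        return dividend_tlast or divisor_tlast
    if mode == "and":
        return dividend_tlast and divisor_tlast
    raise ValueError(f"unknown tlast mode: {mode}")
```

The OR and AND choices only differ when the two input streams disagree about where a packet ends, so the selection matters mainly when the two operand streams are framed independently.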

Latency Options (pipeline configuration)

Latency Configuration: Automatic (fully pipelined) or Manual (determined by the following field). For the Radix-2 solution, latency is configurable only when clocks per division is set to 1. When clocks per division is greater than 1, the design uses iterative feedback, so its registers are not optional.

Latency: When Latency Configuration is set to Automatic, this field reports the latency from input to output in enabled clock cycles. When set to Manual, this field is used to specify the required latency. When high performance (clock frequency) is not required, a lower value in this field can save resources.

Control Signals

ACLKEN: Determines if the core has a clock enable input (ACLKEN).

ARESETn: Determines if the core has an active-Low synchronous clear input (ARESETn).

Note:

a. The signal ARESETn always takes priority over ACLKEN, that is, ARESETn takes effect regardless of the state of ACLKEN.

b. The signal ARESETn is active-Low.

c. The signal ARESETn should be held active for at least two clock cycles. This is because, for performance, ARESETn is internally registered before being fed to the reset port of primitives.

Divider output:

Output Channel

• Remainder Type: Selects between the remainder types Fractional and Remainder, presented on the FRACTIONAL field of the output tdata port (m_axis_dout_tdata). Fractional is the only remainder type available for High Radix.

• Fractional Width: If the Fractional remainder type is selected, this determines the number of bits provided on the FRACTIONAL field of the output channel (m_axis_dout_tdata).

When High Radix is selected, the total output width (quotient part plus fractional part) is limited to 82. The width of the quotient is equal to the width of the dividend and is set in the Dividend Channel section. The width of the tuser port is the sum of the widths of the input channel tuser fields that are present, plus one if divide_by_zero detection is active. See AXI4-Stream Considerations in Chapter 3 for the internal structure of the tuser port.

This channel also has a tlast port if either of the input channels has a tlast port.

TDATA Structure for Output (DOUT) Channel

The structure of m_axis_dout_tdata is more complex. This port contains both the quotient and, if present, the remainder or fractional output.

When the remainder type is set to Remainder, the two outputs are considered separate, so each is byte-oriented before the two are concatenated to make the m_axis_dout_tdata signal.

When the remainder type is Fractional, the fractional part is considered an extension of the quotient, so the two fields are concatenated first and then padded to the next byte boundary.

With remainder: PAD + QUOTIENT + REMAINDER + PAD

With fractional part: PAD + QUOTIENT + FRACTIONAL
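The two padding rules lead to different total tdata widths for the same field sizes, which the following sketch makes explicit. It assumes unsigned values and, for unpacking, only handles the fractional layout given above (quotient in the upper bits, fractional field in the lowest bits); the function names are hypothetical.

```python
def round_up_to_byte(bits):
    """Round a bit count up to the next multiple of 8."""
    return (bits + 7) // 8 * 8


def dout_tdata_width(quotient_width, second_width, fractional):
    """Total m_axis_dout_tdata width in bits.

    Fractional mode: concatenate first, then pad once.
    Remainder mode: pad each field to a byte boundary, then concatenate.
    """
    if fractional:
        return round_up_to_byte(quotient_width + second_width)
    return round_up_to_byte(quotient_width) + round_up_to_byte(second_width)


def unpack_fractional(tdata, quotient_width, fractional_width):
    """Split a fractional-mode tdata word into (quotient, fraction) as unsigned ints."""
    fraction = tdata & ((1 << fractional_width) - 1)
    quotient = (tdata >> fractional_width) & ((1 << quotient_width) - 1)
    return quotient, fraction
```

For a 12-bit quotient with a 4-bit second field, fractional mode needs 16 bits while remainder mode needs 24. The unpacked pair represents the value quotient + fraction / 2^fractional_width when both operands are unsigned.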
