[Binaryen] A walkthrough of the partiallyPrecompute function

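Binaryen has an optimization pass named Precompute, which evaluates expressions ahead of time, much like constant folding in LLVM.
The files changed by the relevant commit are here.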
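Precompute's similarity to constant folding can be illustrated with a minimal constant-folding sketch. The IR below (`Expr`, `Kind`, `fold` and the `make*` helpers) is hypothetical and far simpler than Binaryen's: an `Add`/`Mul` whose operands fold to constants is replaced by a single constant, while anything depending on a runtime value such as `LocalGet` is left alone.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical mini IR for illustration only; this is not Binaryen's API.
enum class Kind { Const, LocalGet, Add, Mul };

struct Expr {
  Kind kind = Kind::Const;
  int32_t value = 0;               // only meaningful for Const
  std::unique_ptr<Expr> lhs, rhs;  // only used by Add / Mul
};

std::unique_ptr<Expr> makeConst(int32_t v) {
  auto e = std::make_unique<Expr>();
  e->value = v;
  return e;
}

std::unique_ptr<Expr> makeLocalGet() {
  auto e = std::make_unique<Expr>();
  e->kind = Kind::LocalGet;
  return e;
}

std::unique_ptr<Expr> makeBinary(Kind k, std::unique_ptr<Expr> l,
                                 std::unique_ptr<Expr> r) {
  auto e = std::make_unique<Expr>();
  e->kind = k;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Fold bottom-up. Arithmetic is done on uint32_t so that overflow wraps
// around, as wasm i32 arithmetic does.
void fold(std::unique_ptr<Expr>& e) {
  if (e->kind != Kind::Add && e->kind != Kind::Mul) {
    return;  // Const is already folded; LocalGet is only known at runtime.
  }
  fold(e->lhs);
  fold(e->rhs);
  if (e->lhs->kind == Kind::Const && e->rhs->kind == Kind::Const) {
    auto a = static_cast<uint32_t>(e->lhs->value);
    auto b = static_cast<uint32_t>(e->rhs->value);
    uint32_t result = e->kind == Kind::Add ? a + b : a * b;
    e = makeConst(static_cast<int32_t>(result));  // replace the whole subtree
  }
}
```

With this sketch, `2 + 3 * 4` folds to the constant `14`, whereas in `local + 3 * 4` only the right operand folds (to `12`). Partial precomputation targets the in-between case where a value is one of two constants chosen at runtime by a select.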

First, here is the complete code:

```cpp
// To partially precompute selects we walk up the stack from them, like this:
//
//  (A
//    (B
//      (select
//        (C)
//        (D)
//        (condition)
//      )
//    )
//  )
//
// First we try to apply B to C and D. If that works, we arrive at this:
//
//  (A
//    (select
//      (constant result of B(C))
//      (constant result of B(D))
//      (condition)
//    )
//  )
//
// We can then proceed to perhaps apply A. However, even if we failed to apply
// B then we can try to apply A and B together, because that combination may
// succeed where incremental work fails, for example:
//
//  (global $C
//    (struct.new    ;; outer
//      (struct.new  ;; inner
//        (i32.const 10)
//      )
//    )
//  )
//
//  (struct.get    ;; outer
//    (struct.get  ;; inner
//      (select
//        (global.get $C)
//        (global.get $D)
//        (condition)
//      )
//    )
//  )
//
// Applying the inner struct.get to $C leads us to the inner struct.new, but
// that is an interior pointer in the global - it is not something we can
// refer to using a global.get, so precomputing it fails. However, when we
// apply both struct.gets at once we arrive at the outer struct.new, which is
// in fact the global $C, and we succeed.
void partiallyPrecompute(Function* func) {
  if (!canPartiallyPrecompute || partiallyPrecomputable.empty()) {
    // Nothing to do.
    return;
  }

  // Walk the function to find the parent stacks of the promising selects. We
  // copy the stacks and process them later. We do it like this because if we
  // wanted to process stacks as we reached them then we'd trip over
  // ourselves: when we optimize we replace a parent, but that parent is an
  // expression we'll reach later in the walk, so modifying it is unsafe.
  struct StackFinder : public ExpressionStackWalker<StackFinder> {
    Precompute& parent;

    StackFinder(Precompute& parent) : parent(parent) {}

    // We will later iterate on this in the order of insertion, which keeps
    // things deterministic, and also usually lets us do consecutive work
    // like a select nested in another select's condition, simply because we
    // will traverse the selects in postorder (however, because we cannot
    // always succeed in an incremental manner - see the comment on this
    // function - it is possible in theory that some work can happen only in a
    // later execution of the pass).
    InsertOrderedMap<Select*, ExpressionStack> stackMap;

    void visitSelect(Select* curr) {
      if (parent.partiallyPrecomputable.count(curr)) {
        stackMap[curr] = expressionStack;
      }
    }
  } stackFinder(*this);
  stackFinder.walkFunction(func);

  // Note which expressions we've modified as we go, as it is invalid to
  // modify more than once. This could happen in theory in a situation like
  // this:
  //
  //  (ternary.f32.max  ;; fictional instruction for explanatory purposes
  //    (select ..)
  //    (select ..)
  //    (f32.infinity)
  //  )
  //
  // When we consider the first select we can see that the computation result
  // is always infinity, so we can optimize here and replace the ternary. Then
  // the same thing happens with the second select, causing the ternary to be
  // replaced again, which is unsafe because it no longer exists after we
  // precomputed it the first time. (Note that in this example the result is
  // the same either way, but at least in theory an instruction could exist
  // for whom there was a difference.) In practice it does not seem that wasm
  // has instructions capable of this atm but this code is still useful to
  // guard against future problems, and as a minor speedup (quickly skip code
  // if it was already modified).
  std::unordered_set<Expression*> modified;

  for (auto& [select, stack] : stackFinder.stackMap) {
    // Each stack ends in the select itself, and contains more than the select
    // itself (otherwise we'd have ignored the select), i.e., the select has a
    // parent that we can try to optimize into the arms.
    assert(stack.back() == select);
    assert(stack.size() >= 2);
    Index selectIndex = stack.size() - 1;
    assert(selectIndex >= 1);

    if (modified.count(select)) {
      // This select was modified; go to the next one.
      continue;
    }

    // Go up through the parents, until we can't do any more work. At each
    // parent we'll try to execute it and all intermediate parents into the
    // select arms.
    for (Index parentIndex = selectIndex - 1; parentIndex != Index(-1);
         parentIndex--) {
      auto* parent = stack[parentIndex];
      if (modified.count(parent)) {
        // This parent was modified; exit the loop on parents as no upper
        // parent is valid to try either.
        break;
      }

      // If the parent lacks a concrete type then we can't move it into the
      // select: the select needs a concrete (and non-tuple) type. For example
      // if the parent is a drop or is unreachable, those are things we don't
      // want to handle, and we stop here (once we see one such parent we
      // can't expect to make any more progress).
      if (!parent->type.isConcrete() || parent->type.isTuple()) {
        break;
      }

      // We are precomputing the select arms, but leaving the condition as-is.
      // If the condition breaks to the parent, then we can't move the parent
      // into the select arms:
      //
      //  (block $name ;; this must stay outside of the select
      //    (select
      //      (B)
      //      (C)
      //      (block ;; condition
      //        (br_if $target
      //
      // Ignore all control flow for simplicity, as they aren't interesting
      // for us, and other passes should have removed them anyhow.
      if (Properties::isControlFlowStructure(parent)) {
        break;
      }

      // This looks promising, so try to precompute here. What we do is
      // precompute twice, once with the select replaced with the left arm,
      // and once with the right. If both succeed then we can create a new
      // select (with the same condition as before) whose arms are the
      // precomputed values.
      auto isValidPrecomputation = [&](const Flow& flow) {
        // For now we handle simple concrete values. We could also handle
        // breaks in principle TODO
        return canEmitConstantFor(flow.values) && !flow.breaking() &&
               flow.values.isConcrete();
      };

      // Find the pointer to the select in its immediate parent so that we can
      // replace it first with one arm and then the other.
      auto** pointerToSelect =
        getChildPointerInImmediateParent(stack, selectIndex, func);
      *pointerToSelect = select->ifTrue;
      auto ifTrue = precomputeExpression(parent);
      if (isValidPrecomputation(ifTrue)) {
        *pointerToSelect = select->ifFalse;
        auto ifFalse = precomputeExpression(parent);
        if (isValidPrecomputation(ifFalse)) {
          // Wonderful, we can precompute here! The select can now contain the
          // computed values in its arms.
          select->ifTrue = ifTrue.getConstExpression(*getModule());
          select->ifFalse = ifFalse.getConstExpression(*getModule());
          select->finalize();

          // The parent of the select is now replaced by the select.
          auto** pointerToParent =
            getChildPointerInImmediateParent(stack, parentIndex, func);
          *pointerToParent = select;

          // Update state for further iterations: Mark everything modified and
          // move the select to the parent's location.
          for (Index i = parentIndex; i <= selectIndex; i++) {
            modified.insert(stack[i]);
          }
          selectIndex = parentIndex;
          stack[selectIndex] = select;
          stack.resize(selectIndex + 1);
        }
      }

      // Whether we succeeded to precompute here or not, restore the parent's
      // pointer to its original state (if we precomputed, the parent is no
      // longer in use, but there is no harm in modifying it).
      *pointerToSelect = select;
    }
  }
}
```
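The core trick can be reproduced on a toy IR. The sketch below is a minimal illustration under stated assumptions: `Expr`, `Arena`, `eval` and `partiallyPrecompute` here are hypothetical stand-ins, nodes live in an arena and children are raw pointers (loosely mirroring Binaryen's style), and only `i32` addition is evaluated. Like the real code, it temporarily swaps each select arm into the parent's child slot, evaluates the parent, restores the slot, and on success lets the select take the parent's place. It only handles a single parent, not the walk up the whole stack.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical mini IR: arena-allocated nodes with raw child pointers.
enum class Kind { Const, LocalGet, Add, Select };

struct Expr {
  Kind kind;
  int32_t value = 0;     // Const
  Expr* a = nullptr;     // Add: left operand;  Select: ifTrue
  Expr* b = nullptr;     // Add: right operand; Select: ifFalse
  Expr* cond = nullptr;  // Select: condition
};

struct Arena {
  std::deque<Expr> nodes;  // push_back on a deque keeps existing nodes in place
  Expr* make(Kind k, int32_t v = 0, Expr* a = nullptr, Expr* b = nullptr,
             Expr* cond = nullptr) {
    nodes.push_back(Expr{k, v, a, b, cond});
    return &nodes.back();
  }
};

// Evaluate to a constant if possible. LocalGet and Select depend on runtime
// values, so they (and anything containing them) are not precomputable.
std::optional<int32_t> eval(const Expr* e) {
  switch (e->kind) {
    case Kind::Const:
      return e->value;
    case Kind::Add: {
      auto l = eval(e->a);
      auto r = eval(e->b);
      if (!l || !r) {
        return std::nullopt;
      }
      // Wrap on overflow like wasm i32.add.
      return static_cast<int32_t>(static_cast<uint32_t>(*l) +
                                  static_cast<uint32_t>(*r));
    }
    default:
      return std::nullopt;
  }
}

// parentSlot points at the parent; selectSlot is the parent's child slot
// holding the select (compare pointerToParent / pointerToSelect above).
bool partiallyPrecompute(Arena& arena, Expr** parentSlot, Expr** selectSlot) {
  Expr* select = *selectSlot;
  assert(select->kind == Kind::Select);
  *selectSlot = select->a;  // parent with the select replaced by ifTrue
  auto ifTrue = eval(*parentSlot);
  std::optional<int32_t> ifFalse;
  if (ifTrue) {
    *selectSlot = select->b;  // ...and by ifFalse, only if ifTrue worked
    ifFalse = eval(*parentSlot);
  }
  *selectSlot = select;  // always restore the parent's original child
  if (!ifFalse) {
    return false;
  }
  select->a = arena.make(Kind::Const, *ifTrue);
  select->b = arena.make(Kind::Const, *ifFalse);
  *parentSlot = select;  // the select (same condition) replaces its parent
  return true;
}
```

In this sketch `(i32.add (select 10 20 cond) 5)` becomes `(select 15 25 cond)`, while `(i32.add 2 (select 1 (local.get) cond))` is left untouched because one arm cannot be evaluated.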

Each part is explained in detail below.

```cpp
if (!canPartiallyPrecompute || partiallyPrecomputable.empty()) {
  // Nothing to do.
  return;
}
```

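First, the function checks whether there is any partial precomputation to do.
Selects that the pass considers promising are recorded in partiallyPrecomputable. If that set is empty, or if canPartiallyPrecompute is false, the function returns immediately.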

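Next, a StackFinder is defined. It is an ExpressionStackWalker that, for each select in partiallyPrecomputable, saves the stack of expressions leading down to that select in stackMap. stackMap is an InsertOrderedMap, so the selects are later processed in the order they were visited, which keeps the pass deterministic.
stackFinder.walkFunction(func); then walks the function: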

```cpp
void walkFunction(Function* func) {
  setFunction(func);
  static_cast<SubType*>(this)->doWalkFunction(func);
  static_cast<SubType*>(this)->visitFunction(func);
  setFunction(nullptr);
}
```

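The first line sets the current function and the last line clears it again; the interesting part is the two lines in between.
doWalkFunction walks func->body, which is done by walk: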
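The `static_cast<SubType*>(this)` calls are the curiously recurring template pattern (CRTP): the walker base class is templated on the concrete walker, so `doWalkFunction` and `visitFunction` resolve at compile time to the subclass's version when it declares one, without virtual functions. A minimal sketch with hypothetical names (`Walker`, `Function`, `Counter` are not Binaryen's real classes):

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Function {  // hypothetical stand-in for Binaryen's Function
  std::string name;
};

template <typename SubType>
struct Walker {
  Function* currFunction = nullptr;

  // Default hooks; a subclass "overrides" them simply by declaring its own.
  void doWalkFunction(Function*) {}
  void visitFunction(Function*) {}

  void walkFunction(Function* func) {
    currFunction = func;  // plays the role of setFunction(func)
    static_cast<SubType*>(this)->doWalkFunction(func);
    static_cast<SubType*>(this)->visitFunction(func);
    currFunction = nullptr;  // plays the role of setFunction(nullptr)
  }
};

struct Counter : Walker<Counter> {
  std::vector<std::string> log;
  // Only visitFunction is customized; doWalkFunction uses the default.
  void visitFunction(Function* func) {
    assert(currFunction == func);  // the function is set during the walk
    log.push_back("visit " + func->name);
  }
};
```

Because the dispatch is static, each concrete walker gets its own specialized `walkFunction`, which suits a compiler that runs many walkers over large modules.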

```cpp
void walk(Expression*& root) {
  assert(stack.size() == 0);
  pushTask(SubType::scan, &root);
  while (stack.size() > 0) {
    auto task = popTask();
    replacep = task.currp;
    assert(*task.currp);
    task.func(static_cast<SubType*>(this), task.currp);
  }
}
```

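walk first pushes a scan task onto stack. Note that stack is declared as SmallVector<Task, 10> stack;, so it is a task stack that records work still to be done: each entry pairs a function pointer with the expression it should run on. The loop then pops and runs tasks until the stack is empty; before each task it stores the current slot in replacep, so a visitor can replace the node it is visiting. Task is defined as follows: func, a TaskFunc, is the function to execute, and currp points to the slot holding the current Expression, i.e. a node of the AST.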

```cpp
struct Task {
  TaskFunc func;
  Expression** currp;
  Task() {}
  Task(TaskFunc func, Expression** currp) : func(func), currp(currp) {}
};
```
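A task stack lets a walker traverse arbitrarily deep trees without native recursion. The sketch below uses hypothetical `Node`, `Task` and `PostOrderWalker` types, not Binaryen's, to show the idea: `scan` pushes the node's own visit task first and its children's scan tasks afterwards, so the children are popped and finished before the node itself is visited, which yields a post-order traversal.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node, standing in for an Expression.
struct Node {
  std::string name;
  std::vector<Node*> children;
};

struct PostOrderWalker {
  using TaskFunc = void (*)(PostOrderWalker*, Node**);
  struct Task {
    TaskFunc func;
    Node** currp;  // the slot holding the node, like Expression** currp
  };
  std::vector<Task> stack;         // the task stack
  std::vector<std::string> order;  // names in the order they were visited

  static void visit(PostOrderWalker* self, Node** currp) {
    self->order.push_back((*currp)->name);
  }

  static void scan(PostOrderWalker* self, Node** currp) {
    // Pushed first, so it runs last: after all children are done.
    self->stack.push_back({visit, currp});
    // Push children in reverse so the first child is popped first.
    auto& kids = (*currp)->children;
    for (auto it = kids.rbegin(); it != kids.rend(); ++it) {
      self->stack.push_back({scan, &*it});
    }
  }

  void walk(Node*& root) {
    assert(stack.empty());
    stack.push_back({scan, &root});
    while (!stack.empty()) {
      Task task = stack.back();  // copy before popping; func may push more
      stack.pop_back();
      task.func(this, task.currp);
    }
  }
};
```

For a tree `a(b, c)` this visits `b`, `c`, then `a`.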

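Note that this task stack is not the stack that StackFinder saves. StackFinder derives from ExpressionStackWalker, which maintains a separate expressionStack: each expression is pushed when the walk enters it and popped when the walk leaves it. When visitSelect runs, expressionStack therefore holds the chain of ancestors from the function body down to the select, with the select itself as the last element, and StackFinder copies it into stackMap. The select's arms and condition have already been visited and popped by then, so they are not part of the saved stack.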
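What gets saved per select can be sketched with plain recursion. This is a simplification: Binaryen pushes and pops `expressionStack` entries through pre- and post-visit tasks on the task stack instead. `Node`, `ExpressionStack` and `collectSelectStacks` below are hypothetical, and a vector of pairs stands in for InsertOrderedMap to keep the visiting order.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node: a name, a select flag, and children.
struct Node {
  std::string name;
  bool isSelect = false;
  std::vector<Node*> children;
};

using ExpressionStack = std::vector<Node*>;

// For every select, record a copy of the ancestor stack at the moment it is
// visited (post-order), so the select itself is the last element.
void collectSelectStacks(Node* curr, ExpressionStack& stack,
                         std::vector<std::pair<Node*, ExpressionStack>>& out) {
  stack.push_back(curr);  // entering the node, like a pre-visit
  for (Node* child : curr->children) {
    collectSelectStacks(child, stack, out);
  }
  if (curr->isSelect) {
    out.emplace_back(curr, stack);  // copy, like stackMap[curr] = expressionStack
  }
  stack.pop_back();  // leaving the node, like a post-visit
}
```

For the `(A (B (select C D cond)))` shape from the comment in the code above, the saved stack is `[A, B, select]`: the arms and the condition are not included.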

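Next come a series of checks. A select that has already been modified is skipped. The loop then walks upward through the parents and stops at the first parent that has already been modified, that lacks a concrete non-tuple type (for example a drop or an unreachable expression), or that is a control flow structure (such as a block that the condition might branch to).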
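The upward scan uses an unsigned `Index`, so `for (Index parentIndex = selectIndex - 1; parentIndex != Index(-1); parentIndex--)` terminates when decrementing past 0 wraps around to the maximum value. Below is a minimal sketch of that loop shape and its early `break`s, assuming `Index` is an unsigned 32-bit type; `Entry` and its flags are hypothetical stand-ins for the `modified` set, the type checks and `Properties::isControlFlowStructure`, and the success path (which moves the select upward) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Index = uint32_t;  // assumption: an unsigned 32-bit index type

// Hypothetical summary of what the real loop checks for each parent.
struct Entry {
  bool modified = false;              // already rewritten earlier
  bool concreteNonTupleType = true;   // parent->type concrete and not a tuple
  bool controlFlow = false;           // block, loop, if, ...
};

// Returns the indices of the parents that would be attempted, nearest first.
// stack.back() is the select; the entries before it are its ancestors.
std::vector<Index> parentsToTry(const std::vector<Entry>& stack) {
  assert(stack.size() >= 2);  // the select plus at least one parent
  std::vector<Index> tried;
  Index selectIndex = static_cast<Index>(stack.size() - 1);
  for (Index parentIndex = selectIndex - 1; parentIndex != Index(-1);
       parentIndex--) {  // 0 - 1 wraps to Index(-1) and ends the loop
    const Entry& parent = stack[parentIndex];
    if (parent.modified || !parent.concreteNonTupleType ||
        parent.controlFlow) {
      break;  // no parent above this one can be tried either
    }
    tried.push_back(parentIndex);
  }
  return tried;
}
```

A signed loop counter would also work; the wrap-around comparison simply lets the code keep using the same unsigned `Index` type as the stack positions.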

```cpp
// Find the pointer to the select in its immediate parent so that we can
// replace it first with one arm and then the other.
auto** pointerToSelect =
  getChildPointerInImmediateParent(stack, selectIndex, func);
*pointerToSelect = select->ifTrue;
auto ifTrue = precomputeExpression(parent);
```

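The code above temporarily replaces the select, inside its immediate parent, with its ifTrue arm and then calls precomputeExpression on the parent, i.e. it tries to evaluate the parent with that arm in place. Whether the resulting Flow is usable is decided by isValidPrecomputation, shown below: the flow must not be breaking, its values must be concrete, and canEmitConstantFor must confirm that a constant expression can be emitted for them.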

```cpp
// This looks promising, so try to precompute here. What we do is
// precompute twice, once with the select replaced with the left arm,
// and once with the right. If both succeed then we can create a new
// select (with the same condition as before) whose arms are the
// precomputed values.
auto isValidPrecomputation = [&](const Flow& flow) {
  // For now we handle simple concrete values. We could also handle
  // breaks in principle TODO
  return canEmitConstantFor(flow.values) && !flow.breaking() &&
         flow.values.isConcrete();
};
```

If precomputation succeeds for both the ifTrue and the ifFalse substitution, the select is rewritten:

```cpp
// Wonderful, we can precompute here! The select can now contain the
// computed values in its arms.
select->ifTrue = ifTrue.getConstExpression(*getModule());
select->ifFalse = ifFalse.getConstExpression(*getModule());
select->finalize();

// The parent of the select is now replaced by the select.
auto** pointerToParent =
  getChildPointerInImmediateParent(stack, parentIndex, func);
*pointerToParent = select;

// Update state for further iterations: Mark everything modified and
// move the select to the parent's location.
for (Index i = parentIndex; i <= selectIndex; i++) {
  modified.insert(stack[i]);
}
selectIndex = parentIndex;
stack[selectIndex] = select;
stack.resize(selectIndex + 1);
```

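No new select is actually allocated: the existing select is reused. Its ifTrue and ifFalse arms are replaced by the precomputed constants, the condition stays unchanged, and select->finalize() updates its type. The select then takes the parent's place in the tree through pointerToParent. Finally, every expression from the parent down to the select's old position is added to modified, and the stack is truncated so that the select sits at parentIndex, which lets the next iteration try the grandparent. Whether or not precomputation succeeded, *pointerToSelect = select; afterwards restores the child pointer that was temporarily overwritten.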
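The bookkeeping after a successful rewrite can be sketched on its own. In the sketch below, expressions are represented by hypothetical string ids and `commitRewrite` is an illustrative helper, not Binaryen code; it performs the same three steps as the excerpt above: mark everything from the parent down to the select as modified, move the select into the parent's position, and drop the entries below it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

using Index = uint32_t;                  // assumption: unsigned index type
using Stack = std::vector<std::string>;  // hypothetical: ids, not Expression*

// Mark [parentIndex, selectIndex] as modified, move the select into the
// parent's position, and truncate the stack just after it.
void commitRewrite(Stack& stack, Index& selectIndex, Index parentIndex,
                   std::unordered_set<std::string>& modified) {
  const std::string select = stack[selectIndex];
  for (Index i = parentIndex; i <= selectIndex; i++) {
    modified.insert(stack[i]);
  }
  selectIndex = parentIndex;
  stack[selectIndex] = select;
  stack.resize(selectIndex + 1);
}
```

Starting from `[A, B, select]`, folding `B` leaves `[A, select]` with `B` and `select` marked; folding `A` next leaves `[select]`. A later select whose saved stack also contains `B` would stop at the `modified.count(parent)` check instead of rewriting `B` a second time.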
