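Binaryen has an optimization pass named Precompute that evaluates expressions ahead of time, similar to constant folding in LLVM.
The relevant commit is here.
First, here is the complete code: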
// To partially precompute selects we walk up the stack from them, like this:
//
//  (A
//    (B
//      (select
//        (C)
//        (D)
//        (condition)
//      )
//    )
//  )
//
// First we try to apply B to C and D. If that works, we arrive at this:
//
//  (A
//    (select
//      (constant result of B(C))
//      (constant result of B(D))
//      (condition)
//    )
//  )
//
// We can then proceed to perhaps apply A. However, even if we failed to apply
// B then we can try to apply A and B together, because that combination may
// succeed where incremental work fails, for example:
//
//  (global $C
//    (struct.new    ;; outer
//      (struct.new  ;; inner
//        (i32.const 10)
//      )
//    )
//  )
//
//  (struct.get    ;; outer
//    (struct.get  ;; inner
//      (select
//        (global.get $C)
//        (global.get $D)
//        (condition)
//      )
//    )
//  )
//
// Applying the inner struct.get to $C leads us to the inner struct.new, but
// that is an interior pointer in the global - it is not something we can
// refer to using a global.get, so precomputing it fails. However, when we
// apply both struct.gets at once we arrive at the outer struct.new, which is
// in fact the global $C, and we succeed.
void partiallyPrecompute(Function* func) {
  if (!canPartiallyPrecompute || partiallyPrecomputable.empty()) {
    // Nothing to do.
    return;
  }

  // Walk the function to find the parent stacks of the promising selects. We
  // copy the stacks and process them later. We do it like this because if we
  // wanted to process stacks as we reached them then we'd trip over
  // ourselves: when we optimize we replace a parent, but that parent is an
  // expression we'll reach later in the walk, so modifying it is unsafe.
  struct StackFinder : public ExpressionStackWalker<StackFinder> {
    Precompute& parent;

    StackFinder(Precompute& parent) : parent(parent) {}

    // We will later iterate on this in the order of insertion, which keeps
    // things deterministic, and also usually lets us do consecutive work
    // like a select nested in another select's condition, simply because we
    // will traverse the selects in postorder (however, because we cannot
    // always succeed in an incremental manner - see the comment on this
    // function - it is possible in theory that some work can happen only in a
    // later execution of the pass).
    InsertOrderedMap<Select*, ExpressionStack> stackMap;

    void visitSelect(Select* curr) {
      if (parent.partiallyPrecomputable.count(curr)) {
        stackMap[curr] = expressionStack;
      }
    }
  } stackFinder(*this);
  stackFinder.walkFunction(func);

  // Note which expressions we've modified as we go, as it is invalid to
  // modify more than once. This could happen in theory in a situation like
  // this:
  //
  //  (ternary.f32.max  ;; fictional instruction for explanatory purposes
  //    (select ..)
  //    (select ..)
  //    (f32.infinity)
  //  )
  //
  // When we consider the first select we can see that the computation result
  // is always infinity, so we can optimize here and replace the ternary. Then
  // the same thing happens with the second select, causing the ternary to be
  // replaced again, which is unsafe because it no longer exists after we
  // precomputed it the first time. (Note that in this example the result is
  // the same either way, but at least in theory an instruction could exist
  // for whom there was a difference.) In practice it does not seem that wasm
  // has instructions capable of this atm but this code is still useful to
  // guard against future problems, and as a minor speedup (quickly skip code
  // if it was already modified).
  std::unordered_set<Expression*> modified;

  for (auto& [select, stack] : stackFinder.stackMap) {
    // Each stack ends in the select itself, and contains more than the select
    // itself (otherwise we'd have ignored the select), i.e., the select has a
    // parent that we can try to optimize into the arms.
    assert(stack.back() == select);
    assert(stack.size() >= 2);
    Index selectIndex = stack.size() - 1;
    assert(selectIndex >= 1);

    if (modified.count(select)) {
      // This select was modified; go to the next one.
      continue;
    }

    // Go up through the parents, until we can't do any more work. At each
    // parent we'll try to execute it and all intermediate parents into the
    // select arms.
    for (Index parentIndex = selectIndex - 1; parentIndex != Index(-1);
         parentIndex--) {
      auto* parent = stack[parentIndex];
      if (modified.count(parent)) {
        // This parent was modified; exit the loop on parents as no upper
        // parent is valid to try either.
        break;
      }

      // If the parent lacks a concrete type then we can't move it into the
      // select: the select needs a concrete (and non-tuple) type. For example
      // if the parent is a drop or is unreachable, those are things we don't
      // want to handle, and we stop here (once we see one such parent we
      // can't expect to make any more progress).
      if (!parent->type.isConcrete() || parent->type.isTuple()) {
        break;
      }

      // We are precomputing the select arms, but leaving the condition as-is.
      // If the condition breaks to the parent, then we can't move the parent
      // into the select arms:
      //
      //  (block $name ;; this must stay outside of the select
      //    (select
      //      (B)
      //      (C)
      //      (block ;; condition
      //        (br_if $target
      //
      // Ignore all control flow for simplicity, as they aren't interesting
      // for us, and other passes should have removed them anyhow.
      if (Properties::isControlFlowStructure(parent)) {
        break;
      }

      // This looks promising, so try to precompute here. What we do is
      // precompute twice, once with the select replaced with the left arm,
      // and once with the right. If both succeed then we can create a new
      // select (with the same condition as before) whose arms are the
      // precomputed values.
      auto isValidPrecomputation = [&](const Flow& flow) {
        // For now we handle simple concrete values. We could also handle
        // breaks in principle TODO
        return canEmitConstantFor(flow.values) && !flow.breaking() &&
               flow.values.isConcrete();
      };

      // Find the pointer to the select in its immediate parent so that we can
      // replace it first with one arm and then the other.
      auto** pointerToSelect =
        getChildPointerInImmediateParent(stack, selectIndex, func);
      *pointerToSelect = select->ifTrue;
      auto ifTrue = precomputeExpression(parent);
      if (isValidPrecomputation(ifTrue)) {
        *pointerToSelect = select->ifFalse;
        auto ifFalse = precomputeExpression(parent);
        if (isValidPrecomputation(ifFalse)) {
          // Wonderful, we can precompute here! The select can now contain the
          // computed values in its arms.
          select->ifTrue = ifTrue.getConstExpression(*getModule());
          select->ifFalse = ifFalse.getConstExpression(*getModule());
          select->finalize();

          // The parent of the select is now replaced by the select.
          auto** pointerToParent =
            getChildPointerInImmediateParent(stack, parentIndex, func);
          *pointerToParent = select;

          // Update state for further iterations: Mark everything modified and
          // move the select to the parent's location.
          for (Index i = parentIndex; i <= selectIndex; i++) {
            modified.insert(stack[i]);
          }
          selectIndex = parentIndex;
          stack[selectIndex] = select;
          stack.resize(selectIndex + 1);
        }
      }

      // Whether we succeeded to precompute here or not, restore the parent's
      // pointer to its original state (if we precomputed, the parent is no
      // longer in use, but there is no harm in modifying it).
      *pointerToSelect = select;
    }
  }
}
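The sections below explain it piece by piece:
if (!canPartiallyPrecompute || partiallyPrecomputable.empty()) {
  // Nothing to do.
  return;
}
The function first checks whether there is any partial precomputation to do.
partiallyPrecomputable holds the select instructions that were flagged as candidates (the "promising" selects mentioned in the code comments), so if it is empty there is nothing to process.
Next, a StackFinder is defined. For each such select, it records the stack of enclosing expressions that leads to it.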
stackFinder.walkFunction(func);
This starts the walk over a single function. walkFunction is defined as:
void walkFunction(Function* func) {
  setFunction(func);
  static_cast<SubType*>(this)->doWalkFunction(func);
  static_cast<SubType*>(this)->visitFunction(func);
  setFunction(nullptr);
}
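walkFunction sets the current function at the start and clears it at the end; the two lines in the middle are the ones to focus on.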
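The static_cast<SubType*>(this) calls in those two lines are the curiously recurring template pattern (CRTP): the base walker dispatches to methods of the derived class without virtual calls. The following is a minimal sketch of that dispatch only; ToyFunction, ToyWalker, ToyPass, and runToyPass are hypothetical names invented for illustration and are not part of Binaryen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for wasm::Function; only a name is needed here.
struct ToyFunction {
  std::string name;
};

// CRTP base: SubType is the derived class, so the static_cast calls below
// resolve to the derived class's methods at compile time.
template<typename SubType> struct ToyWalker {
  ToyFunction* currFunction = nullptr;
  std::vector<std::string> log;

  void setFunction(ToyFunction* func) { currFunction = func; }

  // Same shape as the walkFunction quoted above: set, walk, visit, clear.
  void walkFunction(ToyFunction* func) {
    setFunction(func);
    static_cast<SubType*>(this)->doWalkFunction(func);
    static_cast<SubType*>(this)->visitFunction(func);
    setFunction(nullptr);
  }

  // Defaults that a subclass may shadow.
  void doWalkFunction(ToyFunction* func) {
    log.push_back("walk " + func->name);
  }
  void visitFunction(ToyFunction*) { log.push_back("base visit"); }
};

// Shadows visitFunction only; doWalkFunction falls back to the base version.
struct ToyPass : ToyWalker<ToyPass> {
  void visitFunction(ToyFunction* func) {
    log.push_back("visit " + func->name);
  }
};

// Runs the toy pass on one function and returns what happened, in order.
std::vector<std::string> runToyPass() {
  ToyFunction func{"f"};
  ToyPass pass;
  pass.walkFunction(&func);
  assert(pass.currFunction == nullptr); // cleared again at the end
  return pass.log;
}
```

Calling runToyPass() yields the base class's walk entry followed by the derived class's visit entry, which shows that the subclass override is picked up even though nothing is declared virtual.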
First, doWalkFunction walks func->body by calling walk:
void walk(Expression*& root) {
  assert(stack.size() == 0);
  pushTask(SubType::scan, &root);
  while (stack.size() > 0) {
    auto task = popTask();
    replacep = task.currp;
    assert(*task.currp);
    task.func(static_cast<SubType*>(this), task.currp);
  }
}
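walk begins by pushing onto stack. Note, however, that stack is declared as SmallVector<Task, 10> stack; so it is a task stack that records the work still to be done, and you can think of its entries as function pointers. Task is defined as follows: a TaskFunc records the function to run, and currp points to the current Expression, which can be thought of as a node of the AST.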
struct Task {
  TaskFunc func;
  Expression** currp;
  Task() {}
  Task(TaskFunc func, Expression** currp) : func(func), currp(currp) {}
};
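To make the task stack concrete, here is a small sketch of a traversal driven by an explicit stack of (function pointer, node pointer) pairs. It assumes a simplified scan that schedules the visit of a node and then the scans of its children; Node, ToyWalker, and postorderOfSmallTree are hypothetical names, and Binaryen's real scan is considerably more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy AST node; real Binaryen expressions are far richer.
struct Node {
  std::string name;
  std::vector<Node*> children;
};

struct ToyWalker {
  using TaskFunc = void (*)(ToyWalker*, Node**);
  struct Task {
    TaskFunc func;
    Node** currp;
  };

  std::vector<Task> stack;          // pending work, not AST nodes
  std::vector<std::string> visited; // names in the order they were visited

  void pushTask(TaskFunc func, Node** currp) {
    stack.push_back({func, currp});
  }

  // Schedule the visit of this node, then the scans of its children. Because
  // the stack is LIFO, the children run first, giving a postorder traversal.
  static void scan(ToyWalker* self, Node** currp) {
    self->pushTask(visit, currp);
    auto& children = (*currp)->children;
    for (auto it = children.rbegin(); it != children.rend(); ++it) {
      self->pushTask(scan, &*it);
    }
  }

  static void visit(ToyWalker* self, Node** currp) {
    self->visited.push_back((*currp)->name);
  }

  // Same loop shape as the walk() quoted above.
  void walk(Node*& root) {
    assert(stack.empty());
    pushTask(scan, &root);
    while (!stack.empty()) {
      Task task = stack.back();
      stack.pop_back();
      assert(*task.currp);
      task.func(this, task.currp);
    }
  }
};

// Builds (add (a) (b)) and returns the visit order.
std::vector<std::string> postorderOfSmallTree() {
  Node a{"a", {}}, b{"b", {}};
  Node add{"add", {&a, &b}};
  Node* root = &add;
  ToyWalker walker;
  walker.walk(root);
  return walker.visited;
}
```

Passing Node** rather than Node* is what lets a task replace the node it is looking at, which is the role replacep plays in the real walk.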
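Through this walk, StackFinder ends up with one stack per candidate select: a copy of expressionStack, which holds the select's enclosing expressions and ends in the select itself.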
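The idea of copying the ancestor stack at each select can be sketched with a plain recursive walker. This is an illustration under simplifying assumptions: Node, ToyStackFinder, and stackOfNestedSelect are hypothetical, recursion replaces the task stack, and a vector of pairs stands in for InsertOrderedMap.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST node.
struct Node {
  std::string name;
  std::vector<Node*> children;
};

using NodeStack = std::vector<Node*>;

// Records, for every node named "select", a copy of the chain of ancestors
// that leads to it. The chain ends in the select itself.
struct ToyStackFinder {
  NodeStack expressionStack;
  std::vector<std::pair<Node*, NodeStack>> stackMap; // insertion-ordered

  void walk(Node* curr) {
    expressionStack.push_back(curr);
    for (Node* child : curr->children) {
      walk(child);
    }
    // Postorder visit: the children are done, curr is still on the stack.
    if (curr->name == "select") {
      stackMap.emplace_back(curr, expressionStack);
    }
    expressionStack.pop_back();
  }
};

// Builds (A (B (select (C) (D) (condition)))) and returns the names on the
// stack that was recorded for the select.
std::vector<std::string> stackOfNestedSelect() {
  Node c{"C", {}}, d{"D", {}}, cond{"condition", {}};
  Node select{"select", {&c, &d, &cond}};
  Node b{"B", {&select}};
  Node a{"A", {&b}};
  ToyStackFinder finder;
  finder.walk(&a);
  std::vector<std::string> names;
  for (Node* node : finder.stackMap.front().second) {
    names.push_back(node->name);
  }
  return names;
}
```

For the tree from the comment at the top of partiallyPrecompute, the recorded stack is A, B, select: the arms and the condition are children of the select and never appear on its stack.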
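Next comes a series of checks, for example whether the select has already been modified, and whether its parent node rules itself out (it was already modified, it lacks a concrete non-tuple type, or it is a control flow structure).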
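These checks run inside a loop that walks from the select's immediate parent up to the root using an unsigned index, which is why its exit condition is parentIndex != Index(-1) rather than parentIndex >= 0. The sketch below isolates just that loop shape; it assumes Index is a 32-bit unsigned type, and parentIndicesVisited is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Index = uint32_t; // assumption: an unsigned 32-bit index type

// Returns the indices visited when walking from the select's parent up to the
// root of a stack with `stackSize` entries (the select is the last entry).
std::vector<Index> parentIndicesVisited(Index stackSize) {
  assert(stackSize >= 2);
  Index selectIndex = stackSize - 1;
  std::vector<Index> visitedIndices;
  // `parentIndex >= 0` would always be true for an unsigned type, so the loop
  // instead stops when decrementing 0 wraps around to Index(-1).
  for (Index parentIndex = selectIndex - 1; parentIndex != Index(-1);
       parentIndex--) {
    visitedIndices.push_back(parentIndex);
  }
  return visitedIndices;
}
```

In the real function the loop can also end early through break, as soon as one parent fails a check, since no parent further up could succeed either.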
// Find the pointer to the select in its immediate parent so that we can
// replace it first with one arm and then the other.
auto** pointerToSelect =
  getChildPointerInImmediateParent(stack, selectIndex, func);
*pointerToSelect = select->ifTrue;
auto ifTrue = precomputeExpression(parent);
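The code above temporarily puts the true arm in the select's place and calls precomputeExpression on the parent. The result is then checked by isValidPrecomputation, shown below, which essentially tests whether a constant can be emitted for it.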
// This looks promising, so try to precompute here. What we do is
// precompute twice, once with the select replaced with the left arm,
// and once with the right. If both succeed then we can create a new
// select (with the same condition as before) whose arms are the
// precomputed values.
auto isValidPrecomputation = [&](const Flow& flow) {
  // For now we handle simple concrete values. We could also handle
  // breaks in principle TODO
  return canEmitConstantFor(flow.values) && !flow.breaking() &&
         flow.values.isConcrete();
};
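The substitute, evaluate, and restore sequence is easier to see on a toy IR. The sketch below is only an analogy: Expr, ToyFlow, evaluate, ArmResults, evaluateWithEachArm, and negateOfSelect are hypothetical, with evaluate standing in for precomputeExpression and the constant flag standing in for isValidPrecomputation.

```cpp
#include <cassert>

// Hypothetical toy IR: constants, unknown values, a unary negate, and select.
struct Expr {
  enum Kind { Const, Unknown, Negate, Select } kind;
  int value;       // Const only
  Expr* child;     // Negate only
  Expr* ifTrue;    // Select only
  Expr* ifFalse;   // Select only
  Expr* condition; // Select only
};

// Toy stand-in for Flow: did evaluation yield a constant, and which one?
struct ToyFlow {
  bool constant;
  int value;
};

// Toy stand-in for precomputeExpression.
ToyFlow evaluate(const Expr* curr) {
  if (curr->kind == Expr::Const) {
    return {true, curr->value};
  }
  if (curr->kind == Expr::Negate) {
    ToyFlow inner = evaluate(curr->child);
    return {inner.constant, -inner.value};
  }
  // Unknown values, and selects whose condition is unknown, are not constant.
  return {false, 0};
}

struct ArmResults {
  bool bothConstant;
  int ifTrue;
  int ifFalse;
};

// Evaluates `parent` twice, once with each arm in the select's place, and
// always puts the select back afterwards.
ArmResults evaluateWithEachArm(Expr* parent, Expr** pointerToSelect) {
  Expr* select = *pointerToSelect;
  ArmResults results = {false, 0, 0};
  *pointerToSelect = select->ifTrue;
  ToyFlow ifTrue = evaluate(parent);
  if (ifTrue.constant) {
    *pointerToSelect = select->ifFalse;
    ToyFlow ifFalse = evaluate(parent);
    if (ifFalse.constant) {
      results = {true, ifTrue.value, ifFalse.value};
    }
  }
  *pointerToSelect = select;
  return results;
}

// (negate (select (const 10) (const 20) (unknown condition)))
ArmResults negateOfSelect() {
  Expr ten = {Expr::Const, 10, nullptr, nullptr, nullptr, nullptr};
  Expr twenty = {Expr::Const, 20, nullptr, nullptr, nullptr, nullptr};
  Expr cond = {Expr::Unknown, 0, nullptr, nullptr, nullptr, nullptr};
  Expr select = {Expr::Select, 0, nullptr, &ten, &twenty, &cond};
  Expr negate = {Expr::Negate, 0, &select, nullptr, nullptr, nullptr};
  ArmResults results = evaluateWithEachArm(&negate, &negate.child);
  assert(negate.child == &select); // the tree was restored
  return results;
}
```

The parent cannot be evaluated while the select sits under it, because the condition is unknown, yet it evaluates to a constant for each arm separately. That is exactly the situation partial precomputing exploits.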
If both the true arm and the false arm can be precomputed, the select is rewritten with the new arms:
// Wonderful, we can precompute here! The select can now contain the
// computed values in its arms.
select->ifTrue = ifTrue.getConstExpression(*getModule());
select->ifFalse = ifFalse.getConstExpression(*getModule());
select->finalize();

// The parent of the select is now replaced by the select.
auto** pointerToParent =
  getChildPointerInImmediateParent(stack, parentIndex, func);
*pointerToParent = select;

// Update state for further iterations: Mark everything modified and
// move the select to the parent's location.
for (Index i = parentIndex; i <= selectIndex; i++) {
  modified.insert(stack[i]);
}
selectIndex = parentIndex;
stack[selectIndex] = select;
stack.resize(selectIndex + 1);
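In effect, no new select node is allocated: the existing select receives the precomputed constants as its new true and false arms, the condition stays unchanged, and the select takes over the parent's position both in the tree and in stack. Every expression on the stack from the parent down to the select is also added to modified.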
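The bookkeeping in this last step can be sketched on a toy IR as well. Everything here is hypothetical (Expr, ToyFunction, hoistSelect, hoistPastInnerNegate): each non-select node has a single child, a deque stands in for the module's allocator, and the precomputed values are passed in directly instead of being computed.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>
#include <vector>

// Hypothetical toy IR, as simple as possible.
struct Expr {
  enum Kind { Const, Unknown, Negate, Select } kind;
  int value;
  Expr* child;
  Expr* ifTrue;
  Expr* ifFalse;
  Expr* condition;
};

struct ToyFunction {
  Expr* body;
  std::deque<Expr> arena; // owns constants created while optimizing
};

// `stack` runs from the function body down to the select. The parent at
// `parentIndex` is assumed to have evaluated to trueValue / falseValue.
void hoistSelect(ToyFunction& func, std::vector<Expr*>& stack,
                 std::size_t parentIndex, std::size_t& selectIndex,
                 int trueValue, int falseValue,
                 std::unordered_set<Expr*>& modified) {
  Expr* select = stack[selectIndex];
  // New constant arms; the condition is left untouched.
  func.arena.push_back(
    {Expr::Const, trueValue, nullptr, nullptr, nullptr, nullptr});
  select->ifTrue = &func.arena.back();
  func.arena.push_back(
    {Expr::Const, falseValue, nullptr, nullptr, nullptr, nullptr});
  select->ifFalse = &func.arena.back();
  // The select takes over the slot that used to hold the parent.
  Expr** pointerToParent =
    parentIndex == 0 ? &func.body : &stack[parentIndex - 1]->child;
  *pointerToParent = select;
  // Mark everything from the parent down to the select as modified, then
  // move the select up to the parent's position on the stack.
  for (std::size_t i = parentIndex; i <= selectIndex; i++) {
    modified.insert(stack[i]);
  }
  selectIndex = parentIndex;
  stack[selectIndex] = select;
  stack.resize(selectIndex + 1);
}

// (negate (negate (select 10 20 cond))): fold the inner negate into the arms.
bool hoistPastInnerNegate() {
  Expr ten = {Expr::Const, 10, nullptr, nullptr, nullptr, nullptr};
  Expr twenty = {Expr::Const, 20, nullptr, nullptr, nullptr, nullptr};
  Expr cond = {Expr::Unknown, 0, nullptr, nullptr, nullptr, nullptr};
  Expr select = {Expr::Select, 0, nullptr, &ten, &twenty, &cond};
  Expr inner = {Expr::Negate, 0, &select, nullptr, nullptr, nullptr};
  Expr outer = {Expr::Negate, 0, &inner, nullptr, nullptr, nullptr};
  ToyFunction func;
  func.body = &outer;
  std::vector<Expr*> stack = {&outer, &inner, &select};
  std::size_t selectIndex = 2;
  std::unordered_set<Expr*> modified;
  hoistSelect(func, stack, 1, selectIndex, -10, -20, modified);
  return outer.child == &select && select.ifTrue->value == -10 &&
         select.ifFalse->value == -20 && select.condition == &cond &&
         selectIndex == 1 && stack.size() == 2 &&
         modified.count(&inner) == 1 && modified.count(&select) == 1;
}
```

After the call the stack is one entry shorter and ends in the select again, so the enclosing loop can go on to try the next parent (here the outer negate) against the updated arms.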