Full text - MLIR Toy Tutorial Chapter 4: Enabling Generic Transformation with Interfaces

Eloudy

Last modified: 2025-04-10 20:17:32


Tags: mlir, compilers

First published: 2025-04-10 12:12:07

Article link: https://blog.youkuaiyun.com/eloudy/article/details/147099364

Background: Grappling with an Extensible IR

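Through dialects, MLIR can represent many different levels of abstraction; the Toy dialect we defined in the previous chapters is one example. Although different dialects may represent different abstractions, there is usually a common set of compiler transformations and analyses that we want to apply to all of them. The problem is that naively implementing every transformation for every dialect leads to a large amount of duplicated code, because the underlying algorithms are generally similar, if not identical. We would like transformations to be able to hook into dialects such as Toy without knowing their specifics, and obtain the information they need.
MLIR provides a set of always-available hooks for certain core transformations. As seen in the previous chapter, we registered some canonicalizations through such a hook on our operations (getCanonicalizationPatterns). However, hooks of this kind do not scale well. A more generic solution was therefore designed, called interfaces in MLIR terminology, so that the MLIR infrastructure is as extensible as the representation itself. Interfaces give dialects and operations a generic mechanism for providing information to a transformation or analysis.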

Shape Inference: Preparing for Code Generation

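Our Toy IR currently operates on generic tensors, which means that we do not know the shape of a tensor except when it is initialized from a constant. This complicates both optimization and code generation. Fortunately, we can simply propagate shapes through the computation until all of them are known. The difficulty is how to handle calls to user-defined generic functions: every call site may produce different tensor shapes. One possibility is to perform symbolic inference based on the argument types, but this would be hard to generalize if we introduced more control flow into the language. Another approach is function specialization, where every call site with new argument shapes duplicates the called function and specializes the copy. The approach we take for Toy is to inline all function calls and then perform intraprocedural shape propagation.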

Inlining

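We could write an inlining algorithm designed specifically for the Toy dialect, but depending on the level of sophistication we want, this can become quite complicated. Even setting cost modeling aside, the pure structural transformation is already complex to implement from scratch. Thankfully, MLIR provides a generic inlining algorithm that dialects can plug into. All we need to do in Toy is provide the interfaces for the inliner to hook into. The first thing to do is define the constraints on inlining operations in the Toy dialect. This information is provided through a dialect interface.
A dialect interface is essentially a class containing a set of virtual hooks that the dialect can override. In this case, the interface is DialectInlinerInterface.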

/// This class defines the interface for handling inlining with Toy operations.
/// We simply inherit from the base interface class and override
/// the necessary methods.
struct ToyInlinerInterface : public DialectInlinerInterface {
  using DialectInlinerInterface::DialectInlinerInterface;

  /// This hook checks to see if the given callable operation is legal to inline
  /// into the given call. For Toy this hook can simply return true, as the Toy
  /// Call operation is always inlinable.
  bool isLegalToInline(Operation *call, Operation *callable,
                       bool wouldBeCloned) const final {
    return true;
  }

  /// This hook checks to see if the given operation is legal to inline into the
  /// given region. For Toy this hook can simply return true, as all Toy
  /// operations are inlinable.
  bool isLegalToInline(Operation *, Region *, bool,
                       IRMapping &) const final {
    return true;
  }

  /// This hook checks if the given 'src' region can be inlined into the 'dest'
  /// region. The regions here are the bodies of the callable functions. For
  /// Toy, any function can be inlined, so we simply return true.
  bool isLegalToInline(Region *dest, Region *src, bool wouldBeCloned,
                       IRMapping &valueMapping) const final {
    return true;
  }

  /// This hook is called when a terminator operation has been inlined. The only
  /// terminator that we have in the Toy dialect is the return
  /// operation(toy.return). We handle the return by replacing the values
  /// previously returned by the call operation with the operands of the
  /// return.
  void handleTerminator(Operation *op,
                        ValueRange valuesToRepl) const final {
    // Only "toy.return" needs to be handled here.
    auto returnOp = cast<ReturnOp>(op);

    // Replace the values directly with the return operands.
    assert(returnOp.getNumOperands() == valuesToRepl.size());
    for (const auto &it : llvm::enumerate(returnOp.getOperands()))
      valuesToRepl[it.index()].replaceAllUsesWith(it.value());
  }
};
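
The handleTerminator hook above boils down to a use replacement: every value the call used to produce is swapped for the matching operand of the inlined return. The following is a minimal standalone sketch of that idea, not MLIR code. The types Value and Op and the functions replaceAllUses and handleReturn are hypothetical stand-ins invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an SSA value: just a name.
struct Value {
  std::string name;
};

// Hypothetical stand-in for an operation: it only records the values it uses.
struct Op {
  std::string name;
  std::vector<Value *> operands;
};

// Rewrite every use of `from` in `ops` so that it refers to `to` instead.
inline void replaceAllUses(std::vector<Op> &ops, Value *from, Value *to) {
  for (Op &op : ops)
    for (Value *&operand : op.operands)
      if (operand == from)
        operand = to;
}

// Model of the terminator hook: each result of the former call is replaced
// by the operand of the inlined return at the same position.
inline void handleReturn(std::vector<Op> &ops,
                         const std::vector<Value *> &returnOperands,
                         const std::vector<Value *> &callResults) {
  assert(returnOperands.size() == callResults.size());
  for (std::size_t i = 0; i < returnOperands.size(); ++i)
    replaceAllUses(ops, callResults[i], returnOperands[i]);
}
```

In this model, a print operation that consumed the call result ends up consuming the value that the inlined body returned, which is the effect the real hook achieves with replaceAllUsesWith.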

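In addition, the inliner only discards function definitions that are unused and have private visibility. We therefore also have to set the visibility of functions (except the main function) in the MLIR generator.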

/// Emit a new function and add it to the MLIR module.
mlir::toy::FuncOp mlirGen(FunctionAST &funcAST) {
  ...
  // If this function isn't main, then set the visibility to private.
  if (funcAST.getProto()->getName() != "main")
    function.setPrivate();

  return function;
}
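
To see why the visibility matters, here is a small standalone sketch of the cleanup rule described above: a function is removed only when it is both private and unreferenced. FuncInfo, shouldBePrivate, and dropDeadPrivateFunctions are hypothetical names used for illustration; the real inliner works on MLIR symbols rather than on a plain vector.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a function symbol: a name, a visibility flag, and
// the number of remaining references (calls) to it.
struct FuncInfo {
  std::string name;
  bool isPrivate;
  int useCount;
};

// Pick the visibility as the text describes: everything except "main"
// becomes private.
inline bool shouldBePrivate(const std::string &name) { return name != "main"; }

// Remove functions that are private and no longer referenced. Public
// functions are always kept, because code outside the module may call them.
inline void dropDeadPrivateFunctions(std::vector<FuncInfo> &funcs) {
  funcs.erase(std::remove_if(funcs.begin(), funcs.end(),
                             [](const FuncInfo &f) {
                               assert(f.useCount >= 0);
                               return f.isPrivate && f.useCount == 0;
                             }),
              funcs.end());
}
```

Under this rule, main survives even with zero uses, while a helper such as multiply_transpose disappears once all of its calls have been inlined.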

We then register our dialect interface directly on the Toy dialect, just as we did for operations.

void ToyDialect::initialize() {
  addInterfaces<ToyInlinerInterface>();
}
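
Conceptually, registering an interface lets a generic transformation ask a dialect for a hook object without knowing the dialect's concrete type. The sketch below models that lookup in plain C++. It is an illustration under assumptions, not how MLIR implements addInterfaces; Dialect, DialectInterfaceBase, InlinerInterface, ToyLikeInliner, and canInline are all hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical base class for anything a dialect can register.
struct DialectInterfaceBase {
  virtual ~DialectInterfaceBase() = default;
};

// Hypothetical inliner interface with one hook and a conservative default.
struct InlinerInterface : DialectInterfaceBase {
  virtual bool isLegalToInline(const std::string & /*opName*/) const {
    return false;
  }
};

// A Toy-like implementation that allows every operation to be inlined.
struct ToyLikeInliner : InlinerInterface {
  bool isLegalToInline(const std::string &) const final { return true; }
};

// Hypothetical dialect: stores implementations keyed by interface type.
class Dialect {
public:
  // Register `Impl` as this dialect's implementation of `Interface`.
  template <typename Interface, typename Impl> void addInterface() {
    static_assert(std::is_base_of<Interface, Impl>::value,
                  "Impl must derive from Interface");
    interfaces[std::type_index(typeid(Interface))] = std::make_unique<Impl>();
  }

  // Return the registered implementation, or nullptr if there is none.
  template <typename Interface> const Interface *getInterface() const {
    auto it = interfaces.find(std::type_index(typeid(Interface)));
    if (it == interfaces.end())
      return nullptr;
    return static_cast<const Interface *>(it->second.get());
  }

private:
  std::unordered_map<std::type_index, std::unique_ptr<DialectInterfaceBase>>
      interfaces;
};

// A generic transformation talks only to the interface, never to a concrete
// dialect. A dialect without the interface is treated as not inlinable.
inline bool canInline(const Dialect &dialect, const std::string &opName) {
  const InlinerInterface *iface = dialect.getInterface<InlinerInterface>();
  return iface != nullptr && iface->isLegalToInline(opName);
}
```

The design point this illustrates is that the transformation's code never changes when a new dialect opts in; only the dialect's registration does.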

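Next, we need to provide a way for the inliner to know that toy.generic_call represents a call and that toy.func represents a function.
MLIR provides operation interfaces that can be used to mark an operation as call-like or callable-like.
Unlike dialect interfaces, operation interfaces provide a finer granularity of information that is specific and core to a single operation.
The operation interfaces we will add here are CallOpInterface and CallableOpInterface.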

To add this interface, we need to include its definition in our operation specification file (Ops.td):

include "mlir/Interfaces/CallInterfaces.td"

and add CallOpInterface to the traits list of GenericCallOp:

def FuncOp : Toy_Op<"func",
    [FunctionOpInterface, IsolatedFromAbove]> {
  ...
}

def GenericCallOp : Toy_Op<"generic_call",
    [DeclareOpInterfaceMethods<CallOpInterface>]> {
  ...
}

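In the code above, we use the DeclareOpInterfaceMethods directive to automatically declare all of the CallOpInterface methods in the class declaration of GenericCallOp.
For FuncOp, the definition is already provided through the extraClassDeclaration field of its class; the corresponding definitions are: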

/// Returns the region on the function operation that is callable.
Region *FuncOp::getCallableRegion() { return &getBody(); }

// ....

/// Return the callee of the generic call operation, this is required by the
/// call interface.
CallInterfaceCallable GenericCallOp::getCallableForCallee() {
  return (*this)->getAttrOfType<SymbolRefAttr>("callee");
}

/// Set the callee for the generic call operation, this is required by the call
/// interface.
void GenericCallOp::setCalleeFromCallable(CallInterfaceCallable callee) {
  (*this)->setAttr("callee", callee.get<SymbolRefAttr>());
}

/// Get the argument operands to the called function, this is required by the
/// call interface.
Operation::operand_range GenericCallOp::getArgOperands() { return getInputs(); }

/// Get the argument operands to the called function as a mutable range, this is
/// required by the call interface.
MutableOperandRange GenericCallOp::getArgOperandsMutable() {
  return getInputsMutable();
}
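
The call and callable interfaces exist so that a generic pass can go from a call site to the body it should inline. A rough standalone model of that resolution step is sketched below, assuming a flat symbol table keyed by function name. Function, Call, SymbolTable, and resolveCallee are hypothetical names, and the arity check is an assumption added for illustration rather than a documented step of the MLIR inliner.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical callable: a named function with a fixed number of parameters.
struct Function {
  std::string name;
  std::size_t numParams;
};

// Hypothetical call-like operation: the callee symbol plus argument names.
struct Call {
  std::string callee;
  std::vector<std::string> args;
};

// Hypothetical symbol table mapping symbol names to functions.
using SymbolTable = std::map<std::string, Function>;

// Resolve a call to its callable. Returns nullptr when the symbol is unknown
// or when the argument count does not match the parameter count.
inline const Function *resolveCallee(const SymbolTable &symbols,
                                     const Call &call) {
  auto it = symbols.find(call.callee);
  if (it == symbols.end())
    return nullptr;
  if (it->second.numParams != call.args.size())
    return nullptr;
  return &it->second;
}
```

In this model, the callee name plays the role of the SymbolRefAttr returned by getCallableForCallee, and the argument list plays the role of getArgOperands.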

Now that the inliner has been informed about the Toy dialect, we can add the inliner pass to the pass manager for Toy:

pm.addPass(mlir::createInlinerPass());

Now let's look at a working example. The IR below is the Toy IR before any inlining has taken effect:

toy.func @multiply_transpose(%arg0: tensor<*xf64>, %arg1: tensor<*xf64>) -> tensor<*xf64> {
  %0 = toy.transpose(%arg0 : tensor<*xf64>) to tensor<*xf64>
  %1 = toy.transpose(%arg1 : tensor<*xf64>) to tensor<*xf64>
  %2 = toy.mul %0, %1 : tensor<*xf64>
  toy.return %2 : tensor<*xf64>
}
toy.func @main() {
  %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %1 = toy.reshape(%0 : tensor<2x3xf64>) to tensor<2x3xf64>
  %2 = toy.constant dense<[1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00, 6.000000e+00]> : tensor<6xf64>
  %3 = toy.reshape(%2 : tensor<6xf64>) to tensor<2x3xf64>
  %4 = toy.generic_call @multiply_transpose(%1, %3) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64>
  %5 = toy.generic_call @multiply_transpose(%3, %1) : (tensor<2x3xf64>, tensor<2x3xf64>) -> tensor<*xf64>
  toy.print %5 : tensor<*xf64>
  toy.return
}

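We have two calls to multiply_transpose that we would like to inline into main. However, if we look at the output, nothing has changed. We are missing one last step: there is a hidden type conversion at the call boundary. Looking at the IR above, the operands of generic_call have type tensor<2x3xf64>, whereas the function expects inputs of type tensor<*xf64>. To resolve this difference, the inliner expects an explicit cast operation to be inserted. For this, we need to add a new operation to the Toy dialect, ToyCastOp (toy.cast), to represent casts between two different shapes.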

def CastOp : Toy_Op<"cast", [
    DeclareOpInterfaceMethods<CastOpInterface>,
    Pure,
    SameOperandsAndResultShape]
  > {
  let summary = "shape cast operation";
  let description = [{
    The "cast" operation converts a tensor from one type to an equivalent type
    without changing any data elements. The source and destination types
    must both be tensor types with the same element type. If both are ranked,
    then shape is required to match. The operation is invalid if converting
    to a mismatching constant dimension.
  }];

  let arguments = (ins F64Tensor:$input);
  let results = (outs F64Tensor:$output);
  let assemblyFormat = "$input attr-dict `:` type($input) `to` type($output)";
}

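Note that the definition of this cast operation adds a CastOpInterface to its traits list. This interface provides several utilities for cast-like operations (such as the CastOp we define here), for example folding identity casts and verification. We hook into this interface by providing a definition for the areCastCompatible method: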

/// Returns true if the given set of input and result types are compatible with
/// this cast operation. This is required by the `CastOpInterface` to verify
/// this operation and provide other additional utilities.
bool CastOp::areCastCompatible(TypeRange inputs, TypeRange outputs) {
  if (inputs.size() != 1 || outputs.size() != 1)
    return false;
  // The inputs must be Tensors with the same element type.
  TensorType input = llvm::dyn_cast<TensorType>(inputs.front());
  TensorType output = llvm::dyn_cast<TensorType>(outputs.front());
  if (!input || !output || input.getElementType() != output.getElementType())
    return false;
  // The shape is required to match if both types are ranked.
  return !input.hasRank() || !output.hasRank() || input == output;
}
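
The compatibility rule above can be checked by hand with a simplified model. The sketch below restates the same logic on a hypothetical TensorTy struct (element type name, ranked flag, shape) instead of MLIR's TensorType; castCompatible is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical simplified tensor type.
struct TensorTy {
  std::string elementType;
  bool ranked;
  std::vector<std::int64_t> shape; // ignored when `ranked` is false
};

// Same rule as the hook above: element types must agree, and the shapes must
// match only when both sides are ranked.
inline bool castCompatible(const TensorTy &in, const TensorTy &out) {
  if (in.elementType != out.elementType)
    return false;
  return !in.ranked || !out.ranked || in.shape == out.shape;
}
```

With this model, casting tensor<2x3xf64> to tensor<*xf64> (or back) is accepted, while casting tensor<2x3xf64> to tensor<3x2xf64> is rejected because both sides are ranked and their shapes differ.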

With a proper cast operation in place, we can now override the necessary hook on ToyInlinerInterface so that the cast is inserted for us when needed.

struct ToyInlinerInterface : public DialectInlinerInterface {
  ...

  /// Attempts to materialize a conversion for a type mismatch between a call
  /// from this dialect, and a callable region. This method should generate an
  /// operation that takes 'input' as the only operand, and produces a single
  /// result of 'resultType'. If a conversion can not be generated, nullptr
  /// should be returned.
  Operation *materializeCallConversion(OpBuilder &builder, Value input,
                                       Type resultType,
                                       Location conversionLoc) const final {
    return builder.create<CastOp>(conversionLoc, resultType, input);
  }
};
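
As a rough illustration of when such a conversion is needed, the sketch below compares argument types with parameter types position by position and records a conversion only where they differ. It assumes types can be compared as plain strings, which is a simplification; TypeName, Conversion, and collectCallConversions are hypothetical names, and the real decision is made inside the MLIR inliner.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical textual type, e.g. "tensor<2x3xf64>".
using TypeName = std::string;
// A conversion from the call-site type (first) to the callee type (second).
using Conversion = std::pair<TypeName, TypeName>;

// Collect the conversions a call site needs: one per position where the
// argument type differs from the parameter type.
inline std::vector<Conversion>
collectCallConversions(const std::vector<TypeName> &argTypes,
                       const std::vector<TypeName> &paramTypes) {
  assert(argTypes.size() == paramTypes.size());
  std::vector<Conversion> conversions;
  for (std::size_t i = 0; i < argTypes.size(); ++i)
    if (argTypes[i] != paramTypes[i])
      conversions.emplace_back(argTypes[i], paramTypes[i]);
  return conversions;
}
```

For the calls to multiply_transpose in the example, both arguments are tensor<2x3xf64> and both parameters are tensor<*xf64>, so this model yields two conversions, matching the two toy.cast operations that show up after inlining.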

If we now run the working example through the pipeline again, we get the expected result:

toy.func @main() {
  %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %1 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %2 = toy.cast %1 : tensor<2x3xf64> to tensor<*xf64>
  %3 = toy.cast %0 : tensor<2x3xf64> to tensor<*xf64>
  %4 = toy.transpose(%2 : tensor<*xf64>) to tensor<*xf64>
  %5 = toy.transpose(%3 : tensor<*xf64>) to tensor<*xf64>
  %6 = toy.mul %4, %5 : tensor<*xf64>
  toy.print %6 : tensor<*xf64>
  toy.return
}

Note: the generic inliner also performs simplifications, so the output may be somewhat cleaner than expected.

Intraprocedural Shape Inference

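Now that we have inlined all of the functions, we are left with a main function containing a mix of statically and dynamically shaped operations. We can now write a simple shape inference pass that propagates shapes intraprocedurally (within a single function). We could write it as a pass that directly encodes the constraints of the operations in the Toy dialect, but this looks like a good candidate for a transformation written generically. As a rule of thumb, it is best to express a transformation as generically as possible so that it can be extended to other dialects in the future. There is no telling how many other dialects may have similar needs or run into the same problems.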

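For shape inference, if we break the problem down to its core, we really just want operations to tell us the expected output shapes given a set of statically known inputs. (We could certainly make this more complex, but for our needs we can keep it simple.) Since this property is core to a specific operation, we can define an operation interface that can be specified on the operations whose result shapes need to be inferred.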

Just as with operations, we can define operation interfaces using the Operation Definition Specification (ODS) framework.

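The interface is defined by inheriting from OpInterface, which takes as a template argument the name to give to the generated C++ interface class. For our purposes, we simply name the generated class ShapeInference. We also provide a description for the interface.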

def ShapeInferenceOpInterface : OpInterface<"ShapeInference"> {
  let description = [{
    Interface to access a registered method to infer the return types for an
    operation that can be used during type inference.
  }];
}

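Next, we define the interface methods that operations will need to provide. An interface method consists of:
a description, a C++ return type in string form, a method name in string form, and a few optional components as needed. See the ODS documentation for more information.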

def ShapeInferenceOpInterface : OpInterface<"ShapeInference"> {
  ...

  let methods = [
    InterfaceMethod<"Infer and set the output shape for the current operation.",
                    "void", "inferShapes">
  ];
}

Now that the interface is defined, we can add it to the Toy operations that need it, in the same way we added CallOpInterface to GenericCallOp.

def MulOp : Toy_Op<"mul",
    [..., DeclareOpInterfaceMethods<ShapeInferenceOpInterface>]> {
  ...
}

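Each of these Toy operations then needs to provide a definition for the inferShapes() method. Taking the mul op as an example, the result shape is inferred to be the same as the shape of the inputs.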

/// Infer the output shape of the MulOp, this is required by the shape inference
/// interface.
void MulOp::inferShapes() { getResult().setType(getLhs().getType()); }
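
To make the per-operation rules concrete, here is a standalone sketch on plain shape vectors rather than MLIR types. The element-wise rule mirrors the mul example above. The transpose rule (reversing the dimensions) is inferred from the tutorial's final output, where a 2x3 input becomes 3x2, and should be read as an assumption about how that op infers its shape. Shape, inferMulShape, and inferTransposeShape are hypothetical names, and the equality assertion in inferMulShape is an extra check that the original one-liner does not perform.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape representation: one extent per dimension.
using Shape = std::vector<std::int64_t>;

// Element-wise multiply: the result takes the shape of the left operand.
// The equality check is an added sanity check for this sketch.
inline Shape inferMulShape(const Shape &lhs, const Shape &rhs) {
  assert(lhs == rhs && "element-wise mul expects equally shaped operands");
  return lhs;
}

// Transpose: the result shape is the input shape with dimensions reversed.
inline Shape inferTransposeShape(const Shape &input) {
  return Shape(input.rbegin(), input.rend());
}
```

Applied to the running example, a 2x3 constant transposes to 3x2, and multiplying two 3x2 tensors yields a 3x2 result.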

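At this point, each of the necessary Toy operations provides a mechanism for inferring its output shapes. The ShapeInferencePass operates on functions: it runs on each function in isolation. MLIR also supports general OperationPasses that run on any isolated operation, but our module contains only functions, so there is no need to generalize to all operations.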

Such a pass is implemented by creating a class that inherits from mlir::OperationPass<FuncOp> (here through mlir::PassWrapper) and overriding its runOnOperation() method.

class ShapeInferencePass
    : public mlir::PassWrapper<ShapeInferencePass, OperationPass<FuncOp>> {
  void runOnOperation() override {
    FuncOp function = getOperation();
    ...
  }
};

While we are at it, let's also create a helper method for instantiating the pass:

std::unique_ptr<mlir::Pass> mlir::toy::createShapeInferencePass() {
  return std::make_unique<ShapeInferencePass>();
}

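The shape inference algorithm works as follows:
    1. Build a worklist containing all operations that return a dynamically shaped tensor: these are the operations that need shape inference.
    2. Iterate on the worklist:
       2.1. Find an operation to process: the next ready operation in the worklist is one whose arguments are all non-generic.
       2.2. If no operation is found, break out of the loop.
       2.3. Remove the operation from the worklist.
       2.4. Infer the shape of its output from the argument types.
    3. If the worklist is empty, the algorithm succeeded.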
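
The steps above can be sketched in standalone C++ as follows. This is a simplified model, not the tutorial's pass: operations are nodes that refer to their producers by index, "inferring" a shape just flips a flag, and kUnrankedArgument is a hypothetical marker for an operand whose shape never becomes known (for example an unranked function argument). OpNode and inferAllShapes are likewise hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker for an operand whose shape can never be resolved.
constexpr std::size_t kUnrankedArgument = static_cast<std::size_t>(-1);

// Hypothetical operation model: indices of the ops producing its operands,
// and whether its own result shape is already known.
struct OpNode {
  std::vector<std::size_t> operands;
  bool shapeKnown;
};

// Runs the worklist algorithm; returns true if every shape was resolved.
inline bool inferAllShapes(std::vector<OpNode> &ops) {
  // Step 1: collect the operations whose result shape is still unknown.
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < ops.size(); ++i)
    if (!ops[i].shapeKnown)
      worklist.push_back(i);

  // Step 2: keep processing until nothing is left or nothing is ready.
  while (!worklist.empty()) {
    // Step 2.1: an operation is ready when all of its operands have shapes.
    auto ready = std::find_if(
        worklist.begin(), worklist.end(), [&](std::size_t idx) {
          for (std::size_t producer : ops[idx].operands) {
            if (producer == kUnrankedArgument)
              return false;
            assert(producer < ops.size());
            if (!ops[producer].shapeKnown)
              return false;
          }
          return true;
        });
    // Step 2.2: no ready operation means we cannot make progress.
    if (ready == worklist.end())
      break;
    // Step 2.3: remove the operation from the worklist.
    std::size_t idx = *ready;
    worklist.erase(ready);
    // Step 2.4: a real pass would call the op's inferShapes() hook here;
    // this model only records that the shape is now known.
    ops[idx].shapeKnown = true;
  }

  // Step 3: success if and only if the worklist was drained.
  return worklist.empty();
}
```

For a chain such as constant, transpose, mul, the loop resolves the transpose first and the mul second. If an operation depends on an operand that never gets a shape, the loop stops early and the function reports failure.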

When processing an operation as described, we use the following code snippet to query whether it has registered the ShapeInference interface:

  // Ask the operation to infer its output shapes.
  LLVM_DEBUG(llvm::dbgs() << "Inferring shape for: " << *op << "\n");

  /// We check if an operation has a particular interface by casting.
  if (ShapeInference shapeOp = dyn_cast<ShapeInference>(op)) {
    shapeOp.inferShapes();
  } else {
    op->emitError("unable to infer shape of operation without shape "
                  "inference interface");
    return signalPassFailure();
  }
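
The "check for an interface by casting" pattern has a loose analogue in ordinary C++, sketched below with dynamic_cast. This is only an analogy: MLIR's dyn_cast relies on its own lightweight type identification rather than C++ RTTI. Operation, ShapeInferable, MulLikeOp, PrintLikeOp, and tryInferShapes are hypothetical names.

```cpp
#include <cassert>

// Hypothetical polymorphic base for all operations.
struct Operation {
  virtual ~Operation() = default;
};

// Hypothetical interface that only some operations implement.
struct ShapeInferable {
  virtual ~ShapeInferable() = default;
  virtual void inferShapes() = 0;
};

// An operation that opts into the interface.
struct MulLikeOp : Operation, ShapeInferable {
  bool inferred = false;
  void inferShapes() override { inferred = true; }
};

// An operation that does not implement the interface.
struct PrintLikeOp : Operation {};

// Returns true if the operation supported shape inference and was processed.
// A false result corresponds to the error path in the snippet above.
inline bool tryInferShapes(Operation &op) {
  if (auto *shapeOp = dynamic_cast<ShapeInferable *>(&op)) {
    shapeOp->inferShapes();
    return true;
  }
  return false;
}
```

The caller never needs to know the concrete operation type; it only asks whether the capability is present and reports an error otherwise.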

We can then add our pass to the pass manager:

  pm.addPass(mlir::toy::createShapeInferencePass());

If we rerun our original example, we now get the following output:

toy.func @main() {
  %0 = toy.constant dense<[[1.000000e+00, 2.000000e+00, 3.000000e+00], [4.000000e+00, 5.000000e+00, 6.000000e+00]]> : tensor<2x3xf64>
  %1 = toy.transpose(%0 : tensor<2x3xf64>) to tensor<3x2xf64>
  %2 = toy.mul %1, %1 : tensor<3x2xf64>
  toy.print %2 : tensor<3x2xf64>
  toy.return
}

You can build toyc-ch4 and try it yourself with the following command:

$ toyc-ch4 test/Examples/Toy/Ch4/codegen.toy -emit=mlir -opt

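In the next chapter, we will start the code generation process by targeting a lower-level dialect, in order to optimize some of the more compute-heavy Toy operations.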
