Analyzing Move Contracts Through the VM Source Code (5): Basic vector Operations

biakia0610

Last modified: 2022-10-18 16:00:34

License: CC 4.0 BY-SA

Categories: Rust, Move, Blockchain. Tags: blockchain, rust, Move

First published: 2022-10-18 15:48:43

Article link: https://blog.youkuaiyun.com/biakia0610/article/details/127388447

The contract:

module test_05::test_move{
    use std::vector;
    public fun test_integer(){
        let v = vector::empty<u64>();
        vector::push_back(&mut v, 5);
        let a:& u64 = vector::borrow(&v, 0);
        let b:&mut u64 = vector::borrow_mut(&mut v, 0);
        vector::pop_back(&mut v);
        vector::destroy_empty(v);
    } 
}

This contract demonstrates the basic vector operations in Move.

We disassemble it with the following command:

move disassemble --name test_move

The disassembly yields the following instructions:

// Move bytecode v5
module f2.test_move {


public test_integer() {
L0:     a: &u64
L1:     b: &mut u64
L2:     v: vector<u64>
B0:
        0: VecPack(2, 0)
        1: StLoc[2](v: vector<u64>)
        2: MutBorrowLoc[2](v: vector<u64>)
        3: LdU64(5)
        4: VecPushBack(2)
        5: ImmBorrowLoc[2](v: vector<u64>)
        6: LdU64(0)
        7: VecImmBorrow(2)
        8: Pop
        9: MutBorrowLoc[2](v: vector<u64>)
        10: LdU64(0)
        11: VecMutBorrow(2)
        12: Pop
        13: MutBorrowLoc[2](v: vector<u64>)
        14: VecPopBack(2)
        15: Pop
        16: MoveLoc[2](v: vector<u64>)
        17: VecUnpack(2, 0)
        18: Ret
}
}

VecPack(2, 0)

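This instruction initializes a vector. The first operand, 2, is an index that points to the vector's element type. The type is resolved and cached ahead of time, so the index is enough to look it up directly. In our contract, vector::empty<u64>(), the element type is u64. The second operand, 0, is the number of values that are packed into the vector when it is created. Since we call vector::empty, there is no initial data. If we had instead written let v = vector<u64>[0,1,2];, the second operand would be 3.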
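To make the second operand concrete, here is a minimal Rust sketch of how a stack machine could pack the top num values of an operand stack into a new vector. The function name vec_pack and the plain Vec<u64> stack are assumptions made for illustration; they are not the Move VM's own types.

```rust
/// Hypothetical helper: removes the top `num` values from `stack` and returns
/// them as a new vector, keeping the order in which they were pushed.
fn vec_pack(stack: &mut Vec<u64>, num: usize) -> Option<Vec<u64>> {
    if num > stack.len() {
        return None; // not enough operands on the stack
    }
    let start = stack.len() - num;
    // split_off leaves the first `start` values on the stack and returns the rest.
    Some(stack.split_off(start))
}
```

With num = 0 the stack is left untouched and an empty vector comes back, which is the vector::empty case. With the values 0, 1, 2 pushed and num = 3, the result is [0, 1, 2].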

Let's look at the actual code:

Bytecode::VecPack(si, num) => {
        let ty = resolver.instantiate_single_type(*si, self.ty_args())?;
        gas_meter.charge_vec_pack(make_ty!(&ty),
                  interpreter.operand_stack.last_n(*num as usize)?,)?;
        let elements = interpreter.operand_stack.popn(*num as u16)?;
        let value = Vector::pack(&ty, elements)?;
        interpreter.operand_stack.push(value)?;
}

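First the element type is resolved from the index; here the resulting ty is U64. Then num values are popped from the stack. Because we are initializing an empty vector, elements is an empty Vec. Next, Vector::pack builds the vector value, which is finally pushed onto the stack. Here is the code of Vector::pack: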

pub fn pack(type_param: &Type, elements: Vec<Value>) -> PartialVMResult<Value> {
        let container = match type_param {
            Type::U8 => Value::vector_u8(
                elements
                    .into_iter()
                    .map(|v| v.value_as())
                    .collect::<PartialVMResult<Vec<_>>>()?,
            ),
            Type::U64 => Value::vector_u64(
                elements
                    .into_iter()
                    .map(|v| v.value_as())
                    .collect::<PartialVMResult<Vec<_>>>()?,
            ),
            Type::U128 => Value::vector_u128(
                elements
                    .into_iter()
                    .map(|v| v.value_as())
                    .collect::<PartialVMResult<Vec<_>>>()?,
            ),
            Type::Bool => Value::vector_bool(
                elements
                    .into_iter()
                    .map(|v| v.value_as())
                    .collect::<PartialVMResult<Vec<_>>>()?,
            ),
            Type::Address => Value::vector_address(
                elements
                    .into_iter()
                    .map(|v| v.value_as())
                    .collect::<PartialVMResult<Vec<_>>>()?,
            ),

            Type::Signer | Type::Vector(_) | Type::Struct(_) | Type::StructInstantiation(_, _) => {
                Value(ValueImpl::Container(Container::Vec(Rc::new(RefCell::new(
                    elements.into_iter().map(|v| v.0).collect(),
                )))))
            }

            Type::Reference(_) | Type::MutableReference(_) | Type::TyParam(_) => {
                return Err(
                    PartialVMError::new(StatusCode::UNKNOWN_INVARIANT_VIOLATION_ERROR)
                        .with_message(format!("invalid type param for vector: {:?}", type_param)),
                )
            }
        };

        Ok(container)
    }

Since our type is U64, the following branch is taken:

Type::U64 => Value::vector_u64(
         elements.into_iter()
                 .map(|v| v.value_as())
                 .collect::<PartialVMResult<Vec<_>>>()?,
)
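The collect::<PartialVMResult<Vec<_>>>() call in this branch relies on a standard Rust idiom: an iterator of Result values can be collected into a single Result that holds a Vec, stopping at the first error. The sketch below shows the idiom in isolation. The Cell enum and the as_u64 and pack_u64 functions are hypothetical stand-ins for Value, value_as and the U64 branch, and String replaces the VM's error type.

```rust
/// Hypothetical stand-in for the VM's `Value`.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    U64(u64),
    Bool(bool),
}

/// Hypothetical stand-in for `value_as`: succeeds only for u64 cells.
fn as_u64(c: Cell) -> Result<u64, String> {
    match c {
        Cell::U64(x) => Ok(x),
        other => Err(format!("expected u64, got {:?}", other)),
    }
}

/// Converts every element, returning the first conversion error if there is one.
fn pack_u64(elements: Vec<Cell>) -> Result<Vec<u64>, String> {
    elements.into_iter().map(as_u64).collect()
}
```

An empty input yields Ok with an empty Vec, which is what happens for vector::empty<u64>().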

As we can see, it calls Value::vector_u64:

pub fn vector_u64(it: impl IntoIterator<Item = u64>) -> Self {
        Self(ValueImpl::Container(Container::VecU64(Rc::new(
            RefCell::new(it.into_iter().collect()),
        ))))
    }

What it returns is a Value wrapping a Container. Here are the definitions of ValueImpl and Container:

enum ValueImpl {
    Invalid,

    U8(u8),
    U64(u64),
    U128(u128),
    Bool(bool),
    Address(AccountAddress),

    Container(Container),

    ContainerRef(ContainerRef),
    IndexedRef(IndexedRef),
}

enum Container {
    Locals(Rc<RefCell<Vec<ValueImpl>>>),
    Vec(Rc<RefCell<Vec<ValueImpl>>>),
    Struct(Rc<RefCell<Vec<ValueImpl>>>),
    VecU8(Rc<RefCell<Vec<u8>>>),
    VecU64(Rc<RefCell<Vec<u64>>>),
    VecU128(Rc<RefCell<Vec<u128>>>),
    VecBool(Rc<RefCell<Vec<bool>>>),
    VecAddress(Rc<RefCell<Vec<AccountAddress>>>),
}

Container has several variants. Here we use VecU64, which holds an Rc<RefCell<Vec<u64>>>: a reference-counted, interior-mutable handle from Rust's standard library to a Vec<u64>.

So Vector::pack returns a Value wrapping a Container that holds this shared handle to a Vec<u64>, and that Value is what gets pushed onto the stack.
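Because the rest of the walk-through depends on how Rc<RefCell<Vec<u64>>> behaves, here is a small standalone sketch of it. The function name share_and_push is made up for this example; only standard library calls are used.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Pushes a value through a second handle and returns what the first handle
/// sees, together with the number of live handles.
fn share_and_push() -> (Vec<u64>, usize) {
    let owner: Rc<RefCell<Vec<u64>>> = Rc::new(RefCell::new(Vec::new()));
    let alias = Rc::clone(&owner); // copies the pointer, not the Vec
    alias.borrow_mut().push(5); // RefCell checks the borrow at run time
    let seen = owner.borrow().clone();
    (seen, Rc::strong_count(&owner))
}
```

The write made through alias is visible through owner, and the strong count is 2 while both handles are alive.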

StLoc[2](v: vector<u64>): pops the vector value produced above off the stack and stores it in local 2 (the local variable slot of v).

MutBorrowLoc[2](v: vector<u64>)

This instruction produces a reference that points to the data in local 2. It corresponds to the &mut v part of vector::push_back(&mut v, 5);. Here is the actual code:

Bytecode::MutBorrowLoc(idx) | Bytecode::ImmBorrowLoc(idx) => {
       let instr = match instruction {
                  Bytecode::MutBorrowLoc(_) => S::MutBorrowLoc,
                  _ => S::ImmBorrowLoc,
       };
       gas_meter.charge_simple_instr(instr)?;
       interpreter.operand_stack
                  .push(self.locals.borrow_loc(*idx as usize)?)?;
}

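Whether the instruction is MutBorrowLoc or ImmBorrowLoc, the handling is essentially the same: both call locals.borrow_loc to obtain a reference and push it onto the stack. Here is the code of locals.borrow_loc: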

pub fn borrow_loc(&self, idx: usize) -> PartialVMResult<Value> {
        // TODO: this is very similar to SharedContainer::borrow_elem. Find a way to
        // reuse that code?

        let v = self.0.borrow();
        if idx >= v.len() {
            return Err(
                PartialVMError::new(StatusCode::UNKNOWN_INVARIANT_VIOLATION_ERROR).with_message(
                    format!(
                        "index out of bounds when borrowing local: got: {}, len: {}",
                        idx,
                        v.len()
                    ),
                ),
            );
        }

        match &v[idx] {
            ValueImpl::Container(c) => Ok(Value(ValueImpl::ContainerRef(ContainerRef::Local(
                c.copy_by_ref(),
            )))),

            ValueImpl::U8(_)
            | ValueImpl::U64(_)
            | ValueImpl::U128(_)
            | ValueImpl::Bool(_)
            | ValueImpl::Address(_) => Ok(Value(ValueImpl::IndexedRef(IndexedRef {
                container_ref: ContainerRef::Local(Container::Locals(Rc::clone(&self.0))),
                idx,
            }))),

            ValueImpl::ContainerRef(_) | ValueImpl::Invalid | ValueImpl::IndexedRef(_) => Err(
                PartialVMError::new(StatusCode::UNKNOWN_INVARIANT_VIOLATION_ERROR)
                    .with_message(format!("cannot borrow local {:?}", &v[idx])),
            ),
        }
    }

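It first checks that the index is in bounds, then checks what kind of Value sits in that slot. From the preceding instructions we know it is a ValueImpl::Container, so the following branch is taken: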

ValueImpl::Container(c) => Ok(Value(ValueImpl::ContainerRef(ContainerRef::Local(
                c.copy_by_ref(),
))))


fn copy_by_ref(&self) -> Self {
        match self {
            Self::Vec(r) => Self::Vec(Rc::clone(r)),
            Self::Struct(r) => Self::Struct(Rc::clone(r)),
            Self::VecU8(r) => Self::VecU8(Rc::clone(r)),
            Self::VecU64(r) => Self::VecU64(Rc::clone(r)),
            Self::VecU128(r) => Self::VecU128(Rc::clone(r)),
            Self::VecBool(r) => Self::VecBool(Rc::clone(r)),
            Self::VecAddress(r) => Self::VecAddress(Rc::clone(r)),
            Self::Locals(r) => Self::Locals(Rc::clone(r)),
        }
    }

Under the hood, Rc::clone copies the handle (not the data), and a Value of kind ValueImpl::ContainerRef is returned.

So the &mut v part of vector::push_back(&mut v, 5); actually clones the handle to the underlying Rust Vec, wraps it in a ValueImpl::ContainerRef value, and pushes that onto the stack.
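The point that borrowing a local copies a handle rather than the data can be checked with a reduced version of Container. This is a sketch with only two variants, and borrow_local_demo is a hypothetical helper added for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Reduced version of the VM's `Container`, with two variants only.
enum Container {
    VecU64(Rc<RefCell<Vec<u64>>>),
    VecBool(Rc<RefCell<Vec<bool>>>),
}

impl Container {
    /// Same shape as the VM's copy_by_ref: clone the Rc, keep the variant.
    fn copy_by_ref(&self) -> Self {
        match self {
            Self::VecU64(r) => Self::VecU64(Rc::clone(r)),
            Self::VecBool(r) => Self::VecBool(Rc::clone(r)),
        }
    }
}

/// Borrows a local holding a u64 vector and reports whether the reference
/// points at the same allocation, plus the resulting strong count.
fn borrow_local_demo() -> (bool, usize) {
    let local = Container::VecU64(Rc::new(RefCell::new(vec![5])));
    let reference = local.copy_by_ref();
    match (&local, &reference) {
        (Container::VecU64(a), Container::VecU64(b)) => {
            (Rc::ptr_eq(a, b), Rc::strong_count(a))
        }
        _ => (false, 0),
    }
}
```

Rc::ptr_eq returns true here, so the local and the reference share one Vec.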

LdU64(5): loads the constant 5 onto the stack. It corresponds to the 5 in vector::push_back(&mut v, 5);.

VecPushBack(2)

This instruction performs vector::push_back on the two values on top of the stack. Here is the code:

Bytecode::VecPushBack(si) => {
        let elem = interpreter.operand_stack.pop()?;
        let vec_ref = interpreter.operand_stack.pop_as::<VectorRef>()?;
        let ty = &resolver.instantiate_single_type(*si, self.ty_args())?;
        gas_meter.charge_vec_push_back(make_ty!(ty), &elem)?;
        vec_ref.push_back(elem, ty)?;
}

First one value is popped from the stack, namely the 5 loaded above, and then a reference is popped. VectorRef is just a wrapper around ContainerRef:

pub struct VectorRef(ContainerRef);

Next the element type is resolved, which is U64 again, and finally vec_ref.push_back is called:

pub fn push_back(&self, e: Value, type_param: &Type) -> PartialVMResult<()> {
        let c = self.0.container();
        check_elem_layout(type_param, c)?;

        match c {
            Container::VecU8(r) => r.borrow_mut().push(e.value_as()?),
            Container::VecU64(r) => r.borrow_mut().push(e.value_as()?),
            Container::VecU128(r) => r.borrow_mut().push(e.value_as()?),
            Container::VecBool(r) => r.borrow_mut().push(e.value_as()?),
            Container::VecAddress(r) => r.borrow_mut().push(e.value_as()?),
            Container::Vec(r) => r.borrow_mut().push(e.0),
            Container::Locals(_) | Container::Struct(_) => unreachable!(),
        }

        self.0.mark_dirty();
        Ok(())
    }

In the end it uses the handle held inside VecU64, the Rc<RefCell<Vec<u64>>> from above: it calls borrow_mut on it to get mutable access and pushes the value into the Vec.
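The operand order matters here: the element was pushed last, so it is popped first. A minimal sketch of the same sequence on a simplified operand stack follows. StackValue and vec_push_back are hypothetical names, and type checking and gas metering are left out.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Simplified operand-stack entries: a u64 or a reference to a u64 vector.
enum StackValue {
    U64(u64),
    VecRef(Rc<RefCell<Vec<u64>>>),
}

/// Pops the element, then the vector reference, and appends the element.
fn vec_push_back(stack: &mut Vec<StackValue>) -> Result<(), String> {
    let elem = match stack.pop() {
        Some(StackValue::U64(x)) => x,
        _ => return Err("expected a u64 element on top of the stack".to_string()),
    };
    let vec_ref = match stack.pop() {
        Some(StackValue::VecRef(r)) => r,
        _ => return Err("expected a vector reference below the element".to_string()),
    };
    vec_ref.borrow_mut().push(elem);
    Ok(()) // vec_ref is dropped here, releasing its share of the Rc
}
```

After the call the stack is two entries shorter and the shared Vec contains the new element.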

ImmBorrowLoc[2](v: vector<u64>)

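This instruction produces a reference that points to the data in local 2. It corresponds to the &v part of vector::borrow(&v, 0);. Like MutBorrowLoc above, it essentially creates a ContainerRef and pushes it onto the stack.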

LdU64(0): loads the constant 0 onto the stack. It corresponds to the 0 in vector::borrow(&v, 0);.

VecImmBorrow(2)

This instruction performs vector::borrow on the two values on top of the stack. Here is the code:

Bytecode::VecImmBorrow(si) => {
        let idx = interpreter.operand_stack.pop_as::<u64>()? as usize;
        let vec_ref = interpreter.operand_stack.pop_as::<VectorRef>()?;
        let ty = resolver.instantiate_single_type(*si, self.ty_args())?;
        let res = vec_ref.borrow_elem(idx, &ty);
        gas_meter.charge_vec_borrow(false, make_ty!(&ty), res.is_ok())?;
        interpreter.operand_stack.push(res?)?;
}

The work is done by vec_ref.borrow_elem. As you might guess, it again ends up relying on Rust's shared handle. Here is the code:

pub fn borrow_elem(&self, idx: usize, type_param: &Type) -> PartialVMResult<Value> {
        let c = self.0.container();
        check_elem_layout(type_param, c)?;
        if idx >= c.len() {
            return Err(PartialVMError::new(StatusCode::VECTOR_OPERATION_ERROR)
                .with_sub_status(INDEX_OUT_OF_BOUNDS));
        }
        Ok(Value(self.0.borrow_elem(idx)?))
    }

It first checks that the index is in bounds, then calls the inner borrow_elem, that is, ContainerRef's borrow_elem:

impl ContainerRef {
    fn borrow_elem(&self, idx: usize) -> PartialVMResult<ValueImpl> {
        let len = self.container().len();
        if idx >= len {
            return Err(
                PartialVMError::new(StatusCode::UNKNOWN_INVARIANT_VIOLATION_ERROR).with_message(
                    format!(
                        "index out of bounds when borrowing container element: got: {}, len: {}",
                        idx, len
                    ),
                ),
            );
        }

        let res = match self.container() {
            Container::Locals(r) | Container::Vec(r) | Container::Struct(r) => {
                let v = r.borrow();
                match &v[idx] {
                    // TODO: check for the impossible combinations.
                    ValueImpl::Container(container) => {
                        let r = match self {
                            Self::Local(_) => Self::Local(container.copy_by_ref()),
                            Self::Global { status, .. } => Self::Global {
                                status: Rc::clone(status),
                                container: container.copy_by_ref(),
                            },
                        };
                        ValueImpl::ContainerRef(r)
                    }
                    _ => ValueImpl::IndexedRef(IndexedRef {
                        idx,
                        container_ref: self.copy_value(),
                    }),
                }
            }

            Container::VecU8(_)
            | Container::VecU64(_)
            | Container::VecU128(_)
            | Container::VecAddress(_)
            | Container::VecBool(_) => ValueImpl::IndexedRef(IndexedRef {
                idx,
                container_ref: self.copy_value(),
            }),
        };

        Ok(res)
    }
}

Again the bounds are checked first, then the kind of Container is examined. Ours is Container::VecU64, so copy_value is called:

impl ContainerRef {
    fn copy_value(&self) -> Self {
        match self {
            Self::Local(container) => Self::Local(container.copy_by_ref()),
            Self::Global { status, container } => Self::Global {
                status: Rc::clone(status),
                container: container.copy_by_ref(),
            },
        }
    }
}

Local and Global distinguish local values from values in on-chain (global) storage. The ContainerRef created earlier uses Local, so container.copy_by_ref() is called:

fn copy_by_ref(&self) -> Self {
        match self {
            Self::Vec(r) => Self::Vec(Rc::clone(r)),
            Self::Struct(r) => Self::Struct(Rc::clone(r)),
            Self::VecU8(r) => Self::VecU8(Rc::clone(r)),
            Self::VecU64(r) => Self::VecU64(Rc::clone(r)),
            Self::VecU128(r) => Self::VecU128(Rc::clone(r)),
            Self::VecBool(r) => Self::VecBool(Rc::clone(r)),
            Self::VecAddress(r) => Self::VecAddress(Rc::clone(r)),
            Self::Locals(r) => Self::Locals(Rc::clone(r)),
        }
    }

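Once again Rc::clone is used to copy the handle, and a ValueImpl::IndexedRef is returned and pushed onto the stack. IndexedRef is closely related to the ReadRef instruction, which will be covered another time.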
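As a rough picture of what such a reference carries, here is a sketch of an element reference made of a shared handle plus an index. ElemRef, borrow_elem and read are hypothetical names. The read method is only an assumption about what dereferencing amounts to, since the article leaves ReadRef for later.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for an indexed reference: a shared handle to the
/// vector plus the position of the element it points at.
struct ElemRef {
    container: Rc<RefCell<Vec<u64>>>,
    idx: usize,
}

/// Bounds-checks `idx` and builds a reference to that element.
fn borrow_elem(v: &Rc<RefCell<Vec<u64>>>, idx: usize) -> Result<ElemRef, String> {
    let len = v.borrow().len();
    if idx >= len {
        return Err(format!("index out of bounds: got {}, len {}", idx, len));
    }
    Ok(ElemRef { container: Rc::clone(v), idx })
}

impl ElemRef {
    /// Assumed analogue of reading through the reference.
    fn read(&self) -> u64 {
        self.container.borrow()[self.idx]
    }
}
```

Dropping the ElemRef releases its share of the Rc, which is what the Pop instruction that follows achieves for the value on the stack.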

Pop: the value above is not used any further, so its lifetime has ended and it is popped.

MutBorrowLoc[2](v: vector<u64>)

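This instruction produces a reference that points to the data in local 2. It corresponds to the &mut v part of vector::borrow_mut(&mut v, 0);. It works as explained above, so the details are not repeated.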

LdU64(0): loads the constant 0 onto the stack. It corresponds to the 0 in vector::borrow_mut(&mut v, 0);.

VecMutBorrow(2)

This instruction performs vector::borrow_mut on the two values on top of the stack. Here is the code:

Bytecode::VecMutBorrow(si) => {
        let idx = interpreter.operand_stack.pop_as::<u64>()? as usize;
        let vec_ref = interpreter.operand_stack.pop_as::<VectorRef>()?;
        let ty = &resolver.instantiate_single_type(*si, self.ty_args())?;
        let res = vec_ref.borrow_elem(idx, ty);
        gas_meter.charge_vec_borrow(true, make_ty!(ty), res.is_ok())?;
        interpreter.operand_stack.push(res?)?;
}

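The logic is the same as for VecImmBorrow: both end up pushing a ValueImpl::IndexedRef onto the stack. Apart from the flag passed to the gas meter, nothing in this handler distinguishes a mutable reference from an immutable one. How mutability comes into play for an IndexedRef will be explained with concrete examples in a later article.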

Pop: the value above is not used any further, so its lifetime has ended and it is popped.

MutBorrowLoc[2](v: vector<u64>)

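This instruction produces a reference that points to the data in local 2. It corresponds to the &mut v part of vector::pop_back(&mut v);. It works as explained above, so the details are not repeated.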

VecPopBack(2)

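This instruction performs vector::pop_back on the vector reference on top of the stack. Unlike the previous vector instructions, it takes a single operand. Here is the code: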

Bytecode::VecPopBack(si) => {
        let vec_ref = interpreter.operand_stack.pop_as::<VectorRef>()?;
        let ty = &resolver.instantiate_single_type(*si, self.ty_args())?;
        let res = vec_ref.pop(ty);
        gas_meter.charge_vec_pop_back(make_ty!(ty), res.as_ref().ok())?;
        interpreter.operand_stack.push(res?)?;
}

The call that matters is vec_ref.pop. A reasonable guess is that it ends up calling pop on the Rust Vec. Here is the code:

pub fn pop(&self, type_param: &Type) -> PartialVMResult<Value> {
        let c = self.0.container();
        check_elem_layout(type_param, c)?;

        macro_rules! err_pop_empty_vec {
            () => {
                return Err(PartialVMError::new(StatusCode::VECTOR_OPERATION_ERROR)
                    .with_sub_status(POP_EMPTY_VEC))
            };
        }

        let res = match c {
            Container::VecU8(r) => match r.borrow_mut().pop() {
                Some(x) => Value::u8(x),
                None => err_pop_empty_vec!(),
            },
            Container::VecU64(r) => match r.borrow_mut().pop() {
                Some(x) => Value::u64(x),
                None => err_pop_empty_vec!(),
            },
            Container::VecU128(r) => match r.borrow_mut().pop() {
                Some(x) => Value::u128(x),
                None => err_pop_empty_vec!(),
            },
            Container::VecBool(r) => match r.borrow_mut().pop() {
                Some(x) => Value::bool(x),
                None => err_pop_empty_vec!(),
            },
            Container::VecAddress(r) => match r.borrow_mut().pop() {
                Some(x) => Value::address(x),
                None => err_pop_empty_vec!(),
            },
            Container::Vec(r) => match r.borrow_mut().pop() {
                Some(x) => Value(x),
                None => err_pop_empty_vec!(),
            },
            Container::Locals(_) | Container::Struct(_) => unreachable!(),
        };

        self.0.mark_dirty();
        Ok(res)
    }

Our data is a Container::VecU64, so the following branch is taken:

Container::VecU64(r) => match r.borrow_mut().pop() {
                Some(x) => Value::u64(x),
                None => err_pop_empty_vec!(),
            }

Indeed, it calls the corresponding method through the Rc<RefCell<Vec<u64>>>.

The returned value is then pushed onto the stack.
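The empty-vector case handled by the err_pop_empty_vec! macro can be written compactly with Option::ok_or. This is a sketch in which VecError and pop_back are hypothetical names standing in for the VM's status codes and method.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical error type standing in for the POP_EMPTY_VEC sub-status.
#[derive(Debug, PartialEq)]
enum VecError {
    PopEmptyVec,
}

/// Removes and returns the last element, or reports that the vector is empty.
fn pop_back(v: &Rc<RefCell<Vec<u64>>>) -> Result<u64, VecError> {
    v.borrow_mut().pop().ok_or(VecError::PopEmptyVec)
}
```

Calling it twice on a vector that holds only 5 gives Ok(5) and then the error, which mirrors what a second vector::pop_back in the contract would run into.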

Pop: the returned value is not used, so its lifetime ends and it is popped right away.

MoveLoc[2](v: vector<u64>)

This instruction moves the vector value created in the first step out of local 2 and pushes it onto the stack.
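StLoc and MoveLoc are mirror images of each other, which the following sketch tries to show. Frame, st_loc and move_loc are hypothetical names, and None is used here to play the role that ValueImpl::Invalid plays for an emptied slot.

```rust
/// Hypothetical frame: local slots plus an operand stack of u64 vectors.
struct Frame {
    locals: Vec<Option<Vec<u64>>>, // None stands for an empty (invalid) slot
    stack: Vec<Vec<u64>>,
}

impl Frame {
    /// StLoc: pop the top of the stack and store it in local `idx`.
    fn st_loc(&mut self, idx: usize) -> Result<(), String> {
        let value = self.stack.pop().ok_or("operand stack is empty")?;
        let slot = self.locals.get_mut(idx).ok_or("local index out of bounds")?;
        *slot = Some(value);
        Ok(())
    }

    /// MoveLoc: take the value out of local `idx`, leaving the slot empty,
    /// and push it onto the stack.
    fn move_loc(&mut self, idx: usize) -> Result<(), String> {
        let slot = self.locals.get_mut(idx).ok_or("local index out of bounds")?;
        let value = slot.take().ok_or("local is not initialized")?;
        self.stack.push(value);
        Ok(())
    }
}
```

Moving the same local a second time fails in this sketch, because the slot is already empty.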

VecUnpack(2, 0)

This instruction destroys the vector value. See the code below:

Bytecode::VecUnpack(si, num) => {
        let vec_val = interpreter.operand_stack.pop_as::<Vector>()?;
        let ty = &resolver.instantiate_single_type(*si, self.ty_args())?;
        gas_meter.charge_vec_unpack(make_ty!(ty), NumArgs::new(*num))?;
        let elements = vec_val.unpack(ty, *num)?;
        for value in elements {
            interpreter.operand_stack.push(value)?;
        }
}

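First a value is popped from the stack. Following the instructions, we can see that it is the vector value created by the first instruction. Its unpack method is then called: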

pub fn unpack(self, type_param: &Type, expected_num: u64) -> PartialVMResult<Vec<Value>> {
        check_elem_layout(type_param, &self.0)?;
        let elements: Vec<_> = match self.0 {
            Container::VecU8(r) => take_unique_ownership(r)?
                .into_iter()
                .map(Value::u8)
                .collect(),
            Container::VecU64(r) => take_unique_ownership(r)?
                .into_iter()
                .map(Value::u64)
                .collect(),
            Container::VecU128(r) => take_unique_ownership(r)?
                .into_iter()
                .map(Value::u128)
                .collect(),
            Container::VecBool(r) => take_unique_ownership(r)?
                .into_iter()
                .map(Value::bool)
                .collect(),
            Container::VecAddress(r) => take_unique_ownership(r)?
                .into_iter()
                .map(Value::address)
                .collect(),
            Container::Vec(r) => take_unique_ownership(r)?.into_iter().map(Value).collect(),
            Container::Locals(_) | Container::Struct(_) => unreachable!(),
        };
        if expected_num as usize == elements.len() {
            Ok(elements)
        } else {
            Err(PartialVMError::new(StatusCode::VECTOR_OPERATION_ERROR)
                .with_sub_status(VEC_UNPACK_PARITY_MISMATCH))
        }
    }

Our data is a Container::VecU64, so the following branch is taken:

Container::VecU64(r) => take_unique_ownership(r)?
                .into_iter()
                .map(Value::u64)
                .collect()

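It calls take_unique_ownership, which consumes the Rc handle and returns the data inside it, failing if any other handle to the same data is still alive: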


fn take_unique_ownership<T: Debug>(r: Rc<RefCell<T>>) -> PartialVMResult<T> {
    match Rc::try_unwrap(r) {
        Ok(cell) => Ok(cell.into_inner()),
        Err(r) => Err(
            PartialVMError::new(StatusCode::UNKNOWN_INVARIANT_VIOLATION_ERROR)
                .with_message(format!("moving value {:?} with dangling references", r)),
        ),
    }
}

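In the end, all the data in the underlying Rust Vec is returned. Since expected_num is 0 and the vector is empty after the earlier pop_back, elements has length 0, which matches expected_num. The check passes, and the loop in the handler that pushes elements onto the stack never runs.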
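Both failure modes of this step, a leftover reference and a count mismatch, can be reproduced with the standard library alone. In the sketch below, take_unique and unpack are simplified, hypothetical versions of the functions shown above, with String as the error type.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Succeeds only when `r` is the last handle to the data.
fn take_unique<T>(r: Rc<RefCell<T>>) -> Result<T, String> {
    match Rc::try_unwrap(r) {
        Ok(cell) => Ok(cell.into_inner()),
        Err(_) => Err("value still has outstanding references".to_string()),
    }
}

/// Takes the vector apart and checks that it holds `expected_num` elements.
fn unpack(v: Rc<RefCell<Vec<u64>>>, expected_num: usize) -> Result<Vec<u64>, String> {
    let elements = take_unique(v)?;
    if elements.len() == expected_num {
        Ok(elements)
    } else {
        Err(format!("expected {} elements, found {}", expected_num, elements.len()))
    }
}
```

This is why every reference pushed earlier in the function had to be popped or consumed before VecUnpack: a surviving clone of the Rc would make Rc::try_unwrap fail.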

Ret: the function ends and returns.
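To tie the steps together, the whole instruction sequence can be replayed on a toy interpreter. This is only a sketch under strong simplifications: every name in it (Val, Op, run, test_integer_program) is invented for the example, type-index operands and gas metering are dropped, only u64 vectors are supported, and the two borrow flavours are merged into one opcode each.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Shared = Rc<RefCell<Vec<u64>>>;

// Simplified runtime values; `Invalid` marks an empty local slot.
enum Val {
    Invalid,
    U64(u64),
    Vector(Shared),
    VecRef(Shared),
    ElemRef(Shared, usize),
}

// Reduced instruction set for this example.
enum Op {
    VecPackEmpty,
    StLoc(usize),
    BorrowLoc(usize), // MutBorrowLoc and ImmBorrowLoc
    LdU64(u64),
    VecPushBack,
    VecBorrow, // VecImmBorrow and VecMutBorrow
    Pop,
    VecPopBack,
    MoveLoc(usize),
    VecUnpack(usize),
    Ret,
}

// Runs `code` and returns the operand stack height at `Ret`.
// Local indices are trusted, as they would be after bytecode verification.
fn run(code: &[Op], num_locals: usize) -> Result<usize, String> {
    let mut locals: Vec<Val> = (0..num_locals).map(|_| Val::Invalid).collect();
    let mut stack: Vec<Val> = Vec::new();
    for op in code {
        match op {
            Op::VecPackEmpty => stack.push(Val::Vector(Rc::new(RefCell::new(Vec::new())))),
            Op::StLoc(i) => locals[*i] = stack.pop().ok_or("empty stack")?,
            Op::BorrowLoc(i) => match &locals[*i] {
                Val::Vector(r) => stack.push(Val::VecRef(Rc::clone(r))),
                _ => return Err("local does not hold a vector".into()),
            },
            Op::LdU64(x) => stack.push(Val::U64(*x)),
            // Operands come off in reverse push order: value first, then reference.
            Op::VecPushBack => match (stack.pop(), stack.pop()) {
                (Some(Val::U64(x)), Some(Val::VecRef(r))) => {
                    r.borrow_mut().push(x);
                }
                _ => return Err("bad operands for VecPushBack".into()),
            },
            Op::VecBorrow => match (stack.pop(), stack.pop()) {
                (Some(Val::U64(i)), Some(Val::VecRef(r))) => {
                    let idx = i as usize;
                    if idx >= r.borrow().len() {
                        return Err("index out of bounds".into());
                    }
                    stack.push(Val::ElemRef(r, idx));
                }
                _ => return Err("bad operands for VecBorrow".into()),
            },
            Op::Pop => {
                stack.pop().ok_or("empty stack")?;
            }
            Op::VecPopBack => match stack.pop() {
                Some(Val::VecRef(r)) => {
                    let x = r.borrow_mut().pop().ok_or("pop from empty vector")?;
                    stack.push(Val::U64(x));
                }
                _ => return Err("bad operand for VecPopBack".into()),
            },
            Op::MoveLoc(i) => stack.push(std::mem::replace(&mut locals[*i], Val::Invalid)),
            Op::VecUnpack(n) => match stack.pop() {
                Some(Val::Vector(r)) => {
                    // Fails if any VecRef or ElemRef clone is still alive.
                    let cell = Rc::try_unwrap(r).map_err(|_| "dangling references")?;
                    let elems = cell.into_inner();
                    if elems.len() != *n {
                        return Err("unpack count mismatch".into());
                    }
                    stack.extend(elems.into_iter().map(Val::U64));
                }
                _ => return Err("bad operand for VecUnpack".into()),
            },
            Op::Ret => return Ok(stack.len()),
        }
    }
    Err("missing Ret".into())
}

// The 19 instructions of test_integer, in the order of the disassembly.
fn test_integer_program() -> Vec<Op> {
    use Op::*;
    vec![
        VecPackEmpty, StLoc(2),
        BorrowLoc(2), LdU64(5), VecPushBack,
        BorrowLoc(2), LdU64(0), VecBorrow, Pop,
        BorrowLoc(2), LdU64(0), VecBorrow, Pop,
        BorrowLoc(2), VecPopBack, Pop,
        MoveLoc(2), VecUnpack(0), Ret,
    ]
}
```

Running test_integer_program with three locals ends with an empty operand stack. If one of the Pop instructions after VecBorrow were missing, the leftover element reference would keep the Rc shared and the final VecUnpack would fail in this toy model.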