[iOS Internals] 06: Analyzing cache_t

Latest recommended article published on 2021-07-29 22:24:03

miaocuilin


Category: iOS Internals. Tags: iOS Internals, cache

Article link: https://blog.youkuaiyun.com/miaocuilin/article/details/118220257


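This article analyzes the cache_t data structure in iOS in detail, verifies its layout with LLDB, and examines the role of fields such as _bucketsAndMaybeMask and _occupied. It shows that cache_t stores its entries unordered in a hash table and expands once it reaches 3/4 of its capacity. It also discusses why the old entries are cleared on expansion instead of being kept alongside the new one, and how buckets() relates to _bucketsAndMaybeMask. It closes with an overview of the cache read flow and of when insert is called.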


struct objc_class : objc_object {
// Class ISA;
    Class superclass;
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits; 
}

Having analyzed ISA and bits in the class, we now turn to another important member: cache.

I. The cache_t data structure

Let's start with the definition of cache_t in the source:

struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask; // 8
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask; // 4
#if __LP64__
            uint16_t                   _flags;  // 2
#endif
            uint16_t                   _occupied; // 2
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache; // 8
    };
    // remaining members omitted...
}
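The byte counts in the comments above add up to 16 bytes for these fields. The following is a minimal sketch that checks this with a hand-written mirror of the struct. The names MirrorCache and mirrorCacheSize are hypothetical, the atomics are replaced by plain integers, and a 64-bit (LP64) target is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the cache_t fields shown above, with the
// explicit_atomic wrappers replaced by plain integer types.
struct MirrorCache {
    std::uintptr_t bucketsAndMaybeMask;   // 8 bytes
    union {
        struct {
            std::uint32_t maybeMask;      // 4 bytes
            std::uint16_t flags;          // 2 bytes
            std::uint16_t occupied;       // 2 bytes
        } parts;
        void *originalPreoptCache;        // 8 bytes, overlaps parts
    } u;
};

// Total size of the mirror in bytes.
constexpr std::size_t mirrorCacheSize() { return sizeof(MirrorCache); }

// Assumption of this sketch: pointers are 8 bytes wide.
static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes a 64-bit target");
static_assert(sizeof(MirrorCache) == 16, "8-byte word plus 8-byte union");
```

Because the three small fields share a union with the 8-byte pointer, they cost no extra space: the whole structure stays at two machine words.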

Using the same approach as for bits, let's inspect the cache data in lldb.
We declare an instance method - (void)saySomething; in the LGPerson class
and call it in main:

LGPerson *p = [LGPerson alloc];
[p saySomething];

(lldb) p/x pClass // get the start address of the class
(Class) $1 = 0x0000000100008400 LGPerson
(lldb) p (cache_t *)0x0000000100008410 // shift the start address by 16 bytes to reach cache
(cache_t *) $2 = 0x0000000100008410
(lldb) p *$2 // inspect the contents of cache
(cache_t) $3 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4298515408
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 0
        }
      }
      _flags = 32808
      _occupied = 0
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0000802800000000
      }
    }
  }
}
(lldb) p [p saySomething]  // the values above are empty because no method has been called yet, so nothing is cached; call one now
2021-06-25 14:22:15.679137+0800 KCObjcBuild[57009:1207201] -[LGPerson saySomething]
(lldb) p *$2
(cache_t) $4 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4301537696
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 7
        }
      }
      _flags = 32808
      _occupied = 1
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0001802800000007
      }
    }
  }
}
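Two details of the session above can be checked with simple arithmetic: the 16-byte shift from the class address to cache, and the fact that _originalPreoptCache shows the same 8 bytes as _maybeMask, _flags and _occupied because they share a union. The sketch below assumes a 64-bit little-endian target; MirrorClass, cacheAddress, CacheFields and splitUnionWord are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the start of objc_class on a 64-bit target.
struct MirrorClass {
    void *isa;               // 8 bytes, inherited from objc_object
    void *superclass;        // 8 bytes
    std::uint64_t cache[2];  // stand-in for the 16-byte cache_t
};

// Address of the cache member, given the address of the class.
inline std::uint64_t cacheAddress(std::uint64_t classAddress) {
    return classAddress + offsetof(MirrorClass, cache);
}

struct CacheFields {
    std::uint32_t maybeMask;
    std::uint16_t flags;
    std::uint16_t occupied;
};

// Split the 8-byte union word the way a little-endian machine lays it out:
// low 32 bits = _maybeMask, next 16 bits = _flags, top 16 bits = _occupied.
inline CacheFields splitUnionWord(std::uint64_t word) {
    CacheFields f;
    f.maybeMask = static_cast<std::uint32_t>(word & 0xFFFFFFFFu);
    f.flags     = static_cast<std::uint16_t>((word >> 32) & 0xFFFFu);
    f.occupied  = static_cast<std::uint16_t>(word >> 48);
    return f;
}
```

Feeding in the values from the session, 0x0000000100008400 maps to 0x0000000100008410, and 0x0001802800000007 splits into _maybeMask = 7, _flags = 32808 and _occupied = 1, matching what lldb printed.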

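The cache dump above shows several values:
_bucketsAndMaybeMask
_maybeMask
_flags
_occupied
_originalPreoptCache

None of them gives us an obvious lead, so let's go to the source and look for other structures or methods we can use.

Exploring the source

We find some methods that look useful:

There is one that increments the occupied count (incrementOccupied()), one that sets BucketsAndMask, and several operations on buckets in between. Buckets are clearly important, so let's look at the structure of bucket_t.

The bucket_t structure:

At last, here are sel and imp, together with the accessor methods used to read them.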

II. Verifying the cache_t structure with LLDB

Let's use lldb to fetch the sel and imp of the saySomething method.

(lldb) p $2.buckets() // get the buckets
(bucket_t *) $4 = 0x0000000100706040
(lldb) p *$4
(bucket_t) $5 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = nil
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0
    }
  }
}    
/* $5 turns out to be empty. Since $4 is a bucket_t * pointer, could the buckets be
 * stored as something like an array or a list?
 */
(lldb) p $2.buckets()[1]
(bucket_t) $6 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 49008
    }
  }
}    // sure enough, this one has a value
(lldb) p $6.sel() // get the sel
(SEL) $7 = "saySomething"
(lldb) p $6.imp($4, p.class) // get the imp
(IMP) $8 = 0x0000000100003be0 (KCObjcBuild`-[LGPerson saySomething])

Found it at last.
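One detail in the session is worth spelling out: the raw _imp value in the bucket does not look like a code address, and imp() has to be given the class before it returns the real IMP. A plausible explanation, and an assumption of this sketch rather than something the session proves, is that on targets without pointer authentication the stored value is the IMP address XORed with the class address. encodeImp and decodeImp are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: stored value = IMP address XOR class address.
inline std::uint64_t encodeImp(std::uint64_t imp, std::uint64_t cls) {
    return imp ^ cls;
}

// XOR is its own inverse, so decoding applies the same operation again.
inline std::uint64_t decodeImp(std::uint64_t stored, std::uint64_t cls) {
    return stored ^ cls;
}
```

With the class address 0x100008400 and the IMP 0x100003be0 from the session, this encoding would give a small stored value of 0xbfe0, the same order of magnitude as the raw values that appear in the buckets. The exact raw value in any given run depends on that run's addresses.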

cache_t structure diagram

III. Analyzing cache_t outside the runtime source

- (void)say1;
- (void)say2;
- (void)say3;
- (void)say4;
- (void)say5;
- (void)say6;
- (void)say7;
- (void)say8;

#ifdef DEBUG
#define LGLog(format, ...) printf("%s\n", [[NSString stringWithFormat:format, ## __VA_ARGS__] UTF8String]);
#else
#define LGLog(format, ...);
#endif


typedef uint32_t mask_t;

struct mc_bucket_t {
    SEL _sel;
    IMP _imp;
};

struct mc_cache_t {
    mc_bucket_t *_buckets; // read the buckets pointer directly
    mask_t _maybeMask; // 4
    uint16_t  _flags;  // 2
    uint16_t  _occupied; // 2
};

struct mc_class_data_bits_t {
    // the friend declaration is dropped, because mc_objc_class is defined below
    uintptr_t bits;
};

struct mc_objc_class {
    // Class ISA;
    Class isa; // isa is inherited from objc_object; in a hand-built mirror we must declare it ourselves
    Class superclass;
    mc_cache_t cache;             // formerly cache pointer and vtable
    mc_class_data_bits_t bits;
};


int main(int argc, const char * argv[]) {
    @autoreleasepool {
        // class_data_bits_t
        LGPerson *p = [LGPerson alloc];
        Class pClass = p.class;
        [p say1];
//        [p say2];
//        [p say3];
//        [p say4];
//        [p say5];
//        [p say6];
//        [p say7];
//        [p say8];
        struct mc_objc_class *mcClass = (struct mc_objc_class *)pClass;

        for (mask_t i = 0; i < mcClass->cache._maybeMask; i++) {
            mc_bucket_t bucket = mcClass->cache._buckets[i];
            LGLog(@"%@--%p", NSStringFromSelector(bucket._sel), bucket._imp);
        }
        NSLog(@"%@",p);
    }
    return 0;
}

[p say1];
Output:
111***1--4

[p say2];
Output:
222***2--4

[p say3];
Output:
333***3--4

[p say4];
Output:
222***2--8

[p say5];
Output:
222***3--8

[p say6];
Output:
222***4--8

[p say7];
Output:
222***5--8

[p say8];
Output:
333***6--8

Calling say1 through say8 in turn, the for loop prints every sel--imp pair:

-[LGPerson say1]
say1--0xbc20
(null)--0x0
(null)--0x0

-[LGPerson say1]
-[LGPerson say2]
say1--0xbc28
say2--0xbdc8
(null)--0x0

-[LGPerson say1]
-[LGPerson say2]
-[LGPerson say3]
(null)--0x0
say3--0xbc90
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0

-[LGPerson say1]
-[LGPerson say2]
-[LGPerson say3]
-[LGPerson say4]
(null)--0x0
say3--0xbc98
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
say4--0xbcf8

-[LGPerson say1]
-[LGPerson say2]
-[LGPerson say3]
-[LGPerson say4]
-[LGPerson say5]
(null)--0x0
say3--0xbc80
(null)--0x0
say5--0xbf40
(null)--0x0
(null)--0x0
say4--0xbce0

-[LGPerson say1]
-[LGPerson say2]
-[LGPerson say3]
-[LGPerson say4]
-[LGPerson say5]
-[LGPerson say6]
say6--0xbfa8
say3--0xbc88
(null)--0x0
say5--0xbf48
(null)--0x0
(null)--0x0
say4--0xbce8

-[LGPerson say1]
-[LGPerson say2]
-[LGPerson say3]
-[LGPerson say4]
-[LGPerson say5]
-[LGPerson say6]
-[LGPerson say7]
say6--0xbf90
say3--0xbcb0
(null)--0x0
say5--0xbf70
(null)--0x0
say7--0xbe30
say4--0xbcd0

-[LGPerson say1]
-[LGPerson say2]
-[LGPerson say3]
-[LGPerson say4]
-[LGPerson say5]
-[LGPerson say6]
-[LGPerson say7]
-[LGPerson say8]
(null)--0x0
(null)--0x0
say8--0xbe58
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0
(null)--0x0

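The output shows that cache storage is unordered: entries are placed in a hash table. When the cache would become more than 3/4 full, it is expanded to twice its previous size; the old buckets are cleared first, and only then is the newly inserted entry stored (this is what happens when say3 is called).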
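The pattern in the output can be reproduced with a small simulation. This is a sketch under stated assumptions, not the runtime's code: an initial capacity of 4, a 3/4 fill limit, one slot reserved as an end marker, and old entries dropped on every expansion. The names kInitCacheSize, kEndMarker, fillLimit and simulate are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed constants for this sketch.
constexpr std::uint32_t kInitCacheSize = 4;  // capacity of a freshly allocated cache
constexpr std::uint32_t kEndMarker = 1;      // one slot reserved at the end of the array

// 3/4 load factor.
inline std::uint32_t fillLimit(std::uint32_t capacity) { return capacity * 3 / 4; }

// Returns (occupied, capacity) after each of `calls` first-time method calls.
inline std::vector<std::pair<std::uint32_t, std::uint32_t>> simulate(int calls) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> history;
    std::uint32_t occupied = 0, capacity = 0;
    for (int n = 0; n < calls; ++n) {
        const std::uint32_t newOccupied = occupied + 1;
        if (capacity == 0) {
            capacity = kInitCacheSize;            // first insert: allocate 4 slots
        } else if (newOccupied + kEndMarker > fillLimit(capacity)) {
            capacity *= 2;                        // grow to twice the size...
            occupied = 0;                         // ...and drop the old entries
        }
        ++occupied;                               // store the new entry
        history.emplace_back(occupied, capacity);
    }
    return history;
}
```

For eight calls the sketch yields (1,4), (2,4), (1,8), (2,8), (3,8), (4,8), (5,8), (1,16). That lines up with the listings above: say3 is alone after the first expansion, five entries are present after say7, and say8 is alone in a larger table.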

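IV. How cache_t works under the hood

_bucketsAndMaybeMask
The name suggests that this field stores two values, buckets and maybeMask.
That alone gives us nothing more to go on, but a cache must insert, delete, update and look up entries in order to store and clear information.

Continuing through the source, we find the insert method in cache_t:

It works with properties such as occupied, capacity and buckets().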

mask_t newOccupied = occupied() + 1;
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    if (slowpath(isConstantEmptyCache())) {
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE;  //4
        reallocate(oldCapacity, capacity, /* freeOld */false); // look here
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
    }
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.
    }
#endif
    else {
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        reallocate(oldCapacity, capacity, true);
    }

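Stepping in: there can be many buckets, so they cannot all be stored inside cache_t itself. _bucketsAndMaybeMask therefore acts as a pointer to the start of the bucket array, and occupied is 0 because nothing has been stored yet.

Continuing through the code in cache_t: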

bucket_t *b = buckets();
    mask_t m = capacity - 1; // 4 - 1 = 3, which is why a fresh cache prints mask = 3, occupied = 1
    mask_t begin = cache_hash(sel, m); // hash the selector to choose the slot where insertion starts
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    do {
        if (fastpath(b[i].sel() == 0)) {
            incrementOccupied(); // increment occupied
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin));
    
    bad_cache(receiver, (SEL)sel);
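The loop above is linear probing over an open-addressed table: hash to a starting slot, then step from slot to slot until an empty one (or the same selector) is found. Here is a minimal standalone sketch of that technique. Slot, slotHash, slotNext and insertSlot are hypothetical names, and both the hash (selector value masked to the table size) and the probe direction (forward with wrap-around) are assumptions; the runtime's cache_hash and cache_next may differ by architecture.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One table entry; sel == 0 marks an empty slot.
struct Slot {
    std::uint64_t sel;
    std::uint64_t imp;
};

// Assumed hash: the selector value masked to the table size.
inline std::uint32_t slotHash(std::uint64_t sel, std::uint32_t mask) {
    return static_cast<std::uint32_t>(sel) & mask;
}

// Assumed probe order: step forward by one and wrap around.
inline std::uint32_t slotNext(std::uint32_t i, std::uint32_t mask) {
    return (i + 1) & mask;
}

// Returns the index where sel lives after the call, or -1 if the table is full.
inline int insertSlot(std::vector<Slot> &table, std::uint64_t sel, std::uint64_t imp) {
    assert(!table.empty() && (table.size() & (table.size() - 1)) == 0);  // power of two
    assert(sel != 0);                                                    // 0 is reserved for "empty"
    const std::uint32_t mask = static_cast<std::uint32_t>(table.size() - 1);
    const std::uint32_t begin = slotHash(sel, mask);
    std::uint32_t i = begin;
    do {
        if (table[i].sel == 0) {          // first unused slot: store here
            table[i].sel = sel;
            table[i].imp = imp;
            return static_cast<int>(i);
        }
        if (table[i].sel == sel) {        // already cached: nothing to do
            return static_cast<int>(i);
        }
    } while ((i = slotNext(i, mask)) != begin);
    return -1;                            // probed every slot without success
}
```

Because the starting slot depends on the selector's address rather than on call order, entries end up scattered through the array, which is why the printed buckets look unordered.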

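Why, then, was _maybeMask equal to 7 (a capacity of 8) after the first saySomething call in lldb?

The expansion branch in the code above answers this: once a new entry would take the cache past 3/4 of its capacity, the capacity is doubled.

This also explains the (null) rows in the output:

When say3 is inserted, say1 and say2 are cleared, the capacity grows to 8, and only then is say3 stored, so say3 is the only entry left.

Question: after expanding, why not simply add say3 after say1 and say2 instead of clearing them first?

Answer: carrying say1 and say2 over would mean copying every old entry into the newly allocated memory and rehashing it, which is expensive. The algorithm drops the old entries instead; if say1 or say2 is needed again, it is simply cached again on its next call.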

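V. Additional notes

1. buckets()[1]
For an array, [1] selects the second element. buckets() returns a pointer to a bucket_t struct, and indexing that pointer works the same way: the address is shifted by one element rather than following linked-list nodes. The size of the shift depends on the size of the element, here sizeof(bucket_t).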
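To make the shift concrete, the sketch below measures the byte distance between element 0 and element n of an array reached through a pointer. Pair is a hypothetical stand-in for bucket_t with two pointer-sized members, and byteOffset is a helper introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for bucket_t: two pointer-sized members.
struct Pair {
    std::uintptr_t sel;
    std::uintptr_t imp;
};

// Byte distance between base[0] and base[index].
inline std::size_t byteOffset(const Pair *base, std::size_t index) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char *>(&base[index]) -
        reinterpret_cast<const char *>(base));
}
```

base[1] and *(base + 1) name the same object, so the step is always one whole element: 16 bytes for this stand-in on a 64-bit target.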

2. _bucketsAndMaybeMask

The value stored in _bucketsAndMaybeMask turns out to be the same as the address returned by buckets()!

Let's look at the source of buckets():

struct bucket_t *cache_t::buckets() const
{
    uintptr_t addr = _bucketsAndMaybeMask.load(memory_order_relaxed);
    return (bucket_t *)(addr & bucketsMask);
}

static constexpr uintptr_t bucketsMask = ~0ul;

bucketsMask is ~0ul, i.e. every bit is 1, so the AND leaves the stored value unchanged.

3. Different devices take different branches

#define CACHE_MASK_STORAGE_OUTLINED 1
#define CACHE_MASK_STORAGE_HIGH_16 2
#define CACHE_MASK_STORAGE_LOW_4 3
#define CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS 4

#if defined(__arm64__) && __LP64__
#if TARGET_OS_OSX || TARGET_OS_SIMULATOR
// High 16 bits, i.e. the left end of the address. BIG_ADDRS refers to large addresses, not to big-endian byte order.
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
#else
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16
#endif
#elif defined(__arm64__) && !__LP64__
// Low 4 bits, i.e. the right end of the address
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_LOW_4
#else
#define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_OUTLINED
#endif
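The HIGH_16 and OUTLINED names describe where the mask lives: packed into the top bits of the same word as the buckets pointer, or kept in its own field. The sketch below illustrates both ideas in a simplified form. The 48-bit shift, the constant names and the helper functions pack, unpackBuckets and unpackMask are assumptions made for illustration; the runtime's actual shift and mask constants may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Simplified high-16 layout: pointer in the low 48 bits, mask in the top 16.
constexpr unsigned kMaskShift = 48;
constexpr std::uint64_t kBucketsMask = (std::uint64_t{1} << kMaskShift) - 1;

// Combine a buckets address and a mask into one 64-bit word.
inline std::uint64_t pack(std::uint64_t buckets, std::uint16_t mask) {
    assert((buckets & ~kBucketsMask) == 0);   // the pointer must fit in 48 bits
    return (static_cast<std::uint64_t>(mask) << kMaskShift) | buckets;
}

// Recover the buckets address by masking off the top 16 bits.
inline std::uint64_t unpackBuckets(std::uint64_t word) { return word & kBucketsMask; }

// Recover the mask by shifting the top 16 bits down.
inline std::uint16_t unpackMask(std::uint64_t word) {
    return static_cast<std::uint16_t>(word >> kMaskShift);
}

// Outlined layout: the word holds only the pointer, so the mask is all ones
// and ANDing with it changes nothing.
constexpr std::uint64_t kOutlinedBucketsMask = ~std::uint64_t{0};
```

In the outlined case the word and the buckets address are identical, which matches what the lldb session on the Mac showed; in a packed layout the same word would also carry the mask in its top bits.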

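4. Load factor

The load factor is 3/4, or 7/8 on M1 Macs. The choice balances space utilization against hash collisions: at these values utilization is fairly high while the collision rate stays fairly low.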
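As a quick worked example, the two ratios translate into the following slot limits. The function names are hypothetical, and integer division is assumed, as in the capacity arithmetic shown earlier.

```cpp
#include <cassert>
#include <cstdint>

// Largest fill allowed before expansion with a 3/4 load factor.
inline std::uint32_t limitThreeQuarters(std::uint32_t capacity) { return capacity * 3 / 4; }

// Largest fill allowed before expansion with a 7/8 load factor.
inline std::uint32_t limitSevenEighths(std::uint32_t capacity) { return capacity * 7 / 8; }
```

For a capacity of 8 the limits are 6 and 7 slots, and for 16 they are 12 and 14, so the 7/8 ratio lets the table fill a little further before it has to grow.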

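5. The cache read flow

* Take the class's start address and add 0x10 to reach cache. cache holds _bucketsAndMaybeMask (the start address of the buckets); stepping through from there yields each bucket and its sel and imp.

* When insert happens: it is called from log_and_fill_cache.

* Flow in objc-cache.mm: objc_msgSend is called before cache_getImp.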