dmlc中的data.h

本文深入探讨了DMLC库中的"data.h"头文件,它提供了高效的数据加载和处理功能。通过使用DMLC的数据接口,开发者可以便捷地处理大规模机器学习任务中的数据集,支持多种数据格式,并实现并行读取和预处理。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

//这是包含一个稀疏矩阵中的一些行
//适用于流式算法
/*!
 * \brief a block of data, containing several rows in sparse matrix
 *  This is useful for (streaming-sxtyle) algorithms that scans through rows of data
 *  examples include: SGD, GD, L-BFGS, kmeans
 *
 *  The size of batch is usually large enough so that parallelizing over the rows
 *  can give significant speedup
 * \tparam IndexType type to store the index used in row batch
 */
template<typename IndexType>
struct RowBlock {
  /*! \brief batch size */
  size_t size;
  /*! \brief array[size+1], row pointer to beginning of each rows */
  const size_t *offset;
  /*! \brief array[size] label of each instance */
  const real_t *label;
  /*! \brief With weight: array[size] label of each instance, otherwise nullptr */
  const real_t *weight;
  /*! \brief feature index */
  const IndexType *index;
  /*! \brief feature value, can be NULL, indicating all values are 1 */
  const real_t *value;
  /*!
   * \brief get specific rows in the batch
   * \param rowid the rowid in that row
   * \return the instance corresponding to the row
   */
  inline Row<IndexType> operator[](size_t rowid) const;
  /*! \return memory cost of the block in bytes */
  inline size_t MemCostBytes(void) const {
    size_t cost = size * (sizeof(size_t) + sizeof(real_t));
    if (weight != NULL) cost += size * sizeof(real_t);
    size_t ndata = offset[size] - offset[0];
    if (index != NULL) cost += ndata * sizeof(IndexType);
    if (value != NULL) cost += ndata * sizeof(real_t);
    return cost;
  }
  /*!
   * \brief slice a RowBlock to get rows in [begin, end)
   * \param begin the begin row index
   * \param end the end row index
   * \return the sliced RowBlock
   */
  inline RowBlock Slice(size_t begin, size_t end) const {
    CHECK(begin <= end && end < size);
    RowBlock ret;
    ret.size = end - begin;
    ret.label = label + begin;
    if (weight != NULL) {
      ret.weight = weight + begin;
    } else {
      ret.weight = NULL;
    }
    ret.offset = offset + begin;
    ret.index = index;
    ret.value = value;
    return ret;
  }
};
//获取一行
// implementation of operator[]
template<typename IndexType>
inline Row<IndexType>
RowBlock<IndexType>::operator[](size_t rowid) const {
  CHECK(rowid < size);
  Row<IndexType> inst;
  inst.label = label[rowid];
  if (weight != NULL) {
    inst.weight = weight[rowid];
  } else {
    inst.weight = 1.0f;
  }
  inst.length = offset[rowid + 1] - offset[rowid];
  inst.index = index + offset[rowid];
  if (value == NULL) {
    inst.value = NULL;
  } else {
    inst.value = value + offset[rowid];
  }
  return inst;
}
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值