offline_eval_map_corloc.py

This post shares the experience of entering a butterfly detection competition with TensorFlow: how the dataset was split, how the model was chosen and adapted, how the evaluation metrics were computed, and how the problems hit during evaluation were worked around.
Original post: 用Tensorflow做蝴蝶检测 (Butterfly Detection with TensorFlow), https://www.cnblogs.com/caffeaoto/p/8758962.html

	<div id="cnblogs_post_body" class="blogpost-body"><p>报名了一个蝴蝶检测比赛,一共给了700多张图,包含94种蝴蝶类别,要求检测出图片中的蝴蝶并正确分类。</p>

1. After getting the dataset, the first step was to split the 700-odd images into 483 training images and 238 test images (15 of the species have only a single image each, so the test set covers just 79 of the 94 butterfly classes).
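A minimal sketch of such a split (the folder name and the way the species is read off the filename are illustrative assumptions, not the actual script used here):

import os
import random
from collections import defaultdict

random.seed(0)
by_class = defaultdict(list)
for name in os.listdir('JPEGImages'):      # hypothetical folder holding the 700+ images
    species = name.split('_')[0]           # assume the species is encoded in the filename
    by_class[species].append(name)

train, test = [], []
for species, imgs in by_class.items():
    random.shuffle(imgs)
    if len(imgs) == 1:                     # species with a single image stay in the training set
        train.extend(imgs)
    else:
        k = max(1, int(round(len(imgs) * 2.0 / 3)))   # roughly the 483 / 238 overall split
        train.extend(imgs[:k])
        test.extend(imgs[k:])
print(len(train), len(test))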

2. Use an existing model whose label set already contains a butterfly class to detect the butterflies in the test set directly (essentially a binary butterfly-vs-background task). The model used is faster_rcnn_inception_resnet_v2_atrous_oid_2018_01_28, which was trained on the Open Images dataset and covers 545 object classes.

I first looked at object_detection/object_detection_tutorial.ipynb, but it loads a frozen model directly, which made it awkward to change the detection threshold. My senior labmate therefore switched to a different way of loading the model, so the detection threshold could be lowered and more butterflies would be kept.
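The exact loading code isn't reproduced here, but the effect amounts to taking the raw output dict of a single inference call (the usual 'detection_boxes' / 'detection_scores' / 'detection_classes' arrays of the TF1 Object Detection API) and filtering it with a much lower score threshold, keeping only the butterfly class:

import numpy as np

SCORE_THRESHOLD = 0.05     # far lower than the 0.5 typically used for visualization
BUTTERFLY_CLASS_ID = 111   # id of 'Butterfly' in oid_bbox_trainable_label_map.pbtxt

def keep_butterflies(output_dict, threshold=SCORE_THRESHOLD):
    scores = output_dict['detection_scores']
    classes = output_dict['detection_classes'].astype(np.int64)
    keep = (scores >= threshold) & (classes == BUTTERFLY_CLASS_ID)
    return {
        'detection_boxes': output_dict['detection_boxes'][keep],
        'detection_scores': scores[keep],
        'detection_classes': classes[keep],
    }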

My job was then to evaluate the detection results (compute the precision), but I took a detour first.

The workflow in the TensorFlow tutorial is to convert the test set into TFRecord format first, run inference, and then append the detection results directly to each record. I got stuck at the model input, though: the original frozen model takes a tensor as input, while my senior's modified model takes an image directly, and I didn't know how to convert between the two (something I still need to learn). So I went the other way: building on my senior's code, I pull one image at a time from the folder and run inference on it; from the image's filename I parse the corresponding XML annotation and write it out in tf_example format, and at the same time append the model's output dets to that freshly created example. That sidestepped the earlier problem, and the annotation + detection results were saved into TFRecord format without trouble.
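Schematically, each iteration of that loop builds one tf.train.Example like the sketch below. The feature keys follow the TF Object Detection API's standard_fields.TfExampleFields naming (worth double-checking against metrics/tf_example_parser.py); the gt / dets dicts stand in for whatever the XML parser and the inference code actually return:

import tensorflow as tf

def _float_list(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def _int64_list(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def make_example(image_id, gt, dets):
    """gt / dets: dicts of per-box lists from the xml annotation and the model output (placeholders)."""
    feature = {
        'image/source_id': _bytes(image_id.encode('utf8')),
        # groundtruth parsed from the annotation xml (normalized coordinates)
        'image/object/bbox/ymin': _float_list(gt['ymin']),
        'image/object/bbox/xmin': _float_list(gt['xmin']),
        'image/object/bbox/ymax': _float_list(gt['ymax']),
        'image/object/bbox/xmax': _float_list(gt['xmax']),
        'image/object/class/label': _int64_list(gt['label']),
        # detections appended from the model output
        'image/detection/bbox/ymin': _float_list(dets['ymin']),
        'image/detection/bbox/xmin': _float_list(dets['xmin']),
        'image/detection/bbox/ymax': _float_list(dets['ymax']),
        'image/detection/bbox/xmax': _float_list(dets['xmax']),
        'image/detection/label': _int64_list(dets['label']),
        'image/detection/score': _float_list(dets['score']),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# writer = tf.python_io.TFRecordWriter('validation_detections.tfrecord')
# writer.write(make_example(...).SerializeToString())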

The next step was to run object_detection/metrics/offline_eval_map_corloc.py for the evaluation, but two problems came up that left me stuck for a while. The first was an error at ground_truth_group_of.size, along the lines of "'NoneType' object has no attribute 'size'". I assumed my TFRecord was broken, but it turned out to be caused by

decoded_dict = data_parser.parse(example)

during this parsing step: my TFRecord never had the standard_fields.TfExampleFields.object_group_of field written into it, so the parser fills that entry with None, which has no .size, hence the error above. The relevant optional handler is:

# object_detection/metrics/tf_example_parser.py
self.optional_items_to_handlers = {
    fields.InputDataFields.groundtruth_difficult:
        Int64Parser(fields.TfExampleFields.object_difficult),
    fields.InputDataFields.groundtruth_group_of:
        Int64Parser(fields.TfExampleFields.object_group_of)
}

Looking up what the Open Images-specific group_of attribute means: "Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching."

In other words, a box tagged group_of contains more than five instances that heavily occlude each other and are physically touching, such as a crowd of people or a bed covered in flowers.
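If the TFRecord is being rebuilt anyway, one simple way to avoid the "'NoneType' object has no attribute 'size'" crash is to also write an all-zero group_of flag for every groundtruth box, since every butterfly box here is a single instance. A hedged sketch, reusing the make_example helper above:

# 'image/object/group_of' is the key behind standard_fields.TfExampleFields.object_group_of;
# 0 means "not a group_of box" for each groundtruth box.
feature['image/object/group_of'] = _int64_list([0] * len(gt['label']))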

 

The other problem was that my groundtruth contains only two classes, butterfly and background, while the model's label_map contains 545 classes, so all the other classes have no GT at all. The code checks for this:

# object_detection/utils/object_detection_evaluation.py
 if (self.num_gt_instances_per_class == 0).any():
      logging.warn(
          'The following classes have no ground truth examples: %s',
          np.squeeze(np.argwhere(self.num_gt_instances_per_class == 0)) +
          self.label_id_offset)

Once I tracked it down, I simply commented that warning out and the evaluation finally ran through.

With this model, the single-class butterfly detection precision is 0.728.

The rest of this post walks through the evaluation code, mainly so I don't forget it later. Assume the model output validation_detections.tfrecord has been saved under models/research/butterfly.

The first step is to generate the config files:

# From models/research/butterfly
SPLIT=validation  # or test

mkdir -p ${SPLIT}_eval_metrics

echo "
label_map_path: '../object_detection/data/oid_bbox_trainable_label_map.pbtxt'
tf_record_input_reader: { input_path: '${SPLIT}_detections.tfrecord' }
" > ${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt

echo "
metrics_set: 'open_images_detection_metrics'
" > ${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt

Then run the evaluation script:

# From tensorflow/models/research/butterfly
SPLIT=validation  # or test

PYTHONPATH=$PYTHONPATH:$(readlink -f ..) \
python -m object_detection/metrics/offline_eval_map_corloc \
  --eval_dir=${SPLIT}_eval_metrics \
  --eval_config_path=${SPLIT}_eval_metrics/${SPLIT}_eval_config.pbtxt \
  --input_config_path=${SPLIT}_eval_metrics/${SPLIT}_input_config.pbtxt
# --eval_dir is where the resulting metrics are written; the other two flags
# point at the eval and input config files generated in the previous step.

First, the main program, models/research/object_detection/metrics/offline_eval_map_corloc.py:

# models/research/object_detection/metrics/offline_eval_map_corloc.py (abridged)
import csv
import os
import re
import tensorflow as tf

from object_detection import evaluator
from object_detection.core import standard_fields
from object_detection.metrics import tf_example_parser
from object_detection.utils import config_util
from object_detection.utils import label_map_util

...

def read_data_and_evaluate(input_config, eval_config):
  ...

def write_metrics(metrics, output_dir):
  ...

def main(argv):
  del argv
  required_flags = ['input_config_path', 'eval_config_path', 'eval_dir']  # the three flags passed on the command line
  for flag_name in required_flags:
    if not getattr(FLAGS, flag_name):
      raise ValueError('Flag --{} is required'.format(flag_name))

  configs = config_util.get_configs_from_multiple_files(
      eval_input_config_path=FLAGS.input_config_path,
      eval_config_path=FLAGS.eval_config_path)

  eval_config = configs['eval_config']
  input_config = configs['eval_input_config']

  metrics = read_data_and_evaluate(input_config, eval_config)  # the main work happens here

  # Save metrics
  write_metrics(metrics, FLAGS.eval_dir)

Now let's look at read_data_and_evaluate(input_config, eval_config) in detail:

def read_data_and_evaluate(input_config, eval_config):
  """Reads pre-computed object detections and groundtruth from tf_record.

  Args:
    input_config: input config proto of type
      object_detection.protos.InputReader.
    eval_config: evaluation config proto of type
      object_detection.protos.EvalConfig.

  Returns:
    Evaluated detections metrics.

  Raises:
    ValueError: if input_reader type is not supported or metric type is unknown.
  """
  if input_config.WhichOneof('input_reader') == 'tf_record_input_reader':
    input_paths = input_config.tf_record_input_reader.input_path

    label_map = label_map_util.load_labelmap(input_config.label_map_path)  # load the label map
    max_num_classes = max([item.id for item in label_map.item])            # largest class id in the label map (545 here)
    categories = label_map_util.convert_label_map_to_categories(
        label_map, max_num_classes)   # a list of dicts, e.g. categories[110] = {'id': 111, 'name': 'Butterfly'}

    object_detection_evaluators = evaluator.get_evaluators(
        eval_config, categories)
    # Support a single evaluator
    object_detection_evaluator = object_detection_evaluators[0]  # here: object_detection_evaluation.OpenImagesDetectionEvaluator

    skipped_images = 0
    processed_images = 0
    for input_path in _generate_filenames(input_paths):
      tf.logging.info('Processing file: {0}'.format(input_path))

      record_iterator = tf.python_io.tf_record_iterator(path=input_path)  # reads validation_detections.tfrecord
      data_parser = tf_example_parser.TfExampleDetectionAndGTParser()

      for string_record in record_iterator:  # iterate over the 238 test samples, one detection record at a time
        tf.logging.log_every_n(tf.logging.INFO, 'Processed %d images...', 1000,
                               processed_images)
        processed_images += 1

        example = tf.train.Example()
        example.ParseFromString(string_record)      # parse the TFRecord; example.features.feature holds the data as a dict
        decoded_dict = data_parser.parse(example)   # decode further, recovering groundtruth_boxes, groundtruth_classes,
                                                    # detection_boxes, detection_classes, detection_scores

        if decoded_dict:
          # handled by class OpenImagesDetectionEvaluator() in
          # object_detection/utils/object_detection_evaluation.py, default iou_threshold=0.5
          object_detection_evaluator.add_single_ground_truth_image_info(
              decoded_dict[standard_fields.DetectionResultFields.key],
              decoded_dict)
          object_detection_evaluator.add_single_detected_image_info(
              decoded_dict[standard_fields.DetectionResultFields.key],
              decoded_dict)
        else:
          skipped_images += 1
          tf.logging.info('Skipped images: {0}'.format(skipped_images))

    return object_detection_evaluator.evaluate()

  raise ValueError('Unsupported input_reader_config.')

As you can see, the main evaluation work is in turn delegated to

  object_detection/utils/object_detection_evaluation.py

The first class there is class OpenImagesDetectionEvaluator(ObjectDetectionEvaluator), which inherits from class ObjectDetectionEvaluator(DetectionEvaluator), which in turn inherits from class DetectionEvaluator(object).

So let's read these classes from the top down, starting with the base class DetectionEvaluator(object), Line: 42.
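To keep the hierarchy straight before going through the code:

DetectionEvaluator(object)                                          # abstract interface, Line: 42
  └── ObjectDetectionEvaluator(DetectionEvaluator)                  # Line: 104, owns self._evaluation = ObjectDetectionEvaluation(...)
        └── OpenImagesDetectionEvaluator(ObjectDetectionEvaluator)  # Line: 376, overrides only add_single_ground_truth_image_info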

class DetectionEvaluator(object):
  """Interface for object detection evaluation classes.

  Example usage of the Evaluator:
  ------------------------------
  # i.e. add GT and detections image by image, then call evaluate() once at the end
  evaluator = DetectionEvaluator(categories)

  # Detections and groundtruth for image 1.
  evaluator.add_single_groundtruth_image_info(...)
  evaluator.add_single_detected_image_info(...)

  # Detections and groundtruth for image 2.
  evaluator.add_single_groundtruth_image_info(...)
  evaluator.add_single_detected_image_info(...)

  metrics_dict = evaluator.evaluate()
  """
  __metaclass__ = ABCMeta

  def __init__(self, categories):
    """Constructor.

    Args:
      categories: A list of dicts, each of which has the following keys -
        'id': (required) an integer id uniquely identifying this category.
        'name': (required) string representing category name e.g., 'cat', 'dog'.
    """
    self._categories = categories

  @abstractmethod
  def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
    """Adds groundtruth for a single image to be used for evaluation.

    Args:
      image_id: A unique string/integer identifier for the image.
      groundtruth_dict: A dictionary of groundtruth numpy arrays required
        for evaluations.
    """
    pass

  @abstractmethod
  def add_single_detected_image_info(self, image_id, detections_dict):
    """Adds detections for a single image to be used for evaluation.

    Args:
      image_id: A unique string/integer identifier for the image.
      detections_dict: A dictionary of detection numpy arrays required
        for evaluation.
    """
    pass

  @abstractmethod
  def evaluate(self):
    """Evaluates detections and returns a dictionary of metrics."""
    pass

  @abstractmethod
  def clear(self):
    """Clears the state to prepare for a fresh evaluation."""
    pass

Next comes class ObjectDetectionEvaluator(DetectionEvaluator), Line: 104:

class ObjectDetectionEvaluator(DetectionEvaluator):
  """A class to evaluate detections."""

  def __init__(self,
               categories,
               matching_iou_threshold=0.5,
               evaluate_corlocs=False,
               metric_prefix=None,
               use_weighted_mean_ap=False,
               evaluate_masks=False):
    """Constructor.

    Args:
      ...
    Raises:
      ValueError: If the category ids are not 1-indexed.
    """
    ...
    # This is the key member; everything below delegates to it.
    self._evaluation = ObjectDetectionEvaluation(
        num_groundtruth_classes=self._num_classes,
        matching_iou_threshold=self._matching_iou_threshold,
        use_weighted_mean_ap=self._use_weighted_mean_ap,
        label_id_offset=self._label_id_offset)

  def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
    """Adds groundtruth for a single image to be used for evaluation."""
    ...  # omitted
    self._evaluation.add_single_ground_truth_image_info(...)

  def add_single_detected_image_info(self, image_id, detections_dict):
    """Adds detections for a single image to be used for evaluation."""
    ...  # omitted
    self._evaluation.add_single_detected_image_info(...)

  def evaluate(self):
    """Compute evaluation result."""
    ...
    (per_class_ap, mean_ap, _, _, per_class_corloc, mean_corloc) = (
        self._evaluation.evaluate())
    ...

  def clear(self):
    """Clears the state to prepare for a fresh evaluation."""
    self._evaluation = ObjectDetectionEvaluation(
        num_groundtruth_classes=self._num_classes,
        matching_iou_threshold=self._matching_iou_threshold,
        use_weighted_mean_ap=self._use_weighted_mean_ap,
        label_id_offset=self._label_id_offset)
    self._image_ids.clear()

Finally, OpenImagesDetectionEvaluator(ObjectDetectionEvaluator), Line: 376:

class OpenImagesDetectionEvaluator(ObjectDetectionEvaluator):
  """A class to evaluate detections using Open Images V2 metrics.

  Open Images V2 introduce group_of type of bounding boxes and this metric
  handles those boxes appropriately.
  """

  def __init__(self,
               categories,
               matching_iou_threshold=0.5,
               evaluate_corlocs=False):
    """Constructor.

    Args:
      categories: A list of dicts, each of which has the following keys -
        'id': (required) an integer id uniquely identifying this category.
        'name': (required) string representing category name e.g., 'cat', 'dog'.
      matching_iou_threshold: IOU threshold to use for matching groundtruth
        boxes to detection boxes.
      evaluate_corlocs: if True, additionally evaluates and returns CorLoc.
    """
    super(OpenImagesDetectionEvaluator, self).__init__(
        categories,
        matching_iou_threshold,
        evaluate_corlocs,
        metric_prefix='OpenImagesV2')

  def add_single_ground_truth_image_info(self, image_id, groundtruth_dict):
    """Adds groundtruth for a single image to be used for evaluation."""
    if image_id in self._image_ids:
      raise ValueError('Image with id {} already added.'.format(image_id))

    groundtruth_classes = (
        groundtruth_dict[standard_fields.InputDataFields.groundtruth_classes] -
        self._label_id_offset)
    # If the key is not present in the groundtruth_dict or the array is empty
    # (unless there are no annotations for the groundtruth on this image)
    # use values from the dictionary or insert None otherwise.
    if (standard_fields.InputDataFields.groundtruth_group_of in
        groundtruth_dict.keys() and
        (groundtruth_dict[standard_fields.InputDataFields.groundtruth_group_of]
         .size or not groundtruth_classes.size)):
      groundtruth_group_of = groundtruth_dict[
          standard_fields.InputDataFields.groundtruth_group_of]
    else:
      groundtruth_group_of = None
      if not len(self._image_ids) % 1000:
        logging.warn(
            'image %s does not have groundtruth group_of flag specified',
            image_id)
    self._evaluation.add_single_ground_truth_image_info(
        image_id,
        groundtruth_dict[standard_fields.InputDataFields.groundtruth_boxes],
        groundtruth_classes,
        groundtruth_is_difficult_list=None,
        groundtruth_is_group_of_list=groundtruth_group_of)
    self._image_ids.update([image_id])

As you can see, only add_single_ground_truth_image_info is overridden here; everything else is inherited unchanged. The parent class, in turn, hands the real work to class ObjectDetectionEvaluation(object), so the structure of the code is gradually becoming clear.
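Putting the pieces together, the evaluator chain can be driven directly as in the sketch below (the image id, boxes, scores and labels are made up for illustration; the dict keys are the standard_fields ones that read_data_and_evaluate passes in):

import numpy as np
from object_detection.core import standard_fields
from object_detection.utils import object_detection_evaluation

categories = [{'id': 111, 'name': 'Butterfly'}]   # just the one class we care about
evaluator = object_detection_evaluation.OpenImagesDetectionEvaluator(categories)

gt = {
    standard_fields.InputDataFields.groundtruth_boxes:
        np.array([[0.1, 0.1, 0.8, 0.9]], dtype=np.float32),   # [ymin, xmin, ymax, xmax], normalized
    standard_fields.InputDataFields.groundtruth_classes:
        np.array([111], dtype=np.int64),
}
det = {
    standard_fields.DetectionResultFields.detection_boxes:
        np.array([[0.12, 0.08, 0.79, 0.92]], dtype=np.float32),
    standard_fields.DetectionResultFields.detection_scores:
        np.array([0.9], dtype=np.float32),
    standard_fields.DetectionResultFields.detection_classes:
        np.array([111], dtype=np.int64),
}

evaluator.add_single_ground_truth_image_info('image_0001', gt)
evaluator.add_single_detected_image_info('image_0001', det)
print(evaluator.evaluate())   # dict of OpenImagesV2 mAP / per-category AP values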

The class below is the part that actually stores the GT and detection results and turns them into numbers.

class ObjectDetectionEvaluation(object):
  """Internal implementation of Pascal object detection metrics."""

  def __init__(self,
               num_groundtruth_classes,
               matching_iou_threshold=0.5,
               nms_iou_threshold=1.0,
               nms_max_output_boxes=10000,
               use_weighted_mean_ap=False,
               label_id_offset=0):
    if num_groundtruth_classes < 1:
      raise ValueError('Need at least 1 groundtruth class for evaluation.')

    self.per_image_eval = per_image_evaluation.PerImageEvaluation(
        num_groundtruth_classes=num_groundtruth_classes,
        matching_iou_threshold=matching_iou_threshold,
        nms_iou_threshold=nms_iou_threshold,
        nms_max_output_boxes=nms_max_output_boxes)
    ...

  def clear_detections(self):
    self._initialize_detections()

  def add_single_ground_truth_image_info(self, image_key, groundtruth_boxes,
                                         groundtruth_class_labels,
                                         groundtruth_is_difficult_list=None,
                                         groundtruth_is_group_of_list=None,
                                         groundtruth_masks=None):
    ...

  def add_single_detected_image_info(self, image_key, detected_boxes,
                                     detected_scores, detected_class_labels,
                                     detected_masks=None):
    ...
    scores, tp_fp_labels, is_class_correctly_detected_in_image = (
        self.per_image_eval.compute_object_detection_metrics(
            detected_boxes=detected_boxes,
            detected_scores=detected_scores,
            detected_class_labels=detected_class_labels,
            groundtruth_boxes=groundtruth_boxes,
            groundtruth_class_labels=groundtruth_class_labels,
            groundtruth_is_difficult_list=groundtruth_is_difficult_list,
            groundtruth_is_group_of_list=groundtruth_is_group_of_list,
            detected_masks=detected_masks,
            groundtruth_masks=groundtruth_masks))

    for i in range(self.num_class):
      if scores[i].shape[0] > 0:
        self.scores_per_class[i].append(scores[i])
        self.tp_fp_labels_per_class[i].append(tp_fp_labels[i])
    (self.num_images_correctly_detected_per_class
    ) += is_class_correctly_detected_in_image

  def evaluate(self):
    """Compute evaluation result.

    Returns:
      A named tuple with the following fields -
        average_precision: float numpy array of average precision for
            each class.
        mean_ap: mean average precision of all classes, float scalar
        precisions: List of precisions, each precision is a float numpy
            array
        recalls: List of recalls, each recall is a float numpy array
        corloc: numpy float array
        mean_corloc: Mean CorLoc score for each class, float scalar
    """
    ...
    # per-class loop (abridged):
    for class_index in range(self.num_class):
      ...
      scores = np.concatenate(self.scores_per_class[class_index])
      tp_fp_labels = np.concatenate(self.tp_fp_labels_per_class[class_index])
      precision, recall = metrics.compute_precision_recall(
          scores, tp_fp_labels, self.num_gt_instances_per_class[class_index])
      self.precisions_per_class.append(precision)
      self.recalls_per_class.append(recall)
      average_precision = metrics.compute_average_precision(precision, recall)
      self.average_precision_per_class[class_index] = average_precision

    self.corloc_per_class = metrics.compute_cor_loc(
        self.num_gt_imgs_per_class,
        self.num_images_correctly_detected_per_class)

    mean_ap = np.nanmean(self.average_precision_per_class)
    mean_corloc = np.nanmean(self.corloc_per_class)
    return ObjectDetectionEvalMetrics(
        self.average_precision_per_class, mean_ap, self.precisions_per_class,
        self.recalls_per_class, self.corloc_per_class, mean_corloc)

I have trimmed away the unimportant parts. Two helper modules do the real work:

1. object_detection/utils/per_image_evaluation.py computes precision and recall for a single image;

2. object_detection/utils/metrics.py aggregates those per-image results and computes mAP and the other summary numbers.

1. object_detection/utils/per_image_evaluation.py, per-image precision and recall. The call chain inside it looks like this:

scores, tp_fp_labels, is_class_correctly_detected_in_image = compute_object_detection_metrics(...)
  --> scores, tp_fp_labels = self._compute_tp_fp(...)
      --> for i in range(self.num_groundtruth_classes):
            scores, tp_fp_labels = self._compute_tp_fp_for_single_class(...)
            --> (iou, ioa, scores, num_detected_boxes) = self._get_overlaps_and_scores_box_mode(...)
                --> detected_boxlist = np_box_list_ops.non_max_suppression(...)
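2. object_detection/utils/metrics.py then turns the accumulated scores and tp_fp_labels into the final numbers. For reference, compute_precision_recall and compute_average_precision boil down to roughly the following (a simplified re-derivation for intuition, not the library code itself):

import numpy as np

def precision_recall(scores, tp_fp_labels, num_gt):
    """scores: detection confidences; tp_fp_labels: 1 for TP, 0 for FP; num_gt: number of GT boxes."""
    order = np.argsort(-scores)                # sort detections by descending confidence
    labels = tp_fp_labels[order].astype(float)
    tp_cum = np.cumsum(labels)
    fp_cum = np.cumsum(1.0 - labels)
    precision = tp_cum / (tp_cum + fp_cum)
    recall = tp_cum / num_gt
    return precision, recall

def average_precision(precision, recall):
    """Area under the precision-recall curve (Pascal-style integration)."""
    p = np.concatenate([[0.0], precision, [0.0]])
    r = np.concatenate([[0.0], recall, [1.0]])
    for i in range(len(p) - 2, -1, -1):        # make precision monotonically non-increasing
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0] + 1     # points where recall changes
    return float(np.sum((r[idx] - r[idx - 1]) * p[idx]))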

 


