On Padding and Pitch

This post takes a close look at the concepts of padding and pitch in CUDA: how padding aligns the rows of a 2D array, how the pitch acts as the array's leading dimension to improve memory-access efficiency, the CUDA APIs related to pitch, and an example showing why padding matters.

# On Padding and Pitch
First, a short definition of each term.
## Padding
Padding means filling a data structure with extra elements. In CUDA programs it shows up in two places:

  1. padding the logical rows of a 2D array
  2. padding shared memory to avoid bank conflicts

This post covers only the first use.
## Pitch
For a 2D array, C/C++ uses row-major order: the row index drives the address computation
(e.g. baseAddress + N × row + col, where N is the number of elements per row).
The pitch here is the leading dimension of the array A (often written lda, as in BLAS).
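
As a minimal sketch of pitched, row-major indexing (the names loadElement, base, and pitchBytes are illustrative): in CUDA the pitch is counted in bytes, so the start of a row is computed through a char* cast before indexing the column.

```cuda
// Row-major access with an explicit pitch. The pitch is in BYTES,
// so step to the row start via a char* cast, then index the column.
__device__ float loadElement(const float* base, size_t pitchBytes, int row, int col)
{
    const float* rowPtr = (const float*)((const char*)base + row * pitchBytes);
    return rowPtr[col];
}
```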

## Why Padding?
Mainly to align accesses to 2D arrays.

Why use 2D memory at all:
In CUDA, "2D" memory is physically stored as one dimension, so a 2D array is two-dimensional only logically. But some operations, such as image processing and convolution, are naturally expressed over two dimensions, which is why the 2D view is used.

When accessing this logically 2D memory, alignment requires the leading dimension to be a multiple of the memory-transaction size (e.g. 32 bytes, or 128 bytes with L1 cache enabled). Every row then starts on a transaction boundary, which is especially effective for row-oriented work such as processing image rows. Take a 31×31 2D array and pad it to 32×31, i.e. grow each 31-element row to 32 elements (32 is the pitch), and define an add operation: row 1 gets +1, row 2 gets +2, ..., row n gets +n.
With padding, every row is accessed on an aligned boundary and memory efficiency is 100%. Without it, the first transaction fetches elements 0–31 while row 0 only needs 0–30; when row 1 is processed, its first element is the leftover one at position 31, so an extra transaction is spent fetching that single element, every subsequent row straddles a boundary the same way, and memory efficiency falls below 100%.
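
To make the numbers concrete: a row of 31 floats is 31 × 4 = 124 bytes; padding to 32 floats gives 128 bytes per row, so every row starts on a 128-byte boundary. A minimal sketch of the rounding, assuming a 128-byte transaction (the helper padPitch is hypothetical; cudaMallocPitch does this for you):

```cuda
// Round a row width in bytes up to the next multiple of the transaction size.
// 128 bytes matches an L1-cached transaction; other devices may differ.
size_t padPitch(size_t widthBytes, size_t alignment)
{
    return ((widthBytes + alignment - 1) / alignment) * alignment;
}
// padPitch(31 * sizeof(float), 128) == 128, i.e. a pitch of 32 floats.
```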

## Pitch-related CUDA APIs

Allocate 2D memory (width is passed in bytes, i.e. width*sizeof(type); the runtime returns the actual padded row size through pitch):
cudaError_t cudaMallocPitch(void** devPtr, size_t* pitch, size_t width, size_t height);
Copy 2D memory between host and device (width is again in bytes; dpitch and spitch are the row pitches of the destination and source):
cudaError_t cudaMemcpy2D(void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, enum cudaMemcpyKind kind);
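
A hedged end-to-end sketch tying the two calls together (the 31×31 size matches the example above; the kernel and variable names are illustrative, not the author's exact experiment):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Row-wise add from the example: row 1 gets +1, row 2 gets +2, ...
__global__ void addPerRow(float* data, size_t pitchBytes, int width, int height)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        float* rowPtr = (float*)((char*)data + row * pitchBytes);  // pitch is in bytes
        rowPtr[col] += (float)(row + 1);
    }
}

int main()
{
    const int width = 31, height = 31;   // logical 31x31 array
    float hostData[31][31] = {};         // zero-initialized, tightly packed host rows

    float* devPtr = nullptr;
    size_t pitch = 0;                    // filled in by the runtime, in bytes
    cudaMallocPitch((void**)&devPtr, &pitch, width * sizeof(float), height);

    // Host rows are tightly packed, so the source pitch is width * sizeof(float).
    cudaMemcpy2D(devPtr, pitch, hostData, width * sizeof(float),
                 width * sizeof(float), height, cudaMemcpyHostToDevice);

    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    addPerRow<<<grid, block>>>(devPtr, pitch, width, height);

    cudaMemcpy2D(hostData, width * sizeof(float), devPtr, pitch,
                 width * sizeof(float), height, cudaMemcpyDeviceToHost);
    printf("pitch = %zu bytes, element[2][0] = %.1f\n", pitch, hostData[2][0]);  // expect 3.0

    cudaFree(devPtr);
    return 0;
}
```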

## Supplement
The memory-transaction size depends on the device and on whether L1 cache is enabled. Memory efficiency depends on the transaction size and on the size and distribution of the memory requests coalesced per warp:
memory efficiency = bytes requested per warp / (actual number of transactions × transaction size)
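
A worked example under those definitions: a warp of 32 threads each loading a 4-byte float requests 32 × 4 = 128 bytes. If the access is aligned, one 128-byte transaction serves it and efficiency is 128 / (1 × 128) = 100%; if it straddles a 128-byte boundary, two transactions are needed and efficiency drops to 128 / (2 × 128) = 50%.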

## Example
The padding experiment and its details follow in the next post.
