Head First Software Development, Chapters 1-3: Software, Requirements, Planning

These notes cover an agile development approach: learning rules for the brain, the software development process, requirements gathering, and project planning. They emphasize close collaboration with the customer, effective handling of requirement changes, and accurate time estimation.

Brain Learning Rules

The brain craves novelty; it constantly scans for and waits for the unusual to appear.

1) The more you understand, the less you have to memorize 2) Do exercises and take notes 3) Read the questions and answer them 4) Read before sleep and let it sink in 5) Drink plenty of water 6) Discuss and talk about it; explain it to others 7) Pay attention to how efficiently your brain is working 8) Feel something; put yourself in the scenario 9) Coding exercises

Code: http://www.headfirstlabs.com/books/hfsd/ 

---Brain Learning Rules End---


Section 1 Software Development

Keep the customer satisfied: how much it costs and how long it takes.

Wrong (X): Big Bang development -> going dark

Correct: when you are unsure what the user wants, ask the customer (and confirm even when you think you know).

Often the right answer does not yet exist in the customer's head. Talk to the customer, confirm the details, present feasible ways to implement things, and let the customer decide. Bring the customer into the development process; never guess.

Keys >What is needed (requirements) >On time (schedule) >On budget (cost control)

Iteration/Sprint (about 20 days): check in with the customer regularly, demo, and discuss draft ideas.

Adjust features according to the customer's new thoughts.

Keep the customer updated throughout the development process.

Show features to the customer: implement part of the software, present and demo it to the customer, and listen to their opinions.

What gets done: each iteration is a tiny project that finishes some tasks. Each iteration needs RDCT: requirements, design, coding, testing.

After each iteration the product grows bigger and closer to complete. Set a priority and a timeline for each task, and estimate each task.

Put related tasks (those with dependencies) into the same iteration.

New tasks: the customer adds new ideas during development. Estimate them and rearrange task priorities, pushing low-priority tasks toward the final iterations.

Recalculate the time, remove unnecessary tasks/features, and discuss the cuts with the customer.

>To deliver the product on time and meet the requirements, don't "shoehorn" late-arriving tasks into the end of the iterations. Explain the plan and the resources to the customer: why you can't make it in time and why you need more iterations or resources (developers).

Summary

>Development techniques: iterations keep you on track when requirements change; re-plan and rebalance the iterations; every iteration produces working software and gathers feedback from the customer.

>Development principles: deliver the software the customer needs, on time and within budget.

>Chapter takeaways: the customer feedback gathered in each iteration is the best way to keep the software aligned with the customer's needs. Each iteration is a miniature of the whole project.

Software needs iterations so that you hear from the customer often. Deliver software within the planned schedule and budget. Software with a few features that work beats software with many features that don't. Good developers develop software; great developers ship software.

Process, Customer, Ship, Released, Deadline, Overbudget, On time, Drawing board

---Section1 End---


Section 2 Collecting Requirements

User Stories, Brainstorming, the Estimation Game

Brainstorming >Index cards: each card captures one single thing the software system needs to provide.

>Blue-sky meeting: everyone contributes ideas as equals; try to think of additional requirements.

If brainstorming stalls, split into groups (developers, customers, ...) and then regroup; write the ideas on a whiteboard (break up, let people think it over, and hold a second session).

Ways to understand the customer >Role playing: demonstrate the software by acting it out while the customer makes requests; record what the software needs to do on requirement cards.

>Observation: several observers watch the customer's workflow and the way they work.

User Stories (US): written from the customer's perspective, understandable to both you and the customer.

>YES -describes one thing the software does for the customer, from the user's point of view -written in language the customer understands -comes from the customer -short

>NOT -long-winded -full of technical jargon the customer doesn't know -tied to a specific technology

Distinguish user stories from design decisions; ask, question, revise, and make sure everything is discussed clearly.

>Title field: makes a story easy to find and reference. >If a technical decision needs to be attached to a US, write it on a separate set of requirement cards and cross-reference by title.

>Refine requirements into user stories. >Keep refining and capturing new requirements throughout the project and add them to iterations; attach a time estimate to each US (design, coding, testing, delivery), then add them up for the total estimate (see the sketch below).
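As an illustration of the bookkeeping above, here is a minimal Python sketch of a requirement card with per-phase estimates; the story names, phases, and numbers are hypothetical, not from the book:

```python
from dataclasses import dataclass, field

@dataclass
class UserStory:
    """One index card: a single thing the software does for the customer."""
    title: str                                     # used to find and cross-reference the card
    estimates: dict = field(default_factory=dict)  # phase -> person-days (design, coding, testing, delivery)

    @property
    def total_days(self) -> float:
        # the estimate for this one story, covering all phases
        return sum(self.estimates.values())

stories = [
    UserStory("Search listings", {"design": 1, "coding": 3, "testing": 1, "delivery": 0.5}),
    UserStory("Email results",   {"design": 0.5, "coding": 2, "testing": 1, "delivery": 0.5}),
]

# Total project estimate = sum of the per-story estimates (the time to deliver the software).
print(sum(s.total_days for s in stories), "person-days in total")
```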

>To get a trustworthy estimate: eliminate assumptions.

Planning poker: the estimate covers the whole US, not just a part of it.

>Place the US in the middle of the table. >Each person gets 13 cards, each carrying an estimate: 0, 1/2, 1, 2, 3, 5, 8, 13, 20, 40, 100, ?, tea break.

>Everyone picks an estimate and puts the card face down on the table. >Flip the cards over together. >Question the outliers, uncover their assumptions, find a reasonable value inside the cluster where estimates agree, discuss until understanding is shared, and record the result on the back of the card (sketch below).
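A minimal sketch of how one poker round might be reviewed. The deck values come from the notes; the players, the "more than twice the minimum" spread threshold, and the function name are my own hypothetical choices:

```python
# One planning poker round: spot wide spreads and special cards so the team
# knows to surface assumptions before agreeing on a single value.
DECK = [0, 0.5, 1, 2, 3, 5, 8, 13, 20, 40, 100]   # plus "?" and the tea-break card

def review_round(estimates):
    """estimates: dict of player name -> card played (a number or '?')."""
    numbers = [v for v in estimates.values() if not isinstance(v, str)]
    if len(numbers) < len(estimates):
        return "a '?' was played: misunderstanding or no confidence -- clarify the story first"
    if max(numbers) >= 40:
        return "a 40/100 is on the table: lots of work or a misread story -- split it or re-discuss"
    if min(numbers) > 0 and max(numbers) > 2 * min(numbers):
        return "wide spread: question the outliers and dig out their assumptions"
    return "converged: agree on a value and record it on the back of the card"

print(review_round({"Ana": 3, "Bob": 13, "Cai": 5}))   # wide spread -> discuss
print(review_round({"Ana": 3, "Bob": 3,  "Cai": 5}))   # close enough to settle
```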

Validate assumptions: don't take assumptions on faith. Figure out what you don't know, keep a record of the assumptions, and defer low-priority user stories until their assumptions are cleared up.

The goal during estimation is to clarify assumptions with the customer and eliminate as many as possible; any assumption that survives is a risk.

Respect the customer's time: organize it well and ask questions skillfully. Hold an assumption-busting session: try to collect all the assumptions at once and clear them up in a single focused pass.

>Play another round of planning poker on the newly clarified items.

There are no dumb questions. A 40 means the US contains a lot of work. A 100 means there is a misunderstanding or an error; break the US into smaller, easier-to-estimate stories. A ? means no confidence, a misunderstanding, or uncertainty.

For odd-looking numbers, ask and communicate: is some information missing? Follow up privately and ask for the reasoning. The estimate should be one the whole team agrees on, and the team should be confident it covers the time any member would need to develop the story.

Estimation includes documentation, test reports, packaging, and deployment.

>Don't make assumptions about assumptions; discuss everything until it is clear.

>The longer a US, the less accurate its estimate. When a US exceeds 15 days: 1) split it into several easier-to-estimate stories by finding the 'AND's hiding in it; 2) talk with the customer to clarify assumptions and bring the estimate down.

Goal: convergence. Get an estimate for every US, build confidence in it, eliminate assumptions, and converge the spread of estimates onto a single value.

1) Talk with the customer to eliminate assumptions and misunderstandings. 2) Play planning poker, working through each US and digging out any hidden assumptions.

3) Use the poker results to learn whether team members understand the US and where clarification is needed. 4) Reach consensus (estimates close together, estimates you can trust).

>Take a zero-tolerance attitude toward assumptions; flag and track the ones that cannot yet be resolved as risks.

The estimate is a commitment to the customer; it states how long the team needs to deliver the software. >The customer should only receive your questions and see the user stories; the estimate itself should come from the team.

1) Capture the basic ideas. 2) Brainstorm a shared vision. 3) Build user stories. 4) Find gaps while clarifying. 5) Refine the user stories. 6) Play planning poker. 7) Fill in missing information from customer feedback and split large user stories [-> back to 4)]. 8) Converge the estimates.

Estimating the project: attach an estimate to each US (the time to develop that feature) and add them up for the total estimate (the time to deliver the software).

Summary

>Development techniques: shared vision, observation, role playing, user stories, planning poker, estimates.

>Development principles: help customers figure out what they want; requirements always face the customer; dig out and refine requirements together with the customer, iteratively.

>Chapter takeaways: discussing with the customer helps them think things through more thoroughly. A US should be written from the customer's point of view, be completed within one iteration, be short (fewer than 3 sentences), be estimable, and take less than 15 person-days. Keep discussing requirements with the customer and keep them involved.

BlueSky, Assumptions, Technical, PlanningPoker, Observation, Convergence, QuestionMark, Spread, RolePlay, Confident, Consensus 

---Section2 End---


Section 3 Project Planning

tip: distinguish person-days from calendar days.

Set priorities together with the customer: the customer decides the ranking; you contribute professional advice to help.

Milestone 1.0: the first major version released to the customer.

Balance the system's features against the customer's wishes: defer some features to 2.0 or 3.0, and make 1.0 satisfy the customer's most important needs. Don't worry about time yet; figure out what is most important first.

Order the user stories by priority and let the customer choose what must be in 1.0. If there are more features than the development schedule allows, re-prioritize.

1) Cut more features: drop user stories that are not absolutely necessary. 2) Deliver Milestone 1.0 as early as possible. 3) Focus on the essential features.

Milestone: a release to the customer. Version: a feature update.

>If the math shows you cannot deliver all the features the customer asked for within the planned time, you may have to walk away from the project, or add developers, which raises the cost.

When you add developers, the new people need to understand the software, the requirements, and the technology, set up a development environment, and take time to fit in, so they are not 100% productive right away.

More developers is not automatically better: the more people, the more communication paths and the higher their overhead. Plan with roughly 60% of the raw capacity, which covers communication, learning, meetings, and other overhead.
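A small sketch of why headcount doesn't scale linearly: the pairwise channel count n(n-1)/2 is the standard way to count communication paths (the formula itself is not spelled out in the notes), and the 60% figure is the planning factor mentioned above:

```python
def communication_paths(team_size):
    # every pair of people is a potential communication channel: n * (n - 1) / 2
    return team_size * (team_size - 1) // 2

for n in (2, 4, 8):
    print(f"{n} developers -> {communication_paths(n)} communication paths")
# 2 -> 1, 4 -> 6, 8 -> 28: overhead grows much faster than headcount,
# which is why the notes plan with only ~60% of raw capacity.
```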

>Playing the customer: assign priorities of 10, 20, ... 50 to the user stories in Milestone 1.0 (occasionally something in between, e.g. 25). None of these stories is expendable; the priority represents development order.

Discuss priorities with the customer, explain the dependencies between features, and let the customer decide the ranking.

>At the end of each iteration, demo and get feedback from the customer; keep the software continuously built and runnable (continuous build).

Keep iterations short: the shorter the cycle, the more chances you get to handle changes and details.

Keep iterations balanced: account for requirement changes, added features, discovered bugs, and the realities of development.

Plan known work into each iteration: documentation, vacation, software/binary/third-party upgrades.

Velocity: a time-efficiency factor, e.g. 0.6; it can be adjusted from iteration to iteration.

Developers × workdays × velocity = real capacity per iteration/sprint; capacity × number of iterations = total time available (worked example below).
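A worked example of the formula above; the team size and total estimate are hypothetical numbers, while the 0.6 velocity is the factor from the notes:

```python
import math

developers = 4        # hypothetical team size
workdays = 20         # working days in one iteration (roughly a month)
velocity = 0.6        # efficiency factor from the notes

capacity = developers * workdays * velocity   # real capacity per iteration: 48 person-days
total_estimate = 130                          # hypothetical sum of all US estimates (person-days)

iterations = math.ceil(total_estimate / capacity)
print(f"{capacity:.0f} person-days per iteration, {iterations} iterations for {total_estimate} person-days")
```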

Put the low-priority user stories that overflow the time budget into the backlog, and talk with the customer about which ones will be deferred.

Solutions: 1) Add one more iteration to Milestone 1.0 to finish the backlog user stories. 2) Accept a schedule slip: the remaining user stories ship in the next milestone. 3) Walk the customer through the time calculation, the reasons for deferring certain user stories, and the constraints on the delivery date.

Don't squeeze extra time into the plan; if time is left over, pull a simple US forward into the milestone.

>Have confidence in what you promise: commit carefully and deliver successfully; never over-commit and fail.

Software dashboard on a whiteboard: US - In progress - Done. Burn-down chart: x-axis = day, y-axis = remaining work (hours). (Sketch below.)
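A minimal sketch of the bookkeeping behind that burn-down chart; the hour figures are made up for illustration:

```python
# Burn-down data for the whiteboard: x = day of the iteration, y = remaining work (hours).
total_hours = 120
hours_done_per_day = [0, 10, 18, 12, 0, 15, 20]   # hours of work completed each day

remaining = total_hours
for day, done in enumerate(hours_done_per_day):
    remaining -= done
    print(f"day {day}: {remaining:3d} h remaining")
# A flat stretch (days 0 and 4 above) shows up immediately on the chart and
# signals that the iteration is drifting away from its baseline.
```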

Personal life matters: a happy team is a productive team; avoid overtime.

Fatigue hurts productivity: a developer gets only about three truly productive hours a day.

>Key points: the customer prioritizes user stories by need; deliver Milestone 1.0 as early as possible; keep iterations as short as possible; when time runs short, ask the customer to re-prioritize; apply the velocity factor; explain the time requirements to the customer; set a development schedule.

Summary

>Development techniques: short iterations, velocity, using a whiteboard to plan and monitor the project, choosing the user stories for Milestone 1.0, scheduling user stories into iterations, asking the customer for input.

>Development principles: short iterations are manageable; the customer decides which user stories go into each iteration and milestone; commit carefully, deliver successfully, and be honest with the customer.

>Chapter takeaways: the customer decides priorities; keep iterations under a month; keep the software buildable and runnable; calculate and adjust velocity; plan a realistic Milestone 1.0.

Feedback, Surprise Event, Velocity, Priority/Prioritizing, Burndown, Honest, Baseline, Runnable

---Section3 End---
