Decision (Episode 101)

The American Companion, Da'an (Leni Parker), was due to arrive in a Midwestern city on an official visit, but the Companions prevented Police Captain William Boone (Kevin Kilner) from implementing all the security arrangements he requested. The alien leader arrived in a Taelon shuttle from Washington, chauffeured by Marine pilot Captain Lili Marquette (Lisa Howard). Da'an had a special announcement to make about a new venture with one of Earth's biggest industrialists, the eccentric billionaire Jonathan Doors (David Hemblen).

The Taelons' front man, FBI Agent Ronald Sandoval (Von Flores), discounted Boone's apprehensions about controlling the huge crowd and securing the neighboring buildings, so when Da'an arrived, Boone enlisted the help of his second-in-command, Lieutenant Bob Morovsky (John Evans). Despite their precautions, a shot was fired from a nearby building, and Doors took the bullet meant for Da'an. Sandoval used his skrill (an alien life form bioengineered by the Taelons for use as a weapon) to dissolve the building's wall, revealing a lone gunman. Boone took off in hot pursuit, only to come face to face with his old war buddy, Eddie Jordan (Paul Boretski). Stunned, Boone was unable to stop Jordan's escape.

The Taelons declared Boone a hero, but Boone was more interested in finding out why his best friend had turned into a murderer. His conflict with Sandoval continued, even as the FBI agent extended an offer from Da'an to make him their Commander of Security and Interspecies Relations. Boone declined; his priorities were catching Jordan and starting a family with his beloved wife, Kate (Lisa Ryder).

Jordan, it turned out, was part of an underground resistance, a fact that became evident when he secretly met with Lili, who also belonged to the group. Boone began making connections between the shooting and an organized movement against the Taelons, but then his world came apart: a car bomb ended Kate's life in a horrible explosion.

Devastated by his wife's death, Boone threw himself into the investigation. Thinking it might help the case, he entertained Da'an's job offer. But before he accepted, Lili took him to the Flat Planet Cafe, where he met his fugitive friend Jordan and was presented with a second surprise: Jonathan Doors had faked his own death, clearing the way for him to become leader of the Resistance.

After convincing Boone that Kate's murder had been arranged by the Taelons, the Resistance devised a way for him to infiltrate the Companions. Dr. Julianne Belman (Majel Barrett Roddenberry), secretly a Resistance sympathizer, implanted Boone with a re-engineered Cyber-Viral Implant (CVI). A CVI is part germ, part computer, implanted by the Companions into the brains of their most trusted operatives. It greatly increases mental capabilities, but it also alters the recipient's motivations so that their sole focus becomes the well-being of the Companions. Boone's re-engineered CVI gave him the enhanced mental powers without altering his motivations.

Before Boone started his new job, Eddie Jordan turned up dead. Did Sandoval kill him, or was it Doors, or even the beautiful Lili Marquette? Boone no longer knew whom to trust. Had one of them been involved in the deaths of his best friend and his wife?

So Boone began his search for the truth.

A complete code example for generating an offline reinforcement learning dataset with MetaDrive follows, with key steps and caveats noted. The script assumes the classic gym-style MetaDrive API (the 0.2.x releases, where `reset()` returns only the observation and `step()` returns a 4-tuple); newer releases follow the gymnasium 5-tuple convention.

```python
import pickle

import numpy as np
from metadrive import MetaDriveEnv
from tqdm import tqdm

def collect_metadrive_dataset(
    episode_num=100,        # number of episodes to collect
    max_steps=1000,         # maximum steps per episode
    save_path="./metadrive_offline_data.pkl"
):
    # Initialize the environment
    config = dict(
        use_render=False,       # headless mode speeds up collection
        traffic_density=0.6,    # traffic density
        environment_num=200,    # number of distinct scenarios, for diversity
        start_seed=42,          # random seed
    )
    env = MetaDriveEnv(config)

    dataset = {
        "observations": [],
        "actions": [],
        "rewards": [],
        "next_observations": [],
        "terminals": [],
        "infos": []
    }

    for _ in tqdm(range(episode_num)):
        obs = env.reset()
        episode_data = {k: [] for k in dataset.keys()}

        for _ in range(max_steps):
            # Sample a random action (replace with any behavior policy)
            action = env.action_space.sample()

            # Step the environment
            next_obs, reward, done, info = env.step(action)

            # Record the transition
            episode_data["observations"].append(obs)
            episode_data["actions"].append(action)
            episode_data["rewards"].append(reward)
            episode_data["next_observations"].append(next_obs)
            episode_data["terminals"].append(done)
            episode_data["infos"].append(info)

            obs = next_obs
            if done:
                break

        # Merge the episode into the full dataset
        for k in dataset.keys():
            dataset[k].extend(episode_data[k])

    # Save the dataset
    with open(save_path, "wb") as f:
        pickle.dump(dataset, f)

    env.close()
    print(f"Dataset saved to {save_path}")
    print(f"Total transitions: {len(dataset['observations'])}")

# Run the collection
if __name__ == "__main__":
    collect_metadrive_dataset(
        episode_num=50,   # 50 episodes is a reasonable starting point
        max_steps=500     # at most 500 steps per episode
    )
```

Key features:

1. **Environment configuration**:
   - Headless mode (`use_render=False`) speeds up collection
   - `traffic_density` controls traffic complexity
   - `environment_num` ensures scenario diversity

2. **Data storage structure**:
```python
dataset = {
    "observations": [],        # current state
    "actions": [],             # action taken
    "rewards": [],             # immediate reward
    "next_observations": [],   # next state
    "terminals": [],           # termination flag
    "infos": []                # extra info (position, collisions, etc.)
}
```

3. **Action generation**:
   - The script uses the random policy `env.action_space.sample()`
   - It can be replaced with an expert policy; the observation indices in the example below are purely illustrative and must be adapted to the actual observation layout:
```python
# Example: a simple heuristic policy (hypothetical obs indices)
def expert_policy(obs):
    steering = np.clip(obs[4], -0.1, 0.1)   # steer based on lateral offset
    throttle = 0.8 if obs[6] > 5 else 0.3   # throttle based on lead-vehicle distance
    return np.array([steering, throttle])
```

Caveats:

1. **Hardware requirements**:
   - A single 500-step episode occupies roughly 30 MB of memory
   - For large datasets, parallel collection with multiple processes is recommended (a minimal sketch follows these notes)

2. **Data quality tuning**:
```python
# Add to the environment config:
config.update({
    "vehicle_config": {"lidar": {"num_lasers": 128}},  # finer sensor resolution
    "decision_repeat": 5,    # action repeat, i.e. control frequency
    "accident_prob": 0.3     # probability of accident scenarios
})
```

3. **Format conversion**: to convert the pickle file to HDF5, note that `infos` is a list of dicts and cannot be stored as a plain HDF5 dataset, so it is skipped here:
```python
import h5py

def convert_to_hdf5(pkl_path, hdf5_path):
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    with h5py.File(hdf5_path, "w") as hf:
        for k, v in data.items():
            if k == "infos":   # dicts are not valid HDF5 datasets
                continue
            hf.create_dataset(k, data=np.array(v))
```
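Since the note above recommends multiprocessing for large collections, here is a minimal sketch of one way to do it, assuming the `collect_metadrive_dataset` function from the script above is in scope; the worker count, episode split, and per-worker file names are illustrative choices, and each worker writes its own pickle that can be merged afterwards:

```python
from multiprocessing import Process

def worker(worker_id, episodes):
    # Each worker writes to its own file to avoid collisions.
    # In practice you would also vary start_seed per worker, which
    # requires exposing it as a parameter of collect_metadrive_dataset.
    collect_metadrive_dataset(
        episode_num=episodes,
        max_steps=500,
        save_path=f"./metadrive_offline_data_{worker_id}.pkl"
    )

if __name__ == "__main__":
    procs = [Process(target=worker, args=(i, 25)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```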
Typical workflow:

1. Install the dependencies: `pip install "metadrive-simulator==0.2.4" numpy tqdm`
2. Run the collection script
3. Verify the integrity of the data:
```python
with open("metadrive_offline_data.pkl", "rb") as f:
    data = pickle.load(f)
print("Observation shape:", data["observations"][0].shape)
print("Action range:", np.min(data["actions"]), np.max(data["actions"]))
```

Possible extensions:

- Add sensor data (camera, LiDAR)
- Record environment metadata (weather, time of day)
- Implement prioritized sampling (preferentially keep high-risk scenarios)

The resulting dataset can be fed directly into mainstream offline RL algorithms (e.g., CQL or BCQ); an initial collection of 500k to 1M transitions makes a reasonable baseline dataset.
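As a concrete illustration of that hand-off, here is a minimal, library-agnostic sketch that loads the pickle produced above into the flat NumPy arrays most offline RL implementations consume; the function names `load_transitions` and `sample_batch` are our own, not part of any library:

```python
import pickle

import numpy as np

def load_transitions(pkl_path):
    """Load the pickled dataset into flat (s, a, r, s', done) arrays."""
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    return {
        "observations": np.asarray(data["observations"], dtype=np.float32),
        "actions": np.asarray(data["actions"], dtype=np.float32),
        "rewards": np.asarray(data["rewards"], dtype=np.float32),
        "next_observations": np.asarray(data["next_observations"], dtype=np.float32),
        "terminals": np.asarray(data["terminals"], dtype=np.float32),
    }

def sample_batch(transitions, batch_size=256, rng=None):
    """Uniformly sample a minibatch, the access pattern offline RL trainers use."""
    rng = rng or np.random.default_rng(0)
    idx = rng.integers(0, len(transitions["rewards"]), size=batch_size)
    return {k: v[idx] for k, v in transitions.items()}
```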