GYM 101102 F. Exchange (easy)

This post describes an algorithm that, given a string, exchanges two letters X and Y (every X becomes Y and every Y becomes X) so that the resulting string is lexicographically smallest among all possible exchanges. By recording, for each letter, whether it has already been seen and whether it is still available for an exchange, the answer is found efficiently in a single left-to-right scan.

Description
Given a string, you may choose two letters X and Y and exchange them, i.e. every X in the string becomes Y and every Y becomes X. Find the lexicographically smallest string obtainable by such an exchange.
Input
The first line contains an integer T, the number of test cases. Each test case consists of one string of lowercase letters whose length does not exceed 1e5.
Output
For each test case, output the lexicographically smallest string after the exchange.
Sample Input
3
hamza
racecar
mca
Sample Output
ahmzh
arcecra
acm
Solution
Let a[i] record whether letter i is still available to be exchanged, and b[i] record whether letter i has already appeared (and been fixed) earlier in the scan. Process the string from left to right. For the i-th character, if b[s[i]-'a']=1 the letter has appeared before; since it was not exchanged then, it will not be exchanged now, so skip it. If b[s[i]-'a']=0 the letter is a candidate: find the smallest j with a[j]=1, i.e. the smallest letter still available. If s[i] is already that letter, fix it (set a[j]=-1, b[j]=1) and continue; otherwise exchange s[i] and j+'a' throughout the string and stop. Each test case takes O(26·n) time.
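For example, take the first sample "hamza". Its first character is 'h' and b['h'-'a']=0, so we look for the smallest available letter, which is 'a'. Since 'h' differs from 'a', we exchange every 'h' with every 'a' and stop, giving "ahmzh". The third sample "mca" is handled the same way: exchanging 'm' and 'a' yields "acm".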
Code

#include<cstdio>
#include<iostream>
#include<cstring>
#include<algorithm>
#include<cmath>
#include<vector>
#include<queue>
#include<map>
#include<set>
#include<ctime>
using namespace std;
typedef long long ll;
#define INF 0x3f3f3f3f
#define maxn 111111
int T,n,a[33],b[33]; // a[c]=1: letter c is still available to exchange; b[c]=1: letter c already fixed
char s[maxn];
int main()
{
    scanf("%d",&T);
    while(T--)
    {
        scanf("%s",s);
        n=strlen(s);
        memset(a,-1,sizeof(a));
        memset(b,0,sizeof(b));
        for(int i=0;i<n;i++)a[s[i]-'a']=1;    // mark every letter that occurs in the string as available
        for(int i=0;i<n;i++)
        {
            if(b[s[i]-'a'])continue;          // letter already fixed earlier, it will not be exchanged now
            int j=0;
            while(j<26&&a[j]==-1)j++;         // smallest letter still available for an exchange
            if(j==26)break;                   // nothing left to exchange with
            if(s[i]==j+'a')a[j]=-1,b[j]=1;    // s[i] is already the smallest choice: fix it and move on
            else
            {
                char x=s[i],y=j+'a';          // exchange letters x and y throughout the string, then stop
                for(int k=0;k<n;k++)
                    if(s[k]==x)s[k]=y;
                    else if(s[k]==y)s[k]=x;
                break;
            }
        }
        printf("%s\n",s);
    }
    return 0;
}
我想在UR5e上面复现github上的这个代码,但我不知道怎么开始。包括配置中控之类的,请你把我当成一个小白来详细教我。# Diffusion Policy [[Project page]](https://diffusion-policy.cs.columbia.edu/) [[Paper]](https://diffusion-policy.cs.columbia.edu/#paper) [[Data]](https://diffusion-policy.cs.columbia.edu/data/) [[Colab (state)]](https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing) [[Colab (vision)]](https://colab.research.google.com/drive/18GIHeOQ5DyjMN8iIRZL2EKZ0745NLIpg?usp=sharing) [Cheng Chi](http://cheng-chi.github.io/)<sup>1</sup>, [Siyuan Feng](https://www.cs.cmu.edu/~sfeng/)<sup>2</sup>, [Yilun Du](https://yilundu.github.io/)<sup>3</sup>, [Zhenjia Xu](https://www.zhenjiaxu.com/)<sup>1</sup>, [Eric Cousineau](https://www.eacousineau.com/)<sup>2</sup>, [Benjamin Burchfiel](http://www.benburchfiel.com/)<sup>2</sup>, [Shuran Song](https://www.cs.columbia.edu/~shurans/)<sup>1</sup> <sup>1</sup>Columbia University, <sup>2</sup>Toyota Research Institute, <sup>3</sup>MIT <img src="media/teaser.png" alt="drawing" width="100%"/> <img src="media/multimodal_sim.png" alt="drawing" width="100%"/> ## 🛝 Try it out! Our self-contained Google Colab notebooks is the easiest way to play with Diffusion Policy. We provide separate notebooks for [state-based environment](https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing) and [vision-based environment](https://colab.research.google.com/drive/18GIHeOQ5DyjMN8iIRZL2EKZ0745NLIpg?usp=sharing). ## 🧾 Checkout our experiment logs! For each experiment used to generate Table I,II and IV in the [paper](https://diffusion-policy.cs.columbia.edu/#paper), we provide: 1. A `config.yaml` that contains all parameters needed to reproduce the experiment. 2. Detailed training/eval `logs.json.txt` for every training step. 3. Checkpoints for the best `epoch=*-test_mean_score=*.ckpt` and last `latest.ckpt` epoch of each run. Experiment logs are hosted on our website as nested directories in format: `https://diffusion-policy.cs.columbia.edu/data/experiments/<image|low_dim>/<task>/<method>/` Within each experiment directory you may find: ``` . ├── config.yaml ├── metrics │   └── logs.json.txt ├── train_0 │   ├── checkpoints │   │   ├── epoch=0300-test_mean_score=1.000.ckpt │   │   └── latest.ckpt │   └── logs.json.txt ├── train_1 │   ├── checkpoints │   │   ├── epoch=0250-test_mean_score=1.000.ckpt │   │   └── latest.ckpt │   └── logs.json.txt └── train_2 ├── checkpoints │   ├── epoch=0250-test_mean_score=1.000.ckpt │   └── latest.ckpt └── logs.json.txt ``` The `metrics/logs.json.txt` file aggregates evaluation metrics from all 3 training runs every 50 epochs using `multirun_metrics.py`. The numbers reported in the paper correspond to `max` and `k_min_train_loss` aggregation keys. To download all files in a subdirectory, use: ```console $ wget --recursive --no-parent --no-host-directories --relative --reject="index.html*" https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/square_ph/diffusion_policy_cnn/ ``` ## 🛠️ Installation ### 🖥️ Simulation To reproduce our simulation benchmark results, install our conda environment on a Linux machine with Nvidia GPU. 
On Ubuntu 20.04 you need to install the following apt packages for mujoco: ```console $ sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf ``` We recommend [Mambaforge](https://github.com/conda-forge/miniforge#mambaforge) instead of the standard anaconda distribution for faster installation: ```console $ mamba env create -f conda_environment.yaml ``` but you can use conda as well: ```console $ conda env create -f conda_environment.yaml ``` The `conda_environment_macos.yaml` file is only for development on MacOS and does not have full support for benchmarks. ### 🦾 Real Robot Hardware (for Push-T): * 1x [UR5-CB3](https://www.universal-robots.com/cb3) or [UR5e](https://www.universal-robots.com/products/ur5-robot/) ([RTDE Interface](https://www.universal-robots.com/articles/ur/interface-communication/real-time-data-exchange-rtde-guide/) is required) * 2x [RealSense D415](https://www.intelrealsense.com/depth-camera-d415/) * 1x [3Dconnexion SpaceMouse](https://3dconnexion.com/us/product/spacemouse-wireless/) (for teleop) * 1x [Millibar Robotics Manual Tool Changer](https://www.millibar.com/manual-tool-changer/) (only need robot side) * 1x 3D printed [End effector](https://cad.onshape.com/documents/a818888644a15afa6cc68ee5/w/2885b48b018cda84f425beca/e/3e8771c2124cee024edd2fed?renderMode=0&uiState=63ffcba6631ca919895e64e5) * 1x 3D printed [T-block](https://cad.onshape.com/documents/f1140134e38f6ed6902648d5/w/a78cf81827600e4ff4058d03/e/f35f57fb7589f72e05c76caf?renderMode=0&uiState=63ffcbc9af4a881b344898ee) * USB-C cables and screws for RealSense Software: * Ubuntu 20.04.3 (tested) * Mujoco dependencies: `sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf` * [RealSense SDK](https://github.com/IntelRealSense/librealsense/blob/master/doc/distribution_linux.md) * Spacemouse dependencies: `sudo apt install libspnav-dev spacenavd; sudo systemctl start spacenavd` * Conda environment `mamba env create -f conda_environment_real.yaml` ## 🖥️ Reproducing Simulation Benchmark Results ### Download Training Data Under the repo root, create data subdirectory: ```console [diffusion_policy]$ mkdir data && cd data ``` Download the corresponding zip file from [https://diffusion-policy.cs.columbia.edu/data/training/](https://diffusion-policy.cs.columbia.edu/data/training/) ```console [data]$ wget https://diffusion-policy.cs.columbia.edu/data/training/pusht.zip ``` Extract training data: ```console [data]$ unzip pusht.zip && rm -f pusht.zip && cd .. ``` Grab config file for the corresponding experiment: ```console [diffusion_policy]$ wget -O image_pusht_diffusion_policy_cnn.yaml https://diffusion-policy.cs.columbia.edu/data/experiments/image/pusht/diffusion_policy_cnn/config.yaml ``` ### Running for a single seed Activate conda environment and login to [wandb](https://wandb.ai) (if you haven't already). ```console [diffusion_policy]$ conda activate robodiff (robodiff)[diffusion_policy]$ wandb login ``` Launch training with seed 42 on GPU 0. ```console (robodiff)[diffusion_policy]$ python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml training.seed=42 training.device=cuda:0 hydra.run.dir='data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}' ``` This will create a directory in format `data/outputs/yyyy.mm.dd/hh.mm.ss_<method_name>_<task_name>` where configs, logs and checkpoints are written to. The policy will be evaluated every 50 epochs with the success rate logged as `test/mean_score` on wandb, as well as videos for some rollouts. 
```console (robodiff)[diffusion_policy]$ tree data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image -I wandb data/outputs/2023.03.01/20.02.03_train_diffusion_unet_hybrid_pusht_image ├── checkpoints │ ├── epoch=0000-test_mean_score=0.134.ckpt │ └── latest.ckpt ├── .hydra │ ├── config.yaml │ ├── hydra.yaml │ └── overrides.yaml ├── logs.json.txt ├── media │ ├── 2k5u6wli.mp4 │ ├── 2kvovxms.mp4 │ ├── 2pxd9f6b.mp4 │ ├── 2q5gjt5f.mp4 │ ├── 2sawbf6m.mp4 │ └── 538ubl79.mp4 └── train.log 3 directories, 13 files ``` ### Running for multiple seeds Launch local ray cluster. For large scale experiments, you might want to setup an [AWS cluster with autoscaling](https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/aws.html). All other commands remain the same. ```console (robodiff)[diffusion_policy]$ export CUDA_VISIBLE_DEVICES=0,1,2 # select GPUs to be managed by the ray cluster (robodiff)[diffusion_policy]$ ray start --head --num-gpus=3 ``` Launch a ray client which will start 3 training workers (3 seeds) and 1 metrics monitor worker. ```console (robodiff)[diffusion_policy]$ python ray_train_multirun.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml --seeds=42,43,44 --monitor_key=test/mean_score -- multi_run.run_dir='data/outputs/${now:%Y.%m.%d}/${now:%H.%M.%S}_${name}_${task_name}' multi_run.wandb_name_base='${now:%Y.%m.%d-%H.%M.%S}_${name}_${task_name}' ``` In addition to the wandb log written by each training worker individually, the metrics monitor worker will log to wandb project `diffusion_policy_metrics` for the metrics aggregated from all 3 training runs. Local config, logs and checkpoints will be written to `data/outputs/yyyy.mm.dd/hh.mm.ss_<method_name>_<task_name>` in a directory structure identical to our [training logs](https://diffusion-policy.cs.columbia.edu/data/experiments/): ```console (robodiff)[diffusion_policy]$ tree data/outputs/2023.03.01/22.13.58_train_diffusion_unet_hybrid_pusht_image -I 'wandb|media' data/outputs/2023.03.01/22.13.58_train_diffusion_unet_hybrid_pusht_image ├── config.yaml ├── metrics │ ├── logs.json.txt │ ├── metrics.json │ └── metrics.log ├── train_0 │ ├── checkpoints │ │ ├── epoch=0000-test_mean_score=0.174.ckpt │ │ └── latest.ckpt │ ├── logs.json.txt │ └── train.log ├── train_1 │ ├── checkpoints │ │ ├── epoch=0000-test_mean_score=0.131.ckpt │ │ └── latest.ckpt │ ├── logs.json.txt │ └── train.log └── train_2 ├── checkpoints │ ├── epoch=0000-test_mean_score=0.105.ckpt │ └── latest.ckpt ├── logs.json.txt └── train.log 7 directories, 16 files ``` ### 🆕 Evaluate Pre-trained Checkpoints Download a checkpoint from the published training log folders, such as [https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/pusht/diffusion_policy_cnn/train_0/checkpoints/epoch=0550-test_mean_score=0.969.ckpt](https://diffusion-policy.cs.columbia.edu/data/experiments/low_dim/pusht/diffusion_policy_cnn/train_0/checkpoints/epoch=0550-test_mean_score=0.969.ckpt). 
Run the evaluation script: ```console (robodiff)[diffusion_policy]$ python eval.py --checkpoint data/0550-test_mean_score=0.969.ckpt --output_dir data/pusht_eval_output --device cuda:0 ``` This will generate the following directory structure: ```console (robodiff)[diffusion_policy]$ tree data/pusht_eval_output data/pusht_eval_output ├── eval_log.json └── media ├── 1fxtno84.mp4 ├── 224l7jqd.mp4 ├── 2fo4btlf.mp4 ├── 2in4cn7a.mp4 ├── 34b3o2qq.mp4 └── 3p7jqn32.mp4 1 directory, 7 files ``` `eval_log.json` contains metrics that is logged to wandb during training: ```console (robodiff)[diffusion_policy]$ cat data/pusht_eval_output/eval_log.json { "test/mean_score": 0.9150393806777066, "test/sim_max_reward_4300000": 1.0, "test/sim_max_reward_4300001": 0.9872969750774386, ... "train/sim_video_1": "data/pusht_eval_output//media/2fo4btlf.mp4" } ``` ## 🦾 Demo, Training and Eval on a Real Robot Make sure your UR5 robot is running and accepting command from its network interface (emergency stop button within reach at all time), your RealSense cameras plugged in to your workstation (tested with `realsense-viewer`) and your SpaceMouse connected with the `spacenavd` daemon running (verify with `systemctl status spacenavd`). Start the demonstration collection script. Press "C" to start recording. Use SpaceMouse to move the robot. Press "S" to stop recording. ```console (robodiff)[diffusion_policy]$ python demo_real_robot.py -o data/demo_pusht_real --robot_ip 192.168.0.204 ``` This should result in a demonstration dataset in `data/demo_pusht_real` with in the same structure as our example [real Push-T training dataset](https://diffusion-policy.cs.columbia.edu/data/training/pusht_real.zip). To train a Diffusion Policy, launch training with config: ```console (robodiff)[diffusion_policy]$ python train.py --config-name=train_diffusion_unet_real_image_workspace task.dataset_path=data/demo_pusht_real ``` Edit [`diffusion_policy/config/task/real_pusht_image.yaml`](./diffusion_policy/config/task/real_pusht_image.yaml) if your camera setup is different. Assuming the training has finished and you have a checkpoint at `data/outputs/blah/checkpoints/latest.ckpt`, launch the evaluation script with: ```console python eval_real_robot.py -i data/outputs/blah/checkpoints/latest.ckpt -o data/eval_pusht_real --robot_ip 192.168.0.204 ``` Press "C" to start evaluation (handing control over to the policy). Press "S" to stop the current episode. ## 🗺️ Codebase Tutorial This codebase is structured under the requirement that: 1. implementing `N` tasks and `M` methods will only require `O(N+M)` amount of code instead of `O(N*M)` 2. while retaining maximum flexibility. To achieve this requirement, we 1. maintained a simple unified interface between tasks and methods and 2. made the implementation of the tasks and the methods independent of each other. These design decisions come at the cost of code repetition between the tasks and the methods. However, we believe that the benefit of being able to add/modify task/methods without affecting the remainder and being able understand a task/method by reading the code linearly outweighs the cost of copying and pasting 😊. ### The Split On the task side, we have: * `Dataset`: adapts a (third-party) dataset to the interface. * `EnvRunner`: executes a `Policy` that accepts the interface and produce logs and metrics. * `config/task/<task_name>.yaml`: contains all information needed to construct `Dataset` and `EnvRunner`. 
* (optional) `Env`: an `gym==0.21.0` compatible class that encapsulates the task environment. On the policy side, we have: * `Policy`: implements inference according to the interface and part of the training process. * `Workspace`: manages the life-cycle of training and evaluation (interleaved) of a method. * `config/<workspace_name>.yaml`: contains all information needed to construct `Policy` and `Workspace`. ### The Interface #### Low Dim A [`LowdimPolicy`](./diffusion_policy/policy/base_lowdim_policy.py) takes observation dictionary: - `"obs":` Tensor of shape `(B,To,Do)` and predicts action dictionary: - `"action": ` Tensor of shape `(B,Ta,Da)` A [`LowdimDataset`](./diffusion_policy/dataset/base_dataset.py) returns a sample of dictionary: - `"obs":` Tensor of shape `(To, Do)` - `"action":` Tensor of shape `(Ta, Da)` Its `get_normalizer` method returns a [`LinearNormalizer`](./diffusion_policy/model/common/normalizer.py) with keys `"obs","action"`. The `Policy` handles normalization on GPU with its copy of the `LinearNormalizer`. The parameters of the `LinearNormalizer` is saved as part of the `Policy`'s weights checkpoint. #### Image A [`ImagePolicy`](./diffusion_policy/policy/base_image_policy.py) takes observation dictionary: - `"key0":` Tensor of shape `(B,To,*)` - `"key1":` Tensor of shape e.g. `(B,To,H,W,3)` ([0,1] float32) and predicts action dictionary: - `"action": ` Tensor of shape `(B,Ta,Da)` A [`ImageDataset`](./diffusion_policy/dataset/base_dataset.py) returns a sample of dictionary: - `"obs":` Dict of - `"key0":` Tensor of shape `(To, *)` - `"key1":` Tensor fo shape `(To,H,W,3)` - `"action":` Tensor of shape `(Ta, Da)` Its `get_normalizer` method returns a [`LinearNormalizer`](./diffusion_policy/model/common/normalizer.py) with keys `"key0","key1","action"`. #### Example ``` To = 3 Ta = 4 T = 6 |o|o|o| | | |a|a|a|a| |o|o| | |a|a|a|a|a| | | | | |a|a| ``` Terminology in the paper: `varname` in the codebase - Observation Horizon: `To|n_obs_steps` - Action Horizon: `Ta|n_action_steps` - Prediction Horizon: `T|horizon` The classical (e.g. MDP) single step observation/action formulation is included as a special case where `To=1` and `Ta=1`. ## 🔩 Key Components ### `Workspace` A `Workspace` object encapsulates all states and code needed to run an experiment. * Inherits from [`BaseWorkspace`](./diffusion_policy/workspace/base_workspace.py). * A single `OmegaConf` config object generated by `hydra` should contain all information needed to construct the Workspace object and running experiments. This config correspond to `config/<workspace_name>.yaml` + hydra overrides. * The `run` method contains the entire pipeline for the experiment. * Checkpoints happen at the `Workspace` level. All training states implemented as object attributes are automatically saved by the `save_checkpoint` method. * All other states for the experiment should be implemented as local variables in the `run` method. The entrypoint for training is `train.py` which uses `@hydra.main` decorator. Read [hydra](https://hydra.cc/)'s official documentation for command line arguments and config overrides. For example, the argument `task=<task_name>` will replace the `task` subtree of the config with the content of `config/task/<task_name>.yaml`, thereby selecting the task to run for this experiment. ### `Dataset` A `Dataset` object: * Inherits from `torch.utils.data.Dataset`. * Returns a sample conforming to [the interface](#the-interface) depending on whether the task has Low Dim or Image observations. 
* Has a method `get_normalizer` that returns a `LinearNormalizer` conforming to [the interface](#the-interface). Normalization is a very common source of bugs during project development. It is sometimes helpful to print out the specific `scale` and `bias` vectors used for each key in the `LinearNormalizer`. Most of our implementations of `Dataset` uses a combination of [`ReplayBuffer`](#replaybuffer) and [`SequenceSampler`](./diffusion_policy/common/sampler.py) to generate samples. Correctly handling padding at the beginning and the end of each demonstration episode according to `To` and `Ta` is important for good performance. Please read our [`SequenceSampler`](./diffusion_policy/common/sampler.py) before implementing your own sampling method. ### `Policy` A `Policy` object: * Inherits from `BaseLowdimPolicy` or `BaseImagePolicy`. * Has a method `predict_action` that given observation dict, predicts actions conforming to [the interface](#the-interface). * Has a method `set_normalizer` that takes in a `LinearNormalizer` and handles observation/action normalization internally in the policy. * (optional) Might has a method `compute_loss` that takes in a batch and returns the loss to be optimized. * (optional) Usually each `Policy` class correspond to a `Workspace` class due to the differences of training and evaluation process between methods. ### `EnvRunner` A `EnvRunner` object abstracts away the subtle differences between different task environments. * Has a method `run` that takes a `Policy` object for evaluation, and returns a dict of logs and metrics. Each value should be compatible with `wandb.log`. To maximize evaluation speed, we usually vectorize environments using our modification of [`gym.vector.AsyncVectorEnv`](./diffusion_policy/gym_util/async_vector_env.py) which runs each individual environment in a separate process (workaround python GIL). ⚠️ Since subprocesses are launched using `fork` on linux, you need to be specially careful for environments that creates its OpenGL context during initialization (e.g. robosuite) which, once inherited by the child process memory space, often causes obscure bugs like segmentation fault. As a workaround, you can provide a `dummy_env_fn` that constructs an environment without initializing OpenGL. ### `ReplayBuffer` The [`ReplayBuffer`](./diffusion_policy/common/replay_buffer.py) is a key data structure for storing a demonstration dataset both in-memory and on-disk with chunking and compression. It makes heavy use of the [`zarr`](https://zarr.readthedocs.io/en/stable/index.html) format but also has a `numpy` backend for lower access overhead. On disk, it can be stored as a nested directory (e.g. `data/pusht_cchi_v7_replay.zarr`) or a zip file (e.g. `data/robomimic/datasets/square/mh/image_abs.hdf5.zarr.zip`). Due to the relative small size of our datasets, it's often possible to store the entire image-based dataset in RAM with [`Jpeg2000` compression](./diffusion_policy/codecs/imagecodecs_numcodecs.py) which eliminates disk IO during training at the expense increasing of CPU workload. Example: ``` data/pusht_cchi_v7_replay.zarr ├── data │ ├── action (25650, 2) float32 │ ├── img (25650, 96, 96, 3) float32 │ ├── keypoint (25650, 9, 2) float32 │ ├── n_contacts (25650, 1) float32 │ └── state (25650, 5) float32 └── meta └── episode_ends (206,) int64 ``` Each array in `data` stores one data field from all episodes concatenated along the first dimension (time). 
The `meta/episode_ends` array stores the end index for each episode along the fist dimension. ### `SharedMemoryRingBuffer` The [`SharedMemoryRingBuffer`](./diffusion_policy/shared_memory/shared_memory_ring_buffer.py) is a lock-free FILO data structure used extensively in our [real robot implementation](./diffusion_policy/real_world) to utilize multiple CPU cores while avoiding pickle serialization and locking overhead for `multiprocessing.Queue`. As an example, we would like to get the most recent `To` frames from 5 RealSense cameras. We launch 1 realsense SDK/pipeline per process using [`SingleRealsense`](./diffusion_policy/real_world/single_realsense.py), each continuously writes the captured images into a `SharedMemoryRingBuffer` shared with the main process. We can very quickly get the last `To` frames in the main process due to the FILO nature of `SharedMemoryRingBuffer`. We also implemented [`SharedMemoryQueue`](./diffusion_policy/shared_memory/shared_memory_queue.py) for FIFO, which is used in [`RTDEInterpolationController`](./diffusion_policy/real_world/rtde_interpolation_controller.py). ### `RealEnv` In contrast to [OpenAI Gym](https://gymnasium.farama.org/), our polices interact with the environment asynchronously. In [`RealEnv`](./diffusion_policy/real_world/real_env.py), the `step` method in `gym` is split into two methods: `get_obs` and `exec_actions`. The `get_obs` method returns the latest observation from `SharedMemoryRingBuffer` as well as their corresponding timestamps. This method can be call at any time during an evaluation episode. The `exec_actions` method accepts a sequence of actions and timestamps for the expected time of execution for each step. Once called, the actions are simply enqueued to the `RTDEInterpolationController`, and the method returns without blocking for execution. ## 🩹 Adding a Task Read and imitate: * `diffusion_policy/dataset/pusht_image_dataset.py` * `diffusion_policy/env_runner/pusht_image_runner.py` * `diffusion_policy/config/task/pusht_image.yaml` Make sure that `shape_meta` correspond to input and output shapes for your task. Make sure `env_runner._target_` and `dataset._target_` point to the new classes you have added. When training, add `task=<your_task_name>` to `train.py`'s arguments. ## 🩹 Adding a Method Read and imitate: * `diffusion_policy/workspace/train_diffusion_unet_image_workspace.py` * `diffusion_policy/policy/diffusion_unet_image_policy.py` * `diffusion_policy/config/train_diffusion_unet_image_workspace.yaml` Make sure your workspace yaml's `_target_` points to the new workspace class you created. ## 🏷️ License This repository is released under the MIT license. See [LICENSE](LICENSE) for additional details. ## 🙏 Acknowledgement * Our [`ConditionalUnet1D`](./diffusion_policy/model/diffusion/conditional_unet1d.py) implementation is adapted from [Planning with Diffusion](https://github.com/jannerm/diffuser). * Our [`TransformerForDiffusion`](./diffusion_policy/model/diffusion/transformer_for_diffusion.py) implementation is adapted from [MinGPT](https://github.com/karpathy/minGPT). * The [BET](./diffusion_policy/model/bet) baseline is adapted from [its original repo](https://github.com/notmahi/bet). * The [IBC](./diffusion_policy/policy/ibc_dfo_lowdim_policy.py) baseline is adapted from [Kevin Zakka's reimplementation](https://github.com/kevinzakka/ibc). 
* The [Robomimic](https://github.com/ARISE-Initiative/robomimic) tasks and [`ObservationEncoder`](https://github.com/ARISE-Initiative/robomimic/blob/master/robomimic/models/obs_nets.py) are used extensively in this project. * The [Push-T](./diffusion_policy/env/pusht) task is adapted from [IBC](https://github.com/google-research/ibc). * The [Block Pushing](./diffusion_policy/env/block_pushing) task is adapted from [BET](https://github.com/notmahi/bet) and [IBC](https://github.com/google-research/ibc). * The [Kitchen](./diffusion_policy/env/kitchen) task is adapted from [BET](https://github.com/notmahi/bet) and [Relay Policy Learning](https://github.com/google-research/relay-policy-learning). * Our [shared_memory](./diffusion_policy/shared_memory) data structures are heavily inspired by [shared-ndarray2](https://gitlab.com/osu-nrsg/shared-ndarray2).
06-29
void CappDlg::OnBnClickedAdmittanceControl() { // TODO: 在此添加控件通知处理程序代码 if (!m_bStiffnessControlActive) { if (!PathFileExists(_T("initial_stiffness_params.txt"))) { AfxMessageBox(_T("请先运行Python程序生成初始刚度参数")); return; } // 通知Python std::ofstream doneFile("control_done_signal.txt"); doneFile << "1"; doneFile.close(); StartStiffnessControl(); } else { StopStiffnessControl(); } } void CappDlg::OnBnClickedRotateyaxisIncremental() { //TODO: 在此添加控件通知处理程序代码 short ncid; ncid = TestOpenComm(0, 0); if (ncid < 0) { AfxMessageBox(_T("无法打开设备通信!")); return; } double deltaPose[12] = { 0,0,0,tx,ty,tz,0,0,0,0,0,0 }; // 初始化全0 // 2. 以最小角速度(0.1°/s)执行增量运动 short rv = BscImov( ncid, "VR", // 角速度模式(单位°/s) 0.1, // 0.1°/s(文档允许的最小值) "BASE", // 基于工具坐标系旋转 sToolno, // 工具号 deltaPose // 增量位姿 ); // 3. 错误处理 if (rv != 0) { char errMsg[256] = { 0 }; BscGetError(ncid); } rv = TestCloseComm(ncid); } void CappDlg::StartFileWatcher() { m_fileWatcherThread = std::thread([this]() { while (m_bFileWatcherRunning) { // 1. 检查初始参数 if (!m_bInitialParamsLoaded && PathFileExists(_T("initial_stiffness_params.txt"))) { std::lock_guard<std::mutex> lock(m_fileMutex); CStdioFile file; if (file.Open(_T("initial_stiffness_params.txt"), CFile::modeRead)) { CString strParams; file.ReadString(strParams); CT2CA convertedString(strParams); if (sscanf_s(convertedString, "%lf,%lf,%lf", &m_dStiffness[0], &m_dStiffness[1], &m_dStiffness[2]) == 3) { m_bInitialParamsLoaded = true; m_bParamsUpdated = true; CString strMsg; strMsg.Format(_T("初始刚度参数加载成功: %.1f, %.1f, %.1f"), m_dStiffness[0], m_dStiffness[1], m_dStiffness[2]); OutputDebugString(strMsg); // 通知Python std::ofstream doneFile("control_done_signal.txt"); doneFile << "1"; doneFile.close(); } file.Close(); //DeleteFile(_T("initial_stiffness_params.txt")); } } // 2. 检查新参数 if (PathFileExists(_T("new_stiffness_params.txt"))) { std::lock_guard<std::mutex> lock(m_fileMutex); CStdioFile file; if (file.Open(_T("new_stiffness_params.txt"), CFile::modeRead)) { CString strParams; file.ReadString(strParams); CT2CA convertedString(strParams); if (sscanf_s(convertedString, "%lf,%lf,%lf", &m_dStiffness[0], &m_dStiffness[1], &m_dStiffness[2]) == 3) { m_bParamsUpdated = true; CString strMsg; strMsg.Format(_T("新刚度参数接收: %.1f, %.1f, %.1f"), m_dStiffness[0], m_dStiffness[1], m_dStiffness[2]); OutputDebugString(strMsg); // 通知Python std::ofstream doneFile("control_done_signal.txt"); doneFile << "1"; doneFile.close(); } file.Close(); DeleteFile(_T("new_stiffness_params.txt")); } } // 3. 
检查重置信号 if (PathFileExists(_T("reset_signal.txt"))) { std::lock_guard<std::mutex> lock(m_fileMutex); // 输出重置信息 OutputDebugString(_T("[INFO] 收到重置信号,开始重置机器人")); DeleteFile(_T("reset_signal.txt")); /*if (m_bStiffnessControlActive) { m_bStiffnessControlActive = false; if (m_stiffnessControlThread.joinable()) { m_stiffnessControlThread.join(); } OutputDebugString(_T("[INFO] 控制线程已停止")); }*/ ResetRobot(); SaveRobotState(); // 通知Python重置完成 std::ofstream doneFile("control_done_signal.txt"); doneFile << "1"; doneFile.close(); //// 更新UI显示 //GetDlgItem(IDC_ADMITTANCE_CONTROL)->SetWindowText(_T("启动变刚度控制")); //OutputDebugString(_T("[INFO] 机器人复位流程完成")); } Sleep(100); // 100ms检查一次 } }); } void CappDlg::SaveRobotStateAsync(const double* torque, const double* pose) { // 复制数据到局部变量 double localTorque[3], localPose[3]; memcpy(localTorque, torque, sizeof(localTorque)); memcpy(localPose, pose, sizeof(localPose)); // 异步保存 std::thread([=]() { std::lock_guard<std::mutex> lock(m_fileMutex); // 先写入临时文件 CString tempFile = _T("robot_state.tmp"); std::ofstream outfile(tempFile); if (outfile.is_open()) { outfile << localTorque[0] << "," << localTorque[1] << "," << localTorque[2] << "," << localPose[0] << "," << localPose[1] << "," << localPose[2]; outfile.close(); // 原子性替换文件 if (!::MoveFileEx(tempFile, _T("robot_state.csv"), MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH)) { OutputDebugString(_T("[ERROR] 无法更新状态文件")); } // 通知Python状态已就绪 std::ofstream readyFile("state_ready_signal.txt"); readyFile << "1"; readyFile.close(); } }).detach(); } void CappDlg::SaveRobotState() { // 使用互斥锁保护共享数据 std::lock_guard<std::mutex> lock(m_mutex); // 如果通信端口忙,则跳过本次保存 if (m_bCommInUse) { OutputDebugString(_T("[INFO] 延迟保存状态:通信端口正忙")); return; } // 获取当前状态 double torque[3], pose[3]; GetTorqueData(torque); memcpy(pose, current_orientation, sizeof(pose)); // 异步保存 SaveRobotStateAsync(torque, pose); } void CappDlg::StartStiffnessControl() { if (!m_bStiffnessControlActive) { m_bStiffnessControlActive = true; m_stiffnessControlThread = std::thread(&CappDlg::StiffnessControlThread, this); // 更新按钮文本 GetDlgItem(IDC_ADMITTANCE_CONTROL)->SetWindowText(_T("停止变刚度控制")); } } void CappDlg::StopStiffnessControl() { if (m_bStiffnessControlActive) { m_bStiffnessControlActive = false; if (m_stiffnessControlThread.joinable()) { m_stiffnessControlThread.join(); } // 更新按钮文本 GetDlgItem(IDC_ADMITTANCE_CONTROL)->SetWindowText(_T("启动变刚度控制")); } } void CappDlg::StiffnessControlThread() { const double dt = 0.005; // 控制周期5ms double de[3] = { 0 }; double dde[3] = { 0 }; double last_orientation[3] = { 0 }; // 线程启动时打开连接 short nCid = SafeOpenComm(); if (nCid < 0) { OutputDebugString(_T("[ERROR] 控制线程启动失败:无法连接机器人")); return; } while (m_bStiffnessControlActive) { // 增加连接状态检查 if (m_nRobotCid < 0) { SafeCloseComm(); m_nRobotCid = SafeOpenComm(); if (m_nRobotCid < 0) { Sleep(1000); continue; } } // 检查连接有效性 if (nCid < 0) { nCid = SafeOpenComm(); if (nCid < 0) { Sleep(static_cast<DWORD>(dt * 1000)); continue; } } // 带重试的姿态获取 double dPos[12] = { 0 }; WORD rconf, toolno; bool bSuccess = false; for (int i = 0; i < 3 && !bSuccess; i++) { if (BscIsRobotPos(nCid, "BASE", 0, &rconf, &toolno, dPos) == 0) { bSuccess = true; // 更新当前姿态 current_orientation[0] = dPos[3]; current_orientation[1] = dPos[4]; current_orientation[2] = dPos[5]; // 获取力矩数据 GetTorqueData(dPos_force); // 计算调整量 bool orientation_changed = ( fabs(current_orientation[0] - last_orientation[0]) > 0.1 || fabs(current_orientation[1] - last_orientation[1]) > 0.1 || fabs(current_orientation[2] - last_orientation[2]) > 0.1); if 
(m_bParamsUpdated || orientation_changed) { double adjust[3]; for (int i = 0; i < 3; i++) { adjust[i] = OrientationControl(i); adjust[i] = clamp(adjust[i], -0.5, 0.5); dPos[3 + i] += adjust[i]; } // 发送运动指令 BscMovj(nCid, 0.3, "BASE", 0, 0, dPos); m_bParamsUpdated = false; memcpy(last_orientation, current_orientation, sizeof(last_orientation)); // 异步保存状态 SaveRobotStateAsync(dPos_force + 3, current_orientation); } } else if (i == 2) { // 最后一次尝试失败 OutputDebugString(_T("[ERROR] 连续获取姿态失败")); SafeCloseComm(); nCid = -1; } } Sleep(static_cast<DWORD>(dt * 1000)); } // 线程结束时关闭连接 SafeCloseComm(); } double CappDlg::OrientationControl(int axis) { // 1. 获取当前力矩 double tau = dPos_force[3 + axis]; // tx,ty,tz // 2. 计算姿态误差 double e = current_orientation[axis] - m_dTargetOrientation[axis]; // 3. 二阶系统计算 de[axis] += dt * dde[axis]; e += dt * de[axis]; dde[axis] = (tau - m_dDamping[axis] * de[axis] - m_dStiffness[axis] * e) / m_dInertia[axis]; return e; } void CappDlg::GetTorqueData(double* torque) { // 从传感器获取力矩数据 Sensor sensor; std::vector<float> data = sensor.getMeasurement(); for (int i = 0; i < 3; i++) { torque[i] = data[i+3] / 1000000.0; //数据需要转换 } } void CappDlg::OnDestroy() { // 停止所有线程 m_bRunning = false; m_bStiffnessControlActive = false; m_bFileWatcherRunning = false; // 等待线程结束 if (m_stiffnessControlThread.joinable()) { m_stiffnessControlThread.join(); } if (m_fileWatcherThread.joinable()) { m_fileWatcherThread.join(); } // 清理信号文件 DeleteFile(_T("control_done_signal.txt")); DeleteFile(_T("state_ready_signal.txt")); DeleteFile(_T("reset_signal.txt")); CDialogEx::OnDestroy(); } void CappDlg::OnBnClickedBtnReset() { // TODO: 在此添加控件通知处理程序代码 double dPos[12] = { 896, 13, -259, 175, 5, 0 }; // 重置位置 short nCid = TestOpenComm(0, 0); if (nCid >= 0) { BscMovj(nCid, 0.1, "BASE", 0, 0, dPos); TestCloseComm(nCid); } } //void CappDlg::ResetRobot() //{ // double dPos[12] = { 896, 13, -259, 175, 5, 0 }; // 重置位置 // short nCid = TestOpenComm(0, 0); // if (nCid >= 0) { // BscMovj(nCid, 0.1, "BASE", 0, 0, dPos); // TestCloseComm(nCid); // } //} void CappDlg::ResetRobot() { // 设置复位状态 m_bIsResetting = true; m_bResetCompleted = false; OutputDebugString(_T("[INFO] 开始机器人复位流程")); // 先停止所有使用通信端口的线程 StopStiffnessControl(); // 确保通信端口关闭 SafeCloseComm(); // 定义安全的初始位置(根据实际需求调整) double dPos[12] = { 896, 13, -259, 180, 0, 0 }; // 位置和姿态值 // 使用C++11的随机数生成器,比rand()更安全 std::random_device rd; std::mt19937 gen(rd()); std::uniform_real_distribution<> dis_rx(175.0, 179.0); std::uniform_real_distribution<> dis_ry(1.0, 5.0); dPos[3] = dis_rx(gen); // rx dPos[4] = dis_ry(gen); // ry dPos[5] = 0; // rz (固定) // 记录随机生成的位置 CString strRandomPos; strRandomPos.Format(_T("[INFO] 随机生成的复位位置: [%.2f, %.2f, %.2f, %.2f, %.2f, %.2f]"), dPos[0], dPos[1], dPos[2], dPos[3], dPos[4], dPos[5]); OutputDebugString(strRandomPos); // 打开通信连接 short nCid = TestOpenComm(0, 0); if (nCid >= 0) { // 以较低速度移动到初始位置,确保安全 short result = BscMovj(nCid, 0.1, "BASE", 0, 0, dPos); if (result == 0) { OutputDebugString(_T("[INFO] 复位命令已发送,等待机器人移动到初始位置")); // 等待机器人完成移动(根据实际机器人速度调整等待时间) Sleep(15000); // 等待15秒 // 验证复位后的位置 double currentPos[12] = { 0 }; WORD rconf, toolno; if (BscIsRobotPos(nCid, "BASE", 0, &rconf, &toolno, currentPos) == 0) { CString strMsg; strMsg.Format(_T("[INFO] 机器人已复位到位置: [%.2f, %.2f, %.2f, %.2f, %.2f, %.2f]"), currentPos[0], currentPos[1], currentPos[2], currentPos[3], currentPos[4], currentPos[5]); OutputDebugString(strMsg); // 检查复位是否成功(角度误差是否在容差范围内) double reset_errors[3] = { CalculateAngleError(currentPos[3], m_dTargetOrientation[0]), CalculateAngleError(currentPos[4], 
m_dTargetOrientation[1]), CalculateAngleError(currentPos[5], m_dTargetOrientation[2]) }; bool reset_successful = true; for (int i = 0; i < 3; i++) { if (reset_errors[i] > 5.0) { // 复位后允许的最大误差(度) reset_successful = false; break; } } if (reset_successful) { OutputDebugString(_T("[INFO] 机器人复位成功")); m_bResetCompleted = true; } else { OutputDebugString(_T("[WARN] 机器人复位后角度误差仍较大")); m_bResetCompleted = false; } } else { OutputDebugString(_T("[WARN] 复位后无法获取当前位置")); m_bResetCompleted = false; } } else { OutputDebugString(_T("[ERROR] 复位命令发送失败")); m_bResetCompleted = false; } // 关闭通信连接 SafeCloseComm(); // 复位完成后立即关闭 } else { OutputDebugString(_T("[ERROR] 无法打开通信连接进行复位")); m_bResetCompleted = false; } // 无论复位是否成功,都完成复位流程 m_bIsResetting = false; if (m_bStiffnessControlActive) { StartStiffnessControl(); } } // 安全打开连接(确保全局唯一) //short CappDlg::SafeOpenComm() //{ // // 如果已有有效连接,直接返回 // if (m_nRobotCid >= 0) { // return m_nRobotCid; // } // // // 设置连接使用标志 // if (m_bCommInUse.exchange(true)) { // OutputDebugString(_T("[WARN] 通信端口正忙,等待...")); // return -1; // } // // // 带重试的连接逻辑 // short nCid = -1; // for (int i = 0; i < MAX_RETRY; i++) { // nCid = TestOpenComm(0, 0); // if (nCid >= 0) { // m_nRobotCid = nCid; // OutputDebugString(_T("[INFO] 机器人连接成功")); // break; // } // OutputDebugString(_T("[WARN] 连接尝试失败,重试...")); // Sleep(RETRY_DELAY_MS); // } // // m_bCommInUse = false; // // if (nCid < 0) { // OutputDebugString(_T("[ERROR] 连接失败:达到最大重试次数")); // } // return nCid; //} short CappDlg::SafeOpenComm() { std::lock_guard<std::mutex> lock(m_commMutex); // 硬件初始化冷却期(复位后至少等待10秒) static std::chrono::time_point<std::chrono::steady_clock> lastResetTime; auto now = std::chrono::steady_clock::now(); if (now - lastResetTime < std::chrono::seconds(10)) { OutputDebugString(_T("[INFO] 处于复位冷却期,暂不连接")); return -1; } // 指数退避重试(最大等待5秒) int retryDelay = 500; // 初始500ms for (int i = 0; i < 15; i++) { // 增加重试次数 m_nRobotCid = TestOpenComm(0, 0); if (m_nRobotCid >= 0) { OutputDebugString(_T("[INFO] 连接成功")); return m_nRobotCid; } Sleep(retryDelay); retryDelay = min(retryDelay * 2, 5000); // 上限5秒 } return -1; } void CappDlg::SafeCloseComm() { std::lock_guard<std::mutex> lock(m_commMutex); if (m_nRobotCid >= 0) { TestCloseComm(m_nRobotCid); m_nRobotCid = -1; OutputDebugString(_T("[INFO] 机器人连接已关闭")); } } double CappDlg::CalculateAngleError(double current, double target) { double error = current - target; // 将误差标准化到[-180°, 180°] while (error > 180.0) { error -= 360.0; } while (error < -180.0) { error += 360.0; } return fabs(error); }以上是我的MFC程序 import numpy as np import pandas as pd import gymnasium as gym from gymnasium import spaces import torch import torch.nn as nn import torch.optim as optim import torch.nn.functional as F from torch.distributions import Normal from watchdog.observers import Observer from watchdog.events import FileSystemEventHandler import time import skfuzzy as fuzz from skfuzzy import control as ctrl import os import logging import matplotlib.pyplot as plt from collections import deque import logging from logging.handlers import RotatingFileHandler import datetime import os # 创建日志目录 log_dir = "training_logs" os.makedirs(log_dir, exist_ok=True) # 带时间戳的日志文件名 timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") log_filename = os.path.join(log_dir, f'stiffness_control_{timestamp}.log') # 基本配置 - 控制台输出 logging.basicConfig( level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', handlers=[logging.StreamHandler()] ) # 添加文件日志 - 带轮转功能 file_handler = RotatingFileHandler( log_filename, 
maxBytes=10*1024*1024, # 10MB backupCount=5, encoding='utf-8' ) file_handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')) logging.getLogger().addHandler(file_handler) # 设置更详细的文件日志级别 file_handler.setLevel(logging.DEBUG) # 日志记录 logging.info("日志系统初始化完成,日志将保存到: %s", log_filename) # 配置日志 #logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(message)s') # 文件监控类 class StiffnessWatcher(FileSystemEventHandler): def __init__(self, env): super().__init__() self.env = env self.updated = False def on_modified(self, event): if event.is_directory: return if event.src_path.endswith("stiffness_state.csv"): self.updated = True logging.debug("检测到状态文件更新") # 机器人刚度控制环境 class StiffnessControlEnv(gym.Env): def __init__(self): super(StiffnessControlEnv, self).__init__() self.target_dir = r"F:\lry\weizi\app" os.makedirs(self.target_dir, exist_ok=True) # 信号文件路径 self.control_done_signal = os.path.join(self.target_dir, "control_done_signal.txt") self.state_ready_signal = os.path.join(self.target_dir, "state_ready_signal.txt") self.reset_signal = os.path.join(self.target_dir, "reset_signal.txt") self.initial_params_file = os.path.join(self.target_dir, "initial_stiffness_params.txt") self.new_params_file = os.path.join(self.target_dir, "new_stiffness_params.txt") self.state_file = os.path.join(self.target_dir, "robot_state.csv") # 清除旧信号文件 self.cleanup_previous_files() self.state_log_file = os.path.join(self.target_dir, "arm_state_log.csv") self._init_state_log() # 初始化状态日志 # 目标姿态: rx=180°, ry=0°, rz=0° self.target_orientation = np.array([180.0, 0.0, 0.0], dtype=np.float32) self.max_angle_error = 20.0 # 最大允许角度误差(度) # 动作空间: 三个轴的刚度参数[K_rx, K_ry, K_rz] self.action_space = spaces.Box( low=np.array([50, 50, 50], dtype=np.float32), high=np.array([500, 500, 500], dtype=np.float32), dtype=np.float32 ) # 状态空间: 力矩(tx, ty, tz) + 姿态(rx, ry, rz) self.observation_space = spaces.Box( low=np.array([-2.0, -2.0, -2.0, -180.0, -30.0, -10.0], dtype=np.float32), high=np.array([2.0, 2.0, 2.0, 180.0, 30.0, 10.0], dtype=np.float32), dtype=np.float32 ) self.state = np.zeros(6, dtype=np.float32) self.current_step = 0 self.max_steps = 200 self.last_error = np.inf self.last_torque = np.zeros(3, dtype=np.float32) # 创建模糊奖励系统 self.fuzzy_reward_system = self.create_fuzzy_system() # 文件监控 self.watcher = StiffnessWatcher(self) self.observer = Observer() self.observer.schedule(self.watcher, path=self.target_dir, recursive=True) self.observer.start() logging.info("环境初始化完成,开始监控文件夹") def _init_state_log(self): """初始化机械臂状态日志文件""" with open(self.state_log_file, 'w') as f: f.write("timestamp,step,tx,ty,tz,rx,ry,rz,target_rx,target_ry,target_rz\n") def _log_arm_state(self, step, state, action=None): """ 记录机械臂状态到日志文件和终端 :param step: 当前步数 :param state: 状态数组 [tx,ty,tz,rx,ry,rz] :param action: 当前动作 [K_rx, K_ry, K_rz] (可选) """ timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f") tx, ty, tz, rx, ry, rz = state rx_display = self.to_180_degrees(rx) # 写入CSV日志 with open(self.state_log_file, 'a') as f: f.write(f"{timestamp},{step},{tx:.4f},{ty:.4f},{tz:.4f}," f"{rx:.2f},{ry:.2f},{rz:.2f}\n") # f"{self.target_orientation[0]}," # f"{self.target_orientation[1]}," # f"{self.target_orientation[2]}\n") # 实时控制台输出 status_msg = (f"机械臂状态 [Step {step}]:\n" f" - 力矩(tx,ty,tz): {tx:.3f}, {ty:.3f}, {tz:.3f} N·m\n" f" - 姿态(rx,ry,rz): {rx_display:.2f}, {ry:.2f}, {rz:.2f} °\n") # f" - 目标姿态: {self.target_orientation[0]:.2f}, " # f"{self.target_orientation[1]:.2f}, {self.target_orientation[2]:.2f} °") if action is not None: status_msg 
+= f"\n - 当前刚度参数: {action[0]:.1f}, {action[1]:.1f}, {action[2]:.1f} N·m/rad" logging.info(status_msg) print(f"\n{'-'*40}\n{status_msg}\n{'-'*40}") # 控制台分隔线显示 def to_360_degrees(self, angle): """将-180°~180°的角度转换为0°~360°""" angle = float(angle) if angle < -180 or angle > 180: logging.warning(f"输入角度超出范围: {angle}°,将被标准化到-180°~180°") angle = (angle + 180) % 360 - 180 return (angle + 360) % 360 def to_180_degrees(self, angle): """将0°~360°的角度转换回-180°~180°(用于输出)""" angle = float(angle) if angle < 0 or angle > 360: logging.warning(f"输入角度超出范围: {angle}°,将被标准化到0°~360°") angle = angle % 360 return (angle + 180) % 360 - 180 def calculate_angle_error(self, current, target): """计算两个角度之间的最短误差,考虑周期性(-180°和180°是同一点)""" error = current - target # 将误差标准化到[-180°, 180°] error = (error + 180) % 360 - 180 return abs(error) def cleanup_previous_files(self): """清除之前的信号文件""" for file in [self.control_done_signal, self.state_ready_signal, self.reset_signal, self.initial_params_file, self.new_params_file]: if os.path.exists(file): try: os.remove(file) except Exception as e: logging.warning(f"无法删除旧文件 {file}: {e}") def create_fuzzy_system(self): # 定义模糊变量 torque_range = np.arange(0, 2.1, 0.1) orientation_error_range = np.arange(0, 30.1, 0.1) reward_range = np.arange(0, 2.1, 0.1) # 输入变量 torque = ctrl.Antecedent(torque_range, 'torque') error = ctrl.Antecedent(orientation_error_range, 'error') # 输出变量 reward = ctrl.Consequent(reward_range, 'reward') # 定义隶属函数 torque['VS'] = fuzz.trimf(torque_range, [0, 0, 0.5]) torque['S'] = fuzz.trimf(torque_range, [0, 0.5, 1.0]) torque['M'] = fuzz.trimf(torque_range, [0.5, 1.0, 1.5]) torque['L'] = fuzz.trimf(torque_range, [1.0, 1.5, 2.0]) torque['VL'] = fuzz.trapmf(torque_range, [1.5, 2.0, 2.0, 2.0]) error['VS'] = fuzz.trimf(orientation_error_range, [0, 0, 3]) error['S'] = fuzz.trimf(orientation_error_range, [0, 3, 6]) error['M'] = fuzz.trimf(orientation_error_range, [3, 6, 10]) error['L'] = fuzz.trimf(orientation_error_range, [6, 10, 20]) error['VL'] = fuzz.trapmf(orientation_error_range, [10, 20, 30, 30]) reward['VL'] = fuzz.trimf(reward_range, [0, 0, 0.5]) reward['L'] = fuzz.trimf(reward_range, [0, 0.5, 1.0]) reward['M'] = fuzz.trimf(reward_range, [0.5, 1.0, 1.5]) reward['H'] = fuzz.trimf(reward_range, [1.0, 1.5, 2.0]) reward['VH'] = fuzz.trapmf(reward_range, [1.5, 2.0, 2.0, 2.0]) # 定义模糊规则 rules = [ ctrl.Rule(torque['VS'] & error['VS'], reward['VH']), ctrl.Rule(torque['S'] & error['VS'], reward['H']), ctrl.Rule(torque['M'] & error['VS'], reward['M']), ctrl.Rule(torque['L'] & error['VS'], reward['L']), ctrl.Rule(torque['VL'] & error['VS'], reward['VL']), ctrl.Rule(torque['VS'] & error['S'], reward['H']), ctrl.Rule(torque['S'] & error['S'], reward['H']), ctrl.Rule(torque['M'] & error['S'], reward['M']), ctrl.Rule(torque['L'] & error['S'], reward['L']), ctrl.Rule(torque['VL'] & error['S'], reward['VL']), ctrl.Rule(torque['VS'] & error['M'], reward['M']), ctrl.Rule(torque['S'] & error['M'], reward['M']), ctrl.Rule(torque['M'] & error['M'], reward['L']), ctrl.Rule(torque['L'] & error['M'], reward['L']), ctrl.Rule(torque['VL'] & error['M'], reward['VL']), ctrl.Rule(torque['VS'] & error['L'], reward['L']), ctrl.Rule(torque['S'] & error['L'], reward['L']), ctrl.Rule(torque['M'] & error['L'], reward['VL']), ctrl.Rule(torque['L'] & error['L'], reward['VL']), ctrl.Rule(torque['VL'] & error['L'], reward['VL']), ctrl.Rule(torque['VS'] & error['VL'], reward['VL']), ctrl.Rule(torque['S'] & error['VL'], reward['VL']), ctrl.Rule(torque['M'] & error['VL'], reward['VL']), ctrl.Rule(torque['L'] & 
error['VL'], reward['VL']), ctrl.Rule(torque['VL'] & error['VL'], reward['VL']), ] control_system = ctrl.ControlSystem(rules) return ctrl.ControlSystemSimulation(control_system) def reset(self): """重置环境并与MFC同步""" initial_params_path = os.path.join(self.target_dir, "initial_stiffness_params.txt") if not os.path.exists(initial_params_path): default_params = "300.0, 300.0, 300.0" # 默认刚度值 with open(initial_params_path, 'w') as f: f.write(default_params) logging.info(f"已自动生成初始参数文件: {default_params}") # 1. 发送重置信号 self.safe_write_file(self.reset_signal, "1") logging.info("已发送重置信号,等待MFC响应...") # 2. 等待MFC确认重置完成 if not self.wait_for_signal(self.control_done_signal, timeout=150.0): logging.error("MFC重置响应超时") raise TimeoutError("MFC reset timeout") # 3. 检查初始参数文件 if not os.path.exists(self.initial_params_file): logging.error("初始刚度参数文件不存在,请先生成初始参数") raise FileNotFoundError("initial_stiffness_params.txt not found") # 4. 读取初始参数(示例值,实际应从文件读取) with open(self.initial_params_file, 'r') as f: params = f.read().strip() logging.info(f"已加载初始刚度参数: {params}") #os.remove(self.initial_params_file) # 5. 读取初始状态 state = self.read_state(self.state_file) logging.info(f"初始状态: {state}") self._log_arm_state(0, state) # 初始状态记为step 0 return state, {} def safe_write_file(self, path, content): """安全写入文件,带有重试机制""" max_retries = 3 for attempt in range(max_retries): try: temp_path = path + ".tmp" with open(temp_path, "w") as f: f.write(content) if os.name == 'nt': if os.path.exists(path): os.remove(path) os.rename(temp_path, path) else: os.replace(temp_path, path) return True except Exception as e: logging.warning(f"写入文件失败 (尝试 {attempt+1}/{max_retries}): {e}") time.sleep(0.1) logging.error(f"写入文件失败: {path}") return False def read_state(self, state_path): """安全读取状态文件""" max_retries = 3 for attempt in range(max_retries): try: if not os.path.exists(state_path): raise FileNotFoundError(f"状态文件不存在: {state_path}") with open(state_path, 'r') as f: content = f.read().strip() if not content: raise ValueError("状态文件为空") data = [float(x) for x in content.split(',')] data[3] = self.to_360_degrees(data[3]) # 将-180°~180°转换为0°~360° if len(data) != 6: raise ValueError(f"无效数据长度: {len(data)}") return np.array(data, dtype=np.float32) except Exception as e: if attempt == max_retries - 1: logging.error(f"读取状态文件失败: {e}") return np.zeros(6, dtype=np.float32) time.sleep(0.5) def step(self, action): """执行一步动作""" # 1. 发送刚度参数 params_str = f"{action[0]:.2f},{action[1]:.2f},{action[2]:.2f}" self.safe_write_file(self.new_params_file, params_str) logging.info(f"已发送刚度参数: {params_str}") # 2. 等待MFC确认参数接收 if not self.wait_for_signal(self.control_done_signal, timeout=55.0): logging.warning("MFC参数接收确认超时") return self.state, -10, True, False, {} # 3. 等待状态更新 if not self.wait_for_signal(self.state_ready_signal, timeout=15.0): logging.warning("状态更新超时") return self.state, -10, True, False, {} # 4. 读取新状态 new_state = self.read_state(self.state_file) # 5. 计算角度误差(考虑rx的特殊性) rx, ry, rz = new_state[3:6] rx_error = self.calculate_angle_error(rx, self.target_orientation[0]) ry_error = self.calculate_angle_error(ry, self.target_orientation[1]) rz_error = self.calculate_angle_error(rz, self.target_orientation[2]) angle_errors = np.array([rx_error, ry_error, rz_error]) # 6. 检查角度误差是否超限 error_exceeded = np.any(angle_errors > self.max_angle_error) # 7. 计算奖励 reward = self.calculate_reward(new_state) # 8. 确定完成标志(结合原有逻辑和角度误差) done = error_exceeded or np.all(angle_errors < 0.5) or (self.current_step >= self.max_steps) # 9. 
如果误差超限,执行复位 if error_exceeded: logging.warning(f"角度误差超限: {angle_errors} > {self.max_angle_error},触发复位") self._reset_robot() reward -= 100 # 惩罚误差超限 # 10. 更新状态 self.state = new_state self.current_step += 1 self._log_arm_state(self.current_step, new_state, action) # 11. 返回结果 return new_state, reward, done, False, { "error": angle_errors, "torque": new_state[:3], "orientation": new_state[3:6] } def _reset_robot(self): """执行机器人复位流程""" # 发送重置信号 self.safe_write_file("reset_signal.txt", "1") # 等待MFC响应 if not self.wait_for_signal("control_done_signal.txt", timeout=150): logging.error("等待MFC重置超时") # 重置内部计数器 self.current_step = 0 def wait_for_signal(self, signal_path, timeout=10.0): """等待信号文件出现并删除它""" start_time = time.time() while time.time() - start_time < timeout: if os.path.exists(signal_path): try: os.remove(signal_path) return True except Exception as e: logging.warning(f"删除信号文件失败: {e}") time.sleep(0.1) time.sleep(0.2) # 避免CPU占用过高 return False def wait_for_update(self): """仅通过修改时间(mtime)检测文件更新,并打印文件内容到日志""" timeout = 10.0 # 超时时间10秒 start_time = time.time() state_path = os.path.join(self.target_dir, "stiffness_state.csv") # 记录初始修改时间,处理文件不存在的情况 initial_mtime = 0 if os.path.exists(state_path): initial_mtime = os.path.getmtime(state_path) logging.debug(f"初始文件修改时间: {initial_mtime}") while time.time() - start_time < timeout: if os.path.exists(state_path): current_mtime = os.path.getmtime(state_path) # 检查修改时间是否变化(添加微小阈值避免浮点数误差) if current_mtime > initial_mtime + 1e-6: # 读取文件内容并打印到日志 try: with open(state_path, 'r') as f: content = f.read().strip() logging.info(f"状态文件已更新,内容: {content}") # 关键新增:打印文件内容 except Exception as e: logging.warning(f"读取更新后的文件内容失败: {e}") logging.debug(f"文件修改时间更新: 旧={initial_mtime} → 新={current_mtime}") time.sleep(0.1) # 等待文件系统刷新 return True time.sleep(0.2) # 轮询间隔200ms logging.warning(f"等待状态更新超时(超时时间:{timeout}秒)") return False # def wait_for_update(self): # """仅通过修改时间(mtime)检测文件更新,提高检测灵敏度""" # timeout = 10.0 # 延长超时时间至10秒,给机械臂更多响应时间 # start_time = time.time() # state_path = os.path.join(self.target_dir, "stiffness_state.csv") # # 记录初始修改时间,处理文件不存在的情况 # initial_mtime = 0 # if os.path.exists(state_path): # initial_mtime = os.path.getmtime(state_path) # logging.debug(f"初始文件修改时间: {initial_mtime}") # while time.time() - start_time < timeout: # if os.path.exists(state_path): # current_mtime = os.path.getmtime(state_path) # # 关键修改:仅检查修改时间是否变化,且设置一个小的阈值避免浮点数误差 # if current_mtime > initial_mtime + 1e-6: # 添加微小阈值,避免精度误差导致的误判 # logging.debug(f"文件已更新,新修改时间: {current_mtime}") # time.sleep(0.1) # 等待文件系统完全刷新 # return True # time.sleep(0.2) # 轮询间隔调整为200ms,平衡性能和响应速度 # logging.warning(f"等待状态更新超时(超时时间:{timeout}秒)") # return False def calculate_reward(self, state): """计算奖励,综合考虑力矩、姿态误差和误差变化""" torque = state[0:3] orientation = state[3:6] # 计算当前误差(考虑rx的周期性) rx_error = self.calculate_angle_error(orientation[0], self.target_orientation[0]) error = np.array([ rx_error, abs(orientation[1] - self.target_orientation[1]), abs(orientation[2] - self.target_orientation[2]) ]) current_error = np.linalg.norm(error) # 模糊奖励 self.fuzzy_reward_system.input['torque'] = np.mean(np.abs(torque)) self.fuzzy_reward_system.input['error'] = current_error self.fuzzy_reward_system.compute() base_reward = self.fuzzy_reward_system.output['reward'] # 初始化改进奖励 improvement_reward = 0.0 # 只有在不是第一次计算时才计算改进奖励 if hasattr(self, 'last_error') and not np.isinf(self.last_error): error_diff = self.last_error - current_error if error_diff > 0: # 误差减小 improvement_reward = 2.0 * error_diff else: # 误差增大 improvement_reward = 0.5 * error_diff # 
        # (note: this value is already negative)
        prev_torque = self.last_torque  # keep the previous torque for the stability term below
        # update last_error and last_torque
        self.last_error = current_error
        self.last_torque = torque.copy()

        # the other reward components stay unchanged...
        goal_reward = 5.0 if np.all(error < 5.0) else 0.0
        error_penalty = -0.2 * current_error if current_error > 15 else 0
        step_penalty = -0.01
        # reward small torque changes relative to the previous step
        stability_reward = 0.2 if np.linalg.norm(torque - prev_torque) < 0.1 else 0

        # total reward
        total_reward = (base_reward + improvement_reward + goal_reward +
                        error_penalty + step_penalty + stability_reward)

        logging.info(f"奖励组成: 基础={base_reward:.4f}, 改进={improvement_reward:.4f}, "
                     f"目标={goal_reward:.4f}, 误差惩罚={error_penalty:.4f}, "
                     f"步数={step_penalty:.4f}, 稳定={stability_reward:.4f}")
        return total_reward

    def close(self):
        self.observer.stop()
        self.observer.join()
        logging.info("环境已关闭")


# policy network
class PolicyNetwork(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super(PolicyNetwork, self).__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # mean and standard-deviation heads
        self.mean_layer = nn.Linear(hidden_dim, action_dim)
        self.log_std_layer = nn.Linear(hidden_dim, action_dim)
        # initialize weights
        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            nn.init.kaiming_normal_(module.weight, a=0.01)
            if module.bias is not None:
                module.bias.data.zero_()

    def forward(self, x):
        x = F.leaky_relu(self.fc1(x), 0.01)
        x = F.leaky_relu(self.fc2(x), 0.01)
        # compute mean and standard deviation
        mean = self.mean_layer(x)
        log_std = self.log_std_layer(x)
        # clamp log-std so exploration is neither too large nor too small
        log_std = torch.clamp(log_std, -20, 2)
        std = log_std.exp()
        return mean, std

    def get_action(self, state):
        state = torch.FloatTensor(state).unsqueeze(0)
        mean, std = self.forward(state)
        # build a normal distribution and sample an action
        dist = Normal(mean, std)
        action = dist.sample()
        # keep the action inside the valid stiffness range
        action = torch.clamp(action, torch.FloatTensor([50, 50, 50]),
                             torch.FloatTensor([500, 500, 500]))
        # log-probability of the sampled action
        log_prob = dist.log_prob(action).sum(-1)
        return action.detach().numpy()[0], log_prob.detach().numpy()[0]

    def evaluate(self, state, action):
        mean, std = self.forward(state)
        dist = Normal(mean, std)
        # log-probability and entropy of the given actions
        action_logprobs = dist.log_prob(action).sum(-1)
        dist_entropy = dist.entropy().sum(-1)
        return action_logprobs, dist_entropy


# value network
class ValueNetwork(nn.Module):
    def __init__(self, state_dim, hidden_dim=128):
        super(ValueNetwork, self).__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.value_layer = nn.Linear(hidden_dim, 1)
        # initialize weights
        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            nn.init.kaiming_normal_(module.weight, a=0.01)
            if module.bias is not None:
                module.bias.data.zero_()

    def forward(self, x):
        x = F.leaky_relu(self.fc1(x), 0.01)
        x = F.leaky_relu(self.fc2(x), 0.01)
        value = self.value_layer(x)
        return value


# PPO agent
class PPO:
    def __init__(self, state_dim, action_dim, lr=3e-4, gamma=0.99, eps_clip=0.2,
                 value_coef=0.5, entropy_coef=0.01, hidden_dim=128):
        self.gamma = gamma
        self.eps_clip = eps_clip
        self.value_coef = value_coef
        self.entropy_coef = entropy_coef
        # policy and value networks
        self.policy = PolicyNetwork(state_dim, action_dim, hidden_dim)
        self.value = ValueNetwork(state_dim, hidden_dim)
        # optimizers
        self.policy_optimizer = optim.Adam(self.policy.parameters(), lr=lr)
        self.value_optimizer = optim.Adam(self.value.parameters(), lr=lr)
        # trajectory buffers
        self.states = []
        self.actions = []
        self.logprobs = []
        self.rewards = []
        self.is_terminals = []

    def select_action(self, state):
        action, log_prob = self.policy.get_action(state)
        return action, log_prob

    def store_transition(self, state, action, log_prob, reward, done):
        self.states.append(state)
        self.actions.append(action)
        self.logprobs.append(log_prob)
        self.rewards.append(reward)
        self.is_terminals.append(done)

    def update(self, epochs=10):
        # convert buffers to tensors
        states = torch.FloatTensor(self.states)
        actions = torch.FloatTensor(self.actions)
        old_logprobs = torch.FloatTensor(self.logprobs)

        # compute discounted returns
        returns = []
        discounted_return = 0
        for reward, is_terminal in zip(reversed(self.rewards), reversed(self.is_terminals)):
            if is_terminal:
                discounted_return = 0
            discounted_return = reward + (self.gamma * discounted_return)
            returns.insert(0, discounted_return)
        returns = torch.FloatTensor(returns)
        # normalize returns
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)

        # optimize the policy and value networks
        for _ in range(epochs):
            # log-probabilities and entropy under the current policy
            logprobs, entropy = self.policy.evaluate(states, actions)
            # policy probability ratio
            ratio = torch.exp(logprobs - old_logprobs.detach())
            # advantage estimate
            values = self.value(states).squeeze()
            advantage = returns - values.detach()
            # normalize the advantage
            advantage = (advantage - advantage.mean()) / (advantage.std() + 1e-8)
            # clipped surrogate policy loss
            surr1 = ratio * advantage
            surr2 = torch.clamp(ratio, 1 - self.eps_clip, 1 + self.eps_clip) * advantage
            policy_loss = -torch.min(surr1, surr2).mean()
            # value loss
            value_loss = F.mse_loss(values, returns)
            # total loss (policy + value + entropy bonus)
            loss = policy_loss + self.value_coef * value_loss - self.entropy_coef * entropy.mean()

            # a single backward pass over the combined loss updates both networks;
            # the shared graph must not be traversed by a second backward() call
            self.policy_optimizer.zero_grad()
            self.value_optimizer.zero_grad()
            loss.backward()
            nn.utils.clip_grad_norm_(self.policy.parameters(), 0.5)  # gradient clipping
            nn.utils.clip_grad_norm_(self.value.parameters(), 0.5)   # gradient clipping
            self.policy_optimizer.step()
            self.value_optimizer.step()

        # clear the trajectory buffers
        self.states.clear()
        self.actions.clear()
        self.logprobs.clear()
        self.rewards.clear()
        self.is_terminals.clear()


# training function
def train_ppo(env, total_episodes=1000, max_steps_per_episode=200, update_timestep=2000,
              save_interval=100, log_interval=10, render_interval=50):
    state_dim = env.observation_space.shape[0]
    action_dim = env.action_space.shape[0]

    # create the PPO agent
    agent = PPO(state_dim, action_dim, lr=3e-4, gamma=0.99, eps_clip=0.2,
                value_coef=0.5, entropy_coef=0.01, hidden_dim=128)

    # training statistics
    episode_rewards = []
    avg_rewards = []
    best_reward = -np.inf

    # training loop
    global_timestep = 0
    for episode in range(total_episodes):
        state, _ = env.reset()
        episode_reward = 0

        for step in range(max_steps_per_episode):
            global_timestep += 1
            # select an action
            action, log_prob = agent.select_action(state)
            # execute the action
            next_state, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            # store the transition
            agent.store_transition(state, action, log_prob, reward, done)
            # advance to the next state
            state = next_state
            episode_reward += reward

            # update the policy every update_timestep steps
            if global_timestep % update_timestep == 0:
                agent.update()
                logging.info(f"更新策略: 全局步数={global_timestep}")

            # stop the episode when done
            if done:
                break

        # record episode rewards
        episode_rewards.append(episode_reward)
        avg_reward = np.mean(episode_rewards[-10:])
        avg_rewards.append(avg_reward)

        # save the best model so far
        if avg_reward > best_reward:
            best_reward = avg_reward
            torch.save(agent.policy.state_dict(), 'best_stiffness_policy.pth')
            logging.info(f"保存最佳模型: 平均奖励={avg_reward:.2f}")

        # periodically save checkpoints
        if (episode + 1) % save_interval == 0:
            torch.save(agent.policy.state_dict(), f'stiffness_policy_episode_{episode+1}.pth')

        # log training progress
        if (episode + 1) % log_interval == 0:
            logging.info(f"Episode {episode+1}/{total_episodes} - "
                         f"奖励: {episode_reward:.2f}, 平均奖励: {avg_reward:.2f}, "
                         f"步数: {step+1}, 误差: {info['error'].round(2)}")

        # plot the learning curve
        if (episode + 1) % 50 == 0:
            plt.figure(figsize=(10, 5))
            plt.plot(episode_rewards, alpha=0.5, label='单轮奖励')
            plt.plot(avg_rewards, label='10轮平均奖励')
            plt.xlabel('轮次')
            plt.ylabel('奖励')
            plt.title('PPO训练进度')
            plt.legend()
            plt.grid(True)
            plt.savefig('training_curve.png')
            plt.close()

    # save the final model
    torch.save(agent.policy.state_dict(), 'final_stiffness_policy.pth')
    logging.info("训练完成,模型已保存")
    return agent, episode_rewards, avg_rewards


# evaluate a trained policy
def test_policy(env, policy_path='best_stiffness_policy.pth', episodes=5):
    state_dim = env.observation_space.shape[0]
    action_dim = env.action_space.shape[0]

    # load the trained policy network
    policy = PolicyNetwork(state_dim, action_dim)
    policy.load_state_dict(torch.load(policy_path))
    policy.eval()

    for episode in range(episodes):
        state, _ = env.reset()
        done = False
        episode_reward = 0
        steps = 0
        logging.info(f"\n===== 测试第 {episode+1}/{episodes} 轮 =====")

        while not done:
            # deterministic action: use the distribution mean
            with torch.no_grad():
                state_tensor = torch.FloatTensor(state).unsqueeze(0)
                mean, _ = policy(state_tensor)
                action = mean.squeeze().numpy()

            # execute the action
            next_state, reward, terminated, truncated, info = env.step(action)
            done = terminated or truncated
            episode_reward += reward
            steps += 1

            # log the current step
            logging.info(f"步骤 {steps}: 动作={action.round(2)}, 奖励={reward:.4f}, "
                         f"误差={info['error'].round(2)}, 力矩={info['torque'].round(3)}")
            state = next_state

        logging.info(f"📊 第 {episode+1} 轮完成: 总奖励={episode_reward:.4f}, 总步数={steps}")


# main entry point
if __name__ == "__main__":
    try:
        logging.info("===== 开始机器人刚度控制强化学习训练 =====")
        # create the environment
        env = StiffnessControlEnv()

        # train the model
        agent, rewards, avg_rewards = train_ppo(
            env,
            total_episodes=500,
            max_steps_per_episode=200,
            update_timestep=2000,
            save_interval=50,
            log_interval=10
        )

        # evaluate the trained model
        test_policy(env)
    except KeyboardInterrupt:
        logging.info("训练被用户中断")
    finally:
        # make sure the environment is closed properly
        if 'env' in locals() and env is not None:
            env.close()
        logging.info("程序已退出")

This is my Python program. Please modify it for me with the following requirements:

1. The Python program starts first but does not begin reinforcement learning; it waits until the MFC program has started variable-stiffness admittance control and sends torque and pose data to Python.
2. Once Python receives the data, it starts the reinforcement-learning computation and passes the computed stiffness parameters to the MFC program, which makes the robot arm start adjusting its pose.
3. After the arm finishes one adjustment, the MFC program sends the torque and pose data to Python again so that new stiffness parameters can be computed.
4. When the arm reaches the target pose, the MFC program sends the torque and pose data to Python one more time; this marks the end of an episode, and the next episode should begin.
5. The MFC program then resets the robot arm's position; once the reset is complete it sends torque and pose data to Python to start the new episode, and this continues until training finishes.

Please change my program as little as possible and keep its main structure, only adjusting the interaction flow and control logic. While the program runs, try to avoid thread blocking or situations where the robot arm cannot be connected. The MFC program is responsible for actively controlling and resetting the robot arm; the Python program focuses on the reinforcement-learning computation.
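One way to realize this handshake without blocking the main thread is to gate the RL loop on an event that is set whenever the MFC side delivers a fresh torque/pose sample. The sketch below is only a minimal illustration, assuming the MFC program writes each sample to a shared file (hypothetically named `mfc_state.json`) that the existing watchdog observer already monitors; the class name `MFCBridge`, the file name, the expected keys, and the timeout value are all placeholders rather than part of the original program.

```python
import json
import threading


class MFCBridge:
    """Hand-off point between the MFC-controlled robot and the RL loop (illustrative only)."""

    def __init__(self, state_file="mfc_state.json", timeout=30.0):
        self.state_file = state_file      # file the MFC program is assumed to write
        self.timeout = timeout            # seconds to wait for fresh data before giving up
        self._fresh = threading.Event()   # set whenever a new torque/pose sample arrives
        self._lock = threading.Lock()
        self._latest = None               # most recent sample received from MFC

    def on_file_changed(self):
        """Call from the existing watchdog handler when MFC updates the shared file."""
        with open(self.state_file, "r", encoding="utf-8") as f:
            sample = json.load(f)         # assumed keys: 'torque', 'pose', 'reset_done'
        with self._lock:
            self._latest = sample
        self._fresh.set()

    def wait_for_sample(self):
        """Block (with a timeout) until MFC delivers a new sample; returns None on timeout."""
        if not self._fresh.wait(self.timeout):
            return None                   # caller can retry or abort instead of hanging forever
        self._fresh.clear()
        with self._lock:
            return self._latest
```

With something like this, `reset()` could call `wait_for_sample()` before returning the first observation, and `step()` could write the freshly computed stiffness parameters to a file for MFC and then wait for the next sample; because every wait has a timeout, a lost connection to the robot degrades into a retry instead of a frozen thread.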
### **Root cause**

The error occurs because **the metadata of `gym==0.21.0` does not satisfy the stricter parsing rules of `pip ≥ 24.1`**. Concretely:

1. **Malformed dependency specifier**: `opencv-python>=3.` is missing the minor version number (it should look like `>=3.4`), so metadata parsing fails.
2. **Stricter validation in new pip**: starting with `pip 24.1`, package metadata is checked much more strictly, and the old `gym` metadata no longer passes.

---

### **Step-by-step solutions**

#### **Option 1: downgrade pip to a version before 24.1**

`pip ≥ 24.1` enforces metadata validation, while older pip versions ignore this problem:

```bash
pip install pip==24.0    # downgrade pip
pip install gym==0.21.0  # reinstall gym
```

#### **Option 2: fix gym's metadata by hand**

If downgrading pip does not help, repair the dependency declaration yourself:

1. **Download and unpack the source distribution**:

```bash
pip download gym==0.21.0 --no-deps  # download only the source package
tar -xzf gym-0.21.0.tar.gz
cd gym-0.21.0
```

2. **Fix the dependency declaration**:
   - Open `setup.py` and locate the `install_requires` or `extras_require` field.
   - Change `opencv-python>=3.` to a valid specifier (for example `opencv-python>=3.4`).

3. **Rebuild and install**:

```bash
python setup.py sdist                # build the repaired source package
pip install dist/gym-0.21.0.tar.gz   # install from the local file
```

#### **Option 3 (recommended): upgrade to a compatible gym version**

Install a newer `gym` release (for example `0.26.2`), in which the dependency format has been fixed:

```bash
pip install gym==0.26.2  # or a later version
```

---

### **Verify the installation**

After installation, run the following test:

```python
import gym

env = gym.make("CartPole-v1")
result = env.reset()
# gym >= 0.26 returns (observation, info); gym 0.21 returns only the observation
observation = result[0] if isinstance(result, tuple) else result
print("Gym installed successfully! Observation shape:", observation.shape)
```

---

### **Notes**

1. **Risk of downgrading pip**: `pip < 24.1` may contain security issues; use the old version only temporarily to install the legacy package, then upgrade back:

```bash
pip install --upgrade pip
```

2. **Dependency conflicts**: if `opencv-python` turns out to be missing after installation, install it manually (quote the specifier so the shell does not treat `>` as a redirect):

```bash
pip install "opencv-python>=3.4"  # make sure the version is compatible
```

If the problem persists, please share the full error log for further analysis.
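If you want to confirm that the truncated specifier described in the root-cause analysis above is really what the resolver rejects, you can reproduce the parse failure with the `packaging` library that modern pip vendors. This is only an illustrative check, assuming `packaging` is available in your environment (it ships alongside recent pip/setuptools); the exact error text may differ between `packaging` versions.

```python
# Compare how the requirement parser treats the broken and the fixed specifier.
from packaging.requirements import InvalidRequirement, Requirement

for spec in ("opencv-python>=3.", "opencv-python>=3.4"):
    try:
        req = Requirement(spec)
        print(f"accepted: {spec!r} -> {req}")
    except InvalidRequirement as exc:
        print(f"rejected: {spec!r} -> {exc}")
```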