# Motion VQ-Trans

Pytorch implementation of the paper "Generating Human Motion from Textual Descriptions with High Quality Discrete Representation".

[[Notebook Demo]](https://colab.research.google.com/drive/1tAHlmcpKcjg_zZrqKku7AfpqdVAIFrF8?usp=sharing)


If our project is helpful for your research, please consider citing: (todo)

```
@inproceedings{shen2020ransac,
  title={RANSAC-Flow: generic two-stage image alignment},
  author={Shen, Xi and Darmon, Fran{\c{c}}ois and Efros, Alexei A and Aubry, Mathieu},
  booktitle={16th European Conference on Computer Vision},
  year={2020}
}
```
## Table of Contents

* [1. Visual Results](#1-visual-results)
* [2. Installation](#2-installation)
* [3. Quick Start](#3-quick-start)
* [4. Train](#4-train)
* [5. Evaluation](#5-evaluation)
* [6. Motion Render](#6-motion-render)
* [7. Acknowledgement](#7-acknowledgement)
* [8. ChangeLog](#8-changelog)
## 1. Visual Results (More results can be found on our project page (todo))

## 2. Installation

### 2.1. Environment
Our model can be trained on a single **V100-32G GPU**.

```bash
conda env create -f environment.yml
conda activate VQTrans
```

The code was tested on Python 3.8 and PyTorch 1.8.1.
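
A quick way to confirm that the environment matches the tested versions (this snippet is only a convenience check, not part of the repository):

```python
# Optional sanity check: print the interpreter, PyTorch version, and CUDA visibility.
import sys
import torch

print("Python :", sys.version.split()[0])    # expected 3.8.x
print("PyTorch:", torch.__version__)         # expected 1.8.1
print("CUDA   :", torch.cuda.is_available())
```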
### 2.2. Dependencies

Download the GloVe word embeddings:

```bash
bash dataset/prepare/download_glove.sh
```
### 2.3. Datasets

We use two 3D human motion-language datasets: HumanML3D and KIT-ML. For both datasets, you can find the details as well as the download links [[here]](https://github.com/EricGuo5513/HumanML3D).

Taking HumanML3D as an example, the directory structure should look like this:
```
./dataset/HumanML3D/
├── new_joint_vecs/
├── texts/
├── Mean.npy # same as in [HumanML3D](https://github.com/EricGuo5513/HumanML3D)
├── Std.npy # same as in [HumanML3D](https://github.com/EricGuo5513/HumanML3D)
├── train.txt
├── val.txt
├── test.txt
├── train_val.txt
└── all.txt
```
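
`Mean.npy` and `Std.npy` store the per-dimension statistics used to z-normalize the motion features in `new_joint_vecs/`. A minimal sketch of that normalization (the file id `000000.npy` is only a hypothetical example; use any id listed in `train.txt`):

```python
# Load one motion feature file and z-normalize it with the dataset statistics.
import numpy as np

root = "./dataset/HumanML3D"
mean = np.load(f"{root}/Mean.npy")                     # (D,) per-dimension mean
std = np.load(f"{root}/Std.npy")                       # (D,) per-dimension std
motion = np.load(f"{root}/new_joint_vecs/000000.npy")  # (T, D) frames x feature dims

normalized = (motion - mean) / std                     # roughly what the dataloaders feed the model
print(motion.shape, normalized.mean(), normalized.std())
```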
### 2.4. Motion & text feature extractors

We use the same extractors provided by [t2m](https://github.com/EricGuo5513/text-to-motion) to evaluate our generated motions. Please download the extractors:

```bash
bash dataset/prepare/download_extractor.sh
```
### 2.5. Pre-trained models

The pretrained model files will be stored in the `pretrained` folder:

```bash
bash dataset/prepare/download_model.sh
```
### 2.6. Render motion (optional)

If you want to render the generated motion, you need to install:

```bash
sudo sh dataset/prepare/download_smpl.sh
conda install -c menpo osmesa
conda install h5py
conda install -c conda-forge shapely pyrender trimesh mapbox_earcut
```
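
If you render on a headless server, it can help to confirm that OSMesa off-screen rendering works before launching `render_final.py`. A minimal check (not part of the repository, just a sanity test of the packages installed above):

```python
# Verify headless (OSMesa) off-screen rendering with pyrender + trimesh.
import os
os.environ["PYOPENGL_PLATFORM"] = "osmesa"   # must be set before pyrender is imported

import numpy as np
import trimesh
import pyrender

scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(trimesh.creation.box()))
cam_pose = np.eye(4)
cam_pose[2, 3] = 3.0                         # move the camera back so the box is in view
scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=256, viewport_height=256)
color, depth = renderer.render(scene)
print(color.shape)                           # (256, 256, 3) if the GL backend works
renderer.delete()
```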
## 3. Quick Start

A quick-start guide on how to use our code is available in [demo.ipynb](https://colab.research.google.com/drive/1tAHlmcpKcjg_zZrqKku7AfpqdVAIFrF8?usp=sharing).

<p align="center">
<img src="img/demo.png" width="400px" alt="demo">
</p>
## 4. Train

Note that for the KIT-ML dataset, you only need to set `--dataname kit`.

### 4.1. VQ-VAE

The results are saved in the folder given by `--out-dir` and `--exp-name` (with the command below, `output/VQVAE`).

<details>
<summary>
VQ training
</summary>

```bash
python3 train_vq.py \
--batch-size 256 \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 512 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname t2m \
--vq-act relu \
--quantizer ema_reset \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name VQVAE
```

</details>
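
For readers unfamiliar with the `--quantizer ema_reset` option: it selects a vector quantizer whose codebook is updated with exponential moving averages and whose unused codes are reset to fresh encoder outputs. The sketch below only illustrates the general idea; it is not the repository's implementation, and the class name, buffer names, and `reset_threshold` default are assumptions.

```python
# Illustrative sketch of an EMA + code-reset vector quantizer (not the repo's exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMAResetQuantizer(nn.Module):
    def __init__(self, nb_code=512, code_dim=512, decay=0.99, reset_threshold=1.0):
        super().__init__()
        self.decay = decay
        self.reset_threshold = reset_threshold
        self.register_buffer("codebook", torch.randn(nb_code, code_dim))
        self.register_buffer("embed_avg", self.codebook.clone())
        self.register_buffer("cluster_size", torch.zeros(nb_code))

    def forward(self, z):                                    # z: (N, code_dim) encoder outputs
        dist = torch.cdist(z, self.codebook)                 # nearest-neighbour code lookup
        idx = dist.argmin(dim=1)
        z_q = self.codebook[idx]

        if self.training:
            with torch.no_grad():
                one_hot = F.one_hot(idx, self.codebook.size(0)).type_as(z)
                counts = one_hot.sum(0)                      # usage count per code
                embed_sum = one_hot.t() @ z                  # sum of assigned encoder outputs
                # Exponential moving averages of usage and code vectors.
                self.cluster_size.mul_(self.decay).add_(counts, alpha=1 - self.decay)
                self.embed_avg.mul_(self.decay).add_(embed_sum, alpha=1 - self.decay)
                self.codebook.copy_(self.embed_avg / self.cluster_size.clamp(min=1e-5).unsqueeze(1))
                # Reset rarely used ("dead") codes to random encoder outputs from this batch.
                dead = self.cluster_size < self.reset_threshold
                if dead.any():
                    picks = torch.randint(0, z.size(0), (int(dead.sum()),), device=z.device)
                    self.codebook[dead] = z[picks]
                    self.embed_avg[dead] = z[picks]
                    self.cluster_size[dead] = self.reset_threshold

        commit_loss = F.mse_loss(z, z_q)                     # commitment term (weighted elsewhere)
        z_q = z + (z_q - z).detach()                         # straight-through gradient to encoder
        return z_q, idx, commit_loss

# Example: z_q, codes, commit = EMAResetQuantizer()(torch.randn(64, 512))
```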
### 4.2. Motion-Transformer

The results are saved in the folder given by `--out-dir` and `--exp-name` (with the command below, `output/VQTransformer`).

<details>
<summary>
MoTrans training
</summary>

```bash
python3 train_t2m_trans.py \
--exp-name VQTransformer \
--batch-size 128 \
--num-layers 9 \
--embed-dim-gpt 1024 \
--nb-code 512 \
--n-head-gpt 16 \
--block-size 51 \
--ff-rate 4 \
--drop-out-rate 0.1 \
--resume-pth output/VQVAE/net_last.pth \
--vq-name VQVAE \
--out-dir output \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--dataname t2m \
--down-t 2 \
--depth 3 \
--quantizer ema_reset \
--eval-iter 10000 \
--pkeep 0.5 \
--dilation-growth-rate 3 \
--vq-act relu
```

</details>
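
A note on `--pkeep`: as in Taming-Transformers-style training, the ground-truth code sequence is partially corrupted before being fed as transformer input (each token is kept with probability `pkeep`, otherwise replaced by a random code), which helps the model tolerate its own sampling errors at generation time. The sketch below shows the general idea; the function name is ours, not the repository's.

```python
# Illustrative sketch of pkeep-style input corruption for teacher-forced training.
import torch

def corrupt_codes(codes: torch.Tensor, pkeep: float = 0.5, nb_code: int = 512) -> torch.Tensor:
    """codes: (B, T) ground-truth VQ indices used as transformer input."""
    keep = (torch.rand(codes.shape, device=codes.device) < pkeep).long()
    random_codes = torch.randint_like(codes, nb_code)
    return keep * codes + (1 - keep) * random_codes

# Example: corrupt a batch of 4 sequences of 50 motion codes from a 512-entry codebook.
noisy = corrupt_codes(torch.randint(0, 512, (4, 50)), pkeep=0.5)
```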
## 5. Evaluation

### 5.1. VQ-VAE

<details>
<summary>
VQ eval
</summary>

```bash
python3 VQ_eval.py \
--batch-size 256 \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 512 \
--down-t 2 \
--depth 3 \
--dilation-growth-rate 3 \
--out-dir output \
--dataname t2m \
--vq-act relu \
--quantizer ema_reset \
--loss-vel 0.5 \
--recons-loss l1_smooth \
--exp-name TEST_VQVAE \
--resume-pth output/VQVAE/net_last.pth
```

</details>
### 5.2. Motion-Transformer

<details>
<summary>
MoTrans eval
</summary>

```bash
python3 GPT_eval_multi.py \
--exp-name TEST_VQTransformer \
--batch-size 128 \
--num-layers 9 \
--embed-dim-gpt 1024 \
--nb-code 512 \
--n-head-gpt 16 \
--block-size 51 \
--ff-rate 4 \
--drop-out-rate 0.1 \
--resume-pth output/VQVAE/net_last.pth \
--vq-name VQVAE \
--out-dir output \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--dataname t2m \
--down-t 2 \
--depth 3 \
--quantizer ema_reset \
--eval-iter 10000 \
--pkeep 0.5 \
--dilation-growth-rate 3 \
--vq-act relu \
--resume-gpt output/VQTransformer/net_best_fid.pth
```

</details>
## 6. Motion Render

<details>
<summary>
Motion Render
</summary>

Pass the directory that contains the generated `.npy` files and the motion names. Here is an example:

```bash
python3 render_final.py --filedir output/TEST_VQTransformer/ --motion-list 000019 005485
```

</details>
## 7. Acknowledgement

We appreciate the help from:

* Public code such as [text-to-motion](https://github.com/EricGuo5513/text-to-motion), [TM2T](https://github.com/EricGuo5513/TM2T), etc.

## 8. ChangeLog
<!-- # VQGPT
```
# VQ during training OT
/apdcephfs_cq2/share_1290939/jirozhang/anaconda3/envs/motionclip/bin/python3 train_251_cnn_all.py \
--batch-size 128 \
--exp-name xxxxxx \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 512 \
--down-t 2 \
--depth 5 \
--out-dir /apdcephfs_cq2/share_1290939/jirozhang/VQCNN_HUMAN/ \
--dataname t2m \
--vq-act relu \
--quantizer ot \
--ot-temperature 1 \
--ot-eps 0.5 \
--commit 0.001 \
```

```
# VQ251 training baseline
/apdcephfs_cq2/share_1290939/jirozhang/anaconda3/envs/motionclip/bin/python3 train_251_cnn_all.py \
--batch-size 128 \
--exp-name VQ263_300K_512cb_down4_t2m_ema_relu_test \
--lr 2e-4 \
--total-iter 300000 \
--lr-scheduler 200000 \
--nb-code 512 \
--down-t 2 \
--depth 5 \
--out-dir /apdcephfs_cq2/share_1290939/jirozhang/VQCNN_HUMAN/ \
--dataname t2m \
--vq-act relu \
--quantizer ema \
```

```bash
# gpt training + noise
/apdcephfs_cq2/share_1290939/jirozhang/anaconda3/envs/motionclip/bin/python3 train_gpt_cnn_noise.py \
--exp-name GPT_VQ_300K_512cb_down4_t2m_ema_relu_bs128_ws64_fid_mask1_08 \
--batch-size 128 \
--num-layers 4 \
--block-size 51 \
--n-head-gpt 8 \
--ff-rate 4 \
--drop-out-rate 0.1 \
--resume-pth output_vqhuman/VQ_300K_512cb_down4_t2m_ema_relu_bs128_ws64/net_best_fid.pth \
--vq-name VQ_300K_512cb_down4_t2m_ema_relu_bs128_ws64_fid_mask1_08 \
--total-iter 300000 \
--lr-scheduler 150000 \
--lr 0.0001 \
--if-auxloss \
--dataname t2m \
--down-t 2 \
--depth 5 \
--quantizer ema \
--eval-iter 5000 \
--pkeep 0.8
```

### Visualize VQ (Arch Taming) in HTML
* Generate motion. This will save generated motions in `./visual_results/vel05_taming_l1s`
```
python vis.py --dataname t2m --resume-pth /apdcephfs_cq2/share_1290939/jirozhang/VQ_t2m_bailando_relu_NoNorm_dilate3_vel05_taming_l1s/net_last.pth --visual-name vel05_taming_l1s --vis-gt --nb-vis 20
```
* Make a webpage. Go to visual_html.py, modify the name, then run:
```
python visual_html.py
``` -->