References
[1] Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, et al. Stable Video Diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
[2] Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, and Ceyuan Yang. CameraCtrl: Enabling camera control for text-to-video generation. arXiv preprint arXiv:2404.02101, 2024.
[3] Yash Jain, Anshul Nasery, Vibhav Vineet, and Harkirat Behl. PEEKABOO: Interactive video generation via masked-diffusion. In CVPR, 2024.
[4] Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, and Ziwei Liu. FreeTraj: Tuning-free trajectory control in video diffusion models. arXiv preprint arXiv:2406.16863, 2024.
[5] Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, and Nan Duan. DragNUWA: Fine-grained control in video generation by integrating text, image, and trajectory. arXiv preprint arXiv:2308.08089, 2023.
[6] Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, and Di Zhang. DragAnything: Motion control for anything using entity representation. In ECCV, 2024.
[7] Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. SAM 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714, 2024.
[8] Shariq Farooq Bhat, Reiner Birkl, Diana Wofk, Peter Wonka, and Matthias Müller. ZoeDepth: Zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288, 2023.