Zheng, J., Liu, D., Wang, C., Hu, M., Yang, Z., Ding, C., & Tao, D. (2024). MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis. International Journal of Computer Vision, 132(9), 3537. https://doi.org/10.1007/s11263-024-02044-4
Chicago Style (17th ed.) Citation: Zheng, Jianbin, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, and Dacheng Tao. "MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis." International Journal of Computer Vision 132, no. 9 (2024): 3537. https://doi.org/10.1007/s11263-024-02044-4.
MLA (9th ed.) Citation: Zheng, Jianbin, et al. "MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis." International Journal of Computer Vision, vol. 132, no. 9, 2024, p. 3537, https://doi.org/10.1007/s11263-024-02044-4.