Welcome to Jianfei Cai’s Personal Homepage

Jianfei Cai

Biography

Jianfei Cai is a Professor at the Faculty of IT, Monash University, where he was the inaugural Head of the Data Science & AI Department. Before that, he was Head of the Visual and Interactive Computing Division and Head of the Computer Communications Division at Nanyang Technological University (NTU). His major research interests include visual computing, computer vision, and multimedia. He is a co-recipient of paper awards at ACCV, ICCM, IEEE ICIP, and MMSP, and a winner of Monash FIT’s Dean’s Researcher of the Year Award and Dean’s Award for Excellence in Graduate Research Supervision. He is currently on the editorial boards of TPAMI and IJCV. He has served as an Associate Editor for IEEE T-IP, T-MM, and T-CSVT, and as Senior/Area Chair for CVPR, ICCV, ECCV, ACM Multimedia, IJCAI, ICME, ICIP, and ISCAS. He was the Chair of IEEE CAS VSPC-TC during 2016-2018. He also served as the leading TPC Chair for IEEE ICME 2012, the best paper award committee chair/co-chair for IEEE T-MM 2020/2019, and the leading General Chair for ACM Multimedia 2024. He is a Fellow of IEEE.

General information

Research interests: visual computing, computer vision, deep learning, vision&language, 3D vision, multimedia, visual signal processing and networking
Address: Prof. Jianfei Cai, Dept. of DSAI, Faculty of IT, Monash University, Clayton VIC 3800
Email: jianfei.cai AT monash.edu
Links: [Monash website], [Google Scholar]
Index: [Recent work], [Major work on visual computing & vision], [Major work on VSPC], [Other demos]

Updates

Recent work

C. Zhang, H. Liang, D. Y. Chen, Q. Wu, K. N. Plataniotis, C. C. Gambardella, and J. Cai, “PanFlow: decoupled motion control for panoramic video generation”, AAAI 2026. [paper]
Y. Chen, M. Li, Q. Wu, W. Lin, M. Harandi, and J. Cai, “PCGS: Progressive Compression of 3D Gaussian Splatting”, AAAI 2026 (Oral). [paper&code]
D.-T. Le, T. Pham, J. Cai, and H. Rezatofighi, “Marginalized Generalized IoU (MGIoU): A Unified Objective Function for Optimizing Convex Parametric Shapes”, AAAI 2026 (Oral). [paper&code]
Z. Ma, C. Gou, Y. Hu, Y. Wang, X. Chu, B. Zhuang, and J. Cai, “Where and What Matters: sensitivity-aware task vectors for many-shot multimodal in-context learning”, AAAI 2026. [paper]
X. Yang, B. Li, Y. Zhang, Z. Yin, L. Bai, L. Ma, Z. Wang, J. Cai, T.-T. Wong, H. Lu, X. Jia, “VLIPP: towards physically plausible video generation with vision and language informed physical prior”, ICCV 2025. [paper&code]
C. Zhang, H. Xu, Q. Wu, C. C. Gambardella, D. Phung and J. Cai, “PanSplat: 4K panorama synthesis with feed-forward Gaussian splatting”, CVPR 2025. [paper&code]
Z. Ma, C. Gou, H. Shi, B. Sun, S. Li, H. Rezatofighi and J. Cai, “DrVideo: document retrieval based long video understanding”, CVPR 2025. [paper]
H. Sun, Q. Ke, M. Cheng, Y. Wang, D. Li, C. Gou and J. Cai, “Point-Cache: test-time dynamic and hierarchical cache for robust and generalizable point cloud analysis”, CVPR 2025. [paper&code]
S. Chen, Z. Pan, J. Cai, and D. Phung, “PaRa: Personalizing text-to-image diffusion via parameter rank reduction”, ICLR 2025 (Spotlight). [paper]
Y. Chen, Q. Wu, M. Li, W. Lin, M. Harandi and J. Cai, “Fast feedforward 3d Gaussian splatting compression”, ICLR 2025. [paper&code]
Z. Pan, B. Zhuang, D.-A. Huang, W. Nie, Z. Yu, C. Xiao, J. Cai, and A. Anandkumar, “T-Stitch: Accelerating sampling in pre-trained diffusion models with trajectory stitching”, ICLR 2025. [paper&code]

Major work on visual computing and computer vision

Y. Chen, C. Zheng, H. Xu, B. Zhuang, A. Vedaldi, T.-J. Cham and J. Cai, “MVSplat360: benchmarking 360 generalizable 3d novel view synthesis from sparse views”, NeurIPS 2024. [paper&code]
M. Wei, Q. Wu, J. Zheng, H. Rezatofighi, and J. Cai, “Normal-GS: 3D Gaussian splatting with normal-involved rendering”, NeurIPS 2024. [paper]
H. Sun, Q. Ke, Y. Wang, W. Chen, K. Yang, D. Li, and J. Cai, “Point-PRC: A prompt learning based regulation framework for generalizable point cloud analysis”, NeurIPS 2024. [paper&code]
P. Chen, J. Ye, et al., “GMAI-MMBench: A comprehensive multimodal evaluation benchmark towards general medical AI”, NeurIPS 2024. [paper&code]
Y. Chen, H. Xu, C. Zheng, B. Zhuang, M. Pollefeys, A. Geiger, T.-J. Cham and J. Cai, “MVSplat: Efficient 3D Gaussian splatting from sparse multi-view images”, ECCV 2024 (oral). [paper&code]
Q. Wu, J. Zheng and J. Cai, “Surface reconstruction for 3D Gaussian splatting via local structural hints”, ECCV 2024. [paper&code]
Y. Chen, Q. Wu, M. Harandi and J. Cai, “HAC: Hash-grid assisted context for 3D Gaussian splatting compression”, ECCV 2024. [paper&code]
Z. Pan, J. Liu, H. He, J. Cai, and B. Zhuang, “Stitched ViTs are flexible vision backbones”, ECCV 2024. [paper&code]
D. Ren, H. Shi, J. Zheng and J. Cai, “McGrids: Monte Carlo-driven adaptive grids for iso-surface extraction”, ECCV 2024. [paper]
D. Ren, H. Mei, H. Shi, J. Zheng, J. Cai and L. Yang, “Differentiable convex polyhedra optimization from multi-view images”, ECCV 2024. [paper]
D.-T. Le, H. Shi, J. Cai, H. Rezatofighi, “DifFUSER: Diffusion model for robust multi-sensor fusion in 3D object detection and BEV segmentation”, ECCV 2024. [paper]
J. Liu, R. Gong, X. Wei, Z. Dong, J. Cai and B. Zhuang, “QLLM: accurate and efficient low-bitwidth quantization for large language models”, ICLR 2024. [paper&code]
H. A. Dung, C. Pham, T. Le, J. Cai and T.-T. Do, “Sharpness-aware data generation for zero-shot quantization”, ICML 2024. [paper]
C. Zhang, Q. Wu, C. C. Gambardella, X. Huang, D. Phung, W. Ouyang and J. Cai, “Taming Stable Diffusion for Text to 360 Panorama Image Generation”, CVPR 2024 (highlight). [paper&code]
Y. Wu, X. Luo, Z. Xu, X. Guo, L. Ju, Z. Ge, W. Liao and J. Cai, “Diversified and Personalized Multi-rater Medical Image Segmentation”, CVPR 2024 (highlight). [paper]
Y. Chen, Q. Wu, M. Harandi and J. Cai, “How Far Can We Compress Instant NGP-Based NeRF?”, CVPR 2024. [paper&code]
H. He, Z. Pan, J. Liu, J. Cai and B. Zhuang, “Efficient Stitchable Task Adaptation”, CVPR 2024. [paper&code]
C. Lin, Y. Jiang, L. Qu, Z. Yuan and J. Cai, “Generative Region-Language Pretraining for Open-Ended Object Detection”, CVPR 2024. [paper&code]
D. Le, C. Gou, S. Datta, H. Shi, I. Reid, J. Cai and H. Rezatofighi, “JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments”, CVPR 2024. [paper]
H. He, J. Cai, J. Liu, Z. Pan, J. Zhang, D. Tao, and B. Zhuang, “Pruning self-attentions into convolutional layers in single path”, IEEE TPAMI, vol. 46, pp. 3910-3922, May 2024, DOI: 10.1109/TPAMI.2024.3355890. [paper]
H. He, J. Cai, J. Zhang, D. Tao and B. Zhuang, “Sensitivity-aware visual parameter-efficient fine-tuning”, ICCV 2023 (oral). [paper]
Q. Wu, K. Wang, K. Li, J. Zheng and J. Cai, “ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces”, ICCV 2023. [paper&code]
Z. Pan, J. Cai, and B. Zhuang, “Stitchable neural networks”, CVPR 2023 (highlight). [paper&code]
H. Shi, M. Hayat, and J. Cai, “Transformer scale gate for semantic segmentation”, CVPR 2023. [paper]
H. He, J. Cai, Z. Pan, J. Liu, J. Zhang, D. Tao, B. Zhuang, “Dynamic focus-aware positional queries for semantic segmentation”, CVPR 2023. [paper&code]
E. Vendrow, D. T. Le, J. Cai, H. Rezatofighi, “JRDB-Pose: A large-scale dataset for multi-person pose estimation and tracking”, CVPR 2023. [paper]
Z. Cai, S. Ghosh, K. Stefanov, A. Dhall, J. Cai, H. Rezatofighi, R. Haffari, M. Hayat, “MARLIN: Masked autoencoder for facial video representation learning”, CVPR 2023. [paper&code]
C. Lin, P. Sun, Y. Jiang, P. Luo, L. Qu, G. Haffari, Z. Yuan and J. Cai, “Learning object-language alignments for open-vocabulary object detection”, ICLR 2023. [paper&code]
Y. Wu, Z. Wu, H. Shi, B. Picker, W. Chong, J. Cai, “CoactSeg: Learning from heterogeneous data for new multiple sclerosis lesion segmentation”, MICCAI 2023. [paper&code]
T. Nguyen-Duc, T. Le, R. Bammer, H. Zhao, J. Cai, and D. Phung, “Cross-adversarial local distribution regularization for semi-supervised medical image segmentation”, MICCAI 2023.
Z. Pan, J. Cai, B. Zhuang, “Fast vision transformers with HiLo attention”, NeurIPS 2022 (spotlight). [paper][code]
J. Liu, Z. Pan, H. He, J. Cai, B. Zhuang, “EcoFormer: energy-saving attention with linear complexity”, NeurIPS 2022 (spotlight). [paper&code]
C. Zheng, L.T. Vuong, J. Cai and D. Phung, “MoVQ: modulating quantized vectors for high-fidelity image generation”, NeurIPS 2022 (spotlight). [paper]
Q. Wu, X. Liu, Y. Chen, K. Li, C. Zheng, J. Cai and J. Zheng, “Object-compositional neural implicit surfaces”, ECCV 2022. [paper]
Y. Chen, Q. Wu, C. Zheng, T.-J. Cham, and J. Cai, “Sem2NeRF: converting single-view semantic masks to neural radiance fields”, ECCV 2022. [paper&code]
C. Lin, Y. Jiang, J. Cai, L. Qu, G. Haffari, and Z. Yuan, “Multimodal transformer with variable-length memory for vision-and-language navigation”, ECCV 2022. [paper]
D. Ren, J. Zheng, J. Cai, J. Li, and J. Zhang, “ExtrudeNet: unsupervised inverse sketch-and-extrude for shape parsing”, ECCV 2022.
Z. Wu, Y. Wu, G. Lin, J. Cai, and C. Qian, “Dual adaptive transformations for weakly supervised point cloud segmentation”, ECCV 2022. [paper]
Y. Chen, X. Yang, T.-J. Cham, and J. Cai, “Towards unbiased visual emotion recognition via causal intervention”, ACM Multimedia 2022. [paper] [code]
Y. Wu, Z. Wu, Q. Wu, Z. Ge and J. Cai, “Exploring smoothness and class-separation for semi-supervised medical image segmentation”, MICCAI 2022. [paper] [code]
C. Zheng, T.-J. Cham, J. Cai and D. Phung, “Bridging global context interactions for high-fidelity image completion”, CVPR 2022. [paper&code]
H. Xu, J. Zhang, J. Cai, H. Rezatofighi and D. Tao, “GMFlow: learning optical flow via global matching”, CVPR 2022 (oral). [paper&code]
H. Shi, M. Hayat, Y. Wu and J. Cai, “ProposalCLIP: unsupervised open-category object proposal generation via exploiting CLIP cues”, CVPR 2022. [paper]
Z. Pan, B. Zhuang, H. He, J. Liu and J. Cai, “Less is more: pay less attention in vision transformers”, AAAI 2022. [paper][code]
X. Yang, H. Zhang and J. Cai, “Deconfounded image captioning: a causal retrospect”, accepted by TPAMI, 10.1109/TPAMI.2021.3121705. [paper]
H. Xu, J. Yang, J. Cai, J. Zhang and X. Tong, “High-resolution optical flow from 1d attention and correlation”, ICCV 2021 (Oral). [paper]
Z. Pan, B. Zhuang, J. Liu, H. He and J. Cai, “Scalable visual transformers with hierarchical pooling”, ICCV 2021. [paper][code]
Z. Wu, X. Shi, G. Lin and J. Cai, “Learning meta-class memory for few-shot semantic segmentation”, ICCV 2021. [paper][code]
X. Yang, C. Gao, H. Zhang and J. Cai, “Auto-parsing network for image captioning and visual question answering”, ICCV 2021.
C. Lin, Z. Yuan, S. Zhao, P. Sun, C. Wang and J. Cai, “Domain-invariant disentangled network for generalizable object detection”, ICCV 2021.
D. Ren, J. Zheng, J. Cai, et al., “CSG-Stump: A learning friendly csg-like representation for interpretable shape parsing”, ICCV 2021. [paper&code]
Y. Cai, Y. Wang, Y. Zhu, T.-J. Cham, J. Cai, et al., “A unified 3D human motion synthesis model via conditional variational auto-encoder”, ICCV 2021. [paper][code]
Y. Wu, M. Xu, Z. Ge, J. Cai, L. Zhang, “Semi-supervised Left Atrium Segmentation with Mutual Consistency Training”, MICCAI 2021. [paper]
C. Zheng, T.-J. Cham and J. Cai, “The Spatially-Correlative Loss for Various Image Translation Tasks”, CVPR 2021. [paper][code][project]
X. Yang, H. Zhang, G.-J. Qi and J. Cai, “Causal Attention for Vision-Language Tasks”, CVPR 2021.
J. Wang, T. Lukasiewicz, X. Hu, J. Cai and Z. Xu, “RSG: A simple yet effective module for learning imbalanced datasets”, CVPR 2021. [code]
Q. Tao, C.-C. Loy, J. Cai, Z. Ge and S. See, “Retrospective class incremental learning”, IEEE ICME 2021. [paper]
C. Zheng, D. Dao, G. Song, T.-J. Cham and J. Cai, “Visiting the invisible: layer-by-layer completed scene decomposition”, accepted by IJCV. [paper][project]
C. Zheng, T.-J. Cham and J. Cai, “Pluralistic Free-Form Image Completion”, accepted by IJCV.
G. Song, T.-J. Cham, J. Cai and J. Zheng, “Half-body portrait relighting with overcomplete lighting representation”, CGF, https://doi.org/10.1111/cgf.14384. [paper][video][code][dataset]
Z. Shao, J. Cai, T.-J. Cham, X. Lu, and L. Ma, “Unconstrained facial action unit detection via latent feature domain”, IEEE Trans on Affective Computing, 2021, 10.1109/TAFFC.2021.3091331. [paper]
X. Yang, H. Zhang and J. Cai, “Auto-encoding and distilling scene graphs for image captioning”, accepted by TPAMI, 10.1109/TPAMI.2020.3042192.
J. Gu, J. Kuen, S. Joty, J. Cai, V. Morariu, H. Zhao and T. Sun, “Self-supervised relationship probing”, NeurIPS 2020.
Z. Shao, Z. Liu, J. Cai, and L. Ma, “JAA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention”, IJCV, vol. 129, no. 2, pp. 321-340, 2021. [paper] [code]
X. Yang, C. Gao, H. Zhang and J. Cai, “Hierarchical scene graph encoder-decoder for image paragraph captioning”, ACM Multimedia 2020.
K. Chen, J. Zhang, J. Cai and J. Zheng, “Modeling caricature expressions by 3D blendshape and dynamic texture”, ACM Multimedia 2020. [paper]
X. Shi, X. Yang, J. Gu, S. Joty and J. Cai, “Finding it at another side: a viewpoint-adapted matching encoder for change captioning”, ECCV 2020.
T. Zhang, G. Lin, W. Liu, J. Cai and A. Kot, “Splitting vs. merging: mining object regions with discrepancy and intersection loss for weakly supervised semantic segmentation”, ECCV 2020.
Y. Cai, L. Huang, Y. Wang, T.-J. Cham, J. Cai, et al., “Learning progressive joint propagation for human motion prediction”, ECCV 2020.
Z. Wu, Q. Tao, G. Lin and J. Cai, “Exploring bottom-up and top-down cues with attentive learning for webly supervised object detection”, CVPR 2020.
H. Jiang, F. Yan, J. Cai, J. Zheng, J. Xiao, “End-to-end 3D point cloud instance segmentation without detection”, CVPR 2020.
T. He, L. Gao, J. Song, J. Cai, Y.-F. Li, “Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation”, IJCAI 2020.
Y. Cai, L. Ge, J. Cai, J. Yuan and N. Thalmann, “3D hand pose estimation using synthetic data and weakly labeled RGB images”, accepted by IEEE TPAMI, DOI:10.1109/TPAMI.2020.2993627.
H. Liu, Y. S. Ong, X. Shen and J. Cai, “When Gaussian process meets big data: a review of scalable GPs”, accepted by IEEE TNNLS, DOI:10.1109/TNNLS.2019.2957109.
Z. Shao, Z. Liu, J. Cai, Y. Wu, L. Ma, “Facial Action Unit Detection Using Attention and Relation Learning”, accepted by IEEE Trans. on Affective Computing, DOI: 10.1109/TAFFC.2019.2948635.
B. Jiang, J. Zhang, J. Cai and J. Zheng, “Disentangled human body embedding based on deep hierarchical neural network”, IEEE TVCG, vol. 26, no. 8, pp. 2560-2575, Aug 2020.
T. Zhang, G. Lin, J. Cai, T. Shen, C. Shen, A. C. Kot, “Decoupled spatial neural attention for weakly supervised semantic segmentation”, IEEE TMM, vol. 21, no. 11, Nov. 2019.
H. Jiang, J. Cai and J. Zheng, “Skeleton-aware 3d human shape reconstruction from point clouds”, ICCV 2019.
X. Yang, H. Zhang, and J. Cai, “Learning to collocate neural modules for image captioning”, ICCV 2019. [paper]
J. Gu, S. Joty, J. Cai, H. Zhao, X. Yang and G. Wang, “Unpaired image captioning via scene graph alignments”, ICCV 2019. [paper]
Y. Cai, L. Ge, J. Liu, J. Cai, T.-J. Cham, J. Yuan, N. M. Thalmann, “Exploiting spatial-temporal relationships for 3d pose estimation via graph convolutional networks”, ICCV 2019. [paper][code]
X. Shi, J. Cai, S. Joty and J. Gu, “Watch it twice: video captioning with a refocused video encoder”, ACM MM, 2019. [paper]
Z. Wu, G. Lin, Q. Tao and J. Cai, “M2E-Try On Net: Fashion from Model to Everyone”, ACM MM, 2019. [paper]
Q. Tao, Z. Ge, J. Cai, J. Yin and S. See, “Improving deep lesion detection using 3d contextual and spatial attention”, MICCAI 2019. [paper]
H. Xu, J. Zheng, J. Cai and J. Zhang, “Region deformer networks for unsupervised depth estimation from unconstrained monocular videos”, IJCAI 2019. [paper][code]
C. Zheng, T.-J. Cham and J. Cai, “Pluralistic image completion”, CVPR 2019. [demo-short][demo-long][demo-live&code]
X. Yang, K. Tang, H. Zhang and J. Cai, “Auto-encoding scene graphs for image captioning”, CVPR 2019 (oral). [arxiv]
L. Ge, Z. Ren, Y. Li, Z. Xue, Y. Wang, J. Cai and J. Yuan, “3D hand shape and pose estimation from a single RGB image”, CVPR 2019 (oral). [arxiv]
J. Gu, H. Zhao, Z. Lin, S. Li, J. Cai, and M. Ling, “Scene graph generation with external knowledge and image reconstruction”, CVPR 2019.
L. Sheng, J. Cai, T.-J. Cham, V. Pavlovic, and K. N. Ngan, “Visibility constrained generative model for depth-based 3D facial pose tracking”, IEEE T-PAMI, vol. 41, no. 8, pp. 1994-2007, Aug. 2019. [pdf]
Y. Guo, J. Zhang, J. Cai, B. Jiang and J. Zheng, “CNN-based real-time dense face reconstruction with inverse-rendered photo-realistic face images”, IEEE T-PAMI, vol. 41, no. 6, pp. 1294-1307, June 2019. [pdf][demo][dataset]
Q. Tao, H. Yang and J. Cai, “Exploiting web images for weakly supervised object detection”, IEEE Trans. on Multimedia (TMM), vol. 21, no. 5, pp. 1135-1146, May 2019. [arxiv][dataset]
Y. Cai, L. Ge, J. Cai and J. Yuan, “Weakly supervised 3D hand pose estimation from monocular RGB images”, ECCV 2018 (oral).
Q. Tao, H. Yang, and J. Cai, “Zero-annotation object detection with web knowledge transfer”, ECCV 2018. [arxiv]
J. Gu, J. Shafiq, J. Cai and G. Wang, “Unpaired image captioning by language pivoting”, ECCV 2018. [arxiv]
Z. Shao, Z. Liu, J. Cai and L. Ma, “Deep adaptive attention for joint facial action unit detection and face alignment”, ECCV 2018. [arxiv][code]
Q. Li, Q. Tao, J. Shafiq, J. Cai, and J. Luo, “VQA-E: explaining, elaborating, and enhancing your answers for visual questions”, ECCV 2018. [arxiv]
C. Zheng, T.-J. Cham and J. Cai, “T2Net: synthetic-to-realistic translation for solving single-image depth estimation tasks”, ECCV 2018. [arxiv][code][video]
J. Pradeep, J. Mei, J. Cai and J. Zheng, “Quadtree convolutional neural networks”, ECCV 2018.
X. Yang, H. Zhang and J. Cai, “Shuffle-Then-Assemble: learning object-agnostic visual relationship features”, ECCV 2018. [arxiv]
G. Song, J. Cai, T.-J. Cham, J. Zheng, J. Zhang and H. Fuchs, “Real-time 3D face-eye performance capture of a person wearing VR headset”, ACM Multimedia 2018. [demo-HMD][demo-Avatar]
H. Liu, J. Cai, Y. Wang and Y. S. Ong, “Generalized robust Bayesian committee machine for large-scale Gaussian process regression”, ICML 2018.
J. Gu, J. Cai, J. Shafiq, L. Niu and G. Wang, “Look, imagine and match: improving textual-visual cross-modal retrieval with generative models”, CVPR 2018 (spotlight paper).
Q. Wu, J. Zhang, Y. Lai, J. Zheng and J. Cai, “Alive caricature from 2D to 3D”, CVPR 2018 (spotlight paper). [listed in The Best of the Physics arXiv by MIT Tech. Review on Mar. 31, 2018]
J. Gu, J. Cai, G. Wang and T. Chen, “Stack-captioning: coarse-to-fine learning for image captioning”, AAAI 2018.
D. Xu, Q. Duan, J. Zheng, J. Zhang, J. Cai and T. J. Cham, “Shading-based surface detail recovery under general unknown illumination”, IEEE T-PAMI, Feb. 2018.
J. Gu, G. Wang, J. Cai and T. Chen, “An empirical study of language CNN for image captioning”, ICCV 2017.
F. Tan, P. Fu, T. Deng, J. Cai, T. J. Cham, “FaceCollage: A rapidly deployable system for real-time head reconstruction for on-the-go 3D telepresence”, ACM Multimedia 2017. (Full paper) [paper][demo]
H. Yang, J. T. Zhou, J. Cai and Y. S. Ong, “MIML-FCN+: multi-instance multi-label learning via fully convolutional networks with privileged information”, IEEE CVPR 2017.
L. Sheng, J. Cai, T.-J. Cham, V. Pavlovic, and K. N. Ngan, “A generative model for depth-based robust 3D facial pose tracking”, IEEE CVPR 2017.
K. R. Jerripothula, J. Cai, J. Lu and J. Yuan, “Object Co-skeletonization with Co-segmentation”, IEEE CVPR 2017.
H. Yang, J. T. Zhou and J. Cai, “Improving multi-label learning with missing labels by structured semantic correlations”, ECCV 2016 (oral). [arxiv]
L. Niu, J. Cai and D. Xu, “Domain adaptive fisher vector for visual recognition”, ECCV 2016.
K. R. Jerripothula, J. Cai and J. Yuan, “CATS: co-saliency activated tracklet selection for video co-localization”, ECCV 2016.
A. Wang, J. Cai, J. Lu and T.-J. Cham, “Modality and component aware feature fusion for RGB-D scene classification”, IEEE CVPR 2016.
H. Yang, J. T. Zhou, Y. Zhang, B. Gao, J. Wu and J. Cai, “Exploit bounding box annotations for multi-label object recognition”, IEEE CVPR 2016.
H. Zhu, J. Lu, J. Cai, J. Zheng and N. Thalmann, “Multiple Human Identification and Cosegmentation: A Human-Oriented CRF Approach with Poselets”, IEEE Trans. on Multimedia (TMM), vol. 18, no. 8, 2016.
Y. Zhang, J. Wu and J. Cai, “Compact representation of high-dimensional feature vectors for large-scale image recognition and retrieval”, IEEE Trans. on Image Processing (TIP), vol. 25, no. 5, 2016.
Y. Zhang, W. Wei, J. Wu, J. Cai, J. Lu, V.-A. Nguyen and M. Do, “Weakly Supervised Fine-Grained Categorization with Part-Based Image Representation”, IEEE Trans. on Image Processing (TIP), vol. 25, no. 4, 2016.
H. Zhu, F. Meng, J. Cai and S. Lu, “Beyond pixels: a comprehensive survey from bottom-up to semantic image segmentation and cosegmentation”, Elsevier Journal of Visual Communications and Image Representation, vol. 34, Jan. 2016. [pdf]
A. Wang, J. Cai, J. Lu and T.-J. Cham, “MMSS: Multi-modal sharable and specific feature learning for RGB-D object recognition”, ICCV, 2015.
A. Wang, J. Lu, J. Cai, G. Wang and T.-J. Cham, “Unsupervised joint feature learning and encoding for RGB-D scene labeling”, IEEE Trans. on Image Processing (TIP), vol. 24, no. 11, Nov. 2015.
A. Wang, J. Lu, J. Cai, T.-J. Cham and G. Wang, “Large-margin multi-modal deep learning for RGB-D Object recognition”, IEEE Trans. on Multimedia (TMM), vol. 17, no. 11, Nov. 2015.
C. Chen, J. Cai, J. Zheng, T.-J. Cham and G. Shi, “Kinect depth recovery using a color-guided, region-adaptive, and depth-selective framework”, ACM Trans. on Intelligent Systems and Technology (TIST), vol. 6, no. 2, May 2015.
S. Xiong, J. Zhang, J. Zheng, J. Cai and L. Liu, “Robust surface reconstruction via dictionary learning”, ACM Trans. on Graphics (Proc. Siggraph Asia), 2014. [website] (ICCM 2019 Best Paper Award)
F. Meng, J. Cai, and H. Li, “On multiple image group cosegmentation”, ACCV 2014 (oral). (Best Student Paper Honorable Mention Award) [pdf]
C. Chen, H. X. Pham, V. Pavlovic, J. Cai and G. Shi, “Depth recovery with face priors”, ACCV 2014 (oral). [pdf]
A. Wang, J. Lu, G. Wang, J. Cai and T.-J. Cham, “Multi-modal unsupervised feature learning for RGB-D scene labeling”, ECCV 2014.
D. Xu, Q. Duan, J. Zheng, J. Zhang, J. Cai and T. J. Cham, “Recovering surface details under general unknown illumination using shading and coarse multi-view stereo”, CVPR 2014. [pdf]
Y. Zhang, J. Wu and J. Cai, “Compact representation for image classification: to choose or to compress?”, CVPR 2014.
Y. Zhang, J. Wu, J. Cai, W. Lin, “Flexible image similarity computation using hyper-spatial matching”, IEEE Trans. on Image Processing (TIP), 23(9), pp. 4112-4125, Sept. 2014. [pdf]
M. Zhao, C.-W. Fu, J. Cai and T.-J. Cham, “Real-time and Temporal-coherent Foreground Extraction with Commodity RGBD Camera”, IEEE Journal of Selected Topics in Signal Processing, no. 99, Dec 2014. [demo]
H. Zhu, J. Lu, J. Cai, J. Zheng and N. Thalmann, “Multiple foreground recognition and cosegmentation: an object-oriented CRF model with robust higher-order potentials”, WACV 2014. [pdf]
H. Zhu, J. Zheng, J. Cai, and N. M. Thalmann, “Object-level image segmentation using low level cues”, IEEE Transactions on Image Processing (TIP), vol. 22, no. 10, pp. 4019-4027, Oct. 2013. [pdf] (ICCM 2017 Best Paper Award)
T. Nguyen, J. Cai, J. Zheng and J. Li, “Interactive object segmentation from multi-view images”, Elsevier Journal of Visual Communications and Image Representation, vol. 24, no. 4, May 2013. [demo][pdf]
H. Li, J. Cai, A. Nguyen and J. Zheng, “A benchmark for semantic image segmentation”, IEEE ICME 2013. [pdf][dataset][software]
J. Zhang, J. Zheng, C. Wu and J. Cai, “Variational mesh decomposition”, ACM Transactions on Graphics (TOG), vol. 31, no. 3, May 2012 (presented in SIGGRAPH 2012). [pdf]
T. Nguyen, J. Cai, J. Zhang and J. Zheng, “Robust interactive image segmentation using convex active contours”, IEEE Transactions on Image Processing (TIP), vol. 21, no. 8, pp. 3734-3743, Aug. 2012. [pdf][software]
Q. Duan, J. Zheng, and J. Cai, “Flexible and accurate transparent-object matting and compositing using refractive vector field”, Computer Graphics Forum, vol. 30, no. 6, pp. 1812-1824, Sept. 2011. [pdf]
J. Zhang, J. Zheng, J. Cai, “Interactive Mesh Cutting Using Constrained Random Walks”, IEEE Transactions on Visualization and Computer Graphics (TVCG), vol. 17, no. 3, pp. 357-367, March 2011. [pdf]
J. Zhang, J. Zheng and J. Cai, “A diffusion approach to seeded image segmentation”, IEEE CVPR, 2010. [pdf]
J. Zhang, C. Wu, J. Cai, J. Zheng and X. Tai, “Mesh snapping: robust interactive mesh cutting using fast geodesic curvature flow”, Proc. of Eurographics’10 (Computer Graphics Forum), 2010. [pdf]
W. Yang, J. Cai, J. Zheng, J. Luo, “User-friendly interactive image segmentation through unified combinatorial user inputs”, IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2470-2479, Sept. 2010. [pdf][software]
W. Yang, J. Zheng, J. Cai, S. Rahardja, C. W. Chen, “Natural and seamless image composition with color control”, IEEE Trans. on Image Processing, vol. 18, no. 11, pp. 2584-2592, Nov. 2009. [pdf][software]

Major work on visual signal processing and networking

G. Gao, H. Zhang, H. Hu, Y. Wen, J. Cai, C. Luo, W. Zeng, “Optimizing quality of experience for adaptive bitrate streaming via viewer interest inference”, IEEE T-MM, vol. 20, no. 12, pp. 3399-3413, Dec. 2018.
L. Wei, J. Cai, C. H. Foh, B. He, “QoS-aware resource allocation for video transcoding in clouds”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 1, 2017.
L. Wei, C. H. Foh, B. He, J. Cai, “Towards efficient resource allocation for heterogeneous workloads in IaaS clouds”, IEEE Transactions on Cloud Computing, no. 99, Sept 2015.
H. Lu, C. H. Foh, Y. Wen and J. Cai, “Delay-optimized file retrieval under LT-based cloud storage”, IEEE Transactions on Cloud Computing, no. 99, May 2015.
H. Lu, F. Lu, J. Cai and C. H. Foh, “LT-W: Improving LT decoding with Wiedemann Solver”, IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 7887-7897, Dec 2013.
J. Qureshi, C. H. Foh and J. Cai, “Optimal solution for the index coding problem using network coding over GF(2)”, IEEE SECON 2012. [pdf]
C. Deng, W. Lin and J. Cai, “Content-based image compression for arbitrary-resolution display devices”, IEEE Trans. on Multimedia, vol. 14, no. 4, pp. 1127-1139, Aug. 2012.
W. Guan, J. Cai, J. Zhang and J. Zheng, “Progressive coding and illumination and view dependent transmission of 3D meshes using R-D optimization”, IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 575-586, Apr 2010. [pdf]
G. Zhai, J. Cai, W. Lin, X. Yang, W. Zhang and M. Etoh, “Cross-dimensional Perceptual Quality Assessment for Low Bitrate Videos”, IEEE Trans. on Multimedia, vol. 10, no. 7, pp. 1316-1324, Nov. 2008. [pdf]
W. Guan, J. Cai, J. Zheng and C. W. Chen, “Segmentation based View-Dependent 3D Graphics Model Transmission”, IEEE Trans. on Multimedia, special issue on multimedia applications in mobile/wireless context, vol. 10, no. 5, pp. 724-734, Aug 2008. [pdf]
C. H. Foh, Y. Zhang, Z. Ni, J. Cai and K. N. Ngan, “Optimized Cross-Layer Design for Scalable Video Transmission over the IEEE 802.11e Networks”, IEEE Trans. Circuits and Systems for Video Technology, vol. 17, no. 12, pp. 1665-1678, Dec. 2007. [pdf]
D. Tao, J. Cai, H. Yi, D. Rajan, L. T. Chia and K. N. Ngan, “Dynamic Programming Based Reverse Frame Selection for VBR Video Delivery under Constrained Resources”, IEEE Trans. Circuits and Systems for Video Technology, vol. 16, no. 11, pp. 1362-1375, Nov. 2006.
W. Yang, Y. Lu, F. Wu, J. Cai, K. N. Ngan and S. Li, “4D Wavelet-based Multi-view Video Coding”, IEEE Trans. Circuits and Systems for Video Technology, vol. 16, no. 11, pp. 1385-1396, Nov. 2006.
J. Cai, X. Li, and C. W. Chen, “Layered unequal loss protection with pre-interleaving for fast progressive image transmission over packet-loss channels”, ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP), Nov. 2005.
Z. He, J. Cai and C. W. Chen, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding”, IEEE Trans. Circuits and Systems for Video Technology, special issue on wireless video, vol. 12, no. 6, pp. 511-523, June 2002. [pdf]