Computer vision applications make predictions on digital images that a camera acquires from physical scenes through light. However, conventional robustness benchmarks rely on perturbations applied to already-digitized images, diverging from the distribution shifts that occur during image acquisition. To bridge this gap, we introduce a new distribution-shift dataset, ImageNet-ES, comprising variations in environmental and camera sensor factors, built by directly capturing 202k images with a real camera in a controllable testbed. With the new dataset, we evaluate out-of-distribution (OOD) detection and model robustness. We find that existing OOD detection methods fail to cope with the covariate shifts in ImageNet-ES, implying that the definition and detection of OOD should be revisited to embrace real-world distribution shifts. We also observe that models become more robust on both ImageNet-C and ImageNet-ES when they learn environment and sensor variations in addition to existing digital augmentations. Lastly, our results suggest that effective shift mitigation via camera sensor control can significantly improve performance without increasing model size. With these findings, our benchmark may aid future research on robustness, OOD, and camera sensor control for computer vision. Our code and dataset are available at https://github.com/Edw2n/ImageNet-ES
@inproceedings{imagenet-es,author={Baek, Eunsu and Park, Keondo and Kim, Jiyoon and Kim, Hyung-Sin},title={Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains},booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},year={2024},month=jun,tags={s4d}}
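To make the OOD-detection failure mode from the abstract above concrete, below is a minimal sketch that scores images with maximum softmax probability (MSP), one standard OOD-detection baseline (not necessarily the exact methods evaluated in the paper); the torchvision ResNet-50 and the image path are illustrative assumptions.

# Minimal MSP sketch; the model choice and image path are assumptions
# for illustration, not the paper's exact evaluation setup.
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()

@torch.no_grad()
def msp_score(image_path: str) -> float:
    # Maximum softmax probability: low scores are flagged as OOD.
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    return model(x).softmax(dim=-1).max().item()

# Covariate-shifted captures (e.g., over- or under-exposed photos of an
# in-distribution class) tend to receive low MSP scores and get flagged
# as OOD even though their labels are in-distribution, which is the
# mismatch the abstract highlights.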
We present MIRROR, an on-device video virtual try-on (VTO) system that provides realistic, private, and rapid experiences in mobile clothes shopping. Despite recent advances in generative adversarial networks (GANs) for VTO, designing MIRROR involves two challenges: (1) data discrepancy caused by restricted training data that miss diverse poses, body sizes, and backgrounds, and (2) local computation overhead that consumes 24% of the battery to convert a single video. To alleviate these problems, we propose a generalizable VTO GAN that not only discerns intricate human-body semantics but also captures domain-invariant features without requiring additional training data. In addition, we craft a lightweight, reliable clothes/pose-tracking method that generates refined pixel-wise warping flow without neural-net computation. As a holistic system, MIRROR integrates the new VTO GAN and tracking method with meticulous pre/post-processing, operating in two distinct phases (online/offline). Our results on Android smartphones and real-world user videos show that, compared to a cutting-edge VTO GAN, MIRROR achieves 6.5x better accuracy with 20.1x faster video conversion and 16.9x less energy consumption.
@article{10.1145/3631420,author={Kang, Dong-Sig and Baek, Eunsu and Son, Sungwook and Lee, Youngki and Gong, Taesik and Kim, Hyung-Sin},title={[IMWUT 2023 / UbiComp 2024] MIRROR: Towards Generalizable On-Device Video Virtual Try-On for Mobile Shopping},year={2024},issue_date={December 2023},publisher={Association for Computing Machinery},address={New York, NY, USA},volume={7},number={4},url={https://doi.org/10.1145/3631420},doi={10.1145/3631420},journal={Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.},month=jan,articleno={163},numpages={27},tags={mirror}}
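As a rough illustration of applying a pixel-wise warping flow (the output of MIRROR's tracking step, which is not reproduced here), the sketch below warps a garment image with grid sampling; the tensor layouts and the flow convention are illustrative assumptions, not MIRROR's actual code.

import torch
import torch.nn.functional as F

def warp(clothes: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # clothes: (N, C, H, W) garment image; flow: (N, 2, H, W) per-pixel
    # (x, y) offsets in pixels (an assumed convention for illustration).
    n, _, h, w = clothes.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    base = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)  # identity grid
    # Convert pixel offsets to grid_sample's normalized [-1, 1] range.
    scale = torch.tensor([(w - 1) / 2.0, (h - 1) / 2.0])
    offset = flow.permute(0, 2, 3, 1) / scale
    return F.grid_sample(clothes, base + offset, align_corners=True)

# Zero flow reproduces the input, so the flow field alone encodes how
# the garment deforms from frame to frame:
out = warp(torch.rand(1, 3, 256, 192), torch.zeros(1, 2, 256, 192))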
Virtual try-on (VTO) superimposes clothing onto a user's image or video, enhancing the online shopping experience. On-device VTO can preserve user privacy, but most VTO techniques cannot run on resource-constrained devices due to excessive computation overhead. In this demo, we present a novel Android application for on-device video VTO built on MIRROR, the state-of-the-art mobile VTO system. The application minimizes video generation time by splitting the process into two phases, converting a 10-second video in 0.76 minutes on a Galaxy S24 Ultra. Our application scored 78.5 (above average) on the System Usability Scale (SUS) test. A companion video is provided at: https://youtu.be/YTExc8W5BzM
@inproceedings{10.1145/3643832.3661842,author={Ahn, Dongha and Kang, Dong-Sig and Baek, Eunsu and Kim, Hyung-Sin},title={Demo: On-Device Video Virtual Try-On for Mobile Shopping},year={2024},month=jun,isbn={9798400705816},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3643832.3661842},doi={10.1145/3643832.3661842},booktitle={Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services},pages={610–611},numpages={2},keywords={virtual try-on, video, mobile system, on-device computing},location={Minato-ku, Tokyo, Japan},series={MOBISYS '24},tags={mirror}}
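For context on the reported 78.5, the System Usability Scale is a fixed 10-item questionnaire with a standard scoring formula; the sketch below shows how one respondent's answers map to a 0-100 score (the answers are made up purely for illustration).

def sus_score(responses: list[int]) -> float:
    # responses: ten 1-5 Likert answers to the standard SUS items.
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    contributions = [
        r - 1 if i % 2 == 0 else 5 - r  # odd-numbered items positive-keyed
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5  # scale the 0-40 raw total to 0-100

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0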