CV

Basics

Name Cheng Chien
Email chengchien1999@gmail.com
Phone (886) 989819022
Url https://www.linkedin.com/in/cchien1999
Summary I hold a Master's degree in Computer Science with a specialization in machine learning, focusing on image coding techniques and multimodal large language models.

Volunteer

  • 2021 - 2023
    Outreach Instructor
    Taiwan Happy Coding Promotion Association
    Dedicated to promoting programming skills and sparking interest in technology among Taiwanese children. Experienced in designing and delivering engaging educational programs that inspire young learners to explore the world of coding and computer science.

Education

  • 2022 - 2024
    Master, Computer Science
    National Yang Ming Chiao Tung University, Taiwan
  • 2018 - 2022
    Bachelor, Computer Science and Engineering
    National Sun Yat-sen University, Taiwan

Skills

Programming
Python, C++

Machine Learning
PyTorch, Multimodal Large Language Models, Image Coding for Machines, Image Compression

Languages

Mandarin
Native speaker
English
Fluent

Interests

Machine Learning
Multimodal Large Language Models
Image Coding for Machines
Image Compression

Publications

  • 2024
    ComNeck: Bridging Compressed Image Latents and Multimodal LLMs via Universal Transform-Neck
    Chia-Hao Kao, Cheng Chien, Yu-Jen Tseng, Yi-Hsin Chen, Alessandro Gnutti, Shao-Yuan Lo, Wen-Hsiao Peng, Riccardo Leonardi
    This paper presents the first-ever study of adapting compressed image latents to suit the needs of downstream vision tasks that adopt Multimodal Large Language Models (MLLMs). MLLMs have extended the success of large language models to modalities (e.g., images) beyond text, but their billion-parameter scale hinders deployment on resource-constrained end devices. While cloud-hosted MLLMs could be available, transmitting raw, uncompressed images captured by end devices to the cloud requires an efficient image compression system. To address this, we focus on emerging neural image compression and propose a novel framework with a lightweight transform-neck and a surrogate loss to adapt compressed image latents for MLLM-based vision tasks. The proposed framework is generic and applicable to multiple application scenarios, where the neural image codec can be (1) pre-trained for human perception without updating, (2) fully updated for joint human and machine perception, or (3) fully updated for only machine perception. The transform-neck trained with the surrogate loss is universal, for it can serve various downstream vision tasks enabled by a variety of MLLMs that share the same visual encoder. Our framework has the striking feature of excluding the downstream MLLMs from training the transform-neck, and potentially the neural image codec as well. This stands out from most existing coding-for-machines approaches that involve downstream networks in training and thus could be impractical when the networks are MLLMs. Extensive experiments on different neural image codecs and various MLLM-based vision tasks show that our method achieves strong rate-accuracy performance with much lower complexity, demonstrating its effectiveness.
  • 2023
    Learned Hierarchical B-frame Coding with Adaptive Feature Modulation for YUV 4:2:0 Content
    Mu-Jung Chen, Hong-Sheng Xie, Cheng Chien, Wen-Hsiao Peng, and Hsueh-Ming Hang
    This paper introduces a learned hierarchical B-frame coding scheme in response to the Grand Challenge on Neural Network-based Video Coding at ISCAS 2023. We specifically address three issues: (1) B-frame coding, (2) YUV 4:2:0 coding, and (3) content-adaptive variable-rate coding with a single model. Most learned video codecs operate internally in the RGB domain for P-frame coding, and B-frame coding for YUV 4:2:0 content is largely under-explored. In addition, while there have been prior works on variable-rate coding with conditional convolution, most of them fail to consider the content information. We build our scheme on conditional augmented normalizing flows (CANF). It features conditional motion and inter-frame codecs for efficient B-frame coding. To cope with YUV 4:2:0 content, two conditional inter-frame codecs are used to process the Y and UV components separately, with the coding of the UV components conditioned additionally on the Y component. Moreover, we introduce adaptive feature modulation in every convolutional layer, taking into account both the content information and the coding levels of B-frames to achieve content-adaptive variable-rate coding. Experimental results show that our model outperforms x265 and the winner of last year's challenge on commonly used datasets in terms of PSNR-YUV.
  • 2023
    Transformer-based Image Compression with Variable Image Quality Objectives
    Chia-Hao Kao*, Yi-Hsin Chen*, Cheng Chien, Wei-Chen Chiu, and Wen-Hsiao Peng
    This paper presents a Transformer-based image compression system that allows for a variable image quality objective according to the user's preference. Optimizing a learned codec for different quality objectives leads to reconstructed images with varying visual characteristics. Our method provides the user with the flexibility to choose a trade-off between two image quality objectives using a single, shared model. Motivated by the success of prompt-tuning techniques, we introduce prompt tokens to condition our Transformer-based autoencoder. These prompt tokens are generated adaptively based on the user's preference and input image through learning a prompt generation network. Extensive experiments on commonly used quality metrics demonstrate the effectiveness of our method in adapting the encoding and/or decoding processes to a variable quality objective. While offering the additional flexibility, our proposed method performs comparably to the single-objective methods in terms of rate-distortion performance.
  • 2023
    TransTIC: Transferring Transformer-based Image Compression from Human Visualization to Machine Perception
    Yi-Hsin Chen, Ying-Chieh Weng, Chia-Hao Kao, Cheng Chien, Wei-Chen Chiu, and Wen-Hsiao Peng
    This work aims at transferring a Transformer-based image compression codec from human perception to machine perception without fine-tuning the codec. We propose a transferable Transformer-based image compression framework, termed TransTIC. Inspired by visual prompt tuning, TransTIC adopts an instance-specific prompt generator to inject instance-specific prompts into the encoder and task-specific prompts into the decoder. Extensive experiments show that our proposed method is capable of transferring the base codec to various machine tasks and significantly outperforms the competing methods. To the best of our knowledge, this work is the first attempt to utilize prompting on the low-level image compression task.
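
Code Sketches

The sketches below illustrate, in simplified Python/PyTorch, the core mechanisms described in the publication abstracts above. They are illustrative only: every module name, dimension, and hyperparameter is an assumption made for exposition, not taken from the released implementations.

A first sketch relates to ComNeck (2024): a lightweight transform-neck maps compressed image latents to a token sequence, and a surrogate loss aligns those tokens with a frozen visual encoder's features, so the downstream MLLM is excluded from training.

    import torch.nn as nn

    class TransformNeck(nn.Module):
        # Maps codec latents (B, C, H, W) to a token sequence compatible with the
        # MLLM's frozen visual encoder; kept deliberately small and lightweight.
        def __init__(self, latent_channels=192, embed_dim=768, grid=16):
            super().__init__()
            self.proj = nn.Conv2d(latent_channels, embed_dim, kernel_size=1)
            self.pool = nn.AdaptiveAvgPool2d(grid)              # grid*grid tokens
            self.blocks = nn.Sequential(
                *[nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
                  for _ in range(2)])

        def forward(self, latents):
            x = self.pool(self.proj(latents))                   # (B, D, grid, grid)
            return self.blocks(x.flatten(2).transpose(1, 2))    # (B, grid*grid, D)

    def surrogate_loss(neck_tokens, encoder_tokens):
        # Align neck outputs with the visual encoder's features computed on the
        # uncompressed image, so the MLLM itself never enters the training loop.
        return nn.functional.mse_loss(neck_tokens, encoder_tokens)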
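
Another sketch relates to the learned hierarchical B-frame coding paper (2023): adaptive feature modulation predicts a channel-wise scale and shift from a conditioning vector mixing content information with the B-frame coding level, and applies it after a convolution. The form of the conditioning vector is an assumption.

    import torch.nn as nn

    class AdaptiveFeatureModulation(nn.Module):
        # Channel-wise scale/shift predicted from a conditioning vector that mixes
        # a content descriptor with the B-frame coding level.
        def __init__(self, channels, cond_dim=64):
            super().__init__()
            self.to_scale = nn.Linear(cond_dim, channels)
            self.to_shift = nn.Linear(cond_dim, channels)

        def forward(self, feat, cond):
            # feat: (B, C, H, W); cond: (B, cond_dim)
            scale = self.to_scale(cond)[..., None, None]
            shift = self.to_shift(cond)[..., None, None]
            return feat * (1.0 + scale) + shift

    class ModulatedConv(nn.Module):
        # A convolution followed by modulation, standing in for each convolutional
        # layer of the codec being made content- and level-aware.
        def __init__(self, in_ch, out_ch, cond_dim=64):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
            self.afm = AdaptiveFeatureModulation(out_ch, cond_dim)

        def forward(self, x, cond):
            return self.afm(self.conv(x), cond)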
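
A further sketch relates to the variable-quality-objective compression paper (2023): a small prompt generation network turns the user's preference between two quality objectives into prompt tokens that are prepended to the patch tokens of the Transformer autoencoder. Encoding the preference as a single scalar is an assumption, and the paper's additional conditioning on the input image is omitted here for brevity.

    import torch
    import torch.nn as nn

    class PromptGenerator(nn.Module):
        # Turns the user's preference into prompt tokens that condition the codec.
        def __init__(self, embed_dim=384, num_prompts=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(1, 128), nn.GELU(),
                nn.Linear(128, num_prompts * embed_dim))
            self.num_prompts, self.embed_dim = num_prompts, embed_dim

        def forward(self, preference):
            # preference: (B, 1) scalar in [0, 1] trading off the two objectives
            return self.net(preference).view(-1, self.num_prompts, self.embed_dim)

    def condition(patch_tokens, prompt_tokens):
        # Prepend prompts to the patch tokens before the Transformer blocks.
        return torch.cat([prompt_tokens, patch_tokens], dim=1)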
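
A last sketch relates to TransTIC (2023): the base Transformer codec stays frozen while learnable prompt tokens are injected into its layers, so only the prompt parameters (and, in the paper, a prompt generator) are trained. The generic wrapper below stands in for both the instance-specific encoder-side and task-specific decoder-side prompt paths.

    import torch
    import torch.nn as nn

    class PromptedLayer(nn.Module):
        # Wraps one frozen layer of the base codec and injects learnable prompt
        # tokens; the codec weights themselves are never updated.
        def __init__(self, frozen_layer, embed_dim=384, num_prompts=4):
            super().__init__()
            self.layer = frozen_layer
            for p in self.layer.parameters():
                p.requires_grad = False
            self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))

        def forward(self, tokens):
            prompts = self.prompts.expand(tokens.size(0), -1, -1)
            out = self.layer(torch.cat([prompts, tokens], dim=1))
            return out[:, self.prompts.size(1):]                 # drop prompt slots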