Telescopic Visual Storytelling
Distilled representative terms from a sequence of images and proposed a scoring function to find appropriate relations between the terms in a knowledge graph. Developed a length-controlled Transformer to generate stories of varying lengths. Human evaluation showed that, compared with the state of the art, our model maintains better focus and detail as stories grow longer.
[Work in Progress]

Conversational Visual Question Generation
Explored a novel scenario: a conversation agent views a set of the user's photos and asks an engaging question to initiate a conversation with the user. Introduced a two-phase framework that first generates a visual story for the photo set and then uses the story to produce an interesting question. The human evaluation shows that our framework generates more response-provoking questions for starting conversations than other vision-to-question baselines.
[Paper Link]

Multi-modal Dialog System
Proposed a multi-step joint-modality attention network, based on a recurrent neural network, to reason over multiple modalities, including audio, vision, and language. The model jointly considers visual and textual representations at each reasoning step to better integrate information from dynamic scenes.
[Paper Link]

Multiview Items Recommendation
Developed a GNN-based recommendation model that improves recommendations by describing items from both the user view and the entity view. Designed user-oriented modules that aggregate features to make personalized recommendations, and a mixing layer that contrasts layer-wise GCN information to obtain comprehensive features from internal entity-entity interactions.
[Paper Link]

Stage-Wise Training for GNN-based Recommender Model
Applied stage-wise training to two state-of-the-art recommendation models, RippleNet and Knowledge Graph Convolutional Networks (KGCN), and evaluated their performance on six real-world datasets. The experiments showed that the stage-wise training strategy helps both models collect more information from the knowledge graph (KG) and improves recommendation performance.
[Paper Link]

Luminance Variation Resistant Remote-PPG
Collected a dataset of drivers’ faces (2.7M continuous images) in varied outdoor scenarios, including daytime and nighttime. Developed an Adaptive Neural Network Model Selection algorithm that dynamically selects a personalized model and removes facial luminance-variation noise from the rPPG signal. This work reduced the mean absolute error from 14.71 bpm to 4.51 bpm.
[Paper Link] [Demo Video]
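
For readers unfamiliar with the metric, the mean absolute error (MAE) reported above is the average absolute difference between estimated and reference heart rates. The snippet below is a minimal sketch of that calculation, not the project's evaluation code: the function name and the sample readings are hypothetical, and only the 14.71 bpm and 4.51 bpm figures come from the description above.

```python
from statistics import fmean


def mean_absolute_error(estimated_bpm, reference_bpm):
    """Average absolute difference between two equal-length bpm sequences."""
    if len(estimated_bpm) != len(reference_bpm):
        raise ValueError("sequences must have the same length")
    if not estimated_bpm:
        raise ValueError("sequences must not be empty")
    return fmean(abs(e - r) for e, r in zip(estimated_bpm, reference_bpm))


# Hypothetical readings for illustration only (not data from the project).
estimated = [72.0, 80.5, 65.0, 90.0]
reference = [70.0, 78.0, 69.0, 88.5]
example_mae = mean_absolute_error(estimated, reference)

# Relative error reduction implied by the reported figures (14.71 -> 4.51 bpm).
relative_reduction = (14.71 - 4.51) / 14.71
print(f"example MAE: {example_mae:.2f} bpm")
print(f"relative reduction: {relative_reduction:.1%}")
```

By this arithmetic, going from 14.71 bpm to 4.51 bpm is a relative error reduction of roughly 69%.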

Motion Robust Remote-PPG
Built a face-tracking algorithm to extract the heart rate signal from a driver’s face in a continuous image sequence. Developed a machine learning approach to eliminate rPPG noise caused by the driver's facial motion. This work is the first of its kind, as traditional rPPG work considers only stable indoor environments.
[Paper Link]