Yufei Wang


About Me

My name is Yufei Wang. I'm a fourth-year Ph.D. student in the ECE Department at UC San Diego.

I work in Gary's Unbelievable Research Unit. My research interests are machine learning, computer vision, and deep learning. Currently, I'm working on image captioning with deep neural networks.

Before UC San Diego, I completed my undergraduate degree at the School for the Gifted Young at the University of Science and Technology of China (USTC), majoring in Electronic Engineering and Information Science. During my senior year, I worked on a sign language recognition project in Professor Houqiang Li's lab.

See my CV here.


Research Projects

Event-specific Image Importance

When creating a photo album of an event, people typically select a few important images to keep or share. Modeling this selection process can assist automatic photo selection and album summarization. In this project, we show that the selection of important images is consistent among different viewers, and that this selection process is related to the event type of the album. We introduce the concept of event-specific image importance, and propose a Convolutional Neural Network (CNN) based method to predict image importance scores for a given event album, using a novel rank loss function and a progressive training scheme. Results demonstrate that our method significantly outperforms various baseline methods. We also introduce the CUration of Flickr Events Dataset (CUFED) for the study of event-specific image importance. For the dataset, please visit the project homepage.
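The idea behind a rank loss is to compare pairs of images within the same album rather than regress each score independently. As a rough illustration only, here is a margin-based pairwise formulation; the actual loss function and training scheme used in the project may differ:

```python
# A minimal sketch of a pairwise ranking loss, assuming a hinge/margin
# formulation. This is illustrative, not the loss from the paper.
def pairwise_rank_loss(scores, importance, margin=0.1):
    """For each image pair (i, j) in an album where image i has higher
    ground-truth importance than image j, penalize the predicted scores
    if they are not separated by at least `margin`."""
    loss, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if importance[i] > importance[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
                n_pairs += 1
    return loss / max(n_pairs, 1)

# Predicted scores that respect the ground-truth ordering with enough
# margin incur zero loss.
print(pairwise_rank_loss([0.9, 0.5, 0.1], [1.0, 0.6, 0.2]))  # 0.0
```

A pairwise loss of this kind only cares about the relative ordering of images within an album, which matches the selection task: which photos to keep, not an absolute quality number.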

Urban Tribes Recognition

Recognition of people's social styles is an interesting but relatively unexplored task. Recognizing "style" appears to be quite a different problem from categorization; it is like recognizing a letter's font as opposed to recognizing the letter itself. We tackled this problem with features extracted from a deep convolutional network pre-trained on ImageNet (using Caffe). By combining the results from individuals in group pictures with those from the group as a whole, and fine-tuning the network, we reduced the previous state-of-the-art error by almost half, raising the recognition rate from 46% to 71%. To explore how the networks perform this task, we computed the mutual information between the ImageNet output category activations and the urban tribe categories, and found, for example, that bikers are well-categorized as tobacco shops, and that better-recognized social groups have more highly correlated ImageNet categories. This gives us insight into the features useful for categorizing urban tribes.
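The mutual-information analysis above can be sketched with a standard discrete estimator. Everything here is illustrative: the binarization of activations and the toy data are assumptions, not the project's actual setup.

```python
import math
from collections import Counter

# Hedged sketch: estimate mutual information (in bits) between a
# discretized ImageNet category activation (e.g. "tobacco shop" fires /
# does not fire) and the urban-tribe label, from paired observations.
def mutual_information(xs, ys):
    """MI between two paired discrete sequences, via empirical counts."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Perfectly correlated activation/label pairs give MI = 1 bit here
# (toy data; labels are illustrative).
fires = [1, 1, 0, 0]
tribe = ["biker", "biker", "hipster", "hipster"]
print(mutual_information(fires, tribe))  # 1.0
```

A high MI between an ImageNet category and a tribe label is exactly the kind of signal that surfaces pairings like bikers/tobacco shops.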

Real-time Hand Posture Recognition with Kinect

Hand posture recognition (HPR) is quite a challenging task, due both to the difficulty of detecting and tracking hands with normal cameras and to the limitations of traditional manually selected features. We proposed a two-stage HPR system for sign language recognition using a Kinect sensor. I mainly worked on the first stage, hand detection and tracking. Our algorithm incorporates both color and depth information, without requiring a uniform-colored or stable background. It can handle situations in which hands are very close to other parts of the body or are not the nearest objects to the camera, and it allows for occlusion of hands by faces or other hands. In the second stage, we apply deep neural networks to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. The recognition rate on a 36-posture dataset is 98.12%.
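The color-plus-depth fusion can be sketched as a per-pixel mask. This is a deliberately simplified stand-in, assuming a precomputed skin-probability map and a tracked hand depth; the thresholds, names, and the project's actual algorithm are all more elaborate than this:

```python
import numpy as np

# Hedged sketch: keep pixels that look skin-colored AND lie within a
# depth window around the tracked hand depth (millimeters). Parameter
# values are illustrative placeholders.
def hand_mask(skin_prob, depth, depth_center, depth_window=150, skin_thresh=0.5):
    color_mask = skin_prob > skin_thresh            # color cue
    depth_mask = np.abs(depth.astype(np.int32) - depth_center) < depth_window
    return color_mask & depth_mask                  # both cues must agree

# Toy 2x2 example: only pixels passing both cues remain True.
skin = np.array([[0.9, 0.2], [0.8, 0.9]])
depth = np.array([[800, 810], [1500, 790]], dtype=np.uint16)
print(hand_mask(skin, depth, depth_center=800))
```

Requiring both cues is what lets the detector cope with skin-colored background clutter (rejected by depth) and with non-hand objects at hand depth (rejected by color).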

Supervising: Object Classification using a Turtlebot

During the summer of 2014, I supervised Kevin Xiong and Evan Phibbs in using the Turtlebot, a robot running on ROS, for object recognition.

The robot can be placed anywhere in the room and will move forward until it finds the first object. Once the target is localized, the Turtlebot circles the object to take pictures from different views and recognizes the object as belonging to a new or an existing class. A convolutional neural network is used for feature extraction, and an SVM is used for object recognition and new-object learning.
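The recognition stage can be sketched as features-in, label-out with per-view voting. Synthetic Gaussian vectors stand in for CNN features here (no network is bundled), and scikit-learn's LinearSVC stands in for the SVM; all class names and numbers are illustrative:

```python
from collections import Counter
import numpy as np
from sklearn.svm import LinearSVC

# Hedged sketch: "CNN features" from multiple views of an object are
# classified with a linear SVM, and per-view predictions are combined by
# majority vote. Features are synthetic stand-ins.
rng = np.random.default_rng(0)
D = 128  # stand-in for a high-dimensional CNN feature

# Synthetic training features for two known object classes, 10 views each.
features = np.vstack([rng.normal(2.0, 1.0, (10, D)),    # class "mug"
                      rng.normal(-2.0, 1.0, (10, D))])  # class "book"
labels = ["mug"] * 10 + ["book"] * 10
clf = LinearSVC().fit(features, labels)

# New views of a mug-like object: classify each view, then vote.
views = rng.normal(2.0, 1.0, (5, D))
votes = clf.predict(views)
prediction = Counter(votes).most_common(1)[0][0]
print(prediction)  # "mug"
```

Voting across views is what the circling behavior buys: a single bad viewpoint is outvoted by the others. Deciding "new class" could then hinge on low classifier confidence across all views, though that thresholding is not shown here.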

Class Projects


Get in touch

Contact Information

Office: 4154 CSE Building
E-mail: yuw176 at-sign ucsd dot edu


Social Media
