Event-specific Image Importance
When creating a photo album of an event, people typically select a few important images to keep or share. Modeling this selection process will assist automatic photo selection and album summarization. In this project, we show that the selection of important images is consistent among different viewers, and that this selection process is related to the event type of the album. We introduce the concept of event-specific image importance. We propose a Convolutional Neural Network (CNN) based method to predict the importance scores of the images in a given event album, using a novel rank loss function and a progressive training scheme. Results demonstrate that our method significantly outperforms various baseline methods. We also introduce the CUration of Flickr Events Dataset (CUFED) for the study of event-specific image importance. For the dataset, please visit the project homepage.
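To illustrate the general idea of training on relative importance rather than absolute scores, here is a minimal sketch of a generic pairwise hinge ranking loss. It is not the novel rank loss proposed in the project; the function name `pairwise_rank_loss`, the `margin` parameter, and the treatment of ties are assumptions made only for this example.

```python
from itertools import combinations

def pairwise_rank_loss(pred, target, margin=1.0):
    """Mean hinge loss over image pairs whose ground-truth scores differ.

    pred:   predicted importance scores for the images of one album
    target: ground-truth importance scores for the same images
    margin: hypothetical margin by which the preferred image should win
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # ties carry no preferred order, so they are skipped
        # sign is +1 when image i should rank above image j, else -1
        sign = 1.0 if target[i] > target[j] else -1.0
        total += max(0.0, margin - sign * (pred[i] - pred[j]))
        count += 1
    return total / count if count else 0.0
```

A pair contributes zero loss once the more important image is scored at least `margin` higher than the less important one, so the loss depends only on the ordering within an album and not on the absolute score values.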
Urban Tribes Recognition
Recognition of people's social styles is an interesting but relatively unexplored task. Recognizing "style" appears to be quite a different problem from categorization; it is like recognizing a letter's font as opposed to recognizing the letter itself. We solved this problem with features extracted from a deep convolutional network pre-trained on ImageNet (Caffe). Combining the results from individuals in group pictures and the group itself, with some fine-tuning of the network, we reduce the previous state-of-the-art error by almost half, going from a 46% recognition rate to 71%. To explore how the networks perform this task, we compute the mutual information between the ImageNet output category activations and the urban tribe categories, and find, for example, that bikers are well-categorized as tobacco shops, and that better-recognized social groups have more highly correlated ImageNet categories. This gives us insight into the features useful for categorizing urban tribes.
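The mutual information analysis can be sketched as follows for the discrete case. This is only an illustration under an assumption not stated above: each image's ImageNet activation is first reduced to a discrete value (for instance, whether a given category fired), so that a plain count-based estimate applies. The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences.

    xs: discretized ImageNet category responses, one per image (assumption)
    ys: urban tribe labels for the same images
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

A category whose response perfectly separates two equally frequent tribes yields 1 bit, while a category that responds independently of the tribe label yields 0 bits.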
Real-time Hand Posture Recognition with Kinect
Hand posture recognition (HPR) is a challenging task, due both to the difficulty of detecting and tracking hands with ordinary cameras and to the limitations of traditional manually selected features. We proposed a two-stage HPR system for sign language recognition using a Kinect sensor. I mainly worked on the first stage, hand detection and tracking, for which we proposed an effective algorithm. The algorithm incorporates both color and depth information and does not require a uniformly colored or stable background. It handles situations in which hands are very close to other parts of the body or are not the nearest objects to the camera, and it tolerates occlusion of hands by faces or other hands. In the second stage, we apply deep neural networks to automatically learn features from hand posture images that are insensitive to movement, scaling, and rotation. The recognition rate on a 36-posture dataset is 98.12%.
Supervising: Object Classification using a Turtlebot
During the summer of 2014, I supervised Kevin Xiong and Evan Phibbs in using the Turtlebot, a robot running on ROS, for object recognition.
The robot can be placed anywhere in the room, and it will move forward until it finds the first object. Once the target object is localized, the Turtlebot circles it to take pictures from different views and recognizes it as belonging to either a new or an existing class. A convolutional neural network is used for feature extraction, and an SVM is used for object recognition and new object learning.
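The decision between a new and an existing class can be illustrated with a simplified stand-in. The sketch below does not use an SVM; it swaps in a nearest-centroid rule with a distance threshold purely to show the control flow, and it assumes that a feature vector has already been extracted by a CNN. The names `classify_or_add`, `centroids`, and `threshold` are hypothetical.

```python
import math

def classify_or_add(feature, centroids, threshold):
    """Return (label, is_new) for a feature vector.

    feature:   list of floats, e.g. a CNN feature (extraction not shown)
    centroids: dict mapping known class label -> representative feature
    threshold: hypothetical maximum distance for matching a known class
    """
    best_label, best_dist = None, math.inf
    for label, center in centroids.items():
        dist = math.dist(feature, center)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_label is not None and best_dist <= threshold:
        return best_label, False  # close enough to an existing class
    # Otherwise register the object as a new class
    new_label = f"object_{len(centroids)}"
    centroids[new_label] = list(feature)
    return new_label, True
```

For example, starting from an empty `centroids` dictionary, the first observed object is always registered as a new class, and later views of it that fall within `threshold` are matched to that class.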
- (Under Review) Wang, Y., Zhe, L., Shen, X., Cohen, S., Cottrell, G. W., "Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition", CVPR, 2017. (Supplementary)
- Wang, Y., Zhe, L., Shen, X., Měch, R., Miller, G., Cottrell, G. W., "Event-specific Image Importance.", CVPR, 2016.
- Rao, S., Wang, Y., Cottrell, G. W., "A Deep Siamese Neural Network Learns the Human-Perceived Similarity Structure of Facial Expressions Without Explicit Categories.", CogSci, 2016.
- Wang, Y., and Cottrell, G. W., "Bikers are like tobacco shops, formal dressers are like suits: Recognizing Urban Tribes with Caffe." IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.
- Tang, A., Lu, K., Wang, Y., Huang, J., and Li, H., "A Real-time Hand Posture Recognition System Using Deep Neural Networks." ACM Trans. Intell. Syst. Technol. 9, 4, Article 39.