Tsung-Hsien Wen (Shawn)

Dialogue System Group
Department of Engineering
University of Cambridge

Bio

Tsung-Hsien Wen (溫宗憲) is a PhD student in the Dialogue System Group, University of Cambridge, United Kingdom, under the supervision of Professor Steve Young. His research focuses on using deep learning and distributional semantics to solve problems in, and bring domain scalability to, statistical dialogue systems. The application areas he is particularly interested in are statistical natural language generation (NLG), language modeling (LM), and spoken language understanding (SLU). His PhD is supported by Toshiba Research Europe Ltd, Cambridge Research Laboratory. He is a current member of Darwin College.

Positions

2014 - Present

Ph.D. Student

University of Cambridge
2013 - 2014

Compulsory Second Lieutenant Chief Counselor

Republic of China Army
2011 - 2013

Teaching Assistant

National Taiwan University, Digital Speech Processing and Speech Special Project
2011 - 2013

Part-time Algorithm Developer

StorySense Computing, Inc, acquired by 电话帮 in 2014.

Education

Ph.D. Present

Ph.D. student in Engineering

University of Cambridge
M.A. 2013

Master of Science in Engineering

National Taiwan University
B.A. 2011

Bachelor of Science in Engineering

National Taiwan University

Honors, Awards and Grants

2014

Toshiba Research Studentship, Toshiba Research Europe Ltd

3-year studentship funded by Toshiba Research Europe Ltd, Cambridge Research Laboratory, for developing wide domain statistical dialogue systems.
2014

Government Scholarship for Studying Overseas, MOE of Taiwan

1 of 16 selected EECS students based on outstanding academic achievements.
Aug 2013

InterSpeech 2013 Best Student Paper Nominee, ISCA

1 of 12 nominees selected from thousands of accepted papers.
Dec 2012

InterSpeech 2012 Best Student Paper Nominee, ISCA

1 of 10 nominees selected from thousands of accepted papers.
2010

Sir Zong Education Foundation Student Grant, Sir Zong Foundation

Scholarship for outstanding college and high school students.

Research Summary

Recently, significant progress has been made in applying statistical methods to automate the development of Spoken Dialogue Systems (SDS). However, these systems are still restricted to particular application domains and have proved hard to scale, or even to extend to similar domains. One key reason is the ambiguous nature of human language, which makes the three core components (speech recognition, spoken language understanding, and natural language generation) the bottlenecks for scalability.

Deep learning sheds light on these language problems. By implicitly mapping words into distributional, low-dimensional vectors, semantics and syntax can be composed to form complex meanings or used to make sophisticated predictions. Furthermore, neural networks can be trained end-to-end from given examples, which reduces the amount of handcrafting and manual feature engineering in the dialogue system development process. These benefits make scalable SDS possible in the near future.

Interests

Deep Learning
Distributional Semantics
Natural Language Processing
Open-domain Dialogue Systems
Language Generation
Language Modeling

Research Projects

Neural Network for Language Generation

Stochastic Language Generation using Neural Networks

To appear.
Personalised Language Modeling

Personalising language models using social network crowdsourcing

Designed a crowdsourcing platform to collect personal corpora from social networks.

Built personalized language models by adopting social properties.

Compared personalization capabilities of N-gram and Recurrent Neural Network LMs.
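
As a rough illustration of the personalisation idea above (not the method used in this project), the sketch below linearly interpolates a small "personal" unigram model with a "background" one and compares perplexity on held-out personal text. The toy corpora, the add-one smoothing, and the interpolation weight `lam` are all assumptions made for the example.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(personal, background, lam):
    """Linear interpolation: lam * personal + (1 - lam) * background."""
    return {w: lam * personal[w] + (1 - lam) * background[w] for w in personal}


def perplexity(model, tokens):
    """Perplexity is exp of the average negative log-probability per token."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Toy corpora, made up purely for illustration.
background_text = "the cat sat on the mat the dog sat on the rug".split()
personal_text = "my dog loves the park my dog loves the beach".split()
held_out = "my dog loves the rug".split()

vocab = set(background_text) | set(personal_text) | set(held_out)
background = unigram_model(background_text, vocab)
personal = unigram_model(personal_text, vocab)
mixed = interpolate(personal, background, lam=0.5)

ppl_background = perplexity(background, held_out)
ppl_mixed = perplexity(mixed, held_out)
print(f"background only: {ppl_background:.2f}, interpolated: {ppl_mixed:.2f}")
```

On this toy data the interpolated model assigns lower perplexity to the held-out personal text than the background model alone, which is the kind of effect a personalised language model aims for.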
Interactive Retrieval

Interactive retrieval system for spoken content

Cast interactive retrieval problems as an MDP decision framework.

Developed and compared various MDP models and reinforcement learning methods.

Implemented a state (retrieval quality) estimator to project retrieval indicators to state.
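
To make the MDP framing above concrete, here is a minimal sketch under invented assumptions: three quantized retrieval-quality states, two actions (`show` the results or `ask` the user a clarifying question), deterministic transitions, and made-up rewards. It solves the toy problem by value iteration with a known model, which is a simplification of the reinforcement learning setting described above and is not the project's actual model.

```python
GAMMA = 0.95                     # discount factor (assumed)
SHOW_REWARD = [0.2, 0.5, 0.9]    # reward for showing results in each quality state (assumed)
ASK_COST = 0.1                   # burden placed on the user by one more question (assumed)
N_STATES = len(SHOW_REWARD)


def q_values(state, values):
    """Action values for one state given current state-value estimates."""
    show = SHOW_REWARD[state]                  # showing results ends the episode
    next_state = min(state + 1, N_STATES - 1)  # asking is assumed to raise quality by one level
    ask = -ASK_COST + GAMMA * values[next_state]
    return {"show": show, "ask": ask}


def value_iteration(tol=1e-9):
    """Repeat Bellman backups until the state values stop changing."""
    values = [0.0] * N_STATES
    while True:
        new_values = [max(q_values(s, values).values()) for s in range(N_STATES)]
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values


values = value_iteration()
policy = []
for s in range(N_STATES):
    q = q_values(s, values)
    policy.append(max(q, key=q.get))  # greedy action in each state

print(policy)
```

With these numbers the greedy policy keeps asking while the estimated retrieval quality is low or medium and shows the results once it is high; changing `ASK_COST` shifts that trade-off.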

Publications

First-author papers only; for a full list of publications, please see my Google Scholar page.

Recurrent Neural Network Based Language Model Personalization by Social Network Crowdsourcing

Tsung-Hsien Wen, Aaron Heidel, Hung-yi Lee, Yu Tsao and Lin-Shan Lee

Conference Papers: In Proceedings of InterSpeech, Lyon, France, August 2013

Abstract

Speech recognition has become an important feature in smartphones in recent years. Different from traditional automatic speech recognition, the speech recognition on smartphones can take advantage of personalized language models to model the linguistic patterns and wording habits of a particular smartphone owner better. Owing to the popularity of social networks in recent years, personal texts and messages are no longer inaccessible. However, data sparseness is still an unsolved problem. In this paper, we propose a three-step adaptation approach to personalize recurrent neural network language models (RNNLMs). We believe that its capability to model word histories as distributed representations of arbitrary length can help mitigate the data sparseness problem. Furthermore, we also propose additional user-oriented features to empower the RNNLMs with stronger capabilities for personalization. The experiments on a Facebook dataset showed that the proposed method not only drastically reduced the model perplexity in preliminary experiments, but also moderately reduced the word error rate in n-best rescoring tests.

Interactive Spoken Content Retrieval by Extended Query Model and Continuous State Space Markov Decision Process

Tsung-Hsien Wen, Hung-yi Lee, Pei-hao Su, and Lin-Shan Lee

Conference Papers: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada, May 2013

Abstract

Interactive retrieval is important for spoken content because, in addition to the speech recognition uncertainty, the retrieved spoken items are difficult not only to show on the screen but also for the user to scan and select. The user cannot play back and go through all the retrieved items to find what he is looking for. A Markov Decision Process (MDP) was used in previous work to help the system take different actions to interact with the user based on an estimated retrieval performance, but the MDP state was represented by a less precise, quantized retrieval performance metric. In this paper, we consider the retrieval performance metric as a continuous state variable in the MDP and optimize the MDP by fitted value iteration (FVI). We also use query expansion with the language modeling retrieval framework to produce the next set of retrieval results. Improved performance was found in the preliminary experiments.

Personalized Language Modeling by Crowd Sourcing with Social Network Data for Voice Access of Cloud Applications

Tsung-Hsien Wen, Hung-yi Lee, Tai-Yuan Chen, and Lin-Shan Lee

Conference Papers: IEEE Workshop on Spoken Language Technology (SLT), Miami, Florida, December 2012

Abstract

Voice access of cloud applications via smartphones is very attractive today, specifically because a smartphone is used by a single user, so personalized acoustic/language models become feasible. In particular, because huge quantities of texts are available within the social networks over the Internet with known authors and given relationships, it is possible to train personalized language models, as it is reasonable to assume users with those relationships may share some common subject topics, wording habits and linguistic patterns. In this paper, we propose an adaptation framework for building a robust personalized language model by incorporating the texts the target user and other users had posted on the social networks over the Internet to take care of the linguistic mismatch across different users. Experiments on a Facebook dataset showed encouraging improvements in terms of both model perplexity and recognition accuracy with the proposed approaches considering relationships among users, similarity based on latent topics, and random walk over a user graph.

Interactive Spoken Content Retrieval with Different Types of Action Optimized by a Markov Decision Process

Tsung-Hsien Wen, Hung-yi Lee, and Lin-Shan Lee

Conference Papers: In Proceedings of InterSpeech, Portland OR, USA, September 2012

Abstract

Interaction with the user is especially important for spoken content retrieval, not only because of the recognition uncertainty, but also because the retrieved spoken content items are difficult to show on the screen and difficult for the user to scan and select. The user cannot play back and go through all the retrieved items, only to find out they are not what he is looking for. In this paper, we propose a new approach for interactive spoken content retrieval, in which the system can estimate the quality of the retrieved results, and take different types of actions to clarify the user’s intention based on an intrinsic policy. The policy is optimized by a Markov Decision Process (MDP) trained with Reinforcement Learning based on a set of pre-defined rewards considering the extra burden given to the user.

Voice Access of Cloud Applications : Language Model Personalization and Interactive Spoken Content Retrieval

Tsung-Hsien Wen

Thesis

Abstract

This thesis considers voice access of cloud applications in two parts: (1) personalized language modeling and (2) interactive spoken document retrieval. Model mismatch has been a major problem in speech recognition. With hand-held devices widely used today, personalized models become possible. As huge quantities of posts and comments with known owners have emerged on social network websites, personal corpora have become practically available, but the data sparseness problem remains unsolved. In the first part of this thesis, we propose personalized language modeling approaches that estimate the language similarities between different social network users and integrate the corresponding personal corpora accordingly. We studied both N-gram language models and recurrent neural network language models, and the experimental results support the concept. In the second part of this thesis, we studied interactive spoken document retrieval. Interactive retrieval is helpful to spoken content retrieval because retrieved spoken items are difficult to show on screen and to browse, in addition to the speech recognition uncertainty. We model the interaction process as a Markov Decision Process and train the policy with Reinforcement Learning. Experimental results demonstrate that retrieval performance can be improved with the interactions.

Contact Me

lab: +44 1223 332654
thw28@cam.ac.uk
tsung-hsien.shawn.wen
Tsung-Hsien Wen
