Keynotes
Deep Learning Theory and Applications in the Natural Sciences
Pierre Baldi
University of California Irvine, USA
Abstract: The process of learning is essential for building
natural or artificial intelligent systems. Thus, not surprisingly, machine learning
is at the center of artificial intelligence today. And deep learning--essentially
learning in complex systems composed of multiple processing stages--is at the forefront
of machine learning. In the last few years, deep learning has led to major performance
advances in a variety of engineering disciplines, from computer vision to speech recognition,
natural language processing, and robotics. Deep learning systems are now deployed
ubiquitously and used by billions of people every day, for instance through cell phones and web
search engines.
We will first provide a brief historical overview of artificial neural networks and deep learning,
starting from their early origins in the 1940s and their connections to biological neural networks
and learning, and ending with examples of some of the most recent successes in engineering applications.
While we do not yet have a comprehensive theory of deep learning, we will also provide a brief overview
of a growing body of theoretical results about deep learning, highlighting some of the remaining gaps
and open questions in the field. We will then present various applications of deep learning to problems
in the natural sciences, such as the detection of exotic particles in high-energy physics, the prediction
of molecular properties and reactions in chemistry, and the prediction of protein structures in biology.
Bio: Pierre Baldi earned MS degrees in Mathematics and Psychology from the University of
Paris, and a PhD in Mathematics from the California Institute of Technology. He is currently Chancellor's
Professor in the Department of Computer Science, Director of the Institute for Genomics and Bioinformatics,
and Associate Director of the Center for Machine Learning and Intelligent Systems at the University of
California Irvine. The long-term focus of his research is on understanding intelligence in brains and
machines. He has made several contributions to the theory of deep learning, and developed and applied
deep learning methods for problems in the natural sciences such as the detection of exotic particles in
physics, the prediction of reactions in chemistry, and the prediction of protein secondary and tertiary
structure in biology. He has written four books and over 300 peer-reviewed articles. He is the recipient
of the 1993 Lew Allen Award at JPL, the 2010 E. R. Caianiello Prize for research in machine learning, and
a 2014 Google Faculty Research Award. He is an elected Fellow of the AAAS, AAAI, IEEE, ACM, and ISCB.
Ten Years of Deep Learning and What Lies Ahead for AI Breakthrough
Xiaodong He
Microsoft Research, Redmond, USA
Abstract: Deep learning, which exploits multiple levels of data representation that give rise to hierarchies of concept abstraction, has been the driving force behind the recent resurgence of Artificial Intelligence (AI). While the past four years have seen quantum leaps in a wide range of everyday AI applications, the underlying deep learning technology had been brewing for much longer, notably since the birth about ten years ago of a generative deep learning model called Deep Belief Networks (Hinton et al., 2006).
I will first reflect on how generative Deep Belief Networks rapidly evolved into discriminative Deep Neural Networks, which have profoundly reshaped the landscape of speech recognition since 2010 and of image recognition since 2012. I will then summarize subsequent rapid advances in other cognitive AI applications, including comprehension, reasoning, and generation across vision and natural language, as demonstrated in the Microsoft Vision/Image Captioning Cognitive Service and the recent AI Bot (http://CaptionBot.ai).
Next, I will analyze some key limitations of current deep learning technology, including the lack of structure-to-structure transformation and of learning that bridges symbolic and neural processing, both of which are required to achieve human-like deep understanding of and reasoning in natural language with embedded common-sense knowledge. Finally, I will discuss the outlook for future AI breakthroughs by looking into potential solutions to the current weaknesses of deep learning, and by examining the similarities and differences between existing deep learning-based AI systems and their neuroscientific and cognitive counterparts.
Bio: Xiaodong He is a Principal Researcher in the Deep Learning Technology Center of Microsoft
Research, Redmond, WA, USA. He is also an Affiliate Professor in the Department of Electrical Engineering at the
University of Washington (Seattle), where he serves on doctoral supervisory committees. His research interests are mainly in
artificial intelligence areas including deep learning, natural language, computer vision, speech, information retrieval,
and knowledge representation. He has published more than 100 papers in ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, NIPS,
ICLR, ICASSP, Proc. IEEE, IEEE TASLP, IEEE SPM, and other venues. He received several awards including the Outstanding
Paper Award at ACL 2015. He led the development of the MSR-NRC-SRI entry and the MSR entry that won first place
in the 2008 NIST Machine Translation Evaluation and the 2011 IWSLT Evaluation (Chinese-to-English), respectively. More
recently, he and colleagues developed the MSR image captioning system that achieved the highest score in the Turing test
and tied with Google for first prize at the 2015 COCO Captioning Challenge. His work was reported by Communications
of the ACM in January 2016. He leads the image captioning effort that is now part of Microsoft Cognitive Services and
CaptionBot.ai. The work was widely covered by media outlets including Business Insider, TechCrunch, Forbes, The Washington Post,
CNN, and the BBC. He has held editorial positions at several IEEE journals, served as an area chair for NAACL-HLT 2015, and served
in the organizing committee/program committee of major speech and language processing conferences. He is an elected member
of the IEEE SLTC for the term of 2015-2017. He is a senior member of IEEE and a member of ACL. He was elected as the Chair
of the IEEE Seattle Section in 2016.
Comprehensive Human State Modeling and Its Applications
Ajay Divakaran
SRI International, USA
Abstract: We present a suite of multimodal techniques for the assessment of human behavior with cameras and microphones. These techniques drive the sensing module of an interactive simulation trainer in which the trainee has lifelike interaction with a virtual character so as to learn social interaction. We recognize facial expressions, gaze behaviors, gestures, postures, speech, and paralinguistics in real time, and transmit the results to the simulation environment, which reacts to the trainee's behavior in a manner that serves the overall pedagogical purpose.
We will describe the techniques developed and the results obtained for each of the behavioral cues, comparable to or better than the state of the art, and identify avenues for further research. Behavior sensing in social interactions poses a few key challenges for each of the cues, including the large number of possible behaviors, the high variability in execution of the same behavior within and across individuals, and the need for real-time execution. Furthermore, we face the challenge of appropriately fusing the multimodal cues so as to arrive at a comprehensive assessment of the behavior at multiple time scales.
We will also discuss our approach to social interaction modeling, which uses our sensing capability to monitor and model dyadic interactions. We will conclude with a video demonstration of the end-to-end simulation trainer.
Bio: Ajay Divakaran, Ph.D., is a Program Director and leads the Vision and Multi-Sensor group in SRI
International's Vision and Learning Laboratory. Divakaran is currently the principal investigator for a number of SRI
research projects. His work includes multimodal modeling and analysis of affective, cognitive, and physiological aspects
of human behavior, interactive virtual reality-based training, tracking of individuals in dense crowds and multi-camera
tracking, technology for automatic food identification and volume estimation, and audio analysis for event detection in
open-source video. He has developed several innovative technologies for multimodal systems in both commercial and
government programs during the course of his career. Prior to joining SRI in 2008, Divakaran worked at Mitsubishi Electric
Research Labs for 10 years, where he was the lead inventor of the world's first sports highlights playback-enabled DVR.
He also oversaw a wide variety of product applications for machine learning. Divakaran was named a Fellow of the IEEE in
2011 for his contributions to multimedia content analysis. He developed techniques for recognition of agitated speech for
his work on automatic sports highlights extraction from broadcast sports video. He established a sound experimental and
theoretical framework for human perception of action in video sequences as lead inventor of the motion activity
descriptor in the MPEG-7 video standard. He serves on the technical program committees of key multimedia conferences, and served as an associate
editor of IEEE Transactions on Multimedia from 2007 to 2010. He has authored two books and has more than 100 publications
to his credit, as well as more than 40 issued patents. He was a research associate in the ECE Department at the
Indian Institute of Science (IISc) from September 1994 to February 1995, and a scientist with Iterated Systems Incorporated, Atlanta, GA, from 1995 to 1998. Divakaran received
his M.S. and Ph.D. degrees in electrical engineering from Rensselaer Polytechnic Institute. His B.E. in electronics and
communication engineering is from the University of Jodhpur in India.

