Divide and Conquer Methods for Big Data
Inderjit S. Dhillon
The University of Texas at Austin

Data is being generated at a tremendous rate in modern applications as diverse as internet services, genomics, health care, energy management and social network analysis, and there is a great need for scalable methods to analyze these data sets. In this talk, I will present some new divide-and-conquer algorithms for various challenging problems in large-scale data analysis. Divide-and-conquer is a paradigm widely used in computer science and scientific computing, for example in sorting, in the scalable computation of n-body interactions via the fast multipole method, and in eigenvalue computations for symmetric matrices. However, this paradigm has not been widely employed in problems that arise in machine learning. I will introduce some recent divide-and-conquer methods that we have developed for three representative problems: (i) classification using kernel support vector machines, (ii) dimensionality reduction for large-scale social network analysis, and (iii) structure learning of graphical models. For each of these problems, we develop specialized algorithms; in particular, tailored ways of "dividing" the problem into subproblems, solving the subproblems, and finally "conquering" by combining the subproblem solutions. It should be noted that the subproblem solutions yield localized models for analyzing the data; an intriguing question is whether the hierarchy of localized models can be combined to yield models that are not only easier to compute, but also statistically more robust. This is joint work with Cho-Jui Hsieh, Donghyuk Shin and Si Si.

Brief Biography:
Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. His main research interests are in big data, machine learning, network analysis, linear algebra and optimization. He received his B.Tech. degree from IIT Bombay, and his Ph.D. from UC Berkeley. Inderjit is an IEEE Fellow as well as a SIAM Fellow. Additionally, he has received several prestigious awards, including the ICES Distinguished Research Award in 2013, the SIAM Outstanding Paper Prize in 2011, the Moncrief Grand Challenge Award in 2010, the SIAM Linear Algebra Prize in 2006, the University Research Excellence Award in 2005, and the NSF CAREER Award in 2001. He has published over 100 journal and conference papers, and has served on the editorial boards of the Journal of Machine Learning Research, the IEEE Transactions on Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning, and the SIAM Journal on Matrix Analysis and Applications.

Deep Learning: Overview and Trends
Andrew Ng (videoconference)

Deep learning is the leading approach to many problems in computer vision, speech recognition, NLP, and other areas. In this presentation, I will give a broad overview of deep learning. I will discuss the key reasons for its success, and the important role that scalability plays. I will also describe unsupervised learning approaches to deep learning, such as the "Google cat" result, in which a neural network learned to recognize cats by watching unlabeled YouTube videos, and discuss why such approaches might become increasingly important. Finally, I will discuss recent trends in deep learning, and some possible future applications.

Brief Biography:
Andrew Ng is Chief Scientist of Baidu; Chairman and Co-founder of Coursera; and an Associate Professor of Computer Science at Stanford University. In 2011 he led the development of Stanford University's main MOOC (Massive Open Online Course) platform and also taught an online Machine Learning class to over 100,000 students, leading to the founding of Coursera. Ng's goal is to give everyone access to a great education, for free. Today, Coursera partners with top universities to offer online courses, and with over 9 million students it is the world's largest MOOC platform. Ng also works on machine learning, with an emphasis on deep learning. He founded and led the "Google Brain" project, which developed massive-scale deep learning algorithms. This resulted in the "Google cat" result, in which a massive neural network with 1 billion parameters learned to detect cats from unlabeled YouTube videos. More recently, he has been working to build up Baidu Research, which is developing applications of large-scale deep learning to computer vision, speech, NLP, and other areas. Recent honors include being named to the Time 100 list of the most influential people in the world, the Fortune 40 Under 40 list, and being named by students one of the top 10 professors across Stanford University.