Divide and Conquer Methods for Big Data
Inderjit S. Dhillon
The University of Texas at Austin
Data is being generated at a tremendous rate in modern applications as diverse
as internet services, genomics, health care, energy management and social
network analysis, creating a great need for scalable methods to analyze these
data sets. In this talk, I will present some new Divide-and-Conquer
algorithms for various challenging problems in large-scale data analysis.
Divide-and-Conquer is a paradigm widely used in computer science and
scientific computing, for example in sorting, scalable computation of n-body
interactions via the fast multipole method, and eigenvalue computations of
symmetric matrices. However, it has not been widely employed in problems
that arise in machine learning. I will introduce some
recent divide-and-conquer methods that we have developed for three representative
problems: (i) classification using kernel support vector machines,
(ii) dimensionality reduction for large-scale social network analysis, and
(iii) structure learning of graphical models. For each of these problems, we
develop specialized algorithms, in particular, tailored ways of "dividing" the
problem into subproblems, solving the subproblems, and finally "conquering" them.
Notably, the subproblem solutions yield localized models for
analyzing the data; an intriguing question is whether the hierarchy of
localized models can be combined to yield models that are not only easier to
compute, but are also statistically more robust.
This is joint work with Cho-Jui Hsieh, Donghyuk Shin and Si Si.
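As a minimal illustration of the divide-and-conquer paradigm described above, consider sorting, one of the classic examples cited in the abstract. The sketch below is a generic instance of the paradigm (divide into subproblems, solve them recursively, then "conquer" by merging), not the specialized algorithms developed in the talk.

```python
def merge_sort(xs):
    """Classic divide-and-conquer: split, solve the halves, then merge."""
    if len(xs) <= 1:               # base case: a trivially solved subproblem
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid])    # divide: solve the left half
    right = merge_sort(xs[mid:])   # divide: solve the right half
    # conquer: combine the two sorted halves into one sorted list
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```

In the machine-learning settings of the talk, the interesting design questions are precisely the analogues of these steps: how to partition the data or problem, how to solve each subproblem with a localized model, and how to combine the results.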
Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. His main research interests are in big data, machine learning, network analysis, linear algebra and optimization. He received his B.Tech. degree from IIT Bombay, and Ph.D. from UC Berkeley. Inderjit is an IEEE Fellow as well as a SIAM Fellow. Additionally, he has received several prestigious awards, including the ICES Distinguished Research Award in 2013, the SIAM Outstanding Paper Prize in 2011, the Moncrief Grand Challenge Award in 2010, the SIAM Linear Algebra Prize in 2006, the University Research Excellence Award in 2005, and the NSF Career Award in 2001. He has published over 100 journal and conference papers, and has served on the editorial boards of the Journal of Machine Learning Research, the IEEE Transactions on Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning, and the SIAM Journal on Matrix Analysis and Applications.
Deep Learning: Overview and Trends
Andrew Ng (videoconference)
Deep learning is the leading approach to many problems in computer vision,
speech recognition, NLP, and other areas. In this presentation, I will give a
broad overview of deep learning. I will discuss the key reasons for its
success, and the important role that scalability plays. I will also describe
unsupervised learning approaches to deep learning--such as the "Google cat"
result, in which a neural network learned to recognize cats by watching
unlabeled YouTube videos--and discuss why this might become increasingly
important. Finally, I will discuss recent trends in deep learning, and
some possible future applications.
Andrew Ng is Chief Scientist of Baidu; Chairman and Co-founder of
Coursera; and an Associate Professor of Computer Science at Stanford University.
In 2011 he led the development of Stanford University’s main MOOC (Massive
Open Online Course) platform and also taught an online Machine Learning class
to over 100,000 students, leading to the founding of Coursera. Ng’s goal is
to give everyone access to a great education, for free. Today, Coursera
partners with top universities to offer online courses. With over 9 million
students, it is the world's largest MOOC platform.
Ng also works on machine learning, with an emphasis on deep learning. He
founded and led the “Google Brain” project, which developed massive-scale deep
learning algorithms. This work produced the well-known “cat” result, in which
a massive neural network with 1 billion parameters learned from unlabeled
YouTube videos to detect cats. More recently, he has been working to build up
Baidu Research, which is developing applications of large-scale deep learning
to computer vision, speech, NLP, and other areas.
Recent honors include being named to the Time 100 list of the most influential
people in the world, to Fortune's 40 under 40, and, by students, as one
of the top 10 professors across Stanford University.