Tutorial - IEEE ICMLA17

Tutorial: Machine Learning for Industrial Predictive Analytics

Evgeny Burnaev and Alexey Zaytsev

Skolkovo Institute of Science and Technology (Skoltech)

Address: Skolkovo Innovation Center, 3 Nobel Street, Moscow, 143026, Russia

Emails: e.burnaev@skoltech.ru and a.zaytsev@skoltech.ru Phone: +7 (495) 280 14 81

Abstract

Approximation problems (also known as regression problems) arise quite often in industrial design, and solutions of such problems are conventionally referred to as surrogate models. The most common application of surrogate modeling in engineering is in connection with black-box optimization. Indeed, on the one hand, design optimization plays a central role in the industrial design process; on the other hand, a single optimization step typically requires the optimizer to create or refresh a model of the response function whose optimum is sought, in order to come up with a reasonable next design candidate. The surrogate models used in optimization range from simple local linear regression employed in basic gradient-based optimization to complex global models employed in so-called Surrogate-Based Optimization (SBO). Aside from optimization, surrogate modeling is used for dimension reduction, sensitivity analysis, and visualization of response functions. In this tutorial we highlight the main issues in constructing and applying surrogate models, describe both state-of-the-art techniques and a few novel approximation algorithms, and demonstrate the efficiency of the surrogate modeling methodology on several industrial engineering problems.
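As a concrete illustration of what a surrogate model is, the following minimal sketch fits a Gaussian process regression (one of the techniques covered in the tutorial) to a handful of samples of a hypothetical `expensive_simulation` function standing in for a long-running code. It assumes a one-dimensional input, a zero-mean prior, and a squared-exponential kernel with fixed, hand-picked hyperparameters; in practice these would be estimated from the data, for example by maximizing the marginal likelihood. Only the Python standard library is used.

```python
import math

def sq_exp_kernel(x1, x2, length_scale=0.5, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_from_lower(L, y):
    """Back substitution: solve L^T x = y using the lower factor L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, noise=1e-8):
    """Return a function giving the GP posterior mean and std at a new input."""
    n = len(xs)
    K = [[sq_exp_kernel(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_from_lower(L, solve_lower(L, ys))  # alpha = K^{-1} y

    def predict(x):
        k_star = [sq_exp_kernel(x, xi) for xi in xs]
        mean = sum(k * a for k, a in zip(k_star, alpha))
        v = solve_lower(L, k_star)
        var = sq_exp_kernel(x, x) - sum(vi * vi for vi in v)
        return mean, math.sqrt(max(var, 0.0))

    return predict

def expensive_simulation(x):
    # Hypothetical stand-in for a long-running computer code.
    return math.sin(3.0 * x) + 0.5 * x

train_x = [0.0, 0.3, 0.7, 1.0, 1.4, 2.0]
train_y = [expensive_simulation(x) for x in train_x]
surrogate = fit_gp(train_x, train_y)
mean, std = surrogate(0.85)  # cheap prediction plus an uncertainty estimate
```

The predictive standard deviation is what distinguishes a Gaussian process from a plain curve fit: it is near zero at the sampled designs and grows back toward the prior value far from them, which later lets an optimizer decide where new expensive runs are most informative.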

Overview

Over the last two decades, there has been an explosion in the ability of engineers to build finite-element models that simulate how a complex product will perform. This creates great potential for using optimization to improve engineering designs. One of the major obstacles to the use of optimization, however, is the long running time of the simulations (often overnight) and the lack of gradient information in some of the most complicated simulations (especially crashworthiness), as well as the high cost of full-scale experiments. Due to the long running times and the lack of analytic gradients, almost any optimization algorithm applied directly to the simulation will be slow.

The basic idea of the “surrogate modeling” approach is to avoid the temptation to invest one’s computational budget directly in answering the question at hand and, instead, to invest in developing fast mathematical approximations to the long-running computer codes and/or costly full-scale experiments using the available data. Given these approximations, one can perform what-if studies and visual analysis, explore tradeoffs, and gain other insights. One can then return, e.g., to the long-running computer code to test the ideas generated with the help of the surrogate models and, if necessary, update the approximations and iterate.
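The approximate, test, update, and iterate cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the instructors' method: it assumes a single design variable, uses a global quadratic response surface fitted by least squares as the surrogate, minimizes the surrogate over a fixed grid, and again uses a hypothetical `expensive_simulation` function in place of the long-running code.

```python
import math

def expensive_simulation(x):
    # Hypothetical stand-in for a long-running simulation.
    return (x - 1.3) ** 2 + 0.05 * math.sin(5.0 * x)

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ c0 + c1*x + c2*x^2 via the normal equations."""
    feats = [[1.0, x, x * x] for x in xs]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    aty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    c = solve_linear(ata, aty)
    return lambda x: c[0] + c[1] * x + c[2] * x * x

def surrogate_based_minimize(f, lo, hi, initial_xs, budget):
    xs = list(initial_xs)
    ys = [f(x) for x in xs]                     # initial (expensive) design
    grid = [lo + (hi - lo) * i / 300 for i in range(301)]
    for _ in range(budget):
        surrogate = fit_quadratic(xs, ys)       # cheap approximation of f
        x_next = min(grid, key=surrogate)       # optimize the surrogate, not f
        xs.append(x_next)
        ys.append(f(x_next))                    # verify with the expensive code
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

best_x, best_y = surrogate_based_minimize(
    expensive_simulation, 0.0, 3.0, [0.0, 1.0, 2.0, 3.0], budget=5)
```

Because the next point is always the surrogate's minimizer, this loop only exploits the current approximation; practical SBO methods balance exploitation against exploration of poorly sampled regions and use more flexible surrogates than a single global quadratic.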

While the basic idea of the surrogate modeling approach sounds simple, the devil is in the details. Therefore, in this tutorial we highlight the following issues, which are important for surrogate modeling in industrial engineering:

How to formulate an industrial engineering problem in terms suitable for processing by machine learning methods?
Which points to sample when building the surrogate model?
Which surrogate modeling method to use, and how to take into account the requirements of the subject domain?
How to use the surrogate model to suggest new, improved designs?
How to explore tradeoffs between objectives?
What to do if a simulation has numerical noise in it or the experimental noise is high?
And, equally important: where to get the computer code to do all these things?
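For the question of which points to sample, a common space-filling starting design is the Latin hypercube. The sketch below uses only the Python standard library and assumes box-shaped bounds for the design variables; the function name `latin_hypercube` and its parameters are our own illustrative choices, not an API from the tutorial's software.

```python
import random

def latin_hypercube(n_points, bounds, seed=None):
    """Latin hypercube sample: each variable's range is split into n_points
    equal strata, and every stratum receives exactly one point."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        strata = list(range(n_points))
        rng.shuffle(strata)                     # random pairing of strata across variables
        width = (hi - lo) / n_points
        # One uniformly placed point inside each stratum.
        columns.append([lo + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

# Five designs for two variables with different ranges (illustrative values).
design = latin_hypercube(5, bounds=[(0.0, 1.0), (10.0, 20.0)], seed=0)
```

Unlike a full factorial grid, the number of points does not grow exponentially with the number of variables, while each variable's projection is still evenly covered.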

This topic is relevant and significant not only for practitioners, but also for the whole machine learning community, since:

It is a new and important area with a variety of applications.
The corresponding applied problems motivate researchers to pose new theoretical problem statements that drive ML research.

Last but not least, the topic is relevant because Surrogate-Based Optimization (SBO) is applied in so-called Automatic Machine Learning. Indeed, the efficiency of ML models and algorithms crucially relies on human machine learning experts, who select appropriate ML architectures and their hyperparameters. As these tasks are often too complex for non-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. Thus, by treating hyperparameter tuning as black-box optimization of an expensive objective, we can apply the full power of SBO to automate machine learning.
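In Bayesian optimization, one widely used form of SBO for hyperparameter tuning, the next configuration is often chosen by maximizing the expected improvement (EI) over the best objective value seen so far. For minimization, if the surrogate predicts a Gaussian with mean mu and standard deviation sigma, then EI = (f_min - mu) * Phi(z) + sigma * phi(z) with z = (f_min - mu) / sigma, where Phi and phi are the standard normal CDF and density. The sketch below assumes such predictions are already available (for example from a Gaussian process); the candidate regularization values and their predicted validation errors are made-up illustrative numbers, not measurements.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Expected improvement (for minimization) of a candidate whose surrogate
    prediction is Gaussian with the given mean and standard deviation."""
    if std <= 0.0:
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best_so_far - mean) * cdf + std * pdf

# Hypothetical surrogate predictions (mean, std) of validation error for
# three candidate values of a regularization strength.
candidates = {1e-3: (0.210, 0.010), 1e-2: (0.205, 0.040), 1e-1: (0.260, 0.080)}
best_error = 0.200  # best validation error observed so far (illustrative)
next_value = max(candidates,
                 key=lambda c: expected_improvement(*candidates[c], best_error))
```

With these numbers the candidate 1e-2 is selected: its mean is slightly worse than the best observed error, but its larger uncertainty gives it the highest chance of improvement, which is how EI trades off exploitation against exploration.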

We will use PDF presentations for the theoretical material and Jupyter notebooks in presentation mode to show examples.

Instructors

The two presenters are:

Evgeny Burnaev, Assoc. Prof., e.burnaev@skoltech.ru, page: https://goo.gl/Jumb8h
Alexey Zaytsev, Research Scientist, a.zaytsev@skoltech.ru, page: https://goo.gl/pzErd2

Evgeny Burnaev

Evgeny works at the interface between machine learning and applied engineering problems. After receiving his MSc in 2006 and defending his PhD thesis in Foundations of Computer Science in 2008, Evgeny stayed at the Institute for Information Transmission Problems (IITP) as the head of the Data Analysis and Modeling group.

Evgeny has carried out a number of successful industrial projects with Airbus, Eurocopter, and the Sahara Force India Formula 1 team, among others. Data analysis algorithms developed by Evgeny and his group at IITP formed the core of an algorithmic software library for surrogate modeling and optimization. Thanks to this functionality, engineers can construct fast mathematical approximations to long-running computer codes (realizing physical models) based on the available data and perform design space exploration for trade-off studies. The software library passed the final Technology Readiness Level 6 certification at Airbus. According to Airbus experts, using the library reduces lead time and cost by up to 10% in several areas of the aircraft design process. Today, several dozen Airbus departments use it. Later, the spin-off company DATADVANCE LLC developed a software platform for Design Space Exploration with a GUI based on this algorithmic core.

At Skoltech, Evgeny is actively engaged in the development of educational and research programs and continues his research on regression based on Gaussian processes, the bootstrap, confidence sets and conformal predictors, rapid detection of anomalies in complex multicomponent systems, and failure prediction.

Alexey Zaytsev

Alexey has expertise in numerical methods, mathematical statistics, and the use of data analysis in applications. In 2017 Alexey received his PhD at IITP RAS. His PhD thesis contains new results on the effectiveness of Bayesian procedures for Gaussian process regression, a first-ever theoretical justification for the selection of designs of experiments for variable fidelity models, and minimax errors for Gaussian process regression in the multivariate case. His research has been published in a number of peer-reviewed journals and top-ranked conferences such as AISTATS.

During his studies at MIPT, Alexey joined DATADVANCE and took part in the development of a data analysis library for engineers. He developed a first-ever industry-level tool for data fusion that solves regression problems for variable fidelity data. Alexey has also completed a number of projects on the application of data analysis in aerospace engineering and adjacent areas for companies such as AREVA, TOTAL, and Airbus.

References

A key book on surrogate modeling is [1]. The main publications by the instructors in the tutorial area are [2, 3, 4, 5, 6, 7, 8, 9, 10]; the full list of relevant papers is given at https://goo.gl/yo0IrN.

[1] Forrester A., Sobester A., Keane A. Engineering design via surrogate modelling: a practical guide. – John Wiley & Sons, 2008.

[2] Burnaev E., Panin I., Sudret B. Efficient Design of Experiments for Sensitivity Analysis based on Polynomial Chaos Expansions. Ann Math Artif Intell (2017). doi:10.1007/s10472-017-9542-1. https://goo.gl/wql3k7

[3] Burnaev E., Zaytsev A. Large Scale Variable Fidelity Surrogate Modeling. Ann Math Artif Intell (2017). doi:10.1007/s10472-017-9545-y. https://goo.gl/96EKYb

[4] Burnaev E., Zaytsev A. Minimax approach to variable fidelity data interpolation. PMLR 54:652-661 (Volume 54: Artificial Intelligence and Statistics), Fort Lauderdale, FL, USA, 20-22 April 2017. https://goo.gl/P9JHv0

[5] M. Belyaev, E. Burnaev, E. Kapushev, M. Panov, P. Prikhodko, D. Vetrov, D. Yarotsky. GTApprox: Surrogate modeling for industrial design. Advances in Engineering Software 102 (2016) 29-39. https://goo.gl/o8rqT6

[6] E. Burnaev, M. Panov, A. Zaytsev. Regression on the Basis of Non-stationary Gaussian Processes with Bayesian Regularization, Journal of Communications Technology and Electronics, 2016, Vol. 61, No. 6, pp. 661-671. https://goo.gl/72MWnu

[7] E. Burnaev, M. Belyaev, E. Kapushev. Computationally efficient algorithm for Gaussian Processes based regression in case of structured samples. Computational Mathematics and Mathematical Physics, 2016, Vol. 56, No. 4, pp. 499-513. https://goo.gl/Hec2YZ

[8] E. Burnaev and M. Panov. Adaptive design of experiments based on Gaussian processes. In Lecture Notes in Artificial Intelligence. Proceedings of SLDS 2015. A. Gammerman et al. (Eds.), volume 9047, pages 116-126, London, UK, April 20-23 2015. Springer. https://goo.gl/TSNx4d

[9] Grihon S., Burnaev E.V., Belyaev M.G. and Prikhodko P.V. Surrogate Modeling of Stability Constraints for Optimization of Composite Structures. Surrogate-Based Modeling and Optimization. Engineering applications. Eds. by S. Koziel, L. Leifsson. Springer, 2013. P. 359-391. https://goo.gl/9JyLW2

[10] Zaytsev, A. Reliable surrogate modeling of engineering data with more than two levels of fidelity. In Mechanical and Aerospace Engineering (ICMAE), 2016 7th International Conference on. IEEE, 2016. P. 341-345. https://goo.gl/ASPeYi

Presentation slides and iPython notebooks:

Presentations
Presentation 1. Intro into Predictive Modeling and Industrial Analytics
Presentation 2. Gaussian Processes
Presentation 3. Bayesian optimization
Presentation 4. GPs in case of Factorial DoE
Presentation 5. Large Scale Variable Fidelity GPs
Python notebooks

References to the underlying papers can be found in the last section of the first presentation.