ABSTRACT: Supervised classification is a major area of machine learning that has attracted growing interest over the years. The literature offers many classification paradigms and learning algorithms that can be applied to specific classification tasks. Honest classifier evaluation and fair comparison among classification models are therefore key to drawing the right conclusions from the results achieved and to choosing the best model or paradigm. However, many researchers focus on proposing new classification algorithms and leave the fair evaluation of their results aside. This tutorial presents an overview of performance evaluation methodologies for classifiers. It is organized in five parts. In the first part, we introduce the classification problem and motivate the importance of honest validation of classification models and of model comparison. The second part is devoted to the scores that can be used to measure the quality of a classifier. The classification error is the most studied and most commonly used score, but other scores may be of interest in certain application domains. The third part covers estimation methods. We present and motivate the problem of estimating the value of a score for a classifier given a (finite) data set, and we elaborate on different estimation methods as well as on their properties and application domains. The fourth part is dedicated to classifier comparison. Here we introduce statistical hypothesis testing and the different types of statistical tests that can be used to compare two or more classification models on one or more data sets.
Finally, the last part of the tutorial presents recommendations for fair classifier evaluation tailored to specific characteristics of the problem or the data set, together with general best practices in classifier evaluation, so that sound conclusions can be drawn from the results.