Afshin Gholamy, The University of Texas at El Paso
Vladik Kreinovich, The University of Texas at El Paso
Olga Kosheleva, The University of Texas at El Paso
Comments
Technical Report: UTEP-CS-18-09
Abstract
When learning a dependence from data, to avoid overfitting, it is important to divide the data into the training set and the testing set. We first train our model on the training set, and then we use the data from the testing set to gauge the accuracy of the resulting model. Empirical studies show that the best results are obtained if we use 20-30% of the data for testing, and the remaining 70-80% of the data for training. In this paper, we provide a possible explanation for this empirical result.
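To make the procedure described in the abstract concrete, here is a minimal sketch of a random training/testing split. It is an illustration only and is not taken from the report: the function name `train_test_split`, its parameters `test_fraction` and `seed`, and the choice of 20% for testing (the lower end of the empirical 20-30% range) are all assumptions made for this example.

```python
import random


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, test) lists.

    Hypothetical helper for illustration; `test_fraction` is the share of
    the data held out for testing (0.2 to 0.3 per the empirical rule).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    items = list(data)                  # copy, so the input is left untouched
    random.Random(seed).shuffle(items)  # seeded shuffle for reproducibility
    n_test = round(len(items) * test_fraction)
    # The first n_test shuffled items are held out; the rest are for training.
    return items[n_test:], items[:n_test]


# Usage: hold out 20% of 100 samples for testing, train on the other 80%.
train, test = train_test_split(range(100), test_fraction=0.2)
```

The model would then be fit on `train` only, and its accuracy gauged on `test`, which it has never seen. Shuffling before splitting is one common design choice; it assumes the samples are independent, which may not hold for time-ordered data.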