The ability to successfully apply previously acquired knowledge to novel and unfamiliar situations is one of the main hallmarks of successful learning and general intelligence. This capability to effectively generalise is amongst the most desirable properties a prediction model (or a mind, for that matter) can have. In supervised machine learning, the standard way to evaluate the generalisation power of a prediction model for a given task is to randomly split the whole available data set into two sets – a training set and a test set. The model is then trained on the examples in the training set, and afterwards its prediction abilities are measured on the untouched examples in the test set via a suitable performance metric. Since the model has never seen any of the test examples during training, its performance on the test set should be indicative of its performance on novel data which it will encounter in the future.
If the split of the initial data set into training set and test set is done uniformly at random (as is usual), then both sets follow the same distribution. This is very much in accordance with the framework of classical statistical learning theory, where one assumes that all training and test examples have been sampled independently from the same underlying probability distribution. Unfortunately, a random uniform data split is rarely a good simulation of practical reality, where a newly collected data set which is fed into a machine learning model to obtain predictions almost never follows the distribution of the data set on which the model was originally trained. In practice, one can regularly observe a situation where a model which performs well on a randomly selected test set fails spectacularly when confronted with novel data which was collected at a later point in time, by a different lab, in a different environment, or in some other context that differs from the original context in which the initial data set was collected. The reason for this can be found in the distributional shift between the original data and the newly collected data, which frequently occurs when the data collection context (and thus the data generating process) is altered in some way. This distributional shift normally leads to a substantial drop in the model's performance on the new data compared to its performance on a test set which follows the same distribution as the training data. Thus, splitting the initial data set uniformly at random into a test set and a training set often leads to overoptimistic results when trying to estimate the predictive abilities of a machine learning model in a practical setting.
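The evaluation procedure and the failure mode described above can be sketched in a few lines of plain Python. Everything in the sketch is an illustrative assumption rather than something from the text: a toy one-dimensional data set labelled by a threshold, a trivial nearest-neighbour "model", and a "newly collected" data set whose generating process has shifted (the label threshold moves).

```python
import random

# Toy data set: 1-d inputs labelled by a threshold at 50.
# (The data set, the nearest-neighbour "model", and the shifted
# threshold below are all illustrative assumptions.)
data = [(x, int(x > 50)) for x in range(100)]

def split_uniformly(data, test_fraction=0.2, seed=0):
    """Uniformly random split into a training set and a test set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_uniformly(data)

def predict(x):
    """Trivial 1-nearest-neighbour prediction from the training set."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def accuracy(examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)

# Performance on the held-out test set (same distribution as training).
accuracy_test = accuracy(test)

# "Newly collected" data from an altered data generating process:
# in the new context the label threshold sits at 30 instead of 50.
data_new = [(x, int(x > 30)) for x in range(100)]
accuracy_new = accuracy(data_new)

print(f"accuracy on random test split: {accuracy_test:.2f}")
print(f"accuracy on shifted new data:  {accuracy_new:.2f}")
```

Because the random test split follows the training distribution, the first accuracy is high; on the shifted data the model keeps applying the old decision boundary and its accuracy drops, which is exactly the overoptimism the random split hides.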