sampling-data
Parent: Data in ML
Source: google-ml-course
Sampling data in ML
Three basic assumptions
- Data points are drawn at random from the data distribution and are independent and identically distributed (i.i.d.), i.e. drawing one data point doesn’t influence any other
- The data distribution doesn’t change over time (stationary)
- The training and validation sets are both pulled from the same distribution
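
One common way to respect the first and third assumptions in practice is to shuffle the examples before splitting them, so that both sets are random draws from the same pool. The following is a minimal sketch, assuming NumPy is available; the function name `shuffled_split` and its parameters are hypothetical and not taken from the course.

```python
import numpy as np

def shuffled_split(n_examples, validation_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) chosen by a uniform random permutation.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Shuffle all indices so neither set depends on the original ordering.
    indices = rng.permutation(n_examples)
    n_val = int(round(n_examples * validation_fraction))
    # The first n_val shuffled indices form the validation set, the rest train.
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = shuffled_split(1000)
```

A shuffled split only helps if the examples really are independent. For data with a time ordering or grouped records, a different splitting scheme may be needed.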
Situations where these assumptions may be violated:
- A change in user perception, which results in the dataset being labelled differently
- A change in population, which results in new demographics or a new target market
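
A rough way to notice that a feature's distribution has shifted, for example after a population change, is to compare an older sample with a newer one. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) with NumPy. This check is an illustration chosen here, not a method named in the course; `ks_statistic` is a hypothetical helper and the data is synthetic.

```python
import numpy as np

def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Evaluate both empirical CDFs at every observed value.
    points = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, points, side="right") / a.size
    cdf_b = np.searchsorted(b, points, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Synthetic data: one sample from the same distribution, one with a shifted mean.
rng = np.random.default_rng(0)
old = rng.normal(0.0, 1.0, 2000)
same = rng.normal(0.0, 1.0, 2000)
shifted = rng.normal(1.0, 1.0, 2000)

stat_same = ks_statistic(old, same)        # small when distributions match
stat_shifted = ks_statistic(old, shifted)  # larger when the distribution moved
```

A statistic near 0 suggests the two samples look alike, while a value near 1 suggests they barely overlap. What counts as "too large" depends on sample size and on the application, so any threshold would be an assumption.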