A large collection of data is valuable because it allows inferences and dependencies to be gleaned. The problem is that using all of it at once, particularly when it has many features, can lead to both overfitting and the curse of dimensionality. In this post I go over a basic approach to dimension reduction and some reasons why it is so important.
Dependency Inference
Data mining and exploratory data analysis are both key tools in any data science toolbox. An important part of exploring data is determining dependencies and conditional independence relations, which helps reveal the underlying structure of the data.
Handling Missing Data
Missing data is a problem that plagues every field of science, and there are a number of ways to deal with it. In this post I introduce the general ideas behind missing data and demonstrate a few methods for handling it.