Principle of Parsimony


The Principle of Parsimony, also known as Occam's razor, is the problem-solving principle that "entities should not be multiplied without necessity." It is a fundamental, and often overlooked, aspect of science.

For statistical modelling, the principle of parsimony means that:

  • models should have as few parameters as possible.
  • experiments relying on few assumptions should be preferred to those relying on many.
  • models should be pared down until they are minimal adequate.
  • simple explanations should be preferred to complex explanations.
  • linear models should be preferred to non-linear models.

We can apply the principle of parsimony in many day-to-day scenarios, including data science model predictions.

Let us assume two cases: in Case 1, there are 8 pieces of supporting evidence to explain an event, and in Case 2, there are 5. According to the principle of parsimony, we tend to select Case 2, provided all the evidence is important and relevant.

Let us look at examples from specific fields.

1. Principle of Parsimony in route selection:

In data structures, we come across the minimum spanning tree for the simplest route selection. Such a tree can be constructed using several well-known algorithms, for example Prim's algorithm and Kruskal's algorithm. So, before we construct any route, we look for the approach that gives the shortest and best path without costing too much in the time and money it takes to reach the destination.
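As a sketch of the idea, here is a minimal Kruskal's algorithm on a tiny graph with made-up edge weights (the node names and weights are purely illustrative): it keeps only the cheapest edges that connect the network, discarding every edge that would add a redundant cycle.

```python
# A minimal sketch of Kruskal's algorithm: sort edges by weight and
# keep each edge only if it connects two previously separate components.

def kruskal(nodes, edges):
    """Return a minimum spanning tree as a list of (u, v, weight) edges."""
    parent = {n: n for n in nodes}

    def find(x):                          # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                      # edge joins two separate components
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 4), ("C", "D", 3)]
tree = kruskal(nodes, edges)
print(tree)                               # the heavy A-C edge is left out
```

Note how the algorithm itself is parsimonious: it never keeps an edge that a cheaper combination of edges already explains.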

Example: if we have to travel from Haridwar to Delhi, the wise choice is to select the simplest and safest route, rather than a complex route that takes a huge amount of time and consumes more fuel.

2. Principle of Parsimony in the Regression techniques of Machine Learning:

When it comes to model building with linear regression and multiple linear regression, we look at the coefficient of determination, R², to judge the accuracy of the model we have built.
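For reference, R² can be computed directly from the residuals of a fit. A quick sketch, using made-up data points that lie roughly on a straight line:

```python
# A quick sketch of computing the coefficient of determination R² for a
# simple linear regression fit (the data here is invented for illustration).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])      # roughly y = 2x

slope, intercept = np.polyfit(x, y, 1)         # least-squares line
y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat) ** 2)              # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)           # total variation
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))                            # close to 1 for a good fit
```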

For example, consider a large dataset with 8 attributes and 1 target variable. We often come across collinearity between multiple variables. In such a scenario, the accuracy measure of the model can fall. After multiple comparisons and the deletion of unnecessary variables, we may be able to increase the accuracy of the model.
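One common way to spot such redundancy before fitting is a correlation matrix of the predictors. In this small sketch (with synthetic data invented for the purpose), one predictor is built as a near-copy of another, and the correlation matrix flags the pair:

```python
# A sketch of detecting collinearity: a correlation coefficient near ±1
# between two predictors means they carry essentially the same information,
# so one of them can be dropped in the spirit of parsimony.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = 2 * a + rng.normal(scale=0.01, size=100)   # nearly collinear with a
c = rng.normal(size=100)                       # an independent predictor

X = np.column_stack([a, b, c])
corr = np.corrcoef(X, rowvar=False)            # 3x3 correlation matrix
print(np.round(corr, 2))
```

Here `corr[0, 1]` comes out very close to 1, signalling that `a` and `b` duplicate each other, while `c` stays weakly correlated with both.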

Let us take an example below:

Z is the dependent variable, and A, B, C, D, E, F, G, H are the independent variables used to create a multiple linear regression model.

Note: The measure of accuracy can be computed using any statistical software, such as R, Python, etc.

Observe the above three models and their complexity in terms of the number of independent variables used and their R² values. It is quite evident that Model 2 has an accuracy measure of 0.85 with fewer variables than Model 1 and Model 3. So, according to the principle of parsimony, we choose the simplest model without compromising much on accuracy: here, our selection is Model 2. There are other machine learning and deep learning algorithms where the principle of parsimony can also be applied, for example neural networks, KNN, etc.
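This trade-off can be made explicit with adjusted R², which penalises each extra parameter. A sketch on synthetic data (the data, and the fact that only two of the eight predictors truly matter, are assumptions of this illustration): the small model explains nearly as much variance as the full one, so parsimony favours it.

```python
# A sketch of comparing nested regression models with adjusted R²:
# extra predictors that add no real information cannot improve it much.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 8))                    # candidate predictors A..H
# Only the first two predictors actually drive the target Z:
z = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

def adjusted_r2(X_sub, y):
    """Fit least squares with an intercept and return adjusted R²."""
    X1 = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - resid.var() / y.var()
    k = X_sub.shape[1]                         # number of predictors
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)

full = adjusted_r2(X, z)                       # all 8 predictors
small = adjusted_r2(X[:, :2], z)               # only the 2 that matter
print(round(small, 3), round(full, 3))         # nearly identical scores
```

Since the two scores are nearly equal, the parsimonious two-variable model is the natural choice.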

3. Principle of Parsimony in Biology:

In biology, evolutionary relationships between different species are determined using phylogenetic trees, which are constructed by identifying common ancestors. The principle of parsimony applies here when we choose the phylogenetic tree that requires the fewest changes.
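This "fewest changes" criterion can be counted mechanically. Below is a toy sketch of the small-parsimony count in the style of Fitch's algorithm, on a hypothetical four-species tree with one invented character state per species; the species names and states are assumptions of the example, not real data.

```python
# A toy parsimony count for phylogenetic trees: score each candidate
# tree by the minimum number of character-state changes it requires,
# and prefer the tree with the lowest score.

def fitch_score(tree, states):
    """Minimum number of state changes needed on `tree`.

    `tree` is a nested tuple of leaf names; `states` maps each leaf
    name to its observed character state.
    """
    changes = 0

    def possible(node):
        nonlocal changes
        if isinstance(node, str):              # leaf: its observed state
            return {states[node]}
        left, right = (possible(child) for child in node)
        common = left & right
        if common:                             # children can agree
            return common
        changes += 1                           # a change is forced here
        return left | right

    possible(tree)
    return changes

states = {"human": "A", "chimp": "A", "mouse": "G", "fish": "G"}
tree1 = (("human", "chimp"), ("mouse", "fish"))  # groups like with like
tree2 = (("human", "mouse"), ("chimp", "fish"))  # mixes the states
print(fitch_score(tree1, states), fitch_score(tree2, states))
```

`tree1` needs only one change while `tree2` needs two, so parsimony prefers `tree1`.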

The “law of parsimony”

The law of parsimony tells us that when there are alternative explanations of events, the simplest one is likely to be correct.

Why is the Principle of Parsimony called Occam's Razor?

The principle of parsimony was introduced in the early 14th century by an English philosopher named William of Occam, who insisted that, given a set of equally good explanations for a given phenomenon, "the correct explanation is the simplest explanation." It is called Occam's razor because he 'shaved' his explanations down to the bare minimum: his point was that, in explaining something, assumptions must not be needlessly multiplied.