Dummy Variable Trap Explained
For those who have started ML and are falling into the dummy variable trap, or are confused by it: [My answer on Udemy for the above topic]
So,
Case 1:
Let there be two independent variables, age and sex. One-hot encoding sex gives two columns, male and female. There is a perfect correlation between these two columns, since each person in the data is either male or female: whenever male = 1, female = 0, and vice versa. So we omit the one-hot column created for male and keep only the one for female, because female = 0 already tells us the person is male.
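Here is a small sketch of Case 1 in Python with pandas. The data frame, its values and the `sex_` column prefix are made up for illustration; they are not from the Udemy course.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({
    "age": [23, 35, 41, 29],
    "sex": ["male", "female", "female", "male"],
})

# One-hot encode sex: this creates two columns, sex_female and sex_male.
dummies = pd.get_dummies(df["sex"], prefix="sex", dtype=int)

# The two columns always add up to 1, so each one fully determines the other.
row_sums = dummies.sum(axis=1)
print(row_sums.tolist())

# Keep only the female column; female = 0 then means male.
encoded = pd.concat([df[["age"]], dummies[["sex_female"]]], axis=1)
print(encoded)
```

pandas can also drop a column for you with `pd.get_dummies(..., drop_first=True)`. Note that it drops the alphabetically first category, so here it would keep the male column instead of the female one. Either choice avoids the trap.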
Case 2:
Let there be two independent variable columns, Age and Cat_Color, and let's say Cat_Color has the colors {black, white, brown}.
After one-hot encoding, the matrix we get is
black_cat  white_cat  brown_cat
    0          0          1
    0          1          0
    1          0          0
The first relationship we find is that when black = 0 and white = 0, the cat must obviously be brown.
So we can happily remove the one-hot column for brown.
Now we are left with black and white, which don't have any concrete relationship: a cat that is not black need not be white. Yes, there is a partial (roughly 50%) relationship between the two columns, but that isn't enough; the trap needs a perfect, 0-or-1 relationship. Here the cat might be neither black nor white, i.e. brown, so we have established a trap-free case.
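To check Case 2 with numbers, here is a rough sketch using NumPy. It assumes a linear model with an intercept (a column of ones), which is where the trap usually bites: the three dummy columns add up to exactly that column. The six rows are made-up data that repeat the pattern of the matrix above.

```python
import numpy as np

# Hypothetical toy data: six cats, columns are black_cat, white_cat, brown_cat.
one_hot = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
intercept = np.ones((6, 1), dtype=int)

# All three dummies plus the intercept: black + white + brown equals the intercept column.
full = np.hstack([intercept, one_hot])
# Drop brown_cat: brown is now the case black = 0 and white = 0.
reduced = np.hstack([intercept, one_hot[:, :2]])

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
print(rank_full, "of", full.shape[1])        # 3 of 4: one column is redundant (the trap)
print(rank_reduced, "of", reduced.shape[1])  # 3 of 3: no redundant column

# black and white are still partly correlated, but not perfectly.
corr = np.corrcoef(one_hot[:, 0], one_hot[:, 1])[0, 1]
print(round(corr, 2))  # -0.5 for this balanced toy data
```

The rank being lower than the number of columns is the "perfect relationship" in matrix terms. After dropping brown, the correlation of -0.5 between black and white is only partial, so no column can be rebuilt from the others.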