The first technique that we are going to learn in our exploration of Machine Learning is the Decision Tree, a supervised learning method. Throughout this series I will follow 'Machine Learning in Action' by Peter Harrington and will point readers to other recommended books as we encounter new concepts. The decision tree is simple but effective, and one of the most widely used classification techniques.
Once we have a dataset in hand, it is important to learn about it and classify it in an optimal way. A decision tree makes a decision at each node and splits the dataset accordingly. The tree is built top-down, and each split should provide the maximum information. The key advantage of this technique shows up when the dataset is huge and the number of features is high: at every step it finds the best feature to split on, so the classification remains efficient and optimal.
Let’s suppose that we have the following dataset.
| Channel    | Variance | Image Type |
| Monochrome | low      | BW         |
| RGB        | low      | BW         |
| RGB        | High     | Color      |
We may try to split the dataset without evaluating the performance of each split, as shown below.
MATLAB CODE:
%Read and display the dataset and the Features
dataSet = [{'Mono', 'low',  'BW'};
           {'RGB',  'low',  'BW'};
           {'RGB',  'High', 'Color'}];   %3x3 cell array; the last column is the outcome
Features = {'Channel','Variance'};
display(cell2table(dataSet,'VariableNames',[Features,'ImageType']));
Now let's try to split the dataset by measuring the information. The difference between the information measured before the split and after the split is the 'Information Gain'.
The information is measured by means of entropy: entropy is the expected value of the information.
Calculate the Entropy Before the Split
The final decision (the class label) is either 'BW', which stands for a Black and White image, or 'Color'.
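With two 'BW' rows and one 'Color' row out of three, the entropy before the split works out to:
Entropy (before split) = -(2/3)·log2(2/3) - (1/3)·log2(1/3) = 0.9183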
MATLAB CODE:
function Entropy = Calculate_Entropy(dataSet)
Rlabel = {dataSet{:,size(dataSet,2)}}';  %Extract the last column, assuming the last column is the final decision or outcome
uqlabel = unique(Rlabel);                %Find all the labels and eliminate the duplicates
Entropy = 0;                             %Initialize the Entropy
for k = 1:size(uqlabel,1)
    ProbC = sum(ismember(Rlabel,uqlabel{k}))/size(dataSet,1); %Probability of the unique value occurring in the outcome column
    Entropy = Entropy - (ProbC*log2(ProbC));                  %Shannon entropy estimation
end
end
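A quick check, assuming dataSet is the 3x3 cell array defined above:
Base_Entropy = Calculate_Entropy(dataSet)   %approximately 0.9183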
Calculate the Entropy After the Split
Now to find the best feature to make the split, do the
following:
· Initialize the Best Feature as NULL or zero
· Initialize the Best Information Gain = 0
· Extract the values of the first feature (i.e. 'Channel')
· Find its entropy with respect to the feature 'Channel' over all of its unique values. Create subsets based on the unique values of 'Channel', namely 'Mono' and 'RGB':
Subset Mono:
| Image Type |
| BW         |

Subset RGB:
| Image Type |
| BW         |
| Color      |
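Subset Mono is pure, so its entropy is 0; subset RGB holds one 'BW' and one 'Color', so its entropy is 1. The entropy after the split on 'Channel' is the weighted sum of the subset entropies:
Entropy('Channel') = (1/3)·0 + (2/3)·1 = 0.6667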
· Compare the entropy before the split with the entropy of the first feature.
If the difference between Before_Entropy and the entropy of the first feature > Best Information Gain, then Best Information Gain = Information Gain and Best Feature = the first feature.
Information Gain = Before Entropy - Entropy of the first feature
                 = 0.9183 - 0.6667
                 = 0.2516
Information Gain (0.2516) > Best Information Gain (0)

| Best Information Gain | 0.2516  |
| Best Feature          | Channel |
· Now extract the values of the second feature (i.e. 'Variance')
· Find its entropy with respect to the feature 'Variance' over all of its unique values. Create subsets based on the unique values of 'Variance', namely 'low' and 'High':
Subset low:
| Image Type |
| BW         |
| BW         |

Subset High:
| Image Type |
| Color      |
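Both subsets are pure, so the entropy after the split on 'Variance' is zero:
Entropy('Variance') = (2/3)·0 + (1/3)·0 = 0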
· Note that an entropy of zero denotes the best possible split.
· Compare the entropy before the split with the entropy of the second feature.
If the difference between Before_Entropy and the entropy of the second feature > Best Information Gain, then Best Information Gain = Information Gain and Best Feature = the second feature.
Information Gain = 0.9183 (Before_Entropy) - 0 (Entropy of the second feature) = 0.9183
Information Gain (0.9183) > Best Information Gain (0.2516)

| Best Information Gain | 0.9183   |
| Best Feature          | Variance |
MATLAB CODE:
function Best_Feature = find_Best_feature(dataSet,Base_Entropy)
Best_Info = 0;      %Initialize the Best Information Gain
Best_Feature = 0;   %Initialize the Best Feature
for ind = 1:size(dataSet,2)-1            %Traverse through all the feature columns in the dataSet
    Rlabel = {dataSet{:,ind}}';
    uqlabel = unique(Rlabel);            %Find the unique values in this column
    New_Entropy = 0;
    for k = 1:size(uqlabel,1)
        ProbC = sum(ismember(Rlabel,uqlabel{k}))/size(dataSet,1);        %Fraction of rows holding the unique value
        SubSet = [dataSet(ismember(Rlabel,uqlabel(k)),size(dataSet,2))]; %Outcome subset corresponding to the unique value of this column
        New_Entropy = New_Entropy + (ProbC.*Calculate_Entropy(SubSet));  %Weighted entropy with respect to this feature
    end
    Info_Gain = Base_Entropy - New_Entropy;  %Information Gain: difference between the entropy before and after the split
    if(Info_Gain > Best_Info)
        Best_Info = Info_Gain;
        Best_Feature = ind;
    end
end
end
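A quick check on the same three-row dataset, assuming Base_Entropy was computed as above:
Best_Feature = find_Best_feature(dataSet,Base_Entropy)   %returns 2, i.e. the 'Variance' column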
Traverse through the dataset until the leaf node is reached
The first feature used to split the dataset is 'Variance'; the remaining features are then traversed to either reach the final result or to split further on another feature.
· Select the best feature column
· Find its unique values, 'low' and 'High'
· Obtain the subsets based on these unique values
· For the value 'low':
| Variance | Image Type |
| low      | BW         |
| low      | BW         |
Since the result is homogeneous, as is evident from the above table, it can be stated as:
If Variance is low, then Image Type = 'BW'
For the value 'High':

| Variance | Image Type |
| High     | Color      |
If Variance is High, then Image Type = 'Color'
· If the result is not homogeneous, we proceed further by finding the best feature until we obtain homogeneous results.
Finally, the decision tree with the best split is constructed
as shown below.
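For this three-row dataset the tree has a single split on 'Variance':
Variance
 |- low  -> BW
 |- High -> Color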
MATLAB CODE:
function tree = Traverse_tree(dataSet,Best_Feature,Features,Ranges,tree)
%Find the best features and the nodes
Rangeval = Ranges(Ranges~=Best_Feature);
Rlabel = {dataSet{:,Best_Feature}}';
uqlabel = unique(Rlabel);
sz = size(tree,2);
Prev = tree{sz}.Prev+1;
for k = 1:size(uqlabel,1)
    Uprev = Prev;
    SubSet = [dataSet(ismember(Rlabel,uqlabel(k)),Rangeval)];
    Tlabel = unique({SubSet{:,end}}');
    if(numel(Tlabel)==1)   %Homogeneous, i.e. all rows share the same result
        %Store the condition and the corresponding leaf node
        sz = sz+1;
        tree{sz}.node = char(uqlabel(k));   %Condition, e.g. 'low'
        tree{sz}.Prev = Uprev;
        sz = sz+1;
        tree{sz}.node = char(Tlabel);       %Final decision, e.g. 'BW' or 'Color'
        tree{sz}.Prev = Uprev+1;
    else
        %Not homogeneous: calculate the entropy, find the best feature
        %for the subset and repeat the procedure
        Base_Entropy = Calculate_Entropy(SubSet);
        Best_Feature = find_Best_feature(SubSet,Base_Entropy);
        %Store the condition and the corresponding internal node
        sz = sz+1;
        tree{sz}.node = char(uqlabel(k));
        tree{sz}.Prev = Uprev;
        sz = sz+1;
        tree{sz}.node = char(Features(Best_Feature));
        tree{sz}.Prev = Uprev+1;
        Features1 = {Features{~ismember(Features,Features{Best_Feature})}};
        %Recurse to find the homogeneous result for the subset
        tree = Traverse_tree(SubSet,Best_Feature,Features1,[1:numel(Rangeval)],tree);
        sz = size(tree,2);
    end
end
end
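A minimal sketch of how these functions fit together for the three-row example, mirroring the driver shown further below (dataSet and Features are the ones defined in the first code block):
%Build the tree for the three-row example
Base_Entropy = Calculate_Entropy(dataSet);
Best_Feature = find_Best_feature(dataSet,Base_Entropy);      %2 -> 'Variance'
tree = {};
tree{1}.node = char(Features{Best_Feature});                 %root node: 'Variance'
tree{1}.Prev = 0;
Features1 = {Features{~ismember(Features,Features{Best_Feature})}};
tree = Traverse_tree(dataSet,Best_Feature,Features1,1:size(dataSet,2),tree);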
Here is an example with a real dataset to examine the decision tree that we created using information gain: 'Classification of Black and White Image and Color Image'.
Here's another example with a somewhat messier dataset; let's see how to decide on the splits. Instead of only determining whether the image is BW (Black and White) or Color, the dominant color in the image is also taken into account. Clearly, the number of features and the number of values per feature have increased. The 'DominantB' feature indicates the amount of black or gray pixels in the image; it can be High, low or Medium.
| Channel    | Variance | DominantB | ImageType |
| Monochrome | low      | High      | Black     |
| Monochrome | low      | low       | White     |
| Monochrome | low      | Medium    | BW        |
| RGB        | low      | High      | Black     |
| RGB        | low      | low       | White     |
| RGB        | low      | Medium    | BW        |
| RGB        | High     | High      | Color     |
| RGB        | High     | low       | Color     |
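The driver below assumes this table is already stored in the workspace as an 8x4 cell array dataSet, with the feature names in Features; a sketch of one way to define them:
dataSet = [{'Monochrome','low','High','Black'};
           {'Monochrome','low','low','White'};
           {'Monochrome','low','Medium','BW'};
           {'RGB','low','High','Black'};
           {'RGB','low','low','White'};
           {'RGB','low','Medium','BW'};
           {'RGB','High','High','Color'};
           {'RGB','High','low','Color'}];
Features = {'Channel','Variance','DominantB'};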
MATLAB CODE:
display(cell2table(dataSet,'VariableNames',[Features,'ImageType']));
Base_Entropy = Calculate_Entropy(dataSet);                 %Base Entropy = 2
Best_Feature = find_Best_feature(dataSet,Base_Entropy);    %Best Feature = 3, i.e. 'DominantB'
%Store the nodes and Features
tree = {};
tree{1}.node = char(Features{Best_Feature});
tree{1}.Prev = 0;
Features = {Features{~ismember(Features,Features{Best_Feature})}};
tree = Traverse_tree(dataSet,Best_Feature,Features,[1:4],tree);
draw_tree(tree);   %draw_tree is a helper that plots the tree (not listed here)
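Working the splits out by hand, 'DominantB' gives the highest information gain at the root, and each non-homogeneous subset then splits on 'Variance', so the resulting tree should look roughly like this:
DominantB
 |- Medium -> BW
 |- High   -> Variance:  low -> Black,  High -> Color
 |- low    -> Variance:  low -> White,  High -> Color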