
" Two roads diverged in a wood, and I,
I took the one less traveled by,
And that has made all the difference "-Robert Frost

Decision Tree



The first technique that we are going to learn during our exploration of Machine Learning techniques is the Decision Tree, a supervised learning method. Simple but effective, it is also one of the most widely used classification techniques. For all the machine learning techniques, I will follow ‘Machine Learning in Action’ by Peter Harrington and will point the readers to other recommended books as we encounter different concepts.
When we have a dataset in hand, it is important to learn about it and classify it in the most optimal way. A decision tree takes a decision at each node and splits the dataset accordingly. The traversal is top-down, and each split should provide the maximum information. The key advantage of this technique is that when the dataset is huge and the number of features is also high, it finds the best features to split the dataset on, so the classification stays efficient and optimal.


Let’s suppose that we have the following dataset.

Channel    | Variance | Image Type
---------- | -------- | ----------
Monochrome | low      | BW
RGB        | low      | BW
RGB        | High     | Color

We may try to split the dataset without evaluating the performance of each split, as shown below.




MATLAB CODE:

%Read and display the dataset and the Features
dataSet = {'Mono', 'low',  'BW';
           'RGB',  'low',  'BW';
           'RGB',  'High', 'Color'};

Features={'Channel','Variance'};

display(cell2table(dataSet,'VariableNames',[Features,'ImageType']));


Now let’s try to split the dataset by measuring the information. The difference between the information measured before the split and after the split is the ‘Information Gain’.
We measure the information by means of entropy; entropy is the expected value of the information.

Calculate the Entropy Before the Split


The final decision can be ‘BW’, which represents a black-and-white image, or ‘Color’.
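As a worked example for the table above, 2 of the 3 samples are ‘BW’ and 1 is ‘Color’, so the entropy before the split is:

Entropy = -(2/3)*log2(2/3) - (1/3)*log2(1/3) ≈ 0.9183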




MATLAB CODE:

function Entropy = Calculate_Entropy(dataSet)

 Rlabel = {dataSet{:,size(dataSet,2)}}'; %Extract the last column, assuming that the last column holds the final decision or outcome
 uqlabel = unique(Rlabel);    %Find all the labels and eliminate the duplicates
 Entropy = 0; %Initialize the Entropy
 for k=1:size(uqlabel,1)
    ProbC = sum(ismember(Rlabel,uqlabel{k}))/size(dataSet,1); %Probability of the unique value occurring in the label column
    Entropy = Entropy-(ProbC*log2(ProbC)); %Shannon entropy estimation
 end

end
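
A quick check of this function on the dataset above (a sketch; it assumes dataSet from the earlier snippet is in the workspace):

MATLAB CODE:

Base_Entropy = Calculate_Entropy(dataSet) %expected to return ~0.9183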

Calculate the Entropy After the Split

Now, to find the best feature to make the split, do the following:
·         Initialize Best Feature as NULL or zero
·         Initialize Best Information Gain = 0
·         Extract the values of the first feature, i.e. ‘Channel’





·         Find its entropy with respect to the feature ‘Channel’ for all the unique values
    Create subsets based on the unique values in the feature ‘Channel’, namely ‘Mono’ and ‘RGB’
Subset Mono:
Image Type
BW

Subset RGB:
Image Type
BW
Color


 
·         Compare the entropy before the split with the entropy after splitting on the first feature.
If (Before_Entropy – Entropy of the first feature) > Best Information Gain, then Best Information Gain = Information Gain and Best Feature = the first feature.
The entropy after splitting on ‘Channel’ is the weighted sum of the subset entropies: subset Mono is homogeneous (entropy 0) and subset RGB holds one ‘BW’ and one ‘Color’ (entropy 1), so it is (1/3)×0 + (2/3)×1 = 0.6667.
Information Gain = Before Entropy – Entropy of the first feature
                            = 0.9183 – 0.6667
                            = 0.2516
Information Gain (0.2516) > Best Information Gain (0)
Best Information Gain: 0.2516
Best Feature: Channel
  
·         Now extract the values of the second feature, i.e. ‘Variance’
·         Find its entropy with respect to the feature ‘Variance’ for all the unique values
    Create subsets based on the unique values in the feature ‘Variance’, namely ‘low’ and ‘High’
Subset low:
Image Type
BW
BW

Subset High:
Image Type
Color



·         Note that an entropy of zero denotes a perfect split.
·         Compare the entropy before the split with the entropy after splitting on the second feature.
If (Before_Entropy – Entropy of the second feature) > Best Information Gain, then Best Information Gain = Information Gain and Best Feature = the second feature.
Both subsets are homogeneous, so the entropy after splitting on ‘Variance’ is (2/3)×0 + (1/3)×0 = 0.
Information Gain = 0.9183 (Before_Entropy) – 0 (Entropy of the second feature) = 0.9183
Information Gain (0.9183) > Best Information Gain (0.2516)

Best Information Gain: 0.9183
Best Feature: Variance

MATLAB CODE:
function Best_Feature = find_Best_feature(dataSet,Base_Entropy)
Best_Info = 0; %Initialize the Best Information Gain
Best_Feature = 0; %Initialize the Best Feature
 for ind = 1:size(dataSet,2)-1
    %Traverse through all the columns in the dataSet
    Rlabel = {dataSet{:,ind}}';
    uqlabel = unique(Rlabel); %Find the unique values in each column
    New_Entropy = 0;
   
    for k = 1:size(uqlabel,1)
        ProbC = sum(ismember(Rlabel,uqlabel{k}))/size(dataSet,1); %Probability of the unique value occurring in this column
        SubSet = [dataSet(ismember(Rlabel,uqlabel(k)),size(dataSet,2))]; %Find the subset that corresponds to the unique values of that particular column
        New_Entropy = New_Entropy + (ProbC.*Calculate_Entropy(SubSet)); %Calculate the entropy with respect to the particular column or the Feature
    end
   
    %Information Gain obtained from the difference between the before and after split
    Info_Gain = Base_Entropy-New_Entropy;
   
    if(Info_Gain>Best_Info)
        Best_Info = Info_Gain;
        Best_Feature = ind;
    end
   
    
 end
end
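
A quick sanity check on the simple dataset (a sketch; it assumes dataSet and Calculate_Entropy from above are in the workspace):

MATLAB CODE:

Base_Entropy = Calculate_Entropy(dataSet);              %~0.9183
Best_Feature = find_Best_feature(dataSet,Base_Entropy)  %returns 2, i.e. 'Variance'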

Traverse through the dataset until the leaf node is reached
The first feature used to split the dataset is ‘Variance’; the remaining features are then traversed to either reach the final result or to split further on another feature.
·         Select the best feature column
·         Find the unique values, ‘low’ and ‘High’
·         Obtain the subsets based on the unique values
·         For the value ‘low’:
Variance | Image Type
low      | BW
low      | BW
Since the result is homogeneous, as is evident from the table above, it can be stated as:
If Variance is low then Image Type = ‘BW’
For the value ‘High’:
Variance | Image Type
High     | Color

If Variance is High then Image Type = ‘Color’
·         If the result is not homogeneous, we proceed with finding the next best feature until we obtain homogeneous results.

Finally, the decision tree with the best split is constructed as shown below.







MATLAB CODE:
function tree=Traverse_tree(dataSet,Best_Feature,Features,Ranges,tree)
        %Find the best features and the nodes
        Rangeval=Ranges(Ranges~=Best_Feature);
        Rlabel={dataSet{:,Best_Feature}}';
        uqlabel=unique(Rlabel);
       
        sz=size(tree,2);
        Prev=tree{sz}.Prev+1;
        for k=1:size(uqlabel,1)
          Uprev=Prev;
          SubSet = [dataSet(ismember(Rlabel,uqlabel(k)),Rangeval)];
          Tlabel=unique({SubSet{:,end}}');
        if(numel(Tlabel)==1) %Homogeneous or same result
            %To store the conditions and the corresponding leaf nodes
            sz=sz+1;
            tree{sz}.node=char(uqlabel(k));%Condition Example: 'Low', 'Yes'
            tree{sz}.Prev=Uprev;
            sz=sz+1;
           
            tree{sz}.node=char(Tlabel); %Final decision, e.g. 'BW','Color'
            tree{sz}.Prev=Uprev+1;
           
        else
            %Not Homogeneous then Calculate Entropy and Find the best
            %Feature for the Subset and repeat the procedure
            Base_Entropy=Calculate_Entropy(SubSet);
            Best_Feature=find_Best_feature(SubSet,Base_Entropy);
            %To store the conditions and the corresponding leaf nodes
            sz=sz+1;
            tree{sz}.node=char(uqlabel(k));
            tree{sz}.Prev=Uprev;
            sz=sz+1;
            tree{sz}.node=char(Features(Best_Feature));
            tree{sz}.Prev=Uprev+1;
           
            Features1={Features{~ismember(Features,Features{Best_Feature})}};
            %Find the homogeneous result for the Subset
            tree= Traverse_tree(SubSet,Best_Feature,Features1,[1:numel(Rangeval)],tree);
            sz=size(tree,2);
        end
       
        end
end
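
To build the tree for the simple three-row dataset, the same driver pattern used later in this post for the bigger dataset applies (a sketch; draw_tree is the plotting helper called at the end of this post and is not listed here):

MATLAB CODE:

Base_Entropy = Calculate_Entropy(dataSet);
Best_Feature = find_Best_feature(dataSet,Base_Entropy); %2, i.e. 'Variance'
tree = {};
tree{1}.node = char(Features{Best_Feature});
tree{1}.Prev = 0;
Features1 = {Features{~ismember(Features,Features{Best_Feature})}};
tree = Traverse_tree(dataSet,Best_Feature,Features1,[1:3],tree);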


Here is an example with real datasets to examine the decision tree that we created using the information gain: ‘Classification of Black and White Image and Color Image’ (covered in the next post).


Here’s another example with a somewhat messier dataset; let’s see how to decide on the splits. Instead of only determining whether the image is BW (Black and White) or Color, the dominant color in the image will also be taken into account. Here, it is clear that the number of features and the number of values in each feature have increased. The ‘DominantB’ feature indicates the amount of black or gray pixels in the image; it can be ‘High’, ‘low’ or ‘Medium’.



Channel    | Variance | DominantB | ImageType
---------- | -------- | --------- | ---------
Monochrome | low      | High      | Black
Monochrome | low      | low       | White
Monochrome | low      | Medium    | BW
RGB        | low      | High      | Black
RGB        | low      | low       | White
RGB        | low      | Medium    | BW
RGB        | High     | High      | Color
RGB        | High     | low       | Color


MATLAB CODE:
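
%NOTE (assumption): the snippet below expects the 8-row dataset above to be
%in the workspace already; a minimal way to enter it is:
dataSet = {'Monochrome','low', 'High',  'Black';
           'Monochrome','low', 'low',   'White';
           'Monochrome','low', 'Medium','BW';
           'RGB',       'low', 'High',  'Black';
           'RGB',       'low', 'low',   'White';
           'RGB',       'low', 'Medium','BW';
           'RGB',       'High','High',  'Color';
           'RGB',       'High','low',   'Color'};
Features = {'Channel','Variance','DominantB'};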

display(cell2table(dataSet,'VariableNames',[Features,'ImageType']));

Base_Entropy = Calculate_Entropy(dataSet); %Base Entropy = 2
Best_Feature = find_Best_feature(dataSet,Base_Entropy); %Best Feature=3
%To store the nodes and Features
tree={};
tree{1}.node = char(Features{Best_Feature});
tree{1}.Prev = 0;
Features = {Features{~ismember(Features,Features{Best_Feature})}};
tree = Traverse_tree(dataSet,Best_Feature,Features,[1:4],tree);
draw_tree(tree);



To conclude, we have seen a supervised method that works on nominal values. The main disadvantage of this method is that it cannot handle continuous values. Let’s see in upcoming posts how the decision tree can be altered to suit datasets with continuous values.

Image Classification - Black and White, Color

One of the classic problems is classifying images. There will be a series of posts on this topic, closely related to machine learning concepts. Though there are different techniques available in the scientific community and the market, let’s start with the basic method of classification, the decision tree. Look at the dataset:
Channel    | Variance | Image Type
---------- | -------- | ----------
Monochrome | low      | BW
RGB        | low      | BW
RGB        | High     | Color

There are only two classification outcomes: ‘Black and White’ or ‘Color’.
MATLAB CODE:
%Image Classification: BW and Color Image
%List of 'jpeg' images
fname = dir('*.jpeg');

for k=1:size(fname,1)
   
    %Read the image
    Img = imread(fname(k).name);
    %Number of pixels in the image
    mn=size(Img,1)*size(Img,2);
   
    %Estimate the spread across the R, G and B channels at each pixel
    VarI=std(double(Img),0,3);
    BelowT = sum(VarI(:)<=25); %Number of pixels below the threshold
    Prob_BT= BelowT/mn;        %Fraction of such pixels
   
    if(Prob_BT==1)
        figure(1),imshow(Img);title('Black and White Image');
    else
         figure(2),imshow(Img);title('Color Image');
    end
end

The key feature is the ‘Variance’, which measures how much the values of the Red, Green and Blue channels differ at each pixel location.


For instance, if at a particular pixel location Red = 30, Green = 30 and Blue = 30, the variance is zero, which infers that the image is grayscale or BW. We can also infer that an image is black and white even when the pixel values in these channels are merely close to each other.
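
This per-pixel computation can be checked in isolation (a sketch with made-up pixel values):

MATLAB CODE:

px = cat(3, 30, 30, 30);   %one pixel with R = G = B = 30
std(double(px),0,3)        %returns 0 -> grayscale pixel
px = cat(3, 200, 30, 30);  %a strongly red pixel
std(double(px),0,3)        %returns ~98.15 -> color pixel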



If we look at the car image and its corresponding variance map, we can see that the pixel values of the RGB channels vary greatly at any given pixel location, and the variance map shows this large difference. We can say that the picture has been classified correctly.

In the cat image, all the RGB channels have the same value at each pixel location, which indicates that the image is grayscale, and the variance map is zero for the whole image. So the cat image is of the BW type.

The zebra image is a naturally black-and-white scene: even though its RGB components are not identical, they are close to each other. The variance map does a good job of checking the closeness between the RGB channels, and the image is classified as BW.


The last one is the swan image. At first glance it looks like a black-and-white image and I would like to classify it as BW, but the variance map tells a different story. We can see that the beak region has a high variance, where the RGB components differ greatly, while in the remaining part of the image the RGB components are close to each other.

The variance map has been quite a useful feature, but maybe we need to classify such an image not as a ‘Color Image’ but as a ‘Black and White dominant Image’.








EXAMPLE 2:

Let’s add more features and more classes to make the dataset a little more complicated and messy. We obtained the decision tree for the dataset below in the article ‘Decision Tree – Supervised Learner’. Let’s use it to check out the results.

Channel    | Variance | DominantB | ImageType
---------- | -------- | --------- | ---------
Monochrome | low      | High      | Black
Monochrome | low      | low       | White
Monochrome | low      | Medium    | BW
RGB        | low      | High      | Black
RGB        | low      | low       | White
RGB        | low      | Medium    | BW
RGB        | High     | High      | Color
RGB        | High     | low       | Color

MATLAB CODE:
%Image Classification: BW, Black Dominant Image, White Dominant Image, Color Image
%List of 'jpeg' images
fname = dir('*.jpeg');

for k = 1:size(fname,1)
   
    %Read the image
    Img = imread(fname(k).name);
    %Number of pixels in the image
    mn = size(Img,1)*size(Img,2);
   
    %Compute the spread across the R, G and B channels at each pixel
     VarI = std(double(Img),0,3);
     BelowT = sum(VarI(:)<=14); %Number of pixels below the threshold
     Prob_BT = BelowT/mn;       %Fraction of such pixels
       
       
      T1 = (Img(:,:,1) < 100&Img(:,:,2) < 100&Img(:,:,3) < 100);%Black or gray pixels
      T2 = (Img(:,:,1) > 150&Img(:,:,2) > 150&Img(:,:,3) > 150);%White pixels
      
      T1 = (sum(T1(:)) / mn)*100;
      T2 = (sum(T2(:)) / mn)*100;
     
      if((T1/T2) > 0.8 & (T1/T2) < 1.6) %Medium Criteria for the feature 'DominantB'
        
              figure(1),imshow(Img);title('Image Type:Black and White Image','FontSize',20);
      %When the number of black pixels are high   
      elseif(T1 > T2) %High Criteria for the feature 'DominantB'
          %Variance Low
          if(Prob_BT > 0.6)
              figure(1),imshow(Img);title('Image Type:Black Dominant Image','FontSize',20);
            
          else
              figure(1),imshow(Img);title('Image Type:Color Image','FontSize',20);
             
          end
       %When the number of white pixels are high  
      elseif(T2 > T1)%Low Criteria for the feature 'DominantB'
          %Variance Low
          if(Prob_BT > 0.6)
              figure(1),imshow(Img);title('Image Type:White Dominant Image','FontSize',20);
              
          else
              figure(1),imshow(Img);title('Image Type:Color Image','FontSize',20);
              
          end
      end

    %To visualize each image
    pause
   
end


EXPLANATION:


The number of gray or dark pixels is estimated, and similarly the number of white or light pixels. If the ratio between these two counts is close to one, the image is black and white. If the gray pixels outnumber the white pixels, the variance is then examined: based on the variance map, the image is classified as either a color image or a black-dominant image. In a similar manner, if the white pixels outnumber the gray pixels, the variance map decides whether the image type is color or white-dominant. The result below illustrates the classification done on the images and how they were classified.




The images used for the classification were taken from the website https://pixabay.com/

How to Save/Export Images in MATLAB

             
           This post illustrates some simple things to deal with when images need to be saved with certain constraints.
Let’s start with a simple text figure that needs to be saved in an image format.
figure,axis off;
text(0.35,0.5,'\color[rgb]{red} E \color[rgb]{green}=\color[rgb]{blue} mc^2','FontSize',60,'FontName','Times New Roman','FontAngle','italic');

To save the above figure with a different background:

The default background color is set to white in MATLAB. In order to keep the background color shown in the figure, select ‘Export Setup’ in the figure window, select ‘Rendering’ from Properties, untick the Custom color option and finally click ‘Apply to figure’. Then select ‘Save As’ to save the image.

The step by step screenshots are as given below:










The final image saved in the local directory is as shown below.

To save the figure with a black background:
Select the Custom color option, type ‘k’ in the text box and click ‘Apply to figure’.
For red, type ‘r’ or [1,0,0]. For blue, type ‘b’ or [0,0,1]. For green, type ‘g’ or [0,1,0]. For cyan, type ‘c’ or [0,1,1], and so on.
NOTE: ‘r’ denotes the color red; alternatively, a 1 x 3 vector representing [R,G,B] can be used. Each value ranges between 0 and 1. [1,0,0] denotes that the red component is 1 whereas the green and blue components are zero; in other words, the color red is chosen.
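
The same effect can also be obtained programmatically (a sketch): the figure property ‘InvertHardcopy’ controls whether MATLAB forces the background to white while saving.

MATLAB CODE:

set(gcf,'Color','k');            %e.g. a black figure background
set(gcf,'InvertHardcopy','off'); %keep the background color when saving
saveas(gcf,'equation.png');      %'equation.png' is just an example filename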





The final image saved in the local directory is as shown below.

Example 2:
A=imread('cameraman.tif');
figure,imagesc(A);colormap(jet);

To save the image without the padded white space:
Select ‘Export Setup’ from the File menu in the figure window, select ‘Size’ from Properties, type the width and height of the image, and change the units to ‘points’. Tick the ‘Expand axes to fill figure’ checkbox and finally click ‘Apply to figure’ to view the changes.
Now save the image.




Image saved in the local directory.
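
A programmatic alternative (a sketch) is to grab only the axes content and write it out directly:

MATLAB CODE:

A = imread('cameraman.tif');
figure,imagesc(A);colormap(jet);
F = getframe(gca);                    %capture the axes area only, without the figure border
imwrite(F.cdata,'cameraman_jet.png'); %'cameraman_jet.png' is just an example name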


To save the image without the axes:
MATLAB code:
A=imread('cameraman.tif');
figure,imagesc(A);colormap(gray);
set(gca, 'XTickLabelMode', 'manual', 'XTickLabel', [],'YTickLabelMode', 'manual', 'YTickLabel', [],'Xtick',[],'Ytick',[]);

After executing the above code, follow the same procedure as the previous example to save the image without white spaces.
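
A shorter alternative (a sketch) that hides the ticks, labels and the axes box in one call:

MATLAB CODE:

A=imread('cameraman.tif');
figure,imagesc(A);colormap(gray);
axis off; %removes the ticks, tick labels and axes lines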


The image can be saved by selecting the ‘Export’ option as well.
Image stored in the local directory.


like button Like "IMAGE PROCESSING" page