Entropy – A Key Concept for All Data Science Beginners

The entropy of a dataset is the average level of “information”, “surprise”, or “uncertainty” inherent in a data row’s possible outcomes.

Any dataset is multi-dimensional: each column is a dimension, and a spatial column is a dimension as well.

The entropy of the dataset is reduced each time you add (filter on) another dimension.

A keyword search will reduce the entropy to a level between "1" and "0", depending on how many keywords you use and which keywords you search on.

You can measure the entropy (information density) by clicking here.
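As a concrete sketch of what "measuring entropy" means here (the function name and sample column are illustrative, not from this page), the normalized Shannon entropy of one column can be computed in a few lines of Python; "1" is a maximally mixed column and "0" is a fully determined one:

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of a column of labels, scaled into [0, 1]."""
    counts = Counter(values)
    n = len(values)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    h_max = math.log2(len(counts))  # entropy of a perfectly uniform column
    return h / h_max if h_max > 0 else 0.0  # one repeated value -> entropy 0

print(normalized_entropy(["bird", "cat", "dog", "bird"]))  # ≈ 0.946
```

Dividing by the uniform-case maximum is what pins the scale to the "1" down to "0" range used in the sections below.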


1) The raw dataset has the maximum entropy, "1"

               Any dataset on Socrata, ArcGIS Server, or ArcGIS Online is a "raw" dataset. A "raw" dataset is unclassified, heterogeneous data mixed together; it has the most "uncertainty" and the most "surprise", and its information gain is very low. That is why those unclassified "raw" data do not feel very useful.

              In this example, a data item could be any city, any type of animal, any time, and so on.
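A quick way to see why "raw" means maximum entropy (the column below is hypothetical, standing in for an unclassified dataset): when every category is equally likely, the normalized entropy comes out to exactly 1.

```python
import math
from collections import Counter

# Hypothetical raw "animal" column where every type is equally represented:
column = ["bird", "cat", "dog", "coyote"] * 25   # 100 rows, uniform mix
counts = Counter(column)
probs = [c / len(column) for c in counts.values()]
h = -sum(p * math.log2(p) for p in probs)
print(h / math.log2(len(counts)))  # prints 1.0: maximum (normalized) entropy
```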



2) Adding one dimension reduces entropy and increases information gain

                By adding animal type as a dimension, each filtered data item must be a "bird", but it could still be in any city, at any time. The entropy is reduced and the information gain increases. This filtered, classified dataset feels more useful than the previous "raw", unclassified dataset.
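The one-dimension filter can be sketched like this (the rows and field order are hypothetical, standing in for the dataset in the viewer): after filtering on the animal dimension, the animal column is fully determined and its entropy drops to 0, while city and time stay uncertain.

```python
import math
from collections import Counter

def normalized_entropy(column):
    counts = Counter(column)
    n = len(column)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(len(counts)) if len(counts) > 1 else 0.0

# Hypothetical (city, animal, time) records:
rows = [
    ("Beaumont", "bird", "08:00"),
    ("Beaumont", "cat", "09:00"),
    ("Riverside", "bird", "10:00"),
    ("Riverside", "dog", "11:00"),
]
birds = [r for r in rows if r[1] == "bird"]  # filter on the "animal" dimension

before = normalized_entropy([r[1] for r in rows])   # mixed animal types
after = normalized_entropy([r[1] for r in birds])   # all "bird" -> entropy 0.0
print(before, after)
```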



3) Adding two dimensions continues to reduce entropy and increase information gain

      By adding two dimensions, animal type and Jurisdiction City, we are certain each data item is a bird in the city of Beaumont. But we are still not sure of its deposition type, time, and so on. Compared to the previous "one dimension" and "raw" data, this feels much more useful, since we have reduced more entropy and gained more information.
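Continuing the same hypothetical rows, the two-dimension case looks like this: the city and animal columns are now fully determined (entropy 0 each), while the time column still carries uncertainty.

```python
# Filter on two dimensions: animal == "bird" AND city == "Beaumont".
rows = [
    ("Beaumont", "bird", "08:00"),
    ("Beaumont", "bird", "14:00"),
    ("Beaumont", "cat", "09:00"),
    ("Riverside", "bird", "10:00"),
]
subset = [r for r in rows if r[0] == "Beaumont" and r[1] == "bird"]

cities = {r[0] for r in subset}   # one value left -> certain, entropy 0
animals = {r[1] for r in subset}  # one value left -> certain, entropy 0
times = {r[2] for r in subset}    # several distinct times -> still uncertain
print(cities, animals, times)
```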


4) By adding a third dimension (you can keep drilling down; as you keep adding dimensions, the entropy gets arbitrarily close to "0")
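The drill-down can be sketched as a loop over filters (rows and filters are hypothetical). Here the entropy is measured in bits over the whole remaining rows rather than normalized, so you can watch it fall step by step toward 0 as each dimension is pinned down:

```python
import math
from collections import Counter

def entropy_bits(items):
    """Shannon entropy, in bits, of a list of items (here: whole rows)."""
    counts = Counter(items)
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical (city, animal, time) rows standing in for the dataset:
rows = [
    ("Beaumont", "bird", "08:00"),
    ("Beaumont", "bird", "14:00"),
    ("Beaumont", "cat", "09:00"),
    ("Riverside", "dog", "10:00"),
]
filters = [
    lambda r: r[0] == "Beaumont",  # dimension 1: city
    lambda r: r[1] == "bird",      # dimension 2: animal type
    lambda r: r[2] == "08:00",     # dimension 3: time
]
subset = rows
for f in filters:
    subset = [r for r in subset if f(r)]
    print(len(subset), entropy_bits(subset))  # entropy shrinks toward 0
```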


5) A keyword search will reduce the entropy to a level between "1" and "0", depending on how many keywords you use and which keywords you search on


                       5.1)  Search "beaumont": the entropy is the same as with "one dimension", because the search is a single keyword, and that keyword is one unit of one dimension.


 

                 5.2)  Search "beaumont bird": the entropy is the same as with "two dimensions", because the search has two keywords, and each keyword is a unit of a dimension.
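One way to see why a keyword search behaves like dimension filters (the `keyword_filter` helper and rows below are hypothetical, not the portal's actual search): matching every keyword against every field leaves the same subset as filtering the city and animal dimensions directly, so it removes the same amount of entropy.

```python
def keyword_filter(rows, query):
    """Keep rows where every keyword appears in some field (case-insensitive).
    Each keyword effectively pins down one unit of one dimension."""
    words = query.lower().split()
    return [r for r in rows
            if all(any(w in str(field).lower() for field in r) for w in words)]

rows = [
    ("Beaumont", "bird", "08:00"),
    ("Beaumont", "cat", "09:00"),
    ("Riverside", "bird", "10:00"),
]
# "beaumont" alone behaves like the one-dimension filter; "beaumont bird"
# behaves like filtering two dimensions at once.
print(keyword_filter(rows, "beaumont bird"))
```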

 

