Cosine similarity in data mining

What is cosine similarity?

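Cosine similarity measures how similar two files/documents are. Each file is represented as a vector of numbers (for example, term counts), and the similarity is the cosine of the angle between the two vectors. For vectors with non-negative entries, the value ranges from 0 (no shared non-zero components) to 1 (same direction).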

Example of cosine similarity:

What is the similarity between two files, file 1 and file 2?

Formula: cos(file 1, file 2) = (file 1 · file 2) / (||file 1|| * ||file 2||),

where · is the dot product and ||x|| is the length (Euclidean norm) of vector x.

file 1 = (0, 3, 0, 0, 2, 0, 0, 2, 0, 5)

file 2 = (1, 2, 0, 0, 1, 1, 0, 1, 0, 3)

file 1 · file 2  =  0*1 + 3*2 + 0*0 + 0*0 + 2*1 + 0*1 + 0*0 + 2*1 + 0*0 + 5*3  

                     =  25

||file 1|| = (0*0 + 3*3 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0 + 2*2 + 0*0 + 5*5)^0.5

           = (42)^0.5 = 6.481

||file 2|| = (1*1 + 2*2 + 0*0 + 0*0 + 1*1 + 1*1 + 0*0 + 1*1 + 0*0 + 3*3)^0.5

           = (17)^0.5 = 4.12

cos(file 1, file 2) = 25 / (6.481 * 4.12) = 0.94
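The same calculation can be checked in code. The following is a minimal Python sketch, not part of the original tutorial: the function name cosine_similarity and the variable names file_1 and file_2 are illustrative choices, and raising an error for an all-zero vector is an assumption about how to handle a case the formula leaves undefined.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of the element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norms: square root of the sum of squares.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as an error, since the ratio is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The two vectors from the worked example above.
file_1 = [0, 3, 0, 0, 2, 0, 0, 2, 0, 5]
file_2 = [1, 2, 0, 0, 1, 1, 0, 1, 0, 3]

print(round(cosine_similarity(file_1, file_2), 2))  # prints 0.94
```

Rounded to two decimal places, the result matches the hand calculation.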


By: Prof. Fazal Rehman Shamil
CEO @ T4Tutorials
Last Modified: November 10, 2019
