Cosine similarity in data mining

What is cosine similarity?

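Cosine similarity is a measure of how similar two files/documents are. Each file is represented as a vector of numbers (for example, word counts), and the similarity is the cosine of the angle between the two vectors. For non-negative vectors such as word counts, the value ranges from 0 (nothing in common) to 1 (both vectors point in the same direction).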

Example of cosine similarity:

What is the similarity between two files, file 1 and file 2?

Formula: cos(file 1, file 2) = (file 1 · file 2) / (||file 1|| * ||file 2||), where · is the dot product and ||x|| is the length (Euclidean norm) of vector x.

file 1 = (0, 3, 0, 0, 2, 0, 0, 2, 0, 5)

file 2 = (1, 2, 0, 0, 1, 1, 0, 1, 0, 3)

file 1 · file 2  =  0*1 + 3*2 + 0*0 + 0*0 + 2*1 + 0*1 + 0*0 + 2*1 + 0*0 + 5*3  

                     =  25

||file 1|| = (0*0 + 3*3 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0 + 2*2 + 0*0 + 5*5)^0.5

           = (42)^0.5 = 6.481

||file 2|| = (1*1 + 2*2 + 0*0 + 0*0 + 1*1 + 1*1 + 0*0 + 1*1 + 0*0 + 3*3)^0.5

           = (17)^0.5 = 4.123

cos(file 1, file 2) = 25 / (6.481 * 4.123) = 0.94
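
The calculation above can be checked with a short Python sketch. The function name cosine_similarity and the variable names file_1 and file_2 are illustrative choices, not taken from any particular library, and raising an error for an all-zero vector is an assumption about how to handle that undefined case.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean length of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat an all-zero vector as an error, since the ratio is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# The two vectors from the example.
file_1 = [0, 3, 0, 0, 2, 0, 0, 2, 0, 5]
file_2 = [1, 2, 0, 0, 1, 1, 0, 1, 0, 3]

similarity = cosine_similarity(file_1, file_2)
print(round(similarity, 2))  # 0.94
```

Comparing a vector with itself, for example cosine_similarity(file_1, file_1), gives 1 (up to floating-point rounding), which is a quick sanity check on the function.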
