Type Parameters: T - type of the token
public final class CosineSimilarity<T> extends Object implements MultisetMetric<T>, MultisetDistance<T>
similarity(a,b) = a·b / (||a|| * ||b||)
distance(a,b) = 1 - similarity(a,b)
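Written out with each multiset read as a sparse vector of token counts, the two definitions above can be expanded as follows. This is a sketch under the assumption that the vector components are the occurrence counts; the symbol a_t (the number of times token t occurs in a) is notation introduced here, not part of the API.

```latex
\[
\operatorname{similarity}(a,b)
  = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
  = \frac{\sum_{t} a_t \, b_t}{\sqrt{\sum_{t} a_t^{2}} \; \sqrt{\sum_{t} b_t^{2}}},
\qquad
\operatorname{distance}(a,b) = 1 - \operatorname{similarity}(a,b)
\]

% Worked example: a = [hello, world], b = [hello, hello, world]
\[
a \cdot b = 1 \cdot 2 + 1 \cdot 1 = 3, \quad
\lVert a \rVert = \sqrt{2}, \quad
\lVert b \rVert = \sqrt{5}, \quad
\operatorname{similarity}(a,b) = \frac{3}{\sqrt{10}} \approx 0.949, \quad
\operatorname{distance}(a,b) \approx 0.051
\]
```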
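The cosine similarity is closely related to the Tanimoto coefficient, but unlike
Tanimoto it takes the occurrence (cardinality) of an entry into account. E.g.
[hello, world] and [hello, hello, world] would be identical when compared with
Tanimoto but are dissimilar when the cosine similarity is used. Multisets whose
counts are proportional, such as [hello, world] and [hello, world, hello, world],
still have a cosine similarity of 1, because the formula is invariant to scaling.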
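To make the role of the occurrence counts concrete, the following is a minimal sketch of the formula using plain `java.util` maps as a stand-in for Guava multisets. It is not this class's implementation: the class name `CosineSketch`, its helper methods, and the handling of empty inputs are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of the documented library.
final class CosineSketch {

    /** Counts the occurrences of each token (a stand-in for a multiset). */
    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> result = new HashMap<>();
        for (String token : tokens) {
            result.merge(token, 1, Integer::sum);
        }
        return result;
    }

    /** Computes a·b / (||a|| * ||b||) with token counts as vector components. */
    static float cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // Assumption: two empty inputs count as identical, one empty input as dissimilar.
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f;
        }
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0f;
        }

        // Iterate over every token that occurs in either input.
        Set<String> tokens = new HashSet<>(a.keySet());
        tokens.addAll(b.keySet());

        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (String token : tokens) {
            int x = a.getOrDefault(token, 0);
            int y = b.getOrDefault(token, 0);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
    }

    /** distance(a,b) = 1 - similarity(a,b). */
    static float distance(Map<String, Integer> a, Map<String, Integer> b) {
        return 1.0f - cosine(a, b);
    }

    public static void main(String[] args) {
        Map<String, Integer> a = counts(List.of("hello", "world"));
        Map<String, Integer> b = counts(List.of("hello", "hello", "world"));

        // The extra "hello" changes the direction of the count vector,
        // so the similarity drops below 1 (roughly 0.949).
        System.out.println(cosine(a, b));
        System.out.println(distance(a, b));
    }
}
```

Because only the direction of the count vector matters, inputs that differ in their relative token frequencies score below 1, while the set of distinct tokens alone does not determine the result.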
This class is immutable and thread-safe.
See Also: TanimotoCoefficient, Wikipedia: Cosine similarity

| Constructor and Description |
|---|
| CosineSimilarity() |

| Modifier and Type | Method and Description |
|---|---|
| float | compare(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b) Measures the similarity between multisets a and b. |
| float | distance(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b) Measures the distance between multisets a and b. |
| String | toString() |

public float compare(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b)
Description copied from interface: MultisetMetric
Results are undefined if a and b are based on different
equivalence relations (as HashMultiset and TreeMultiset
are).
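The caveat about equivalence relations can be illustrated with an analogy from `java.util`: a hash-based collection decides sameness with `equals`, while a tree-based collection decides it with its comparator. The sketch below uses `HashSet` and `TreeSet` rather than Guava's `HashMultiset` and `TreeMultiset`, and the class `EquivalenceSketch` and the case-insensitive comparator are assumptions chosen only to make the difference visible.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the documented library.
final class EquivalenceSketch {

    /** Distinct tokens as judged by equals() and hashCode(). */
    static Set<String> byEquals(List<String> tokens) {
        return new HashSet<>(tokens);
    }

    /** Distinct tokens as judged by a case-insensitive comparator. */
    static Set<String> byComparator(List<String> tokens) {
        Set<String> result = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        result.addAll(tokens);
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Hello", "hello", "world");

        // equals() treats "Hello" and "hello" as different tokens.
        System.out.println(byEquals(tokens).size());     // 3

        // The comparator treats them as the same token.
        System.out.println(byComparator(tokens).size()); // 2
    }
}
```

When two inputs disagree in this way about which tokens are the same, there is no single count vector to compare, which is one way to read why the results are left undefined.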
public float distance(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b)
Description copied from interface: MultisetDistance
A distance of 0.0 indicates that a and b are similar.
Results are undefined if a and b are based on different
equivalence relations (as HashMultiset and TreeMultiset
are).
Copyright © 2014–2016. All rights reserved.