T - type of the token

public final class GeneralizedJaccard<T> extends Object implements MultisetMetric<T>, MultisetDistance<T>
similarity(a,b) = ∣a ∩ b∣ / ∣a ∪ b∣
distance(a,b) = 1 - similarity(a,b)
When ∣a ∪ b∣ is 0, both multisets are empty and the ratio is undefined.
In this case the similarity is 0 by definition.
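
A minimal sketch of the two formulas, assuming the usual multiset definitions: the intersection takes the minimum count of each token and the union takes the maximum. It uses plain JDK maps as a stand-in for Guava multisets. The class name `GeneralizedJaccardSketch` and its helper methods are hypothetical and are not part of this library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GeneralizedJaccardSketch {

    // Counts how often each token occurs; a stand-in for a multiset.
    static <T> Map<T, Integer> counts(List<T> tokens) {
        Map<T, Integer> result = new HashMap<>();
        for (T token : tokens) {
            result.merge(token, 1, Integer::sum);
        }
        return result;
    }

    // similarity(a,b) = |a ∩ b| / |a ∪ b| with multiset semantics.
    static <T> float similarity(List<T> a, List<T> b) {
        Map<T, Integer> countsA = counts(a);
        Map<T, Integer> countsB = counts(b);
        Set<T> tokens = new HashSet<>(countsA.keySet());
        tokens.addAll(countsB.keySet());

        int intersection = 0;
        int union = 0;
        for (T token : tokens) {
            int inA = countsA.getOrDefault(token, 0);
            int inB = countsB.getOrDefault(token, 0);
            intersection += Math.min(inA, inB); // shared occurrences
            union += Math.max(inA, inB);        // all occurrences
        }
        // Empty union: return 0, following the convention stated above.
        if (union == 0) {
            return 0.0f;
        }
        return intersection / (float) union;
    }

    // distance(a,b) = 1 - similarity(a,b)
    static <T> float distance(List<T> a, List<T> b) {
        return 1.0f - similarity(a, b);
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("x", "x", "y");
        List<String> right = Arrays.asList("x", "y", "y", "z");
        // Intersection size is 2 (one x, one y); union size is 5.
        System.out.println(similarity(left, right));
        System.out.println(distance(left, right));
    }
}
```

Iterating over the combined key set keeps the intersection and union counts in one pass, which makes the relationship between the two sums easy to check by hand.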
Unlike the Jaccard index, the number of occurrences (cardinality) of each
entry is taken into account. For example, [hello, world] and
[hello, world, hello, world] are identical when compared with
the Jaccard index but dissimilar when the generalized version is used.
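
The example can be worked through by hand. The derivation below assumes the usual multiset definitions (intersection takes the minimum count of each token, union the maximum) and is an illustration rather than a description of the implementation.

```latex
% a = [hello, world], b = [hello, world, hello, world]
\begin{aligned}
|a \cap b| &= \min(1,2) + \min(1,2) = 2 \\
|a \cup b| &= \max(1,2) + \max(1,2) = 4 \\
\text{similarity}(a,b) &= \frac{2}{4} = 0.5 \\
\text{distance}(a,b) &= 1 - 0.5 = 0.5
\end{aligned}
% Plain Jaccard index on the underlying sets {hello, world}:
% |A \cap B| / |A \cup B| = 2 / 2 = 1
```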
This class is immutable and thread-safe.
See also: Jaccard, Wikipedia - Jaccard index

| Constructor and Description |
|---|
| GeneralizedJaccard() |

| Modifier and Type | Method and Description |
|---|---|
| float | compare(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b) Measures the similarity between multisets a and b. |
| float | distance(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b) Measures the distance between multisets a and b. |
| String | toString() |

public float compare(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b)
MultisetMetric
Results are undefined if a and b are based on different
equivalence relations (as HashMultiset and TreeMultiset
are).
public float distance(com.google.common.collect.Multiset<T> a, com.google.common.collect.Multiset<T> b)
MultisetDistance
A distance of 0.0 indicates that a and b are similar.
Results are undefined if a and b are based on different
equivalence relations (as HashMultiset and TreeMultiset
are).
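
To illustrate what "different equivalence relations" means, here is a small sketch using JDK collections as stand-ins: a hash-based collection decides equality with `equals()`/`hashCode()`, while a tree-based collection decides it with its comparator. The class `EquivalenceSketch` and its helpers are hypothetical, and the case-insensitive comparator is chosen only to make the difference visible.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class EquivalenceSketch {

    // Elements are considered equal when equals()/hashCode() agree.
    static Set<String> hashBased(String... tokens) {
        return new HashSet<>(Arrays.asList(tokens));
    }

    // Elements are considered equal when the comparator returns 0;
    // this comparator ignores case.
    static Set<String> treeBased(String... tokens) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(tokens));
        return set;
    }

    public static void main(String[] args) {
        Set<String> hash = hashBased("Hello", "hello");
        Set<String> tree = treeBased("Hello", "hello");
        System.out.println(hash.size());            // two distinct elements
        System.out.println(tree.size());            // one element
        System.out.println(hash.contains("HELLO")); // false
        System.out.println(tree.contains("HELLO")); // true
    }
}
```

When two collections disagree like this about which tokens are the same, the sizes of their intersection and union are no longer well defined, which is why mixing them gives undefined results.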
Copyright © 2014–2016. All rights reserved.