public final class Jaccard<T> extends Object implements SetMetric<T>, SetDistance<T>
Type Parameters: T - type of the token
similarity(a,b) = ∣a ∩ b∣ / ∣a ∪ b∣
distance(a,b) = 1 - similarity(a,b)
When a ∪ b is empty (that is, both a and b are empty), the sets have no elements in common and the ratio above is 0/0.
In this case the similarity is 0 by definition.
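The definition above can be sketched in plain Java as follows. This is a minimal illustration, not the library's implementation: the class name `JaccardSketch` and its static methods are hypothetical, and the empty-union case returns 0 as stated above. The size of the union is derived as ∣a∣ + ∣b∣ - ∣a ∩ b∣, so no union set has to be built.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper illustrating the Jaccard formulas; not part of the library.
public final class JaccardSketch {

    private JaccardSketch() {
    }

    // similarity(a,b) = |a ∩ b| / |a ∪ b|, and 0 when a ∪ b is empty.
    public static <T> float similarity(Set<T> a, Set<T> b) {
        // Copy a so that retainAll does not modify the caller's set.
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        // |a ∪ b| = |a| + |b| - |a ∩ b|
        int union = a.size() + b.size() - intersection.size();
        if (union == 0) {
            return 0.0f; // both sets empty: similarity is 0 by definition
        }
        return intersection.size() / (float) union;
    }

    // distance(a,b) = 1 - similarity(a,b)
    public static <T> float distance(Set<T> a, Set<T> b) {
        return 1.0f - similarity(a, b);
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("hello", "world");
        Set<String> b = Set.of("hello", "there");
        // One shared token out of three distinct tokens.
        System.out.println(similarity(a, b));
        System.out.println(distance(a, b));
    }
}
```

`Set.of` requires Java 9 or later; any `Set` implementation can be passed in.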
Unlike the generalized Jaccard index, the number of occurrences (multiplicity) of an entry
is not taken into account. For example, [hello, world] and
[hello, world, hello, world] are identical when compared with
the Jaccard index but dissimilar when the generalized version is used.
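The difference between the two indices can be illustrated with a small sketch. It assumes one common multiset definition of the generalized index, the sum of the minimum counts divided by the sum of the maximum counts; the library's own generalized Jaccard class may be defined differently. The class `MultisetVsSetJaccard` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of the set-based and a multiset-based Jaccard index.
public final class MultisetVsSetJaccard {

    private MultisetVsSetJaccard() {
    }

    // Set-based Jaccard: duplicates are discarded before comparing.
    public static float setJaccard(List<String> a, List<String> b) {
        Set<String> sa = new HashSet<>(a);
        Set<String> sb = new HashSet<>(b);
        Set<String> intersection = new HashSet<>(sa);
        intersection.retainAll(sb);
        int union = sa.size() + sb.size() - intersection.size();
        return union == 0 ? 0.0f : intersection.size() / (float) union;
    }

    // Assumed multiset Jaccard: sum of minimum counts over sum of maximum counts.
    public static float multisetJaccard(List<String> a, List<String> b) {
        Map<String, Integer> ca = counts(a);
        Map<String, Integer> cb = counts(b);
        Set<String> keys = new HashSet<>(ca.keySet());
        keys.addAll(cb.keySet());
        int minSum = 0;
        int maxSum = 0;
        for (String k : keys) {
            int x = ca.getOrDefault(k, 0);
            int y = cb.getOrDefault(k, 0);
            minSum += Math.min(x, y);
            maxSum += Math.max(x, y);
        }
        // Empty inputs return 0, mirroring the set-based convention.
        return maxSum == 0 ? 0.0f : minSum / (float) maxSum;
    }

    // Counts how often each token occurs.
    private static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> a = List.of("hello", "world");
        List<String> b = List.of("hello", "world", "hello", "world");
        System.out.println(setJaccard(a, b));      // 1.0
        System.out.println(multisetJaccard(a, b)); // 0.5
    }
}
```

The set-based index sees two identical sets {hello, world}, while the multiset version counts 2 shared occurrences out of a maximum of 4.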
The Jaccard index is similar to the overlap coefficient, which divides the size of the intersection by the size of the smaller of the two sets.
It is also similar to the Dice coefficient, which divides twice the size of the intersection (the shared information) by the sum of the cardinalities of both sets.
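The three coefficients can be compared side by side in a short sketch. The class `SetCoefficients` is hypothetical and is not the library's overlap or Dice implementation; returning 0 for empty inputs in `overlap` and `dice` is an assumption that mirrors the Jaccard convention above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical side-by-side of three set similarity coefficients.
public final class SetCoefficients {

    private SetCoefficients() {
    }

    // |a ∩ b|, computed on a copy so the inputs are not modified.
    private static <T> int intersectionSize(Set<T> a, Set<T> b) {
        Set<T> copy = new HashSet<>(a);
        copy.retainAll(b);
        return copy.size();
    }

    // Jaccard: |a ∩ b| / |a ∪ b|
    public static <T> float jaccard(Set<T> a, Set<T> b) {
        int i = intersectionSize(a, b);
        int union = a.size() + b.size() - i;
        return union == 0 ? 0.0f : i / (float) union;
    }

    // Overlap: |a ∩ b| / min(|a|, |b|), 0 if either set is empty (assumption).
    public static <T> float overlap(Set<T> a, Set<T> b) {
        int smaller = Math.min(a.size(), b.size());
        return smaller == 0 ? 0.0f : intersectionSize(a, b) / (float) smaller;
    }

    // Dice: 2 |a ∩ b| / (|a| + |b|), 0 if both sets are empty (assumption).
    public static <T> float dice(Set<T> a, Set<T> b) {
        int total = a.size() + b.size();
        return total == 0 ? 0.0f : 2 * intersectionSize(a, b) / (float) total;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("a", "b");
        Set<String> b = Set.of("a", "b", "c", "d");
        System.out.println(jaccard(a, b)); // 0.5
        System.out.println(overlap(a, b)); // 1.0
        System.out.println(dice(a, b));
    }
}
```

For non-empty inputs the Dice coefficient D and the Jaccard index J are related by D = 2J / (1 + J), so both rank pairs of sets in the same order. The overlap coefficient does not: it reaches 1.0 whenever one set is a subset of the other, as in the example.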
This class is immutable and thread-safe.
| Constructor and Description |
|---|
| Jaccard() |
public float compare(Set<T> a, Set<T> b)
Specified by: SetMetric
Results are undefined if a and b are sets based on
different equivalence relations (as HashSet, TreeSet, and
the keySet of an IdentityHashMap all are).
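The warning about equivalence relations can be demonstrated with standard collections only. In this sketch, which uses the hypothetical class `EquivalenceMismatch`, a `TreeSet` ordered by `String.CASE_INSENSITIVE_ORDER` treats "Hello" and "hello" as the same element, while a `HashSet` relies on `equals` and treats them as different. The computed intersection, and therefore any Jaccard score built on it, then depends on which set is used as the receiver.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo: sets with different equivalence relations disagree.
public final class EquivalenceMismatch {

    private EquivalenceMismatch() {
    }

    // Size of the intersection, keeping elements of `first` that `second` contains.
    public static int intersectionSize(Set<String> first, Set<String> second) {
        Set<String> copy = new HashSet<>(first);
        copy.retainAll(second); // uses second.contains(...)
        return copy.size();
    }

    public static void main(String[] args) {
        // Ordered case-insensitively: "Hello" and "hello" compare as equal.
        Set<String> a = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        a.add("Hello");

        // Uses equals/hashCode: "Hello" and "hello" are different elements.
        Set<String> b = new HashSet<>();
        b.add("hello");

        System.out.println(a.containsAll(b));       // true
        System.out.println(b.containsAll(a));       // false
        System.out.println(intersectionSize(a, b)); // 0
        System.out.println(intersectionSize(b, a)); // 1
    }
}
```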
public float distance(Set<T> a, Set<T> b)
Specified by: SetDistance
A distance of 0.0 indicates that a and
b are similar.
Results are undefined if a and b are sets based on
different equivalence relations (as HashSet, TreeSet, and
the keySet of an IdentityHashMap all are).
Copyright © 2014–2016. All rights reserved.