T - type of the token
public final class OverlapCoefficient<T> extends Object implements SetMetric<T>, SetDistance<T>
similarity(q,r) = ∣q ∩ r∣ / min{∣q∣, ∣r∣}
distance(q,r) = 1 - similarity(q,r)
Unlike the generalized overlap coefficient, the occurrence (cardinality) of an
entry is not taken into account. For example, [hello, world] and
[hello, world, hello, world] are identical when compared with
the overlap coefficient but dissimilar when the generalized version is
used.
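The formulas and the [hello, world] example above can be sketched with plain java.util sets. This is an illustration only, not the library's implementation: the class name OverlapSketch is hypothetical, and the handling of empty sets (two empty sets treated as identical, one empty set as fully dissimilar) is an assumption that the text above does not specify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, used only for illustration.
final class OverlapSketch {

    // similarity(q,r) = |q ∩ r| / min{|q|, |r|}
    static <T> float similarity(Set<T> q, Set<T> r) {
        // Assumption: avoid dividing by zero when a set is empty.
        if (q.isEmpty() && r.isEmpty()) {
            return 1.0f;
        }
        if (q.isEmpty() || r.isEmpty()) {
            return 0.0f;
        }
        Set<T> intersection = new HashSet<>(q);
        intersection.retainAll(r);
        return intersection.size() / (float) Math.min(q.size(), r.size());
    }

    // distance(q,r) = 1 - similarity(q,r)
    static <T> float distance(Set<T> q, Set<T> r) {
        return 1.0f - similarity(q, r);
    }

    public static void main(String[] args) {
        // Duplicates disappear once the tokens are stored in a Set,
        // so both token lists become the same two-element set.
        Set<String> a = new HashSet<>(Arrays.asList("hello", "world"));
        Set<String> b = new HashSet<>(Arrays.asList("hello", "world", "hello", "world"));

        System.out.println(similarity(a, b)); // prints 1.0
        System.out.println(distance(a, b));   // prints 0.0
    }
}
```

Because a Set discards repeated tokens, the second list collapses to {hello, world} before any comparison takes place, which is why the two inputs are indistinguishable here.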
Similar to the generalized Jaccard similarity, which divides the intersection by the union of two multisets.
Similar to the Dice coefficient, which divides the shared information (intersection) by the sum of cardinalities.
This class is immutable and thread-safe.
| Constructor and Description |
|---|
| OverlapCoefficient() |
public float distance(Set<T> a, Set<T> b)
Specified by: distance in interface SetDistance<T>
A value of 0.0 indicates that a and
b are similar.
Results are undefined if a and b are sets based on
different equivalence relations (as HashSet, TreeSet, and
the keySet of an IdentityHashMap all are).
public float compare(Set<T> a, Set<T> b)
Specified by: compare in interface SetMetric<T>
Results are undefined if a and b are sets based on
different equivalence relations (as HashSet, TreeSet, and
the keySet of an IdentityHashMap all are).
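The warning about different equivalence relations can be made concrete with a small sketch. It assumes the intersection is computed by copying one set and calling retainAll with the other, which is one common approach and not necessarily what this class does; the class name EquivalenceDemo and the method intersectionSize are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, used only for illustration.
final class EquivalenceDemo {

    // Counts the elements of a that b reports as contained.
    static <T> int intersectionSize(Set<T> a, Set<T> b) {
        Set<T> copy = new HashSet<>(a);
        copy.retainAll(b); // membership is decided by b.contains(...)
        return copy.size();
    }

    public static void main(String[] args) {
        // HashSet compares elements with equals/hashCode.
        Set<String> byEquals = new HashSet<>(Collections.singleton("Hello"));

        // This TreeSet compares elements with a case-insensitive comparator.
        Set<String> ignoringCase = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        ignoringCase.add("hello");

        // The answer depends on which set is asked about membership.
        System.out.println(intersectionSize(byEquals, ignoringCase)); // prints 1
        System.out.println(intersectionSize(ignoringCase, byEquals)); // prints 0
    }
}
```

The two sets disagree on whether "Hello" and "hello" are the same element, so the size of the intersection, and with it any similarity or distance derived from it, depends on argument order.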
Copyright © 2014–2016. All rights reserved.