T - type of the token

public final class Dice<T> extends Object implements SetMetric<T>, SetDistance<T>
similarity(a,b) = 2 * ∣a ∩ b∣ / (∣a∣ + ∣b∣)
distance(a,b) = 1 - similarity(a,b)
The Dice similarity coefficient uses the same formula as SimonWhite, but unlike
SimonWhite it does not take the occurrence (cardinality) of an entry into account.
E.g. [hello, world] and [hello, world, hello, world] are
identical when compared with Dice but dissimilar when SimonWhite is
used.
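The formulas and the [hello, world] example above can be sketched in plain Java. This is a minimal illustration using only java.util, not the library's implementation; the class name DiceSketch and the choice to treat two empty sets as identical are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; DiceSketch is a hypothetical name, not part of the library.
final class DiceSketch {

    // similarity(a,b) = 2 * |a ∩ b| / (|a| + |b|)
    static <T> float similarity(Set<T> a, Set<T> b) {
        // Assumption: two empty sets are treated as identical.
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (2.0f * intersection.size()) / (a.size() + b.size());
    }

    // distance(a,b) = 1 - similarity(a,b)
    static <T> float distance(Set<T> a, Set<T> b) {
        return 1.0f - similarity(a, b);
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("hello", "world");
        List<String> second = Arrays.asList("hello", "world", "hello", "world");

        // Converting to a Set discards the repeated tokens, so both sets
        // end up as {hello, world}.
        Set<String> a = new HashSet<>(first);
        Set<String> b = new HashSet<>(second);

        System.out.println(similarity(a, b)); // prints 1.0
        System.out.println(distance(a, b));   // prints 0.0
    }
}
```

Because the inputs are sets, the repeated tokens in the second list never reach the computation, which is why Dice reports the two inputs as identical.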
Dice is similar to the overlap coefficient, which divides the intersection by the size of the smaller of the two sets.
It is also similar to the generalized Jaccard similarity, which divides the intersection by the union of two multisets.
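To make the difference between the three denominators concrete, the sketch below computes all three coefficients for the same pair of sets. It is an illustration under stated assumptions: the class and method names are hypothetical, the inputs are assumed to be non-empty, and Jaccard is shown on plain sets rather than multisets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; all names here are hypothetical.
// Assumption: both inputs are non-empty, so no denominator is zero.
final class SetCoefficientsSketch {

    static <T> int intersectionSize(Set<T> a, Set<T> b) {
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return intersection.size();
    }

    // Dice: twice the intersection divided by the sum of both sizes.
    static <T> float dice(Set<T> a, Set<T> b) {
        return (2.0f * intersectionSize(a, b)) / (a.size() + b.size());
    }

    // Overlap: the intersection divided by the size of the smaller set.
    static <T> float overlap(Set<T> a, Set<T> b) {
        return (float) intersectionSize(a, b) / Math.min(a.size(), b.size());
    }

    // Jaccard on plain sets: the intersection divided by the union.
    static <T> float jaccard(Set<T> a, Set<T> b) {
        int shared = intersectionSize(a, b);
        return (float) shared / (a.size() + b.size() - shared);
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> b = new HashSet<>(Arrays.asList("b", "c"));

        // The intersection {b, c} has size 2.
        System.out.println("dice    = " + dice(a, b));    // 2*2 / (3+2)
        System.out.println("overlap = " + overlap(a, b)); // 2 / min(3,2)
        System.out.println("jaccard = " + jaccard(a, b)); // 2 / 3
    }
}
```

For a subset relationship such as this one, the overlap coefficient reaches its maximum while Dice and Jaccard still penalize the extra element in the larger set.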
This class is immutable and thread-safe.
| Constructor and Description |
|---|
| Dice() |
public float compare(Set<T> a, Set<T> b)
SetMetric
Results are undefined if a and b are sets based on
different equivalence relations (as HashSet, TreeSet, and
the keySet of an IdentityHashMap all are).
public float distance(Set<T> a, Set<T> b)
SetDistance
A distance of 0.0 indicates that a and
b are similar.
Results are undefined if a and b are sets based on
different equivalence relations (as HashSet, TreeSet, and
the keySet of an IdentityHashMap all are).
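The warning about different equivalence relations can be illustrated with a HashSet, which uses equals(), and a TreeSet built with a case-insensitive comparator. The sketch below is hypothetical and does not call the library; it only shows that the apparent size of the intersection depends on which set answers contains(), which is why a result computed from it would be unreliable.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; EquivalenceSketch is a hypothetical name.
final class EquivalenceSketch {

    // Counts the elements of 'from' that 'in' reports as contained.
    static <T> int countContained(Set<T> from, Set<T> in) {
        int count = 0;
        for (T element : from) {
            if (in.contains(element)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // HashSet decides membership with equals() and hashCode().
        Set<String> hash = new HashSet<>(Collections.singleton("hello"));

        // This TreeSet decides membership with a case-insensitive comparator.
        Set<String> tree = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        tree.add("HELLO");

        // tree.contains("hello") is true under the comparator.
        System.out.println(countContained(hash, tree)); // prints 1

        // hash.contains("HELLO") is false under equals().
        System.out.println(countContained(tree, hash)); // prints 0
    }
}
```

The two sets disagree on whether they share an element, so any intersection-based measure over them has no single well-defined value.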
Copyright © 2014–2016. All rights reserved.