public final class StringMetrics extends Object
Consists of well-known string similarity metrics and methods to create string
similarity metrics from list or set metrics. All metrics are set up with
sensible defaults; to customize a metric, use StringMetricBuilder.
The created similarity metrics are immutable and thread-safe, provided all their components are also immutable and thread-safe.
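As an illustration of how a string metric can be built from a set metric, the following sketch simplifies both inputs, tokenizes them, and hands the token sets to a Jaccard comparison. It does not use the library: `Similarity`, `forSetMetric`, and the lambda-based simplifier and tokenizer are hypothetical stand-ins written in plain Java, and the lower-casing simplifier and whitespace tokenizer are assumptions rather than defaults confirmed by this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in, not the library's API: shows how a metric over
// sets of tokens can be lifted to a metric over strings.
final class ComposedMetricSketch {

    /** A similarity over arbitrary values, scored between 0 and 1. */
    interface Similarity<T> {
        float compare(T a, T b);
    }

    /** Jaccard similarity: |a ∩ b| / |a ∪ b|; two empty sets count as equal. */
    static final Similarity<Set<String>> JACCARD = (a, b) -> {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0f;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (float) intersection.size() / union;
    };

    /** Simplify each input, tokenize it, then delegate to the set metric. */
    static Similarity<String> forSetMetric(
            Similarity<Set<String>> metric,
            Function<String, String> simplifier,
            Function<String, Set<String>> tokenizer) {
        // The returned lambda holds no mutable state of its own, so it is
        // safe to share between threads as long as its components are too.
        return (a, b) -> metric.compare(
                tokenizer.apply(simplifier.apply(a)),
                tokenizer.apply(simplifier.apply(b)));
    }

    /** Example wiring (assumed): lower-case the input, split on whitespace. */
    static Similarity<String> example() {
        Function<String, String> lowerCase = s -> s.toLowerCase(Locale.ROOT);
        Function<String, Set<String>> whitespace =
                s -> new HashSet<>(Arrays.asList(s.trim().split("\\s+")));
        return forSetMetric(JACCARD, lowerCase, whitespace);
    }

    public static void main(String[] args) {
        Similarity<String> metric = example();
        // 3 shared tokens out of 5 distinct tokens: prints 0.6
        System.out.println(metric.compare("A quirky thing it is", "a QUIRKY thing"));
    }
}
```

Because the composed metric keeps no state beyond its three components, it inherits immutability and thread-safety from them, which mirrors the guarantee stated above.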
| Modifier and Type | Method | Description |
|---|---|---|
| static StringMetric | blockDistance() | Returns a block distance similarity metric over tokens in a string. |
| static StringMetric | cosineSimilarity() | Returns a cosine similarity metric over tokens in a string. |
| static StringMetric | create(Metric<String> metric) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | create(Metric<String> metric, Simplifier simplifier) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | createForListMetric(Metric<List<String>> metric, Simplifier simplifier, Tokenizer tokenizer) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | createForListMetric(Metric<List<String>> metric, Tokenizer tokenizer) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | createForMultisetMetric(Metric<com.google.common.collect.Multiset<String>> metric, Simplifier simplifier, Tokenizer tokenizer) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | createForMultisetMetric(Metric<com.google.common.collect.Multiset<String>> metric, Tokenizer tokenizer) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | createForSetMetric(Metric<Set<String>> metric, Simplifier simplifier, Tokenizer tokenizer) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | createForSetMetric(Metric<Set<String>> metric, Tokenizer tokenizer) | Deprecated. Use StringMetricBuilder in favor of directly constructing a metric. |
| static StringMetric | damerauLevenshtein() | Returns a Damerau-Levenshtein similarity metric over tokens in a string. |
| static StringMetric | dice() | Returns a Dice similarity metric over tokens in a string. |
| static StringMetric | euclideanDistance() | Returns a Euclidean distance similarity metric over tokens in a string. |
| static StringMetric | generalizedJaccard() | Returns a generalized Jaccard similarity metric over tokens in a string. |
| static StringMetric | identity() | Returns an identity string similarity metric. |
| static StringMetric | jaccard() | Returns a Jaccard similarity metric over tokens in a string. |
| static StringMetric | jaro() | Returns a Jaro string similarity metric. |
| static StringMetric | jaroWinkler() | Returns a Jaro-Winkler string similarity metric. |
| static StringMetric | levenshtein() | Returns a Levenshtein string similarity metric. |
| static StringMetric | longestCommonSubsequence() | Returns a string similarity metric that uses the LongestCommonSubsequence metric. |
| static StringMetric | longestCommonSubstring() | Returns a similarity metric that uses the LongestCommonSubstring metric. |
| static StringMetric | mongeElkan() | Returns a normalized Monge-Elkan metric over tokens in a string. |
| static StringMetric | needlemanWunch() | Returns a Needleman-Wunsch string similarity metric. |
| static StringMetric | overlapCoefficient() | Returns an overlap coefficient similarity metric over tokens in a string. |
| static StringMetric | qGramsDistance() | Returns a q-grams distance similarity metric. |
| static StringMetric | simonWhite() | Returns a Simon White similarity metric. |
| static StringMetric | smithWaterman() | Returns a Smith-Waterman string similarity metric. |
| static StringMetric | smithWatermanGotoh() | Returns a Smith-Waterman-Gotoh string similarity metric. |
| static StringMetric | soundex() | Deprecated. Will be removed due to the lack of a good use case. |
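Several of the metrics listed above score whole strings by edit distance. The sketch below shows one common way to turn a Levenshtein distance into a similarity between 0 and 1, by dividing the distance by the length of the longer string. The class and method names are hypothetical, and the normalization is an assumption; the metric returned by levenshtein() may scale its result differently.

```java
// Hypothetical illustration in plain Java, not the library's implementation.
final class LevenshteinSketch {

    /** Classic edit distance with unit costs, computed with two rolling rows. */
    static int distance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = previous[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two arrays instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    /** Assumed normalization: 1 - distance / length of the longer input. */
    static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0f; // two empty strings are treated as identical
        }
        return 1.0f - (float) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));   // prints 3
        System.out.println(similarity("kitten", "sitting")); // roughly 0.571
    }
}
```

Under this normalization identical strings score 1 and strings with no characters in common score 0, which keeps the result comparable with the token-based metrics in the table.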
public static StringMetric cosineSimilarity()
See Also: CosineSimilarity

public static StringMetric blockDistance()
See Also: BlockDistance

public static StringMetric damerauLevenshtein()
See Also: DamerauLevenshtein

public static StringMetric dice()
See Also: Dice

public static StringMetric euclideanDistance()
See Also: EuclideanDistance

public static StringMetric generalizedJaccard()
See Also: GeneralizedJaccard

public static StringMetric identity()
See Also: Identity

public static StringMetric jaccard()
See Also: Jaccard

public static StringMetric jaro()
See Also: Jaro

public static StringMetric jaroWinkler()
See Also: JaroWinkler

public static StringMetric levenshtein()
See Also: Levenshtein

public static StringMetric mongeElkan()
See Also: MongeElkan

public static StringMetric needlemanWunch()
See Also: NeedlemanWunch

public static StringMetric overlapCoefficient()
See Also: OverlapCoefficient

public static StringMetric qGramsDistance()
See Also: BlockDistance

public static StringMetric simonWhite()
Implementation based on the ideas as outlined in How to Strike a Match by Simon White.
See Also: SimonWhite

public static StringMetric smithWaterman()
See Also: SmithWaterman

public static StringMetric smithWatermanGotoh()
See Also: SmithWatermanGotoh

@Deprecated public static StringMetric soundex()
See Also: Soundex, JaroWinkler

public static StringMetric longestCommonSubsequence()
Returns a string similarity metric that uses the LongestCommonSubsequence metric.

public static StringMetric longestCommonSubstring()
Returns a similarity metric that uses the LongestCommonSubstring metric.
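The letter-pair idea from How to Strike a Match, referenced for simonWhite(), can be sketched in plain Java as follows. Each string is upper-cased and split into words, every word contributes its adjacent character pairs, and the score is twice the number of shared pairs divided by the total number of pairs. `LetterPairSketch` and its methods are hypothetical names for illustration; the metric returned by simonWhite() may tokenize or handle edge cases differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration in plain Java, not the library's implementation.
final class LetterPairSketch {

    /** Adjacent letter pairs of every whitespace-separated word, upper-cased. */
    static List<String> letterPairs(String text) {
        List<String> pairs = new ArrayList<>();
        for (String word : text.toUpperCase(Locale.ROOT).trim().split("\\s+")) {
            for (int i = 0; i + 1 < word.length(); i++) {
                pairs.add(word.substring(i, i + 2));
            }
        }
        return pairs;
    }

    /** Twice the number of shared pairs over the total number of pairs. */
    static double similarity(String a, String b) {
        List<String> first = letterPairs(a);
        List<String> second = letterPairs(b);
        int total = first.size() + second.size();
        if (total == 0) {
            return 0.0; // assumed convention: no pairs means nothing to match
        }
        int shared = 0;
        for (String pair : first) {
            // remove() consumes the matched pair, so a repeated pair in one
            // string only matches as many times as it occurs in the other.
            if (second.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }

    public static void main(String[] args) {
        // FR and NC are shared; each word has 5 pairs: 2 * 2 / 10
        System.out.println(similarity("France", "French")); // prints 0.4
        // EA, AL, LE and ED are shared: 2 * 4 / 10
        System.out.println(similarity("Healed", "Sealed")); // prints 0.8
    }
}
```

Counting pairs per word rather than across the whole string means word order does not affect the score, which is one reason the approach is often used for matching names and titles.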
@Deprecated public static StringMetric create(Metric<String> metric)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a metric for strings

@Deprecated public static StringMetric create(Metric<String> metric, Simplifier simplifier)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a metric for strings; simplifier - a simplifier
Throws: NullPointerException - if metric or simplifier is null
See Also: StringMetricBuilder

@Deprecated public static StringMetric createForListMetric(Metric<List<String>> metric, Simplifier simplifier, Tokenizer tokenizer)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a list metric; simplifier - a simplifier; tokenizer - a tokenizer
Throws: NullPointerException - if metric, simplifier, or tokenizer is null
See Also: StringMetricBuilder

@Deprecated public static StringMetric createForListMetric(Metric<List<String>> metric, Tokenizer tokenizer)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a list metric; tokenizer - a tokenizer
Throws: NullPointerException - if metric or tokenizer is null
See Also: StringMetricBuilder

@Deprecated public static StringMetric createForSetMetric(Metric<Set<String>> metric, Simplifier simplifier, Tokenizer tokenizer)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a set metric; simplifier - a simplifier; tokenizer - a tokenizer
Throws: NullPointerException - if metric, simplifier, or tokenizer is null
See Also: StringMetricBuilder

@Deprecated public static StringMetric createForSetMetric(Metric<Set<String>> metric, Tokenizer tokenizer)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a set metric; tokenizer - a tokenizer
Throws: NullPointerException - if metric or tokenizer is null
See Also: StringMetricBuilder

@Deprecated public static StringMetric createForMultisetMetric(Metric<com.google.common.collect.Multiset<String>> metric, Simplifier simplifier, Tokenizer tokenizer)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a multiset metric; simplifier - a simplifier; tokenizer - a tokenizer
Throws: NullPointerException - if metric, simplifier, or tokenizer is null
See Also: StringMetricBuilder

@Deprecated public static StringMetric createForMultisetMetric(Metric<com.google.common.collect.Multiset<String>> metric, Tokenizer tokenizer)
Deprecated. Use StringMetricBuilder in favor of directly constructing a metric.
Parameters: metric - a multiset metric; tokenizer - a tokenizer
Throws: NullPointerException - if metric or tokenizer is null
See Also: StringMetricBuilder

Copyright © 2014–2016. All rights reserved.