Class MatchService
java.lang.Object
    com.intuit.fuzzymatcher.component.MatchService
public class MatchService extends Object
Entry point for fuzzy matching. This class accepts Documents in different ways to cover three primary use cases:
1. De-duplication of data: for a given list of documents, it finds duplicates.
2. Duplicate check for new data: it checks whether a duplicate of a new Document is present in an existing list.
3. Duplicate check for bulk inserts: similar to 2, except that a list of new Documents is checked against the existing list.
Each use case also has similar methods that aggregate the results in different formats.
-
-
Constructor Summary
Constructor        Description
MatchService()
-
Method Summary
All methods are concrete instance methods. Each entry lists the modifier and type, the method, and its description.

Map<Document,List<Match<Document>>> applyMatch(Document document, List<Document> matchWith)
    Use this to check for a duplicate of a new record: it checks whether a new Document is a duplicate of one in an existing list. Data is aggregated by Document.

Map<Document,List<Match<Document>>> applyMatch(List<Document> documents)
    Use this for de-duplication of data: for a given list of documents, it finds duplicates. Data is aggregated by Document.

Map<Document,List<Match<Document>>> applyMatch(List<Document> documents, List<Document> matchWith)
    Use this to check duplicates for bulk inserts: a list of new Documents is checked against an existing list. Data is aggregated by Document.

Map<String,List<Match<Document>>> applyMatchByDocId(Document document, List<Document> matchWith)
    Use this to check for a duplicate of a new record: it checks whether a new Document is a duplicate of one in an existing list. Data is aggregated by Document Id.

Map<String,List<Match<Document>>> applyMatchByDocId(List<Document> documents)
    Use this for de-duplication of data: for a given list of documents, it finds duplicates. Data is aggregated by Document Id.

Map<String,List<Match<Document>>> applyMatchByDocId(List<Document> documents, List<Document> matchWith)
    Use this to check duplicates for bulk inserts: a list of new Documents is checked against an existing list. Data is aggregated by Document Id.

Set<Set<Match<Document>>> applyMatchByGroups(List<Document> documents)
    Use this for de-duplication of data: for a given list of documents, it finds duplicates. Data is aggregated into groups of related matches.
-
-
-
Method Detail
-
applyMatch
public Map<Document,List<Match<Document>>> applyMatch(List<Document> documents)
Use this for de-duplication of data: for a given list of documents, it finds duplicates. Data is aggregated by Document.
Parameters:
documents - the list of documents to match against
Returns:
a map containing the grouping of each document and its corresponding matches
-
applyMatch
public Map<Document,List<Match<Document>>> applyMatch(List<Document> documents, List<Document> matchWith)
Use this to check duplicates for bulk inserts: a list of new Documents is checked against an existing list. Data is aggregated by Document.
Parameters:
documents - the list of documents to match from
matchWith - the list of documents to match against
Returns:
a map containing the grouping of each document and its corresponding matches
-
applyMatch
public Map<Document,List<Match<Document>>> applyMatch(Document document, List<Document> matchWith)
Use this to check for a duplicate of a new record: it checks whether a new Document is a duplicate of one in an existing list. Data is aggregated by Document.
Parameters:
document - the document to match
matchWith - the list of documents to match against
Returns:
a map containing the grouping of each document and its corresponding matches
-
applyMatchByDocId
public Map<String,List<Match<Document>>> applyMatchByDocId(Document document, List<Document> matchWith)
Use this to check for a duplicate of a new record: it checks whether a new Document is a duplicate of one in an existing list. Data is aggregated by Document Id.
Parameters:
document - the document to match
matchWith - the list of documents to match against
Returns:
a map containing the grouping of each document id and its corresponding matches
-
applyMatchByDocId
public Map<String,List<Match<Document>>> applyMatchByDocId(List<Document> documents)
Use this for de-duplication of data: for a given list of documents, it finds duplicates. Data is aggregated by Document Id.
Parameters:
documents - the list of documents to match against
Returns:
a map containing the grouping of each document id and its corresponding matches
-
applyMatchByDocId
public Map<String,List<Match<Document>>> applyMatchByDocId(List<Document> documents, List<Document> matchWith)
Use this to check duplicates for bulk inserts: a list of new Documents is checked against an existing list. Data is aggregated by Document Id.
Parameters:
documents - the list of documents to match from
matchWith - the list of documents to match against
Returns:
a map containing the grouping of each document id and its corresponding matches
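The applyMatch and applyMatchByDocId variants differ only in the key of the returned map: a Document in one case, a document id String in the other. The sketch below illustrates that relationship by re-keying one shape into the other. It is an assumption-laden illustration, not the library's implementation: Doc, DocMatch, and byDocId are hypothetical stand-ins defined here so the example compiles without the fuzzy-matcher dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ByDocIdSketch {

    /** Hypothetical stand-in for a document; only its id matters here. */
    public static final class Doc {
        public final String id;

        public Doc(String id) {
            this.id = id;
        }
    }

    /** Hypothetical stand-in for a match: a document, the document it matched, and a score. */
    public static final class DocMatch {
        public final Doc data;
        public final Doc matchedWith;
        public final double score;

        public DocMatch(Doc data, Doc matchedWith, double score) {
            this.data = data;
            this.matchedWith = matchedWith;
            this.score = score;
        }
    }

    /** Re-keys a document-keyed result map by document id, merging lists if ids repeat. */
    public static Map<String, List<DocMatch>> byDocId(Map<Doc, List<DocMatch>> byDocument) {
        Map<String, List<DocMatch>> result = new LinkedHashMap<>();
        for (Map.Entry<Doc, List<DocMatch>> entry : byDocument.entrySet()) {
            result.computeIfAbsent(entry.getKey().id, k -> new ArrayList<>())
                  .addAll(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Doc first = new Doc("1");
        Doc second = new Doc("2");

        // Document-keyed shape, analogous to what applyMatch returns.
        Map<Doc, List<DocMatch>> byDocument = new LinkedHashMap<>();
        byDocument.put(first, Arrays.asList(new DocMatch(first, second, 0.9)));

        // Id-keyed shape, analogous to what applyMatchByDocId returns.
        Map<String, List<DocMatch>> byId = byDocId(byDocument);
        System.out.println(byId.keySet());
    }
}
```

The id-keyed shape is convenient when the caller already stores records by key and only needs to look up matches for a known id.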
-
applyMatchByGroups
public Set<Set<Match<Document>>> applyMatchByGroups(List<Document> documents)
Use this for de-duplication of data: for a given list of documents, it finds duplicates. Data is aggregated into groups of related matches.
Parameters:
documents - the list of documents to match against
Returns:
a set containing the grouping of all relevant matches. If A matches B, and B matches C, then A, B, and C are grouped together.
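The grouping rule for applyMatchByGroups (A matches B, B matches C, so all three share a group) amounts to collecting connected components over pairwise matches. The sketch below shows one way to get that behavior with a union-find structure over plain String ids. It is only an illustration of the rule: the class TransitiveGroupingSketch and its group method are hypothetical, and nothing here is taken from the library's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: groups ids that are connected through pairwise matches. */
public class TransitiveGroupingSketch {

    // Maps each id to its parent in a union-find forest; a root is its own parent.
    private final Map<String, String> parent = new HashMap<>();

    /** Returns the root of the group containing id, registering id if it is new. */
    private String find(String id) {
        parent.putIfAbsent(id, id);
        String root = id;
        while (!root.equals(parent.get(root))) {
            root = parent.get(root);
        }
        // Path compression: point every node on the walked path directly at the root.
        String current = id;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Merges the groups containing a and b. */
    private void union(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    /** Each pair is a two-element array {id, matchedWithId}. */
    public static Set<Set<String>> group(List<String[]> matchedPairs) {
        TransitiveGroupingSketch forest = new TransitiveGroupingSketch();
        for (String[] pair : matchedPairs) {
            forest.union(pair[0], pair[1]);
        }
        // Bucket every known id under the root of its group.
        Map<String, Set<String>> byRoot = new HashMap<>();
        for (String id : new ArrayList<>(forest.parent.keySet())) {
            byRoot.computeIfAbsent(forest.find(id), k -> new TreeSet<>()).add(id);
        }
        return new HashSet<>(byRoot.values());
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"A", "B"},
                new String[] {"B", "C"},
                new String[] {"D", "E"});
        // A, B, and C end up in one group; D and E in another.
        System.out.println(group(pairs));
    }
}
```

Note that A and C land in the same group even though no direct A-to-C match was supplied, which is the behavior the return description states.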
-
-