Class PreProcessFunction<T>


  • public class PreProcessFunction<T>
    extends Object
    Static factory methods for functions that pre-process elements. These functions are applied to element.value Strings.
    • Constructor Detail

      • PreProcessFunction

        public PreProcessFunction()
    • Method Detail

      • trim

        public static java.util.function.Function<String,​String> trim()
        Uses the Apache Commons StringUtils trim method.
        Returns:
        the function to perform trim
      • toLowerCase

        public static java.util.function.Function<String,​String> toLowerCase()
        Uses the Apache Commons StringUtils lowerCase method.
        Returns:
        the function to perform toLowerCase
      • numericValue

        public static java.util.function.Function<String,​String> numericValue()
        Removes all non-numeric characters from a string.
        Returns:
        the function to perform numericValue
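As an illustration of the behaviour described for numericValue, here is a minimal sketch written with plain java.util.function.Function. It is not the library's implementation: the class name NumericValueSketch is hypothetical, and it assumes that "non-numeric" means anything other than the ASCII digits 0-9 and that such characters are replaced with the empty string.

```java
import java.util.function.Function;

final class NumericValueSketch {
    // Assumption: every character outside 0-9 is dropped.
    static Function<String, String> numericValue() {
        return s -> s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(numericValue().apply("(415) 555-0100")); // prints "4155550100"
    }
}
```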
      • removeSpecialChars

        public static java.util.function.Function<String,​String> removeSpecialChars()
        Removes special characters from a string.
        Returns:
        the function to perform removeSpecialChars
      • removeDomain

        public static java.util.function.Function<String,​String> removeDomain()
        Used for emails; removes everything after the '@' character.
        Returns:
        the function to perform removeDomain
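The effect of removeDomain on an email address might look like the sketch below. The class name RemoveDomainSketch is hypothetical, and the sketch assumes that the '@' character itself is dropped along with the domain and that strings without an '@' pass through unchanged; the page does not state either detail.

```java
import java.util.function.Function;

final class RemoveDomainSketch {
    // Assumption: keep only the text before the first '@'.
    static Function<String, String> removeDomain() {
        return s -> {
            int at = s.indexOf('@');
            return at >= 0 ? s.substring(0, at) : s;
        };
    }

    public static void main(String[] args) {
        System.out.println(removeDomain().apply("jane.doe@example.com")); // prints "jane.doe"
    }
}
```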
      • addressPreprocessing

        public static java.util.function.Function<String,​String> addressPreprocessing()
        Applies both the "removeSpecialChars" and "addressNormalization" functions.
        Returns:
        the function to perform addressPreprocessing
      • namePreprocessing

        public static java.util.function.Function<String,​String> namePreprocessing()
        Applies the "removeTrailingNumber", "removeSpecialChars" and "nameNormalization" functions.
        Returns:
        the function to perform namePreprocessing
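Because every method here returns a java.util.function.Function, a combined step such as namePreprocessing can be pictured as a chain built with Function.andThen. The sketch below is an assumption-laden illustration, not the library's code: the class name, the regular expressions, the tiny NAME_DICTIONARY map, and the order of application (taken from the order in which the functions are listed above) are all hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class NamePreprocessingSketch {
    // Hypothetical stand-in for the "name-dictionary": token -> replacement.
    private static final Map<String, String> NAME_DICTIONARY =
            Map.of("jr", "", "sr", "", "corp", "", "inc", "");

    static Function<String, String> removeTrailingNumber() {
        // Drops a run of digits at the very end of the string.
        return s -> s.replaceAll("\\d+$", "");
    }

    static Function<String, String> removeSpecialChars() {
        // Assumption: anything other than letters, digits and whitespace is "special".
        return s -> s.replaceAll("[^A-Za-z0-9\\s]", "");
    }

    static Function<String, String> nameNormalization() {
        // Looks up each whitespace-separated token and drops tokens mapped to "".
        return s -> Arrays.stream(s.trim().split("\\s+"))
                .map(token -> NAME_DICTIONARY.getOrDefault(token.toLowerCase(), token))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.joining(" "));
    }

    static Function<String, String> namePreprocessing() {
        return removeTrailingNumber()
                .andThen(removeSpecialChars())
                .andThen(nameNormalization());
    }

    public static void main(String[] args) {
        System.out.println(namePreprocessing().apply("John O'Neil Jr. 42")); // prints "John ONeil"
    }
}
```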
      • addressNormalization

        public static java.util.function.Function<String,​String> addressNormalization()
        Uses the "address-dictionary" to normalize commonly used strings in addresses, e.g. "st.", "street", "ave", "avenue".
        Returns:
        the function to perform addressNormalization
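Dictionary-based normalization of this kind can be sketched as a token lookup. The contents of the real "address-dictionary" are not shown on this page, so the ADDRESS_DICTIONARY map below is hypothetical, as are the class name, the choice to expand abbreviations to the full word, and the lowercasing of tokens before lookup.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AddressNormalizationSketch {
    // Hypothetical dictionary entries: abbreviation -> normalized form.
    private static final Map<String, String> ADDRESS_DICTIONARY =
            Map.of("st", "street", "st.", "street", "ave", "avenue", "ave.", "avenue");

    static Function<String, String> addressNormalization() {
        // Lowercases each whitespace-separated token, then replaces it if the dictionary knows it.
        return s -> Arrays.stream(s.trim().split("\\s+"))
                .map(String::toLowerCase)
                .map(token -> ADDRESS_DICTIONARY.getOrDefault(token, token))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(addressNormalization().apply("123 Main St.")); // prints "123 main street"
        System.out.println(addressNormalization().apply("45 Fifth Ave")); // prints "45 fifth avenue"
    }
}
```

With a mapping like this, "123 Main St." and "123 Main Street" reduce to the same string, which is the point of normalizing before comparison.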
      • removeTrailingNumber

        public static java.util.function.Function<String,​String> removeTrailingNumber()
        Removes numeric characters from the end of a string.
        Returns:
        the function to perform removeTrailingNumber
      • nameNormalization

        public static java.util.function.Function<String,​String> nameNormalization()
        Uses the "name-dictionary" to remove common prefixes and suffixes in user names, such as "jr" and "sr". It also removes commonly used words in company names, such as "corp" and "inc".
        Returns:
        the function to perform nameNormalization
      • usPhoneNormalization

        public static java.util.function.Function<String,​String> usPhoneNormalization()
        For a 10-character string, prefixes it with the US international calling code "1".
        Returns:
        the function to perform usPhoneNormalization
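The rule described for usPhoneNormalization can be sketched as a simple length check. The class name below is hypothetical, and the sketch assumes that strings of any other length are returned unchanged and that no other cleanup is performed; the page does not say whether non-digit characters are stripped first.

```java
import java.util.function.Function;

final class UsPhoneNormalizationSketch {
    // Assumption: only strings of exactly 10 characters are prefixed with "1".
    static Function<String, String> usPhoneNormalization() {
        return s -> s.length() == 10 ? "1" + s : s;
    }

    public static void main(String[] args) {
        System.out.println(usPhoneNormalization().apply("4155550100"));  // prints "14155550100"
        System.out.println(usPhoneNormalization().apply("14155550100")); // prints "14155550100"
    }
}
```

Under these assumptions a number written with and without the leading "1" ends up as the same 11-character string.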
      • numberPreprocessing

        public static java.util.function.Function numberPreprocessing()
        Removes all non-numeric characters, retaining only double (decimal) numbers.
        Returns:
        PreProcessFunction
      • none

        public static java.util.function.Function none()
        Does nothing; used for values that are already preprocessed.
        Returns:
        PreProcessFunction