com.cybozu.labs.langdetect
Class Detector

java.lang.Object
  extended by com.cybozu.labs.langdetect.Detector

public class Detector
extends Object

The Detector class detects the language of a given text. Instances are constructed via the factory class DetectorFactory.

After the target text is appended to a Detector instance with append(Reader) or append(String), the detector provides the detection results for that text via detect() or getProbabilities(). The detect() method returns the single language name with the highest probability. The getProbabilities() method returns a list of candidate languages and their probabilities.

The detector has some parameters for language detection. See setAlpha(double), setMaxTextLength(int) and setPriorMap(HashMap).

 import java.util.ArrayList;
 import com.cybozu.labs.langdetect.Detector;
 import com.cybozu.labs.langdetect.DetectorFactory;
 import com.cybozu.labs.langdetect.LangDetectException;
 import com.cybozu.labs.langdetect.Language;
 
 class LangDetectSample {
     // Load the language profiles once before creating any detector.
     public void init(String profileDirectory) throws LangDetectException {
         DetectorFactory.loadProfile(profileDirectory);
     }
     // Return the single most probable language name.
     public String detect(String text) throws LangDetectException {
         Detector detector = DetectorFactory.create();
         detector.append(text);
         return detector.detect();
     }
     // Return the candidate languages with their probabilities.
     public ArrayList<Language> detectLangs(String text) throws LangDetectException {
         Detector detector = DetectorFactory.create();
         detector.append(text);
         return detector.getProbabilities();
     }
 }
 

Author:
Nakatani Shuyo
See Also:
DetectorFactory

Constructor Summary
Detector(DetectorFactory factory)
          Constructor.
 
Method Summary
 void append(Reader reader)
          Append the target text for language detection.
 void append(String text)
          Append the target text for language detection.
 String detect()
          Detect language of the target text and return the language name which has the highest probability.
 ArrayList<Language> getProbabilities()
          Get the language candidates that have high probabilities.
 void setAlpha(double alpha)
          Set smoothing parameter.
 void setMaxTextLength(int max_text_length)
          Specify max size of target text to use for language detection.
 void setPriorMap(HashMap<String,Double> priorMap)
          Set prior information about language probabilities.
 void setVerbose()
          Set verbose mode (used for debugging).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Detector

public Detector(DetectorFactory factory)
Constructor. A Detector instance should be obtained via DetectorFactory.create().

Parameters:
factory - the DetectorFactory instance (this constructor is intended to be called only from within DetectorFactory)
Method Detail

setVerbose

public void setVerbose()
Set verbose mode (used for debugging).


setAlpha

public void setAlpha(double alpha)
Set the smoothing parameter. The default value is 0.5 (i.e. Expected Likelihood Estimate).

Parameters:
alpha - the smoothing parameter
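
For context, additive smoothing adds alpha to every observed count before normalizing, and alpha = 0.5 is the value usually called the Expected Likelihood Estimate. The sketch below shows only that general formula. The class and method names are hypothetical, and it is not taken from the Detector implementation, which may apply alpha differently.

```java
// Hypothetical illustration of additive smoothing; not the library's code.
class AdditiveSmoothingSketch {
    /**
     * Returns (count + alpha) / (total + alpha * vocabularySize),
     * the additively smoothed estimate of a feature's probability.
     */
    static double smoothed(int count, int total, int vocabularySize, double alpha) {
        return (count + alpha) / (total + alpha * vocabularySize);
    }

    public static void main(String[] args) {
        // With alpha = 0.5, a feature that was never observed (count 0)
        // still receives a small non-zero probability.
        System.out.println(smoothed(0, 10, 4, 0.5));
        System.out.println(smoothed(3, 10, 4, 0.5));
    }
}
```

A larger alpha flattens the estimates toward a uniform distribution, while a smaller alpha keeps them closer to the raw relative frequencies.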

setPriorMap

public void setPriorMap(HashMap<String,Double> priorMap)
                 throws LangDetectException
Set prior information about language probabilities.

Parameters:
priorMap - a map from language name to its prior probability
Throws:
LangDetectException
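
A prior map of the required type can be built with plain java.util classes, as in the sketch below. The class name, the language names "en" and "ja", and the weights are assumptions chosen for illustration; the keys would need to match the language names of the loaded profiles, and this page does not say whether the values must already sum to 1.

```java
import java.util.HashMap;

// Hypothetical helper that builds a prior map; not part of the library.
class PriorMapSketch {
    static HashMap<String, Double> buildPriors() {
        HashMap<String, Double> priors = new HashMap<String, Double>();
        // Assumed language names and weights, for illustration only.
        priors.put("en", 0.7);
        priors.put("ja", 0.3);
        return priors;
    }

    public static void main(String[] args) {
        // The resulting map has the type accepted by setPriorMap(HashMap).
        System.out.println(buildPriors());
    }
}
```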

setMaxTextLength

public void setMaxTextLength(int max_text_length)
Specify the maximum size of the target text to use for language detection. The default value is 10000 (10KB).

Parameters:
max_text_length - the maximum text length to set

append

public void append(Reader reader)
            throws IOException
Append the target text for language detection. This method reads the text from the specified input reader. If the total size of the target text exceeds the limit set by setMaxTextLength(int), the excess is discarded.

Parameters:
reader - the input reader (BufferedReader as usual)
Throws:
IOException - if the reader cannot be read

append

public void append(String text)
Append the target text for language detection. If the total size of the target text exceeds the limit set by setMaxTextLength(int), the excess is discarded.

Parameters:
text - the target text to append
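
The length limit described for both append methods can be pictured with the stand-in class below. It is a minimal sketch of "append until the limit is reached, then drop the rest"; the class name and fields are hypothetical, it ignores any text normalization the real Detector may perform, and it is not the library's implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in that mimics the documented truncation behavior.
class BoundedAppendSketch {
    private final StringBuilder text = new StringBuilder();
    private final int maxTextLength;

    BoundedAppendSketch(int maxTextLength) {
        this.maxTextLength = maxTextLength;
    }

    // Reads from the reader until the limit is reached or the input ends.
    void append(Reader reader) throws IOException {
        char[] buffer = new char[1024];
        while (text.length() < maxTextLength) {
            int remaining = maxTextLength - text.length();
            int read = reader.read(buffer, 0, Math.min(buffer.length, remaining));
            if (read < 0) {
                break; // end of input
            }
            text.append(buffer, 0, read);
        }
    }

    // Appends as much of the string as still fits under the limit.
    void append(String s) {
        int remaining = maxTextLength - text.length();
        if (remaining > 0) {
            text.append(s, 0, Math.min(s.length(), remaining));
        }
    }

    String text() {
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        BoundedAppendSketch sketch = new BoundedAppendSketch(10);
        sketch.append("abc");
        sketch.append(new StringReader("defghijklmnop"));
        System.out.println(sketch.text());
    }
}
```

Note that the limit applies to the total accumulated text, so once it is reached, later calls to either append method add nothing.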

detect

public String detect()
              throws LangDetectException
Detect language of the target text and return the language name which has the highest probability.

Returns:
the name of the detected language with the highest probability
Throws:
LangDetectException - with code ErrorCode.CantDetectError if the text contains no valid features for detection

getProbabilities

public ArrayList<Language> getProbabilities()
                                     throws LangDetectException
Get the language candidates that have high probabilities.

Returns:
list of possible languages whose probabilities exceed PROB_THRESHOLD, ordered by descending probability
Throws:
LangDetectException - with code ErrorCode.CantDetectError if the text contains no valid features for detection
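
The shape of the returned list (only candidates above a threshold, highest probability first) can be sketched with plain collections as below. The class name and the threshold value 0.1 are assumptions for illustration; this page does not state the value of PROB_THRESHOLD, and the sketch uses map entries rather than the library's Language type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of threshold filtering and descending order.
class CandidateFilterSketch {
    // Assumed value; the real PROB_THRESHOLD is not given in this page.
    static final double ASSUMED_THRESHOLD = 0.1;

    static List<Map.Entry<String, Double>> candidates(Map<String, Double> probabilities) {
        List<Map.Entry<String, Double>> result = new ArrayList<>();
        // Keep only the languages whose probability exceeds the threshold.
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            if (entry.getValue() > ASSUMED_THRESHOLD) {
                result.add(entry);
            }
        }
        // Order the remaining candidates from most to least probable.
        result.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return result;
    }
}
```

Under this reading, the first element of the list is the same language that detect() would report as the most probable one.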


Copyright © 2012. All Rights Reserved.