org.apache.tika.parser.html
Class HtmlEncodingDetector

java.lang.Object
  extended by org.apache.tika.parser.html.HtmlEncodingDetector
All Implemented Interfaces:
org.apache.tika.detect.EncodingDetector

public class HtmlEncodingDetector
extends Object
implements org.apache.tika.detect.EncodingDetector

Character encoding detector that determines the character encoding of an HTML document from the charset parameter, if any, of a Content-Type http-equiv meta tag near the beginning of the document. It is especially useful for distinguishing among closely related encodings (such as the ISO-8859-* family), for which other kinds of encoding detection are unreliable.
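
The following is a minimal, self-contained sketch of the general technique described above. It is not the Tika implementation. The class name MetaCharsetSketch, the method name sniff, the 8192-byte prefix length, and the regular expression are all assumptions made for illustration, and the sketch assumes the stream supports mark/reset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Apache Tika.
final class MetaCharsetSketch {

    // Assumption: only this many leading bytes are inspected.
    private static final int PREFIX_LENGTH = 8192;

    // Simplified pattern: expects http-equiv to appear before the charset
    // parameter inside the same meta tag. Real-world markup varies more.
    private static final Pattern META_CHARSET = Pattern.compile(
            "(?is)<meta\\s+[^>]*http-equiv\\s*=\\s*[\"']?content-type[\"']?"
            + "[^>]*charset\\s*=\\s*[\"']?([a-z0-9_:.\\-]+)");

    // Returns the declared charset, or null if none is found or the
    // declared name is not a charset this JVM supports.
    static Charset sniff(InputStream input) throws IOException {
        if (!input.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        input.mark(PREFIX_LENGTH);
        byte[] buffer = new byte[PREFIX_LENGTH];
        int total = 0;
        try {
            int n;
            while (total < buffer.length
                    && (n = input.read(buffer, total, buffer.length - total)) != -1) {
                total += n;
            }
        } finally {
            // Leave the stream positioned where the caller handed it over.
            input.reset();
        }
        // The markup of interest is ASCII in all ASCII-compatible encodings.
        String head = new String(buffer, 0, total, StandardCharsets.US_ASCII);
        Matcher m = META_CHARSET.matcher(head);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=ISO-8859-1\"></head>";
        InputStream in = new ByteArrayInputStream(
                html.getBytes(StandardCharsets.US_ASCII));
        System.out.println(sniff(in));
    }
}
```

With Tika on the classpath, the corresponding call on this class would be new HtmlEncodingDetector().detect(input, metadata), using the signature listed under Method Detail.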

Since:
Apache Tika 1.2

Constructor Summary
HtmlEncodingDetector()
           
 
Method Summary
 Charset detect(InputStream input, org.apache.tika.metadata.Metadata metadata)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlEncodingDetector

public HtmlEncodingDetector()

Method Detail

detect

public Charset detect(InputStream input,
                      org.apache.tika.metadata.Metadata metadata)
               throws IOException
Specified by:
detect in interface org.apache.tika.detect.EncodingDetector
Throws:
IOException


Copyright © 2007-2013 The Apache Software Foundation. All Rights Reserved.