org.apache.tika.parser.txt
Class TXTParser
java.lang.Object
org.apache.tika.parser.txt.TXTParser
- All Implemented Interfaces:
- java.io.Serializable, Parser
public class TXTParser
- extends java.lang.Object
- implements Parser
Plain text parser. The text encoding of the document stream is
automatically detected based on the byte patterns found at the
beginning of the stream. The input metadata key
HttpHeaders.CONTENT_ENCODING is used
as an encoding hint if the automatic encoding detection fails.
This parser sets the following output metadata entries:
HttpHeaders.CONTENT_TYPE
- text/plain
HttpHeaders.CONTENT_ENCODING
- The detected text encoding of the document.
HttpHeaders.CONTENT_LANGUAGE and DublinCore.LANGUAGE
- See Also:
- Serialized Form
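The detect-first, hint-as-fallback order described above can be illustrated with a minimal, standard-library-only sketch. This is not the detector TXTParser uses. It recognizes only byte-order marks, and the class EncodingSniffer and its methods are hypothetical names introduced for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: NOT Tika's detection code.
final class EncodingSniffer {

    /** Returns the charset implied by a leading byte-order mark, or null if none is found. */
    public static Charset detectFromBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        // Simplification: a UTF-32LE mark (FF FE 00 00) would also match here.
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        return null;
    }

    /** Compares the first bytes of data, read as unsigned values, with the given prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    /**
     * Detection comes first. The caller-supplied hint is consulted only when
     * detection fails, and the fallback only when the hint is absent or unusable.
     */
    public static Charset chooseCharset(byte[] head, String hint, Charset fallback) {
        Charset detected = detectFromBom(head);
        if (detected != null) return detected;
        if (hint != null) {
            try {
                return Charset.forName(hint);
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported charset name: ignore the hint.
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(chooseCharset(plain, "ISO-8859-1", StandardCharsets.US_ASCII));
    }
}
```

In this sketch the hint plays the role that the input value of HttpHeaders.CONTENT_ENCODING plays for the parser: it never overrides a successful detection.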
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TXTParser
public TXTParser()
getSupportedTypes
public java.util.Set<MediaType> getSupportedTypes(ParseContext context)
- Specified by:
getSupportedTypes in interface Parser
parse
public void parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
throws java.io.IOException,
org.xml.sax.SAXException,
TikaException
- Specified by:
parse in interface Parser
- Throws:
java.io.IOException
org.xml.sax.SAXException
TikaException
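A call to this method might look like the following sketch. It assumes the Tika core and parser classes are on the classpath. The class TxtParserExample and the helper extractText are hypothetical names introduced for illustration, and BodyContentHandler is just one possible ContentHandler.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.txt.TXTParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical usage sketch; assumes Apache Tika is available on the classpath.
final class TxtParserExample {

    /** Parses the bytes as plain text, fills in the metadata, and returns the body text. */
    public static String extractText(byte[] data, Metadata metadata)
            throws IOException, SAXException, TikaException {
        TXTParser parser = new TXTParser();
        // Collects the character content of the document body.
        BodyContentHandler handler = new BodyContentHandler();
        try (InputStream stream = new ByteArrayInputStream(data)) {
            parser.parse(stream, handler, metadata, new ParseContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        String text = extractText("Hello, Tika".getBytes(StandardCharsets.UTF_8), metadata);
        System.out.println(text);
        // Output metadata entries set by the parser.
        System.out.println(metadata.get(HttpHeaders.CONTENT_TYPE));
        System.out.println(metadata.get(HttpHeaders.CONTENT_ENCODING));
    }
}
```

To supply an encoding hint, the caller could set HttpHeaders.CONTENT_ENCODING on the Metadata object before calling parse. As described above, the hint is only consulted if automatic detection fails.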
parse
public void parse(java.io.InputStream stream,
org.xml.sax.ContentHandler handler,
Metadata metadata)
throws java.io.IOException,
org.xml.sax.SAXException,
TikaException
- Deprecated. This method will be removed in Apache Tika 1.0.
- Specified by:
parse in interface Parser
- Throws:
java.io.IOException
org.xml.sax.SAXException
TikaException
Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.