org.apache.tika.parser.txt
Class TXTParser
java.lang.Object
org.apache.tika.parser.AbstractParser
org.apache.tika.parser.txt.TXTParser
- All Implemented Interfaces:
- Serializable, org.apache.tika.parser.Parser
public class TXTParser
- extends org.apache.tika.parser.AbstractParser
Plain text parser. The text encoding of the document stream is
automatically detected based on the byte patterns found at the
beginning of the stream. The input metadata key
HttpHeaders.CONTENT_ENCODING is used
as an encoding hint if the automatic encoding detection fails.
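The detect-then-fall-back behaviour described above can be illustrated with a small JDK-only sketch. This is not Tika's detection algorithm: it only recognises byte-order marks, and the class name `EncodingSniffSketch`, the `detect` method, and the UTF-8 default are all assumptions made for the illustration. The `hint` argument stands in for the value a caller would supply under HttpHeaders.CONTENT_ENCODING.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tika and not its detection logic.
final class EncodingSniffSketch {

    /**
     * Guesses a charset from the leading bytes of a document.
     * Only byte-order marks are recognised; if none is present the
     * caller-supplied hint is used, and if that is missing or unusable
     * an assumed default of UTF-8 is returned.
     */
    static Charset detect(byte[] prefix, String hint) {
        if (startsWith(prefix, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(prefix, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(prefix, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        // Detection failed: fall back to the hint, if it names a known charset.
        if (hint != null) {
            try {
                return Charset.forName(hint);
            } catch (IllegalArgumentException e) {
                // Malformed or unsupported charset name: ignore the hint.
            }
        }
        return StandardCharsets.UTF_8; // assumed default for this sketch
    }

    // True if the prefix begins with the given unsigned byte values.
    private static boolean startsWith(byte[] prefix, int... marker) {
        if (prefix.length < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if ((prefix[i] & 0xFF) != marker[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        byte[] noBom = {'h', 'i'};
        System.out.println(detect(withBom, "ISO-8859-1"));
        System.out.println(detect(noBom, "ISO-8859-1"));
    }
}
```

In the sketch the hint is consulted only after the byte patterns yield nothing, which mirrors the order described for this parser.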
This parser sets the following output metadata entries:
- HttpHeaders.CONTENT_TYPE: text/plain
- HttpHeaders.CONTENT_ENCODING: The detected text encoding of the document.
- HttpHeaders.CONTENT_LANGUAGE and DublinCore.LANGUAGE
- See Also:
- Serialized Form
Method Summary
Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
void parse(InputStream stream, ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context)
Methods inherited from class org.apache.tika.parser.AbstractParser
parse
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TXTParser
public TXTParser()
getSupportedTypes
public Set<org.apache.tika.mime.MediaType> getSupportedTypes(org.apache.tika.parser.ParseContext context)
parse
public void parse(InputStream stream,
ContentHandler handler,
org.apache.tika.metadata.Metadata metadata,
org.apache.tika.parser.ParseContext context)
throws IOException,
SAXException,
org.apache.tika.exception.TikaException
- Throws:
IOException
SAXException
org.apache.tika.exception.TikaException
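The parse method reports document content to the supplied ContentHandler as SAX events. As a rough sketch of what a caller might pass for that argument, the handler below accumulates character data and ignores element events. The class name `TextCollectorSketch` and its `text()` accessor are hypothetical, and the events are fired by hand here so that the block runs with only the JDK; in real use the instance would be handed to parse together with a Metadata and a ParseContext.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of Tika.
final class TextCollectorSketch extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();

    // Called once per chunk of character data; chunks are appended in order.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Returns everything collected so far.
    String text() {
        return text.toString();
    }

    public static void main(String[] args) {
        TextCollectorSketch collector = new TextCollectorSketch();
        // Simulate two character events, as a parser might deliver them.
        char[] first = "Hello, ".toCharArray();
        char[] second = "plain text".toCharArray();
        collector.characters(first, 0, first.length);
        collector.characters(second, 0, second.length);
        System.out.println(collector.text());
    }
}
```

Extending DefaultHandler keeps the sketch short, since every other ContentHandler callback is inherited as a no-op.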
Copyright © 2007-2011 The Apache Software Foundation. All Rights Reserved.