java.lang.Object
  org.apache.tika.parser.AbstractParser
    org.apache.tika.parser.external.ExternalParser
public class ExternalParser extends AbstractParser
Parser that uses an external program (like catdoc or pdf2txt) to extract text content and metadata from a given document.
| Field Summary | | |
|---|---|---|
| static String | INPUT_FILE_TOKEN | The token that, if present in the command string, is replaced with the input filename. |
| static String | OUTPUT_FILE_TOKEN | The token that, if present in the command string, is replaced with the output filename. |
| Constructor Summary |
|---|
| ExternalParser() |
| Method Summary | | |
|---|---|---|
| static boolean | check(String[] checkCmd, int... errorValue) | |
| static boolean | check(String checkCmd, int... errorValue) | Checks to see if the command can be run. |
| String[] | getCommand() | |
| Map<Pattern,String> | getMetadataExtractionPatterns() | |
| Set<MediaType> | getSupportedTypes() | |
| Set<MediaType> | getSupportedTypes(ParseContext context) | Returns the set of media types supported by this parser when used with the given parse context. |
| void | parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |
| void | setCommand(String... command) | Sets the command to be run. |
| void | setMetadataExtractionPatterns(Map<Pattern,String> patterns) | Sets the map of regular expression patterns and Metadata keys. |
| void | setSupportedTypes(Set<MediaType> supportedTypes) | |
| Methods inherited from class org.apache.tika.parser.AbstractParser |
|---|
| parse |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
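public static final String INPUT_FILE_TOKEN
The token that, if present in the command string, is replaced with the input filename.

public static final String OUTPUT_FILE_TOKEN
The token that, if present in the command string, is replaced with the output filename.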
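
The token replacement described for these two fields can be illustrated with a minimal sketch that uses only the JDK. It is not the ExternalParser implementation: the class name, the `substitute` helper, and the literal token values `${INPUT}` and `${OUTPUT}` are assumptions made for illustration, since this page does not state the values of the constants.

```java
import java.util.Arrays;

// Hypothetical sketch: shows how filename tokens in a command array could be
// replaced before the command is executed. Not taken from the Tika sources.
class TokenSubstitutionSketch {
    // Assumed placeholder values; real code would refer to
    // ExternalParser.INPUT_FILE_TOKEN and ExternalParser.OUTPUT_FILE_TOKEN.
    static final String INPUT_TOKEN = "${INPUT}";
    static final String OUTPUT_TOKEN = "${OUTPUT}";

    // Returns a copy of the command with every token occurrence replaced
    // by the corresponding path. Arguments without tokens are unchanged.
    static String[] substitute(String[] command, String inputPath, String outputPath) {
        String[] result = new String[command.length];
        for (int i = 0; i < command.length; i++) {
            result[i] = command[i]
                    .replace(INPUT_TOKEN, inputPath)
                    .replace(OUTPUT_TOKEN, outputPath);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] command = {"pdf2txt", INPUT_TOKEN, OUTPUT_TOKEN};
        String[] resolved = substitute(command, "/tmp/in.pdf", "/tmp/out.txt");
        System.out.println(Arrays.toString(resolved));
    }
}
```

A command that contains neither token passes through unchanged, which matches the "if present" wording of the field descriptions.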
| Constructor Detail |
|---|
public ExternalParser()
| Method Detail |
|---|
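public Set<MediaType> getSupportedTypes(ParseContext context)
Returns the set of media types supported by this parser when used with the given parse context.
Specified by: getSupportedTypes in interface Parser
Parameters:
context - parse context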
public Set<MediaType> getSupportedTypes()
public void setSupportedTypes(Set<MediaType> supportedTypes)
public String[] getCommand()
public void setCommand(String... command)
Sets the command to be run. The command can include INPUT_FILE_TOKEN or OUTPUT_FILE_TOKEN if it needs filenames.
See Also: Runtime.exec(String[])

public Map<Pattern,String> getMetadataExtractionPatterns()

public void setMetadataExtractionPatterns(Map<Pattern,String> patterns)
Sets the map of regular expression patterns and Metadata keys.
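
One way to picture a map of regular expression patterns and metadata keys is the following JDK-only sketch. It is an illustration under stated assumptions rather than the parser's actual logic: it assumes each pattern is tried against each line of the command's output and that the first capture group becomes the value stored under the mapped key. The class name, the `extract` helper, and the use of a plain `Map<String,String>` in place of Tika's Metadata are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of pattern-driven metadata extraction.
class MetadataPatternSketch {
    // Tries every pattern against every line. When a pattern matches and has
    // at least one capture group, group 1 is stored under the mapped key
    // (assumption: group 1 carries the value).
    static Map<String, String> extract(List<String> lines, Map<Pattern, String> patterns) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (String line : lines) {
            for (Map.Entry<Pattern, String> entry : patterns.entrySet()) {
                Matcher matcher = entry.getKey().matcher(line);
                if (matcher.find() && matcher.groupCount() >= 1) {
                    metadata.put(entry.getValue(), matcher.group(1));
                }
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<Pattern, String> patterns = new HashMap<>();
        patterns.put(Pattern.compile("^Title:\\s*(.*)$"), "title");
        patterns.put(Pattern.compile("^Author:\\s*(.*)$"), "author");
        List<String> output = Arrays.asList("Title: Annual Report", "Author: J. Doe", "Pages: 12");
        System.out.println(extract(output, patterns));
    }
}
```

Lines that match no pattern, such as the `Pages:` line in the usage above, contribute nothing to the result.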
public void parse(InputStream stream,
ContentHandler handler,
Metadata metadata,
ParseContext context)
throws IOException,
SAXException,
TikaException
Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. Metadata is only extracted if setMetadataExtractionPatterns(Map) has been called to set patterns.
Parameters:
stream - the document stream (input)
handler - handler for the XHTML SAX events (output)
metadata - document metadata (input and output)
context - parse context
Throws:
IOException - if the document stream could not be read
SAXException - if the SAX events could not be processed
TikaException - if the document could not be parsed
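
As a rough illustration of what "a simple XHTML document" delivered to a SAX content handler can look like, the sketch below emits extracted text as a single paragraph using only the JDK's org.xml.sax classes. The element layout (html, body, p), the class name, and both helper methods are assumptions for illustration; the exact events produced by ExternalParser are not specified on this page.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: wraps plain text in a minimal XHTML event stream.
class XhtmlEmitSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Sends html > body > p events around the text to the given handler.
    static void emit(String text, ContentHandler handler) throws SAXException {
        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", noAttributes);
        handler.startElement(XHTML, "body", "body", noAttributes);
        handler.startElement(XHTML, "p", "p", noAttributes);
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }

    // Convenience for the demo: collects the character data a handler receives.
    static String collectText(String text) {
        final StringBuilder collected = new StringBuilder();
        try {
            emit(text, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    collected.append(ch, start, length);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return collected.toString();
    }

    public static void main(String[] args) {
        System.out.println(collectText("Text produced by the external command"));
    }
}
```

A caller that only wants the text can pass a handler that overrides `characters`, as `collectText` does; a caller that wants structure can also handle the element events.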
public static boolean check(String checkCmd,
int... errorValue)
Checks to see if the command can be run.
Parameters:
checkCmd - the check command to run
errorValue - the value(s) considered to indicate an error

public static boolean check(String[] checkCmd,
int... errorValue)
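
The idea behind a "can this command be run" check can be sketched with the JDK's ProcessBuilder. This is an illustration only, not the body of `check`: it assumes the error values are compared against the process exit code, that a failure to start the process counts as "cannot run", and that 127 is used when no error value is given. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of a command availability check.
class CommandCheckSketch {
    // Returns true if the command starts and its exit code is not one of the
    // given error values. Returns false if the command cannot be started.
    static boolean canRun(String[] checkCmd, int... errorValues) {
        if (errorValues.length == 0) {
            errorValues = new int[] {127}; // assumed default: shell "command not found"
        }
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectErrorStream(true)
                    .start();
            // Drain the output so the child process cannot block on a full pipe.
            try (InputStream output = process.getInputStream()) {
                byte[] buffer = new byte[4096];
                while (output.read(buffer) != -1) {
                    // discard
                }
            }
            int exitCode = process.waitFor();
            for (int errorValue : errorValues) {
                if (exitCode == errorValue) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            // The command could not be started, for example because it is not installed.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canRun(new String[] {"catdoc", "-V"}));
    }
}
```

Running a cheap command first lets a caller decide whether to register a parser for a tool that may not be installed on the host.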