org.apache.tika.embedder
Class ExternalEmbedder

java.lang.Object
  extended by org.apache.tika.embedder.ExternalEmbedder
All Implemented Interfaces:
Serializable, Embedder

public class ExternalEmbedder
extends Object
implements Embedder

Embedder that uses an external program (like sed or exiftool) to embed text content and metadata into a given document.

Since:
Apache Tika 1.3
See Also:
Serialized Form

Field Summary
static String METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
          Token to be replaced with the metadata assignment command arguments serialized into a single String
static String METADATA_COMMAND_ARGUMENTS_TOKEN
          Token to be replaced with a String array of metadata assignment command arguments
 
Constructor Summary
ExternalEmbedder()
           
 
Method Summary
static boolean check(String[] checkCmd, int... errorValue)
          Checks to see if the command can be run.
static boolean check(String checkCmd, int... errorValue)
          Checks to see if the command can be run.
 void embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context)
          Executes the configured external command to embed the given metadata into the document read from the input stream, and writes the result to the given output stream.
 String[] getCommand()
          Gets the command to be run.
 String getCommandAppendOperator()
          Gets the operator to append rather than replace a value for the command line tool, e.g. "+=".
 String getCommandAssignmentDelimeter()
          Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".
 String getCommandAssignmentOperator()
          Gets the assignment operator for the command line tool, e.g. "=".
protected  List<String> getCommandMetadataSegments(Metadata metadata)
          Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata.
 Map<Property,String[]> getMetadataCommandArguments()
          Gets the map of Metadata keys to command line parameters.
 Set<MediaType> getSupportedEmbedTypes()
           
 Set<MediaType> getSupportedEmbedTypes(ParseContext context)
          Returns the set of media types supported by this embedder when used with the given parse context.
 boolean isQuoteAssignmentValues()
          Gets whether or not to quote assignment values, i.e. tag='value'.
protected static String serializeMetadata(List<String> metadataCommandArguments)
          Serializes a collection of metadata command line arguments into a single string.
 void setCommand(String... command)
          Sets the command to be run.
 void setCommandAppendOperator(String commandAppendOperator)
          Sets the operator to append rather than replace a value for the command line tool, e.g. "+=".
 void setCommandAssignmentDelimeter(String commandAssignmentDelimeter)
          Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".
 void setCommandAssignmentOperator(String commandAssignmentOperator)
          Sets the assignment operator for the command line tool, e.g. "=".
 void setMetadataCommandArguments(Map<Property,String[]> arguments)
          Sets the map of Metadata keys to command line parameters.
 void setQuoteAssignmentValues(boolean quoteAssignmentValues)
          Sets whether or not to quote assignment values, i.e. tag='value'.
 void setSupportedEmbedTypes(Set<MediaType> supportedEmbedTypes)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

METADATA_COMMAND_ARGUMENTS_TOKEN

public static final String METADATA_COMMAND_ARGUMENTS_TOKEN
Token to be replaced with a String array of metadata assignment command arguments

See Also:
Constant Field Values

METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN

public static final String METADATA_COMMAND_ARGUMENTS_SERIALIZED_TOKEN
Token to be replaced with the metadata assignment command arguments serialized into a single String

See Also:
Constant Field Values
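Judging by their names, the two tokens differ in how the metadata assignments reach the tool: the plain token stands for one argument per assignment, while the serialized token stands for all assignments packed into one argument. The Java sketch below illustrates that reading. The token strings "${METADATA}" and "${METADATA_SERIALIZED}", the class name MetadataTokenSketch, and the space-joined serialization are all assumptions for illustration; the real constant values are on the Constant Field Values page and the real serialization may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetadataTokenSketch {
    // Hypothetical token values, not copied from Tika.
    static final String METADATA_TOKEN = "${METADATA}";
    static final String METADATA_SERIALIZED_TOKEN = "${METADATA_SERIALIZED}";

    /** Expands the metadata tokens in a command template into concrete arguments. */
    static List<String> expand(String[] template, List<String> metadataArgs) {
        List<String> cmd = new ArrayList<>();
        for (String segment : template) {
            if (segment.equals(METADATA_TOKEN)) {
                // One template segment becomes one argument per assignment.
                cmd.addAll(metadataArgs);
            } else if (segment.contains(METADATA_SERIALIZED_TOKEN)) {
                // All assignments collapse into a single argument (space-joined here by assumption).
                cmd.add(segment.replace(METADATA_SERIALIZED_TOKEN, String.join(" ", metadataArgs)));
            } else {
                cmd.add(segment);
            }
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> meta = Arrays.asList("-Title=Report", "-Author=Ann");
        // [exiftool, -Title=Report, -Author=Ann, in.jpg]
        System.out.println(expand(new String[] {"exiftool", METADATA_TOKEN, "in.jpg"}, meta));
        // [tool, -Title=Report -Author=Ann]
        System.out.println(expand(new String[] {"tool", METADATA_SERIALIZED_TOKEN}, meta));
    }
}
```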
Constructor Detail

ExternalEmbedder

public ExternalEmbedder()
Method Detail

getSupportedEmbedTypes

public Set<MediaType> getSupportedEmbedTypes(ParseContext context)
Description copied from interface: Embedder
Returns the set of media types supported by this embedder when used with the given parse context.

The name differs from the precedent set by Parser.getSupportedTypes(ParseContext) so that parser implementations may also choose to implement this interface.

Specified by:
getSupportedEmbedTypes in interface Embedder
Parameters:
context - parse context
Returns:
immutable set of media types

getSupportedEmbedTypes

public Set<MediaType> getSupportedEmbedTypes()

setSupportedEmbedTypes

public void setSupportedEmbedTypes(Set<MediaType> supportedEmbedTypes)

getCommand

public String[] getCommand()
Gets the command to be run. This can include ExternalParser.INPUT_FILE_TOKEN and/or ExternalParser.OUTPUT_FILE_TOKEN if the command needs file names.

Returns:
the command to be run

setCommand

public void setCommand(String... command)
Sets the command to be run. This can include ExternalParser.INPUT_FILE_TOKEN and/or ExternalParser.OUTPUT_FILE_TOKEN if the command needs file names.

See Also:
Runtime.exec(String[])
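The command is given as a String array in the style of Runtime.exec(String[]). The Java sketch below is a hypothetical helper (FileTokenSketch, not part of Tika) showing how file-name tokens in such an array could be resolved, and, as this sketch assumes, how a missing token would mean the document is exchanged over stdin or stdout instead. The token values "${INPUT}" and "${OUTPUT}" are assumed stand-ins for ExternalParser.INPUT_FILE_TOKEN and ExternalParser.OUTPUT_FILE_TOKEN, and the exiftool command line is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class FileTokenSketch {
    // Hypothetical stand-ins for ExternalParser.INPUT_FILE_TOKEN / OUTPUT_FILE_TOKEN.
    static final String INPUT_TOKEN = "${INPUT}";
    static final String OUTPUT_TOKEN = "${OUTPUT}";

    /** A resolved command line plus where the tool reads input and writes output. */
    static final class Resolved {
        final List<String> command = new ArrayList<>();
        boolean readsStdin = true;   // no input token: feed the document on stdin
        boolean writesStdout = true; // no output token: collect the result from stdout
    }

    /** Replaces the file tokens in each template segment with concrete paths. */
    static Resolved resolve(String[] template, String inputPath, String outputPath) {
        Resolved r = new Resolved();
        for (String segment : template) {
            if (segment.contains(INPUT_TOKEN)) {
                segment = segment.replace(INPUT_TOKEN, inputPath);
                r.readsStdin = false;
            }
            if (segment.contains(OUTPUT_TOKEN)) {
                segment = segment.replace(OUTPUT_TOKEN, outputPath);
                r.writesStdout = false;
            }
            r.command.add(segment);
        }
        return r;
    }

    public static void main(String[] args) {
        Resolved r = resolve(new String[] {"exiftool", "-o", OUTPUT_TOKEN, INPUT_TOKEN},
                "/tmp/in.jpg", "/tmp/out.jpg");
        // [exiftool, -o, /tmp/out.jpg, /tmp/in.jpg] stdin=false stdout=false
        System.out.println(r.command + " stdin=" + r.readsStdin + " stdout=" + r.writesStdout);
    }
}
```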

getCommandAssignmentOperator

public String getCommandAssignmentOperator()
Gets the assignment operator for the command line tool, e.g. "=".

Returns:
the assignment operator

setCommandAssignmentOperator

public void setCommandAssignmentOperator(String commandAssignmentOperator)
Sets the assignment operator for the command line tool, e.g. "=".

Parameters:
commandAssignmentOperator - the assignment operator

getCommandAssignmentDelimeter

public String getCommandAssignmentDelimeter()
Gets the delimiter for multiple assignments for the command line tool, e.g. ", ".

Returns:
the assignment delimiter

setCommandAssignmentDelimeter

public void setCommandAssignmentDelimeter(String commandAssignmentDelimeter)
Sets the delimiter for multiple assignments for the command line tool, e.g. ", ".

Parameters:
commandAssignmentDelimeter - the assignment delimiter
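This page does not say exactly where the delimiter is applied. As an assumption, the Java sketch below treats it as the separator placed between several tag assignments rendered into one string; AssignmentDelimiterSketch and joinAssignments are hypothetical names, not Tika API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AssignmentDelimiterSketch {
    /**
     * Renders each tag/value pair as tag + operator + value and joins the
     * results with the delimiter, in the map's iteration order.
     */
    static String joinAssignments(Map<String, String> pairs, String operator, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            joiner.add(e.getKey() + operator + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>(); // keeps insertion order
        pairs.put("Title", "Report");
        pairs.put("Author", "Ann");
        System.out.println(joinAssignments(pairs, "=", ", ")); // Title=Report, Author=Ann
    }
}
```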

getCommandAppendOperator

public String getCommandAppendOperator()
Gets the operator to append rather than replace a value for the command line tool, e.g. "+=".

Returns:
the append operator

setCommandAppendOperator

public void setCommandAppendOperator(String commandAppendOperator)
Sets the operator to append rather than replace a value for the command line tool, e.g. "+=".

Parameters:
commandAppendOperator - the append operator

isQuoteAssignmentValues

public boolean isQuoteAssignmentValues()
Gets whether or not to quote assignment values, i.e. tag='value'. The default is false.

Returns:
whether or not to quote assignment values

setQuoteAssignmentValues

public void setQuoteAssignmentValues(boolean quoteAssignmentValues)
Sets whether or not to quote assignment values, i.e. tag='value'.

Parameters:
quoteAssignmentValues - whether or not to quote assignment values

getMetadataCommandArguments

public Map<Property,String[]> getMetadataCommandArguments()
Gets the map of Metadata keys to command line parameters.

Returns:
the metadata to CLI param map

setMetadataCommandArguments

public void setMetadataCommandArguments(Map<Property,String[]> arguments)
Sets the map of Metadata keys to command line parameters. Set this to null to disable Metadata embedding.

Parameters:
arguments - the metadata to CLI param map, or null to disable Metadata embedding

getCommandMetadataSegments

protected List<String> getCommandMetadataSegments(Metadata metadata)
Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata.

Parameters:
metadata - the metadata to embed
Returns:
the metadata-related command line arguments
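The following Java sketch approximates what getCommandMetadataSegments is described as doing, using the assignment operator, append operator and quoting option documented above. It is a hypothetical stand-in (MetadataSegmentsSketch), not Tika's code. Assumptions: metadata and the argument map are keyed by plain property-name Strings instead of Tika's Metadata and Property types, a property with more than one value is treated as multi-valued and gets the append operator, and the exiftool-style parameter names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataSegmentsSketch {
    // Mirrors the configurable settings documented for ExternalEmbedder.
    String assignmentOperator = "=";
    String appendOperator = "+=";
    boolean quoteValues = false;

    /**
     * Builds one command line argument per (parameter, value) pair.
     *
     * @param metadata  property name -> values; more than one value means multi-valued
     * @param arguments property name -> CLI parameters, or null to embed nothing
     */
    List<String> segments(Map<String, List<String>> metadata, Map<String, String[]> arguments) {
        List<String> out = new ArrayList<>();
        if (arguments == null) {
            return out; // a null mapping disables metadata embedding
        }
        for (Map.Entry<String, List<String>> entry : metadata.entrySet()) {
            String[] params = arguments.get(entry.getKey());
            if (params == null) {
                continue; // this property is not mapped to any CLI parameter
            }
            List<String> values = entry.getValue();
            // Multi-valued properties use the append operator so each value is added.
            String operator = values.size() > 1 ? appendOperator : assignmentOperator;
            for (String param : params) {
                for (String value : values) {
                    String rendered = quoteValues ? "'" + value + "'" : value;
                    out.add(param + operator + rendered);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("dc:title", List.of("Report"));
        metadata.put("dc:subject", List.of("a", "b"));
        Map<String, String[]> arguments = new LinkedHashMap<>();
        arguments.put("dc:title", new String[] {"-XMP-dc:Title"});
        arguments.put("dc:subject", new String[] {"-XMP-dc:Subject"});
        // [-XMP-dc:Title=Report, -XMP-dc:Subject+=a, -XMP-dc:Subject+=b]
        System.out.println(new MetadataSegmentsSketch().segments(metadata, arguments));
    }
}
```

If the command is run via Runtime.exec(String[]), which does not invoke a shell, any quotes added by quoting reach the tool as literal characters, so that option only makes sense for tools that strip such quotes themselves.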

serializeMetadata

protected static String serializeMetadata(List<String> metadataCommandArguments)
Serializes a collection of metadata command line arguments into a single string.

Parameters:
metadataCommandArguments - the metadata command line arguments to serialize
Returns:
the serialized metadata arguments string

embed

public void embed(Metadata metadata,
                  InputStream inputStream,
                  OutputStream outputStream,
                  ParseContext context)
           throws IOException,
                  TikaException
Executes the configured external command to embed the given metadata into the document read from inputStream, and writes the resulting document to outputStream. Metadata is only embedded if setMetadataCommandArguments(Map) has been called to set arguments.

Specified by:
embed in interface Embedder
Parameters:
metadata - document metadata (input and output)
inputStream - the document stream (input)
outputStream - the output stream to write the metadata embedded data to
context - parse context
Throws:
IOException - if the document stream could not be read
TikaException - if the document could not be parsed
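For the case where the command reads the document on stdin and writes the result to stdout, the flow of embed can be sketched with ProcessBuilder. This is a hypothetical helper (PipeThroughSketch.pipeThrough), not Tika's implementation; it assumes Java 9 or later for InputStream.transferTo and ProcessBuilder.Redirect.DISCARD, discards stderr, and treats a non-zero exit status as failure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PipeThroughSketch {
    /** Runs cmd, streams input to its stdin, and copies its stdout to output. */
    static void pipeThrough(List<String> cmd, InputStream input, OutputStream output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd)
                .redirectError(ProcessBuilder.Redirect.DISCARD) // avoid blocking on a full stderr pipe
                .start();
        // Feed stdin on a separate thread so stdout can be drained concurrently.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                input.transferTo(stdin);
            } catch (IOException ignored) {
                // The tool may close stdin early; its exit status reports any failure.
            }
        });
        feeder.start();
        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(output);
        }
        feeder.join();
        int status = process.waitFor();
        if (status != 0) {
            throw new IOException("Command " + cmd + " exited with status " + status);
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        pipeThrough(List.of("sed", "s/draft/final/"),
                new ByteArrayInputStream("draft report\n".getBytes(StandardCharsets.UTF_8)), out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8)); // final report
    }
}
```

Writing stdin on its own thread matters: a tool that emits output before consuming all of its input would otherwise block on a full stdout pipe while the caller blocks writing stdin.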

check

public static boolean check(String checkCmd,
                            int... errorValue)
Checks to see if the command can be run. Typically used with something like "myapp --version" to check to see if "myapp" is installed and on the path.

Parameters:
checkCmd - the check command to run
errorValue - the process exit values to treat as errors
Returns:
whether or not the check completed without error

check

public static boolean check(String[] checkCmd,
                            int... errorValue)
Checks to see if the command can be run. Typically used with something like "myapp --version" to check to see if "myapp" is installed and on the path.

Parameters:
checkCmd - the check command to run
errorValue - the process exit values to treat as errors
Returns:
whether or not the check completed without error
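A minimal Java sketch of such a check, using ProcessBuilder rather than whatever Tika uses internally. CheckSketch is a hypothetical name, and treating any non-zero exit status as failure when no errorValue is given is this sketch's own choice, not documented behavior. A command that cannot be started at all (for example, not on the PATH) counts as a failed check.

```java
import java.io.IOException;

public class CheckSketch {
    /** Returns true if checkCmd starts and its exit status is not an error value. */
    static boolean check(String[] checkCmd, int... errorValue) {
        try {
            Process process = new ProcessBuilder(checkCmd)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // output is irrelevant here
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int status = process.waitFor();
            if (errorValue.length == 0) {
                return status == 0; // sketch's default: any non-zero status is an error
            }
            for (int err : errorValue) {
                if (status == err) {
                    return false;
                }
            }
            return true;
        } catch (IOException e) {
            return false; // not installed or not on the PATH
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("exiftool available: " + check(new String[] {"exiftool", "-ver"}));
    }
}
```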


Copyright © 2007-2013 The Apache Software Foundation. All Rights Reserved.