| Package | Description |
|---|---|
| org.apache.tika | Apache Tika. |
| org.apache.tika.detect | Media type detection. |
| org.apache.tika.embedder | |
| org.apache.tika.extractor | Extraction of component documents. |
| org.apache.tika.fork | Forked parser. |
| org.apache.tika.io | IO utilities. |
| org.apache.tika.metadata | Multi-valued metadata container, and set of constant metadata fields. |
| org.apache.tika.metadata.filter | |
| org.apache.tika.mime | Media type information. |
| org.apache.tika.parser | Tika parsers. |
| org.apache.tika.parser.digest | |
| org.apache.tika.parser.external | External parser process. |
| org.apache.tika.parser.external2 | |
| org.apache.tika.parser.multiple | |
| org.apache.tika.pipes | |
| org.apache.tika.pipes.emitter | |
| org.apache.tika.pipes.fetcher | |
| org.apache.tika.pipes.fetcher.fs | |
| org.apache.tika.pipes.fetcher.url | |
| org.apache.tika.renderer | |
| org.apache.tika.sax | SAX utilities. |
| org.apache.tika.utils | Utilities. |
| Modifier and Type | Method and Description |
|---|---|
| String | Tika.detect(InputStream stream, Metadata metadata)<br>Detects the media type of the given document. |
| Reader | Tika.parse(File file, Metadata metadata)<br>Parses the given file and returns the extracted text content. |
| Reader | Tika.parse(InputStream stream, Metadata metadata)<br>Parses the given document and returns the extracted text content. |
| Reader | Tika.parse(Path path, Metadata metadata)<br>Parses the file at the given path and returns the extracted text content. |
| String | Tika.parseToString(InputStream stream, Metadata metadata)<br>Parses the given document and returns the extracted text content. |
| String | Tika.parseToString(InputStream stream, Metadata metadata, int maxLength)<br>Parses the given document and returns the extracted text content. |
| Modifier and Type | Method and Description |
|---|---|
| MediaType | OverrideDetector.detect(InputStream input, Metadata metadata)<br>Deprecated. |
| MediaType | FileCommandDetector.detect(InputStream input, Metadata metadata) |
| MediaType | TextDetector.detect(InputStream input, Metadata metadata)<br>Looks at the beginning of the document input stream to determine whether the document is text or not. |
| Charset | EncodingDetector.detect(InputStream input, Metadata metadata)<br>Detects the character encoding of the given text document, or returns null if the encoding of the document cannot be detected. |
| MediaType | ZeroSizeFileDetector.detect(InputStream stream, Metadata metadata) |
| Charset | CompositeEncodingDetector.detect(InputStream input, Metadata metadata) |
| MediaType | MagicDetector.detect(InputStream input, Metadata metadata) |
| MediaType | CompositeDetector.detect(InputStream input, Metadata metadata) |
| MediaType | TrainedModelDetector.detect(InputStream input, Metadata metadata) |
| MediaType | EmptyDetector.detect(InputStream input, Metadata metadata) |
| MediaType | NameDetector.detect(InputStream input, Metadata metadata)<br>Detects the content type of an input document based on the document name given in the input metadata. |
| MediaType | Detector.detect(InputStream input, Metadata metadata)<br>Detects the content type of the given input document. |
| Charset | NonDetectingEncodingDetector.detect(InputStream input, Metadata metadata) |
| MediaType | TypeDetector.detect(InputStream input, Metadata metadata)<br>Detects the content type of an input document based on a type hint given in the input metadata. |
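
The TextDetector entry above describes looking at the beginning of the stream to decide whether a document is text. The following is a minimal standard-library sketch of that prefix-sniffing idea, not Tika's implementation: the class name `PrefixSniffer`, the 512-byte look-ahead, and the control-byte heuristic are all assumptions made for illustration. It relies on `mark`/`reset` so that the caller can still read the full document afterwards, and it wraps `IOException` in an unchecked exception only to keep the sketch short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration of prefix sniffing; not part of Apache Tika. */
class PrefixSniffer {
    // Assumed look-ahead size for this sketch.
    static final int PREFIX_LENGTH = 512;
    static final String TEXT = "text/plain";
    static final String BINARY = "application/octet-stream";

    /**
     * Peeks at up to PREFIX_LENGTH bytes and guesses text vs. binary.
     * The stream is reset afterwards, so no bytes are consumed.
     */
    static String sniff(InputStream input) {
        if (input == null || !input.markSupported()) {
            return BINARY; // cannot peek without consuming bytes
        }
        input.mark(PREFIX_LENGTH);
        try {
            byte[] prefix = input.readNBytes(PREFIX_LENGTH);
            input.reset();
            if (prefix.length == 0) {
                return BINARY; // empty input: nothing to judge
            }
            for (byte b : prefix) {
                int c = b & 0xFF;
                boolean whitespace = c == '\t' || c == '\n' || c == '\f' || c == '\r';
                // Assumed heuristic: any other control byte indicates binary data.
                if (c < 0x20 && !whitespace) {
                    return BINARY;
                }
            }
            return TEXT;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A stream without mark support can be wrapped in `java.io.BufferedInputStream` before it is passed to `sniff`.
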
| Constructor and Description |
|---|
| AutoDetectReader(InputStream stream, Metadata metadata) |
| AutoDetectReader(InputStream stream, Metadata metadata, EncodingDetector encodingDetector) |
| AutoDetectReader(InputStream stream, Metadata metadata, ServiceLoader loader) |
| Modifier and Type | Method and Description |
|---|---|
| void | ExternalEmbedder.embed(Metadata metadata, InputStream inputStream, OutputStream outputStream, ParseContext context)<br>Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |
| void | Embedder.embed(Metadata metadata, InputStream originalStream, OutputStream outputStream, ParseContext context)<br>Embeds related document metadata from the given metadata object into the given output stream. |
| protected List<String> | ExternalEmbedder.getCommandMetadataSegments(Metadata metadata)<br>Constructs a collection of command line arguments responsible for setting individual metadata fields based on the given metadata. |
| Modifier and Type | Method and Description |
|---|---|
| String | EmbeddedDocumentUtil.getExtension(TikaInputStream is, Metadata metadata) |
| EmbeddedDocumentExtractor | ParsingEmbeddedDocumentExtractorFactory.newInstance(Metadata metadata, ParseContext parseContext) |
| EmbeddedDocumentExtractor | EmbeddedDocumentExtractorFactory.newInstance(Metadata metadata, ParseContext parseContext) |
| void | ParsingEmbeddedDocumentExtractor.parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml) |
| void | EmbeddedDocumentExtractor.parseEmbedded(InputStream stream, ContentHandler handler, Metadata metadata, boolean outputHtml)<br>Processes the supplied embedded resource, calling the delegating parser with the appropriate details. |
| void | EmbeddedDocumentUtil.parseEmbedded(InputStream inputStream, ContentHandler handler, Metadata metadata, boolean outputHtml) |
| static void | EmbeddedDocumentUtil.recordEmbeddedStreamException(Throwable t, Metadata m) |
| static void | EmbeddedDocumentUtil.recordException(Throwable t, Metadata m) |
| boolean | DocumentSelector.select(Metadata metadata)<br>Checks if a document with the given metadata matches the specified selection criteria. |
| boolean | ParsingEmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
| boolean | EmbeddedDocumentExtractor.shouldParseEmbedded(Metadata metadata) |
| boolean | EmbeddedDocumentUtil.shouldParseEmbedded(Metadata m) |
| boolean | DefaultEmbeddedStreamTranslator.shouldTranslate(InputStream inputStream, Metadata metadata)<br>This should sniff the stream to determine if it needs to be translated. |
| boolean | EmbeddedStreamTranslator.shouldTranslate(InputStream inputStream, Metadata metadata) |
| InputStream | DefaultEmbeddedStreamTranslator.translate(InputStream inputStream, Metadata metadata)<br>This will consume the InputStream and return a new stream of translated bytes. |
| InputStream | EmbeddedStreamTranslator.translate(InputStream inputStream, Metadata metadata) |
| Modifier and Type | Method and Description |
|---|---|
| void | ForkParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>This sends the objects to the server for parsing, and the server via the proxies acts on the handler as if it were updating it directly. |
| Modifier and Type | Method and Description |
|---|---|
| Path | TemporaryResources.createTempFile(Metadata metadata)<br>Creates a temporary file that will automatically be deleted when the TemporaryResources.close() method is called, returning its path. |
| static TikaInputStream | TikaInputStream.get(Blob blob, Metadata metadata)<br>Creates a TikaInputStream from the given database BLOB. |
| static TikaInputStream | TikaInputStream.get(byte[] data, Metadata metadata)<br>Creates a TikaInputStream from the given array of bytes. |
| static TikaInputStream | TikaInputStream.get(File file, Metadata metadata)<br>Deprecated. Use TikaInputStream.get(Path, Metadata) instead. In Tika 2.0, this will be removed or modified to throw an IOException. |
| static TikaInputStream | TikaInputStream.get(InputStream stream, TemporaryResources tmp, Metadata metadata)<br>Casts or wraps the given stream to a TikaInputStream instance. |
| static TikaInputStream | TikaInputStream.get(Path path, Metadata metadata)<br>Creates a TikaInputStream from the file at the given path. |
| static TikaInputStream | TikaInputStream.get(Path path, Metadata metadata, TemporaryResources tmp) |
| static TikaInputStream | TikaInputStream.get(URI uri, Metadata metadata)<br>Creates a TikaInputStream from the resource at the given URI. |
| static TikaInputStream | TikaInputStream.get(URL url, Metadata metadata)<br>Creates a TikaInputStream from the resource at the given URL. |
| Modifier and Type | Method and Description |
|---|---|
| static void | XMPDM.ChannelTypePropertyConverter.convertAndSet(Metadata metadata, Object value)<br>Deprecated. How convert+set might work |
| Modifier and Type | Method and Description |
|---|---|
| void | DateNormalizingMetadataFilter.filter(Metadata metadata) |
| void | ClearByMimeMetadataFilter.filter(Metadata metadata) |
| void | NoOpFilter.filter(Metadata metadata) |
| void | FieldNameMappingFilter.filter(Metadata metadata) |
| void | ExcludeFieldMetadataFilter.filter(Metadata metadata) |
| void | CompositeMetadataFilter.filter(Metadata metadata) |
| void | GeoPointMetadataFilter.filter(Metadata metadata) |
| void | IncludeFieldMetadataFilter.filter(Metadata metadata) |
| void | CaptureGroupMetadataFilter.filter(Metadata metadata) |
| abstract void | MetadataFilter.filter(Metadata metadata) |
| Modifier and Type | Method and Description |
|---|---|
| MediaType | MimeTypes.detect(InputStream input, Metadata metadata)<br>Automatically detects the MIME type of a document based on magic markers in the stream prefix and any given metadata hints. |
| MediaType | ProbabilisticMimeDetectionSelector.detect(InputStream input, Metadata metadata) |
| Modifier and Type | Method and Description |
|---|---|
| List<Metadata> | ParseRecord.getMetadataList() |
| Modifier and Type | Method and Description |
|---|---|
| void | ParseRecord.addMetadata(Metadata metadata) |
| void | DigestingParser.Digester.digest(InputStream is, Metadata m, ParseContext parseContext)<br>Digests an InputStream and sets the appropriate value(s) in the metadata. |
| protected Parser | CompositeParser.getParser(Metadata metadata)<br>Returns the parser that best matches the given metadata. |
| protected Parser | CompositeParser.getParser(Metadata metadata, ParseContext context) |
| String | PasswordProvider.getPassword(Metadata metadata)<br>Looks up the password for a document with the given metadata, and returns it for the Parser. |
| void | AbstractParser.parse(InputStream stream, ContentHandler handler, Metadata metadata)<br>Deprecated. Use the Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) method instead. |
| void | AutoDetectParser.parse(InputStream stream, ContentHandler handler, Metadata metadata) |
| void | ParserPostProcessor.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>Forwards the call to the delegated parser and post-processes the results as described above. |
| void | Parser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>Parses a document stream into a sequence of XHTML SAX events. |
| void | DelegatingParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>Looks up the delegate parser from the parsing context and delegates the parse operation to it. |
| void | CryptoParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | ParserDecorator.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>Delegates the method call to the decorated parser. |
| void | CompositeParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>Delegates the call to the matching component parser. |
| void | EmptyParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | RecursiveParserWrapper.parse(InputStream stream, ContentHandler recursiveParserWrapperHandler, Metadata metadata, ParseContext context) |
| void | ErrorParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | AutoDetectParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | NetworkParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | RegexCaptureParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| void | DigestingParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| Constructor and Description |
|---|
| ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context)<br>Creates a reader for the text content of the given binary stream with the given document metadata. |
| ParsingReader(Parser parser, InputStream stream, Metadata metadata, ParseContext context, Executor executor)<br>Creates a reader for the text content of the given binary stream with the given document metadata. |
| Modifier and Type | Method and Description |
|---|---|
| void | CompositeDigester.digest(InputStream is, Metadata m, ParseContext parseContext) |
| void | InputStreamDigester.digest(InputStream is, Metadata metadata, ParseContext parseContext) |
| Modifier and Type | Method and Description |
|---|---|
| void | ExternalParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>Executes the configured external command and passes the given document stream as a simple XHTML document to the given SAX content handler. |
| Modifier and Type | Method and Description |
|---|---|
| void | ExternalParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) |
| Modifier and Type | Method and Description |
|---|---|
| protected static Metadata | AbstractMultipleParser.mergeMetadata(Metadata newMetadata, Metadata lastMetadata, AbstractMultipleParser.MetadataPolicy policy) |
| Modifier and Type | Method and Description |
|---|---|
| protected static Metadata | AbstractMultipleParser.mergeMetadata(Metadata newMetadata, Metadata lastMetadata, AbstractMultipleParser.MetadataPolicy policy) |
| void | AbstractMultipleParser.parse(InputStream stream, ContentHandlerFactory handlers, Metadata metadata, ParseContext context)<br>Deprecated. The ContentHandlerFactory override is still experimental and the method signature is subject to change before Tika 2.0. |
| void | AbstractMultipleParser.parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context)<br>Processes the given stream through one or more parsers, resetting things between parsers as requested by policy. |
| protected abstract boolean | AbstractMultipleParser.parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception)<br>Used to notify implementations that a parser has finished or failed, and to allow them to decide whether to continue or abort further parsing. |
| protected boolean | FallbackParser.parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception) |
| protected boolean | SupplementingParser.parserCompleted(Parser parser, Metadata metadata, ContentHandler handler, ParseContext context, Exception exception) |
| protected void | AbstractMultipleParser.parserPrepare(Parser parser, Metadata metadata, ParseContext context)<br>Used to allow implementations to prepare or change things before parsing occurs. |
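
The parserCompleted callback listed above lets an implementation decide, after each parser finishes or fails, whether further parsers should run. A minimal plain-Java sketch of that control flow follows. `MultiStepRunner`, `Step`, `CompletionPolicy`, and `STOP_ON_SUCCESS` are hypothetical stand-ins introduced for illustration, and the policy shown is only one plausible choice; it is not a statement of how FallbackParser or SupplementingParser actually behave.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver loop; not part of Apache Tika. */
class MultiStepRunner {
    /** Stand-in for one parser: returns extracted text or throws. */
    interface Step {
        String run(String input) throws Exception;
    }

    /** Stand-in for the completion callback: true means continue, false means abort. */
    interface CompletionPolicy {
        boolean completed(Step step, Exception failure);
    }

    /** Assumed policy: keep trying further steps only while the previous one failed. */
    static final CompletionPolicy STOP_ON_SUCCESS = (step, failure) -> failure != null;

    /** Runs the steps in order, consulting the policy after each one. */
    static List<String> runAll(List<Step> steps, String input, CompletionPolicy policy) {
        List<String> results = new ArrayList<>();
        for (Step step : steps) {
            Exception failure = null;
            try {
                results.add(step.run(input));
            } catch (Exception e) {
                failure = e; // remember the failure so the policy can inspect it
            }
            if (!policy.completed(step, failure)) {
                break; // the policy asked to abort further processing
            }
        }
        return results;
    }
}
```

A policy that always returns true would instead run every step regardless of failures, which is the other extreme of the same contract.
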
| Modifier and Type | Method and Description |
|---|---|
| Metadata | FetchEmitTuple.getMetadata() |
| Modifier and Type | Method and Description |
|---|---|
| protected List<Metadata> | PipesServer.parseIt(FetchEmitTuple t, Fetcher fetcher) |
| Constructor and Description |
|---|
| FetchEmitTuple(String id, FetchKey fetchKey, EmitKey emitKey, Metadata metadata) |
| FetchEmitTuple(String id, FetchKey fetchKey, EmitKey emitKey, Metadata metadata, HandlerConfig handlerConfig, FetchEmitTuple.ON_PARSE_EXCEPTION onParseException) |
| Modifier and Type | Method and Description |
|---|---|
| List<Metadata> | EmitData.getMetadataList() |
| Modifier and Type | Method and Description |
|---|---|
| void | StreamEmitter.emit(String emitKey, InputStream inputStream, Metadata userMetadata) |
| Modifier and Type | Method and Description |
|---|---|
| void | EmptyEmitter.emit(String emitKey, List<Metadata> metadataList) |
| void | Emitter.emit(String emitKey, List<Metadata> metadataList) |
| Constructor and Description |
|---|
| EmitData(EmitKey emitKey, List<Metadata> metadataList) |
| EmitData(EmitKey emitKey, List<Metadata> metadataList, String containerStackTrace) |
| Modifier and Type | Method and Description |
|---|---|
| InputStream | RangeFetcher.fetch(String fetchKey, long startOffset, long endOffset, Metadata metadata) |
| InputStream | Fetcher.fetch(String fetchKey, Metadata metadata) |
| InputStream | EmptyFetcher.fetch(String fetchKey, Metadata metadata) |
| Modifier and Type | Method and Description |
|---|---|
| InputStream | FileSystemFetcher.fetch(String fetchKey, Metadata metadata) |
| Modifier and Type | Method and Description |
|---|---|
| InputStream | UrlFetcher.fetch(String fetchKey, Metadata metadata) |
| Modifier and Type | Method and Description |
|---|---|
| Metadata | RenderResult.getMetadata() |
| Modifier and Type | Method and Description |
|---|---|
| RenderResults | Renderer.render(InputStream is, Metadata metadata, ParseContext parseContext, RenderRequest... requests) |
| RenderResults | CompositeRenderer.render(InputStream is, Metadata metadata, ParseContext parseContext, RenderRequest... requests) |
| Constructor and Description |
|---|
| RenderResult(RenderResult.STATUS status, int id, Object result, Metadata metadata) |
| Modifier and Type | Field and Description |
|---|---|
| protected List<Metadata> | RecursiveParserWrapperHandler.metadataList |
| Modifier and Type | Method and Description |
|---|---|
| List<Metadata> | RecursiveParserWrapperHandler.getMetadataList() |
| Modifier and Type | Method and Description |
|---|---|
| ContentHandler | ContentHandlerDecoratorFactory.decorate(ContentHandler contentHandler, Metadata metadata)<br>Deprecated. Use ContentHandlerDecoratorFactory.decorate(ContentHandler, Metadata, ParseContext) instead. This will be removed in 2.5.0. |
| ContentHandler | ContentHandlerDecoratorFactory.decorate(ContentHandler contentHandler, Metadata metadata, ParseContext parseContext) |
| void | RecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler, Metadata metadata) |
| void | AbstractRecursiveParserWrapperHandler.endDocument(ContentHandler contentHandler, Metadata metadata)<br>This is called after the full parse has completed. |
| void | RecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata)<br>This is called after parsing an embedded document. |
| void | AbstractRecursiveParserWrapperHandler.endEmbeddedDocument(ContentHandler contentHandler, Metadata metadata)<br>This is called after parsing each embedded document. |
| void | XMPContentHandler.metadata(Metadata metadata) |
| void | RecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata)<br>This is called before parsing an embedded document. |
| void | AbstractRecursiveParserWrapperHandler.startEmbeddedDocument(ContentHandler contentHandler, Metadata metadata)<br>This is called before parsing each embedded document. |
| Constructor and Description |
|---|
| DIFContentHandler(ContentHandler delegate, Metadata metadata) |
| PhoneExtractingContentHandler(ContentHandler handler, Metadata metadata)<br>Creates a decorator for the given SAX event handler and Metadata object. |
| StandardsExtractingContentHandler(ContentHandler handler, Metadata metadata)<br>Creates a decorator for the given SAX event handler and Metadata object. |
| XHTMLContentHandler(ContentHandler handler, Metadata metadata) |
| Modifier and Type | Method and Description |
|---|---|
| static Metadata | ParserUtils.cloneMetadata(Metadata m)<br>Does a deep clone of a Metadata object. |
| Modifier and Type | Method and Description |
|---|---|
| static Metadata | ParserUtils.cloneMetadata(Metadata m)<br>Does a deep clone of a Metadata object. |
| static InputStream | ParserUtils.ensureStreamReReadable(InputStream stream, TemporaryResources tmp, Metadata metadata)<br>Ensures that the Stream will be able to be re-read, by buffering to a temporary file if required. |
| static void | ParserUtils.recordParserDetails(Parser parser, Metadata metadata) |
| static void | ParserUtils.recordParserDetails(String parserClassName, Metadata metadata) |
| static void | ParserUtils.recordParserFailure(Parser parser, Throwable failure, Metadata metadata) |
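
ParserUtils.cloneMetadata is documented above as a deep clone. As a rough illustration of what deep cloning means for a multi-valued container, the sketch below models metadata as a `Map<String, List<String>>`. That representation and the `MetadataCloneSketch` class are assumptions made for illustration and do not reflect how Tika stores metadata internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a deep clone; not part of Apache Tika. */
class MetadataCloneSketch {
    /** Returns a copy whose value lists are independent of the source's lists. */
    static Map<String, List<String>> deepClone(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : source.entrySet()) {
            // Copy each value list so later edits to one map do not leak into the other.
            copy.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return copy;
    }
}
```

A shallow copy such as `new LinkedHashMap<>(source)` would share the value lists, so adding a value through the copy would also change the original.
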
Copyright © 2007–2024 The Apache Software Foundation. All rights reserved.