java.lang.Object
  org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
public abstract class AbstractOOXMLExtractor
extends Object
implements OOXMLExtractor
Base class for all Tika OOXML extractors.
Tika extractors decorate POI extractors so that the parsed content of
documents is returned as a sequence of XHTML SAX events. Subclasses must
implement buildXHTML(XHTMLContentHandler), which populates the
XHTMLContentHandler object received as a parameter.
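The division of labour described above (the base class drives the SAX event stream, the subclass fills in the body) can be sketched with JDK classes alone. This is a minimal illustration, not the Tika source: `SketchExtractor`, `ParagraphExtractor`, `EventLog`, and `render` are hypothetical names, a plain `ContentHandler` stands in for `XHTMLContentHandler`, and the exact elements the real base class emits are an assumption.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class TemplateSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Hypothetical stand-in for the abstract base class (assumption, not Tika code). */
    static abstract class SketchExtractor {
        /** Opens the document, delegates the body to the subclass, then closes it. */
        public void getXHTML(ContentHandler handler) throws SAXException {
            handler.startDocument();
            handler.startElement(XHTML, "body", "body", new AttributesImpl());
            buildXHTML(handler);
            handler.endElement(XHTML, "body", "body");
            handler.endDocument();
        }

        /** Subclasses emit the format-specific content here. */
        protected abstract void buildXHTML(ContentHandler xhtml) throws SAXException;
    }

    /** Hypothetical subclass that emits one p element per paragraph. */
    static class ParagraphExtractor extends SketchExtractor {
        private final String[] paragraphs;

        ParagraphExtractor(String... paragraphs) {
            this.paragraphs = paragraphs;
        }

        @Override
        protected void buildXHTML(ContentHandler xhtml) throws SAXException {
            for (String p : paragraphs) {
                xhtml.startElement(XHTML, "p", "p", new AttributesImpl());
                char[] chars = p.toCharArray();
                xhtml.characters(chars, 0, chars.length);
                xhtml.endElement(XHTML, "p", "p");
            }
        }
    }

    /** Records the received events as text so the sequence can be inspected. */
    static class EventLog extends DefaultHandler {
        final StringBuilder log = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            log.append('<').append(localName).append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            log.append("</").append(localName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            log.append(ch, start, length);
        }
    }

    /** Runs the sketch extractor and returns the recorded event sequence. */
    static String render(String... paragraphs) {
        try {
            EventLog log = new EventLog();
            new ParagraphExtractor(paragraphs).getXHTML(log);
            return log.log.toString();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints <body><p>Hello</p><p>World</p></body>
        System.out.println(render("Hello", "World"));
    }
}
```

Keeping the document framing in the base class means each subclass only has to describe its own content, which is the point of making buildXHTML abstract.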
| Field Summary | |
|---|---|
protected org.apache.poi.POIXMLTextExtractor |
extractor
|
| Constructor Summary | |
|---|---|
AbstractOOXMLExtractor(org.apache.tika.parser.ParseContext context,
org.apache.poi.POIXMLTextExtractor extractor,
String type)
|
|
| Method Summary | |
|---|---|
protected abstract void |
buildXHTML(org.apache.tika.sax.XHTMLContentHandler xhtml)
Populates the XHTMLContentHandler object received as parameter. |
org.apache.poi.POIXMLDocument |
getDocument()
Returns the opened document. |
protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> |
getMainDocumentParts()
Returns a list of the main parts of the document, used when searching for embedded resources. |
MetadataExtractor |
getMetadataExtractor()
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported
for OOXML by POI. |
void |
getXHTML(ContentHandler handler,
org.apache.tika.metadata.Metadata metadata,
org.apache.tika.parser.ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
protected void |
handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part,
ContentHandler handler)
Handles an embedded file in the document. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected org.apache.poi.POIXMLTextExtractor extractor
| Constructor Detail |
|---|
public AbstractOOXMLExtractor(org.apache.tika.parser.ParseContext context,
org.apache.poi.POIXMLTextExtractor extractor,
String type)
| Method Detail |
|---|
public org.apache.poi.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
Specified by: getDocument in interface OOXMLExtractor
See Also: OOXMLExtractor.getDocument()
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
Specified by: getMetadataExtractor in interface OOXMLExtractor
See Also: OOXMLExtractor.getMetadataExtractor()
public void getXHTML(ContentHandler handler,
org.apache.tika.metadata.Metadata metadata,
org.apache.tika.parser.ParseContext context)
throws SAXException,
org.apache.xmlbeans.XmlException,
IOException,
org.apache.tika.exception.TikaException
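Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by: getXHTML in interface OOXMLExtractor
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException, org.apache.tika.exception.TikaException
See Also: org.apache.tika.parser.microsoft.ooxml.OOXMLExtractor#getXHTML(org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata)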
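On the caller's side, receiving "a sequence of XHTML SAX events" means supplying a ContentHandler that reacts to each event. The sketch below shows one such handler that gathers plain text. It is illustrative only: `TextCollector` and `collect` are hypothetical names, and the JDK SAX parser reading a literal XHTML string stands in for an extractor as the event source, since the real call requires Tika and POI on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollectorSketch {

    /** Hypothetical handler that keeps character data and ends each paragraph with a newline. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The default JDK parser is not namespace-aware, so qName carries the element name.
            if ("p".equals(qName)) {
                text.append('\n');
            }
        }

        String text() {
            return text.toString();
        }
    }

    /** Feeds SAX events for the given XHTML into a TextCollector and returns the gathered text. */
    static String collect(String xhtml) {
        try {
            TextCollector collector = new TextCollector();
            // Assumption: the JDK parser plays the role of the event source here.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), collector);
            return collector.text();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(collect("<html><body><p>First</p><p>Second</p></body></html>"));
    }
}
```

Any handler with this shape could in principle be passed as the handler argument, because the method depends only on the ContentHandler interface.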
protected void handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part,
ContentHandler handler)
throws SAXException,
IOException
Handles an embedded file in the document.
Throws: SAXException, IOException
protected abstract void buildXHTML(org.apache.tika.sax.XHTMLContentHandler xhtml)
throws SAXException,
org.apache.xmlbeans.XmlException,
IOException
Populates the XHTMLContentHandler object received as a parameter.
Throws: SAXException, org.apache.xmlbeans.XmlException, IOException
protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
throws org.apache.tika.exception.TikaException
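Returns a list of the main parts of the document, used when searching for embedded resources.
Throws: org.apache.tika.exception.TikaException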