java.lang.Object
  org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
public abstract class AbstractOOXMLExtractor
extends Object
implements OOXMLExtractor
Base class for all Tika OOXML extractors.
Tika extractors decorate POI extractors so that the parsed content of
documents is returned as a sequence of XHTML SAX events. Subclasses must
implement buildXHTML(XHTMLContentHandler), which populates the
XHTMLContentHandler object received as a parameter.
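Since getXHTML receives the caller's content handler while subclasses only implement buildXHTML, the base class presumably wraps the handler and delegates the body to the subclass, a template-method arrangement. The following is a minimal plain-Java model of that flow, not Tika code: TextSource, EventSink, SketchExtractor and SingleParagraphExtractor are hypothetical stand-ins for POIXMLTextExtractor, XHTMLContentHandler and the real extractor classes, and the assumption that getXHTML brackets buildXHTML with startDocument/endDocument is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateDemo {
    public static void main(String[] args) {
        TextSource source = () -> "Hello OOXML";
        System.out.println(new SingleParagraphExtractor(source).getXHTML());
        // prints [startDocument, <p>Hello OOXML</p>, endDocument]
    }
}

// Hypothetical stand-in for the POI text extractor that the base class decorates.
interface TextSource {
    String getText();
}

// Hypothetical stand-in for XHTMLContentHandler: records events as strings.
class EventSink {
    final List<String> events = new ArrayList<>();

    void startDocument() { events.add("startDocument"); }

    void endDocument() { events.add("endDocument"); }

    void element(String name, String text) {
        events.add("<" + name + ">" + text + "</" + name + ">");
    }
}

// Models the base class: it holds the wrapped extractor and fixes the event order.
abstract class SketchExtractor {
    protected final TextSource extractor;

    SketchExtractor(TextSource extractor) {
        this.extractor = extractor;
    }

    // Analogue of getXHTML (assumed flow): document-level events here, body from the subclass.
    final List<String> getXHTML() {
        EventSink xhtml = new EventSink();
        xhtml.startDocument();
        buildXHTML(xhtml);
        xhtml.endDocument();
        return xhtml.events;
    }

    // Analogue of buildXHTML: subclasses only emit the body content.
    protected abstract void buildXHTML(EventSink xhtml);
}

// A trivial subclass that writes all extracted text as one paragraph.
class SingleParagraphExtractor extends SketchExtractor {
    SingleParagraphExtractor(TextSource extractor) {
        super(extractor);
    }

    @Override
    protected void buildXHTML(EventSink xhtml) {
        xhtml.element("p", extractor.getText());
    }
}
```

Because only buildXHTML is abstract in this model, a subclass cannot forget to open or close the document; it only decides what goes between.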
Field Summary
| Modifier and Type | Field |
|---|---|
| protected org.apache.poi.POIXMLTextExtractor | extractor |
Constructor Summary
| Constructor |
|---|
| AbstractOOXMLExtractor(org.apache.tika.parser.ParseContext context, org.apache.poi.POIXMLTextExtractor extractor) |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| protected abstract void | buildXHTML(org.apache.tika.sax.XHTMLContentHandler xhtml) | Populates the XHTMLContentHandler object received as a parameter. |
| org.apache.poi.POIXMLDocument | getDocument() | Returns the opened document. |
| protected String | getJustFileName(String desc) | |
| protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> | getMainDocumentParts() | Returns a list of the main parts of the document, used when searching for embedded resources. |
| MetadataExtractor | getMetadataExtractor() | POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI. |
| void | getXHTML(ContentHandler handler, org.apache.tika.metadata.Metadata metadata, org.apache.tika.parser.ParseContext context) | Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
| protected void | handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part, ContentHandler handler, String rel) | Handles an embedded file in the document. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail

extractor
protected org.apache.poi.POIXMLTextExtractor extractor
Constructor Detail

AbstractOOXMLExtractor
public AbstractOOXMLExtractor(org.apache.tika.parser.ParseContext context,
                              org.apache.poi.POIXMLTextExtractor extractor)
Method Detail

getDocument
public org.apache.poi.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
Specified by:
getDocument in interface OOXMLExtractor
See Also:
OOXMLExtractor.getDocument()

getMetadataExtractor
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() is not yet supported for OOXML by POI.
Specified by:
getMetadataExtractor in interface OOXMLExtractor
See Also:
OOXMLExtractor.getMetadataExtractor()
getXHTML
public void getXHTML(ContentHandler handler,
                     org.apache.tika.metadata.Metadata metadata,
                     org.apache.tika.parser.ParseContext context)
              throws SAXException,
                     org.apache.xmlbeans.XmlException,
                     IOException,
                     org.apache.tika.exception.TikaException
Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by:
getXHTML in interface OOXMLExtractor
Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
org.apache.tika.exception.TikaException
See Also:
org.apache.tika.parser.microsoft.ooxml.OOXMLExtractor#getXHTML(org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata)

getJustFileName
protected String getJustFileName(String desc)
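This page gives getJustFileName no description. Going by its name and its String parameter, it presumably reduces a part name or description such as /word/media/image1.png to a bare file name. The sketch below is an assumption, not Tika's code: it keeps the text after the last '/' and also drops a trailing extension, and the real method may treat extensions differently.

```java
class FileNameDemo {
    // Hypothetical re-implementation; the behavior is assumed from the method name.
    static String getJustFileName(String desc) {
        // Keep only the last path segment, e.g. "/word/media/image1.png" -> "image1.png".
        int slash = desc.lastIndexOf('/');
        if (slash != -1) {
            desc = desc.substring(slash + 1);
        }
        // Assumption: drop the extension as well, e.g. "image1.png" -> "image1".
        int dot = desc.lastIndexOf('.');
        if (dot != -1) {
            desc = desc.substring(0, dot);
        }
        return desc;
    }

    public static void main(String[] args) {
        System.out.println(getJustFileName("/word/media/image1.png")); // prints image1
        System.out.println(getJustFileName("chart")); // prints chart
    }
}
```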
handleEmbeddedFile
protected void handleEmbeddedFile(org.apache.poi.openxml4j.opc.PackagePart part,
                                  ContentHandler handler,
                                  String rel)
                           throws SAXException,
                                  IOException
Handles an embedded file in the document.
Throws:
SAXException
IOException
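handleEmbeddedFile receives the embedded PackagePart, the content handler and a relationship ID rel, but this page does not say what it does with them. One plausible first step is to describe the part before handing it to a parser for embedded documents; the sketch below models only that step. PartInfo, describeEmbedded and the metadata keys are hypothetical, and a plain Map stands in for Tika's Metadata.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EmbeddedDemo {
    // Hypothetical stand-in for an OPC PackagePart: just a part name and a content type.
    static final class PartInfo {
        final String partName;
        final String contentType;

        PartInfo(String partName, String contentType) {
            this.partName = partName;
            this.contentType = contentType;
        }
    }

    // Builds the metadata an embedded-file handler might pass on; the keys are illustrative.
    static Map<String, String> describeEmbedded(PartInfo part, String rel) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("relationshipId", rel);
        // Use the last path segment of the part name as the resource name.
        String name = part.partName.substring(part.partName.lastIndexOf('/') + 1);
        metadata.put("resourceName", name);
        metadata.put("contentType", part.contentType);
        return metadata;
    }

    public static void main(String[] args) {
        PartInfo part = new PartInfo("/word/embeddings/oleObject1.bin",
                "application/vnd.openxmlformats-officedocument.oleObject");
        System.out.println(describeEmbedded(part, "rId7"));
    }
}
```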
buildXHTML
protected abstract void buildXHTML(org.apache.tika.sax.XHTMLContentHandler xhtml)
                            throws SAXException,
                                   org.apache.xmlbeans.XmlException,
                                   IOException
Populates the XHTMLContentHandler object received as a parameter.
Throws:
SAXException
org.apache.xmlbeans.XmlException
IOException
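buildXHTML does its work by emitting SAX events through the XHTMLContentHandler it is given. To show what such a body looks like at the event level, the sketch below uses only the JDK's org.xml.sax API and writes one p element per non-blank line of text. emitParagraphs and Recorder are hypothetical helpers; XHTMLContentHandler itself is not used here, and how real subclasses map POI content to elements is not covered by this page.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class SaxBodyDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical body builder: one <p> element per non-blank line of extracted text.
    static void emitParagraphs(ContentHandler handler, String text) throws SAXException {
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines instead of emitting empty paragraphs
            }
            handler.startElement(XHTML_NS, "p", "p", new AttributesImpl());
            char[] chars = line.toCharArray();
            handler.characters(chars, 0, chars.length);
            handler.endElement(XHTML_NS, "p", "p");
        }
    }

    // Records start/end tags and character data so the event stream can be inspected.
    static final class Recorder extends DefaultHandler {
        final StringBuilder out = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(localName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(localName).append('>');
        }
    }

    public static void main(String[] args) throws SAXException {
        Recorder recorder = new Recorder();
        emitParagraphs(recorder, "First line\n\nSecond line");
        System.out.println(recorder.out); // prints <p>First line</p><p>Second line</p>
    }
}
```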
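getMainDocumentParts
protected abstract List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
                                                                    throws org.apache.tika.exception.TikaException
Returns a list of the main parts of the document, used when searching for embedded resources.
Throws:
org.apache.tika.exception.TikaException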
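getMainDocumentParts is described as returning the main parts "used when searching for embedded resources", and handleEmbeddedFile takes a part plus a relationship ID. A search along those lines would walk each main part's relationships and act on the ones that point at embedded content. The sketch below models that walk with hypothetical Rel and MainPart classes; which relationship types count as embedded (here oleObject and package) is an assumption, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EmbeddedScanDemo {
    static final String REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/";

    // Assumption: these relationship types are the ones treated as embedded resources.
    static final Set<String> EMBEDDED_TYPES = new HashSet<>(Arrays.asList(
            REL_NS + "oleObject",
            REL_NS + "package"));

    // Hypothetical stand-in for a package relationship (ID, type URI, target).
    static final class Rel {
        final String id;
        final String type;
        final String target;

        Rel(String id, String type, String target) {
            this.id = id;
            this.type = type;
            this.target = target;
        }
    }

    // Hypothetical stand-in for a main document part and its outgoing relationships.
    static final class MainPart {
        final String name;
        final List<Rel> rels;

        MainPart(String name, List<Rel> rels) {
            this.name = name;
            this.rels = rels;
        }
    }

    // Scans every main part; each match marks where a real extractor would call
    // its embedded-file handler with the target part and the relationship ID.
    static List<String> findEmbedded(List<MainPart> mainParts) {
        List<String> found = new ArrayList<>();
        for (MainPart part : mainParts) {
            for (Rel rel : part.rels) {
                if (EMBEDDED_TYPES.contains(rel.type)) {
                    found.add(part.name + " " + rel.id + " -> " + rel.target);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        MainPart document = new MainPart("/word/document.xml", Arrays.asList(
                new Rel("rId1", REL_NS + "styles", "styles.xml"),
                new Rel("rId5", REL_NS + "oleObject", "embeddings/oleObject1.bin")));
        System.out.println(findEmbedded(Arrays.asList(document)));
        // prints [/word/document.xml rId5 -> embeddings/oleObject1.bin]
    }
}
```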