java.lang.Object
  org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor
public abstract class AbstractOOXMLExtractor
Base class for all Tika OOXML extractors.
Tika extractors decorate POI extractors so that the parsed content of
documents is returned as a sequence of XHTML SAX events. Subclasses must
implement buildXHTML(XHTMLContentHandler), which populates the
XHTMLContentHandler object received as a parameter.
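The division of labour described above (the base class drives the SAX event stream, the subclass fills in the body) can be illustrated with a minimal sketch that uses only the JDK's org.xml.sax types. SketchExtractor, ParagraphExtractor, RecordingHandler and SketchDemo are hypothetical stand-ins invented for this illustration; they are not Tika or POI classes, and the real getXHTML also takes Metadata and ParseContext arguments and wraps the handler in an XHTMLContentHandler.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the base class: it owns the XHTML envelope. */
abstract class SketchExtractor {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Emits the document envelope and delegates the body to the subclass. */
    public final void getXHTML(ContentHandler handler) throws SAXException {
        handler.startDocument();
        start(handler, "html");
        start(handler, "body");
        buildXHTML(handler); // subclass hook
        end(handler, "body");
        end(handler, "html");
        handler.endDocument();
    }

    /** Subclasses populate the handler with the document body. */
    protected abstract void buildXHTML(ContentHandler xhtml) throws SAXException;

    static void start(ContentHandler h, String name) throws SAXException {
        h.startElement(XHTML, name, name, new AttributesImpl());
    }

    static void end(ContentHandler h, String name) throws SAXException {
        h.endElement(XHTML, name, name);
    }

    static void text(ContentHandler h, String s) throws SAXException {
        char[] chars = s.toCharArray();
        h.characters(chars, 0, chars.length);
    }
}

/** Hypothetical subclass: emits one p element per paragraph of text. */
class ParagraphExtractor extends SketchExtractor {
    private final List<String> paragraphs;

    ParagraphExtractor(List<String> paragraphs) {
        this.paragraphs = paragraphs;
    }

    @Override
    protected void buildXHTML(ContentHandler xhtml) throws SAXException {
        for (String paragraph : paragraphs) {
            start(xhtml, "p");
            text(xhtml, paragraph);
            end(xhtml, "p");
        }
    }
}

/** Records element and text events in order so they can be inspected. */
class RecordingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        events.add("<" + local + ">");
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add("</" + local + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add(new String(ch, start, length));
    }
}

class SketchDemo {
    public static void main(String[] args) throws SAXException {
        RecordingHandler handler = new RecordingHandler();
        new ParagraphExtractor(List.of("First paragraph", "Second paragraph")).getXHTML(handler);
        handler.events.forEach(System.out::println);
    }
}
```

Keeping getXHTML final in the sketch means every subclass produces the same envelope and only the body differs. Whether the real class enforces this in the same way is not stated on this page.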
| Field Summary | |
|---|---|
protected org.apache.poi.POIXMLTextExtractor |
extractor
|
| Constructor Summary | |
|---|---|
AbstractOOXMLExtractor(ParseContext context,
org.apache.poi.POIXMLTextExtractor extractor,
java.lang.String type)
|
|
| Method Summary | |
|---|---|
protected abstract void |
buildXHTML(XHTMLContentHandler xhtml)
Populates the XHTMLContentHandler object received as parameter. |
org.apache.poi.POIXMLDocument |
getDocument()
Returns the opened document. |
protected abstract java.util.List<org.apache.poi.openxml4j.opc.PackagePart> |
getMainDocumentParts()
Return a list of the main parts of the document, used when searching for embedded resources. |
MetadataExtractor |
getMetadataExtractor()
POIXMLTextExtractor.getMetadataTextExtractor() not yet supported
for OOXML by POI. |
void |
getXHTML(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
Parses the document into a sequence of XHTML SAX events sent to the given content handler. |
protected void |
handleEmbedded(org.apache.poi.openxml4j.opc.PackageRelationship rel,
org.apache.poi.openxml4j.opc.PackagePart part,
org.xml.sax.ContentHandler handler,
ParseContext context)
Handles an embedded resource in the file |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected org.apache.poi.POIXMLTextExtractor extractor
| Constructor Detail |
|---|
public AbstractOOXMLExtractor(ParseContext context,
org.apache.poi.POIXMLTextExtractor extractor,
java.lang.String type)
| Method Detail |
|---|
public org.apache.poi.POIXMLDocument getDocument()
Description copied from interface: OOXMLExtractor
Returns the opened document.
Specified by: getDocument in interface OOXMLExtractor
See Also: OOXMLExtractor.getDocument()
public MetadataExtractor getMetadataExtractor()
Description copied from interface: OOXMLExtractor
POIXMLTextExtractor.getMetadataTextExtractor() not yet supported
for OOXML by POI.
Specified by: getMetadataExtractor in interface OOXMLExtractor
See Also: OOXMLExtractor.getMetadataExtractor()
public void getXHTML(org.xml.sax.ContentHandler handler,
Metadata metadata,
ParseContext context)
throws org.xml.sax.SAXException,
org.apache.xmlbeans.XmlException,
java.io.IOException,
TikaException
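Description copied from interface: OOXMLExtractor
Parses the document into a sequence of XHTML SAX events sent to the given content handler.
Specified by: getXHTML in interface OOXMLExtractor
Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
TikaException
See Also: org.apache.tika.parser.microsoft.ooxml.OOXMLExtractor#getXHTML(org.xml.sax.ContentHandler, org.apache.tika.metadata.Metadata)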
protected void handleEmbedded(org.apache.poi.openxml4j.opc.PackageRelationship rel,
org.apache.poi.openxml4j.opc.PackagePart part,
org.xml.sax.ContentHandler handler,
ParseContext context)
throws org.xml.sax.SAXException,
org.apache.xmlbeans.XmlException,
java.io.IOException,
TikaException
Handles an embedded resource in the file.
Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
TikaException
protected abstract void buildXHTML(XHTMLContentHandler xhtml)
throws org.xml.sax.SAXException,
org.apache.xmlbeans.XmlException,
java.io.IOException
Populates the XHTMLContentHandler object received as parameter.
Throws:
org.xml.sax.SAXException
org.apache.xmlbeans.XmlException
java.io.IOException
protected abstract java.util.List<org.apache.poi.openxml4j.opc.PackagePart> getMainDocumentParts()
throws TikaException
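Return a list of the main parts of the document, used when searching for embedded resources.
Throws:
TikaException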