org.apache.tika.parser.microsoft
Class POIFSContainerDetector

java.lang.Object
  extended by org.apache.tika.parser.microsoft.POIFSContainerDetector
All Implemented Interfaces:
Serializable, org.apache.tika.detect.Detector

public class POIFSContainerDetector
extends Object
implements org.apache.tika.detect.Detector

A detector that inspects a POIFS OLE2 document to determine its specific file type. It should work for all OLE2 documents, whether or not POI supports them.

See Also:
Serialized Form

Field Summary
static org.apache.tika.mime.MediaType COMP_OBJ
          Some other kind of embedded document, in a CompObj container within another OLE2 document
static org.apache.tika.mime.MediaType DOC
          Microsoft Word
static org.apache.tika.mime.MediaType GENERAL_EMBEDDED
          General embedded document type within an OLE2 container
static org.apache.tika.mime.MediaType MPP
          Microsoft Project
static org.apache.tika.mime.MediaType MSG
          Microsoft Outlook
static org.apache.tika.mime.MediaType OLE
          The OLE base file format
static org.apache.tika.mime.MediaType OLE10_NATIVE
          An OLE10 Native embedded document within another OLE2 document
static org.apache.tika.mime.MediaType OOXML_PROTECTED
          The protected OOXML base file format
static org.apache.tika.mime.MediaType PPT
          Microsoft PowerPoint
static org.apache.tika.mime.MediaType PUB
          Microsoft Publisher
static org.apache.tika.mime.MediaType SDA
          StarOffice Draw
static org.apache.tika.mime.MediaType SDC
          StarOffice Calc
static org.apache.tika.mime.MediaType SDD
          StarOffice Impress
static org.apache.tika.mime.MediaType SDW
          StarOffice Writer
static org.apache.tika.mime.MediaType VSD
          Microsoft Visio
static org.apache.tika.mime.MediaType WPS
          Microsoft Works
static org.apache.tika.mime.MediaType XLR
          Microsoft Works Spreadsheet 7.0
static org.apache.tika.mime.MediaType XLS
          Microsoft Excel
 
Constructor Summary
POIFSContainerDetector()
           
 
Method Summary
 org.apache.tika.mime.MediaType detect(InputStream input, org.apache.tika.metadata.Metadata metadata)
           
protected static org.apache.tika.mime.MediaType detect(Set<String> names)
          Deprecated. Use detect(Set, DirectoryEntry) and pass the root entry of the filesystem whose type is to be detected as the second argument.
protected static org.apache.tika.mime.MediaType detect(Set<String> names, org.apache.poi.poifs.filesystem.DirectoryEntry root)
          Internal detection of the specific kind of OLE2 document, based on the names of the top-level streams within the file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OLE

public static final org.apache.tika.mime.MediaType OLE
The OLE base file format


OOXML_PROTECTED

public static final org.apache.tika.mime.MediaType OOXML_PROTECTED
The protected OOXML base file format


GENERAL_EMBEDDED

public static final org.apache.tika.mime.MediaType GENERAL_EMBEDDED
General embedded document type within an OLE2 container


OLE10_NATIVE

public static final org.apache.tika.mime.MediaType OLE10_NATIVE
An OLE10 Native embedded document within another OLE2 document


COMP_OBJ

public static final org.apache.tika.mime.MediaType COMP_OBJ
Some other kind of embedded document, in a CompObj container within another OLE2 document


XLS

public static final org.apache.tika.mime.MediaType XLS
Microsoft Excel


DOC

public static final org.apache.tika.mime.MediaType DOC
Microsoft Word


PPT

public static final org.apache.tika.mime.MediaType PPT
Microsoft PowerPoint


PUB

public static final org.apache.tika.mime.MediaType PUB
Microsoft Publisher


VSD

public static final org.apache.tika.mime.MediaType VSD
Microsoft Visio


WPS

public static final org.apache.tika.mime.MediaType WPS
Microsoft Works


XLR

public static final org.apache.tika.mime.MediaType XLR
Microsoft Works Spreadsheet 7.0


MSG

public static final org.apache.tika.mime.MediaType MSG
Microsoft Outlook


MPP

public static final org.apache.tika.mime.MediaType MPP
Microsoft Project


SDC

public static final org.apache.tika.mime.MediaType SDC
StarOffice Calc


SDA

public static final org.apache.tika.mime.MediaType SDA
StarOffice Draw


SDD

public static final org.apache.tika.mime.MediaType SDD
StarOffice Impress


SDW

public static final org.apache.tika.mime.MediaType SDW
StarOffice Writer

Constructor Detail

POIFSContainerDetector

public POIFSContainerDetector()

Method Detail

detect

public org.apache.tika.mime.MediaType detect(InputStream input,
                                             org.apache.tika.metadata.Metadata metadata)
                                      throws IOException
Specified by:
detect in interface org.apache.tika.detect.Detector
Throws:
IOException
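
The page gives no description for this method. As a rough illustration of the kind of check a stream-based detector can make, the sketch below peeks at the first eight bytes of a stream, compares them with the well-known OLE2 compound document signature, and then resets the stream so the caller can still read it from the start. The class and method names (Ole2SignatureSketch, looksLikeOle2) are hypothetical, and this is not claimed to be how POIFSContainerDetector is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Tika or POI.
final class Ole2SignatureSketch {

    // The 8-byte signature found at the start of an OLE2 compound document.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true if the stream starts with the OLE2 signature.
    // The stream is reset to its original position before returning.
    public static boolean looksLikeOle2(InputStream input) throws IOException {
        if (input == null || !input.markSupported()) {
            return false; // cannot peek without mark/reset support
        }
        input.mark(OLE2_SIGNATURE.length);
        try {
            byte[] header = new byte[OLE2_SIGNATURE.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (read < header.length) {
                int n = input.read(header, read, header.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == header.length && Arrays.equals(header, OLE2_SIGNATURE);
        } finally {
            input.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = {
            (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1, 0x00
        };
        System.out.println(looksLikeOle2(new ByteArrayInputStream(sample)));
    }
}
```

Streams that do not support mark/reset (a plain FileInputStream, for example) would need to be wrapped in a BufferedInputStream before being passed to a helper like this.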

detect

protected static org.apache.tika.mime.MediaType detect(Set<String> names)
Deprecated. Use detect(Set, DirectoryEntry) and pass the root entry of the filesystem whose type is to be detected as the second argument.

Internal detection of the specific kind of OLE2 document, based on the names of the top-level streams within the file.


detect

protected static org.apache.tika.mime.MediaType detect(Set<String> names,
                                                       org.apache.poi.poifs.filesystem.DirectoryEntry root)
Internal detection of the specific kind of OLE2 document, based on the names of the top-level streams within the file. In some cases the detection may need access to the root DirectoryEntry of that file for best results. The entry can be given as a second, optional argument.

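Parameters:
names - the names of the top-level streams within the file
root - the root DirectoryEntry of the file (optional)
Returns:
the detected media type of the OLE2 document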

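To make the idea of detection by top-level stream names concrete, here is a minimal sketch using only the standard library. The stream names used (Workbook, WordDocument, PowerPoint Document, VisioDocument, EncryptedPackage) are the conventional OLE2 entry names for those formats; the class name, the rule order, and the returned type strings are assumptions made for illustration. The real method returns MediaType constants, handles more formats, and may apply different rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch for illustration only; not Tika's implementation.
final class StreamNameDetectionSketch {

    // Fallback type for an OLE2 container that matches no rule (assumed string).
    static final String GENERIC_OLE = "application/x-tika-msoffice";

    // Maps a set of top-level stream names to a media type string.
    public static String detectByNames(Set<String> names) {
        if (names == null || names.isEmpty()) {
            return GENERIC_OLE;
        }
        if (names.contains("Workbook")) {
            return "application/vnd.ms-excel";
        }
        if (names.contains("EncryptedPackage")) {
            return "application/x-tika-ooxml-protected";
        }
        if (names.contains("WordDocument")) {
            return "application/msword";
        }
        if (names.contains("PowerPoint Document")) {
            return "application/vnd.ms-powerpoint";
        }
        if (names.contains("VisioDocument")) {
            return "application/vnd.visio";
        }
        return GENERIC_OLE;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>(Arrays.asList("WordDocument", "1Table"));
        System.out.println(detectByNames(names));
    }
}
```

Stream names alone cannot always separate formats that share the same entries, which is presumably why the non-deprecated overload also accepts the root DirectoryEntry.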

Copyright © 2007-2013 The Apache Software Foundation. All Rights Reserved.