Class Summary

| Class | Description |
|---|---|
| BoilerpipeContentHandler | Uses the boilerpipe library to automatically extract the main content from a web page. |
| DefaultHtmlMapper | The default HTML mapping rules in Tika. |
| HtmlEncodingDetector | Character encoding detector that determines the encoding of an HTML document from the charset parameter, if present, of a Content-Type http-equiv meta tag near the beginning of the document. |
| HtmlParser | HTML parser that produces the XHTML SAX events and metadata expected by Tika clients. |
| IdentityHtmlMapper | Alternative HTML mapping rules that pass the input HTML through as-is, without any modifications. |
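
The meta-tag lookup described for HtmlEncodingDetector can be illustrated with a small standalone sketch. This is not Tika's implementation: the class name `MetaCharsetSniffer`, the method `sniff`, the 8192-byte prefix limit, and the regular expressions are all assumptions made for illustration, and the code uses only the Java standard library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: looks for a charset parameter in a
// Content-Type http-equiv meta tag near the start of an HTML document.
final class MetaCharsetSniffer {

    // Assumed limit on how much of the document counts as "near the beginning".
    private static final int PREFIX_LENGTH = 8192;

    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HTTP_EQUIV_CONTENT_TYPE =
            Pattern.compile("http-equiv\\s*=\\s*[\"']?content-type[\"']?",
                    Pattern.CASE_INSENSITIVE);
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9_.:\\-]+)",
                    Pattern.CASE_INSENSITIVE);

    private MetaCharsetSniffer() {
    }

    static Optional<Charset> sniff(byte[] document) {
        int length = Math.min(document.length, PREFIX_LENGTH);
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII markup
        // can be matched before the real encoding is known.
        String head = new String(document, 0, length, StandardCharsets.ISO_8859_1);

        Matcher tags = META_TAG.matcher(head);
        while (tags.find()) {
            String tag = tags.group();
            if (!HTTP_EQUIV_CONTENT_TYPE.matcher(tag).find()) {
                continue; // some other meta tag, e.g. name="description"
            }
            Matcher charset = CHARSET_PARAM.matcher(tag);
            if (charset.find()) {
                try {
                    return Optional.of(Charset.forName(charset.group(1)));
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: keep scanning.
                }
            }
        }
        return Optional.empty();
    }
}
```

A caller would typically fall back to another detection strategy or a default encoding when `sniff` returns an empty `Optional`, since many documents carry no such meta tag or declare a charset the JVM does not support.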