| Interface | Description |
|---|---|
| CharSequenceBuffer | |
| Tag | Tag returned by TagTokenizer. |
| TagProcessorContext | Defines a set of methods that allows TagRules to interact with the TagProcessor. |
| TagRule | User-defined rule for processing Tags encountered by the TagProcessor. |
| TagTokenizer.TokenHandler | Handler that will receive callbacks as 'tags' and 'text' are encountered. |

| Class | Description |
|---|---|
| BasicBlockRule<T> | TagRule helper class for dealing with blocks surrounded by an opening and closing tag. |
| BasicRule | Basic implementation of TagRule. |
| CustomTag | A CustomTag provides a mechanism to manipulate the contents of a Tag. |
| State | Acts as a registry of TagRules to apply whilst the TagProcessor is processing the document in this particular state. |
| StateTransitionRule | |
| TagProcessor | Copies a document from a source to a destination, applying rules on the way to extract and/or transform the content. |
| TagTokenizer | Splits a chunk of HTML into 'text' and 'tag' tokens, for easy processing. |

| Enum | Description |
|---|---|
| Tag.Type | Type of tag. |
| TagTokenizer.Token | |

This package is for processing tag-like markup languages, that is, things with angle brackets: HTML, XHTML, WML, XML and other SGML dialects.
Strengths:

The package provides two APIs:

The TagTokenizer scans through a document and fires events as it encounters Tags of interest. Anything that does not qualify as a Tag is treated as a Text token. This is similar to the approach the SAX API takes for XML processing.
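
The event-driven style described above can be illustrated with a small, self-contained sketch. It does not use this package's classes: `TokenizerSketch`, its nested `Handler` interface, and every method name below are hypothetical, chosen only to show how a scanner might report tags of interest to a callback and fold everything else into text tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names are NOT the package's real API.
class TokenizerSketch {

    /** Receives callbacks, in document order, as tokens are found. */
    interface Handler {
        boolean shouldProcessTag(String name); // is this a tag "of interest"?
        void tag(String rawTag);
        void text(String rawText);
    }

    /** Scans the input once and reports tag and text tokens to the handler. */
    static void tokenize(String input, Handler handler) {
        int pos = 0;        // where the next search for '<' starts
        int textStart = 0;  // start of the text not yet reported
        while (pos < input.length()) {
            int open = input.indexOf('<', pos);
            int close = open < 0 ? -1 : input.indexOf('>', open);
            if (open < 0 || close < 0) {
                break; // no further complete tag; the rest is text
            }
            String raw = input.substring(open, close + 1);
            String name = tagName(raw);
            if (!name.isEmpty() && handler.shouldProcessTag(name)) {
                if (open > textStart) {
                    handler.text(input.substring(textStart, open));
                }
                handler.tag(raw);
                textStart = close + 1;
                pos = close + 1;
            } else {
                pos = open + 1; // not a tag of interest: leave it inside the text
            }
        }
        if (textStart < input.length()) {
            handler.text(input.substring(textStart));
        }
    }

    /** Extracts the lower-case name from "<name ...>" or "</name>"; empty if none. */
    static String tagName(String raw) {
        int i = 1;
        if (i < raw.length() && raw.charAt(i) == '/') i++;
        int start = i;
        while (i < raw.length() && Character.isLetterOrDigit(raw.charAt(i))) i++;
        return raw.substring(start, i).toLowerCase();
    }

    /** Convenience for the demo: collects the events as strings. */
    static List<String> events(String input, String... interesting) {
        List<String> names = List.of(interesting);
        List<String> out = new ArrayList<>();
        tokenize(input, new Handler() {
            public boolean shouldProcessTag(String name) { return names.contains(name); }
            public void tag(String rawTag) { out.add("tag:" + rawTag); }
            public void text(String rawText) { out.add("text:" + rawText); }
        });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(events("<p>Hello <b>world</b></p>", "b"));
    }
}
```

With only `b` registered as a tag of interest, the `<p>` and `</p>` tags in the demo input are delivered as part of the surrounding text, while `<b>` and `</b>` arrive as tag events.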

The TagProcessor is built on top of the TagTokenizer and acts as a registry for TagRules and TextFilters. It also supports multiple States, allowing different rules to be applied in different sections of a document.
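
The idea of a rule registry with switchable states can be sketched as follows. This is an illustration under stated assumptions rather than the package's implementation: `ProcessorSketch`, `Rule`, `State`, `changeState` and `write` are hypothetical names, tags are found with a simple regular expression instead of a real tokenizer, and text filters are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and signatures are illustrative, NOT the package's real API.
class ProcessorSketch {

    /** A rule decides what to emit for a tag and may switch the active state. */
    interface Rule {
        void process(String rawTag, boolean closing, ProcessorSketch context);
    }

    /** A state is a registry of rules keyed by lower-case tag name. */
    static class State {
        private final Map<String, Rule> rules = new HashMap<>();

        State addRule(String tagName, Rule rule) {
            rules.put(tagName, rule);
            return this;
        }

        Rule ruleFor(String tagName) {
            return rules.get(tagName);
        }
    }

    // Simplified tag matcher, assumed here in place of a real tokenizer.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    private final StringBuilder out = new StringBuilder();
    private State current;

    ProcessorSketch(State initial) {
        current = initial;
    }

    void changeState(State next) { current = next; }

    void write(String s) { out.append(s); }

    /** Copies source to the output, letting rules of the current state handle their tags. */
    String process(String source) {
        Matcher m = TAG.matcher(source);
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start()); // text passes through unchanged
            Rule rule = current.ruleFor(m.group(2).toLowerCase());
            if (rule != null) {
                rule.process(m.group(), !m.group(1).isEmpty(), this);
            } else {
                out.append(m.group()); // tags without a rule are copied as-is
            }
            last = m.end();
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    static String demo(String html) {
        State normal = new State();
        State inPre = new State();
        // In the normal state, <b> is rewritten to <strong>.
        normal.addRule("b", (raw, closing, ctx) ->
                ctx.write(closing ? "</strong>" : "<strong>"));
        // <pre> switches to a state that has no <b> rule, so its content is untouched.
        normal.addRule("pre", (raw, closing, ctx) -> {
            ctx.write(raw);
            if (!closing) ctx.changeState(inPre);
        });
        inPre.addRule("pre", (raw, closing, ctx) -> {
            ctx.write(raw);
            if (closing) ctx.changeState(normal);
        });
        return new ProcessorSketch(normal).process(html);
    }

    public static void main(String[] args) {
        System.out.println(demo("<b>x</b><pre><b>y</b></pre>"));
    }
}
```

In the demo, the `<b>` outside `<pre>` is rewritten while the one inside is copied verbatim, because the rule that would rewrite it is registered only in the `normal` state.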
Copyright © 2015. All Rights Reserved.