public class FrontCodedStringList extends AbstractObjectList<MutableString> implements RandomAccess, Serializable
This class stores a list of strings using front-coding
(a.k.a. prefix-omission) compression.
Compression is effective only if the list is sorted, but instances
of this class can also serve simply as a handy way to manage a large
number of strings. The class implements an immutable ObjectList that returns the i-th
string (as a MutableString) when the get(int) method is
called with argument i. The returned mutable string may be freely
modified.
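To illustrate why sorting matters, here is a minimal, self-contained sketch of front coding. It is not this class's implementation: the name FrontCodingSketch, its fields, and the storedChars helper are hypothetical, and the assumption that every ratio-th string is stored whole is only one common reading of a front-coding ratio.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of front coding; not the library's implementation. */
final class FrontCodingSketch {
    private final int[] prefixLengths; // characters shared with the previous string
    private final String[] suffixes;   // what remains after the shared prefix
    private final int ratio;           // assumption: every ratio-th string is stored whole

    FrontCodingSketch(List<String> strings, int ratio) {
        this.ratio = ratio;
        prefixLengths = new int[strings.size()];
        suffixes = new String[strings.size()];
        String previous = "";
        for (int i = 0; i < strings.size(); i++) {
            String current = strings.get(i);
            // A whole string every `ratio` entries bounds the work done by get().
            int common = (i % ratio == 0) ? 0 : commonPrefixLength(previous, current);
            prefixLengths[i] = common;
            suffixes[i] = current.substring(common);
            previous = current;
        }
    }

    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Rebuilds the string at the given index, starting from the nearest whole string. */
    String get(int index) {
        StringBuilder sb = new StringBuilder();
        for (int i = index - index % ratio; i <= index; i++) {
            sb.setLength(prefixLengths[i]); // keep only the shared prefix
            sb.append(suffixes[i]);
        }
        return sb.toString();
    }

    /** Total number of characters actually stored. */
    int storedChars() {
        int total = 0;
        for (String s : suffixes) total += s.length();
        return total;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("car", "card", "care", "cat", "dog");
        FrontCodingSketch list = new FrontCodingSketch(sorted, 4);
        System.out.println(list.get(3));        // cat
        System.out.println(list.storedChars()); // 9, versus 17 characters uncompressed
    }
}
```

In a sorted list, neighbouring strings tend to share long prefixes, so most entries shrink to a short suffix. In an unsorted list the shared prefixes are usually empty and little is saved.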
As a convenience, this class provides a main method that reads a sequence of newline-separated strings from standard input and writes a corresponding serialized front-coded string list.
To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList or a CharArrayFrontCodedList, depending on
the value of the utf8 parameter at creation time. In the first case, if the
strings are ASCII-oriented the resulting array will be much smaller, but
access times will increase considerably, as each string must be UTF-8 decoded
before being returned.
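The space side of this trade-off can be sketched with the standard library alone. The class Utf8VersusCharSketch and its helpers are hypothetical names used only for illustration; the figures count raw payload bytes and ignore whatever per-string overhead the underlying front-coded lists add.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of UTF-8 versus 16-bit char storage sizes. */
final class Utf8VersusCharSketch {
    /** Bytes needed to hold the string as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed to hold the string as an array of 16-bit chars. */
    static int charBytes(String s) {
        return 2 * s.length();
    }

    /** The extra step paid on every access when strings are kept as UTF-8. */
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "front-coded";
        String accented = "caff\u00e8"; // one non-ASCII character

        System.out.println(utf8Bytes(ascii) + " vs " + charBytes(ascii));       // 11 vs 22
        System.out.println(utf8Bytes(accented) + " vs " + charBytes(accented)); // 6 vs 10

        // Decoding restores the original string, at a cost on every access.
        byte[] stored = accented.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(stored).equals(accented)); // true
    }
}
```

For pure ASCII text the UTF-8 payload is half the size of the char payload. The saving shrinks as the share of non-ASCII characters grows.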
Nested classes inherited from class AbstractObjectList: AbstractObjectList.ObjectSubList<K>

| Modifier and Type | Field and Description |
|---|---|
| protected ByteArrayFrontCodedList | byteFrontCodedList: The underlying ByteArrayFrontCodedList, or null. |
| protected CharArrayFrontCodedList | charFrontCodedList: The underlying CharArrayFrontCodedList, or null. |
| static long | serialVersionUID |
| protected boolean | utf8: Whether this front-coded list is UTF-8 encoded. |

| Constructor and Description |
|---|
| FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences contained in the given collection. |
| FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences returned by the given iterator. |

| Modifier and Type | Method and Description |
|---|---|
| protected static char[] | byte2Char(byte[] a, char[] s) |
| protected static int | countUTF8Chars(byte[] a) |
| MutableString | get(int index): Returns the element at the specified position in this front-coded list as a mutable string. |
| void | get(int index, MutableString s): Returns the element at the specified position in this front-coded list by storing it in a mutable string. |
| ObjectListIterator<MutableString> | listIterator(int k) |
| static void | main(String[] arg) |
| int | ratio(): Returns the ratio of the underlying front-coded list. |
| int | size() |
| boolean | utf8(): Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes. |

Methods inherited from class AbstractObjectList:
add, add, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, objectListIterator, objectListIterator, objectSubList, peek, pop, push, remove, removeElements, set, size, subList, top, toString
Methods inherited from class AbstractObjectCollection:
containsAll, isEmpty, objectIterator, removeAll, retainAll, toArray, toArray
Methods inherited from class AbstractCollection:
clear, remove
Methods inherited from class Object:
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface List:
clear, containsAll, isEmpty, remove, removeAll, replaceAll, retainAll, sort, spliterator, toArray, toArray
Methods inherited from interface ObjectCollection:
objectIterator, toArray
Methods inherited from interface Collection:
parallelStream, removeIf, stream

public static final long serialVersionUID

protected final ByteArrayFrontCodedList byteFrontCodedList
The underlying ByteArrayFrontCodedList, or null.

protected final CharArrayFrontCodedList charFrontCodedList
The underlying CharArrayFrontCodedList, or null.

protected final boolean utf8
Whether this front-coded list is UTF-8 encoded.

public FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8)
Parameters:
words - an iterator returning character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.

public FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8)
Parameters:
c - a collection containing character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.

public boolean utf8()
public int ratio()

public MutableString get(int index)
Specified by: get in interface List<MutableString>
Parameters:
index - an index in the list.
Returns: a MutableString that will contain the string at the specified position. The string may be freely modified.

public void get(int index, MutableString s)
Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.

protected static int countUTF8Chars(byte[] a)

protected static char[] byte2Char(byte[] a, char[] s)

public ObjectListIterator<MutableString> listIterator(int k)
Specified by: listIterator in interface ObjectList<MutableString>
Specified by: listIterator in interface List<MutableString>
Overrides: listIterator in class AbstractObjectList<MutableString>

public int size()
Specified by: size in interface Collection<MutableString>
Specified by: size in interface List<MutableString>
Specified by: size in class AbstractCollection<MutableString>

public static void main(String[] arg) throws IOException, JSAPException, NoSuchMethodException