public class PermutedFrontCodedStringList extends AbstractObjectList<CharSequence> implements Serializable
A FrontCodedStringList whose indices are permuted.
It may happen that a list of strings compresses very well
using front coding, but alphabetical order is not
the right order for the strings in the list. Instances of this class
wrap an instance of FrontCodedStringList
together with a permutation π: a query for index i
actually returns the string with index π(i) in the underlying list.
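The lookup rule can be illustrated with a minimal, self-contained sketch. Here a plain `List<String>` stands in for the underlying FrontCodedStringList and a `StringBuilder` stands in for MutableString. The class name `PermutedListSketch` is hypothetical, and this is not the dsiutils implementation.

```java
import java.util.List;

/** Hypothetical illustration of a permuted view over a sorted list (not the dsiutils class). */
public class PermutedListSketch {
    private final List<String> sorted;   // stands in for the FrontCodedStringList (lexicographic order)
    private final int[] permutation;     // permutation[i] = position in `sorted` of the i-th string

    public PermutedListSketch(List<String> sorted, int[] permutation) {
        if (sorted.size() != permutation.length)
            throw new IllegalArgumentException("list and permutation must have the same length");
        this.sorted = sorted;
        this.permutation = permutation;
    }

    /** A query for index i returns the underlying string with index π(i). */
    public String get(int index) {
        return sorted.get(permutation[index]);
    }

    /** Variant that reuses the caller's buffer (StringBuilder stands in for MutableString). */
    public void get(int index, StringBuilder s) {
        s.setLength(0);
        s.append(sorted.get(permutation[index]));
    }

    public int size() {
        return permutation.length;
    }

    public static void main(String[] args) {
        List<String> sorted = List.of("apple", "banana", "cherry");
        PermutedListSketch list = new PermutedListSketch(sorted, new int[] {2, 0, 1});
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < list.size(); i++) {
            list.get(i, buf);
            System.out.println(i + " -> " + buf);
        }
    }
}
```

Running `main` prints `0 -> cherry`, `1 -> apple` and `2 -> banana`. Index 0 is mapped by π to position 2 of the sorted list, and so on.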
If you start from a newline-delimited, unsorted list of UTF-8 strings, the simplest way to build an instance of this class is to obtain a front-coded string list and a permutation with a simple UN*X pipe (which also avoids storing the sorted strings):
nl -v0 -nln | sort -k2 | tee >(cut -f1 >perm.txt) \
| cut -f2 | java it.unimi.dsi.util.FrontCodedStringList tmp-lex.fcl
The above command reads a list of strings from standard input,
writes their indices, in sorted order of the strings, to perm.txt, and creates a front-coded
string list tmp-lex.fcl containing the sorted strings.
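The pipeline can be mimicked in memory to make its two outputs concrete. The sketch below (hypothetical class `PipelineSketch`) numbers the strings from 0, as `nl -v0` does. It then sorts the indices by comparing the UTF-8 bytes of the strings as unsigned values, which is the byte-by-byte order `sort` uses under LC_COLLATE=C. It returns both the index list that ends up in perm.txt and the sorted strings fed to FrontCodedStringList. Java's `String.compareTo` is avoided on purpose: it compares UTF-16 code units, which can disagree with UTF-8 byte order for characters outside the Basic Multilingual Plane. `Arrays.sort` on objects is stable, so duplicate strings keep their input order here, whereas `sort` may break such ties differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory analogue of: nl -v0 -nln | sort -k2 | tee >(cut -f1 >perm.txt) | cut -f2 */
public class PipelineSketch {
    /** Byte-by-byte comparison of UTF-8 encodings, as sort does under LC_COLLATE=C. */
    static final Comparator<String> BYTE_ORDER = (a, b) -> Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));

    /** Original index of each string, listed in sorted order (the contents of perm.txt). */
    static int[] sortedIndices(List<String> lines) {
        Integer[] idx = new Integer[lines.size()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;   // nl -v0: number lines from 0
        Arrays.sort(idx, (x, y) -> BYTE_ORDER.compare(lines.get(x), lines.get(y))); // sort -k2 (stable here)
        int[] out = new int[idx.length];
        for (int j = 0; j < idx.length; j++) out[j] = idx[j];
        return out;
    }

    /** The strings in sorted order (what cut -f2 feeds to FrontCodedStringList). */
    static List<String> sortedStrings(List<String> lines, int[] sortedIdx) {
        List<String> out = new ArrayList<>();
        for (int i : sortedIdx) out.add(lines.get(i));
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("pear", "apple", "Zebra", "apricot");
        int[] perm = sortedIndices(lines);
        System.out.println(Arrays.toString(perm));
        System.out.println(sortedStrings(lines, perm));
    }
}
```

For the input pear, apple, Zebra, apricot it prints `[2, 1, 3, 0]` and `[Zebra, apple, apricot, pear]`. Zebra comes first because uppercase ASCII letters precede lowercase ones bytewise.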
Important: make sure you are using the byte-by-byte collation order (in UN*X,
make sure that LC_COLLATE=C). Failure to do so will result in sorting that is an order of magnitude slower, and in
worse compression.
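Both this warning and the motivation for the class come down to one fact. Front coding writes each string as the length of the prefix it shares with the preceding string plus the remaining suffix, so only strings that end up adjacent can share anything. The sketch below (hypothetical class `FrontCodingCost`) counts the suffix characters that would be written under this simplified model. The model ignores the length fields and the full strings that a real front-coded list typically stores at regular intervals.

```java
import java.util.List;

/** Hypothetical, simplified cost model for front coding: counts only the unshared suffix characters. */
public class FrontCodingCost {
    /** Length of the longest common prefix of a and b. */
    static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length()), i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Characters written when each string is coded against its predecessor in the list. */
    static int suffixChars(List<String> list) {
        int total = 0;
        String prev = "";
        for (String s : list) {
            total += s.length() - commonPrefix(prev, s);
            prev = s;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> unsorted = List.of("information", "banana", "informal", "band", "inform");
        List<String> sorted = List.of("banana", "band", "inform", "informal", "information");
        System.out.println("unsorted: " + suffixChars(unsorted));
        System.out.println("sorted:   " + suffixChars(sorted));
    }
}
```

The same five strings cost 35 suffix characters in the unsorted order but only 19 in byte order, because "band" reuses "ban", "informal" reuses "inform", and "information" reuses "informa".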
Now, line j of perm.txt contains the original index of the j-th string in sorted order. This is the inverse of the permutation π that
this class needs, so you pass perm.txt to this class together with the option -i. The last step is thus just
java it.unimi.dsi.util.PermutedFrontCodedStringList -i -t tmp-lex.fcl perm.txt your.fcl
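The inversion step can be checked with another small sketch (hypothetical class `InvertPermutation`). It assumes one decimal index per line, as in perm.txt, and trims surrounding whitespace because `nl -nln` pads the number. If `sortedIdx[j]` is the original index of the j-th sorted string, inverting the array gives π(i), the sorted position of original string i, which is what a lookup needs.

```java
import java.util.List;

/** Hypothetical sketch of turning perm.txt into the permutation π used for lookups. */
public class InvertPermutation {
    /** Parses one index per line; trim() discards whitespace padding around the number. */
    static int[] parse(List<String> lines) {
        return lines.stream().mapToInt(l -> Integer.parseInt(l.trim())).toArray();
    }

    /** If sortedIdx[j] = original index of the j-th sorted string, then inv[i] = sorted position of string i. */
    static int[] invert(int[] sortedIdx) {
        int[] inv = new int[sortedIdx.length];
        for (int j = 0; j < sortedIdx.length; j++) inv[sortedIdx[j]] = j;
        return inv;
    }

    public static void main(String[] args) {
        List<String> original = List.of("pear", "apple", "Zebra", "apricot");
        List<String> sorted = List.of("Zebra", "apple", "apricot", "pear");        // byte order
        int[] pi = invert(parse(List.of("2     ", "1     ", "3     ", "0     "))); // perm.txt lines
        for (int i = 0; i < original.size(); i++)
            System.out.println(i + ": " + sorted.get(pi[i]));                      // equals original.get(i)
    }
}
```

Each printed line (`0: pear`, `1: apple`, `2: Zebra`, `3: apricot`) shows that looking up the sorted list through π returns the original string at that index.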
Nested classes inherited: AbstractObjectList.ObjectSubList<K>

| Modifier and Type | Field and Description |
|---|---|
| protected FrontCodedStringList | frontCodedStringList: The underlying front-coded string list. |
| protected int[] | permutation: The permutation. |
| static long | serialVersionUID |
| Constructor and Description |
|---|
| PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation): Creates a new permuted front-coded string list using a given front-coded string list and permutation. |
| Modifier and Type | Method and Description |
|---|---|
| CharSequence | get(int index) |
| void | get(int index, MutableString s): Returns the element at the specified position in this front-coded list by storing it in a mutable string. |
| ObjectListIterator<CharSequence> | listIterator(int k) |
| static void | main(String[] arg) |
| int | size() |
Inherited methods:
add, add, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, objectListIterator, objectListIterator, objectSubList, peek, pop, push, remove, removeElements, set, size, subList, top, toString
containsAll, isEmpty, objectIterator, removeAll, retainAll, toArray, toArray
clear, remove
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
clear, containsAll, isEmpty, remove, removeAll, replaceAll, retainAll, sort, spliterator, toArray, toArray
objectIterator, toArray
parallelStream, removeIf, stream

public static final long serialVersionUID
protected final FrontCodedStringList frontCodedStringList
protected final int[] permutation
public PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation)
Parameters:
frontCodedStringList - the underlying front-coded string list.
permutation - the underlying permutation.

public CharSequence get(int index)
get in interface List<CharSequence>

public void get(int index, MutableString s)
Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.

public int size()
size in interface Collection<CharSequence>
size in interface List<CharSequence>
size in class AbstractCollection<CharSequence>

public ObjectListIterator<CharSequence> listIterator(int k)
listIterator in interface ObjectList<CharSequence>
listIterator in interface List<CharSequence>
listIterator in class AbstractObjectList<CharSequence>

public static void main(String[] arg) throws IOException, ClassNotFoundException, JSAPException