public class TransformationStrategies extends Object
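This class provides several transformation strategies that turn strings or other objects into bit vectors. The transformations might optionally be lexicographical (the natural order of the source objects is reflected in the lexicographical order of the resulting bit vectors) or prefix-free (no returned bit vector is a proper prefix of another). Prefix-freeness might require linear (e.g., prefixFree()) or constant (e.g., prefixFreeIso()) additional space.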
As a general rule, transformations without additional naming are lexicographical. Transformations that generate prefix-free bit vectors are marked as such. Plain transformations that do not provide any guarantee are called raw. They should be used only when performance is the main concern and the two properties above are not relevant.
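To illustrate why prefix-freeness can cost only constant extra space for strings, here is a minimal sketch that does not use this library. It models bit vectors as strings of '0' and '1', and the helper names (bits, bitsWithNul) as well as the most-significant-bit-first layout are assumptions made for illustration only.

```java
final class PrefixFreeNulSketch {
    // Hypothetical helper: concatenates the lower eight bits of each character,
    // most significant bit first (the bit order is an assumption of this sketch).
    static String bits(CharSequence s) {
        StringBuilder sb = new StringBuilder(8 * s.length());
        for (int i = 0; i < s.length(); i++) {
            int b = s.charAt(i) & 0xFF;
            for (int k = 7; k >= 0; k--) sb.append((b >>> k) & 1);
        }
        return sb.toString();
    }

    // Same as bits(), followed by eight zero bits, that is, an ASCII NUL terminator.
    static String bitsWithNul(CharSequence s) {
        return bits(s) + "00000000";
    }

    public static void main(String[] args) {
        // Without a terminator, the encoding of "ab" is a prefix of that of "abc".
        System.out.println(bits("abc").startsWith(bits("ab")));
        // With a NUL terminator (and NUL-free inputs) it no longer is.
        System.out.println(bitsWithNul("abc").startsWith(bitsWithNul("ab")));
    }
}
```

The terminator adds a fixed number of bits per string, which is why the input strings themselves must not contain NULs.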
See also: TransformationStrategy

| Constructor and Description |
|---|
| TransformationStrategies() |
| Modifier and Type | Method and Description |
|---|---|
| static TransformationStrategy<byte[]> | byteArray(): A lexicographical transformation from byte arrays to bit vectors. |
| static TransformationStrategy<Long> | fixedLong(): A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector. |
| static <T extends BitVector> TransformationStrategy<T> | identity(): A trivial transformation for data already in BitVector form. |
| static <T extends CharSequence> TransformationStrategy<T> | iso(): A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation. |
| static <T extends BitVector> TransformationStrategy<T> | prefixFree(): A transformation from bit vectors to bit vectors that guarantees that its results are prefix free. |
| static <T extends CharSequence> TransformationStrategy<T> | prefixFreeIso(): A trivial transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation and completes the representation with an ASCII NUL to guarantee lexicographical ordering and prefix-freeness. |
| static <T extends CharSequence> TransformationStrategy<T> | prefixFreeUtf16(): A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness. |
| static <T extends CharSequence> TransformationStrategy<T> | prefixFreeUtf32(): A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs, concatenates the bits of the UTF-32 representation and completes the representation with a NUL to guarantee lexicographical ordering and prefix-freeness. |
| static TransformationStrategy<byte[]> | rawByteArray(): A trivial, high-performance, raw transformation from byte arrays to bit vectors that simply concatenates the bytes of the array. |
| static TransformationStrategy<Long> | rawFixedLong(): A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector. |
| static <T extends CharSequence> TransformationStrategy<T> | rawIso(): A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the lower eight bits of the UTF-16 representation. |
| static <T extends CharSequence> TransformationStrategy<T> | rawUtf16(): A trivial, high-performance, raw transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation. |
| static <T extends CharSequence> TransformationStrategy<T> | rawUtf32(): A trivial raw transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation. |
| static <T extends CharSequence> TransformationStrategy<T> | utf16(): A trivial transformation from strings to bit vectors that concatenates the bits of the UTF-16 representation. |
| static <T extends CharSequence> TransformationStrategy<T> | utf32(): A transformation from strings to bit vectors that turns the UTF-16 representation into a UTF-32 representation, decodes surrogate pairs and concatenates the bits of the UTF-32 representation. |
| static <T> Iterable<BitVector> | wrap(Iterable<T> iterable, TransformationStrategy<? super T> transformationStrategy): Wraps a given iterable, returning an iterable that contains bit vectors. |
| static <T> Iterator<BitVector> | wrap(Iterator<T> iterator, TransformationStrategy<? super T> transformationStrategy): Wraps a given iterator, returning an iterator that emits bit vectors. |
| static <T> List<BitVector> | wrap(List<T> list, TransformationStrategy<? super T> transformationStrategy): Wraps a given list, returning a list that contains bit vectors. |
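The wrap methods listed above apply a strategy to each element on demand. The following is a minimal sketch of that idea using only java.util types. WrapSketch and its wrap method are hypothetical stand-ins, with a plain Function playing the role of the transformation strategy; this is not the library's implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

final class WrapSketch {
    // Hypothetical stand-in for wrapping an iterator: each element is
    // transformed lazily, only when next() is called.
    static <T, R> Iterator<R> wrap(Iterator<T> iterator, Function<? super T, ? extends R> strategy) {
        return new Iterator<R>() {
            @Override public boolean hasNext() { return iterator.hasNext(); }
            @Override public R next() { return strategy.apply(iterator.next()); }
        };
    }

    public static void main(String[] args) {
        // String::length stands in for a real transformation strategy.
        Iterator<Integer> lengths = wrap(Arrays.asList("a", "bcd").iterator(), String::length);
        while (lengths.hasNext()) System.out.println(lengths.next());
    }
}
```

Because nothing is copied up front, a wrapper of this kind only reflects the underlying collection at the time each element is read.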
public static <T extends BitVector> TransformationStrategy<T> identity()
A trivial transformation for data already in BitVector form.
public static <T extends CharSequence> TransformationStrategy<T> rawUtf32()
Warning: this transformation is not lexicographic.
public static <T extends CharSequence> TransformationStrategy<T> utf32()
public static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf32()
Note that strings provided to this strategy must not contain NULs.
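The UTF-32 strategies decode surrogate pairs before emitting bits. As a rough illustration using only the standard library (Utf32Sketch and toUtf32 are hypothetical names, and this says nothing about how the library lays out the resulting bits), the decoding step can be sketched as:

```java
final class Utf32Sketch {
    // Returns one 32-bit code point per character, merging each
    // surrogate pair of the UTF-16 representation into a single value.
    static int[] toUtf32(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00"; // 'a' followed by U+1F600, stored as a surrogate pair
        int[] codePoints = toUtf32(s);
        System.out.println(s.length());          // number of UTF-16 code units
        System.out.println(codePoints.length);   // number of code points
        System.out.println(Integer.toHexString(codePoints[1]));
    }
}
```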
public static <T extends CharSequence> TransformationStrategy<T> rawUtf16()
Warning: this transformation is not lexicographic.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
public static <T extends CharSequence> TransformationStrategy<T> utf16()
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
public static <T extends CharSequence> TransformationStrategy<T> prefixFreeUtf16()
Note that strings provided to this strategy must not contain NULs.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
public static <T extends CharSequence> TransformationStrategy<T> rawIso()
Warning: this transformation is not lexicographic.
Note that this transformation is sensible only for strings that are known to contain only characters in the ISO-8859-1 charset.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
public static <T extends CharSequence> TransformationStrategy<T> iso()
Note that this transformation is sensible only for strings that are known to contain only characters in the ISO-8859-1 charset.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
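The ISO-8859-1 restriction matters because only the lower eight bits of each UTF-16 code unit are kept. A minimal sketch of the consequence, assuming nothing about the library beyond that description (IsoSketch and lowerEightBits are hypothetical names):

```java
import java.util.Arrays;

final class IsoSketch {
    // Keeps only the lower eight bits of each UTF-16 code unit.
    static byte[] lowerEightBits(CharSequence s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) out[i] = (byte) s.charAt(i);
        return out;
    }

    public static void main(String[] args) {
        // U+0141 lies outside ISO-8859-1; its lower eight bits are 0x41, the code of 'A',
        // so the two strings become indistinguishable.
        System.out.println(Arrays.equals(lowerEightBits("\u0141"), lowerEightBits("A")));
    }
}
```

Distinct strings that collide in this way cannot be told apart after the transformation, so it should be applied only to inputs known to fit in eight bits per character.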
public static <T extends CharSequence> TransformationStrategy<T> prefixFreeIso()
Note that this transformation is sensible only for strings that are known to contain only characters in the ISO-8859-1 charset, and that strings provided to this strategy must not contain ASCII NULs.
Warning: bit vectors returned by this strategy are adaptors around the original string. If the string changes while the bit vector is being accessed, the results will be unpredictable.
public static TransformationStrategy<byte[]> rawByteArray()
Warning: this transformation is not lexicographic.
Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.
public static TransformationStrategy<byte[]> byteArray()
Warning: bit vectors returned by this strategy are adaptors around the original array. If the array changes while the bit vector is being accessed, the results will be unpredictable.
public static <T> Iterator<BitVector> wrap(Iterator<T> iterator, TransformationStrategy<? super T> transformationStrategy)
Parameters:
iterator - an iterator.
transformationStrategy - a strategy to transform the objects returned by iterator.
Returns:
an iterator that emits the content of iterator passed through transformationStrategy.
public static <T> Iterable<BitVector> wrap(Iterable<T> iterable, TransformationStrategy<? super T> transformationStrategy)
Parameters:
iterable - an iterable.
transformationStrategy - a strategy to transform the objects contained in iterable.
Returns:
an iterable that contains the content of iterable passed through transformationStrategy.
public static <T> List<BitVector> wrap(List<T> list, TransformationStrategy<? super T> transformationStrategy)
Parameters:
list - a list.
transformationStrategy - a strategy to transform the objects contained in list.
Returns:
a list that contains the content of list passed through transformationStrategy.
public static <T extends BitVector> TransformationStrategy<T> prefixFree()
More precisely, each 0 is mapped to 10 and each 1 to 11, and a 0 is appended at the end of every bit vector.
Warning: bit vectors returned by this strategy are adaptors around the original bit vector. If the bit vector changes while the returned bit vector is being accessed, the results will be unpredictable.
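The mapping described for prefixFree() can be sketched as follows. This is an illustration, not the library's code: bit vectors are modelled as strings of '0' and '1', and PrefixFreeSketch.encode is a hypothetical name.

```java
final class PrefixFreeSketch {
    // Maps 0 to 10 and 1 to 11, then appends a final 0.
    // An input of n bits yields 2n + 1 bits, hence linear additional space.
    static String encode(String bits) {
        StringBuilder sb = new StringBuilder(2 * bits.length() + 1);
        for (int i = 0; i < bits.length(); i++) {
            char c = bits.charAt(i);
            if (c != '0' && c != '1') throw new IllegalArgumentException("Not a bit: " + c);
            sb.append('1').append(c);
        }
        return sb.append('0').toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("0"));
        System.out.println(encode("01"));
        // "0" is a prefix of "01", but the encodings are not prefixes of each other.
        System.out.println(encode("01").startsWith(encode("0")));
    }
}
```

Every encoded pair starts with a 1, so the terminating 0 can only appear at an even position where a pair would otherwise begin, which is what rules out one encoding being a proper prefix of another.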
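public static TransformationStrategy<Long> fixedLong()
A transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector. Note that the first bit of each bit vector is the most significant bit of the underlying long integer, so lexicographical and numerical order coincide for positive numbers.
public static TransformationStrategy<Long> rawFixedLong()
A trivial, high-performance, raw transformation from longs to bit vectors that returns a fixed-size Long.SIZE-bit vector.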
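The ordering remark for fixedLong() above can be checked with a small sketch. It is an illustration only: bit vectors are modelled as strings of '0' and '1', and FixedLongSketch.msbFirst is a hypothetical helper, not a library method.

```java
final class FixedLongSketch {
    // Writes the Long.SIZE bits of x, most significant bit first.
    static String msbFirst(long x) {
        StringBuilder sb = new StringBuilder(Long.SIZE);
        for (int k = Long.SIZE - 1; k >= 0; k--) sb.append((x >>> k) & 1L);
        return sb.toString();
    }

    public static void main(String[] args) {
        // For positive numbers, lexicographical order matches numerical order.
        System.out.println(msbFirst(3).compareTo(msbFirst(10)) < 0);
        // For negative numbers it does not: the sign bit comes first and is 1.
        System.out.println(msbFirst(-1).compareTo(msbFirst(1)) > 0);
    }
}
```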