public class WarcReader extends Object implements Iterable<WarcRecord>, Closeable
| Constructor | Description |
|---|---|
| WarcReader(InputStream stream) | |
| WarcReader(Path path) | |
| WarcReader(ReadableByteChannel channel) | |
| WarcReader(ReadableByteChannel channel, ByteBuffer buffer) | Creates a WarcReader with a user-provided buffer. |
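The buffer-taking constructor reads whatever bytes are already in the caller's buffer before it reads from the channel. Its detail entry further down requires that buffer to be readable and backed by an array. The following is a minimal JDK-only sketch of preparing such a buffer after sniffing the first bytes of a stream. The helper peekInto and the 8-byte peek length are illustrative assumptions. The hand-off to WarcReader is left as a comment because it needs the library on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

public class BufferPrep {
    /**
     * Hypothetical helper: reads up to {@code peek} bytes from the channel into a heap
     * buffer, then flips it so it is array-backed and positioned for reading.
     */
    static ByteBuffer peekInto(ReadableByteChannel channel, int capacity, int peek) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(capacity); // heap buffer, so hasArray() is true
        buffer.limit(peek);                                 // fill at most `peek` bytes for now
        while (buffer.hasRemaining()) {
            if (channel.read(buffer) == -1) {
                break; // channel exhausted before `peek` bytes arrived
            }
        }
        buffer.flip(); // switch from filling to draining: position = 0, limit = bytes read
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "WARC/1.1\r\nWARC-Type: warcinfo\r\n".getBytes(StandardCharsets.US_ASCII);
        ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data));
        ByteBuffer buffer = peekInto(channel, 8192, 8);
        String magic = new String(buffer.array(), buffer.position(), buffer.remaining(), StandardCharsets.US_ASCII);
        System.out.println(magic); // prints WARC/1.1
        // With the library on the classpath, the same channel and buffer could be handed over:
        // WarcReader reader = new WarcReader(channel, buffer);
    }
}
```

Flipping is the step that is easy to forget. Without it, the buffer's position sits after the sniffed bytes, and the reader would see no readable data.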
| Modifier and Type | Method | Description |
|---|---|---|
| void | calculateBlockDigest() | Enables calculation of block digests for all WARC records that include a "WARC-Block-Digest" header, using the digest algorithm named in that header. |
| void | close() | Closes the underlying channel. |
| WarcCompression | compression() | Returns the type of WARC compression that was detected. |
| Iterator<WarcRecord> | iterator() | Returns an iterator over the records in the WARC file. |
| Optional<WarcRecord> | next() | Reads the next WARC record. |
| void | onWarning(java.util.function.Consumer<String> warningHandler) | Registers a handler that will be called when the reader encounters an error it was able to recover from. |
| long | position() | Returns the byte position of the most recently read record. |
| void | position(long newPosition) | Seeks to the record at the given position in the underlying channel. |
| java.util.stream.Stream<WarcRecord> | records() | Returns a Stream over the records in the WARC file. |
| void | registerType(String type, WarcRecord.Constructor<WarcRecord> constructor) | Registers a new extension record type. |
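calculateBlockDigest() computes digests so that the caller can compare the calculated value with the pre-calculated value from the "WARC-Block-Digest" header. Such header values are labelled digests of the form algorithm:value, and SHA-1 encoded in base32 is the common convention. The following JDK-only sketch shows what that comparison involves for the sha1 case. base32 and matches are hypothetical helpers; they are not the library's DigestingMessageBody or its actual digest handling.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BlockDigestCheck {
    private static final char[] ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    /** RFC 4648 base32 without padding; a 20-byte SHA-1 digest encodes to exactly 32 characters. */
    static String base32(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        int acc = 0;  // bit accumulator; only the low `bits` bits are meaningful
        int bits = 0;
        for (byte b : bytes) {
            acc = (acc << 8) | (b & 0xff);
            bits += 8;
            while (bits >= 5) {
                out.append(ALPHABET[(acc >>> (bits - 5)) & 31]);
                bits -= 5;
            }
        }
        if (bits > 0) {
            out.append(ALPHABET[(acc << (5 - bits)) & 31]); // pad the final group with zero bits
        }
        return out.toString();
    }

    /** Hypothetical check of a labelled digest such as "sha1:<base32>" against the block bytes. */
    static boolean matches(String labelledDigest, byte[] block) throws NoSuchAlgorithmException {
        int colon = labelledDigest.indexOf(':');
        if (colon < 0) {
            return false; // not in algorithm:value form
        }
        String label = labelledDigest.substring(0, colon);
        String expected = labelledDigest.substring(colon + 1);
        // Only the "sha1" label is mapped to a JCA name here; other labels are passed through as-is.
        String jcaName = label.equalsIgnoreCase("sha1") ? "SHA-1" : label;
        byte[] actual = MessageDigest.getInstance(jcaName).digest(block);
        return base32(actual).equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] emptyBlock = new byte[0];
        System.out.println(matches("sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ", emptyBlock)); // true
        System.out.println(matches("sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ",
                "hello".getBytes(StandardCharsets.UTF_8))); // false
    }
}
```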
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface java.lang.Iterable: forEach, spliterator

public WarcReader(ReadableByteChannel channel, ByteBuffer buffer) throws IOException, IllegalArgumentException
Creates a WarcReader with a user-provided buffer. The buffer must be readable (Buffer.flip() called).
Parameters:
channel - channel to read WARC data from
buffer - buffer to read initial data from, later used to buffer data from channel
Throws:
IOException
IllegalArgumentException - if buffer is not readable or is not backed by an array

public WarcReader(ReadableByteChannel channel) throws IOException
Throws:
IOException

public WarcReader(InputStream stream) throws IOException
Throws:
IOException

public WarcReader(Path path) throws IOException
Throws:
IOException

public Optional<WarcRecord> next() throws IOException
Reads the next WARC record.
This method will construct an appropriate subclass of WarcRecord based on the value of the
WARC-Type header. New types may be registered using
registerType(String, WarcRecord.Constructor).
Calling this method closes the body channel of the previously read record, so any data needed from that body must be read before calling next() again.
Returns:
the next WarcRecord, or an empty Optional at the end of the channel
Throws:
IOException - if an I/O error occurs
ParsingException - if the WARC record is invalid

public void registerType(String type, WarcRecord.Constructor<WarcRecord> constructor)
Registers a new extension record type.
Built-in types like "resource" and "response" may be overridden with a subclass that adds extension methods. The special type name "default" is used when an unregistered record type is encountered.
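next() picks a WarcRecord subclass from the WARC-Type header, and registerType() can add new types or override built-in ones. A "default" entry covers types that are not registered. That lookup can be modelled as a map from type names to constructors with a "default" fallback. Everything in this sketch (Rec, ResponseRec, CustomResponseRec, TypeRegistry) is a hypothetical stand-in written against the JDK only. It is not the library's WarcRecord.Constructor mechanism.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TypeRegistry {
    /** Hypothetical stand-in for a parsed record; real records also carry headers and a body. */
    static class Rec {
        final String type;
        Rec(String type) { this.type = type; }
        String kind() { return "generic"; }
    }

    /** Stand-in for a built-in subclass. */
    static class ResponseRec extends Rec {
        ResponseRec(String type) { super(type); }
        @Override String kind() { return "response"; }
    }

    /** Stand-in for a user subclass that adds extension behaviour to a built-in type. */
    static class CustomResponseRec extends ResponseRec {
        CustomResponseRec(String type) { super(type); }
        @Override String kind() { return "custom-response"; }
    }

    private final Map<String, Function<String, Rec>> constructors = new HashMap<>();

    TypeRegistry() {
        constructors.put("response", ResponseRec::new);
        constructors.put("default", Rec::new); // fallback for unregistered WARC-Type values
    }

    /** Adds a type or replaces an existing entry, including built-in ones. */
    void register(String type, Function<String, Rec> constructor) {
        constructors.put(type, constructor);
    }

    /** Chooses a constructor by WARC-Type value, falling back to the "default" entry. */
    Rec construct(String warcType) {
        Function<String, Rec> ctor = constructors.getOrDefault(warcType, constructors.get("default"));
        return ctor.apply(warcType);
    }

    public static void main(String[] args) {
        TypeRegistry registry = new TypeRegistry();
        System.out.println(registry.construct("response").kind());  // response
        System.out.println(registry.construct("x-custom").kind());  // generic
        registry.register("response", CustomResponseRec::new);
        System.out.println(registry.construct("response").kind());  // custom-response
    }
}
```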
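Parameters:
type - a value of the WARC-Type header
constructor - a constructor for a corresponding subclass of WarcRecord

public void calculateBlockDigest()
Enables calculation of block digests for all WARC records that include a "WARC-Block-Digest" header, using the digest algorithm named in that header.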
The calculated digests (WarcRecord.calculatedBlockDigest()) can then be compared to the pre-calculated digests (WarcRecord.blockDigest()). See also DigestingMessageBody.

public long position()
Returns the byte position of the most recently read record.
For compressed WARCs, this method only returns a meaningful value if compression was applied so that the start of each new record coincides with the start of a compression block.
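An offset into a compressed WARC is only useful if decompression can start there. A common arrangement is to compress each record as its own gzip member, and this sketch assumes that arrangement. The JDK-only sketch writes each record as a separate member and notes the byte offset where each member starts. It then decompresses a single member from its offset, which is the situation in which a position() value can later be passed back to position(long). PerRecordGzip and its methods are hypothetical. A real file would be read by seeking a channel to the offset rather than by slicing an in-memory array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PerRecordGzip {
    final ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for a .warc.gz file
    final List<Integer> offsets = new ArrayList<>();               // byte offset where each member starts
    final List<Integer> lengths = new ArrayList<>();               // compressed length of each member

    /** Compresses one record as its own gzip member and remembers where that member starts. */
    void append(String record) throws IOException {
        ByteArrayOutputStream member = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(member)) {
            gz.write(record.getBytes(StandardCharsets.UTF_8));
        } // closing the gzip stream writes the member trailer
        offsets.add(file.size());
        lengths.add(member.size());
        member.writeTo(file);
    }

    /** Decompresses only the member at the given index, like seeking to its offset and reading. */
    String readAt(int index) throws IOException {
        byte[] all = file.toByteArray();
        ByteArrayInputStream slice = new ByteArrayInputStream(all, offsets.get(index), lengths.get(index));
        try (GZIPInputStream in = new GZIPInputStream(slice)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        PerRecordGzip warc = new PerRecordGzip();
        warc.append("WARC/1.1 record one");
        warc.append("WARC/1.1 record two");
        System.out.println(warc.offsets);   // [0, <compressed length of the first member>]
        System.out.println(warc.readAt(1)); // WARC/1.1 record two
    }
}
```

If the whole file were compressed as one gzip stream, no offset after 0 would mark the start of a compression block. That is the case in which the position() description above says the value is not meaningful.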
public void position(long newPosition) throws IOException
Seeks to the record at the given position in the underlying channel.
Parameters:
newPosition - byte offset of the beginning of the record to seek to
Throws:
IOException - if an I/O error occurs
IllegalArgumentException - if the position is negative
UnsupportedOperationException - if the underlying channel does not support seeking

public WarcCompression compression()
Returns the type of WARC compression that was detected.
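public Iterator<WarcRecord> iterator()
Returns an iterator over the records in the WARC file.
Specified by:
iterator in interface Iterable<WarcRecord>

public java.util.stream.Stream<WarcRecord> records()
Returns a Stream over the records in the WARC file.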
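records() exposes the same sequence as next(), which is a pull method that returns an empty Optional at the end and throws the checked IOException. The following JDK-only sketch shows one way such a pull method can be adapted into a Stream. It assumes that I/O errors are rethrown as UncheckedIOException, because stream operations cannot throw checked exceptions. IOSupplier and stream() are hypothetical names, and this is not the library's own implementation of records().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class OptionalStreams {
    /** Hypothetical shape of a pull method like next(): an empty Optional means end of input. */
    @FunctionalInterface
    interface IOSupplier<T> {
        Optional<T> next() throws IOException;
    }

    /** Adapts a pull method into a sequential, ordered Stream, wrapping any IOException. */
    static <T> Stream<T> stream(IOSupplier<T> source) {
        Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                Optional<T> item;
                try {
                    item = source.next();
                } catch (IOException e) {
                    throw new UncheckedIOException(e); // streams cannot propagate checked exceptions
                }
                item.ifPresent(action);
                return item.isPresent(); // an empty Optional ends the stream
            }
        };
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        Iterator<String> types = List.of("warcinfo", "request", "response").iterator();
        IOSupplier<String> next = () -> types.hasNext() ? Optional.of(types.next()) : Optional.<String>empty();
        stream(next).map(String::toUpperCase).forEach(System.out::println);
    }
}
```

Given a WarcReader named reader, the method reference reader::next would fit IOSupplier<WarcRecord>, although records() already provides a Stream directly.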
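public void onWarning(java.util.function.Consumer<String> warningHandler)
Registers a handler that will be called when the reader encounters an error it was able to recover from.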
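onWarning() takes a plain Consumer<String>, so with this class a handler is registered as, for example, reader.onWarning(msg -> System.err.println(msg)). The following JDK-only sketch illustrates the "recover and report" pattern behind such a callback. It uses a hypothetical lenient header parser that skips malformed lines, passes a message to the handler, and keeps going instead of failing. LenientHeaders and its message format are assumptions; they do not describe how the reader itself detects or words its warnings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class LenientHeaders {
    /**
     * Hypothetical parser for "Name: value" lines. Malformed lines are skipped and
     * reported to the warning handler rather than aborting the whole parse.
     */
    static Map<String, String> parse(List<String> lines, Consumer<String> onWarning) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int colon = line.indexOf(':');
            if (colon <= 0) {
                onWarning.accept("line " + (i + 1) + ": missing header name or colon, skipped");
                continue; // recoverable: keep parsing the remaining lines
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        Map<String, String> headers = parse(
                List.of("WARC-Type: response", "garbage", "Content-Length: 5"), warnings::add);
        System.out.println(headers);  // {WARC-Type=response, Content-Length=5}
        System.out.println(warnings); // [line 2: missing header name or colon, skipped]
    }
}
```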
public void close()
throws IOException
Closes the underlying channel.
Specified by:
close in interface Closeable
close in interface AutoCloseable
Throws:
IOException

Copyright © 2023. All rights reserved.