WebGraph is a framework for graph compression aimed at studying web graphs. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:
In the end, with WebGraph you can access and analyse very large web graphs. Using WebGraph is as easy as installing a few jar files and downloading a data set.
You are welcome to use and improve WebGraph! If you find our software useful for your research, please quote our paper “The WebGraph Framework I: Compression Techniques”, by Paolo Boldi and Sebastiano Vigna, in Proc. of the Thirteenth World–Wide Web Conference, pages 595−601, 2004, ACM Press.
For in-depth information on the Webgraph framework, you should have a look at its home page, where you can find some papers about the compression techniques it uses. Datasets are available at the LAW web site.
The classes of interest for the casual Webgraph user are {@link it.unimi.dsi.webgraph.ImmutableGraph}, which specifies the access methods for an immutable graph, {@link it.unimi.dsi.webgraph.BVGraph}, which makes it possible to retrieve or recompress a graph stored in the format described in The WebGraph Framework I: Compression Techniques, {@link it.unimi.dsi.webgraph.EFGraph}, which provides a quasi-succinct representation using the Elias–Fano representation of monotone sequences, and {@link it.unimi.dsi.webgraph.Transform}, which provides several ways to transform an {@link it.unimi.dsi.webgraph.ImmutableGraph}.
If you plan on building your graphs dynamically, the class {@link it.unimi.dsi.webgraph.ArrayListMutableGraph} makes it possible to create incrementally a graph and then extract an {@linkplain it.unimi.dsi.webgraph.ArrayListMutableGraph#immutableView() immutable view}.
The package {@link it.unimi.dsi.webgraph.examples} contains useful examples that show how to access sequentially and randomly an immutable graph.
{@link it.unimi.dsi.webgraph.ASCIIGraph} and {@link it.unimi.dsi.webgraph.ArcListASCIIGraph} have main methods that can be used to save an immutable graph, as long as you can load it, in ASCII form. With data in {@link it.unimi.dsi.webgraph.BVGraph} or {@link it.unimi.dsi.webgraph.EFGraph} format this is as simple as
java -server it.unimi.dsi.webgraph.ASCIIGraph sourcebasename destor
java -server it.unimi.dsi.webgraph.ArcListASCIIGraph sourcebasename dest
Please consult the documentation and the command-line help of these two classes to get more information.
If you want to import your own data into WebGraph, you must write
an implementation of {@link it.unimi.dsi.webgraph.ImmutableGraph} that
exposes your data. A simple example is given in {@link it.unimi.dsi.webgraph.examples.IntegerListImmutableGraph},
a stub class exposing a simple, noncompressed binary format as an {@link it.unimi.dsi.webgraph.ImmutableGraph}.
Once your data is exposed in that way, you can get a compressed version
using the store() method of your class of interest. Often, there
is a main method (see, e.g., {@link it.unimi.dsi.webgraph.BVGraph}) that
will load your class and invoke store() for you.
For example, you can use an immutable graph inside the Jung framework using our {@link it.unimi.dsi.webgraph.jung.JungAdapter}.
As an alternative, the class {@link it.unimi.dsi.webgraph.ASCIIGraph}
can be used to read graphs specified in a very simple ASCII format. The class
implements {@link it.unimi.dsi.webgraph.ASCIIGraph#loadOnce(java.io.InputStream)} so
that the file can be just piped into a class offering a main method that supports
loadOnce() (e.g., {@link it.unimi.dsi.webgraph.BVGraph}).
You can also generate a graph in ASCII format and read it using
{@link it.unimi.dsi.webgraph.ASCIIGraph#loadOffline(CharSequence)}—the
graph will not be loaded into main memory.
{@link it.unimi.dsi.webgraph.ASCIIGraph} requires listing the successors of each node on a separate line. If your graph is specified arc by arc (one arc per line) you can use {@link it.unimi.dsi.webgraph.ArcListASCIIGraph} instead. {@link it.unimi.dsi.webgraph.ShiftedByOneArcListASCIIGraph} can be used if your input data numbers (rather insensibly) nodes starting from one.
Another possibility is to specify your graph {@linkplain it.unimi.dsi.webgraph.IncrementalImmutableSequentialGraph incrementally}. which just involves enumerating arrays of successors for each node.
Arc-labelled graphs are represented using implementations of {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph}. Most arc-labelled graphs are based on an underlying {@link it.unimi.dsi.webgraph.ImmutableGraph}, and the {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph} implementation just provides label handling. The example {@link it.unimi.dsi.webgraph.examples.IntegerTriplesArcLabelledImmutableGraph} shows how to expose your data as an instance of {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph}, so you can save your data using your preferred combination of implementations.