GEDLIB  1.0
GEDLIB Documentation

GEDLIB (1.0)

1. About GEDLIB

GEDLIB is a C++ library for (suboptimally) computing edit distances between graphs using various state-of-the-art methods. GEDLIB allows you to build your graphs via the C++ API or load them from GXL files. Several benchmark datasets are distributed with GEDLIB. For these datasets, GEDLIB provides predefined edit cost functions. You can easily extend GEDLIB by implementing new edit cost functions or new methods for computing the graph edit distance. An extensive Doxygen documentation is availabe here.

2. License and Citing

The source code of GEDLIB is distributed under the GNU Lesser General Public License. If you want to use GEDLIB in a publication, please refer to the following papers:

3. Installation under Unix

GEDLIB uses the following external libraries:

After having installed CMake, Doxygen, and OpenMP and having downloaded Boost, execute the script install.py for installing GEDLIB and the external libraries distributed with GEDLIB:

python install.py [--help] [-h] [--doc] [--tests all|sspr2018|vldbj2019|tkde2019|unit_tests|ged_env_tests|lsap_solver_tests] [--median] [--boost <BOOST_ROOT>] [--gurobi <GUROBI_ROOT>] [--debug] [--clean] [--lib gxl|<indentifier>,<UserNodeID>,<UserNodeLabel>,<UserEdgeLabel>]

Use the option --doc to build the Doxygen documentation, the option --clean to delete the build directoy and update the makefile before the build, the option --lib gxl to build the shared library lib/libgxlgedlib.so for usage with graphs given in the GXL file format, the option --tests all|sspr2018|vldbj2019|tkde2019|unit_tests|ged_env_tests|lsap_solver_tests to build test executables, and the option --median to build a GEDLIB implementation of median graph computation for graphs from the Letter dataset (see Section 7 below). These options require that you also specify the option --boost <BOOST_ROOT>, where <BOOST_ROOT> is the path to the directory which contains the Boost sources. Use --debug if you want to build shared libraries or test executables in debug mode. Specify --gurobi <GUROBI_ROOT> if you want to install GEDLIB with Gurobi.

Use the option --lib <indentifier>,<UserNodeID>,<UserNodeLabel>,<UserEdgeLabel> to build GEDLIB as a shared for graphs with custom node ID, node label, and edge label types:

Doing so builds the shared library lib/lib<indentifier>gedlib.so, which can be used for graphs with node IDs of type <UserNodeID>, node labels of type <UserNodeLabel>, and edge labels of type <UserEdgeLabel>. For example, executing $ python install.py --lib mytypes,int,double,double builds the shared library lib/libmytypesgedlib.so for usage with graphs whose node IDs are of type int and whose node and edge labels are of type double.

4. Building an Application that Uses GEDLIB

4.1 Building an Application that Uses GEDLIB as Header-Only Library

For building an application that uses GEDLIB as a header-only library, it suffices to execute the installation script install.py without any options. Subsequently, carry out the following steps:

4.2 Building an Application that Uses GEDLIB as Shared Library

4.2.1 Graphs Given as GXL Files

If you want to build an application that uses GEDLIB as a shared for graphs given as GXL files, make sure that you have installed GEDLIB with the option --lib gxl. Subsequently, carry out the folowing steps:

4.2.2 Graphs with User-Defined Node ID, Node Label, and Edge Label Types

If you want to build an application that uses GEDLIB as a shared library for graphs with custom node ID, node label, and edge label types, make sure that you have installed GEDLIB with the option --lib <indentifier>,<UserNodeID>,<UserNodeLabel>,<UserEdgeLabel>. Subsequently, carry out the folowing steps:

5. Using GEDLIB

5.1 General Usage

In your source file, include the header src/env/ged_env.hpp and create your environment object:

All functionality of GEDLIB is accessible via the environment class ged::GEDEnv, whose template parameters UserNodeID, UserNodeLabel, and UserEdgeLabel must be set to the types of the node IDs, the node labels, and the edge labels of your graphs. Use the member functions ged::GEDEnv::add_graph(), ged::GEDEnv::add_node(), and ged::GEDEnv::add_edge(), and ged::GEDEnv::set_edit_costs() for adding graphs and edit costs to the environment. Once you're done with this, call ged::GEDEnv::init() to initialize the environment.

When you have initialized the environment, you are ready to run the GED methods. To select the method, call ged::GEDEnv::set_method(). You can then initialize the selected method via ged::GEDEnv::init_method(), and run it via ged::GEDEnv::run_method(). Note that most method can be run also without initialization, but doing so typically has a negative impact on both runtime and accuracy. Use the member functions ged::GEDEnv::get_lower_bound(), ged::GEDEnv::get_upper_bound(), ged::GEDEnv::get_node_map(), ged::GEDEnv::get_runtime(), ged::GEDEnv::get_init_time(), ged::GEDEnv::get_graph_class(), and ged::GEDEnv::get_graph_name() for accessing the results of your GED computations. Use ged::GEDEnv::quasimetric_costs() to check if your edit costs are quasimetric on the graphs contained in your environment.

5.2 Additional Functionality for Graphs Given as GXL Files

If you use GEDLIB with the template parameter UserNodeID set to ged::GXLNodeID a.k.a. std::string and the template parameters UserNodeLabel and UserEdgeLabel set to ged::GXLLabel a.k.a. std::map<std::string,std::string>, GEDLIB offers additional functionality for loading graphs given in the GXL file format. For those graphs, you do not have to use the member functions ged::GEDEnv::add_graph(), ged::GEDEnv::add_node(), and ged::GEDEnv::add_edge() for adding graphs to your environment. Instead, you can simply load all of them at once by one call to ged::GEDEnv::load_gxl_graphs().

5.3 Using GEDLIB as a Shared Library

5.3.1 Graphs Given as GXL Files

If you want to use GEDLIB as a shared library for graphs given as GXL files, make sure that you have installed GEDLIB with the option --lib gxl. In your source file, you have to define GXL_GEDLIB_SHARED before including src/env/ged_env.hpp:

5.3.2 Graphs with User-Defined Node ID, Node Label, and Edge Label Types

If you want to use GEDLIB as a shared library for graphs with custom node ID, node label, and edge label types, make sure that you have installed GEDLIB with the option --lib <indentifier>,<UserNodeID>,<UserNodeLabel>,<UserEdgeLabel>. In your source file, you have to define <IDENTIFIER>_GEDLIB_SHARED before including src/env/ged_env.hpp, where <IDENTIFIER> is the upper case transformation of <identifier>. For example, if you have installed GEDLIB by executing $ python install.py --lib mytypes,int,double,double, you have to do the following:

#define MYTYPES_GEDLIB_SHARED
// ...

5.4 Examples

For exacmples of how to use GEDLIB, have a look at median/src/median_letter.cpp and at the .cpp files contained in the subdirectories of tests/.

6. Reproducability Packages

GEDLIB has been used for several research papers. For reproducing the experiments reported in these papers, follow the instructions below.

D. B. Blumenthal, S. Bougleux, J. Gamper, and L. Brun. “Ring based approximation of graph edit distance”, S+SSPR 2018, vol. 11004 of LNCS, pp. 293-303, https://doi.org/10.1007/978-3-319-97785-0_28

In order to reproduce the experiments reported in this paper, install GEDLIB with the option --tests sspr2018. After installation, open a shell and execute the following commands:

$ cd <GEDLIB_ROOT>/tests/sspr2018/bin
$ ./learn_ring_params
$ ./learn_subgraph_depths
$ ./learn_walks_depth
$ ./test_lsape_based_methods

After having executed these commands, the results of the experiments are contained in the folder tests/sspr2018/output/.

D. B. Blumenthal, N. Boria, J. Gamper, S. Bougleux, and L. Brun. “Comparing heuristics for graph edit distance computation”, VLDB J. 2019

In order to reproduce the experiments reported in this paper, install GEDLIB with the options --tests vldbj2019 and --gurobi <GUROBI_ROOT>. After installation, open a shell and execute the following commands:

$ cd <GEDLIB_ROOT>/tests/vldbj2019/bin
$ ./vldbj_train_subgraph
$ ./vldbj_train_walks
$ ./vldbj_train_ring
$ ./vldbj_train_ml
$ ./vldbj_test_lsape_based_methods
$ ./vldbj_test_lp_based_methods
$ ./vldbj_test_ls_based_methods
$ ./vldbj_test_misc_methods
$ ./vldbj_test_best_methods

After having executed these commands, the results of the experiments are contained in the folder tests/vldbj2019/results/. For creating TikZ figures and tables that visualize the results, run the script process_results.py.

7. Datasets

GEDLIB comes with several datassets which contain graphs given in the GXL file format. They are contained in the following subdirectories of the directory data/datasets/:

For each dataset, the directory data/collections/ contains an XML file which lists the contained graphs' GXL files along with their classes. These files match the document type definition data/collections/GraphCollection.dtd and can hence be used as input for ged::GEDEnv::load_gxl_graphs(). The Python script data/collections/sample.py can be used to generate samples of the datasets.

8. Directory Structure

After executing install.py, the directoy <GEDLIB_ROOT> has the following internal structure:

.
├── CMakeLists.txt --------------- used for building GEDLIB
├── doxyfile.in ------------------ used for creating the documentation
├── LICENSE.md ------------------- a copy of GNU LGPL
├── README.md -------------------- this file
├── install.py ------------------- installs GEDLIB
├── _data
| ├── _datasets ---------------- contains several datasets
| | └── ...
| └── _collections ------------- contains XML files that list the graphs in the datasets
| ├── GraphCollection.dtd -- document type definition for graph collections
| ├── sample.py ------------ generates a sample from a collection file
| └── ...
├── _ext ------------------------- contains external libraries
| └── ...
├── _docs ------------------------ contains documentation if GEDLIB has been installed with option --doc
| └── ...
├── _lib ------------------------- contains shared libraries if GEDLIB has been installed with option --lib
| └── ...
├── _src ------------------------- contains the sources of GEDLIB
| ├── CMakeLists.txt ----------- used for building GEDLIB
| ├── _edit_costs -------------- contains edit costs for several datasets
| | └── ...
| ├── _env --------------------- contains the architecture of GEDLIB
| | ├── ged_env.hpp ---------- include this header in your application
| | └── ...
| ├── _methods ----------------- contains various methods for computing GED
| | └── ...
| └── _util -------------------- contains utility classes and functions used by the GED methods
| └── ...
├── _median ---------------------- contains GEDLIB implementation of median graph computation
| ├── CMakeLists.txt ------- used for building the executables
| ├── _bin ----------------- contains the executables if GEDLIB has been built with option --median
| | └── ...
| ├── _collections --------- contains graph collections used by the median graph computation
| | └── ...
| ├── _output -------------- contains the median graphs once the executables have been run
| | └── ...
| └── _src ----------------- contains the sources
| └── ...
└── _tests ----------------------- contains tests
├── CMakeLists.txt
├── _pr2018 ------------------ tests for PR paper "Designing Heuristics for Graph Edit Distance"
| ├── CMakeLists.txt ------- used for building the test executables
| ├── _bin ----------------- contains test executables if GEDLIB has been built with option --tests
| | └── ...
| ├── _collections --------- contains graph collections used by the tests
| | └── ...
| ├── _output -------------- contains the results once the tests have been run
| | └── ...
| └── _src ----------------- contains the test sources
| └── ...
├── _sspr2018 ---------------- tests for SSPR paper "Ring Based Approximation of Graph Edit Distance"
| ├── CMakeLists.txt ------- used for building the test executables
| ├── _bin ----------------- contains test executables if GEDLIB has been built with option --tests
| | └── ...
| ├── _collections --------- contains graph collections used by the tests
| | └── ...
| ├── _output -------------- contains the results once the tests have been run
| | └── ...
| └── _src ----------------- contains the test sources
| └── ...
└── _unit_tests -------------- unit tests
├── CMakeLists.txt ------- used for building the test executables
├── _bin ----------------- contains test executables if GEDLIB has been built with option --tests
| └── ...
├── _collections --------- contains graph collections used by the tests
| └── ...
├── _output -------------- contains the results once the tests have been run
| └── ...
└── _src ----------------- contains the test sources
└── ...