1 Introduction
With strong representative abilities, graphs have a broad range of applications in a wide variety of fields, including social network study, computational chemistry gilmer2017neural , and biomedical image analysis ktena2017distance . Recently, especially with the development of deep learning technique, there aroused surging interests on graphrelated problems including graph similarity computation, one of the fundamental challenges.
Traditionally, Graph Edit Distance (GED) 6313167 and Maximum Common Subgraph (MCS) BUNKE1998255 are two dominant metrics developed for graph similarity computation problem. GED is the least number of elementary operations (nodes or edges insertion, removal, substitution) to transform one graph to the other. MCS evaluates graph similarity with an induced subgraph of the pair of graphs with as many vertices as possible (Maximum common induced subgraph) or a subgraph isomorphic to the pair of graphs with as many edges as possible (Maximum common edge subgraph). Existing algorithms for GED or MCS computation can be divided into two classes. The first one includes algorithms like A* for the exact values. The exact values for sure help us better evaluate the similarity between graphs, however, it is indeed an NPhard problem and requires exponential time complexity. The second class, which includes algorithms like Beam, VJ and Hungarian, only computes the approximate values and saves time in return. However, these algorithms still run with polynomial or even subexponential time complexity.
To date, graph neural networks (GNNs) have been proved to be potential datadriven solution for various graphrelated tasks with relatively high accuracy and far lower time cost in inference. Various graph convolution layers like GCN
NIPS2016_6081 , GAT velivckovic2017graph and GIN Xu2018HowPA are proposed and demonstrated powerful in embedding node features with both original labels and local topology. Also, as the structure of graph is quite different from that of image, advanced pooling techniques on graphs are developed, including TopKPooling gao2019graph , SAGPool lee2019self , DiffPool ying2018hierarchical and MemoryPool Khasahmadi2020MemoryBased .When we look at deep learning techniques in graph similarity computation problem, there always involves two stages: embedding, which map graphs to feature vectors and similarity computation based on these feature vectors. And we can clearly classify these techniques into two categories with the different methods in first stage as illustrated in fig. 1. The first category, exemplified by Hierarchically Mean and Hierarchical Max defferrard2016convolutional , directly maps graphs to feature vectors by hierarchically coarsening the graphs while the second category embraced by GMNs li2019graph embeds pair of graphs at the same time with a crossgraph matching mechanism. Other techniques like simGNN bai2018simgnn and GRAPHSIM bai2020learning are similar to the second category except that them interactively embed two graphs to one feature vector but not two for similarity computation. For simplicity, in the remaining content of this paper we will call the first category embedding models and the second matching models.
On the one hand, embedding models can be quite faster than matching models at inference stage. Assuming we have graphs, embedding models first embed all graphs to feature vectors for times and only compute similarity times based on feature vectors while matching models have to forward whole graphs times. Taking graph similarity search problem as another example, in which we need to compute the similarity of a new graph with all graphs in our database, embedding models can save the feature vectors of graphs in the database in advance, map the new graph to its feature vector and compute similarity only with feature vectors. Meanwhile matching models need to forward the whole new graphs with all the other graphs in the database every time when there comes a new graph, which is absolutely timeconsuming. Details about time complexity analysis will be shown in Section 3. On the other hand, as the feature vectors generated by embedding models do not involve interaction between graphs, the performance of embedding models are worse than matching models.
Intuitively, matching across two whole graph is always redundant, especially when we talk about large graphs, like large biological or chemical molecular, where similarity between graphs largely depends on "abstracted smaller graphs", or we can call them molecular groups in biology and chemistry. Thus as illustrated in fig. 2, we propose our hierarchical matchingbased graph similarity computation model, in which graph embedding layers and pooling layers are applied to encode and coarsen the graphs to "abstracted smaller graphs" and then matching models are used on "abstracted smaller graph pairs" to compute similarity between graphs. In the first two stages of our model, there is no interaction across graphs so this part can be finished in advance in inference time or similarity search problem in order to save time. And the matching model part guarantees the performance, which is even slightly better than to directly apply the matching models.
Thus we highlight our main contributions as follows:

We propose the first framework, which makes a trade off between embedding and matching models, to address the challenging problem of similarity computation between large graphs.

We provide an improved graph pooling layer based on Memorybased Pooling Layer introduced in Khasahmadi2020MemoryBased and achieve significantly better performance on graph similarity computation problem.

Our framework shows significant improvement in time complexity as compared to matching models and achieves even slightly better performance than matching models, thus far better performance than embedding models.

We conduct extensive experiments on both real graph datasets and selfbuilt datasets consisted of large graphs, which will be detailed in Section 4 to demonstrate the effectiveness and efficiency of our proposed model.
The remaining of this paper will be structured as follows. In Section 2, we introduce our model in detail. Then we conduct analysis about time complexity of our proposed model and the former algorithm in Section 3. Details about experiments are respectively shown in Section 4. And we make a conclusion in Section 5.
2 Proposed Framework
We will use bold font for matrix and tilt font for function in this part. The right superscript of a matrix stands for graph number indicator and the right subscript stands for the stage of the matrix. For example, means the first input graph and means the second graph after encoding.
As illustrated in fig. 2, there are three stages in our proposed model: encoding, coarsening (or say pooling) and matching. Assuming we get two input node sets denoted as and (the number of nodes in each graph is always different) as well as the adjacency matrix and , the whole model can be described as below. Note that in each stage, we first give the overall equation on how the stage works and then make detailed explanations on the overall equation.
2.1 Encoding
In the first stage, we employ encoding layers, , on the two input graphs respectively for embedding feature of nodes and transform the dimension of two node set respectively to and , just as shown in equation 1 below:
(1) 
where and is consisted of a graph convolution layer and a nonlinear activation as equation 2 shows.
(2) 
We use the classical for here and graph embedding layers can be GCN defferrard2016convolutional , GAT velivckovic2017graph or GIN Xu2018HowPA .
2.2 Coarsening (or Pooling)
What this stage does is to pool the encoded graphs, and , to "abstracted smaller graphs", and . As mentioned before, our pooling layer is inspired by Memorybased Pooling Layer introduced in Khasahmadi2020MemoryBased and we make significant improvement over their structure. The overall transformation is shown in equation 3, where is an assignment matrix and is a trainable matrix standing for a linear transformation.
(3) 
We will go through how we calculate the assignment matrix . Similar to the mechanism in Khasahmadi2020MemoryBased , we first generate memory heads (or say memory keys, standing for multiple batches of centroids for pooling operation) . Different from Khasahmadi2020MemoryBased , which proposes to randomly generate the memory heads at the very beginning and make them trainable, we apply linear transformation to transform to a matrix of and then reshape it to , as shown in equation 4. That is to say that our generation of memory keys is dependent on the input node set of the pooling layer, which makes more sense.
(4) 
Then we compute the respective relationship between memory head and . This portion is also different from Khasahmadi2020MemoryBased , where a normalized Student’s tdistribution is deployed. We here apply a normalized reciprocal of cosine similarity here, as described in equation 5, where is a very small number for avoiding the division by zero.
(5) 
We finally aggregate the information of relationship . As equation 6 shows, we concatenate and perform a trainable weighted sum ( convolution) () and a softmax activation () to the concatenated matrix, leading to our assignment matrix .
(6) 
2.3 Matching
This part similar to Graph Matching Network li2019graph aims to compute similarity based on the interaction of two coarsened graphs. There are three propagators and one aggregator to transform the pooled graphs to seperate feature vectors (D is the dimension for final feature vector). Cosine similarity computation is applied subsequently for the final similarity calculation. Equation 7 is an overall description of this part.
(7) 
Here in the propagators , we first propagate features inside respective graph with and across graphs with and then apply to embrace all these features including the original input. As shown in equation 8, an extract function is first applied on , to extract node pairs in single graph with edge indicated by , generating matrix . Then , which involves linear transformation layers , is deployed on . in equation 9 first generate a relationship mask the relationship matrix across the graph pair with , which simply involve matrix multiply for the normalized graphs and a softmax activation (), and then apply the mask to and subtract with the masked . After these two steps, we again concatenate all the feature we have and apply linear transformation layers in .
(8) 
(9) 
(10) 
As for the aggregator, we use the following module proposed in (Li et al., 2015).
(11) 
3 Time Complexity Analysis
For a pair of input graphs and , we can assume that they separately have , edges and the final embedded feature vectors for both graphs are and . Then we can analyse the time complexity over , and
. Note that due to the fact that there exists a lot of variance for each model, we here analyse the simplest cases for each category and the real time consumption will be presented later in the
Section 4. In the first three parts here, we discuss the time consumption from each input graph to its feature vector for embedding models, matching models and our framework. Finally in the fourth part, we use similarity search as an example to specify the time consumption comparisons in applications.3.1 Embedding models
Considering the simplest case here for embedding models, we only visit every edge once and deploy two computational operational on the two nodes it connect, which contributes to the feature of local topology. Thus the computation complexity for these cases is .
3.2 Matching models
Assuming the simplest case here for matching models, we first compute the relationship across and . This part involves computational operations because we have to calculate the connection between every node in to all nodes in . Then for each input graph, we also visit every edge once and deploy two computational operational on the two nodes it connect, which also makes contribution to the feature of local topology. Thus the computation complexity for these cases is .
3.3 Our framework
For our framework, we take the denotation in Section 2, that is the number of nodes in coarsened graphs is . In the embedding stage, we similarly have to visit every edge once and compute twice. Then in the pooling part, we make an transformation from dimensional space to dimensional space, which costs us computational operations. Finally in the matching stage, like what have been analysed in the previous section, we need operations. Thus the resulting complexity is .
3.4 Graph similarity search
In similarity search problem, we assume that we have a database consisted of graphs, each of which has nodes and edges, for simplicity. What we need to finish is to compute the similarity between all the graphs in the database and an incoming new graph (also with nodes and edges). In embedding models, we can compute all the feature vectors for graphs in the database in advance. And then when the new graph comes, we encode it to its feature vector, which cost us , and only compute similarity based on the feature vectors. Thus the time consumption is . In matching models, we can only forward pairs of graphs every time because the computation across graphs can not be finished at the very beginning. Thus the time consumption is extremely high at . Then the time complexity for our framework is . In our settings, is around , thus our framework costs far less time than matching models do, especially when the node number of the graphs is very large.
4 Experiments and Results
4.1 Dataset Information
Currently the number of nodes is relatively small in each graph in most of the existing graph datasets used in graph similarity computation, such as AIDS, LINUX, etc. Thus the characteristics of the entire graphs can be easily characterized and there is not that much need to extract "abstracted smaller graphs" by our coarsening step for better similarity computation results. From this point of view, we here use two types of datasets for evaluating the performance of our framework. The first one is relatively large graphs (with 15 or more nodes) in IMDB dataset. The second one is graphs created artificially with more nodes. We will show how we process the IMDB dataset and how we create the artificial dataset in the following parts.
4.1.1 Processing IMDB dataset
In this paper, we focus on similarity computation of large graphs, so we filter the original IMDB dataset and choose all the graphs that have 15 or more nodes. The new dataset is called IMDBL.
4.1.2 Artificial dataset information
To generate an artificial dataset, we need to first generate graphs and then give ground truth similarity for graph pairs. If we generate graphs with a large number of nodes randomly, all these graphs might be highly dissimilar to each other thus the similarity distribution of the graph pairs in the dataset will be uneven (only a very small portion of graph pairs are similar). To make the similarity distribution more uniform, we first generate a small number of basic graphs, and then generate some more similar derived graphs by pruning the basic graphs. We apply two different categories of rules here for generating the datasets: Barabási–Albert preferential attachment model (BA model) jeong2003measuring and ErdősRényi graph (ER model) erdos1959random bollobas2001random . The detailed description about this two models will be presented respectively in appendix B. We here generate 4 datasets here: BA60, BA100, BA200 and ER100, where the first two characters stand for generation rule and behind is a number n standing for the approximate number of nodes in the graph in that dataset. As for the ground truth calculation, A star algorithm cannot be used to calculate the ged distance because of the large number of nodes in our dataset. While there exists three wellknown approximation algorithms, Hungarian and VJ, and Beam, they are not able to ensure the accuracy. Thus we propose to use the minimum among four evaluation indicators: the three values calculate respectively by Hungarian, VJ, and Beam and the GED value we obtain while generating the graph, which will be detailed in appendix C. Note that we take the minimum value here but not the average value due to the fact that GED is the upper bound and the real GED value must be less than or equal to the GED generated here. We finally convert this minimum indicator to the similarity score with a normalization of the minimums followed by a exponential function to map the normalized value to . This conversion is shown in equation 12, where represents the total number of nodes in graph .
(12) 
Method  BA60  BA100  BA200  

MSE  MAE  TIME  MSE  MAE  TIME  MSE  MAE  TIME  
hungarian  186.19  332.20  >100  205.38  343.83  >100  259.06  379.38  >100 
vj  258.73  294.83  >100  273.94  404.61  >100  314.45  426.86  >100 
beam  59.14  129.26  >100  114.02  206.76  >100  186.03  287.88  >100 
GCNMean  5.85  53.92  2.10  12.53  90.88  2.37  23.66  127.82  2.87 
GCNMax  13.66  91.38  2.11  12.04  85.44  2.34  22.77  107.58  2.96 
SimGNN  8.57  63.80  4.68  6.23  47.80  5.19  3.06  32.77  8.41 
GSimCNN  5.97  56.06  3.31  2.28  32.54  4.37  3.16  35.88  8.25 
GMN  2.66  38.20  9.23  1.47  27.06  14.98  1.17  26.63  45.71 
CoSimGNN  1.83  31.55  4.57  1.03  23.16  4.86  0.50  17.44  5.72 
CoSimCNN  2.50  35.53  3.18  1.49  27.20  3.45  0.53  18.44  4.27 
CoSimMem  3.14  39.70  4.46  1.68  29.45  4.74  0.41  16.18  5.46 

The results for MSE and MAE are in

The results for time consumption over the whole test set is in seconds.
Method  ER100  IMDBL  

MSE  MAE  TIME  MSE  MAE  TIME  
hungarian  236.15  421.66  >100  2.67  16.60  >100 
vj  275.22  463.53  >100  7.68  22.73  >100 
beam  104.73  226.42  >100  0.39  3.93  >100 
GCNMean  16.58  92.98  2.42  22.17  55.35  9.72 
GCNMax  79.09  211.07  2.38  47.14  123.16  9.10 
SimGNN  6.37  45.30  5.53  7.42  33.74  25.05 
GSimCNN  3.53  35.59  4.57  5.01  30.43  18.26 
GMN  2.97  29.58  16.00  3.34  23.50  57.15 
CoSimGNN  1.16  26.11  4.82  2.29  21.78  17.52 
CoSimCNN  2.78  33.36  3.50  10.37  38.03  12.77 
CoSimMem  1.94  31.22  4.80  5.12  29.46  17.68 

The results for MSE and MAE are in

The results for time consumption over the whole test set is in seconds.
4.2 Experiment Settings
4.2.1 Data splitting
For each dataset, we randomly split 60%, 20% and 20% of all graphs as training set, validation set and test set respectively at the beginning and then fix these three parts through the whole process of experiments.
4.2.2 Evaluation Metrics
We apply six metrics to evaluate all the models: TIME, Mean Squared Error (MSE), Mean Absolute Error (MAE), Spearman’s Rank Correlation Coefficient () spearman1961proof , Kendall’s Rank Correlation Coefficient () kendall1938new and Precision at k (p@k). TIME is least time a model need to compute similarity over the whole test set. For embedding model, we first map all the graphs to feature vectors and then only compute similarity based on them. For our model, we first map all the graphs to coarsened graphs and every time forward a pair of coarsened graphs for matching stage. And we simply forward all pairs of graphs for matching models because interaction across whole graphs is required. MSE measures the average squared difference between the predicted similarities and the groundtruth similarities. and also measures how well the predicted results match the groundtruth. We compute the intersection of predicted top k similar results and the groundtruth top k similar results and divide it by k for p@k. Due to the limited length of this article, we present the results of MSE, DEV and TIME in the major part of this paper and the results for , and p@k will be shown in the appendix A.
4.2.3 Experiment Device
Intel(R) i58600K@3.60GHz cpu MHz 800.015, 6 cores
4.2.4 Baseline
There are three types of baselines involved in this part. The first category is traditional methods for GED computation, where we include A*Beamsearch (Beam) 10.1007/11815921_17 , Hungarian kuhn1955hungarian riesen2009approximate , and Vj jonker1987shortest 10.1007/9783642208447_11 . Beam is one of the variants of A* algorithm and its time complexity is subexponential. Hungarian, which is based on the Hungarian Algorithm for bipartite graph matching, and Vj based on the Volgenant and Jonker algorithm, are two algorithms in cubictime. The second category is the embedding models. We include GCNMean and GCNMax defferrard2016convolutional here. The third category is the matching models. We here involve SimGNN bai2018simgnn , GSimCNN bai2018convolutional and GMN li2019graph .
4.2.5 Setting in our proposed framework
We here provide experiments on three kinds of variants of our framework to demonstrate its ability in graph similarity computation problem. CosimGNN stands for our most powerful model, in which we first apply GIN Xu2018HowPA in the embedding stage, then our proposed pooling layer and finally our proposed matching mechanism. CosimMem is a model set to compare the performance of the MemPool Khasahmadi2020MemoryBased and our proposed pooling layer. Thus the only difference between the CosimMem and CosimGNN is that the pooling layer of CosimMem is MemPool Khasahmadi2020MemoryBased instead of our proposed pooling layer. And finally, we propose a fastest model CosimCNN, in which we deploy GIN Xu2018HowPA in the embedding stage then our proposed pooling layer and finally the matching mechanism in bai2018convolutional . And the parameters in our framework is as below: , (same as all baselines), , , , and .
4.2.6 Training settings
On BA datasets, we train each model for 2000 iterations. On ER and IMDBL dataset, we train each model for 10000 and 5000 iterations respectively because it is more difficult for all the models to obtain a good approximation on these two datasets. And batch size is set to 128 for all models.
4.3 Results and Analysis
Statistic results are shown respectively in Table 1 and 2. We here highlight best results respectively among four different categories: traditional methods, embedding models, matching models and our proposed models, for better understanding the results. We can find that the our framework always outperforms all the traditional methods, embedding models and matching models on MSE and DEV. And the least time consumption among three categories are highlighted in Table 1 and 2. We can find that our CoSimCNN runs faster than any matching models. When we take a comparisons among BA datasets, we can find that our time consumption keeps low as the node number increases. And the gaps on time consumption between our models and matching models keep increasing. Then we can take a look at the CosimMem and CosimGNN models and we will find out that in four out of the all 5 datasets, our pooling layer performs better than the MemPool layer Khasahmadi2020MemoryBased .
5 Conclusion
In this paper, we propose a framework for large graphs similarity computation, which first embeds graphs to a feature space, then coarsen (pool) the graphs and then compute the final feature vectors with interaction across graphs. We conduct experiments on various baselines and evaluation metrics to demonstrate that our novel framework not only makes a trade off between embedding models and matching models and reduce the time consumption of matching models, but it is also able to achieve highest accuracy among current graph similarity computation techniques.
References
 [1] Réka Albert and AlbertLászló Barabási. Statistical mechanics of complex networks. Reviews of modern physics, 74(1):47, 2002.
 [2] Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, and Wei Wang. Simgnn: A neural network approach to fast graph similarity computation, 2018.
 [3] Yunsheng Bai, Hao Ding, Ken Gu, Yizhou Sun, and Wei Wang. Learningbased efficient graph similarity computation via multiscale convolutional set matching.
 [4] Yunsheng Bai, Hao Ding, Yizhou Sun, and Wei Wang. Convolutional set matching for graph similarity. arXiv preprint arXiv:1810.10866, 2018.
 [5] Béla Bollobás and Bollobás Béla. Random graphs. Number 73. Cambridge university press, 2001.
 [6] Horst Bunke and Kim Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3):255 – 259, 1998.
 [7] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3844–3852. Curran Associates, Inc., 2016.
 [8] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844–3852, 2016.
 [9] Paul Erdos. On random graphs. Publicationes mathematicae, 6:290–297, 1959.
 [10] Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. Speeding up graph edit distance computation through fast bipartite matching. In Xiaoyi Jiang, Miquel Ferrer, and Andrea Torsello, editors, GraphBased Representations in Pattern Recognition, pages 102–111, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
 [11] Hongyang Gao and Shuiwang Ji. Graph unets, 2019.
 [12] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry, 2017.
 [13] Hawoong Jeong, Zoltan Néda, and AlbertLászló Barabási. Measuring preferential attachment in evolving networks. EPL (Europhysics Letters), 61(4):567, 2003.
 [14] Roy Jonker and Anton Volgenant. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38(4):325–340, 1987.
 [15] Maurice G Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
 [16] Amir Hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, and Quaid Morris. Memorybased graph networks. In International Conference on Learning Representations, 2020.
 [17] Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, and Daniel Rueckert. Distance metric learning using graph convolutional networks: Application to functional brain networks, 2017.
 [18] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(12):83–97, 1955.
 [19] Junhyun Lee, Inyeop Lee, and Jaewoo Kang. Selfattention graph pooling. arXiv preprint arXiv:1904.08082, 2019.
 [20] Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. Graph matching networks for learning the similarity of graph structured objects, 2019.
 [21] Michel Neuhaus, Kaspar Riesen, and Horst Bunke. Fast suboptimal algorithms for the computation of graph edit distance. In DitYan Yeung, James T. Kwok, Ana Fred, Fabio Roli, and Dick de Ridder, editors, Structural, Syntactic, and Statistical Pattern Recognition, pages 163–172, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
 [22] Kaspar Riesen and Horst Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing, 27(7):950–959, 2009.
 [23] A. Sanfeliu and K. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC13(3):353–362, 1983.
 [24] Charles Spearman. The proof and measurement of association between two things. 1961.
 [25] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
 [26] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? ArXiv, abs/1810.00826, 2018.
 [27] Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling, 2018.
Appendix A Results for , and p@k
Method  BA60  BA100  

p@10  p@20  p@10  p@20  
hungarian  0.5772  0.7598  0.7425  0.8438  0.6036  0.8110  0.6100  0.9900 
vj  0.0229  0.0329  0.3500  0.5050  0.4156  0.5837  0.4625  0.8262 
beam  0.7434  0.8580  0.6775  0.9000  0.6283  0.7867  0.6275  0.9000 
GCNMean  0.5329  0.7564  0.5800  0.8688  0.5338  0.7639  0.5650  1.0000 
GCNMax  0.5230  0.7461  0.5450  0.8662  0.5304  0.7617  0.5250  0.9987 
SimGNN  0.5678  0.7730  0.7100  0.8887  0.5383  0.7637  0.5800  1.0000 
GSimCNN  0.6047  0.8078  0.6775  0.9050  0.6169  0.8232  0.6700  1.0000 
GMN  0.5467  0.7636  0.6000  0.8900  0.5450  0.7722  0.5325  1.0000 
CoSimGNN  0.6369  0.8273  0.6950  0.9112  0.6168  0.8210  0.6850  1.0000 
CoSimCNN  0.6059  0.8057  0.7475  0.9050  0.6335  0.8332  0.7100  1.0000 
CoSimMem  0.5727  0.7881  0.6050  0.8950  0.5585  0.7840  0.5325  1.0000 
Method  BA200  ER100  

p@10  p@20  p@10  p@20  
hungarian  0.5810  0.7938  0.6425  0.9400  0.3757  0.5404  0.6600  0.7400 
vj  0.4310  0.6191  0.4850  0.8037  0.2876  0.4030  0.5950  0.6637 
beam  0.6521  0.7724  0.5600  0.8350  0.4255  0.5261  0.5800  0.6937 
GCNMean  0.4946  0.7347  0.5000  0.9500  0.4708  0.6704  0.5175  0.7862 
GCNMax  0.5169  0.7499  0.5375  0.9425  0.4313  0.6244  0.5150  0.7738 
SimGNN  0.4889  0.7347  0.5275  0.9513  0.5848  0.7212  0.6575  0.8177 
GSimCNN  0.5682  0.7968  0.5900  0.9500  0.6085  0.8047  0.6950  0.8500 
GMN  0.5787  0.7958  0.6025  0.9500  0.6471  0.8256  0.8000  0.8338 
CoSimGNN  0.5814  0.8010  0.6200  0.9500  0.6141  0.8154  0.7650  0.8325 
CoSimCNN  0.6020  0.8092  0.6850  0.9512  0.5998  0.8012  0.7050  0.8150 
CoSimMem  0.5857  0.8006  0.5925  0.9512  0.5820  0.7787  0.6000  0.8212 
Method  IMDBL  

p@10  p@20  
hungarian  0.8318  0.9309  0.7487  0.8092 
vj  0.8334  0.9332  0.7460  0.8151 
beam  0.9040  0.9601  0.9092  0.9072 
GCNMean  0.3620  0.4664  0.4868  0.7026 
GCNMax  0.1736  0.2462  0.3908  0.4460 
SimGNN  0.3935  0.5270  0.5566  0.6158 
GSimCNN  0.4987  0.6626  0.6210  0.6414 
GMN  0.5498  0.6866  0.6732  0.6908 
CoSimGNN  0.5709  0.7164  0.6671  0.7039 
CoSimCNN  0.5499  0.6855  0.6421  0.7032 
CoSimMem  0.4777  0.6141  0.5697  0.6414 
Appendix B BA and ER model
b.1 Barabási–Albert preferential attachment model.
Here we introduce the concept of BAmodel, the rules for generating a BAgraph, and how our datasets are produced. The Barabási–Albert (BA) model jeong2003measuring is an algorithm for generating random scalefree networks using a preferential attachment mechanism. Several natural and humanmade systems, including the Internet, the world wide web, citation networks, and some social networks are thought to be approximately scalefree and certainly contain few nodes (called hubs) with unusually high degree as compared to the other nodes of the network. The BAmodel tries to explain the existence of such nodes in real networks and it incorporates two important general concepts: growth and preferantial attachment, which exist widely in real networks. Growth means that the number of nodes in the network increases over time and preferential attachment means that the more connected a node is, the more likely it is to rceive new links. Nodes with a higher degree have a stronger ability to grab links added to the network.
The BAmodel begins with an initial connected network of nodes. New nodes are added to the network one at a time. Each new node is connected to
existing nodes with a probability that is proportional to the number of links that the existing nodes already have. Formally, the probability
that the new node is connected to node is albert2002statistical , where is the degree of node and the sum is made over all preexisting nodes (i.e. the denominator results in twice the current number of edges in the network). Heavily linked nodes ("hubs") tend to quickly accumulate even more links, while nodes with only a few links are unlikely to be chosen as the destination for a new link. The new nodes have a "preference" to attach themselves to the already heavily linked nodes.Our dataset is made up of some basic graphs and derivative graphs that have been trimmed, which solve several problems:

When generating a graph with a large number of nodes randomly, there is a high probability that the generated graphs are dissimilar between each other, which result in an uneven similarity distribution after the normalization described in equation 12.

Due to the large number of nodes in each graph, the approximate GED algorithm cannot guarantee that the calculated similarity can fully reflect the similarity of the graph pairs. We trim and generate derivative graphs while recording the number of trimming steps. These steps and the values calculated by the approximation algorithm take the minimum value as the GED with the basic graph, thereby obtaining a more accurate similarity.

By trimming different steps, we can generate graphs with different similarities, which is more conducive to the experiment of graph similarity query.
There are three types of trimming methods here: delete a leaf node, add an edge, and add a node. Since deleting an edge may have a greater impact on the result of the generated graph, we will not consider this method. We try to trim the base graph without changing the global features of the base graph to generate more similar graph pairs. In this way, we get three datasets according to the following generation rule.
A BAgraph of nodes is grown by attaching new nodes each with edges that are preferentially attached to existing nodes with high degree. We set to be 60, 100, and 200, respectively, and is fixed to 1, to generate basic graphs. Each subdataset generates two basic graphs, and each base graph is trimmed with different ged distances. For each basic graph, generate 99 trimmed graphs in the range of ged distance 1 to 10. So each subdataset consists of two basic graphs and 198 trimmed graphs.
b.2 ErdősRényi graph or a binomial graph model.
In game theory, ErdősRényi model stands for either of two similar models for generating random graphs, named respectively after mathematicians Paul Erdős and Alfréd Rényi. The definition of these two models are as follows:

In the model, a graph is chosen uniformly at random from the collection of all graphs which have n nodes and M edges. For example, in the G(3, 2) model, each of the three possible graphs on three vertices and two edges are included with probability 1/3.

In the G(n, p) model, a graph is constructed by connecting nodes randomly. Each edge is included in the graph with probability p independent from every other edge. Equivalently, all graphs with n nodes and M edges have equal probability of . The parameter p in this model can be thought of as a weighting function; as p increases from 0 to 1, the model becomes more and more likely to include graphs with more edges and less and less likely to include graphs with fewer edges. In particular, the case p = 0.5 corresponds to the case where all graphs on n vertices are chosen with equal probability.
There are three types of trimming methods here: delete a leaf node, add an edge, and add a node. Since deleting an edge may have a greater impact on the result of the generated graph, we will not consider this method. We try to trim the base graph without changing the global features of the base graph to generate more similar graph pairs. In this way, we get three datasets according to the following generation rule.
A ERgraph connects each pair of n nodes with probability p. Like the BA model, we set n to be 100, and adjust the value of p so that the graph is not too dense. We generate the base graphs and trim them in the same way, so each dataset consists of two basic graphs and 198 pruned graphs too.
b.3 Statistic of datasets
We here present all the statistic of datasets in Table 6.
Dataset  #Graphs  #Pairs  Min #Nodes  Max #Nodes  Avg #Nodes  Min #Edges  Max #Edges  Avg #Edges 

BA60  200  40K  54  65  59.50  54  66  60.06 
BA100  200  40K  96  105  100.01  96  107  100.56 
BA200  200  40K  192  205  199.63  193  206  200.16 
ER100  200  40K  94  104  99.90  98  107  101.43 
IMDBL  381  145K  15  89  24.10  33  1467  166.58 
Appendix C Details about the GED value obtained when converting basic graphs to derived ones.
As shown in Figure 3, when generating a derivative graph with a specific GED from the basic graph, we randomly select among the above three methods (randomly delete a leaf node, add a leaf node or add an edge) to generate a set of operations, and the sum of GED accumulated by all operations is the specific GED value. As proposed formerly, we then take the minimum value of the trimming GED and the calculated GED with the three algorithms as the final GED value.
Comments
There are no comments yet.