Information cascade prediction of complex networks based on physics-informed graph convolutional network

Cascade prediction aims to estimate the popularity of information diffusion in complex networks, which benefits many applications, from identifying viral marketing and fake news propagation in social media to estimating the scientific impact (citations) of a new publication. How to effectively predict cascade growth size has become a significant problem. Most previous deep learning methods have achieved remarkable results by mining structural and temporal features from diffusion networks and propagation paths. However, ignoring the dynamics of the spreading process restricts further improvement of prediction performance. In this paper, we propose a novel framework called physics-informed graph convolutional network (PiGCN) for cascade prediction, which combines explicit features (structural and temporal features) with the dynamic status of propagation when learning the diffusion ability of cascades. Specifically, PiGCN is an end-to-end predictor: it first splits a given cascade into a sub-cascade graph sequence and learns the local structure of each sub-cascade via a graph convolutional network, then adopts a multi-layer perceptron to predict the cascade growth size. Moreover, our dynamic neural network, combining PDE-like equations and deep learning, is designed to extract the latent dynamics of cascade diffusion, capturing the evolution rate of both structural and temporal changes. To evaluate the performance of our proposed PiGCN model, we conducted extensive experiments on two well-known large-scale datasets, from Sina Weibo and the arXiv subject listing HEP-PH, to verify the effectiveness of our model. Our model outperforms mainstream baselines, and the results show that dynamic features are of great significance for cascade size prediction.


Introduction
Recent years have witnessed an increase in the prevalence of social media, which provides a convenient platform for individuals to express their various opinions. Social media platforms such as Weibo and Twitter have become the main channels for users to obtain large amounts of information, bringing significant convenience for generating, delivering and propagating messages, which spurs the phenomenon of information cascades [1,2]. On the other hand, scientific information propagation prediction [3] is important for understanding academic/scientific networks [4] and scientific impact, potentially allowing us to better understand the principles underlying academic development [5]. A picture, a text, or anything else considered information can be shared by people more than once, and that propagation is termed an information cascade. Most information cascades are small, but when a super cascade occurs, it can exert a significant influence on the Internet [6]. Moreover, information cascades play vital roles in many areas, such as advertising influence maximization [7][8][9] and influence prediction [10,11]. For example, negative information cascades about societal events, especially fake news, can harm social stability. Therefore, exploiting the law of information propagation and predicting the information cascade size after a certain time period becomes significant.
The main purpose of cascade growth prediction is to capture the observed information diffusion features at an early state and to predict the exact popularity of a cascade at a future moment [12]. Existing methods fall into three general categories [1]: feature engineering approaches, generative approaches and deep learning-based methods. Early works focus on feature-based approaches, which rely heavily on handcrafted feature extraction (e.g. user profiles [13], content features [14][15][16], structural features [6,17], temporal features [18]) and generally work with a supervised machine learning classifier. But these methods strongly depend on experts' prior knowledge, and the features rarely transfer to new domains [3]. Generative approaches [19,20] simulate future retweet dynamics by modeling an arrival process for each message and predicting information popularity. However, such methods have less desirable predictive power and are not compatible with large-scale data. Moreover, they only consider explicit patterns of propagation and fail to include other useful factors (e.g. human activity, individual differences and social effects). Both approaches are limited in combining structural and temporal features.
Fortunately, deep learning networks can overcome the challenge of combining structural and temporal features. Deep learning methods, especially techniques for sequence learning [21] and graph learning [22][23][24], have led to various diffusion models being applied to predict information popularity size [25,26]. Deep learning frameworks automatically learn node features and the underlying propagation mechanism through end-to-end models, mapping an early cascade graph into a low-dimensional space and then predicting the information cascade size. Most deep learning-based works focus on frameworks of graph embedding and recurrent neural networks (RNNs), specializing in spatial information and sequences, respectively. Existing graph embedding methods, such as random walks and graph convolutional networks (GCNs), are introduced for cascade graphs to learn graph embeddings and obtain node representations. Cascade graphs are considered the evolved sequence of a directed acyclic graph [3], where a directed path represents a diffusion path on the social network. Cascade graph embedding tends to explore the microscopic diffusion process, as well as obtain the cascade structural features of propagation. However, general graph embedding methods ignore the dynamic evolution of time-series variation. Therefore, RNN methods such as long short-term memory (LSTM) are applied to learn the temporal dynamics of propagation and simulate the next propagation state from the current state for better performance.
More recently, deep learning methods have been widely applied in the physical sciences, where the input and output of physical data can be carefully analyzed, such as in materials science [27], fluid mechanics [28] and complex networks [29]. When information diffusion is treated as a dynamical complex system, traditional RNNs only split time into small periods (fixed physical snapshots) and fail to delve into the dissemination rules. Designing a suitable architecture to learn continuous dynamic embeddings of nodes and cascades, over both spatial and temporal features, is a major challenge. To tackle these challenges, partial differential equation (PDE) methods have emerged, delving into the underlying physics rules and obtaining the dynamics through derivatives, yielding physical properties of information propagation such as the propagation velocity and the evolution of the network topology. The latest advances in deep learning algorithms have shown that it is possible to embed PDEs into deep learning models. For example, Zhang and Wang [30] showed that the properties of derivatives make it possible to capture dynamic changes in time and space. On the other hand, the physics-informed neural network (PINN) framework [28] was presented to solve PDEs by incorporating physics rules as a penalization term in the cost function of the neural network training process. When an explicit physics expression is not available, we can still discover PDEs embedded in the observed data, which gives us the opportunity to create a dynamic framework from the spatial and temporal evolution of information diffusion.
In this paper, motivated by PINN, we present a novel framework called physics-informed graph convolutional network (PiGCN) for cascade prediction, which incorporates both explicit features (structural and temporal features) and latent dynamic features to predict the future popularity of a cascade. Specifically, structural features consist of the number of nodes, the in-degree and out-degree of each node, the activation state of each node, and topology features. Temporal features are represented in each snapshot, allowing us to calculate the retweet times of information diffusion. Inspired by neural dynamics on complex networks [30] and the physical definition of derivatives, we found that a PDE-like framework can be applied to model cascade information diffusion. The derivative with respect to time captures implicit velocity changes during the propagation process, while the derivative with respect to space captures the implicit evolution of the propagation structure. Additionally, PiGCN divides an information cascade into sub-cascade graph sequences and then learns the local structure of each sub-cascade using a graph convolutional network (GCN). Our proposed model adopts the topology structure and time information as input and outputs particular properties of information diffusion, such as propagation velocity, structure evolution and cascade size. Consider Sina Weibo, a popular social media platform that has been studied many times; figure 1 gives a simple and intuitive example. High-impact Weibo users can instantly spread information to many followers and will be retweeted many times in the future; in other words, the initial sub-cascade of a high-impact user contains many nodes within a short time. Conversely, low-impact Weibo users are not retweeted instantly and will not accumulate many retweets, so their initial sub-cascades contain few nodes.

Figure 1. A simple and intuitive example of an information cascade on Sina Weibo. High-impact Weibo users rapidly propagate information to a large number of followers. On the contrary, low-impact users do not experience immediate retweets and fail to accumulate a substantial number of retweets within the same time step as high-impact users.
From this perspective, we design a dynamic neural network combining PDE-like equations and deep learning to extract the latent dynamic status of cascade diffusion, obtaining the dynamic evolution rate of both structural and temporal changes. To summarize, the main contributions of this work are as follows: (1) Simplified feature handling: we introduce a new approach that employs explicit features, such as the structural, temporal and user impact features of cascades, avoiding the challenge of dealing with text and image features. In addition, we utilize latent dynamic features (propagation velocity and structure evolution) of information diffusion to predict the growth size of a given cascade. (2) A method for extracting dynamic features: our proposed framework treats cascade diffusion as a dynamical system whose state evolves over propagation time and space. Inspired by PINN and PDE methods, it utilizes a dynamic time- and space-dependent PDE-like network to calculate the derivatives of the propagation process with respect to time and space, extracting the latent changes in the propagated information. (3) Introduction of a physics-informed framework: the framework innovatively incorporates available, yet incomplete, physics-informed knowledge (scientific principles) into the popularity prediction network. The physics constraints are embedded in the loss function to capture the dynamic information of cascade diffusion.
The remainder of this paper is organized as follows. Section 2 introduces related work. In section 3, we present our model PiGCN in detail, and we report the experimental results and discussion in section 4. Finally, section 5 concludes the paper.

Related work

Cascade growth prediction
Cascade prediction is mainly divided into two tasks: classification and regression [31]. Classification focuses on whether a piece of information will be retweeted many times or not [6]. Regression predicts the size of a cascade [32]. Existing approaches for information cascade prediction are mainly divided into three categories: feature engineering, stochastic process models and deep learning models.
Feature extraction plays an important role in predicting information cascades. Feature-based methods treat prediction as a classification or regression task and pay attention to the bespoke features that have the greatest impact on information propagation. The characteristics are categorized into four groups: content features, user attributes, cascade structure and temporal features [1]. The obtained features are fed into classical machine learning models to predict the popularity of information cascades. Therefore, suitable features, or the right combination of features, are of great importance for performance. However, for feature-based models, the performance heavily relies on the quality of the bespoke features, which are difficult and time-consuming to engineer and often fail to capture information that is effective for prediction.
On the other hand, generative process methods regard the cascade increment or the popularity accumulation as an arrival process of messages and model the intensity function of each message arrival independently [26,33]. This describes the evolution of the information cascade distribution over time [34] and models a stochastic process of interactive behavior [35]. There are two typical generative processes: the Poisson process and the Hawkes process [3]. Shen et al [36] first employed reinforced Poisson processes to model stochastic popularity dynamics and incorporated them into a Bayesian framework to simulate the arrival process of individual popularity. Hawkes-process approaches [33] construct a general framework that combines the Hawkes self-exciting point process to model each cascade and distinguish the incentive size of each forward, improving prediction performance. However, generative models are directly designed to simulate popularity rather than predict it. These models rely on strong assumptions, which lead to oversimplification, and thus usually underperform in real tasks.
Recently, deep learning methods have been employed to overcome the drawbacks of feature engineering and stochastic process models. Li et al [25] proposed an end-to-end predictor that automatically learns a low-dimensional representation of each individual cascade graph rather than adopting hand-crafted features. Cao et al [19] extended the Hawkes process by considering the time decay effect and employed deep learning methods to enhance the performance. However, these methods fail to learn the global cascade graph. Hence, Chen et al [3] leveraged GCNs to learn representations of the whole cascade graph with both structural and temporal information. Feng et al [37] learned low-dimensional representations of cascade graphs by constructing a structure- and content-proximity-based higher-order graph. Wang et al [38] combined a GCN to learn representations of the cascade graph with an LSTM to extract temporal information, taking both structural and temporal features into account. These works all build on GCNs, delving into propagation behavior by exploring its spatial and temporal features. Hence, in this paper, our model uses only spatial and temporal features and aims to explore the impact of their dynamics on the information propagation system; we do not address the content of the information.

Physics-informed deep learning
Deep learning methods are often considered black-box functions: the model is fully driven by data, which means that the potential relationships in the data are found by the network without interpretation. Fortunately, PINNs [28,39] were proposed to integrate PDEs into deep learning networks. This method adds prior knowledge to improve generalization ability, discovers hidden information from the available data in the space-time domain, and enhances the causality of deep learning methods.
Hence, to improve prediction performance and lend physical meaning to the neural network, we consider a PDE formulation for the solution. The general PDE can be defined as:

u_t + Π(t, x, u, u_x, u_xx, …) = 0,

where Π is a nonlinear function of time t, space x, the complex-valued solution u and the partial derivatives of u with respect to x [28]. u_x and u_xx denote the first-order and second-order partial derivatives of u, respectively. In this paper, we only adopt first-order derivatives.
In the PINN method, the solution u is regarded as a deep learning network that replaces the partial differential expression, avoiding the need to solve a complex differential problem by leveraging the fitting ability of deep learning. Besides, this method takes full advantage of the automatic differentiation mechanism of deep learning, which simplifies the solution process. We then approximate both the solution u and the nonlinear function Π with a shared deep neural network and define a deep physics model f, which is formulated as:

f := u_t + Π(t, x, u, u_x, …).

The general idea of PINN is that the solution network u(t, x) shares the same network and parameters as the physics-informed network f(t, x). We apply the chain rule to obtain the derivatives of the solution network u with respect to time t and space x. Physically speaking, t is each time step within the observed time window and x is the subgraph of the cascade network at the corresponding time step. Initially, research on PINN methods focused only on solving complex PDEs and their inverse problems. Then, inspired by recent developments in PINN learning, a large number of related works appeared, applying the PINN method to many other fields. For instance, Shaier et al [40] proposed a disease-informed neural network based on PINN, combining neural networks and SIR compartmental models to predict disease spread and progression. Wang et al [41] incorporated a PINN into a super-resolution technique to reconstruct high-resolution images from low-resolution images. Shi et al [42] reconstructed traffic variables by introducing a hybrid physics-informed deep learning framework, combining traffic flow models and deep learning to predict traffic state. In the field of power systems, Stiasny et al [43] leveraged PINN to discover the frequency dynamics of power systems for dynamic security assessment.
We acquire the derivatives to compute the residual network, capturing the dynamic evolution of time and space as the information cascade propagates.
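To make the residual idea concrete, here is a minimal numerical sketch (not the PiGCN implementation): we check that an analytic solution of the heat equation u_t = u_xx yields a near-zero residual f = u_t − u_xx. A real PINN obtains these derivatives by automatic differentiation of the network; central finite differences stand in for that mechanism here.

```python
import numpy as np

def u(t, x):
    # Analytic solution of the heat equation u_t = u_xx, standing in
    # for the output of a trained solution network u(t, x).
    return np.exp(-t) * np.sin(x)

def residual(t, x, h=1e-4):
    # PDE residual f = u_t - u_xx; finite differences play the role
    # that automatic differentiation plays in a real PINN.
    u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
    u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2
    return u_t - u_xx

# The residual vanishes wherever the PDE is satisfied.
print(abs(residual(0.5, 1.0)))  # close to 0
```

Minimizing the squared residual over sampled (t, x) points is exactly the penalization idea the PINN literature builds on.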

Methodology
Since there are no available PDEs to directly describe the information cascade as a complex system, we utilize a GCN-based neural framework with PDE-like penalization function to explore the physics rules of the propagation system through a latent space for better cascade size prediction performance.

Problem definition
We consider cascade size prediction as a regression problem, aiming to predict the size of information cascades so as to describe the process of information diffusion. We now introduce a formal description of our problem setting and the preliminary knowledge necessary to understand our work. The notations and symbols used are summarized in table 1.
Definition 1 cascade graph. Let G = (U, E) be a global social network, where U is the set of users (nodes) in the network and E is the set of edges between nodes. Suppose we have n posts. Post-i covers a part of the network G, producing a subset of nodes that retweet or adopt the message, which forms the cascade graph C_i = (U_i, E_ij, T) of the post. U_i ⊆ U denotes the set of users that have participated in the cascade C_i within a duration T after the source post. An edge E_ij represents that user U_i retweeted the information from user U_j. The time label T denotes the observed time of the information cascade.
Definition 2 sub-cascade graph. The cascade graph C_i^T denotes the global propagation information of post-i within an observed time window T. We obtain a different sub-cascade g_i^{t_j} of the cascade information at each time t_j. For example, figure 2 illustrates how an information cascade is divided into several sub-cascades. Here, U_0 represents the source post, and we define the sub-cascade graph sequence G_i^T = {g_i^{t_0}, g_i^{t_1}, ..., g_i^{t_m}}, which captures all topological information of the cascade graph C_i^T during observation. The intervals between time steps are equal. More importantly, different sub-cascades provide different spatial inputs (topological information) for the neural network.
Definition 3 growth size. In our study, the information cascade growth size is defined as the number of incremental retweets Δy_i^{T_p} of post p_i after the observation time window T, i.e. within the prediction time window T_p shown in figure 2. More specifically, the growth size is Δy_i^{T_p} = |U_i^{T+T_p}| − |U_i^T|, the number of users who retweet between T and T + T_p.
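As an illustration of Definitions 2 and 3, the following sketch builds equal-interval sub-cascade adjacency snapshots and counts the growth size. The `(child, parent, time)` triple layout and the node numbering are hypothetical conveniences, not the paper's data format.

```python
import numpy as np

def sub_cascades(retweets, T, m):
    """Split a cascade observed for T time units into m equal-interval
    snapshots, each an adjacency matrix over all observed users.

    retweets: list of (child, parent, time) triples; node 0 is the
    source post and gets a self-loop, as in Definition 2.
    """
    n = 1 + len(retweets)
    snaps = []
    for j in range(1, m + 1):
        t_j = T * j / m
        a = np.zeros((n, n), dtype=int)
        a[0, 0] = 1  # self-connection of the source post
        for child, parent, t in retweets:
            if t <= t_j:
                a[child, parent] = 1
        snaps.append(a)
    return snaps

def growth_size(times, T, T_p):
    # Definition 3: retweets arriving after the observation window T
    # but within the prediction horizon T + T_p.
    return sum(1 for t in times if T < t <= T + T_p)

cascade = [(1, 0, 0.2), (2, 0, 0.7), (3, 1, 1.5)]
snaps = sub_cascades(cascade, T=1.0, m=2)
print(snaps[0].sum())  # 2: source self-loop + the one retweet before t=0.5
print(growth_size([t for _, _, t in cascade], T=1.0, T_p=1.0))  # 1
```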

PiGCN: model overview
The basic idea of PiGCN is to learn both structural and temporal dynamic changes from information diffusion on social networks for accurate cascade prediction.In this section, we present the construction of our proposed framework for popularity prediction based on continuous-space dynamics over time.
As shown in figure 3, the cascade prediction framework is divided into three main components: structural representation capture, cascade size prediction and the dynamic neural network. Firstly, PiGCN splits the global cascade graph into a sequence of sub-cascade graphs according to time. Secondly, the structural representation capturing module of PiGCN takes the sub-cascade graph sequence as input to learn latent spatial features over time. This module adopts an undirected GCN to learn the global and local structural features of information diffusion. Then, in the cascade size prediction step, both the spatial features and the explicit time are taken as input to predict the popularity size of the diffusion system with a deep learning method. Besides, we compute the spatial and temporal derivatives to acquire both the spatial and temporal dynamic changes of information cascade diffusion. Finally, we leverage a DNN framework to model the dynamics equation Π(t, x, cas, cas_t, cas_x) as the dynamics of the information propagation system. Moreover, our model sets up a dynamical relationship between the cascade prediction and the spatial features.

Figure 2. (b) The sub-cascade graph sequence G_i^T is derived from the cascade graph based on time intervals. The source post U_0 adds a self-connection to safeguard against the loss of its own information. (c) The corresponding adjacency matrix sequence A_i^T is obtained from the sub-cascade graphs. This matrix series enables us to capture the local structure of the propagation and explore the relationships between users. A value of 1 indicates a connection between two users, while 0 indicates the absence of a retweet.

Structural representation capturing
Given a post p_i, we use the observed cascade graph C_i^T and generate a new snapshot whenever new nodes are activated, forming a sequence of sub-cascade graphs G_i^T, which represents the network topology at different times. G_i^T is denoted as:

G_i^T = {g_i^{t_0}, g_i^{t_1}, ..., g_i^{t_m}}.

The first sub-cascade graph g_i^{t_0} is the source post and contains only a single node, so we add a self-connection to the source node. As shown in figure 4, the adjacency matrix enables us to capture the local structure of the propagation substrate and explore the relationships between users, with a value of 1 indicating a connection between two users and 0 indicating no retweet. Since the adjacency matrix A_i^T can represent the cascade structure, G_i^T is represented by a sequence of adjacency matrices A_i^T = {a_i^{t_0}, a_i^{t_1}, a_i^{t_2}, ..., a_i^{t_m}}. In order to capture the structural features of the cascade graph, we utilize a GCN in our model. The GCN is based on neighbor aggregation and learns the structural features of a graph. For a cascade C_i^T, the input to the GCN layers consists of two parts: the node features and the related adjacency matrix. We first generate an initial embedding for each node in the global cascade graph C_i^T. Node x can be represented as a one-hot vector x ∈ R^N (a 1 × N matrix), where N is the number of nodes, the xth element is 1 and the remaining elements are 0. In addition, considering high-impact and low-impact users, we take both the in-degree and out-degree features of each node into account. Since these feature representations are high-dimensional and sparse, we employ a feature embedding layer to map them into a low-dimensional space and acquire dense real-valued vectors. Formally, the initial embedding vector of each node can be defined as:

X = x W_e,

where W_e ∈ R^{N×d} is the embedding matrix and d is the vector dimension. After obtaining the initial node features, we first build the operator of the graph convolutional neural network for each sub-cascade graph, which is defined as:

Ã^{t_j} = (D^{t_j})^{-1/2} (A^{t_j} + I^{t_j}) (D^{t_j})^{-1/2},

where D^{t_j} and I^{t_j} are the degree matrix (D_ii = Σ_j A_ij) and identity matrix of the sub-cascade graph at time t_j, respectively.

New J. Phys. 26 (2024) 013031 D Yu et al
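A minimal NumPy sketch of this operator, assuming the standard GCN renormalization in which degrees are computed from A + I (the paper's exact normalization may differ):

```python
import numpy as np

def gcn_operator(a):
    """Symmetrically normalized operator for one sub-cascade graph:
    D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_hat = a + np.eye(a.shape[0])
    d = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
op = gcn_operator(a)
print(np.allclose(op, op.T))  # True: symmetric by construction
```

The self-loops added by A + I play the same role as the self-connection of the source post in Definition 2: they prevent a node's own features from being lost during aggregation.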
We then acquire the sub-cascade structural features through a network consisting of two GCN layers. The layer equation is written as follows:

H^{l+1} = σ(Ã^{t_j} H^l W^l + b^l),

where H^l represents the current structural state of the information cascade and l is the layer index. W^l and b^l are the weight and bias of the network. σ represents the activation function; ReLU is used.
Finally, we design a graph pooling to obtain the global structural representation of each sub-cascade graph, which is defined as:

H_sub = (1/N) Σ_{n=1}^{N} H_{u_n},    (7)

where N is the number of nodes in the sub-cascade graph and H_{u_n} is the embedding of node u_n.
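The two-layer propagation rule and the pooling step above can be sketched together as follows; the mean pooling, random weights and toy dimensions are illustrative assumptions, not the paper's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def gcn_forward(op, x, weights, biases):
    # Two GCN layers, H^{l+1} = ReLU(op @ H^l @ W^l + b^l), followed by
    # mean pooling over nodes to get one vector per sub-cascade graph.
    h = x
    for w, b in zip(weights, biases):
        h = relu(op @ h @ w + b)
    return h.mean(axis=0)  # graph-level structural representation H_sub

n, d = 3, 4                  # 3 nodes, embedding dimension 4 (toy sizes)
op = np.eye(n)               # stand-in for the normalized operator
x = rng.normal(size=(n, d))
weights = [rng.normal(size=(d, d)) for _ in range(2)]
biases = [np.zeros(d) for _ in range(2)]
h_sub = gcn_forward(op, x, weights, biases)
print(h_sub.shape)  # (4,)
```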

Cascade size prediction
To make the cascade size prediction, we take time as an explicit feature. The input of this module can be defined as:

X_cas = [H_sub ∥ t],

where t is the time elapsed after the source post is published, and [X ∥ Y] denotes the concatenation of vectors X and Y.
Then we apply a fully connected module comprising two hidden layers to compute the popularity size. Formally, the predicted output can be defined as:

ŷ_cas = W^3 σ(W^2 σ(W^1 X_cas + b^1) + b^2) + b^3,

where W^l and b^l denote the weights and biases of the DNN at layer l, respectively, and ŷ_cas represents the predicted growth size of the information diffusion.
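A sketch of this prediction head, assuming the concatenated input [H_sub ∥ t], two ReLU hidden layers and a scalar output; the weight shapes and random initialization are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

def predict_size(h_sub, t, params):
    # Input is the concatenation [H_sub || t]; two hidden layers and a
    # scalar output, matching the module description above.
    z = np.concatenate([h_sub, [t]])
    (w1, b1), (w2, b2), (w3, b3) = params
    z = relu(z @ w1 + b1)
    z = relu(z @ w2 + b2)
    return float(z @ w3 + b3)

d = 4  # toy structural-embedding dimension
params = [(rng.normal(size=(d + 1, 32)), np.zeros(32)),
          (rng.normal(size=(32, 32)), np.zeros(32)),
          (rng.normal(size=(32, 1)), np.zeros(1))]
y_hat = predict_size(np.ones(d), t=0.5, params=params)
print(type(y_hat))
```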

Dynamic neural network
Since there are no exact equations that directly describe the process and dynamic changes of information cascade diffusion, we propose a dynamic neural network based on a PDE framework to explore the physics information of the diffusion system, which is defined as:

f := dŷ_cas/dt + Π(t, H_sub, ŷ_cas, dŷ_cas/dt, dŷ_cas/dH_sub).

The PDE-like function f represents the dynamics of the information propagation system, creating a dynamical relationship between the cascade prediction and the structural features H_sub through the derivative dŷ_cas/dH_sub. Besides, it explores the dynamic temporal features (such as propagation speed) of cascade propagation, described by the derivative dŷ_cas/dt. The operator Π denotes a deep learning framework. The dynamic neural network takes the derivatives dŷ_cas/dt and dŷ_cas/dH_sub as input values, so the operator is denoted as:

Π = DNN(t, H_sub, ŷ_cas, dŷ_cas/dt, dŷ_cas/dH_sub).

Finally, we leverage the penalization function to construct a PDE-like penalization term added to the cost function. As shown in equation (11), the penalization consists of the spatial and temporal derivatives of the cascade information. The cost function is defined as:

L = (1/N) Σ_{i=1}^{N} (log ŷ_cas,i − log Δy_i^{T_p})² + λ (1/N) Σ_{i=1}^{N} f_i²,    (11)

where N is the number of cascades and Δy_i^{T_p} denotes the ground-truth labels. The dynamic features function is added to the cost function as a penalization term, weighted by the coefficient λ.
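A hedged sketch of such a penalized cost, assuming a squared-log error term (with log(y + 1) to handle zero counts, an assumption about the exact form) and residual values f_i supplied by the dynamics network:

```python
import numpy as np

def pigcn_loss(y_true, y_pred, residuals, lam=1.0):
    """Squared-log prediction error plus a lambda-weighted PDE-like
    penalty on the dynamics residuals f_i, in the spirit of equation (11)."""
    y_true, y_pred, residuals = map(np.asarray, (y_true, y_pred, residuals))
    msle = np.mean((np.log(y_true + 1.0) - np.log(y_pred + 1.0)) ** 2)
    penalty = np.mean(residuals ** 2)
    return msle + lam * penalty

loss = pigcn_loss([10, 20], [10, 20], residuals=[0.0, 0.0])
print(loss)  # 0.0 when predictions are exact and the residuals vanish
```

Note how λ trades off data fit against physics consistency, which is exactly the coefficient swept in the parameter-settings section.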

Dataset
We evaluate the effectiveness of PiGCN with two large-scale publicly available information cascade datasets, Sina Weibo and HEP-PH.
The Sina Weibo dataset is the most popular dataset for information cascade prediction, collected from the largest microblog platform in China [19]. It contains 119 313 tweets posted on 1 June 2016, as shown in table 2. The Weibo dataset contains information such as message identity, publication time, retweet time and retweet relationships. In this paper, our model uses only these spatial and temporal features to explore the impact of spatial and temporal dynamics on the information propagation system; we ignore the impact of content on information diffusion. All retweets are tracked within the next 24 h, and cascades with fewer than 10 retweets are filtered out. Each tweet generates an information cascade. In this experiment, we follow a similar experimental setup to DeepHawkes [19] and CasCN [3]: the observation time is set to T = 1 h, 2 h and 3 h, we keep cascades with publication times between 8 am and 6 pm, and the prediction time is the remaining hours up to 24 h. Finally, we sort the cascades according to their publication time.
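The Weibo preprocessing described above can be sketched as a simple filter; the function name and data layout are hypothetical:

```python
from datetime import datetime

def keep_cascade(publish_time, retweet_times, min_size=10, horizon_h=24):
    """Keep cascades published between 8 am and 6 pm that accumulate at
    least `min_size` retweets within the first `horizon_h` hours."""
    if not (8 <= publish_time.hour < 18):
        return False
    within = [t for t in retweet_times
              if 0 <= (t - publish_time).total_seconds() <= horizon_h * 3600]
    return len(within) >= min_size

pub = datetime(2016, 6, 1, 9, 0)
rts = [datetime(2016, 6, 1, 9, m + 1) for m in range(12)]
print(keep_cascade(pub, rts))  # True
```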
Following previous work [3], we utilize the HEP-PH (high energy physics phenomenology citation graph) dataset to validate the performance of our proposed model. This dataset is usually used to demonstrate the generalization capability of information cascade prediction algorithms. It is available from the e-print arXiv and consists of 34 546 papers with 421 578 edges, published in the period from January 1993 to April 2003. It was originally released for the 2003 KDD (Knowledge Discovery and Data Mining) Cup and contains information such as paper identity, citation relationships and publication times. In our experiment, we excluded cascades shorter than 10 and set the observation time T to 1 year, 2 years and 3 years.
For all datasets, we choose the first 70% of all cascades for training data, 15% for validation, and the rest for testing.
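Since the cascades are sorted by publication time, the split is chronological; a sketch:

```python
def chronological_split(cascades, train=0.70, val=0.15):
    # Cascades are assumed sorted by publication time: the earliest 70%
    # form the training set, the next 15% validation, the rest test.
    n = len(cascades)
    a, b = int(n * train), int(n * (train + val))
    return cascades[:a], cascades[a:b], cascades[b:]

tr, va, te = chronological_split(list(range(100)))
print(len(tr), len(va), len(te))  # 70 15 15
```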

Baselines
As mentioned in section 2, existing approaches for popularity prediction fall into three categories, so we have selected several common and state-of-the-art methods from each. We adopt the feature-based methods proposed in [6,44] to predict cascade growth and DeepHawkes [19] for generative process methods. For deep learning methods, we choose accurate and popular works such as DeepCas [25], CasCN [3], CasGCN [45] and CCasGNN [46] as baselines.
Feature-based models [6,44] employ pre-selected features and linear regression to make information cascade predictions. These methods are based on handcrafted features, including structural and temporal features, which are fed to two predictors, Feature-Linear and Feature-Deep. Feature-Linear uses linear regression to predict the growth size, while Feature-Deep introduces a fully connected network to make the prediction. DeepCas [25] is the first end-to-end deep learning model in the information cascade prediction field. It applies random walks to generate propagation sequences and uses GRUs and an attention mechanism for cascade prediction.
DeepHawkes [19] incorporates deep learning method into point process for cascade prediction, which aims to bridge the gap between prediction and interpretation of information cascade.
CasCN [3] is the first to employ a GNN for cascade prediction. It divides the cascade graph into several sub-cascade graphs according to time information. The structural and temporal features are then obtained by a GNN and an LSTM, respectively.
CasGCN [45] leverages graph convolutional learning method to combine nodes' structural feature with their relative temporal features to predict growth size of information cascades.
CCasGNN [46] learns user embeddings using a collaborative framework of GAT and GCN, stacking positional encodings into the layers of each GNN. It employs the multi-head attention mechanism to capture global relationships.

Evaluation metric
We utilize the two most widely used evaluation metrics to assess the performance of our proposed model.

Mean squared log-transformed error (MSLE) is the most widely used metric for evaluating cascade prediction performance. It is calculated as:

MSLE = (1/N) Σ_{i=1}^{N} (log ŷ_cas,i − log Δy_i^{T_p})².

Mean absolute percentage error (MAPE) is a commonly used metric that measures the average deviation between the predicted and true cascade popularity. It is defined as:

MAPE = (1/N) Σ_{i=1}^{N} |log ŷ_cas,i − log Δy_i^{T_p}| / log Δy_i^{T_p},

where, for both metrics, N is the total number of information cascades, Δy_i^{T_p} is the true growth size and ŷ_cas,i is the predicted growth size.
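A sketch of both metrics; cascade prediction papers commonly apply the transforms log2(y + 1) (and log2(y + 2) in the MAPE denominator, to avoid division by zero), so treat those exact forms as assumptions rather than this paper's definition:

```python
import numpy as np

def msle(y_true, y_pred):
    # Mean squared error between log2-transformed sizes; the log base
    # and the +1 shift vary across papers.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((np.log2(y_true + 1) - np.log2(y_pred + 1)) ** 2)

def mape(y_true, y_pred):
    # Average relative deviation in log space.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(np.log2(y_pred + 1) - np.log2(y_true + 1))
                   / np.log2(y_true + 2))

print(msle([3, 7], [3, 7]))  # 0.0
```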

Parameter settings
To implement the baselines on Sina Weibo, we adopt the parameter settings applied in CasCN [3]. We set the embedding dimension of users to 32, and the other parameters of each model are set to their default values. The detailed parameters of our own model PiGCN are set as follows, referring to CasCN [3] and CCasGNN [46]. In the structural representation capturing module, we employ two GCN layers with 32 units to capture the latent spatial embedding of the cascade. In the cascade size prediction module, there are two hidden layers of 32 units and an output layer of one unit. As for the dynamic neural network, it comprises three hidden layers (chosen from 1, 3, 5 and 7 layers) of 32 units each, one output unit and ReLU as the activation function. We optimize the model with the Adam algorithm. The learning rate is initialized to 0.005 and the dropout rate is 0.5. For HEP-PH, we use the same parameter settings as for the Sina Weibo dataset. The description of our proposed model is shown in table 3. In addition, we investigate the impact of the coefficient λ in the loss function. We test the model with λ ∈ {0.5, 1.0, 1.5, 2.0} and manually evaluate the performance to determine the optimal value. As shown in figure 5, the experimental results for both MSLE and MAPE are generally best when λ = 1. Therefore, we choose λ = 1 for our loss function.

Performance comparison
In this section, the results of our proposed model PiGCN are compared with the baselines on the cascade prediction scenarios: the retweet popularity of Sina Weibo under three observation windows, and HEP-PH. From table 4, we can draw the following observations. Note that we adopt the best reported result of each baseline (see the Baselines section), using the same Weibo dataset as ours.
Firstly, the performance of all models improves as the selected cascade scale grows, which shows that larger observed cascades contribute to performance, though only modestly.
Secondly, the baselines based on deep learning, such as DeepCas, DeepHawkes, CasCN, CasGCN and CCasGNN, as well as our proposed model, outperform the methods using traditional handcrafted features, which demonstrates an obvious advantage of deep learning in growth size prediction. This is because deep learning methods can automatically learn effective, high-dimensional representations of users.
Thirdly, the Feature-Linear method performs better than the Feature-Deep method on the Sina Weibo dataset, which illustrates that merely increasing the number of network layers does not lead to better performance. Besides, the methods with a GNN and an adjacency-information module, such as CasCN, CasGCN and CCasGNN, perform better than DeepHawkes. This is because GNNs have a strong ability to extract structural features from graph data.
Fourthly, our proposed method PiGCN performs best on both MSLE and MAPE among all baselines across the three observation windows of Sina Weibo and on HEP-PH. Our model achieves MSLE values of 3.486, 3.207 and …726, and 0.421, while the MAPE values are 0.834, 0.705 and 0.623. This is because PiGCN is able to capture the dynamic evolution of cascade propagation and acquire information spreading status such as propagation velocity and structure evolution. However, the improvement of our proposed model on HEP-PH is not as obvious as on Sina Weibo. Overall, the comparison between our proposed model and the other baselines demonstrates the effectiveness of incorporating dynamic changes in both structural and temporal characteristics for popularity prediction. Finally, under the same experimental conditions, CasCN constructs a transition matrix with time complexity O(N³), while CCasGNN produces two adjacency matrices with space complexity O(N²). Our model needs to create only one adjacency matrix with time complexity O(N), and the dynamic neural network uses only partial derivatives to obtain dynamic information, also with time complexity O(N).

Ablation study
We perform ablation studies to examine the advanced performance of PiGCN. To better investigate the contribution of each module in PiGCN, we implement the following variants: PiGCN_noTime: to demonstrate the effectiveness of the Dynamic Neural Network component, we delete this module and retain only the Structural Representation Capturing and Cascade Size Prediction components for popularity prediction.
PiGCN_noSpace: to delve into the effectiveness of spatial dynamics, we delete the derivative dŷ_cas/dH_sub and retain the temporal information of the model.
PiGCN_LSTM: to explore the contribution of dynamic spreading status to the model compared with an RNN, we replace the dynamic neural network component with an LSTM.
The experimental results are presented in table 5; we can observe that the full PiGCN performs better than the other variants.
The performance of PiGCN_noTime is worse than that of PiGCN_noSpace, which illustrates that temporal dynamic features contribute more than spatial dynamic features in cascade prediction. Moreover, the comparison between PiGCN_LSTM and PiGCN demonstrates that modeling the dynamic evolution and spreading status of cascade propagation at any time leads to better performance. While LSTM only processes sequence data to obtain temporal features and fails to extract the dynamic changes contained in the sequence, our model PiGCN leverages derivatives and constructs a PDE-like function to explore the potential dynamic changes of information cascade diffusion, which leads to better growth size prediction.
In summary, spreading status, such as propagation velocity and propagation depth, is a critical feature in PiGCN. The PDE-like network captures the diffusion status at any time through its derivative form, which is essential to the performance improvement.
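As a toy illustration of derivative-based spreading status (not the paper's implementation; the function name and the finite-difference discretization are our assumptions), the propagation velocity can be approximated from the cumulative sizes of the sub-cascade sequence:

```python
def propagation_velocity(sizes, dt):
    """Forward-difference approximation of the temporal propagation rate
    d(cas)/dt, given cumulative sub-cascade sizes sampled every dt time units."""
    return [(sizes[i + 1] - sizes[i]) / dt for i in range(len(sizes) - 1)]
```

For a cascade whose cumulative size grows as 1, 3, 6, 10 over half-hour intervals, the estimated rates are 4, 6 and 8 users per hour, capturing the acceleration of spread that a purely sequential model such as an LSTM does not expose explicitly.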

Conclusion
In this paper, we propose a novel model called PiGCN for information cascade prediction, which takes structural features, temporal features and dynamic spreading status into account to learn the representation of each information cascade graph. To obtain temporal features, we first divide the cascade graph into a sub-cascade graph sequence according to retweet time. Since GCNs have a strong ability to handle graph-structured data and learn interaction relationships between nodes, we use GCN layers and graph pooling to capture latent structural features. Furthermore, since derivatives are able to extract the dynamic spreading status, such as propagation velocity and structure evolution, we design a dynamic module using a PDE-like network to extract the propagation dynamic representation of the cascade graph. Finally, we conduct extensive experiments on real-world datasets, and the results demonstrate that our proposed approach outperforms mainstream models for cascade prediction. In brief, PDEs can be solved via neural networks; hence, we use a PDE-based diffusion model to simulate information cascades in complex networks. Our results also shed light on the causality of prediction: for the real world, they demonstrate that the information spreading speed (time) and the initial spreading graph (space) are significant for cascade prediction.
The growth size ∆y_i^{Tp} of an information cascade can be denoted as ∆y_i^{Tp} = |y_i^{T+Tp}| − |y_i^{T}|; when Tp is large enough, |y_i^{T+Tp}| is regarded as the final cascade size.

Figure 2 .
Figure 2. An illustration of the information cascade graph C_i^T of source post p_i. Here, U0 represents the user of the source post, and the set U = {u_i} includes the remaining users within the observation window of the cascade in the social network. We aim to predict the next user to be influenced and the final number of users during the time interval Tp.

Figure 3 .
Figure 3. An overview of the proposed PiGCN framework, employing a PDE-like penalization function and a graph convolutional network (GCN) for predicting cascade growth size. The framework consists of three major components: (a) Structural representation: this module first divides the global cascade graph into a sequence of sub-cascade graphs according to the observation time interval, then takes each sub-cascade graph as input and leverages two GCN layers to learn latent spatial features over time. (b) Cascade size prediction: this component takes both spatial and temporal features as input variables and predicts the final cascade popularity of information diffusion with a fully-connected neural network. (c) Dynamic neural network: this module uses the physical properties of derivatives to compute the derivatives with respect to time and space, respectively. It enables the extraction of dynamic representations, including the temporal propagation rate cas_t and the spatial evolutionary characteristics cas_x of the diffusion process. A deep neural network is then used to model the dynamics equation Π(t, x, cas, cas_t, cas_x). (d) To enhance robustness and improve prediction performance, the spatial and temporal dynamics are integrated to formulate a PDE-like equation, which is subsequently incorporated as a penalty term within the loss function during training to obtain optimal parameters.

Figure 4 .
Figure 4. Illustration of a sample cascade, its sub-cascade graph sequence and the corresponding adjacency matrices. (a) A cascade graph within the observation time. The set U = {u_i} contains the active users within the observation window of the cascade in the social network. (b) The sequence of sub-cascade graphs G_{T_i} is derived from the cascade graph based on time intervals. The source post user U0 adds a self-loop to safeguard against the loss of its own information. (c) The corresponding adjacency matrix sequence A_{T_i} is obtained from the sub-cascade graphs. This matrix series enables us to capture the local structure of the propagation and explore the relationships between users. A value of 1 indicates a connection between two users, while 0 indicates the absence of a retweet.
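The construction in panels (b) and (c) can be sketched as follows (a minimal illustration, assuming retweet edges are given as timestamped (src, dst) pairs; the function name and argument layout are ours):

```python
def sub_cascade_adjacency(edges, timestamps, n_users, t_splits):
    """Build the adjacency-matrix sequence A_{T_i} from timestamped retweet edges.
    edges: list of (src, dst) user indices; timestamps[i] is the time of edges[i].
    Each matrix in the sequence contains all edges observed up to split time t,
    with a self-loop on the source post user U0 (index 0) as in figure 4(b)."""
    seq = []
    for t in t_splits:
        A = [[0] * n_users for _ in range(n_users)]
        A[0][0] = 1  # self-loop on the source user U0
        for (u, v), ts in zip(edges, timestamps):
            if ts <= t:
                A[u][v] = 1  # 1 = retweet link between two users, 0 = no retweet
        seq.append(A)
    return seq
```

Each matrix can then be fed, together with the user features of its sub-cascade graph, into the GCN layers of the structural representation module.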

Figure 5 .
Figure 5. Under the MSLE and MAPE metrics, we conducted a performance comparison of our proposed model PiGCN on three Sina Weibo datasets with observation times of one hour, two hours and three hours. The evaluation included four values of the coefficient λ in the loss function: 0.5, 1.0, 1.5 and 2.0. Through manual assessment, we determined the optimal parameter; generally, the experimental results for both MSLE and MAPE were most favorable when λ = 1.

Table 1 .
Experimental results of cascade prediction on Sina Weibo.

Table 2 .
Basic statistics of our datasets.

Table 3 .
The description of PiGCN parameters.

Table 4 .
Experimental results of cascade prediction on Sina Weibo and HEP-PH.

Table 5 .
Prediction performances of variants.