Early Warning Analysis for Social Diffusion Events

There is considerable interest in developing predictive capabilities for social diffusion processes, for instance to permit early identification of emerging contentious situations, rapid detection of disease outbreaks, or accurate forecasting of the ultimate reach of potentially viral ideas or behaviors. This paper proposes a new approach to this predictive analytics problem, in which analysis of meso-scale network dynamics is leveraged to generate useful predictions for complex social phenomena. We begin by deriving a stochastic hybrid dynamical systems (S-HDS) model for diffusion processes taking place over social networks with realistic topologies; this modeling approach is inspired by recent work in biology demonstrating that S-HDS offer a useful mathematical formalism with which to represent complex, multi-scale biological network dynamics. We then perform formal stochastic reachability analysis with this S-HDS model and conclude that the outcomes of social diffusion processes may depend crucially upon the way the early dynamics of the process interacts with the underlying network's community structure and core-periphery structure. This theoretical finding provides the foundations for developing a machine learning algorithm that enables accurate early warning analysis for social diffusion events. The utility of the warning algorithm, and the power of network-based predictive metrics, are demonstrated through an empirical investigation of the propagation of political memes over social media networks. Additionally, we illustrate the potential of the approach for security informatics applications through case studies involving early warning analysis of large-scale protest events and politically-motivated cyber attacks.


Introduction
Understanding the way information, behaviors, innovations, and diseases propagate over social networks is of great importance in a wide variety of domains [e.g., [1][2][3][4]], including national security [e.g., [5][6][7][8][9][10][11][12][13]]. Of particular interest are predictive capabilities for social diffusion, for instance to enable early warning concerning the emergence of a violent conflict or outbreak of an epidemic. As a consequence, vast resources are devoted to the task of predicting the outcomes of diffusion processes, but the quality of such predictions is often poor. It is tempting to conclude that the problem is one of insufficient information. Clearly diffusion phenomena which "go viral" are qualitatively different from those that don't, or they wouldn't be so dominant, the conventional wisdom goes, so in order to make good predictions we must collect enough data to allow these crucial differences to be identified.
Recent research calls into question this intuitively plausible premise and, indeed, indicates that intuition can be an unreliable guide to constructing successful prediction methods. For example, studies of the predictability of popular culture indicate that the intrinsic attributes commonly believed to be important when assessing the likelihood of adoption of cultural products, such as the quality of the product itself, do not possess much predictive power [14][15][16]. This research offers evidence that, when individuals are influenced by the actions of others, it may not be possible to obtain reliable predictions using methods which focus on intrinsics alone; instead, it may be necessary to incorporate aspects of social influence into the prediction process. Very recently a handful of investigations have shown the value of considering even simple and indirect measures of social influence, such as early social media "buzz", when forming predictions. This work has produced useful prediction algorithms for an array of social phenomena, including markets [16][17][18][19][20][21], political and social movements [17,22], mobilization and protest behavior [23,24], epidemics [17,25], social media dynamics [26,27], and the evolution of cyber threats [28].
Recognizing the importance of accounting for social influence, this paper proposes a predictive methodology which explicitly considers the way individuals influence one another through their social networks. It is expected that prediction algorithms which are based, in part, on network dynamics metrics will outperform existing methods and be applicable to a wider range of diffusion systems. We begin by developing a stochastic hybrid dynamical systems (S-HDS) model for diffusion processes taking place over social networks with realistic topologies. This modeling approach is inspired by recent work in biology demonstrating that S-HDS offer a useful mathematical formalism with which to represent multi-scale biological network dynamics [29][30][31][32][33]. An S-HDS is a feedback interconnection of a discrete-state stochastic process, such as a Markov chain, with a family of continuous-state stochastic dynamical systems [34]. Combining discrete and continuous dynamics in this way provides a rigorous, expressive, and computationally-tractable framework for modeling the dynamics of the complex, highly-evolved networks that are ubiquitous in biological systems [35], and we show in this paper that the S-HDS framework is also well-suited to the task of modeling the network dynamics which underlie social diffusion. [...] procedure and the warning algorithm that is derived based on these results. A detailed mathematical presentation of the modeling and analysis methods is provided in Appendices One and Two.

Problem Formulation
The objective of this paper is to develop a scientifically-rigorous, practically-implementable methodology for performing early warning analysis for social diffusion events. Roughly speaking, we suppose that some "triggering event" has taken place or contentious issue is emerging, and we wish to determine, as early as possible, whether this event or issue will ultimately generate a large, self-sustaining reaction, involving the diffusion of discussions and actions through a substantial segment of a population, or will instead quickly dissipate. An illustrative example of the basic idea is provided by the contrasting reactions to 1.) the publication in September 2005 of cartoons depicting Mohammad in the Danish newspaper Jyllands-Posten, and 2.) the lecture given by Pope Benedict XVI in September 2006 quoting controversial material concerning Islam. While each event appeared at the outset to have the potential to trigger significant protests, the "Danish cartoons" incident ultimately led to substantial Muslim mobilization, including massive protests and considerable violence, while outrage triggered by the pope lecture quickly subsided with essentially no violence. It would obviously be very useful to have the capability to distinguish these two types of reaction as early in the event lifecycle as possible.
In order to state the early warning problem more precisely, we make a few assumptions:
▪ We suppose that the triggering event or emerging situation is given. Note that this is often the case in national security settings, and that additionally there exist techniques for discovering such events or issues in an automated or semi-automated manner [e.g., 24,27].
▪ It is assumed that data are available which provide a view of the early reaction of a relevant population to the trigger or issue of interest. These data can be only indirectly related to the event; for example, in this paper the primary data source is social media discussions (e.g., blog posts) while the events of interest are "real-world" activities such as protests.
▪ It is expected that the "customer" for the analysis provides at least qualitative definitions of the population of interest and the scale of reaction for which a warning is desired. Thus, for instance, in the example above, it might be of interest to anticipate Muslim reaction to the triggering incident, and to obtain a warning alert if the reaction is likely to eventually include self-sustaining, violent protests.
We formulate the early warning problem as a classification task. More specifically, given a triggering incident, one or more information sources which reflect (perhaps indirectly) the reaction to this trigger by a population of interest (e.g., social media discussions, intelligence reporting), and a definition for what constitutes an "alarming" reaction, the goal is to design a classifier which accurately predicts, as early as possible, whether or not reaction to the event will ultimately become alarming. Note that a more mathematically precise statement of this warning problem is given in Appendix Two. Observe that this type of warning analysis is both important in applications and "easier" to accomplish than more standard prediction or forecasting goals.
Consider, as a familiar non-security example, the case of movie success. It is shown in [14][15][16] that it is likely to be impossible to predict movie revenues, even very roughly, based on the intrinsic information available concerning the movie ex ante (e.g., personnel, genre, critic reviews).

S-HDS Social Diffusion Model
In social diffusion, individuals are affected by what others do. This is easy to visualize in the case of disease transmission, with infections being passed from person to person. Information, innovations, behaviors, and so on can also propagate through a population, as individuals become aware of a new piece of information or an activity and are persuaded of its relevance and utility through their social and information networks. The dynamics of social diffusion can therefore depend upon the topological features of the pertinent networks, such as the presence of highly connected blogs in a social media network (see, e.g., [4]). Indeed, social scientists have developed extensive theories explaining the role of social networks in the dynamics of social diffusion and mobilization (see the books [2][3][4] and the references therein, and also Appendix One, for discussions of this work). This dependence suggests that, in order to understand the predictability of social diffusion phenomena and in particular to identify features which possess predictive power, it is necessary to conduct the analysis using social and information network models with realistic topologies.
The social diffusion models examined in this study possess networks with three topological properties that are ubiquitous in real-world social and information networks and which have the potential to impact diffusion dynamics [36]:
▪ transitivity: the property that the network neighbors of a given individual have a heightened probability of being connected to one another;
▪ community structure: the presence of densely connected groupings of individuals which have only relatively few links to other groups;
▪ core-periphery structure: the presence of a small group of "core" individuals which are densely connected to each other and are also close to the other individuals in the network.
Additionally, we permit our network models to possess right-skewed degree distributions, in which most individuals have only a few network neighbors while a few individuals have a great many neighbors, as such networks are common in online settings. The manner in which the communities and the core-periphery are arranged will be said to define the network's meso-scale structure. For convenience of exposition, the subsets of individuals specified by a partitioning of the network into communities and into a core and periphery will sometimes be referred to as the partition elements, and the collection of these (community and core-periphery) subsets will be called the network partition.
In order to deal effectively with networks possessing realistic topologies, and in particular to represent and analyze the way social dynamics is affected by the meso-scale structure, we model social diffusion in a manner which explicitly separates the individual, or "micro", dynamics from the collective dynamics. More specifically, we adopt a multi-scale modeling framework consisting of three network scales:
▪ a micro-scale, for modeling the behavior of individuals;
▪ a meso-scale, which represents the interaction dynamics of individuals within the same network partition element (community or core/periphery);
▪ a macro-scale, which characterizes the interaction between partition elements.
The micro-scale quantifies the way individuals combine their own inherent preferences or attributes with the influences of others to arrive at their chosen courses of action. It is shown in Appendix One that separating the micro-scale dynamics from the meso- and macro-scale activity permits the dependence of this decision-making process on the social network to be characterized in a surprisingly straightforward way. The meso- and macro-scale components of the proposed modeling framework together quantify the way the decision-making processes of individuals interact to produce collective behavior at the population level. The role of the meso-scale model is to quantify and illuminate the manner in which behaviors evolve within each network partition element (community, core, or periphery), while the macro-scale model captures the interactions between these elements. The primary assumptions are that interactions between individuals belonging to the same network partition element can be modeled more simply than those between individuals from distinct partition elements, and that the latter interactions are constrained by the "meta-network" which defines the dependencies between the partition elements.
This perspective offers a number of advantages. For example, at the micro-scale it is possible to unify behaviors which appear different phenomenologically but actually possess equivalent dynamics. We show in Appendix One that the social dynamics associated with classical "utility-maximizing" behavior and those arising from individuals attempting to infer information by observing the actions of others can be represented with the same micro-scale model. Additionally, separating the individual and collective dynamics supports efficient and flexible model building and simplifies the process of estimating model components from empirical data [39].
Dividing the collective dynamics into meso- and macro-scales also provides a mathematically-tractable, sociologically-sensible means of representing complex social network dynamics. For instance, because network communities are topological structures corresponding to localized social settings in the real world, determined by workplace, family, physical neighborhood, and so on, it is natural both mathematically and sociologically to model the interactions of individuals within communities as qualitatively different (e.g., more frequent and homogeneous) than those between communities.
Developing a mathematically-rigorous, expressive, scalable, and computationally-tractable framework within which multi-scale social network diffusion models can be constructed is, of course, a challenging undertaking. Recent work in systems biology has demonstrated that stochastic hybrid dynamical systems (S-HDS) provide a useful mathematical formalism with which to represent biological network dynamics that possess multiple temporal and spatial scales [29][30][31][32][33]. An S-HDS is a feedback interconnection of a discrete-state stochastic process, such as a Markov chain, with a family of continuous-state stochastic dynamical systems [34]. Thus the discrete system dynamics depends on the continuous system state, perhaps because different regions of the continuous state space are associated with different matrices of Markov state transition probabilities, and the particular continuous system which is "active" at a given time depends on the discrete system state. Combining discrete and continuous dynamics in this way provides an effective framework for modeling the dynamics of the complex, highly-evolved networks that are ubiquitous in biological systems [35]. For example, the rigorous yet tractable integration of switching behavior with continuous dynamics enabled by the S-HDS model allows accurate and efficient representation of biological phenomena evolving over disparate temporal scales [29][30][31] and spatial scales [32,33].
Inspired by this work, in this paper we apply the S-HDS framework to social diffusion dynamics evolving over multiple network scales. Appendix One provides a detailed discussion of the proposed S-HDS social diffusion model and demonstrates the effectiveness with which this formalism captures multi-scale network dynamics. As an intuitive illustration of the way S-HDS enable complex network phenomena to be efficiently represented, consider the task of modeling diffusion on a network that possesses community structure. As shown in Figure 1, this diffusion consists of two components: 1.) intra-community dynamics, involving frequent interactions between individuals within the same community and the resulting gradual change in the concentrations of "infected" (red) individuals, and 2.) inter-community dynamics, in which the "infection" jumps from one community to another, for instance because an infected individual "visits" a new community. S-HDS models offer a natural framework for representing these dynamics, with the S-HDS continuous system modeling the intra-community dynamics (e.g., via stochastic differential equations), the discrete system capturing the inter-community dynamics (e.g., using a Markov chain), and the interplay between these dynamics being represented by the S-HDS feedback structure. A detailed description of the manner in which S-HDS models can be used to capture social diffusion on networks with realistic topologies is given in Appendix One.
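The feedback structure described above can be made concrete with a small simulation. The following is a minimal sketch, not the model of Appendix One: the logistic drift, the noise term, the jump-probability formula, and every name and parameter value (`simulate_shds`, `beta`, `sigma`, `jump_rate`, `seed_level`) are assumptions chosen only for illustration. The continuous state is the fraction of "infected" individuals in each community, the discrete state records which communities have been reached, and the probability of a jump to a new community depends on the current continuous state.

```python
import math
import random


def simulate_shds(n_comm=3, beta=1.0, sigma=0.1, jump_rate=0.05,
                  seed_level=0.01, dt=0.01, steps=2000, rng=None):
    """Toy S-HDS: a logistic SDE inside each community, random jumps between them.

    All modeling choices here are illustrative assumptions.
    """
    rng = rng or random.Random(0)
    active = [True] + [False] * (n_comm - 1)   # discrete state: communities reached
    x = [seed_level] + [0.0] * (n_comm - 1)    # continuous state: infected fractions
    for _ in range(steps):
        # Continuous dynamics: one Euler-Maruyama step per active community.
        for k in range(n_comm):
            if active[k]:
                drift = beta * x[k] * (1.0 - x[k])
                noise = sigma * math.sqrt(max(x[k] * (1.0 - x[k]), 0.0))
                x[k] += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                x[k] = min(max(x[k], 0.0), 1.0)  # keep the fraction in [0, 1]
        # Discrete dynamics: the jump probability feeds back on the continuous state.
        pressure = sum(x[k] for k in range(n_comm) if active[k])
        p_jump = 1.0 - math.exp(-jump_rate * pressure * dt)
        for k in range(n_comm):
            if not active[k] and rng.random() < p_jump:
                active[k] = True       # the "infection" jumps to community k
                x[k] = seed_level      # and seeds its continuous dynamics
    return active, x


if __name__ == "__main__":
    reached, fractions = simulate_shds()
    print(reached, [round(v, 3) for v in fractions])
```

Setting `jump_rate=0.0` removes the inter-community component entirely, which leaves a purely continuous single-community process and makes the role of the discrete system easy to see.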

Predictability Assessment
One hallmark of social diffusion processes is their ostensible unpredictability: phenomena from hits and flops in cultural markets to financial system bubbles and crashes to political upheavals appear resistant to predictive analysis (although there is no shortage of ex post explanations for their occurrence!). It is not difficult to gain an intuitive understanding of the basis for this unpredictability. Individual preferences and susceptibilities are mapped to collective outcomes through an intricate, dynamical process in which people react individually to an environment consisting largely of others who are reacting likewise. Because of this feedback dynamics, the collective outcome can be quite different from one implied by a simple aggregation of individual preferences. This section provides a brief, intuitive introduction to a systematic approach to assessing the predictability of social diffusion processes and identifying process observables which have exploitable predictive power (see Appendix Two, and also [17,39], for the mathematical details).
Figure 1. Modeling diffusion on networks with community structure via S-HDS. The cartoon at top left depicts a network with three communities. The cartoon at right illustrates diffusion within a community k and between communities i and j. The schematic at bottom left shows the basic S-HDS feedback structure; the discrete and continuous systems in this framework model the inter-community and intra-community diffusion dynamics, respectively.
Consider a simple model for product adoption, in which individuals combine their own preferences and opinions regarding the available options with their observations of the actions of others to arrive at their decisions about which product to adopt. As discussed above, it can be quite difficult to determine which characteristics of the process by which adoption decisions propagate, if any, are predictive of things like the speed or ultimate reach of the propagation [15][16][17].
In Appendix Two we propose a mathematically rigorous approach to predictability assessment which, among other things, permits identification of features of social dynamics which should have predictive power. We now summarize this assessment methodology.
The basic idea behind the proposed approach to predictability analysis is simple and natural: we assess predictability by answering questions about the reachability of diffusion events. To obtain a mathematical formulation of this strategy, the behavior about which predictions are to be made is used to define the system state space subsets of interest (SSI), while the particular set of candidate measurables under consideration allows identification of the candidate starting set (CSS). Roughly speaking, the proposed approach to predictability assessment involves determining how probable it is to reach the SSI from a CSS and deciding if these reachability properties are compatible with the prediction goals. If a system's reachability characteristics are incompatible with the given prediction question (if, say, "hit" and "flop" states in the online market example are both fairly likely to be reached from the CSS), then the situation is deemed unpredictable.
This setup permits the identification of candidate predictive measurables: these are the measurable states and/or parameters for which predictability is most sensitive (see Appendix Two).
Continuing with the online market example, if trajectories with positive early market share rates r(A) are much more likely to yield market share dominance for A than are trajectories with negative early r(A), then the situation is unpredictable (because the outcome depends sensitively on r(A) and this quantity is not measured). Moreover, this analysis suggests that market share rate is likely to possess predictive power, so it may be possible to increase predictability by adding the capacity to measure this quantity.
A key element of this approach to predictability assessment is the proposed method of estimating the probability of reaching the SSI from a CSS. Note that in a typical assessment such estimates must be computed for several CSS in order to adequately explore the space of candidate predictive features, so that it is crucial to perform these estimates efficiently. In Appendix Two we develop an "altitude function" approach to this reachability problem, in which we seek a scalar function of the system state that permits conclusions to be made regarding reachability without computing system trajectories. We refer to these as altitude functions to provide an intuitive sense of their analytic role: if some measure of "altitude" is low on the CSS and high on an SSI, and if the expected rate of change of altitude along system trajectories is nonincreasing, then it is unlikely for trajectories to reach this SSI from the CSS. Moreover, the difference in altitudes between the CSS and SSI gives a measure of the probability of reaching the latter from the former. Because the reach probability is computed for sets of states without simulating system trajectories, the altitude function method offers an extremely efficient way to explore the space of candidate predictive features.
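The altitude-function argument can be illustrated with a standard supermartingale bound. What follows is a sketch under assumptions that are not stated in this section, and the precise conditions used in Appendix Two may differ. The symbols $A$, $\mathcal{L}$, $a_0$, and $\gamma$ are introduced here only for illustration: the altitude $A$ is assumed nonnegative, at most $a_0$ on the CSS, at least $\gamma > a_0$ on the SSI, and nonincreasing in expectation along system trajectories.

```latex
% Assumed setting: A(x) >= 0 on the whole state space, and the generator
% \mathcal{L} of the process satisfies \mathcal{L}A(x) <= 0, so that A(x_t)
% is a nonnegative supermartingale.
\[
  \Pr\Big\{ \sup_{t \ge 0} A(x_t) \ge \gamma \;\Big|\; x_0 \Big\}
  \;\le\; \frac{A(x_0)}{\gamma}
  \qquad\Longrightarrow\qquad
  \Pr\{\text{reach SSI from CSS}\} \;\le\; \frac{a_0}{\gamma}.
\]
```

In this standard form the two altitudes enter through their ratio: the larger the altitude gap between the CSS and the SSI, the smaller the bound on the reach probability, and no trajectory has to be simulated to obtain it.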
We have applied the predictability assessment methodology summarized above to the social diffusion prediction problem, and we now summarize the main conclusions of this study; a more complete discussion of this investigation is given in Appendix Two. The analysis uses the mathematically rigorous predictability assessment procedure summarized above, in combination with empirically-grounded S-HDS models for social dynamics, to characterize the predictability of social diffusion on networks with realistic degree distributions, transitivity, community structure, and core-periphery structure. The main finding of the study, from the perspective of the present paper, is that the predictability of these diffusion models depends crucially upon social and information network topology, and in particular on the community and core-periphery structures of these networks.
In order to describe these theoretical results more quantitatively and leverage them for prediction, it is necessary to specify mathematical definitions for network communities and core-periphery structure. There exist several qualitative and quantitative definitions for the concept of community structure in networks. Here we adopt the modularity-based definition proposed in [40], whereby a good partitioning of a network's vertices into communities is one for which the number of edges between putative communities is smaller than would be expected in a random partitioning. To be concrete, a modularity-based partitioning of a network into two communities maximizes the modularity Q, defined as
$$Q = \frac{1}{4m}\, s^{T} B s,$$
where m is the total number of edges in the network, the partition is specified with the elements of vector s by setting $s_i = 1$ if vertex i belongs to community 1 and $s_i = -1$ if it belongs to community 2, and the matrix B has elements $B_{ij} = A_{ij} - k_i k_j / 2m$, with $A_{ij}$ and $k_i$ denoting the elements of the network adjacency matrix and the degree of vertex i, respectively. Partitions of the network into more than two communities can be constructed recursively [40]. Note that modularity-based community partitions can be efficiently computed for large social networks, and can be constructed even with incomplete network topology data [39].
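As a small worked example of the two-community modularity, the sketch below evaluates Q for a given assignment vector s on an undirected graph stored as a dense adjacency matrix. The function name `modularity_two_way` and the toy graph (two triangles joined by a single edge) are assumptions made for illustration; the sketch only scores a partition and does not search for the maximizing one.

```python
def modularity_two_way(adj, s):
    """Modularity Q = s^T B s / (4m) for a two-community split.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    s:   list with s[i] = +1 (community 1) or -1 (community 2).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # the degrees of an undirected graph sum to 2m
    total = 0.0
    for i in range(n):
        for j in range(n):
            b_ij = adj[i][j] - degree[i] * degree[j] / two_m
            total += b_ij * s[i] * s[j]
    return total / (2.0 * two_m)  # dividing by 2 * (2m) gives the 1/(4m) factor


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

natural_split = [1, 1, 1, -1, -1, -1]      # one triangle per community
alternating_split = [1, -1, 1, -1, 1, -1]  # cuts through both triangles

if __name__ == "__main__":
    print(modularity_two_way(adj, natural_split))
    print(modularity_two_way(adj, alternating_split))
```

The split that follows the two triangles scores higher than the alternating split, which matches the intuition that a good partition leaves few edges between communities.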
With this definition in hand, we are in a position to present the first candidate predictive feature nominated by the theoretical predictability assessment: the presence of early diffusion activity in numerous distinct network communities should be a reliable predictor that the ultimate reach of the diffusion will be large (see Appendix Two). In what follows, propagation dynamics which possess this characteristic will be said to exhibit significant early dispersion across network communities. Note that this measure should be more predictive than the early volume of diffusion activity (the latter has recently become a fairly standard measure [e.g., 19,20]). A cartoon illustrating the basic idea behind this result is given in Figure 2.
Figure 2. Early dispersion across communities is predictive. The cartoon illustrates the predictive feature associated with community structure: social diffusion initiated with five "seed" individuals is much more likely to propagate widely if these seeds are dispersed across three communities (left) rather than concentrated within a single community (right). Note that in Appendix Two this result is established for networks of realistic scale and not simply for "toy" networks like the one shown here.
Analogously to the situation with network communities, there exists a wide range of qualitative and quantitative descriptions of the core-periphery structure found in real-world networks.
Here we adopt the characterization of network core-periphery which results from k-shell decomposition, a well-established technique in graph theory that is summarized in, for instance, [41].
To partition a network into its k-shells, one first removes all vertices with degree one, repeating this step if necessary until all remaining vertices have degree two or higher; the removed vertices constitute the 1-shell. Continuing in the same way, all vertices with degree two (or less) are recursively removed, creating the 2-shell. This process is repeated until all vertices have been assigned to a k-shell. The shell with the highest index, the $k_{max}$-shell, is deemed to be the core of the network.
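The peeling procedure just described can be sketched in a few lines. This is a minimal illustration that assumes an undirected graph stored as a dictionary mapping each vertex to the set of its neighbors; the function names and the toy graph are hypothetical. Following the description above, peeling starts at k = 1, so an isolated vertex would be placed in the 1-shell.

```python
def k_shell_decomposition(adjacency):
    """Map each vertex to its k-shell index by repeated peeling.

    adjacency: dict {vertex: set of neighboring vertices}, undirected.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 1
    while remaining:
        # Remove vertices of current degree <= k until none are left.
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            for v in peel:
                shell[v] = k
                remaining.discard(v)
            for v in peel:
                for u in adjacency[v]:
                    if u in remaining:
                        degree[u] -= 1  # u lost a neighbor that was just peeled
            peel = [v for v in remaining if degree[v] <= k]
        k += 1
    return shell


def core_vertices(shell):
    """Vertices in the shell with the highest index (the network core)."""
    k_max = max(shell.values())
    return {v for v, k in shell.items() if k == k_max}


# Toy graph: a 4-clique {0, 1, 2, 3} with a pendant chain 3-4-5.
toy = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}

if __name__ == "__main__":
    shells = k_shell_decomposition(toy)
    print(shells, core_vertices(shells))
```

In the toy graph the chain vertices 4 and 5 are peeled in the first pass, since removing vertex 5 lowers the degree of vertex 4 to one, and the clique forms the core.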
Given this definition, we are in a position to report the second candidate predictive feature nominated by our theoretical predictability assessment: early diffusion activity within the network $k_{max}$-shell should be a reliable predictor that the ultimate reach of the diffusion will be significant (see Appendix Two). In particular, this measure should be more predictive than the early volume of diffusion activity. An intuitive illustration of this result is depicted in Figure 3.

Early Warning Method
Figure 3. Early diffusion within the core is predictive. The cartoon illustrates the predictive feature associated with k-shell structure: social diffusion initiated with three "seed" individuals is much more likely to propagate widely if these seeds reside within the network's core (left) rather than at its periphery (right). Note that in Appendix Two this result is established for networks of realistic scale and not simply for "toy" networks like the one shown here.
We are now in a position to present an early warning method which is capable of accurately predicting, very early in the lifecycle of a diffusion process of interest, whether or not the process will propagate widely. We adopt a machine learning-based classification approach to this problem: given a triggering incident, one or more information sources which reflect the reaction to this trigger by a population of interest, and a definition for what constitutes an "alarming" reaction, the goal is to learn a classifier that accurately predicts, as early as possible, whether or not reaction to the event will ultimately become alarming. The classifier used in the empirical studies described in this paper is the Avatar ensembles of decision trees (A-EDT) algorithm [42]. Other classification algorithms were also explored to allow the robustness of the proposed early warning approach to be evaluated, and these alternative methods produced qualitatively similar results [39]. Prediction accuracy in all tests is estimated using standard N-fold cross-validation, in which the set of diffusion events of interest is randomly partitioned into N subsets of equal size, and the A-EDT algorithm is successively "trained" on N-1 of the subsets and "tested" on the held-out subset in such a way that each of the N subsets is used as the test set exactly once.
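The N-fold cross-validation protocol described above can be sketched as follows. This is an illustration only: the A-EDT classifier is not reproduced here, so a trivial one-feature threshold rule (`train_threshold`, `predict_threshold`) stands in for it, and all function names and the toy data are assumptions.

```python
import random


def n_fold_splits(n_events, n_folds, rng=None):
    """Yield (train_indices, test_indices); each event is tested exactly once."""
    rng = rng or random.Random(0)
    indices = list(range(n_events))
    rng.shuffle(indices)                                    # random partition
    folds = [indices[i::n_folds] for i in range(n_folds)]  # near-equal sizes
    for held_out, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(features, labels, n_folds, train_fn, predict_fn, rng=None):
    """Fraction of events classified correctly while held out."""
    correct = 0
    for train_idx, test_idx in n_fold_splits(len(labels), n_folds, rng):
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, features[i]) == labels[i] for i in test_idx)
    return correct / len(labels)


# Stand-in classifier (NOT A-EDT): threshold halfway between the class means.
def train_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_threshold(threshold, x):
    return x > threshold


if __name__ == "__main__":
    feats = [0.10, 0.20, 0.30, 0.15, 0.80, 0.90, 0.70, 0.85]  # toy feature values
    labs = [False] * 4 + [True] * 4                           # toy 'alarming' labels
    print(cross_validated_accuracy(feats, labs, 4, train_threshold, predict_threshold))
```

Any classifier can be substituted by passing a different `train_fn` and `predict_fn`, which is how the robustness comparison across classification algorithms mentioned above could be organized.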
A key aspect of the proposed approach to early warning analysis is determining which characteristics of the social diffusion event of interest, if any, possess exploitable predictive power.
We consider three classes of features:
▪ intrinsics-based features: measures of the inherent properties and attributes of the "object" being diffused;
▪ simple dynamics-based features: metrics which capture simple properties of the diffusion dynamics, such as the early extent of the diffusion and the rate at which the diffusion is propagating;
▪ network dynamics-based features: measures that characterize the way the early diffusion is progressing relative to topological properties of the underlying social and information networks (e.g., community structure).
Consider, as an illustrative example, the diffusion of "memes", that is, short textual phrases which propagate relatively unchanged online (e.g., 'lipstick on a pig'). Suppose it is of interest to predict which memes will "go viral", appearing in thousands of blog posts, and which will not.
In this case, intrinsics-based features could include language measures, such as the sentiment or emotion expressed in the text surrounding the memes in blog posts or news articles. Simple dynamics-based features for memes might measure the cumulative number of posts or articles mentioning the meme of interest at some early time and the rate at which this volume is increasing. The proposed approach to early warning analysis is to collect features from these classes for the event of interest, input the feature values to the (trained) A-EDT classifier, and then run the classifier to generate the warning prediction (i.e., a forecast that the event is expected to become 'alarming' or remain 'not alarming'). In the algorithm presented below this procedure is specified in general terms; more specific instantiations of the procedure are presented in the discussions of the three case studies in Section 3. In what follows it is assumed that the primary source of information concerning the event of interest is social media, as that is emerging as a very useful data source for predictive analysis [e.g., [17][18][19][20][21][22][23][24],26,27]. However, the analytic process is quite similar when other data sources (e.g., intelligence reporting) are employed [24].
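To make the simple dynamics and network dynamics feature classes concrete for the meme example, the sketch below computes a few candidate features from early activity data. The specific measures (cumulative number of active blogs, average growth per period, number of communities reached, fraction of active blogs in the core) and all names are assumptions chosen for illustration; they are not the metrics used in the case studies, and activity is counted per blog rather than per post.

```python
def early_features(active_by_period, community_of, shell_of):
    """Illustrative early-warning features from the first few time periods.

    active_by_period: list of sets of active blog ids, one set per period.
    community_of:     dict {blog id: community label}.
    shell_of:         dict {blog id: k-shell index}.
    """
    cumulative, seen = [], set()
    for active in active_by_period:
        seen |= active
        cumulative.append(len(seen))      # cumulative count of active blogs
    periods = len(cumulative)
    # Simple dynamics: early volume and its average growth per period.
    volume = cumulative[-1] if cumulative else 0
    rate = (cumulative[-1] - cumulative[0]) / (periods - 1) if periods > 1 else 0.0
    # Network dynamics: dispersion over communities and activity in the core.
    k_max = max(shell_of.values())
    in_core = sum(1 for b in seen if shell_of[b] == k_max)
    return {
        "volume": volume,
        "rate": rate,
        "communities_reached": len({community_of[b] for b in seen}),
        "core_fraction": in_core / len(seen) if seen else 0.0,
    }


if __name__ == "__main__":
    activity = [{"a"}, {"a", "b"}, {"c"}]                # hypothetical blog ids
    communities = {"a": 0, "b": 0, "c": 1, "d": 1}
    shells = {"a": 2, "b": 1, "c": 2, "d": 1}
    print(early_features(activity, communities, shells))
```

The resulting dictionary is the kind of feature vector that would be passed to the trained classifier; intrinsics-based features such as sentiment scores would be appended to it.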
Thus we have the following early warning algorithm:

Algorithm EW
Given: a triggering incident, a definition for what constitutes an 'alarming' reaction, and a set of social media sites (e.g., blogs) B which are relevant to the early warning task.
Initialization: train the A-EDT classifier on a set of events which are qualitatively similar to the triggering event of interest and are labeled as 'alarming' or 'not alarming' according to the definition given above (see the case study discussions for additional details on this training process).
Procedure:
1. Assemble a lexicon of keywords L that pertain to the triggering event under study.
2. Conduct a sequence of blog graph crawls and construct a time series of blog graphs G_B(t). For the lexicon L and each time period t, label each blog in G_B(t) as 'active' if it contains a post mentioning any of the keywords in L and 'inactive' otherwise.
3. Form the union G_B = ∪_t G_B(t), partition G_B into network communities and into k-shells, and map the partition element structure of G_B back to each of the graphs G_B(t).
4. Compute the values of appropriate measures for the intrinsics, simple dynamics, and network dynamics features for each of the graphs G_B(t).
5. Apply the A-EDT classifier to the feature values obtained in Step 4 and use its output to decide whether an alert should be issued.

We now offer additional details concerning this procedure; more application-specific discussions of the methodology are provided in the case studies in Section 3. Identifying appropriate keywords in Step 1 can be accomplished with the help of subject matter experts and also through various automated means (e.g., via meme analysis [38,27]). Step 2 is by now standard, and various tools exist which can perform these tasks [e.g., 43]. In Step 3, blog network communities are identified with a modularity-based community extraction algorithm applied to the blog graph [40], while the decomposition of the graph into its k-shells is achieved through standard methods [41]. The particular choices of metrics for the intrinsics, simple dynamics, and network dynamics features computed in Step 4 tend to be problem specific, and typical examples are given in the case studies below. It is worth noting, however, that we have found it useful in a range of applications to quantify the dispersion of activity over the communities of G_B(t) using a blog entropy measure BE:

$$BE(t) = -\sum_i f_i(t) \log f_i(t),$$

where f_i(t) is the fraction of total posts containing one or more keywords and made during interval t which occur in community i. Finally, in Step 5 the feature values obtained in Step 4 serve as inputs to the A-EDT classifier and the output is used to decide whether an alert should be issued.
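As an illustration of the blog entropy measure, the following minimal sketch computes the Shannon entropy of keyword-post activity across communities for a single interval t. The input format (one community label per keyword-bearing post), the function name, and the use of the natural logarithm are assumptions made for this example; they are not specified in the text.

```python
import math
from collections import Counter

def blog_entropy(post_communities):
    """Entropy of keyword-post activity across communities in one interval.

    post_communities: iterable holding one community label per post that
    mentions a keyword during the interval (hypothetical input format).
    """
    counts = Counter(post_communities)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no activity in this interval
    # f_i: fraction of the interval's keyword posts that occur in community i
    fractions = [c / total for c in counts.values()]
    # Natural log is an assumption; another base only rescales the measure
    return -sum(f * math.log(f) for f in fractions)

# Activity confined to one community versus spread evenly over four
concentrated = blog_entropy(["c1"] * 8)
dispersed = blog_entropy(["c1", "c2", "c3", "c4"] * 2)
```

Under these assumptions the measure is zero when all activity sits in one community and reaches log(K) when activity is spread evenly over K communities, which is the sense in which it quantifies dispersion.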

Case Studies
This section applies Algorithm EW to three early warning case studies involving social phenomena that have proved to be both practically important and challenging to analyze: 1.) diffusion of information through social media, 2.) mobilization/protest events in response to "triggering" incidents, and 3.) planning/coordination/execution of politically-motivated cyber attacks.

Case Study One: Meme Diffusion
The goal of this case study is to apply Algorithm EW to the task of predicting whether or not a given "meme", that is, a short textual phrase which propagates relatively unchanged online, will "go viral". Our main source of data on meme dynamics is the publicly available datasets archived at http://memetracker.org [44] by the authors of [38]. Briefly, the archive [44] contains time series data characterizing the diffusion of ~70 000 memes through social media and other online sites during the five month period between 1 August and 31 December 2008. We are interested in using Algorithm EW to distinguish successful and unsuccessful memes early in their lifecycle. More precisely, the task of interest is to classify memes into two groups -those which will ultimately be successful (acquire more than S posts) and those that will be unsuccessful (attract fewer than U posts) -very early in the meme lifecycle.
To support an empirical evaluation of the utility of Algorithm EW for this problem, we downloaded from [44] the time series data for slightly more than 70 000 memes. These data contain, for each meme M, a sequence of pairs (t_1, URL_1)_M, (t_2, URL_2)_M, …, (t_T, URL_T)_M, where t_k is the time of appearance of the kth blog post or news article that contains at least one mention of meme M, URL_k is the URL of the blog or news site on which that post/article was published, and T is the total number of posts that mention meme M. From this set of time series we randomly selected 100 "successful" meme trajectories, defined as those corresponding to memes which attracted at least 1000 posts during their lifetimes, and 100 "unsuccessful" meme trajectories, defined as those whose memes acquired no more than 100 total posts. It is worth noting that, in assembling the data in [44], all memes which received fewer than 15 total posts were deleted, and that ~50% of the remaining memes have

Recall that Algorithm EW employs three types of features: intrinsics-based, simple dynamics-based, and network dynamics-based. We now describe the instantiation of each of these feature classes for the meme problem. Consider first the intrinsics-based features, which for the meme application become language-based measures. Each "document" of text surrounding a meme in its (sample) posts is represented by a simple "bag of words" feature vector x ∈ ℝ^|V|, where the entries of x are the frequencies with which the words in the vocabulary set V appear in the document. A very simple way to quantify the sentiment or emotion of a document is through the use of appropriate lexicons. Let s ∈ ℝ^|V| denote a lexicon vector, in which each entry of s is a numerical "score" quantifying the sentiment/emotion intensity of the corresponding word in the vocabulary V. The aggregate sentiment/emotion score of document x can be computed as

$$\mathrm{score}(x) = \frac{s^T x}{\mathbf{1}^T x},$$

where 1 is a vector of ones. Thus score(.) estimates the sentiment or emotion of a document as a weighted average of the sentiment or emotion scores for the words comprising the document.
(Note that if no sentiment or emotion information is available for a particular word in V then the corresponding entry of s is set to zero.) To characterize the emotion content of a document we use the Affective Norms for English Words (ANEW) lexicon, which consists of 1034 words that were assigned numerical scores with respect to three emotional "axes" -happiness, arousal, and dominance -by human subjects [45].
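The lexicon-based score described above, a frequency-weighted average of word scores with unscored words contributing zero, can be sketched as follows. The function name, the dictionary-based bag-of-words representation, and the numerical word scores are all hypothetical; the scores are illustrative values and are not taken from the ANEW or IBM lexicons.

```python
def lexicon_score(doc_counts, lexicon):
    """Frequency-weighted average lexicon score of one document.

    doc_counts: dict mapping word -> number of occurrences (bag of words).
    lexicon: dict mapping word -> numerical score; words missing from the
             lexicon are scored as zero, following the note in the text.
    """
    total_words = sum(doc_counts.values())
    if total_words == 0:
        return 0.0  # empty document: no score can be formed
    weighted = sum(count * lexicon.get(word, 0.0)
                   for word, count in doc_counts.items())
    return weighted / total_words

# Made-up scores for illustration only (not real lexicon entries)
toy_lexicon = {"happy": 8.0, "pig": 4.0}
toy_doc = {"happy": 2, "pig": 1, "lipstick": 1}
toy_score = lexicon_score(toy_doc, toy_lexicon)  # (2*8 + 1*4 + 1*0) / 4
```

Because unscored words still count toward the denominator, documents dominated by out-of-lexicon words are pulled toward zero; whether that is desirable depends on the lexicon's scale and coverage.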
Previous work had identified this set of words to bear meaningful emotional content [45]. Positive or negative sentiment is quantified by employing the "IBM lexicon", a collection of 2968 words that were assigned {positive, negative} sentiment labels by human subjects [46]. This simple approach generates four language features for each meme: the happiness, arousal, dominance, and positive/negative sentiment of the text surrounding that meme in the (sample) posts containing it. As a preliminary test, we computed the mean emotion and sentiment of content surrounding the 100 successful and 100 unsuccessful memes in our dataset. On average the text surrounding successful memes is happier, more active, more dominant, and more positive than that surrounding unsuccessful memes, and this difference is statistically significant (p < 0.0001).
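Thus it is at least plausible that these four language features may possess some predictive power regarding meme success.

The simple dynamics-based features are #posts(τ), the cumulative number of posts mentioning the meme by time τ, and post rate(τ), the rate at which this volume is increasing. Here we adopt a simple finite difference definition for post rate given by post rate(τ) = (#posts(τ) − #posts(τ/2)) / (τ/2); of course, more robust rate estimates could be used.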
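The finite difference post rate can be computed directly from a meme's sequence of post times. The sketch below assumes post times are measured in hours since the meme was first detected; the function names and the sample timestamps are hypothetical.

```python
from bisect import bisect_right

def num_posts(post_times, tau):
    """Cumulative number of posts with timestamp <= tau."""
    return bisect_right(sorted(post_times), tau)

def post_rate(post_times, tau):
    """Finite difference rate over the second half of the window [0, tau]."""
    half = tau / 2
    return (num_posts(post_times, tau) - num_posts(post_times, half)) / half

# Hypothetical post times, in hours since first detection
times = [0, 1, 2, 5, 7, 8, 9, 11, 12, 20]
posts_12h = num_posts(times, 12)   # posts at hours 0..12
rate_12h = post_rate(times, 12)    # (9 - 4) / 6 posts per hour
```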
The simple dynamics-based measures of early meme diffusion defined above, while potentially useful, do not characterize the manner in which a meme propagates over the underlying social or information networks. Recall that the predictability assessment summarized in Section 2.3 suggests that both early dispersion of diffusion activity across network communities and early diffusion activity within the network core ought to be predictive of meme success. The insights offered by this theoretical analysis motivate the definition of two network dynamics-based features for meme prediction:
▪ community dispersion(τ) - the cumulative number of network communities in the blog graph G_B that, by time τ, contain at least one post which mentions the meme;
▪ #k-core blogs(τ) - the cumulative number of blogs in the k_max-shell of blog graph G_B that, by time τ, contain at least one post which mentions the meme.
These quantities can be efficiently computed using fast algorithms for partitioning a graph into its communities and for identifying a graph's k max -shell [39]. Thus these features are readily computable even for very large graphs.
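To make the two network dynamics-based features concrete, the sketch below computes them for a small toy graph. It assumes the community partition is already available as a blog-to-community mapping, and it uses a simple quadratic-time peeling procedure for the core decomposition rather than the fast algorithms the text refers to. All names and the toy data are hypothetical.

```python
def core_numbers(adj):
    """Core number of each vertex, by repeatedly removing a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])          # shell index never decreases
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def network_features(posts, adj, community_of, tau):
    """Return (community dispersion, #k-core blogs) at time tau.

    posts: list of (time, blog) pairs for one meme.
    adj: dict blog -> set of neighbouring blogs (undirected blog graph).
    community_of: dict blog -> community label (assumed precomputed).
    """
    core = core_numbers(adj)
    k_max = max(core.values())
    active = {blog for t, blog in posts if t <= tau}
    dispersion = len({community_of[b] for b in active if b in community_of})
    k_core_blogs = sum(1 for b in active if core.get(b) == k_max)
    return dispersion, k_core_blogs

# Toy graph: triangle a-b-c (the 2-shell) with a tail a-d-e (the 1-shell)
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
       "d": {"a", "e"}, "e": {"d"}}
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
posts = [(1, "a"), (3, "d"), (30, "b")]
features_12h = network_features(posts, adj, community_of, tau=12)
```

In this toy case two communities and one core blog are active by hour 12; the post at hour 30 falls outside the early window and is ignored.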
We now summarize the results of this case study. First, using only the four language features with the A-EDT classifier to predict which memes will be successful yields a prediction accuracy of 66.5% (ten-fold cross-validation). Since simply guessing "successful" for all memes gives an accuracy of 50%, it can be seen that these simple language intrinsics are not very predictive. For completeness it is mentioned that the ANEW score for "arousal" and the IBM measure of sentiment are the most predictive of these four features. In contrast, the features characterizing the early network dynamics of memes possess significant predictive power, and in fact are useful even if only a very limited early time series is available for use in prediction. More quantitatively, applying Algorithm EW with the four meme dynamics features produces the following results (ten-fold cross-validation):
▪ τ = 12hr, accuracy = 84%, most predictive features: 1.) community dispersion, 2.) #k-core blogs, 3.) #posts;
▪ τ = 24hr, accuracy = 92%, most predictive features: 1.) community dispersion, 2.) post rate, 3.) #posts;
▪ τ = 48hr, accuracy = 94%, most predictive features: 1.) community dispersion, 2.) post rate, 3.) #posts.
These results show that useful predictions can be obtained within the first twelve hours after a meme is detected (this corresponds to 0.5% of the average meme lifespan), and that accurate prediction is possible after about a day or two. Note also that, as has been found with other social dynamics phenomena [e.g., [16][17][18], dynamics features appear to be more predictive than "intrinsics", at least for the features employed here.
It is worth mentioning that the fact that a particular meme goes viral does not imply that it will influence behavior in the real world. The next two case studies focus on the important issue of behavioral consequences of information diffusion.

Case Study Two: Mobilization and Protest
There is considerable interest in developing methods for distinguishing successful mobilization and protest events, that is, mobilizations that become large and self-sustaining, from unsuccessful ones early in their lifecycle. It is natural to pose this question as an early warning problem and to approach it using Algorithm EW. In order to examine the efficacy of this approach, we collected together fourteen recent events, each of which appeared at the outset to have the potential to trigger significant protests. This set of events contains seven triggering incidents which ultimately led to substantial mobilization, including massive protests and significant violence, and seven triggers with reactions that subsided quickly with essentially no violence. Taken together, these events provide a useful setting for testing the applicability of Algorithm EW to mobilization/protest phenomena.
The events employed in this study are listed below.

To examine this possibility more carefully, we applied Algorithm EW to the task of distinguishing triggers which led to large protests from those that did not. For simplicity, in this case study we did not use any intrinsics-based features (e.g., language metrics) in the A-EDT classifier, and instead relied upon the four dynamics-based features defined in Case Study One. In the case of the seven triggering events which led to protest behavior, the blog data made available to Algorithm EW was limited to posts made during the eight week period which ended two weeks before the protests began. For the seven triggers which did not lead to protests, the blog data included all posts collected during the eight week period immediately following the triggering event.
Because the set of events in this case study included only fourteen incidents, we applied Algorithm EW with two-fold cross-validation. More specifically, the set of incidents was randomly partitioned into two equal subsets, the algorithm was trained on one subset of seven incidents and tested on the other subset, and then the roles of the two data sets were switched. In this evaluation Algorithm EW achieved perfect accuracy, correctly distinguishing the 'protest' and 'nonprotest' triggers. An examination of the predictive power of the four features used as inputs to the A-EDT classifier reveals that, as suggested by Figure 4, the community dispersion feature was the most predictive measure.
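The two-fold cross-validation protocol described above can be sketched as follows. The A-EDT classifier itself is not reproduced here; a single-feature threshold rule stands in for it purely as an assumption, and the fourteen toy events are invented, well-separated values rather than data from the study, so the resulting accuracy says nothing about the paper's results.

```python
import random

def two_fold_accuracy(events, train, predict, seed=0):
    """events: list of (feature_value, label) pairs; returns overall accuracy."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)      # random partition into two halves
    half = len(shuffled) // 2
    folds = [shuffled[:half], shuffled[half:]]
    correct = 0
    for test_idx in (0, 1):
        model = train(folds[1 - test_idx])     # train on one half
        correct += sum(predict(model, x) == y  # test on the other half
                       for x, y in folds[test_idx])
    return correct / len(shuffled)

def _mean(values):
    return sum(values) / len(values) if values else 0.0

def train_threshold(data):
    # Stand-in for A-EDT (assumption): midpoint between the two class means
    pos = [x for x, y in data if y]
    neg = [x for x, y in data if not y]
    return (_mean(pos) + _mean(neg)) / 2

def predict_threshold(threshold, x):
    return x > threshold

# Invented community-dispersion values: 7 'protest' and 7 'non-protest' events
toy_events = ([(d, True) for d in (10, 11, 12, 10, 11, 12, 11)]
              + [(d, False) for d in (0, 1, 2, 0, 1, 2, 1)])
toy_accuracy = two_fold_accuracy(toy_events, train_threshold, predict_threshold)
```

With only seven events per training fold, an accuracy estimate obtained this way has high variance, which is worth keeping in mind when interpreting results on small event sets.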

Case Study Three: Cyber Attack Early Warning
This case study explores the ability of Algorithm EW to provide reliable early warning for politically-motivated distributed denial-of-service (DDoS) attacks. Toward this end, we first identified a set of Internet "disturbances" that included examples from three distinct classes of events:
1. successful politically-motivated DDoS attacks - these are the events for which Algorithm EW is intended to give warning with sufficient lead time to allow mitigating actions to be taken;
2. natural events which disrupt Internet service - these are disturbances, such as earthquakes and electric power outages, that impact the Internet but for which it is known that no early warning signal exists in social media;
3. quiet periods - these are periods during which there is social media "chatter" concerning impending DDoS attacks but ultimately no (successful) attacks occurred.

[Figure caption: In each plot, the red curve is blog volume and the blue curve is blog entropy; the Danish cartoon plot also shows two measures of violence (cyan and magenta curves). Note that while the volume and violence data are scaled to allow multiple data sets to be graphed on each plot, the scale for entropy is consistent across plots to enable cross-event comparison.]
Including in the case study events selected from these three classes is intended to afford a fairly comprehensive test of Algorithm EW. For instance, these classes correspond to 1.) the domain of interest (DDoS attacks), 2.) a set of disruptions which impact the Internet but have no social media warning signal, and 3.) a set of "non-events" which do not impact the Internet but do possess putative social media warning signals (online discussion of DDoS attacks).
We selected twenty events from these three classes: seven politically-motivated DDoS attacks, six natural disturbances, and seven quiet periods. For brevity a detailed discussion of these twenty events is not given here; the interested reader is referred to [39] and the references therein for additional information on these disruptions.
We collected two forms of data for each of the twenty events: cyber data and social data.
The cyber data consist of time series of routing updates which were issued by Internet routers during a one month period surrounding each event. More precisely, these data are the Border Gateway Protocol (BGP) routing updates exchanged between gateway hosts in the Autonomous System network of the Internet. The data was downloaded from the publicly-accessible RIPE collection site [47] using the process described in [48] (see [48] for additional details and background information on BGP routing dynamics). The temporal evolution of the volume of BGP routing updates (e.g., withdrawal messages) gives a coarse-grained measure of the timing and magnitude of large Internet disruptions and thus offers a simple and objective way to characterize the impact of each of the events in our collection. The social data consist of time series of social media mentions of cyber attack-related keywords and memes detected during a one month period surrounding each of the twenty events. These data were collected using the procedure specified in Algorithm EW.
As in the preceding case study, we performed a preliminary examination of the possibility to obtain useful early warning indicators from analysis of social media discussions by completing  Figure 5) experiences a dramatic increase several days before the event, while in the case of the Japan earthquake blog entropy is small for the entire collection period. Similar social media behavior is observed for all events in the case study, suggesting that network dynamics-based features, such as dispersion of discussions across blog network communities, may be a useful early indicator for large mobilization events.
To examine this possibility more carefully, we applied Algorithm EW to the task of distinguishing the seven DDoS attacks from the thirteen other events in the set. For simplicity, in this case study we did not use any intrinsics-based features (e.g., language metrics) in the A-EDT classifier, and instead relied upon the four dynamics-based features defined in Case Study One.
Because the set of events in this case study included only twenty incidents, we applied Algorithm EW with two-fold cross-validation, exactly as described in Case Study Two. In the case of DDoS events, the blog data made available to Algorithm EW was limited to posts made during the five week period which ended one week before the attack. For the six natural disturbances, the blog data included all posts collected during the six week period immediately prior to the event, while in the case of the seven non-events, the blog data included the posts collected during a six week interval which spanned discussions of DDoS attacks on U.S. government agencies. In this evaluation, Algorithm EW achieved perfect accuracy, correctly distinguishing the 'attack' and 'non-attack' events. If the test is made more difficult, so that the blog data made available to Algorithm EW for attack events is limited to a four week period that ends two weeks before the attack, the proposed approach still achieves 95% accuracy. An examination of the predictive power of the four features used as inputs to the A-EDT classifier reveals that, as suggested by Figure 5, the community dispersion feature was the most predictive measure. It is worth emphasizing that, in this case study, accurately distinguishing 'attack' from 'non-attack' events is equivalent to providing practically-useful early warning for attack events, because the data which serves as input to Algorithm EW reflects online discussions that took place prior to the events under investigation.

Conclusions
This paper presents a new approach to early warning analysis for social diffusion events. We begin by introducing a biologically-inspired S-HDS model for social dynamics on multi-scale networks, and then perform stochastic reachability analysis with this model to show that the outcomes of social diffusion processes may depend crucially upon the way the early dynamics of the process interacts with the underlying network's meso-scale topological structures. This theoretical finding provides the foundations for developing a machine learning algorithm that enables accurate early warning analysis for diffusion events. The utility of the warning algorithm, and the power of network-based predictive metrics, are demonstrated through empirical case studies involving meme propagation, large-scale protest events, and politically-motivated cyber attacks.

A1. Appendix One: S-HDS Social Diffusion Model
In this Appendix we propose a multi-scale structure for modeling social network dynamics, establish a few facts concerning this representation, and introduce an S-HDS formulation of the model that is well-suited for predictive analysis.

A1.1 Multi-Scale Social Dynamics Model
In many social situations, people are influenced by the behavior of others, for instance be-

We model PEP in a manner which explicitly separates the individual, or "micro", dynamics from the collective dynamics. More specifically, we adopt a modeling framework consisting of three modeling scales:
▪ a micro-scale, for modeling the behavior of individuals;
▪ a meso-scale, which represents the interaction dynamics of individuals within the same network partition element (community or core/periphery);
▪ a macro-scale, which characterizes the interaction between partition elements.

The primary assumption is that interactions between individuals within social network communities can be modeled as "fully-mixed" - all pairwise interactions between individuals within a network community are equally likely - while interactions between communities are constrained by the network defining the relationships between the communities. We argue below that this assumption is reasonable and useful.
One advantage of identifying a scale at which agent interaction is (approximately) homogeneous is that this enables the leveraging of an extensive literature on collective dynamics. To be concrete, we derive two examples. Consider first the social movement model proposed in [49,50]. In this model, each individual can be in one of three states: member (of the movement), potential member, and ex-member. Individuals interact in a fully-mixed way, with each interaction between a potential member and a member resulting in the potential member becoming a member with probability α, and each interaction between a member and an ex-member resulting in the member becoming an ex-member with probability γ_1; members also "spontaneously" become ex-members with probability γ_2. The connection between this representation and standard epidemiological models [1] is clear.
Under the assumption of fully-mixed interactions at the meso-scale, standard manipulations yield the following representation for the social dynamics within network communities:

$$\Sigma_H: \quad \begin{aligned} dP/dt &= -\beta P M + \eta_1(t), \\ dM/dt &= \beta P M - \delta_1 M E - \delta_2 M + \eta_2(t), \\ dE/dt &= \delta_1 M E + \delta_2 M + \eta_3(t), \end{aligned}$$

where P, M, and E denote the fractions of potential members, members, and ex-members in the community population, β, δ_1, and δ_2 are nonnegative constants related to the probabilities α, γ_1, and γ_2 defined above, and the η_i(t) are appropriate random processes [e.g., 17]. The deterministic version of this basic model (i.e., with η_1(t) = η_2(t) = η_3(t) = 0) is discussed by Hedstrom and coauthors in [49,50], and therefore we denote the model Σ_H. The deterministic version is shown in [49] to provide a useful description for the local growth of a real world social movement.
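The deterministic version of the fully-mixed movement model can be explored numerically. The sketch below integrates a three-compartment system in which potential members are recruited through contact with members, and members leave either through contact with ex-members or spontaneously. The forward Euler scheme, the step size, and the parameter names and values (beta, delta1, delta2) are assumptions chosen for illustration, not settings taken from the text.

```python
def simulate_movement(p0, m0, beta, delta1, delta2, dt=0.01, steps=5000):
    """Forward Euler integration of the deterministic fully-mixed model.

    p, m, e: fractions of potential members, members, and ex-members.
    beta: recruitment rate constant (potential member meets member).
    delta1: defection rate constant (member meets ex-member).
    delta2: spontaneous defection rate constant.
    """
    p, m, e = p0, m0, 1.0 - p0 - m0
    history = [(p, m, e)]
    for _ in range(steps):
        recruit = beta * p * m                  # flow from P to M
        defect = delta1 * m * e + delta2 * m    # flow from M to E
        p, m, e = (p - dt * recruit,
                   m + dt * (recruit - defect),
                   e + dt * defect)
        history.append((p, m, e))
    return history

# Illustrative parameter values (assumptions)
trajectory = simulate_movement(p0=0.99, m0=0.01, beta=0.5, delta1=0.01, delta2=0.1)
peak_members = max(m for _, m, _ in trajectory)
final_p, final_m, final_e = trajectory[-1]
```

Because every flow leaves one compartment and enters another, the three fractions continue to sum to one, which is why the text can treat the ex-member fraction as a dependent state.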
The second example incorporates the fact that innovations often have both enthusiasts and skeptics, each of whom may actively attempt to recruit the uncommitted. The model Σ_H can be modified to account for this competition in recruitment: where P and E denote the fractions of potential members and ex-members, as before, M_1

The meso-scale model describes the way individual agent decision functions interact to produce collective behavior within social network communities. Individuals also interact with people from other communities, of course, and receive information from channels that transmit to many communities simultaneously (e.g., mass media). These inter-community interactions and "global" social signals are quantified at the macro-scale level of the multi-scale modeling framework. The basic idea is simple and natural: we model interdependence between social network communities with a graph G_sc = {V_sc, E_sc}, where V_sc and E_sc are the vertex and edge sets, respectively, |V_sc| = K, each vertex v ∈ V_sc is a community, and each directed edge e = (v,v') ∈ E_sc represents a potential inter-community interaction. More specifically, an edge (v,v') indicates that an agent in community v' can receive decision-relevant information from one in community v. The way agents act upon this information is specified by their decision functions r_i. The broadcast of global social signals to individuals is modeled as a community-dependent input u_v to each individual in community v. Thus G_sc and the u_v define the macro-scale model structure.
A key task in deriving a macro-scale model is specifying the topology of G_sc, as this graph encodes the social network structure for the phenomenon of interest. The most direct approach to constructing G_sc is to infer communities directly from social network data, by partitioning the network so as to maximize the graph modularity Q_m. The main challenge with this method for building social community graphs is obtaining the requisite social network data. While this task is certainly nontrivial, availability of such data has increased dramatically over the past decade.
For instance, social relationships and interactions increasingly leave "fingerprints" in electronic databases (e.g., communication via email and cell phones, financial transactions), making convenient the acquisition, manipulation, storage, and analysis of these records [e.g., 4].
Alternatively, demographics data can sometimes be used to define both the communities themselves (e.g., families, physical neighborhoods) and their proximity. The basic idea is familiar: individuals belong to social groups, which in turn belong to "groups of groups", and so on, giving rise to a hierarchical organization of communities. For instance, in academics, research groups often belong to academic departments, which are organized into colleges, which in turn form universities, and so on. The proximity of two communities is specified by their relationship within the hierarchy, and this distance defines the likelihood that individuals from the two communities will interact. The probability of inter-community interaction, in turn, can be used to define the network community graph G sc [39].

A1.2 S-HDS Model Formulation
We now show that the stochastic hybrid dynamical system formalism provides a rigorous, tractable, and expressive framework within which to represent multi-scale social dynamics models. Consider the following S-HDS:

$$\Sigma_{S\text{-}HDS}: \quad dx = f_q(x, p)\,dt + G_q(x, p)\,dw, \qquad q \to q' \ \text{at rate} \ \lambda_{qq'}(x),$$

where q ∈ Q is the discrete state, x ∈ X ⊆ ℝ^n is the continuous state, p ∈ ℝ^p is a vector of system parameters, {f_q} and {G_q} are sets of vector and matrix fields characterizing the continuous system dynamics, w is an ℝ^m-valued Wiener process, and Λ(x) is the matrix of (x-dependent) Markov chain transition rates; the entries of Λ(x) satisfy λ_qq'(x) ≥ 0 if q ≠ q' and Σ_q' λ_qq'(x) = 0 ∀q, and are related to the standard Markov state transition probabilities as follows [e.g., 34]:

$$P\{q \to q' \ \text{during} \ (t, t+dt]\} = \lambda_{qq'}(x)\,dt + o(dt), \qquad q \neq q'.$$

A general discussion of S-HDS theory and applications is beyond the scope of this paper and may be found in, for instance, [34] and the references therein. We now develop an S-HDS representation for multi-scale social diffusion processes. It is assumed that:
▪ the social system consists of N individuals distributed over K network communities;
▪ individuals can influence each other via positive externalities;
▪ intra-community interactions are fully-mixed;
▪ inter-community interactions involve the (possibly temporary) migration of individuals from one community to another.
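The conditions on the Markov chain transition rates stated above (nonnegative off-diagonal entries, each row summing to zero) are easy to check numerically, and the rates translate into short-interval transition probabilities in the usual first-order way. The sketch below assumes the rate matrix has already been evaluated at a fixed continuous state x and is stored as a list of rows; the function names and the example rates are hypothetical.

```python
def is_rate_matrix(rates, tol=1e-12):
    """Check that a square matrix is a valid Markov chain transition-rate matrix."""
    n = len(rates)
    for q in range(n):
        if len(rates[q]) != n:
            return False
        # Off-diagonal rates must be nonnegative
        if any(rates[q][r] < 0 for r in range(n) if r != q):
            return False
        # Each row must sum to zero (the diagonal balances the outflow)
        if abs(sum(rates[q])) > tol:
            return False
    return True

def transition_probability(rates, q, q_next, dt):
    """First-order approximation of P(q -> q_next within a short interval dt)."""
    if q == q_next:
        raise ValueError("approximation is stated for q != q_next")
    return rates[q][q_next] * dt

# Two discrete states; state 1 is absorbing (hypothetical rates at a fixed x)
example_rates = [[-0.3, 0.3],
                 [0.0, 0.0]]
prob = transition_probability(example_rates, 0, 1, dt=0.01)
```

The approximation neglects higher-order terms in dt, so it is only meaningful when the product of the rate and dt is small.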
The phenomenon of interest is the diffusion of innovations, in which an innovation of some kind (e.g., a new technology or idea) is introduced into a social system, and individuals may learn about the innovation from others and decide to adopt it [e.g., 2]. By definition an innovation is "new", and therefore it is supposed that initially only a few of the network communities have been exposed to it. An important task in applications is to be able to characterize the likelihood that the innovation will spread to a significant fraction of the population [17]. We model social diffusion as follows:

Definition A1.2: Σ_S-HDS,diff = {G_sc, Q×X, {f_q(x), G_q(x), H_q(x)}_q∈Q, Par, W, U, {Q, Λ(x)}}, where
▪ G_sc = {V_sc, E_sc} is the social network community graph;
▪ Q×X is the system state set, with Q and X ⊆ ℝ^n denoting the (finite) discrete and (bounded) continuous state sets, respectively;
▪ {f_q(x), G_q(x), H_q(x)}_q∈Q, Par, W, U is the S-HDS continuous system, a family of stochastic differential equations which characterizes the intra-community dynamics via vector field/matrix families {f_q}, {G_q}, {H_q}, system parameter vector p ∈ Par ⊆ ℝ^p, and system inputs w ∈ W ⊆ ℝ^m, u ∈ U ⊆ ℝ^r;
▪ {Q, Λ(x)} is the S-HDS discrete system, a continuous-time Markov chain which defines inter-community interactions via state set Q and transition rate matrix Λ(x).
The social community graph G_sc defines the feasible community-community innovation diffusion pathways: if (v,v') ∉ E_sc then it is not possible for the innovation to spread directly from community v to community v'. The discrete state set Q = {0,1}^K specifies which communities contain at least one adopter of the innovation by labeling such communities with a '1' (and a '0' otherwise). Thus, for example, state q = [1 0 0 …]^T indicates that community 1 has at least one adopter, communities 2 and 3 do not, and so on. The continuous state space X has coordinates x_ij ∈ [0,1], where x_ij is the ith state variable for the continuous system dynamics evolving in community j. For consistency we use the first coordinate for each community, x_1j, to refer to the fraction of adopters for that community. The continuous system dynamics is defined by a family of q-indexed stochastic differential equations {Σ_cs,q}_q∈Q, with

$$\Sigma_{cs,q}: \quad dx = f_q(x, p)\,dt + G_q(x, p)\,dw + H_q(x, p)\,u\,dt,$$

where w ∈ W is a standard Wiener process and u ∈ U is the exogenous input. Ordinarily w is interpreted as a stochastic "disturbance", while u is employed to represent influences from "global" sources such as mass media. These dynamics quantify intra-community diffusion of the innovation of interest, for instance through models of the form Σ_H. The Markov chain matrix Λ(x) specifies the transition rates for discrete state transitions q → q' and depends on both G_sc and x (e.g., the rate at which community v will "infect" other communities depends upon the fraction of adopters in v). It is worth noting that the model Σ_S-HDS,diff naturally accommodates both probabilistic (via w and the Markov chain dynamics) and set-bounded (through parameter set Par) uncertainty descriptions, as this expressiveness is desirable in applications.

A1.3 A Simple Example
We now demonstrate the implementation of the proposed multi-scale S-HDS diffusion modeling framework, and illustrate its efficacy, through a simple example; a more complex example, with more interesting analytic goals, is investigated in Appendix Two below. Consider a social network consisting of two communities and a social movement process playing out on this network. We construct the social network using the method given in [52]. Briefly, a collection of N vertices is divided into two communities of equal size, denoted L and R (for 'left' and 'right', see Figure 6). For all vertex pairs, if both vertices belong to the same community then an edge is placed between them with probability p_i, and if the vertices belong to different communities then they are connected with probability p_e < p_i. Increasing the ratio p_i / p_e makes the resulting network more "community-like" by increasing the relative intra-community edge density. Figure 6 shows two small example networks built in this way, with the network on the left corresponding to a larger p_i / p_e ratio.
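The two-community network construction described above can be sketched in a few lines. The function name, the integer vertex labels, and the adjacency-set representation are choices made for this example; the procedure in [52] may differ in its details.

```python
import random
from itertools import combinations

def two_community_graph(n, p_i, p_e, seed=None):
    """Random graph on n vertices split into two equal communities, L and R.

    Same-community pairs are linked with probability p_i and
    cross-community pairs with probability p_e.
    """
    rng = random.Random(seed)
    half = n // 2
    community = {v: ("L" if v < half else "R") for v in range(n)}
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        p = p_i if community[u] == community[v] else p_e
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj, community

# Limiting case: fully connected communities with no links between them
adj, community = two_community_graph(6, p_i=1.0, p_e=0.0, seed=1)
cross_edges = sum(1 for u in adj for v in adj[u] if community[u] != community[v])
```

Sweeping p_e upward at fixed p_i lowers the p_i / p_e ratio and moves the construction from essentially disconnected communities toward a tightly coupled network.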
The social movement dynamics evolving on this network is a "network version" of the model proposed in [49]. Thus each individual can be in one of three states - member, potential member, and ex-member - and individuals can change states in one of three ways: 1.) members persuade potential members to whom they are linked to become members with probability α, 2.) ex-members likewise influence neighboring members to become ex-members with probability γ_1, and 3.) members can spontaneously become ex-members with probability γ_2. For convenience of reference this "agent-based" system representation is denoted Σ_ABM.

Figure 6. Sample results for ABM/S-HDS comparison study. The visualization at top is a cartoon of the basic setup, in which an innovation is introduced into one of the two network communities comprising a social system; possible outcomes include diffusion of the innovation throughout the community initially "infected" (left network, blue vertices are in state M) or across both communities (right network). The plot at bottom shows the probability of "global" diffusion as a function of inter-community interaction intensity for the models Σ_ABM (blue curve) and Σ_S-HDS,diff (red curve).
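One way to realize the three agent-level transition rules is a synchronous update in which every individual's next state is computed from the current states of its neighbors. The update schedule, the function name, and the parameter names (alpha, gamma1, gamma2) are assumptions made for this sketch; the text does not specify how the agent-based model is scheduled.

```python
import random

def abm_step(adj, state, alpha, gamma1, gamma2, rng):
    """One synchronous update; state maps vertex -> 'P', 'M', or 'E'."""
    new_state = dict(state)
    for v, s in state.items():
        if s == "P":
            # Each linked member independently persuades v with probability alpha
            if any(state[u] == "M" and rng.random() < alpha for u in adj[v]):
                new_state[v] = "M"
        elif s == "M":
            # Each linked ex-member independently sways v with probability gamma1
            swayed = any(state[u] == "E" and rng.random() < gamma1 for u in adj[v])
            # Members may also leave spontaneously with probability gamma2
            if swayed or rng.random() < gamma2:
                new_state[v] = "E"
    return new_state

# Path graph 0 - 1 - 2 with a single seed member at vertex 0
path = {0: {1}, 1: {0, 2}, 2: {1}}
start = {0: "M", 1: "P", 2: "P"}
rng = random.Random(0)
after_one = abm_step(path, start, alpha=1.0, gamma1=0.0, gamma2=0.0, rng=rng)
after_two = abm_step(path, after_one, alpha=1.0, gamma1=0.0, gamma2=0.0, rng=rng)
```

With certain persuasion and no defection, membership advances one hop per step along the path, which gives a quick sanity check before moving to stochastic parameter values.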
It is straightforward to derive an S-HDS version of the social movement model $\Sigma_{\text{ABM}}$. Consider the diffusion model $\Sigma_{\text{S-HDS,diff}} = \{G_{sc}, Q \times X, \{f_q(x), G_q(x), H_q(x)\}_{q \in Q}, Par, W, U, \{Q, \Lambda(x)\}\}$ specified in Definition A1.2. Note first that in this case the social network community graph $G_{sc}$ is very simple, consisting of two vertices corresponding to communities L and R and an undirected edge connecting them. The continuous system state is $x = [P_L \; M_L \; P_R \; M_R]^T \in X$, where the subscripts indicate communities (note that the concentrations of ex-members, $E_L$ and $E_R$, are not independent states because the total concentration sums to one on each community). We approximate the agent-based social movement dynamics within each network community with the fully-mixed model $\Sigma_H$, that is, with a set of stochastic differential equations governing the evolution of the concentrations of members M and potential members P.
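To give a feel for the fully-mixed approximation within a single community, the sketch below integrates a deterministic mean-field version of the three transitions. It rests on assumptions not stated in the text: the noise terms of the stochastic differential equations are dropped, the transition probabilities are reinterpreted as rates named `beta`, `delta1`, `delta2`, and the drift is taken to be dP/dt = -beta P M and dM/dt = beta P M - delta1 M E - delta2 M with E = 1 - P - M.

```python
def mean_field_step(P, M, beta, delta1, delta2, dt):
    """One forward-Euler step of the assumed mean-field drift."""
    E = 1.0 - P - M                       # ex-member concentration
    recruit = beta * P * M                # potential members -> members
    defect = delta1 * M * E + delta2 * M  # members -> ex-members
    return P - dt * recruit, M + dt * (recruit - defect)


def integrate(P0, M0, beta, delta1, delta2, dt=0.01, steps=20000):
    """Integrate from (P0, M0); return final P, final M, and peak M."""
    P, M = P0, M0
    peak = M
    for _ in range(steps):
        P, M = mean_field_step(P, M, beta, delta1, delta2, dt)
        peak = max(peak, M)
    return P, M, peak


if __name__ == "__main__":
    # Membership grows when beta / delta2 is large and decays when it is small.
    print(integrate(0.99, 0.01, beta=0.5, delta1=0.01, delta2=0.1))
    print(integrate(0.99, 0.01, beta=0.05, delta1=0.01, delta2=0.1))
```

Under these assumptions the member concentration can only grow while beta P exceeds delta1 E + delta2, which is the single-community analogue of a threshold on beta / delta2.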
It can be seen that $\Sigma_H$ together with the preceding discussion defines the model components $X$, $\{f_q(x), G_q(x), H_q(x)\}_{q \in Q}$, $Par$, $W$, $U$ that make up the continuous system portion of $\Sigma_{\text{S-HDS,diff}}$. Thus all that remains is to specify the discrete system $\{Q, \Lambda(x)\}$. The discrete state set Q = {00, 10, 01, 11} indicates which communities contain at least one movement member, so that for instance state q = 10 indicates that community L has at least one member and community R has no members. The Markov chain matrix $\Lambda(x)$ specifies the transition rates for discrete state transitions $q \to q'$. These rates depend on the continuous system state x because the likelihood that one community will "infect" the other depends upon the current concentrations of members, potential members, and ex-members in that community.
We examine the utility of the S-HDS social diffusion model constructed above by using this model to estimate the probability that a small set of "seed" members introduced into community R will lead to the movement growing and eventually propagating to community L. Because the model $\Sigma_{\text{S-HDS,diff}}$ is derived from $\Sigma_{\text{ABM}}$, $\Sigma_{\text{ABM}}$ is taken to be ground truth and $\Sigma_{\text{S-HDS,diff}}$ is deemed a useful approximation if the cascade probability estimates obtained using the S-HDS representation are in good agreement with those computed based on $\Sigma_{\text{ABM}}$. The following parameter values are chosen for $\Sigma_{\text{ABM}}$: $N = 2000$, $\beta = 0.5$, $\delta_1 = 0.01$, $\delta_2 = 0.1$ (the results reported are not sensitive to variation in these values). We build 50 random realizations of the social network for each of 15 $p_i / p_e$ ratios. The values for $p_i / p_e$ are selected to generate a collection of 15 network sets whose topologies interpolate smoothly between networks with essentially disconnected communities (large $p_i / p_e$) and networks whose two communities are tightly coupled (small $p_i / p_e$). A "global" cascade is said to occur if an initial seed set of five movement members in community R, chosen at random, results in the diffusion of the movement to community L. The probability of global cascade at a given $p_i / p_e$ ratio is computed by running 20 simulations on each of the 50 social network realizations associated with that ratio (1000 runs in total), and counting those for which the innovation propagates to community L. The results of this simulation study are presented in the plot at the bottom of Figure 6, with the blue curve showing the probability estimates as a function of $p_i / p_e$ ratio and the error bars corresponding to $\pm 2$ standard errors.
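The simulation study can be sketched as follows. This is an illustrative reading of the agent-based model, not the authors' implementation: the synchronous update order, the per-link independence of persuasion events, the `max_steps` cutoff, and all function names are assumptions, and the standard error treats every run as independent even though runs share network realizations.

```python
import math
import random


def build_graph(n, p_in, p_out, rng):
    """Neighbor lists; vertices 0..n/2-1 are community L, the rest R."""
    half = n // 2
    nbrs = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if (u < half) == (v < half) else p_out
            if rng.random() < p:
                nbrs[u].append(v)
                nbrs[v].append(u)
    return nbrs


def global_cascade(nbrs, beta, delta1, delta2, n_seeds, rng, max_steps=500):
    """Seed community R at random; return True if any L vertex joins."""
    n = len(nbrs)
    half = n // 2
    state = ["P"] * n                      # P: potential, M: member, E: ex
    for v in rng.sample(range(half, n), n_seeds):
        state[v] = "M"
    for _ in range(max_steps):
        members = [v for v in range(n) if state[v] == "M"]
        if not members:
            return False                   # movement died out inside R
        new_state = list(state)
        for m in members:
            for v in nbrs[m]:              # persuasion along each link
                if state[v] == "P" and rng.random() < beta:
                    new_state[v] = "M"
                    if v < half:
                        return True        # movement reached community L
            ex_nbrs = sum(1 for v in nbrs[m] if state[v] == "E")
            stay = (1.0 - delta2) * (1.0 - delta1) ** ex_nbrs
            if rng.random() >= stay:       # induced or spontaneous exit
                new_state[m] = "E"
        state = new_state
    return False


def cascade_probability(n, p_in, p_out, beta, delta1, delta2,
                        n_seeds=5, n_graphs=50, n_runs=20, seed=0):
    """Monte Carlo estimate of global cascade probability and its std. error."""
    rng = random.Random(seed)
    hits = trials = 0
    for _ in range(n_graphs):
        nbrs = build_graph(n, p_in, p_out, rng)
        for _ in range(n_runs):
            hits += global_cascade(nbrs, beta, delta1, delta2, n_seeds, rng)
            trials += 1
    p_hat = hits / trials
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / trials)
```

Sweeping `p_out` from zero upward for a fixed `p_in` traces out a curve of the same kind as the blue curve described in the text; plotting the estimate with a band of two standard errors on each side mirrors the error bars.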
We now investigate the efficacy of the S-HDS social diffusion model by using this model to estimate the probability of global cascade. The social diffusion model $\Sigma_{\text{S-HDS,diff}}$ is instantiated to be equivalent to the agent-based representation $\Sigma_{\text{ABM}}$ described above. Note that, in particular, there are no free parameters available to permit the response of $\Sigma_{\text{S-HDS,diff}}$ to be "tuned" to match $\Sigma_{\text{ABM}}$. For instance, the $\Sigma_{\text{ABM}}$ parameters $\beta$, $\delta_1$, $\delta_2$ uniquely define the $\Sigma_{\text{S-HDS,diff}}$ parameters $\beta$, $\delta_1$, $\delta_2$, and specifying values for the $p_i / p_e$ ratios gives corresponding values for the S-HDS transition matrices $\Lambda(x)$ (to within a single "offset" parameter, see [39]). A Matlab program implementing the resulting model $\Sigma_{\text{S-HDS,diff}}$ is given in [39].
In order to compute the probability of global cascade using the S-HDS model $\Sigma_{\text{S-HDS,diff}}$, we employ the "altitude function" method described in Appendix Two below. This method calculates provably-correct upper bounds on the probability of the social movement propagating to community L. The results of this analysis are shown in the plot at the bottom of Figure 6 (red curve). Observe that the global cascade probability estimates obtained using the two models $\Sigma_{\text{ABM}}$ and $\Sigma_{\text{S-HDS,diff}}$ are in close agreement. As it is challenging to model "discontinuous" phenomena such as diffusion across social network communities, this agreement represents important evidence that the S-HDS provides a useful characterization of social diffusion on networks.
While the models $\Sigma_{\text{ABM}}$ and $\Sigma_{\text{S-HDS,diff}}$ generate similar results in this example, the S-HDS representation is much more efficient computationally. For instance, estimating the desired global cascade probabilities using the S-HDS model requires less than one percent of the computer time needed to obtain these estimates with the equivalent agent-based model. Moreover, this difference in efficiency increases with network size, which is important because realistic social networks have hundreds or thousands of communities rather than just two. This computational tractability hints at a more general, and more significant, mathematical tractability enjoyed by the S-HDS framework, a property we now leverage to develop a rigorous predictive analysis methodology for social diffusion events.

A2. Appendix Two: Predictive Analysis
In this Appendix we formulate the predictive analysis problem in terms of reachability assessment, show that these reachability questions can be addressed through an "altitude function" analysis without computing system trajectories, and apply this theoretical framework to demonstrate that predictability of a broad class of social diffusion models depends crucially upon the meso-scale topological structures of the underlying networks. For convenience of exposition, in this Appendix we focus on network communities as a representative meso-scale structure; however, all results derived here are also applicable to the more general case in which the "network partition" (see Section 2.2) includes both community and core-periphery structures.

A2.1 Predictive Analysis as Reachability Assessment
We propose that accurate prediction requires careful consideration of the interplay between the intrinsics of a process and the social dynamics which are its realization. We therefore adopt an inherently dynamical approach to predictive analysis: given a social process, a set of measurables, and the behavior of interest, we formulate prediction problems as questions about the reachability properties of the system. Toward that end, the behavior about which predictions are to be made is used to define the system state space subsets of interest (SSI), while the particular set of candidate measurables under consideration allows identification of the candidate starting set (CSS), that is, the set of states and system parameter values which represent initializations that are equivalent under the assumed observational capability. This setup permits predictability assessment, and the related task of identifying useful measurables, to be performed in a systematic manner. Roughly speaking, the proposed approach to predictability assessment involves determining how probable it is to reach the SSI from a CSS and deciding if these reachability properties are compatible with the prediction goals. If a system's reachability characteristics are incompatible with the given prediction question (if, say, "hit" and "flop" in a cultural market are both likely to be reached from the CSS) then the prediction objectives should be refined in some way. Possible refinements include relaxing the level of detail to be predicted or introducing additional measurables.
We now make these notions more precise. Consider the multi-scale S-HDS social diffusion model $\Sigma_{\text{S-HDS,diff}}$ specified in Definition A1.2. Let $P_0$ be a subset of the parameter set $Par$ and $X_0$, $X_{s1}$, $X_{s2}$ be subsets of the (bounded) continuous system state space $X$. Suppose $X_0 \times P_0$ and $\{X_{s1}, X_{s2}\}$ are the CSS and SSI, respectively, corresponding to the prediction question. Let a specification $\alpha > 0$ be given for the minimum acceptable level of variation in system behavior relative to $\{X_{s1}, X_{s2}\}$. Consider the following definition.

Definition A2.1: A situation is eventual state (ES) predictable if $|\gamma_1 - \gamma_2| \ge \alpha$, where $\gamma_1$ and $\gamma_2$ are the probabilities of $\Sigma_{\text{S-HDS,diff}}$ reaching $X_{s1}$ and $X_{s2}$, respectively, and is ES unpredictable otherwise.
Note that in ES predictability problems it is expected that the two sets $\{X_{s1}, X_{s2}\}$ represent qualitatively different system behaviors (e.g., hit and flop in a cultural market), so that if the probabilities of reaching each from $X_0 \times P_0$ are similar then system behavior is unpredictable in a sense that is meaningful for many applications. Other useful forms of predictability are defined and investigated in [39].
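The ES predictability test amounts to a single comparison once the two reach probabilities are available. The sketch below is illustrative only: the function name `is_es_predictable` is hypothetical, the probabilities are assumed to have been computed elsewhere (by simulation or by a reachability bound), and the comparison is assumed to be non-strict.

```python
def is_es_predictable(gamma1, gamma2, alpha):
    """Return True if the two reach probabilities differ by at least alpha.

    gamma1, gamma2: probabilities of reaching the two SSI sets from the CSS.
    alpha: minimum acceptable level of variation (must be positive).
    """
    for name, value in (("gamma1", gamma1), ("gamma2", gamma2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return abs(gamma1 - gamma2) >= alpha
```

For example, reach probabilities of 0.9 ("hit") and 0.1 ("flop") with `alpha = 0.5` give a predictable situation, whereas 0.55 and 0.45 do not.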
The notion of predictability forms the basis for our definition of useful measurables:

Definition A2.2: Let the components of the vectors $(x_0, p_0) \in X_0 \times P_0$ which comprise the CSS be denoted $x_0 = [x_{01} \; \ldots \; x_{0n}]^T$ and $p_0 = [p_{01} \; \ldots \; p_{0p}]^T$. The measurables with most predictive power are those state variables $x_{0j}$ and/or parameters $p_{0k}$ for which predictability is most sensitive.
Intuitively, those measurables for which predictability is most sensitive are likely to be the ones that can most dramatically affect the predictability of a given problem. Note that we do not specify a particular measure of sensitivity to be used when identifying measurables with maximum predictive power, as such considerations are ordinarily application-dependent (see [39] for some useful specifications). Definitions A2.1 and A2.2 focus on the role played by initial states in the predictability of social processes. In some cases it is useful to expand this formulation to allow consideration of states other than initial states. For instance, we show in [18] that very early time series are often predictive for PEP, suggesting that it can be valuable to consider initial state trajectory segments, rather than just initial states, when assessing predictability. This extension can be naturally accomplished by redefining the CSS, for instance by augmenting the state space X with an explicit time coordinate [18].
We now turn our attention to the "early warning" problem.

Definition A2.3:
Let the event of interest be specified in terms of $\Sigma_{\text{S-HDS,diff}}$ reaching or escaping some SSI $X_s$, and suppose a warning signal is to be issued only if the probability of event occurrence exceeds some specified threshold $\alpha$. Reach warning analysis involves identifying a state set $X_w$, where $X_s \subseteq X_w$ necessarily, with the property that if the system trajectory enters $X_w$ then the probability that $\Sigma_{\text{S-HDS,diff}}$ will eventually reach $X_s$ is at least $\alpha$. Analogously, escape warning analysis involves identifying a state set $X_w$, where $X \setminus X_w \subseteq X_s$ necessarily, with the property that if the system trajectory enters $X_w$ then the probability that $\Sigma_{\text{S-HDS,diff}}$ will eventually escape from $X_s$ is at least $\alpha$.

A2.2 Stochastic Reachability Assessment
The previous section formulates predictive analysis problems as reachability questions. Here we show that these reachability questions can be addressed through an "altitude function" analysis, in which we seek a scalar function of the system state that permits conclusions to be made regarding reachability without computing system trajectories. We refer to these as altitude functions to provide an intuitive sense of their analytic role: if some measure of "altitude" is low on the CSS and high on an SSI, and if the expected altitude is nonincreasing along system trajectories, then it is unlikely for trajectories to reach this SSI from the CSS.
Consider the S-HDS social diffusion model $\Sigma_{\text{S-HDS,diff}}$ evolving on a bounded state space $Q \times X$. We quantify the uncertainty associated with $\Sigma_{\text{S-HDS,diff}}$ by specifying bounds on the possible values for some system parameters and perturbations and giving probabilistic descriptions for other uncertain system elements and disturbances. Given this representation, it is natural to seek a probabilistic assessment of system reachability.
We begin with an investigation of probabilistic reachability on infinite time horizons. The following "supermartingale lemma" is proved in [53] and is instrumental in our development:

Lemma SM: Consider a stochastic process $\Sigma_s$ with bounded state space $X$, and let $x(t)$ denote the "stopped" process associated with $\Sigma_s$ (i.e., $x(t)$ is the trajectory of $\Sigma_s$ which starts at $x_0$ and is stopped if it encounters the boundary of $X$). If $A(x(t))$ is a nonnegative supermartingale then for any $x_0$ and $\lambda > 0$, $P\{\sup A(x(t)) \ge \lambda \mid x(0) = x_0\} \le A(x_0) / \lambda$.
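The inequality in the lemma can be checked numerically on a toy process. The sketch below is not taken from [53]; it uses an assumed discrete-time example in which the altitude is multiplied at each step by 0.5 or 1.5 with equal probability, which makes it a nonnegative martingale (and hence a supermartingale). The function names and the finite horizon `n_steps` are illustrative choices.

```python
import random


def sup_exceeds(a0, level, n_steps, rng):
    """Simulate one path; return True if it ever reaches `level`."""
    a = a0
    for _ in range(n_steps):
        if a >= level:
            return True
        # Multiplicative step with mean factor 1, so E[a_next | a] = a.
        a *= rng.choice((0.5, 1.5))
    return a >= level


def exceedance_frequency(a0, level, n_steps=100, n_paths=5000, seed=0):
    """Empirical frequency of paths whose supremum reaches `level`."""
    rng = random.Random(seed)
    hits = sum(sup_exceeds(a0, level, n_steps, rng) for _ in range(n_paths))
    return hits / n_paths


if __name__ == "__main__":
    a0, level = 0.1, 1.0
    print("empirical frequency:", exceedance_frequency(a0, level))
    print("supermartingale bound:", a0 / level)
```

The empirical frequency should sit below the bound `a0 / level`, up to sampling error; the bound is not tight here because paths overshoot the level when they cross it.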
Denote by $X_0 \subseteq X$ and $X_s \subseteq X$ the initial state set and SSI, respectively, for the continuous system component of $\Sigma_{\text{S-HDS,diff}}$, and assume that $X$ and the parameter set $Par \subset \mathbb{R}^p$ are both bounded. Thus, for instance, the SSI is a subset of the continuous system state space $X$ alone; this is typically the case in applications and is easily extended if necessary. We are now in a position to state our first stochastic reachability result:

Theorem 2: $\gamma$ is an upper bound on the probability of trajectories of $\Sigma_{\text{S-HDS,diff}}$ reaching $X_s$ from $X_0$, while remaining in $Q \times X$, if there is a family of differentiable functions $\{A_q(x)\}_{q \in Q}$ such that
▪ $A_q(x) \le \gamma \quad \forall x \in X_0,\ q \in Q$;
▪ $A_q(x) \ge 1 \quad \forall x \in X_s,\ q \in Q$;
▪ $A_q(x) \ge 0 \quad \forall x \in X,\ q \in Q$;
▪ $\frac{\partial A_q}{\partial x}(f_q + H_q u) + \frac{1}{2}\mathrm{tr}\left[G_q^T \frac{\partial^2 A_q}{\partial x^2} G_q\right] + \sum_{q' \in Q} \lambda_{qq'} A_{q'} \le 0 \quad \forall x \in X,\ q \in Q,\ u \in U,\ p \in Par$.
Proof: Note first that $BA_q(x) = \frac{\partial A_q}{\partial x}(f_q + H_q u) + \frac{1}{2}\mathrm{tr}\left[G_q^T \frac{\partial^2 A_q}{\partial x^2} G_q\right] + \sum_{q' \in Q} \lambda_{qq'} A_{q'}$ is the infinitesimal generator for $\Sigma_{\text{S-HDS,diff}}$, and therefore quantifies the evolution of the expectation of $A_q(x)$ [53,34]. As a consequence, the third and fourth conditions of the theorem imply that $A(q(t), x(t))$ is a nonnegative supermartingale [53]. Thus, from Lemma SM, we can conclude that $P\{x(t) \in X_s \text{ for some } t\} \le P\{\sup A(q(t), x(t)) \ge 1 \mid x(0) = x_0\} \le A(q, x_0) \le \gamma \quad \forall x_0 \in X_0,\ q \in Q,\ u \in U,\ p \in Par$. ∎

The preceding result characterizes reachability of S-HDS on infinite time horizons. In some situations, including important applications involving social systems, it is of interest to study system behavior on finite time horizons. The following result is useful for such analysis:

Once the set of conditions to be satisfied by $A(x)$ is relaxed in this way, SOS programming can be used to compute $\gamma_{\min}$, the minimum value for the probability bound $\gamma$, and $A(x)$, the associated altitude function which certifies the correctness of this bound. Software for solving SOS programs is available as the third-party Matlab toolbox SOSTOOLS [56], and example SOS programs are given in [39]. Importantly, the approach is tractable: for fixed polynomial degrees, the computational complexity of the associated SOS program grows polynomially in the dimension of the continuous state space, the cardinality of the discrete state set, and the dimension of the parameter space.
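To make the conditions of the infinite-horizon altitude function result concrete, consider a small worked example. It is purely illustrative and not a model from the text: a scalar system with a single discrete mode (so the switching term vanishes), no input, constant nonpositive drift, and constant noise intensity; the symbols $\mu$, $\sigma$ and $a$ are introduced here for the example only.

```latex
% Assumed toy system: one mode, no input, drift -\mu with \mu \ge 0, noise \sigma > 0.
dx = -\mu\,dt + \sigma\,dw, \qquad
X = [0,1], \quad X_0 = [0,a], \quad X_s = \{1\}, \quad 0 < a < 1 .

% Candidate altitude function A(x) = x, checked against the four conditions:
A(x) = x \le a \quad \forall x \in X_0, \qquad
A(x) = 1 \ge 1 \quad \forall x \in X_s, \qquad
A(x) = x \ge 0 \quad \forall x \in X,

BA(x) = \frac{\partial A}{\partial x}(-\mu)
      + \frac{1}{2}\,\sigma^2\,\frac{\partial^2 A}{\partial x^2}
      = -\mu + 0 \le 0 \quad \forall x \in X .

% Hence \gamma = a is a valid upper bound:
P\{x(t) \in X_s \text{ for some } t\} \le a \qquad \forall x_0 \in X_0 .
```

When $\mu = 0$ the process is a Brownian motion stopped at the boundary of $[0,1]$, for which the probability of reaching 1 before 0 from $x_0$ is exactly $x_0$, so the bound is attained at $x_0 = a$. For $\mu > 0$ the same altitude function still certifies the bound, but it is conservative; a search over a richer family of functions would be needed to tighten it.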
For completeness, we outline an algorithm for computing the pair $(\gamma_{\min}, A(x))$:

Algorithm A2.1: altitude functions via SOS programming (outline)

A2.3 Application to Social Diffusion
The theoretical framework developed in the preceding sections is now used, in combination with empirically-grounded models for social diffusion [e.g., 17,[49][50][51], to demonstrate that predictability of this class of diffusion models depends crucially upon network community structure. We investigate the following predictability question: Is the diffusion of social movements and mobilizations ES predictable and, if so, which measurable quantities have predictive power?
We consider the S-HDS diffusion model $\Sigma_{\text{S-HDS,diff}}$, where
▪ the social network community graph $G_{sc}$ consists of K communities (so $|V_{sc}| = K$), connected together with an Erdős-Rényi random graph topology, with community size drawn from a power law distribution [36];
▪ each continuous system $\Sigma_{cs,q}$: $dx = f_q(x,p)\,dt + G_q(x,p)\,dw$, $q \in Q$, is given by the meso-scale social movement model $\Sigma_H$ or $\Sigma_B$ with appropriate parameter vector p and system "noise" w;
▪ the discrete system $\{Q, \Lambda(x)\}$ is a Markov chain that defines inter-community interactions in the manner described in Definition A1.2.
A Matlab instantiation of this S-HDS diffusion model is given in [39] and is available upon request. The behavior of the model can be shown to be consistent with empirical observations of several historical social movements (e.g., various movements in Sweden) [39]. In order to assess ES predictability, SSI = $\{X_{s1}, X_{s2}\}$ is defined so that $X_{s1}$, $X_{s2}$ are state sets corresponding to global (affecting a significant fraction of the population) and local (remaining confined to a small fraction of the population) movement events, respectively. We then employ Algorithm A2.2 iteratively to search for a definition for CSS = $X_0 \times P_0$ which ensures that the probabilities of reaching $X_{s1}$ and $X_{s2}$ from $X_0 \times P_0$ are sufficiently different to yield an ES predictable situation. We use two models of the form $\Sigma_{\text{S-HDS,diff}}$ for this analysis, corresponding to the two definitions for the continuous system $\Sigma_H$ and $\Sigma_B$. Each model is composed of K = 10 communities connected together with an Erdős-Rényi random graph topology. (Using different realizations of the Erdős-Rényi random graph does not affect the conclusions reported below.)

ES predictability analysis yields two main results. First, both the intra-community and inter-community dynamics exhibit threshold behavior: small changes in either the intra-community "infectivity" or inter-community interaction rate around their threshold values lead to large variations in the probability that the movement will propagate "globally". More quantitatively, for the diffusion model $\Sigma_{\text{S-HDS,diff}}$ with continuous system dynamics $\Sigma_H$, threshold behavior is obtained when varying 1.) the generalized reproduction number $R = \beta / \delta_2$ and 2.) the rate $\lambda$ at which inter-community interactions between individuals take place. Thus in order for a social movement to propagate to a significant fraction of the population, the threshold conditions $R > 1$ and $\lambda > \lambda_0$ must be satisfied simultaneously.
An analogous conclusion holds when $\Sigma_H$ is replaced with the diffusion model $\Sigma_B$ in the S-HDS representation. This finding is reminiscent of and extends well-known results for epidemic thresholds in disease propagation models [1].
This threshold behavior is illustrated in the plot at the top right of Figure 7, which shows the way probability of global propagation increases with inter-community interaction rate when the intra-community diffusion is sufficiently infective (i.e., $R > 1$). The probabilities that make up this plot are provably-correct (upper bound) estimates computed using Theorem 2 and Algorithm A2.1. A similar threshold response is observed when varying intra-community infectivity R, provided the inter-community interaction rate satisfies $\lambda > \lambda_0$. Importantly, the inter-community interaction threshold $\lambda_0$ is seen to be quite small, indicating that even a few links between network communities enable rapid diffusion of the movement to otherwise disparate regions of the social network. This result suggests that a useful predictor of movement activity in a given community is the level of movement activity among that community's neighbors in $G_{sc}$.
The second main ES predictability result characterizes the way probability of global propagation varies with the number of network communities across which a fixed set of "seed" movement members is distributed. To quantify this dependence, the social movement model $\Sigma_{\text{S-HDS,diff}}$ is initialized so that a small fraction of individuals in the population are movement members and the remainder of the population consists solely of potential members. We then vary the way this initial seed set of movement members is distributed across the K network communities, at one extreme assigning all seeds to the same community and at the other spreading the seeds uniformly over all K communities. For each distribution of seed movement members, the probability of global movement propagation is computed using Theorem 2 and Algorithm A2.1. Other than initialization strategy, the model is specified exactly as in the preceding analysis.
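The family of initializations used in this experiment can be generated with a small helper. The sketch below is an assumption about how such a sweep might be organized, not the authors' code: the name `distribute_seeds` is hypothetical, seeds are placed in the first `n_seeded` communities, and any remainder is spread one seed at a time.

```python
def distribute_seeds(total_seeds, n_communities, n_seeded):
    """Split total_seeds as evenly as possible over n_seeded communities.

    Returns a list of length n_communities giving the seed count per
    community; communities beyond the first n_seeded receive no seeds.
    """
    if not 1 <= n_seeded <= n_communities:
        raise ValueError("n_seeded must lie between 1 and n_communities")
    if total_seeds < n_seeded:
        raise ValueError("need at least one seed per seeded community")
    base, extra = divmod(total_seeds, n_seeded)
    counts = [0] * n_communities
    for k in range(n_seeded):
        # The first `extra` seeded communities absorb the remainder.
        counts[k] = base + (1 if k < extra else 0)
    return counts


if __name__ == "__main__":
    K, seeds = 10, 100
    for m in range(1, K + 1):
        print(m, distribute_seeds(seeds, K, m))
```

Looping `n_seeded` from 1 to K covers both extremes described above, from all seeds in one community to seeds spread uniformly over all K communities, while holding the total number of seeds fixed.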
The results of this portion of the ES predictability assessment are summarized in the two plots at the bottom of Figure 7. It is seen that for both choices of meso-scale social movement dynamics, $\Sigma_H$ and $\Sigma_B$, the probability of global movement propagation increases approximately linearly with the number of network communities across which the fixed set of seed members is distributed (here the number of initial members is set to one percent of the total population).