 Research
 Open Access
 Published:
An “Estimate & Score Algorithm” for simultaneous parameter estimation and reconstruction of incomplete data on social networks
Security Informatics volume 2, Article number: 1 (2013)
Abstract
Abstract
Dynamic activity involving social networks often has distinctive temporal patterns that can be exploited in situations involving incomplete information. Gang rivalry networks, in particular, display a high degree of temporal clustering of activity associated with retaliatory behavior. A recent study of a Los Angeles gang network shows that known gang activity between rivals can be modeled as a selfexciting point process on an edge of the rivalry network. In reallife situations, data is incomplete and lawenforcement agencies may not know which gang is involved. However, even when gang activity is highly stochastic, localized excitations in parts of the known dataset can help identify gangs responsible for unsolved crimes. Previous work successfully incorporated the observed clustering in time of the data to identify gangs responsible for unsolved crimes. However, the authors assumed that the parameters of the model are known, when in reality they have to be estimated from the data itself. We propose an iterative method that simultaneously estimates the parameters in the underlying point process and assigns weights to the unknown events with a directly calculable score function. The results of the estimation, weights, error propagation, convergence and runtime are presented.
Introduction
In this work we focus our attention on data sets of events involving rival gangs on a social network. Each event in the data set corresponds to a crime that occurs at a specified time and involves a pair of rival gangs. A subset of these events are unsolved crimes in which one or both of the rival gangs is not known. The method developed in this paper could be broadly applied to any social network involving activities in time between pairs of nodes on the network. However the interest in the problem came about by examining data from the Hollenbeck Division of the Los Angeles Police Department, home to 29 street gangs with a wellknown rivalry network [1][3].
Unlike other methods used to address incomplete data relating to social networks [4, 5], the question at hand is not if a rivalry exists, but rather to which rivalry a violent event belongs. This structure of between gang rivalries can be viewed as a social network [6] often embedded in space [1, 2]. Violent events involving gangs tend to be dyadic, and so we can formulate these events as a realization of a stochastic process occurring on the edges of the rivalry network. For each edge in the network there exists a different stochastic process. In our analysis however, we use identical parameters to generate synthetic data. The method does not assume that the underlying parameters generating each process are identical.
The first step to inferring the affiliation of the violent events is to understand the underlying stochastic process. This requires us to capture the behavior of criminal activity through computational means, much like in [7][9]. Recently methods have been proposed in the literature to mathematically model gang violence. The authors in [10] employ an agentbased model to investigate the geographic influences in the formation of the gang rivalry structure observed in Hollenbeck. These authors consider the longterm structure of the rivalry network embedded in space. In terms of the rivalry violence, a shorter timescale must be considered.
Violence among gangs exhibits retaliatory behavior [11]. In other words, given an event has happened between two gangs, the likelihood that another event will happen shortly after is increased. A problem such as this is modeled naturally by a selfexciting point process. It is interesting to note that these models were first used to analyze earthquakes [12][15]. Since then, they have been used to model financial contagion in credit markets [16, 17], viral videos on the web [18], terrorist activity in Indonesia [19], and the spread of infectious disease [20]. In this analysis we limit the scope of our model to include time only, thus providing a baseline model.
The authors in [21] and [22] have successfully modeled the pairwise gang violence as a Hawkes process [23]. All of the events are associated with exactly one rivalry, or edge of a social network. The violence on each edge, k, is assumed to have the conditional intensity
In this Hawkes process, the intensity λ_{ k }(tH_{τ,k}) depends on the history of the process ${H}_{\tau ,k}=\{{t}_{1},{t}_{2},\cdots {t}_{{M}_{k}}\}$, where M_{ k }is the number of events for process, k. In this framework, the window of time, [0,T], observed for each process in the network is the same. However, the number of events in each process, M_{ k }, is stochastic, and therefore varies from process to process. In practice the final time, T, is determined by the end of the data collection period. Further, the edges of the window introduce boundary effects that are adjusted for in the parameter approximation, see Equations 10 and 11.
The background rate of the process is defined by the constant μ_{ k }. In the context of gang rivalries, background events can be thought of as random occurrences between rival gangs that trigger retaliatory events. The expected number of offspring for any event is determined by the constant α_{ k }, and the decay of the intensity back to the background rate is ω_{ k }. Offspring events, in this context, could be interpreted as retaliatory events. Larger values for μ_{ k }and α_{ k }produce more background and offspring events respectively. Larger values of ω_{ k } do not influence the total number of events, but rather the amount of clustering in time.
The authors of [24] produce a mathematical framework to solve the incomplete data problem observed in gang violence data sets. In their work they use an optimization strategy that computes the weights to infer the rivalry affiliation of the incomplete data. In this formulation the authors prove that their optimization has a unique solution under mild constraints. This is substantial contribution in inferring the affiliation of the unknown violent events. However, the authors of [24] assume that the process parameters are known, an assumption that is often not feasible in practice. Further, finding the weights requires solving a computationally expensive optimization problem.
We propose an iterative method that (A) estimates the process parameters assuming the data is generated by the process defined by Equation 1 and (B) infers the process affiliation of simulated data via a direct method of computation. We iterate between (A) and (B) until the estimates for the unknown events converge. We call this the Estimate & Score Algorithm (ESA). The details of the ESA are described in Section “The Estimation & Score Algorithm (ESA)”. The ESA is tested on simulated data in Section “Results”, with analysis of the estimation of the parameters in the presence of incomplete data (see Subsection “Estimation analysis”) and comparison of the proposed score functions with that of the StomakhinShortBertozzi (SSB) method in [24] (see Subsection “Updating Weights analysis”). In Subsection “Runtime Analysis” there is an analysis of the runtime between the StomakhinShortBertozzi and the Forward Backward score functions used to update the weights (see Subsection “Runtime Analysis”). Subsection “Convergence Results” contains an analysis of the convergence of the Estimation & Score Algorithm. This method solves the more realistic problem of estimating the process and the weights. Further, the computation for the weight updates is more direct and therefore avoids performing the costly optimization scheme used in [24]. This is a novel piece of work with many exciting extensions. A final discussion of the results and future work is presented in Section “Discussion and Future Work”. As in [24] we do not use field data in this paper, rather we generated point process data using similar parameters as observed in the field data for Hollenbeck [22]. By using simulated data to test the algorithms we have actual ground truth evaluate the performance of the method.
Problem Formulation
The data is assumed to lie on a known social network containing K processes, where each of the K processes is a pairwise rivalry between two gangs. From this set of events, there are a total of N events where the time is known, but the processes affiliation is not known. These events are referred to as unknown events. Each of the N unknown events are placed into each of the K processes. Since the process affiliation is not known for all of the events in the network, each event is given an associated weight, S_{i,k}. Here S_{i,k}is the i th element of the k th process. If the event is known S_{i,k}=1. If S_{i,k} is unknown then it is assigned a number between 0 and 1 by our algorithm. We enforce the constraint that $\sum _{k=1}^{K}{S}_{i,k}=1$.
A simplified representation of our problem formulation can be found in Figure 1. The known events are represented by circles and the unknown event is represented by a triangle. Here we can see that since we do not know the affiliation of the triangle event, it is placed in all of the other processes. We emphasize that this represents our lack of information about which rivalry it belongs to.
As indicated in Figure 1, for each process in the network events e_{i,k} are indexed by increasing time, ${t}_{1}\le {t}_{2}\le {t}_{3}\cdots \le {t}_{{M}_{k}}$. Ordering the events in such a way has the consequence that the first unknown element in time, for example, may have different indexes for different processes. In Figure 1 the triangle index in first process is the third event, e_{1,3}. However the triangle in the K th process is the second event, e_{K,2}. One can easily keep track of the local index of a unknown event for each process.
The Estimation & Score Algorithm (ESA)
The proposed Estimation & Score Algorithm can be broken into three basic stages: initialization, parameter estimation, and updating the weights. This method is succinctly described in Figure 2.
Initialization
For this paper, there were two ways of initializing the Estimate & Score Algorithm. The first is used to infer rivalry affiliation given field data. After importing the data, the unknown events are identified and placed into each of the of the K processes. The weights, S_{i,k}, must also be initialized. If the event is known, then S_{i,k}=1. If the event is unknown then ${S}_{i,k}=\frac{1}{K}$.
An alternate initialization utilizes simulated data in order to test the components of the Estimate & Score Algorithm. In this case, data is generated from K independent Hawkes processes with given μ_{ k }, α_{ k }, and ω_{ k }. From these data, choose N events at random from the network to mark as unknown. Place these N unknown events into each of the other processes. Initialize the weights such that for known events S_{i,k}=1 and for unknown events S_{i,k}=1/K. This initialization process is used in this paper to test the method and produce the results in Section “Results”.
Parameter Estimation
In the presence of no unknown events, there are both parametric [12] and nonparametric [25][28] ways to model the underlying stochastic process on each edge of the social network. For this work, we chose a parametric form for the triggering density to validate the model but the results could easily be extended to the nonparametric case. We note that, as is usual with nonparametric estimates, speed would be compromised for the sake of flexibility.
For this paper, the data is assumed to be a realization of Equation 1, where the parameters are estimated using a method similar to the Expectation Maximization (EM) algorithm [29]. An EMlike approach is taken because of the branching structure present in a Hawkes process. In such a process each event can be associated with a background or response event. However, given a realization from this process it is not immediately obvious whether an event is a background or response event. We can view this information as a hidden variable that we must estimate. In this way, every event in each of the K processes is assigned a probability ${P}_{i,j}^{k}$. The probability that event i is a background event is denoted ${P}_{i,i}^{k}$, and probability that event i caused event j is denoted ${P}_{i,j}^{k}$. This assumes that t_{ i }<t_{ j }. From this EM estimation, the approximation for each of the variables is altered to include the weights for the unknown events. In fact, in the case where all the events are known, the estimation formulas are the same. This section derives the EM estimates when in the presence of incomplete data.
The classical loglikelihood function ${\widehat{\ell}}_{k}\left({H}_{\tau ,k}\right{\mu}_{k},{\alpha}_{k},{\omega}_{k})$ for a general point process with a fixed window [0,T] is
Incorporating the branching structure into the loglikelihood function, the event association is added as a random variable, χ_{i,j} such that
This branching allows us to separate those events associated with the background μ_{ k } and the response $g\left(t\right)={\alpha}_{k}{\omega}_{k}{e}^{{\omega}_{k}t}$. This leads to the altered loglikelihood function
Taking the expectation of ℓ_{ k }(H_{τ,k}μ_{ k },α_{ k },ω_{ k }) with respect to χ_{i,j}results in
In the EM algorithm, the quantity E_{ χ }[ℓ_{ k }(H_{τ,k}μ_{ k },α_{ k },ω_{ k })] is maximized with respect to each of the variables μ_{ k },α_{ k },ω_{ k } given the data H_{τ,k}. This leads to the EM estimates
Where ${P}_{i,j}^{k}$ is defined by
for t_{ i }<t_{ j }. The EM algorithm then becomes a matter of iterating between estimating the probabilities and the parameters. It has been proven that this algorithm will converge under mild assumptions [29]. Further, Equation 6 adjusts for boundary effects.
In the presence of events with unknown process affiliation in the network, we assign weights to the contribution of each event to the loglikelihood function. Specifically, each of the unknown events in process k have a weight S_{i,k}, such that ∑_{ k }S_{i,k}=1. For the known events S_{i,k}=1. These weights are incorporated for each process via
Note that L_{ k }(H_{τ,k}μ_{ k },α_{ k },ω_{ k }) is no longer an EM log likelihood in the presence of unknown data. Maximizing L_{ k }(H_{τ,k}μ_{ k },α_{ k },ω_{ k }) with respect to each of the parameters the estimates become
When all of the events are known, i.e. S_{i,k}=1 when unknown event i, k belongs to process k and is zero otherwise, these estimates become identical to the EM parameter estimates.
Updating weights
At the start of the Estimation & Score algorithm all of the weights for the unknown events are S_{i,k}=1/K. Once the parameters are estimated using the altered EM algorithm described in Equation 11, the weights, S_{i,k}, are updated, see Figure 2. Here we present four different score functions and the StomakhinShortBertozzi method [24], used to define, q_{i,k}, the intermediate process affiliation. Each of these score functions synthesize information from different portions of the data set. Given an event early in the data set, a score function that uses future events would be ideal. On the other hand, for later events a score function using previous events is desired. Similar considerations should be made if there are portion of the data with more incomplete data. After all of these intermediate weights, q_{i,k}, have been calculated, they are renormalized as a probability via ${S}_{i,k}=\frac{{q}_{i,k}}{\sum _{k}{q}_{i,k}}$. For simplicity we consider a response function of the form, ${g}_{k}\left(t\right)={\alpha}_{k}{\omega}_{k}{e}^{{\omega}_{k}\left(t\right)}$.
Ratio Score Function
The Ratio score function considers the ratio of the background rate μ_{ k } and the sum of all the future events, _{∑i<j}g_{ k }(t_{ j }−t_{ i }). Mathematically the score is determined by
Lambda Score Function
The Lambda score function uses only previous information by taking the ratio of the intensities evaluated at the unknown event time t_{ i }.
StomakhinShortBertozzi (SSB) method
The method defined in [24] is summarized by
subject to
This method is motivated by the Hawkes process defined in Equation 1.
Probability Score Function
The Probability score function uses the approximation of the branching structure of the underlying process. The idea behind this method is events that are background events with no corresponding response events should not belong in the process. An event that is a background with many response events or an event that is a response to another event should be part of that process.
Forward Backward Score Function
This method is the ratio of the summation of the response for the events in the future and the past, $\sum _{i\ne j}{g}_{k}\left(\right{t}_{i}{t}_{j}\left\right)$ over the background rate μ_{ k }.
Results
The Estimation & Score Algorithm is tested for accuracy on simulated data from the Hawkes process defined in Equation 1. An analysis of the parameter estimation method outlined in Subsection “Parameter Estimation” is conducted in Subsection “Estimation analysis”. A comparison of the score functions when assuming the true parameters is found in Subsection “Updating Weights analysis”. Subsection “Runtime Analysis” provides a comparison of the runtime between the Forward Backward score function and the StomakhinShortBertozzi method. A example of convergence of the Estimate & Score Algorithm is provided in Subsection “Convergence Results”.
Estimation analysis
There are many ways we could allow the unknown events to influence our estimates of the underlying parameters for each process. There are two extremes. On the one hand, we could exclude all of the unknown events from the parameter estimation. This would be equivalent to setting the S_{i,k}=0 for all unknown events. On the other hand, we could include all of the unknown events in the estimation of the parameters for each process. This would be equivalent to letting S_{i,k}=1 for all i and k. Another possible estimation method is some combination of these two. We propose this as a way of allowing the unaffiliated events to play some role in the estimation process. The naive choice is allowing each event to play the same role in each process. This amounts to setting S_{i,k}=1/K for the unknown events. We compare these three choices to the estimations obtained by the Estimate & Score Algorithm (ESA) using the Forward Backward score function. Finally, we want to compare all four of these possible estimation techniques to the best we could possibly do. In this case, that would mean we knew all the affiliations for the events (i.e. there are no unknown events).
Figures 3, 4, 5 displays the results for the μ_{ k }, α_{ k }, and ω_{ k } estimates for the five cases: S_{i,k}=0 for unknown events (dashtriangle), S_{i,k}=1 for unknown events (dashsquare), S_{i,k}=1/K for unknown events (dashx), the results using ESA (dashcircle), and the estimates you get when you know all the affiliations for the unknown events (solid). These results with standard deviations are displayed in Tables 1, 2, 3. In each of the three figures, the estimates are plotted vs the number of unknown events. Each network has five processes with the true parameters μ_{ k }=0.01, ω_{ k }=0.1, and α_{ k }=0.5. Different networks are created with 15, 30, 45, 60, and 75 unknown events. We estimate the parameters using each of the five methods explained above. This procedure is repeated 100 times with different random seed values and then the average estimate is calculated.
Notice in the estimates for μ_{ k }in Figure 3 and Table 1, the ESA performs the best compared to the true value and has only a slight reduction in accuracy as the number of unknown events increases. On average the other three estimates seem to degrade more rapidly as the number of unknown events increases. When S_{i,k}=1, the estimates for μ_{ k } are far above the true value and growing as the number of unknown events increases. This follows from the fact that letting S_{i,k}=1 means we are effectively adding events to the network. Take the case when K=2. Assume that each process has 1000 events, and there are 100 unknown events from each process. When we estimate the parameters for the first process, we will use the 900 events we know plus the 200 unknown events from the network. We will get the identical number of events in our estimation for process two. This creates 200 new events and thus biases the estimates for μ_{ k }. This motivated the idea of equal weighting for each unknown event, and that choice is validated by the estimates for μ_{ k }. A similar argument shows why S_{i,k}=0 (i.e. ignoring all the unknown events) has the lowest estimate for μ_{ k } at each level of incomplete data.
In the estimates for the branching ratio α_{ k }, the ESA on average yields the best estimates and maintains its accuracy in the presence of unknown events. It is interesting to note that equal weighting performs worse here than if we let S_{i,k}=0 for all unknown events. Using the ESA overcomes this drawback. Again, setting S_{i,k}=1 for all unknown events performs the worst. This could stem from the fact that most of the unknown events are being labeled background and thus this estimation technique underestimates the branching ratio because fewer events are considered offspring. Notice that the estimate for ESA (dashcircle) tracks the best possible estimate (solid) well while the other three start to trail off as more and more information is labeled as unknown.
Finally, in Figure 5 and Table 1, it is shown that the ESA estimate (dashcircle) for ω_{ k }tracks the behavior of the best estimate (solid) closer than the other methods. Including all of the unknown events (dashsquare) provides the poorest estimate for ω_{ k }. For the other three estimation techniques we see that they are all comparable.
Updating Weights analysis
To understand the strengths and weaknesses of each of the five score functions, defined in Subsection “Updating weights”, the score functions were evaluated for 100 incomplete events using the true values for μ_{ k }α_{ k }, and ω_{ k } when taking the Top 1, Top 2, and Top 3 best inferences. For comparison to [24], the true parameters were taken to be μ_{ k }=0.01, ω_{ k }=0.1, and α_{ k }=0.5. Due to the stochastic nature of the processes, for each level of process number 100 random networks were tested. The average results of this analysis are found in Figure 6. The number correctly identified by the each of the score functions is on the vertical axis. The horizontal axis displays the number of processes in the network.
From Figure 6 it is clear that the StomakhinShortBertozzi score function in solid dark blue, and the Forward Backward score function (cyan dashed diamond) perform nearly identically when looking at the Top 1, Top 2, and Top 3 inferences. These functions look both forward and backward in time from the incomplete event, and are therefore able to identify clusters of events in time. The Probability (black dashed asterisk) and Ratio (solid green square) score functions don’t do nearly as well the StomakhinShortBertozzi and Forward Backward score functions, but better than the Lambda (magenta dashed circle) score function. The Lambda score function appears to perform close to chance (dark green solid plus) for the Top 1, Top 2, and Top 3 inferred process affiliation. Due to the success of the Forward Backward score function and the StomakhinShortBertozzi method, only these are used for further analysis.
The analysis comparing the score functions assumed that the true parameters were known. However, when applying this method in practice there will be error in the estimated parameters. This estimation error will propagate through to the score functions. To understand how deviations of the estimated parameters influence the score functions pairwise combinations of the parameters were increased and decreased by 90% from the target values μ_{ k }=0.01, ω_{ k }=0.1, and α_{ k }=0.5 in 10% increments. In particular the Forward Backward and SSB score functions are computed for pairwise combinations of μ in the range of [0.001,0.019], ω in the range of [0.01,0.19], and α in the range of [0.05,0.95]. Further, in these pairwise combinations, the third parameter is kept at the target value. Notice that a 90% change is larger than the errors observed in the parameter estimates in Subsection “Estimation analysis”.
To examine the propagation of errors of the parameters to the score functions one event from a network with 10 processes is chosen to be unknown. The score function S_{1,true} with the target parameters, μ_{ k }=0.01, ω_{ k }=0.1, and α_{ k }=0.5 for the true process is calculated. Then, on the same network, the parameters are offset by
The offset score function ${\u015c}_{1,\mathit{\text{true}}}$ is calculated from these offset parameters. The difference between ${S}_{1,\mathit{\text{true}}}{\u015c}_{1,\mathit{\text{true}}}$ is taken for each pairwise combination of parameters. Again, due to the stochastic nature of the processes, each analysis was done for 100 runs and the average difference in score functions is recorded. The results of this analysis are displayed in Figure 7 with those of the Forward Backward score function (left), and those for the StomakhinShortBertozzi score function (right). In general the StomakhinShortBertozzi score function is more sensitive to the changes than the Forward Backward score functions for the μ_{ k } and α_{ k }parameters. Changes in the Forward Backward score functions are minimal for most changes of parameters except for small values of ω_{ k }. As ω_{ k } decreases then the approximated Forward Backward score function decreases, causing a positive difference. As seen in Subsection “Estimation analysis”, Figure 5, when estimating ω_{ k }, there is a tendency to over, not under estimate the parameter, and so this does not appear to occur within these parameters. The changes in the StomakhinShortBertozzi score function depend on all of the pairwise changes of the parameters. As μ_{ k }increases the computed StomakhinShortBertozzi decreases. On the other hand, as ω_{ k } or α_{ k }increase the score function increases. This analysis shows that though the StomakhinShortBertozzi method and the Forward Backward score functions preform similarly when the parameters are known exactly, under the influence of estimation error the StomakhinShortBertozzi score function varies more than the Forward Backward score function.
Runtime Analysis
Though the Forward Backward score function and the SSB method produce comparable results in terms of accuracy, there is a sizable difference in the time it takes to update the weights using these methods.
The Forward Backward score function is designed to be direct, meaning calculates the weights using available information without need for iteration. The StomakhinShortBertozzi method, however, determines the weight by solving a optimization problem. A closed form solution for the maximized weights is not known to these authors, so the weights are found by numerically approximating the weights that maximize Equation 14. In the implementation of the StomakhinShortBertozzi we employ a gradient ascent method which requires 411 iterations to reach convergence with a tolerance of 0.001. The direct methods, Forward Backward, Probability, Ratio, and Lambda score functions, are on the same order of operations as one iteration of the gradient ascent used to solve Equation 14. Specifically, one iteration of the gradient ascent method and calculating the direct score functions are O(N·K·M) where N is the number of unknown events, K is the number of processes and M is the expected number of events in process k. The expected number of events in process k can be further analyzed via,
The run time of both the Forward Backward function and the StomakhinShortBertozzi method are empirically examined in Figure 8. Both score functions were calculated with 20 networks for each level of number unknown events and number of processes with the known parameter values of μ_{ k }=0.01, ω_{ k }=0.1, and α_{ k }=0.5. All of the run times are calculated in milliseconds. It can be seen that the average run time needed to compute the Forward Backward function at every level of N and K is substantially less than that of the StomakhinShortBertozzi method. Also, it is clear from this figure that the time needed to calculate both of these methods increases as N and K increase.
Convergence Results
The Estimation & Score Algorithm converges quickly when either the Forward Backward score function or StomakhinShortBertozzi method are used. Figure 9 displays the parameter estimates for a typical run of the Estimation & Score Algorithm for both the Forward Backward (left) and StomakhinShortBertozzi (right). Both score functions produce qualitatively similar results, and it appears that the rate of convergence is comparable for both cases. The estimated weights for one unknown data event for this typical run versus the iteration for each process are plotted in Figure 10. The weights plotted are obtained from the Forward Backward score function (left) and the StomakhinShortBertozzi method (right). It is interesting to note that both methods of weighting choose the same process affiliation as the most likely. Further tests were conducted with a variable initial weighting. These runs showed similar behavior as initializing the Estimate & Score Algorithm with S_{i,k}=1/K, implying that the Estimate & Score Algorithm is robust to small perturbations of the initial weighting.
Discussion and Future Work
In this paper we propose an effective method for simultaneously estimating the parameters and assigning process affiliation in case of incomplete field data from selfexciting point processes on a network. This problem comes from the demand for law enforcement agencies to identify gang affiliation in the case of unsolved crimes in an area of highly complex gang rivalry activity. We present a new framework we name the Estimate & Score Algorithm for possible application to field data. By testing the method on simulated datasets we can understand its performance features and liabilities. The method is an iterative procedure in which process parameters are estimated alternately with the calculation of network affiliation probabilities. We identify several useful ‘score functions’ for calculating the network affiliations. We also compare the use of unknown events in the parameter estimation. One upshot of our analysis is that the inclusion of unknown events may increase the accuracy of the parameter estimation. Several score functions are considered and the Forward Backward score function shows the most promise with comparable results to that of the StomakhinShortBertozzi method of [24] in the parameter regime tested. The score function calculation is a direct method that does not rely on solving a variational problem, and thus is more computationally efficient than [24].
For future work, space often plays a role in understanding criminal activity [8, 30][33]. Further, criminal behavior has nonrandom structure and can often be framed in terms of routine activity theory [34, 35]. In the case of gang violence, there is a strong spatial component [1, 10, 36]. One can extend the Estimate & Score Algorithm to include space. There is a precedence in the earthquake literature of adding space to selfexciting point processes [13, 15], however, in the case of gang violence, the spatial response may be different. Instead of retaliatory events clustering around prior events, it appears that the data is clustered around regions in space. A spatial model similar to that of [37] could be employed, where the triggering density in space is related to their respective gang setspace, or center of activity [38]. Statistically when modeling spatial point processes one needs to tease out the difference between hot spots due to risk heterogeneity versus event dependence. The data given will be one realization of the underlying process, however using techniques such as prototyping [39], one could potentially reformulate the data into multiple realization of the same process and distinguish between these two phenomena.
There are other factors in the data that can be fused into the model, though more analysis would be required. For example, in earthquake modeling the magnitude of the earthquake is often included. To include such a factor to the intensity λ_{ k }(tH_{τ,k}) one would need to determine a numerical metric to define the impact of each event type. This is not a straightforward task and would require further investigation. Extending this model in this way could allow for the inclusion of events involving tagging, or other low level gang crimes, which could be a precursor to more extreme violent interactions between gangs. Including this data is outside of the scope of the current model but has a strong potential to enrich the overall data set allowing for better analysis.
It is important to note that there are other methods to approximate the underlying form of the self exciting process. For example the authors in [28] consider the general form of the intensity function λ_{ k }(tH_{τ,k}) to be
Using a nonparametric method, they are able to approximated the background function μ(t) and the response function g(t) for a broader class of functions. In this paper, the data was assumed to come from a Hawkes process with constant background rate and an exponential response to previous events. There are cases where the background rate is not constant [40]. Further it is conceivable that the response function could be of a form other than an exponential decay. In this circumstances, the model for λ(tH_{ τ }) in Equation 1 would not be appropriate.
Finally, this method has a great potential in the field of policing. Once such a model has been calibrated correctly, the Estimation & Score Algorithm using the quicker Forward Backward score function can be used to infer the gang association in real time, while the investigation is on going. Given an accurate model of the underlying process, such a method could identify rivalries that have heightened activity.
Author’s contributions
RH led in the construction of the Estimation & Score algorithm, coding, analysis of the results and writing of the manuscript. EL contributed to the construction of the Estimation & Score algorithm, coding, analysis of the results, and writing of the manuscript. AB contributed to the analysis of the results, editing of the manuscript, and conception and design of the Estimation & Score algorithm. All authors read and approved the final manuscript.
Abbreviations
 EM:

Expectation Maximization
 SSB:

StomakhinShortBertozzi
 ESA:

Estimation & Score Algorithm
 Avg:

Average.
References
 1.
Radilm SM, Flint C, Tita GE: Spatializing Social Networks: Using Social Network Analysis to Investigate Geographies of Gang Rivalry, Territoriality, and Violence in Los Angeles. Annals of the Association of American Geographers 2010, 100(2):307–326. http://www.tandfonline.com/doi/abs/10.1080/00045600903550428
 2.
Tita G, Radil S: Spatializing the social networks of gangs to explore patterns of violence. Journal of Quantitative Criminology 2011, 27: 1–25. 10.1007/s1094001090955
 3.
Tita G, Riley JK, Ridgeway G, Abrahamse AF, Greenwood P: Reducing Gun Violence: Results from an Intervention in East Los Angeles. RAND Press, Santa Monica, CA; 2004.
 4.
Hoff P: Multiplicative latent factor models for description and prediction of social networks. Computational & Mathematical Organization Theory 2009, 15: 261–272. http://dx.doi.org/10.1007/s10588–008–9040–4. [10.1007/s10588–008–9040–4]. [10.1007/s1058800890404]. 10.1007/s1058800890404
 5.
Koskinen JH, Robins GL, Pattison PE: Analysing exponential random graph (pstar) models with missing data using Bayesian data augmentation. Statistical Methodology 2010, 7(3):366–384.
 6.
Short MB, Mohler GO, Brantingham P, Tita GE: Gang rivalry dynamics via coupled point process networks. 2011.
 7.
Brantingham P, Glässer U, Jackson P, Vajihollahi M: Modeling Criminal Activity in Urban Landscapes. In Mathematical Methods in Counterterrorism. Edited by: Memon N, David Farley J, Hicks DL, Rosenorn T. Springer Vienna; 2009:9–31. http://dx.doi.org/10.1007/978–3211–09442–6\_2. [10.1007/978–3211–09442–6_2]
 8.
Brantingham P, Glasser U, Kinney B, Singh K, Vajihollahi M: A computational model for simulating spatial aspects of crime in urban environments. In Systems, Man and Cybernetics, 2005 IEEE International Conference on Volume 4. IEEE; 2005:3667–3674.
 9.
Brantingham P, Brantingham P: Computer simulation as a tool for environmental criminologistsm. Security Journal 2004, 17: 21–30.
 10.
Hegemann RA, Smith LM, Barbaro AB, Bertozzi AL, Reid SE, Tita GE: Geographical influences of an emerging network of gang rivalries. Physica A: Statistical Mechanics and its Applications 2011, 390(21–22):3894–3914. http://www.sciencedirect.com/science/article/pii/S037843711100447X
 11.
Decker S: Collective and normative features of gang violence*. Justice Quarterly 1996, 13(2):243–264. http://www.ingentaconnect.com/content/routledg/rjqy/1996/00000013/00000002/art00005
 12.
Veen A, Schoenberg FP: Estimation of Space–Time Branching Process Models in Seismology Using an EM Type Algorithm. Journal of the American Statistical Association 2008, 103(482):614–624. http://pubs.amstat.org/doi/abs/10.1198/016214508000000148
 13.
Ogata Y: SpaceTime PointProcess Models for Earthquake Occurrences. Annals of the Institute of Statistical Mathematics 1998, 50: 379–402. http://dx.doi.org/10.1023/A:1003403601725. [10.1023/A:1003403601725]
 14.
Ogata Y: Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc 1988, 83(401):9–27. 10.1080/01621459.1988.10478560
 15.
Zhuang J, Ogata Y, VereJones D: Stochastic declustering of spacetime earthquake occurrences. Journal of the American Statistical Association 2002, 97(458):369–380. 10.1198/016214502760046925
 16.
Errais E, Giesecke K, Goldberg LR: Affine Point Processes and Portfolio Credit Risk. SIAM Journal on Financial Mathematics 2010, 1: 642–665. http://link.aip.org/link/?SJF/1/642/1 http://link.aip.org/link/?SJF/1/642/1 10.1137/090771272
 17.
AïtSahalia Y, CachoDiaz J, Laeven R: Modeling financial contagion using mutually exciting jump processes, Tech. rep. National Bureau of Economic Research; 2010.
 18.
Crane R, Sornette D: Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences 2008, 105(41):15649–15653. http://www.pnas.org/content/105/41/15649.abstract 10.1073/pnas.0803685105
 19.
Porter M, White G: Selfexciting hurdle models for terrorist activity. The Annals of Applied Statistics 2011, 6: 106–124.
 20.
Meyer S, Elias J, Höhle M: A Space–Time Conditional Intensity Model for Invasive Meningococcal Disease Occurrence. Biometrics 2011, 68: 607–616.
 21.
Mohler GO, Short MB, Brantingham PJ, Schoenberg FP, Tita GE: SelfExciting Point Process Modeling of Crime. Journal of the American Statistical Association 2011, 106(493):100–108. http://pubs.amstat.org/doi/abs/10.1198/jasa.2011.ap09546 10.1198/jasa.2011.ap09546
 22.
Egesdal M, Fathauer C, Louie K, Neuman J: Statistical and Stochastic Modeling of Gang Rivalries in Los Angeles. SIAM Undergraduate Research Online 2010, 3: 72–94.
 23.
Hawkes AG: Spectra of some selfexciting and mutually exciting point processes. Biometrika 1971, 58: 83–90. http://biomet.oxfordjournals.org/content/58/1/83.abstract 10.1093/biomet/58.1.83
 24.
Stomakhin A, Short MB, Bertozzi AL: Reconstruction of missing data in social networks based on temporal patterns of interactions. Inverse Problems 2011, 27(11):115013. http://stacks.iop.org/0266–5611/27/i=11/a=115013 10.1088/02665611/27/11/115013
 25.
Marsan D, Lengline O: Extending earthquakes’ reach through cascading. Science 2008, 319(5866):1076. 10.1126/science.1148783
 26.
Marsan D, Lengliné O: A new estimation of the decay of aftershock density with distance to the mainshock. Journal of Geophysical Research 2010, 115(B9):B09302.
 27.
Sornette D, Utkin S: Limits of declustering methods for disentangling exogenous from endogenous events in time series with foreshocks, main shocks, and aftershocks. Physical Review E 2009, 79(6):61110.
 28.
Lewis E, Mohler G: A Nonparametric EM Algorithm for Multiscale Hawkes Processes. Preprint 2011.
 29.
Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 1977, 39: 1–38. [With discussion] [With discussion]
 30.
Brantingham PJ, Brantingham PL: Environmental Criminology. Sage Publications, Inc, Beverly Hills, CA; 1981.
 31.
Herbert DT: Geography of Urban Crime. Longman Inc, London; 1982.
 32.
Eck JE, Chainey S, Leitner JGCM, Wilson RE: Mapping Crime: Understanding Hot Spots, National Institute of Justice. 2005, 1–71. http://www.ojp.usdoj.gov/nij
 33.
Brantingham PL, Brantingham PJ: Criminality of place. European Journal on Criminal Policy and Research 1995, 3: 5–26. [10.1007/BF02242925] http://dx.doi.org/10.1007/BF02242925.
 34.
Brantingham PJ, Brantingham PL: Environment, routine situation: Toward a pattern theory of crime. Advances in Criminological Theory 1993, 5: 259–294.
 35.
Cohen LE, Felson M: Social Change and Crime Rate Trends: A Routine Activity Approach. American Sociological Review 1979, 44(4):588–608. http://www.jstor.org/stable/2094589 10.2307/2094589
 36.
Block R: Gang Activity and Overall Levels of Crime: A New Mapping Tool for Defining Areas of Gang Activity Using Police Records. Journal of Quantitative Criminology 2000, 16: 369–383. http://dx.doi.org/10.1023/A:1007579007011. [10.1023/A:1007579007011]
 37.
O’Leary M: Modeling Criminal Distance Decay. Cityscape: A Journal of Policy Development and Research 2011, 13(3):161–198.
 38.
Tita G, Cohen J, Engberg J: An Ecological Study of the Location of Gang “Set Space”. Soc. Probl 2005, 52(2):272–299. 10.1525/sp.2005.52.2.272
 39.
Nichols K, Schoenberg F, Keeley J, Bray A, Diez D: The application of prototype point processes for the summary and description of California wildfires. Journal of Time Series Analysis 2011, 32(4):420–429. 10.1111/j.14679892.2011.00734.x
 40.
Lewis E, Mohler G, Brantingham PJ, Bertozzi A: Selfexciting point process of Insurgency in Iraq. Security Journal 2011, 25: 0955–1662.
Acknowledgements
This work was supported by NSF grant DMS0968309, ONR grant N000141010221, ARO grants W911NF1010472 and W911NF1110332, and AFOSR MURI grant FA95501010569. Further, we would like to thank Martin Short, Jeff Brantingham, and George Mohler for their helpful comments and conversations.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Received
Accepted
Published
DOI
Keywords
 Inferring incomplete data
 Social networks
 Gang rivalries
 Hawkes process
 Selfexciting point processes