The spatiotemporal modeling for criminal incidents
Security Informatics volume 1, Article number: 2 (2012)
Abstract
Law enforcement agencies monitor criminal incidents. With additional geographic and demographic data, law enforcement analysts look for spatiotemporal patterns in these incidents in order to predict future criminal activity. When done correctly these predictions can inform actions that can improve security and reduce the impact of crime. Effective prediction requires the development of models that can find and incorporate the important associative and causative variables available in the data. This paper describes a new approach that uses spatiotemporal generalized additive models (STGAMs) to discover underlying factors related to crimes and predict future incidents. In addition, the paper shows extensions of the STGAM approach to produce local spatiotemporal generalized additive models (LSTGAMs). These local models can better predict criminal incidents conditioned on regions. Both models can fully utilize a variety of data types, such as spatial, temporal, geographic, and demographic data, to make predictions. We describe how to estimate the parameters for STGAM using iteratively reweighted least squares and maximum likelihood and show that the resulting estimates provide for model interpretability. This paper also discusses methods to generate regions for LSTGAM. Lastly the paper discusses the evaluation of LSTGAM and STGAM with actual criminal incident data from Charlottesville, Virginia. The evaluation results show that both models from this new approach outperform previous spatial models in predicting future criminal incidents.
1 Introduction
Law enforcement agencies need to model the spatiotemporal patterns of crimes. With a model of criminal incidents, they can study the causality of crimes and predict the locations and times of future criminal activity. If the model can predict future crimes accurately, law enforcement can deploy resources to improve security and reduce threats. Typical actions taken by law enforcement include walking and driving patrols, surveillance systems, and neighborhood watch programs.
Many types of data are available to assist in building such models. Law enforcement agencies in the United States usually monitor criminal incidents as they occur. For example, they have locations and times of criminal incidents, as well as victim and perpetrator information. In addition to criminal incident data, most agencies can also acquire spatial information from geographic information systems (GIS) and demographic and economic data from the census.
Several techniques and models have been developed to meet the need for predictive policing with available data. One of the most popular methods is the spatial hot spot model. In the hot spot model [1], current criminal incident data are collected and clustered over space. The locations of such clusters are so-called hot spots. The model assumes that the current crime clusters persist over the forecast horizon. Future criminal incidents are predicted to occur in these same areas. Methods to generate hot spots include spatial histograms, clustering, mixture models, scan statistics, and density estimation. The hot spot model only utilizes criminal incident data, such as types of crimes and the locations and times of criminal incidents. It only shows the current patterns of crimes, without insight into the relationship between crimes and environment over time. As the local environment changes, the hot spot model cannot indicate the resulting changes in crime patterns.
To address this problem, more sophisticated statistical models using both criminal data and environmental data have been built by researchers. Liu and Brown [2] applied a point pattern density model to criminal incidents. The spatial density of criminal incidents was assumed to be conditioned on features associated with locations. These features included geographic features, such as the distances to the nearest interstate highways, demographic features and consumer expenditure features. Xue and Brown [3] and Smith and Brown [4] developed a spatial choice model. They assumed criminals made choices to pick places that could be modeled by random utility maximization. This utility maximization is over all alternatives, where the utility is defined by the gain from crimes and the risk of being caught. Brown, Dalton, and Hoyle [5] then discussed a method that uses generalized linear models (GLM) to compute the risk over a territory. They first partitioned the space into grids. Each grid was associated with a response indicating whether incidents happened and features about the grid. Then, a spatial GLM was built with all grids. They applied the spatial GLM to predict terrorist events. Results showed the spatial GLM had better prediction performance than the density models. Rodrigues and Diggle [6] combined point process models and generalized additive models (GAM) to build a semiparametric point source model. In their model, features affected the risk nonlinearly. They applied the model to study the effect of installed security cameras on crimes.
None of the above models directly incorporates the temporal information of criminal incidents. For instance, Liu and Brown use a Bayesian approach to model building that can include a variety of time series, but no specific approach is recommended or tested. Other models usually estimate different parameter sets based upon coarse divisions of time. For example, these models use criminal incidents that happened within the most recent year to generate hot spots for this year. Another intuitive method discussed by Ivaha, Al-Madfai, Higgs, and Ware [7] first models the temporal behaviors of crimes with time series models and then models the spatial behaviors given the predicted number of incidents at a certain time. However, this approach does not model interactions between space and time. Recent research has developed spatiotemporal approaches that apply generalized additive models (GAM) to combine spatial, temporal, and other (e.g., demographic) features for prediction. This approach with GAM has performed well in predicting continuous response variables such as matter concentrations [8]. However, to date no similar approach has been attempted for criminal incident prediction.
In this paper, we describe a new approach to criminal incident prediction using the spatiotemporal generalized additive model (STGAM) and the local spatiotemporal generalized additive model (LSTGAM). The paper is organized as follows. Section 2 defines the problem of criminal incident modeling formally, describes STGAM and LSTGAM in detail, and discusses how to estimate parameters as well as evaluate spatial prediction performance. Section 3 applies both STGAM and LSTGAM to model the breaking and entering incidents in Charlottesville, VA and evaluates the prediction performance. Finally, section 4 gives conclusions and suggestions for future work.
2 The spatiotemporal modeling of criminal incidents
In this section, we first define the problem of criminal incident prediction formally. Then we describe our two models for modeling and predicting criminal incidents. After that, we discuss how to estimate the model parameters. At the end of this section, we introduce a method to evaluate the performance of spatiotemporal predictions.
2.1 Definition of the problem of criminal incident prediction
As discussed in section 1, we have various features about an area where criminal incidents have occurred. We want to model the patterns of criminal incidents with these features and apply the model to predict the locations and times of future criminal incidents. Equivalently, we need to model the probability of a criminal incident happening at a certain location and time given all the features associated with this location and time.
Mathematically, we have an area of interest $S\subset \mathbb{R}^{2}$, a time period $T\subset \mathbb{R}^{+}$, and features $\{X_{s,t} \mid s\in S, t\in T\}$ associated with S and T. To represent the area S and the time period T, we can partition S into grids {s_{ i }} and T into time intervals {t_{ j }}, where $\cup s_{i}=S$, $\cup t_{j}=T$, and $i,j\in \mathbb{N}^{+}$ are indices. The features associated with a grid s_{ i } and a time interval t_{ j } can be represented by a vector $X_{s_{i},t_{j}}$.
Our objective is to find a probability function:
and a decision function:
such that:
where $inc{i}_{{s}_{i},{t}_{j}}=1$ means at least one incident happens at the grid s_{ i } and time t_{ j }; $inc{i}_{{s}_{i},{t}_{j}}=0$ means no incident happens at the grid s_{ i } and time t_{ j }; $p\left(inc{i}_{{s}_{i},{t}_{j}}=1\right)$ is the probability that at least one incident happens at the location s_{ i } and time t_{ j }; T* is a set of time intervals in the future; L is a loss function; N_{ S } is the total number of grids; N_{T*} is the total number of time intervals; I(·) is an indicator function; w_{0}, w_{1} are weights of different types of errors; and ϵ is the tolerance threshold.
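Written out from the symbol definitions above and the description of the loss in section 2.5, one consistent form of the three expressions is the following. The stated ranges, the normalization by $N_{S}N_{T^{*}}$, and the non-strict inequality are assumptions rather than details confirmed by the surrounding text.

$$p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)\in [0,1] \qquad (1)$$

$$\delta_{s_{i},t_{j}}=\delta\left(p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)\right)\in \{0,1\} \qquad (2)$$

$$L=\frac{1}{N_{S}N_{T^{*}}}\sum_{s_{i}\in S,\,t_{j}\in T^{*}}\left[w_{0}\cdot I\left(\delta_{s_{i},t_{j}}=0 \mid inci_{s_{i},t_{j}}=1\right)+w_{1}\cdot I\left(\delta_{s_{i},t_{j}}=1 \mid inci_{s_{i},t_{j}}=0\right)\right]\le \epsilon \qquad (3)$$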
The major difficulty of this problem is to find an accurate probability function $p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)$, such that it has high values for locations where criminal incidents will happen and low values for locations where criminal incidents will not happen for any future time intervals. Given a good probability function, law enforcement agencies can easily choose their own decision functions based on resources and risk preferences. For example, they can choose a cutoff value p* = 0.8 to classify areas with predicted probabilities higher than p* as high risk areas.
In this paper, we focus on the development of $p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)$.
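The cutoff rule and the two error types in the loss can be illustrated with a short Python sketch. The function names `decide` and `loss`, the example records, and the division by the number of records are assumptions made for illustration; the weights w0 and w1 play the roles of w_{0} and w_{1} in the text.

```python
def decide(prob, cutoff=0.8):
    """Decision function: 1 (high risk) if the predicted probability exceeds the cutoff."""
    return 1 if prob > cutoff else 0


def loss(probs, incidents, cutoff=0.8, w0=1.0, w1=1.0):
    """Weighted misclassification loss over (grid, interval) records.

    probs     : predicted probabilities, one per record
    incidents : observed 0/1 responses, in the same order as probs
    w0        : weight of a missed incident (decision 0, incident 1)
    w1        : weight of a false alarm (decision 1, incident 0)
    Dividing by the number of records is an assumed normalization.
    """
    total = 0.0
    for p, inc in zip(probs, incidents):
        d = decide(p, cutoff)
        if d == 0 and inc == 1:
            total += w0
        elif d == 1 and inc == 0:
            total += w1
    return total / len(probs)


# Hypothetical records: one missed incident (0.7) and one false alarm (0.85).
example = loss([0.9, 0.7, 0.2, 0.85], [1, 1, 0, 0], cutoff=0.8, w0=2.0, w1=1.0)
```

Raising w0 relative to w1 expresses a preference for not missing incidents, at the cost of flagging a larger area.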
2.2 The spatiotemporal generalized additive model (STGAM)
To model $p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)$, we developed a spatiotemporal generalized additive model (STGAM) [9]. Additive models usually perform well for problems having many predictors and provide interpretable results. As shown in [5], the spatial generalized linear model (GLM) had better prediction performance than other probability models. In STGAM, we used the generalized additive model (GAM) [10] instead of GLM, because GAM is more flexible in the treatment of nonlinearity than GLM. GAM assumes additivity between predictors, but allows for local nonlinearity in each predictor. This flexibility is helpful for modeling criminal incidents. For example, criminals may prefer to burgle richer houses. However, they might not choose expensive houses because these houses often have security systems. To include temporal information of previous criminal incidents, the STGAM borrowed the idea of the binary time-series-cross-sectional data (BTSCS) model from Beck, Katz, and Tucker [11]. In the BTSCS model, a dummy variable is used to indicate when the last incident happened.
Our STGAM has the following form:
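Written out from the symbol definitions that follow, and assuming no separate intercept term, the model reads:

$$\mathrm{logit}\left(p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)\right)=\sum_{n=1}^{N}f_{n}\left(x_{n,s_{i},t_{j}}\right)+\kappa_{s_{i},t_{j}} \qquad (4)$$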
where $\mathrm{logit}(p)=\log\left(\frac{p}{1-p}\right)$ is the logit function; N is the total number of features; $x_{n,s_{i},t_{j}}$ is the n^{th} feature associated with location s_{ i } and time t_{ j } ($X_{s_{i},t_{j}}=\left(x_{1,s_{i},t_{j}},\cdots ,x_{N,s_{i},t_{j}}\right)$); $f_{n}$ is the smooth function of the n^{th} feature to be estimated from data; and $\kappa_{s_{i},t_{j}}$ is the dummy variable indicating the length of the continuous zeros (no incident happens) that precede the current observation at location s_{ i } and time t_{ j }. An example of the values of $\kappa_{s_{i},t_{j}}$ is shown in table 1. Notice that $\kappa_{s_{i},t_{j}}$ is a dummy variable, and its values are factors instead of integers.
With features and past criminal incidents, we can estimate the smooth functions $f_{n}$ $(n=1,\cdots ,N)$ and the parameters of $\kappa_{s_{i},t_{j}}$ $(\kappa_{s_{i},t_{j}}=1,\cdots ,K)$ for the above model. Here, K is the maximum length of the continuous zeros considered. For example, if the last incident at location s_{ i } happened K or more time intervals before time t_{ j }, then $\kappa_{s_{i},t_{j}}=K$.
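The bookkeeping behind κ can be sketched in Python. This is a minimal sketch under stated assumptions: κ counts the intervals since the most recent incident in a grid (so an incident in the immediately preceding interval gives κ = 1), it is capped at K, and a grid with no recorded prior incident starts at K. The function name `kappa_series` is hypothetical, and table 1 may use a different convention.

```python
def kappa_series(incidents, K):
    """Return the kappa level for every time interval of one grid.

    incidents : list of 0/1 responses for consecutive time intervals
    K         : maximum level considered (anything older is lumped into K)
    Assumption: kappa = number of intervals since the last incident, capped at K.
    """
    kappas = []
    since_last = K  # no known earlier incident: treat as at least K intervals ago
    for inc in incidents:
        kappas.append(since_last)  # kappa only looks at intervals before this one
        since_last = 1 if inc else min(since_last + 1, K)
    return kappas


# Hypothetical six-interval history for a single grid, with K = 3.
example = kappa_series([0, 1, 0, 0, 1, 0], K=3)
```

Because κ enters the model as a factor, each level from 1 to K would get its own coefficient rather than a single slope.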
2.3 The local spatiotemporal generalized additive model (LSTGAM)
STGAM assumes that all grids in the area S have the same underlying pattern. The probability of criminal incidents occurring in S is computed by a single equation. In reality, it is possible to have multiple regions within S. For example, we might have incident data for an entire state, including big cities, small towns and rural counties. Different types of regions might have different criminal patterns. In addition, the same feature might affect high risk areas (such as crime hot spots) differently from low risk areas.
To account for this situation, we extend the STGAM to the local spatiotemporal generalized additive model (LSTGAM) as follows:
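Based on the description that follows, the two equations can be written as below. The exact typography is an assumption; in particular, the region dependence of the κ parameters is not shown in the subscript here.

$$p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)=\sum_{r=1}^{R}I\left(s_{i}\in S_{r}\right)\cdot p_{r}\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right) \qquad (5)$$

$$\mathrm{logit}\left(p_{r}\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)\right)=\sum_{n=1}^{N}f_{r,n}\left(x_{n,s_{i},t_{j}}\right)+\kappa_{s_{i},t_{j}} \qquad (6)$$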
In the above LSTGAM, equation 5 models the probability of criminal incidents over the whole area S with R regions. Here, R is the total number of regions in S. S_{ r } is the r^{th} region, where $\left\{S_{r} \mid S_{r}\subset S, r\in \{1,\cdots ,R\}\right\}$ satisfies $\cup_{r}S_{r}=S$ and $S_{r_{i}}\cap S_{r_{j}}=\emptyset$ $(r_{i}\ne r_{j})$. I(·) is an indicator function with values of 0 and 1. $p_{r}\left(inci_{s_{i},t_{j}}=1\right)$ models the probability of criminal incidents occurring within the region r. Equation 6 defines $p_{r}\left(inci_{s_{i},t_{j}}=1\right)$ for each region r. As we can see, it has the same form as equation 4. Notice that there are actually R equations in the form of equation 6 and each one may have different smooth functions f_{ r,n }(·) and parameters of κ.
LSTGAM can be considered a two-stage model. The first stage is to decide which region a grid s_{ i } belongs to. The second stage is to build a separate STGAM to model the probability of criminal incidents for each region. Clearly, STGAM is the special case of LSTGAM with R = 1.
2.4 Estimation of STGAM and LSTGAM
The STGAM has the form of a regular GAM. GAM has been studied extensively in many different disciplines. Therefore, it can be estimated efficiently by well-developed methods and algorithms. Standard statistical software packages, such as R, S-PLUS and SAS, also include implementations for estimating GAM.
We briefly review steps to estimate GAM here. Interested readers can refer to [12, 13] for details.
To estimate the GAM model in equation 4, the smooth function f(x) is first represented by a sum of basis functions:
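With B denoting the number of basis functions (an inferred reading of B, which appears later in the knot index), the expansion can be written as:

$$f(x)=\sum_{i=1}^{B}\beta_{i}\,b_{i}(x) \qquad (7)$$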
where b_{ i }(x) is the i^{th} basis function; and β_{ i } are the unknown parameters to be estimated.
A popular choice of basis functions is the cubic regression spline. The basis functions for this spline include:
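For a covariate scaled to [0, 1], one widely used set of cubic regression spline basis functions, following the construction in Wood's GAM textbook (assumed here to be the one the text refers to), is:

$$b_{1}(x)=1,\qquad b_{2}(x)=x,\qquad b_{i+2}(x)=R\left(x,x_{i}^{*}\right),\quad i=1,\cdots ,B-2$$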
where $\left\{x_{i}^{*} \mid i\in \{1,\cdots ,B-2\}\right\}$ are knots of the spline; and R(x, z) is as follows [13]:
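A commonly used form of R(x, z) for a covariate scaled to [0, 1], taken from Wood's textbook treatment of cubic regression splines, is given below. That this is exactly the expression cited from [13] is an assumption.

$$R(x,z)=\frac{\left[\left(z-\tfrac{1}{2}\right)^{2}-\tfrac{1}{12}\right]\left[\left(x-\tfrac{1}{2}\right)^{2}-\tfrac{1}{12}\right]}{4}-\frac{\left(|x-z|-\tfrac{1}{2}\right)^{4}-\tfrac{1}{2}\left(|x-z|-\tfrac{1}{2}\right)^{2}+\tfrac{7}{240}}{24}$$

The Python sketch below evaluates this kernel and the resulting basis for a single covariate value. The names `R`, `basis_row` and `smooth` are illustrative, and the knots and coefficients are hypothetical.

```python
def R(x, z):
    """Cubic spline kernel R(x, z) for x, z in [0, 1] (assumed form, see lead-in)."""
    a = ((z - 0.5) ** 2 - 1 / 12) * ((x - 0.5) ** 2 - 1 / 12) / 4
    u = abs(x - z) - 0.5
    b = (u ** 4 - 0.5 * u ** 2 + 7 / 240) / 24
    return a - b


def basis_row(x, knots):
    """Evaluate b_1(x) = 1, b_2(x) = x and b_{i+2}(x) = R(x, knot_i) at one x."""
    return [1.0, x] + [R(x, k) for k in knots]


def smooth(x, beta, knots):
    """Evaluate f(x) = sum_i beta_i * b_i(x) for given coefficients."""
    return sum(b * c for b, c in zip(basis_row(x, knots), beta))


# Hypothetical example: three knots give B = 5 basis functions.
knots = [0.25, 0.5, 0.75]
row = basis_row(0.3, knots)
```

Stacking one such row per record, for every feature, gives the design matrix of the GLM that the basis expansion leads to.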
By representing all smooth functions f(x) with basis functions, equation 4 becomes a GLM:
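Substituting the basis expansion into equation 4 gives a model that is linear in the coefficients. One way to write it, with $\beta_{n,i}$ and $b_{n,i}$ denoting the coefficients and basis functions of the n^{th} smooth and $B_{n}$ its number of basis functions (notation assumed here), is:

$$\mathrm{logit}\left(p\left(inci_{s_{i},t_{j}}=1 \mid X_{s_{i},t_{j}}\right)\right)=\sum_{n=1}^{N}\sum_{i=1}^{B_{n}}\beta_{n,i}\,b_{n,i}\left(x_{n,s_{i},t_{j}}\right)+\kappa_{s_{i},t_{j}}$$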
The above GLM can be estimated efficiently by maximizing the penalized likelihood using the penalized iteratively reweighted least squares method (PIRLS) [14].
A possible difficulty with the estimation of STGAM is that the training dataset is usually huge and the response variable is a sparse vector. For example, we built a predictive model of criminal incidents for Charlottesville, Virginia (total area 26.6 km^{2}) using a grid size of 32 m × 32 m and a time interval of one month [9]. This produced 1,062,094 records, of which only about 1,700 had a response of 1. It is time consuming to estimate parameters using all the records. Therefore, we used subsampling. To generate a sample from the records, we included all the records with a response of 1 and a random sample of the records with a response of 0. Based on the analysis in [15], the effect of this biased sampling can be approximately corrected by adding an offset term log(sample size/total number of records) in the estimation process. Thus, the subsampling technique can reduce the size of the training set and save estimation time. However, this method introduces stochastic effects into the parameter estimates. If possible, we suggest using all of the training data to estimate parameters.
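The subsampling step can be sketched as follows. This is a minimal Python illustration, not the original implementation: the function name `subsample_with_offset`, the `n_negatives` parameter, and the toy records are hypothetical, and the offset is computed exactly as worded in the text.

```python
import math
import random


def subsample_with_offset(records, n_negatives, seed=0):
    """Keep every response-1 record plus a random sample of response-0 records.

    records     : list of (features, response) pairs, response in {0, 1}
    n_negatives : number of response-0 records to draw (hypothetical parameter)
    Returns the sample and the offset log(sample size / total number of records).
    """
    positives = [r for r in records if r[1] == 1]
    negatives = [r for r in records if r[1] == 0]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    drawn = rng.sample(negatives, min(n_negatives, len(negatives)))
    sample = positives + drawn
    offset = math.log(len(sample) / len(records))
    return sample, offset


# Toy data: 5 records with response 1 out of 100.
records = [((i,), 1 if i < 5 else 0) for i in range(100)]
sample, offset = subsample_with_offset(records, n_negatives=15)
```

Repeating the draw with different seeds gives different samples, which is the source of the stochastic effects on the estimates mentioned above.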
For LSTGAM, we can use the above method to estimate equation 6. In addition, we need to define or estimate the regions {S_{ r }}. {S_{ r }} can be defined by domain knowledge. For example, if law enforcement agencies believe that criminal patterns are different in different cities, each S_{ r } can be a different city. When no such knowledge is available, we can estimate {S_{ r }} with features {X_{ s,t }} and past incidents {inci_{ s,t }}. If we assume that high risk regions have different underlying patterns from low risk regions, we can use the following method to generate {S_{ r }} based on the incident density:

1. Estimate the incident density over the whole area S: $\left\{d_{s} \mid d_{s}\in [0,1], s\in S\right\}$;
2. Pick points: $\left\{d_{1}^{*},\cdots ,d_{R-1}^{*} \mid d_{0}^{*}<d_{1}^{*}<d_{2}^{*}<\cdots <d_{R-1}^{*}<d_{R}^{*}\right\}$, where $d_{0}^{*}=0$ and $d_{R}^{*}=1$;
3. {S_{ r }} based on the incident density are: $S_{r}=\left\{s_{i} \mid d_{r-1}^{*}\le d_{s_{i}}<d_{r}^{*}, s_{i}\in S\right\}$.
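The three steps above amount to binning grids by density. A minimal Python sketch, assuming the densities have already been estimated and scaled to [0, 1], is shown below; the function name `assign_regions` and the example values are hypothetical.

```python
from bisect import bisect_right


def assign_regions(densities, cut_points):
    """Map each grid's incident density to a region index r in 1..R.

    densities  : incident density per grid, each in [0, 1]
    cut_points : interior thresholds d*_1 < ... < d*_{R-1}
    A grid goes to region r when d*_{r-1} <= d < d*_r, with d*_0 = 0 and
    d*_R = 1; a density of exactly 1 is placed in the last region R.
    """
    if list(cut_points) != sorted(set(cut_points)):
        raise ValueError("cut points must be strictly increasing")
    return [bisect_right(cut_points, d) + 1 for d in densities]


# Two regions split at a hypothetical density threshold of 0.9.
regions = assign_regions([0.05, 0.5, 0.9, 0.95, 1.0], [0.9])
```

Choosing the threshold as a quantile of the estimated densities would fix the share of the area assigned to each region.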
2.5 A method to evaluate models: HRP vs. TIP
As discussed in section 2.1, we want to minimize the loss function L defined in equation 3. The first part of the function, $\sum_{s_{i}\in S,t_{j}\in T^{*}}w_{0}\cdot I\left(\delta_{s_{i},t_{j}}=0 \mid inci_{s_{i},t_{j}}=1\right)$, is the weighted count of incorrect predictions for grids where criminal incidents actually happen. The second part of the function, $\sum_{s_{i}\in S,t_{j}\in T^{*}}w_{1}\cdot I\left(\delta_{s_{i},t_{j}}=1 \mid inci_{s_{i},t_{j}}=0\right)$, is the weighted count of incorrect predictions for grids where no criminal incident happens. To minimize the first part, the probability model should predict high probabilities for the locations where incidents actually happen. To minimize the second part, the total area of the locations with high probabilities should be small at a given time because of the sparseness of criminal incidents over the whole area S.
Both criteria are important. The first criterion means the model should not miss a high risk area, so that law enforcement agencies know all future crime locations. The second criterion is important because police resources are limited and only part of the area can be patrolled at a given time. With a good model, agencies can better allocate limited resources to help prevent crimes. Based on these criteria, we proposed the HRP vs. TIP method to evaluate the performance of spatial predictions at a given time [9].
To measure the performance of a model at time t_{ j }, the method first computes:
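Based on the verbal definitions that follow, the two quantities can be written as below. Whether the comparison with θ is strict, and whether TIP counts incidents or grids containing incidents, are assumptions here.

$$HRP_{\theta}=\frac{\left|\left\{s_{i} \mid p\left(inci_{s_{i},t_{j}}=1\right)\ge \theta ,\ s_{i}\in S\right\}\right|}{\left|\left\{s_{i} \mid s_{i}\in S\right\}\right|}$$

$$TIP_{\theta}=\frac{\left|\left\{s_{i} \mid p\left(inci_{s_{i},t_{j}}=1\right)\ge \theta ,\ inci_{s_{i},t_{j}}=1,\ s_{i}\in S\right\}\right|}{\left|\left\{s_{i} \mid inci_{s_{i},t_{j}}=1,\ s_{i}\in S\right\}\right|}$$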
where $\left\{p\left(inci_{s_{i},t_{j}}=1\right) \mid s_{i}\in S, t_{j}\in T^{*}\right\}$ are predictions from the model; |·| is the size of a set; and θ is a threshold (θ ∈ [0, 1]). Here, HRP_{ θ } represents the percentage of the area that the model predicts as high risk; and TIP_{ θ } represents the percentage of incidents (from the test set) that occurred within the high risk area given θ.
Two vectors of HRP and TIP can be computed with different thresholds {θ_{ i } | θ_{ i } ∈ [0, 1]}. Then, TIP is plotted against HRP. The resulting plot looks like the receiver operating characteristic (ROC) curve [16]. Ideally, as many incidents as possible should fall within a high risk area of a given size. Therefore, the curve from a good model should be close to the upper left corner. Examples of this plot are shown in Figures 3, 4, 5, 6. Similar to ROC analysis, we can use the area under the curve (AUC) to compare the performance of different models by a single score. Because a good model has a curve close to the upper left corner, the AUC of a good model should be close to 1. It is easy to see that a random guess model has AUC = 0.5. Therefore, the AUC of a bad model should be close to or less than 0.5.
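The HRP vs. TIP curve and its AUC can be computed with a few lines of Python. The sketch below makes several assumptions: a grid is high risk when its predicted probability is at least θ, TIP is computed over grids with a 0/1 incident indicator, the thresholds are the distinct predicted probabilities, and the area is obtained with the trapezoidal rule. The function names and the four-grid example are hypothetical.

```python
def hrp_tip(probs, incidents, theta):
    """Return (HRP, TIP) for one threshold at a single time interval."""
    high = [p >= theta for p in probs]
    hrp = sum(high) / len(probs)  # share of grids flagged as high risk
    hits = sum(1 for h, inc in zip(high, incidents) if h and inc == 1)
    tip = hits / sum(incidents)   # share of incident grids that were flagged
    return hrp, tip


def hrp_tip_auc(probs, incidents):
    """Area under the TIP-against-HRP curve using the trapezoidal rule."""
    thetas = sorted(set(probs), reverse=True)
    # Start at (0, 0); the smallest threshold flags every grid, giving (1, 1).
    points = [(0.0, 0.0)] + [hrp_tip(probs, incidents, t) for t in thetas]
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2
    return auc


# Hypothetical predictions for four grids, two of which had an incident.
probs = [0.9, 0.8, 0.3, 0.1]
incidents = [1, 0, 1, 0]
auc = hrp_tip_auc(probs, incidents)
```

The function assumes at least one incident in the test interval; otherwise TIP is undefined.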
3 Model evaluation: predicting criminal incidents in Charlottesville, Virginia
This section shows the application of our STGAM and LSTGAM to breaking and entering incidents in Charlottesville, Virginia. These two models, along with the spatial GLM [5] and the hot spot model, are evaluated on real incident data and compared by their prediction performance.
3.1 Data
We used three data sets for this study. The first data set includes breaking and entering incidents in Charlottesville, Virginia from April 2001 to February 2005. In total, there were 1,795 incidents (58 incidents without exact coordinates were excluded). Each incident in this study had the coordinates of the incident and the time at which it happened. The second data set was the geographic information of the city in the form of GIS layers, such as locations of roads, interstate highways, small businesses and schools. The third data set had the demographic data of Charlottesville measured in census block groups, including population, median values of all houses, races, marriages and so on. Figure 1 shows a small number of geographic features of Charlottesville with all the breaking and entering incidents.
3.2 Model construction and estimation
To model the criminal incidents in Charlottesville, we first partitioned the city into spatial grids of size 32 m × 32 m. The total number of grids covering the area was 23,089. We used a time interval of one month, and there were 46 months in the data set. Therefore, we had 1,062,094 (= 23,089 × 46) records. Each record had a response variable indicating whether at least one incident happened within the grid and the time period. There were also two types of features associated with each record as explanatory variables. The first type was the distance feature. We calculated the shortest distance between the centroid of a grid and a certain geographic landmark, such as the distance to the nearest road. This calculation was done by a toolkit programmed in Visual C# and PostGIS [17]. The second type was the demographic feature, such as population, marital status, and house values. Because we used the demographic data measured in census block groups, the demographic features of a grid actually measured the properties of the neighborhood where the grid was located. There were 14 distance features and 20 demographic features. For this study, we kept only the 11 most important of the 34 features, as in our previous study [9]. Those 11 features were selected by stepwise selection with a GLM. Table 2 shows the description of the features for modeling.
To test our models, we kept the incident data that happened in the last 12 months as the test data. Thus, the training data set included incidents between April 2001 and February 2004. The test data set included incidents between March 2004 and February 2005.
We first built STGAM and LSTGAM as described in section 2. To build STGAM and LSTGAM, we chose the parameter K = 13, which means that incidents more than one year old were not considered. To build LSTGAM, we defined two regions, S_{1} and S_{2}, using the method discussed in section 2.4. The high risk region S_{2} included the 10% of the area with the highest incident density. The low risk region S_{1} included the other 90% of the area. We used the package "mgcv" in R [18] to estimate the smooth functions and parameters in STGAM and LSTGAM. This package implements the GAM estimation method described in section 2.4. To avoid stochastic effects from subsampling, we used all the training data to estimate the models.
To compare our models with the previous work, we also built a spatial GLM and a hot spot model. The spatial GLM used the same features in table 2 and parameters were estimated with all the training data. The hot spot model estimated the density with all the incidents in the training data set using Gaussian kernels. Both models were estimated by the software R.
3.3 Results
3.3.1 Prediction performance
We applied STGAM, LSTGAM, the spatial GLM and the hot spot model to predict the probability of criminal incidents in Charlottesville from March 2004 to February 2005 using the test data set. Then, we compared those four models with the metric described in section 2.5.
Figure 2 shows the AUC of the 12 monthly predictions using the four models. The larger the AUC value is, the better the model predicted. Our two models, STGAM and LSTGAM, performed better than the previous approaches, the spatial GLM and the hot spot model. The performance of LSTGAM was a little better than the performance of STGAM. To test whether the difference between any two curves in Figure 2 was significant, we performed paired Wilcoxon significance tests on groups of AUC values. Table 3 shows the test results. Small p-values mean the differences are significant (p < 0.05). We can see that the difference between any two curves was significant. Therefore, LSTGAM was significantly better than STGAM. Both LSTGAM and STGAM were significantly better than the spatial GLM and the hot spot model.
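A paired Wilcoxon signed-rank test on two sets of monthly AUC values can be sketched in plain Python. This version enumerates the exact null distribution, which is feasible for 12 pairs (4,096 sign patterns); whether the original analysis used an exact test or a normal approximation is not stated, so this choice is an assumption. The AUC values below are made up for illustration and are not the values in Figure 2 or Table 3.

```python
from itertools import product


def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided paired Wilcoxon signed-rank test for small samples.

    Zero differences are dropped; tied absolute differences share an average rank.
    Returns (W_plus, p_value), where W_plus is the rank sum of positive differences.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under the null hypothesis every rank is positive or negative with probability 1/2.
    null = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product((0, 1), repeat=n)]
    p_low = sum(w <= w_plus for w in null) / len(null)
    p_high = sum(w >= w_plus for w in null) / len(null)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Made-up monthly AUC values for two models over 12 months.
model_a = [0.80 + 0.01 * m for m in range(12)]
model_b = [0.70] * 12
w_plus, p_value = wilcoxon_signed_rank_exact(model_a, model_b)
```

With 12 pairs, the smallest two-sided p-value this exact test can produce is 2/4096, which occurs when one model is better in every month.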
Figures 3, 4, 5, 6 show HRP vs. TIP plots for the predictions in March 2004, July 2004, November 2004 and February 2005. In the plots, HRP and TIP are the percentage of high risk area and the percentage of incidents that occurred within the high risk area respectively, as defined in section 2.5. From these plots, we can confirm that STGAM and LSTGAM had better prediction performance in those four months. In particular, STGAM and LSTGAM captured about half of the real incidents with a very small high risk area in each case. For example, about 50% of real incidents in July 2004 happened within the top 2% of the area with the highest risk predicted by LSTGAM. The police department can use this prediction to patrol more efficiently.
Predictions from our models were probabilities on spatial grids. This type of data can be visualized easily with available GIS software. Figure 7 shows the heat map of the prediction from STGAM in February 2005 generated by Quantum GIS [19]. We used a kernel density with 3 standard deviation to smooth the prediction. On this map, red means high predicted probabilities while light blue means low predicted probabilities. The red stars are the real criminal incidents that happened in February 2005. As we can see from this map, most of the real criminal incidents were located within the high probability area.
3.3.2 Model interpretation
Table 2 shows feature significance in different models. As we can see, the temporal dummy variable κ was significant in both STGAM and LSTGAM. It helped to explain the variance of criminal incident probability. All selected features were significant in at least one model, except the feature widowed. Features roads all dist, small businesses, and divorced were significant in all the models. Comparing features in LSTGAM in regions S_{1} and S_{2}, we can see that different regions had different sets of significant features. For example, median_val was important to explain the variance in the low risk area S_{1}, but not in the high risk area S_{2}.
Figures 8, 9 and 10 show the estimated parameters and smooth functions of STGAM, LSTGAM in region S_{1} and LSTGAM in region S_{2} respectively. Only significant features were plotted. In the figures, solid lines represent the estimated smooth functions while the dotted lines are 95% confidence intervals. The nonlinear effects of features on the crime probability are clearly visible. Based on Figure 8, locations with no incident in the previous year were less likely to have a new incident. Among the locations where incidents happened in the previous year, those with incidents in the past half year were more likely to have a new incident. Incidents were more likely to happen at locations closer to schools, roads, and small businesses. As we expected, the neighborhoods with the lowest and the highest median house values were less likely to be broken into. The neighborhood with a median house value of about $60,000 was the most likely to have such incidents. The number of males in the neighborhood also affected crimes. Incidents were more likely in neighborhoods with fewer males, but this effect was not significant once a neighborhood had more than 350 males. In addition, breaking and entering incidents were less likely to happen in neighborhoods with fewer divorced residents and more owner-occupied houses. Figures 9 and 10 show the different patterns in the different regions. In the low risk region S_{1}, features had similar effects on crimes as in the STGAM, but fewer features were significant. In the high risk region S_{2}, locations with no incident in the previous year were still less likely to have a new incident. However, among the locations where incidents happened in the previous two months, those with incidents in the past month were less likely to have a new incident in the following month. Unlike in the low risk region, incidents were more likely to happen in neighborhoods with fewer divorced residents and more owner-occupied houses.
4 Conclusions
In this paper, we describe the spatiotemporal generalized additive model (STGAM) and show its extension into the local spatiotemporal generalized additive model (LSTGAM) to predict criminal incidents. Both STGAM and LSTGAM can fully utilize many different types of data, such as spatial and temporal data, geographic data, demographic data, etc. STGAM can be easily estimated by available algorithms and has good interpretability. LSTGAM models the different criminal patterns based on different regions, and thus is more flexible than STGAM. We also showed that STGAM is a special case of LSTGAM. Methods to estimate LSTGAM are described in Section 2.4.
To evaluate this new approach we applied both models to predict breaking and entering incidents in Charlottesville, Virginia. Based on our assessments with the real criminal incident data, both models can predict future incidents accurately. Results showed that our two models outperformed the previous spatial GLM and the hot spot model. Compared with STGAM, LSTGAM had better performance in prediction. Law enforcement agencies can use STGAM and LSTGAM to model criminal incidents, predict future incidents and prevent crimes. In addition, those two models can be applied to other areas with the need to study the spatiotemporal patterns and predict future incidents. For example, we can use STGAM and LSTGAM to predict terrorist events and car accidents.
Several methods can be investigated to improve both models. To select features for modeling, more sophisticated methods such as penalized regression [10] can be incorporated into STGAM to choose features automatically. For LSTGAM, better methods can be developed to generate optimal regions. For example, regions can be generated to minimize the loss function discussed in section 2.1. Our previous study [20] showed that narratives provide important information about crimes. We can use text mining models to extract text features from narratives and add these features to STGAM and LSTGAM.
Authors' information
X. Wang is a PhD candidate in the Department of Systems and Information Engineering at the University of Virginia. His research interests include statistical learning with high dimensional data, spatiotemporal modeling, and text mining. He has applied his research to real problems, such as criminal incident prediction and an asymmetric threat tracker. He won the best paper honorable mention award at the 2011 IEEE International Conference on Intelligence and Security Informatics. He holds an MS degree in Systems and Information Engineering from the University of Virginia and a BS degree in Management Information Systems from the Central University of Finance and Economics, China.
D. Brown is William Stansfield Calcott Professor of Engineering and Applied Science and Director of the Applied Predictive Technology Laboratory at the University of Virginia. Prior to joining the University of Virginia, Dr. Brown served as an officer in the U.S. Army and later worked at Vector Research, Inc. on projects in medical information processing and multisensor surveillance systems. He is President of Commonwealth Computer Research, Inc. This company provides products and services in prediction analytics for numerous private and governmental organizations. He served on the National Research Council (NRC) Committee on Transportation Security, on the National Academy of Sciences panel on High Performance Computing and Crisis Management, and on the NRC Committee on Surface Transportation Infrastructure Security. Dr. Brown is a Fellow of the National Institute for Aerospace. He is a past member of the Joint Directors of Laboratories Group on Data Fusion and a former Fellow at the National Institute of Justice Crime Mapping Research Center. Dr. Brown is the recipient of the IEEE Joseph Wohl Career Achievement Award for his work in systems engineering and data fusion. He is also the recipient of the IEEE Norbert Wiener Award for Outstanding Research in the areas of systems engineering, data fusion, and information analysis. He received the IEEE Intelligence and Security Informatics Award for outstanding research achievements in information for security, law enforcement, and intelligence. The Governor of Virginia presented him with the Governor's Technology Award for his achievements in providing the technology to enable rapid crime analysis by local law enforcement agencies. He has also received an IEEE Outstanding Contribution Award and the IEEE Millennium Medal. Dr. Brown is a Fellow of the IEEE, the past Editor-in-Chief of the IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, and a past President of the IEEE Systems, Man, and Cybernetics Society.
References
 1.
Block C: STAC hotspot areas: A statistical tool for law enforcement decisions. Crime analysis through computer mapping. Washington, DC: Police Executive Research Forum, Citeseer 1995, 15–32.
 2.
Liu H, Brown D: Criminal incident prediction using a point-pattern-based density model. International journal of forecasting 2003,19(4):603–622. 10.1016/S0169-2070(03)00094-3
 3.
Xue Y, Brown D: Spatial analysis with preference specification of latent decision makers for criminal event prediction. Decision support systems 2006,41(3):560–573. 10.1016/j.dss.2004.06.007
 4.
Smith MA, Brown DE: Application of Discrete Choice Analysis to Attack Point Patterns. Information Systems and e-Business Management 2007,5(3):255–274. 10.1007/s10257-007-0045-1
 5.
Brown D, Dalton J, Hoyle H: Spatial Forecast Methods for Terrorist Events in Urban Environments. In Intelligence and security informatics: Second Symposium on Intelligence and Security Informatics, ISI 2004, June 10–11, 2004; Tucson, AZ, USA. Springer-Verlag New York Inc; 2004:426.
 6.
Rodrigues A, Diggle P, Assuncao R: Semiparametric approach to point source modelling in epidemiology and criminology. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2010,59(3):533–542. 10.1111/j.1467-9876.2009.00708.x
 7.
Ivaha C, Al-Madfai H, Higgs G, Ware A: The Dynamic Spatial Disaggregation Approach: A Spatio-Temporal Modelling of Crime. Proceedings of the World Congress on Engineering, Volume 2, Citeseer 2007.
 8.
Paciorek C, Yanosky J, Puett R, Laden F, Suh H: Practical large-scale spatiotemporal modeling of particulate matter concentrations. Ann Appl Stat 2009, 3: 369–396.
 9.
Wang X, Brown D: The spatiotemporal generalized additive model for criminal incidents. Proceedings of the IEEE International Conference on Intelligence and Security Informatics: 9–12 July 2011; Beijing, China, IEEE 2011.
 10.
Hastie T, Tibshirani R, Friedman J: The elements of statistical learning: data mining, inference, and prediction. Springer Verlag; 2009.
 11.
Beck N, Katz J, Tucker R: Taking time seriously: Time-series-cross-section analysis with a binary dependent variable. American Journal of Political Science 1998,42(4):1260–1288. 10.2307/2991857
 12.
Hastie T, Tibshirani R: Generalized additive models. Chapman & Hall/CRC; 1990.
 13.
Wood S: Generalized additive models: an introduction with R, Volume 66. CRC Press; 2006.
 14.
Wood S: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2011.
 15.
King G, Zeng L: Logistic regression in rare events data. Political analysis 2001,9(2):137. 10.1093/oxfordjournals.pan.a004868
 16.
Fawcett T: An introduction to ROC analysis. Pattern recognition letters 2006,27(8):861–874. 10.1016/j.patrec.2005.10.010
 17.
PostGIS [http://postgis.refractions.net/]
 18.
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2011. [ISBN 3-900051-07-0] [http://www.R-project.org/]
 19.
Quantum GIS Development Team: Quantum GIS Geographic Information System. Open Source Geospatial Foundation; 2009. [http://qgis.osgeo.org]
 20.
Wang X, Brown D, Conklin J: Crime Incident Association with Consideration of Narrative Information. Proceedings of the IEEE Systems and Information Engineering Design Symposium: April 2007; Charlottesville, VA, IEEE 2007, 1–4.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
XW developed the models in this paper, tested them, and drafted the manuscript. DB developed the models and revised the manuscript. Both authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Keywords
 spatiotemporal modeling
 generalized additive model
 criminal forecasting