The proposed Estimation & Score Algorithm can be broken into three basic stages: initialization, parameter estimation, and weight updating. The method is succinctly described in Figure 2.

### Initialization

For this paper, there were two ways of initializing the Estimation & Score Algorithm. The first is used to infer rivalry affiliation given field data. After importing the data, the unknown events are identified and placed into each of the *K* processes. The weights *S*_{i,k} must also be initialized: if event *i* is known, then *S*_{i,k}=1; if it is unknown, then *S*_{i,k}=1/*K*.

An alternate initialization utilizes simulated data in order to test the components of the Estimation & Score Algorithm. In this case, data are generated from *K* independent Hawkes processes with given *μ*_{k}, *α*_{k}, and *ω*_{k}. From these data, *N* events are chosen at random from the network and marked as unknown, and each of these *N* unknown events is placed into each of the other processes. The weights are initialized such that *S*_{i,k}=1 for known events and *S*_{i,k}=1/*K* for unknown events. This initialization process is used in this paper to test the method and produce the results in Section “Results”.
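The simulated-data initialization above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `simulate_hawkes`, the parameter values, and `N = 5` are all assumptions for the example. It simulates each Hawkes process through its branching (cluster) structure, which matches the exponential response used later in the paper.

```python
import numpy as np

def simulate_hawkes(mu, alpha, omega, T, rng):
    # Background events: homogeneous Poisson process with rate mu on [0, T].
    times = list(rng.uniform(0.0, T, rng.poisson(mu * T)))
    queue = list(times)
    # Branching structure: each event spawns Poisson(alpha) offspring with
    # Exp(omega) delays, consistent with g(t) = alpha * omega * exp(-omega t).
    while queue:
        parent = queue.pop()
        for delay in rng.exponential(1.0 / omega, rng.poisson(alpha)):
            child = parent + delay
            if child < T:
                times.append(child)
                queue.append(child)
    return np.sort(np.array(times))

rng = np.random.default_rng(0)
params = [(0.5, 0.4, 1.0), (0.8, 0.3, 2.0)]   # (mu_k, alpha_k, omega_k), K = 2
procs = [simulate_hawkes(m, a, w, 100.0, rng) for m, a, w in params]

# Choose N events at random from the network to mark as unknown, giving each
# one weight S_{i,k} = 1/K in every process; known events keep S_{i,k} = 1.
K, N = len(procs), 5
all_events = [t for ts in procs for t in ts]
unknown_times = rng.choice(all_events, size=N, replace=False)
S_unknown = {t: np.full(K, 1.0 / K) for t in unknown_times}
```

The branching simulation is exact for this model because a Hawkes process is equivalent to a Poisson cluster process: background events arrive at rate *μ*_{k}, and each event independently triggers offspring.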

### Parameter Estimation

When no unknown events are present, there are both parametric [12] and nonparametric [25]-[28] ways to model the underlying stochastic process on each edge of the social network. For this work, we chose a parametric form for the triggering density to validate the model, but the results could easily be extended to the nonparametric case. We note that, as is usual with nonparametric estimates, speed would be compromised for the sake of flexibility.

For this paper, the data are assumed to be a realization of Equation 1, where the parameters are estimated using a method similar to the Expectation Maximization (EM) algorithm [29]. An EM-like approach is taken because of the branching structure present in a Hawkes process: each event is either a background event or a response to an earlier event, but given a realization it is not immediately obvious which. We can view this association as a hidden variable that must be estimated. In this way, every event in each of the *K* processes is assigned a probability *P*_{i,j}^{k}. The probability that event *i* is a background event is denoted *P*_{i,i}^{k}, and the probability that event *i* caused event *j* is denoted *P*_{i,j}^{k}, where *t*_{i}<*t*_{j}. From this EM estimation, the approximation for each of the variables is altered to include the weights for the unknown events; in the case where all the events are known, the estimation formulas are unchanged. This section derives the EM estimates in the presence of incomplete data.

The classical log-likelihood function {\widehat{\ell}}_{k}(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k) for a general point process on a fixed window [0,*T*] is

{\widehat{\ell}}_{k}(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k) = \sum_{i=1}^{M_k} \log \lambda_k(t_i \mid H_{\tau,k}) - \int_0^T \lambda_k(t \mid H_{\tau,k}) \, dt.

(2)
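For the exponential response used in this paper, the integral in Equation 2 has a closed form, so the log-likelihood can be evaluated directly. The sketch below is illustrative (the function name `log_likelihood` is ours, not the paper's) and assumes the intensity λ(s) = μ + Σ_{t_i < s} αω e^{−ω(s − t_i)}.

```python
import numpy as np

def log_likelihood(t, T, mu, alpha, omega):
    """Evaluate Eq. 2 for lambda(s) = mu + sum_{t_i < s} alpha*omega*e^{-omega (s - t_i)}."""
    ll = 0.0
    for j in range(len(t)):
        # Intensity at event t_j, summing responses from all earlier events.
        lam_j = mu + (alpha * omega * np.exp(-omega * (t[j] - t[:j]))).sum()
        ll += np.log(lam_j)
    # Compensator: the integral of lambda over [0, T] in closed form,
    # mu*T + sum_i alpha * (1 - e^{-omega (T - t_i)}).
    ll -= mu * T + (alpha * (1.0 - np.exp(-omega * (T - t)))).sum()
    return ll
```

The closed-form compensator is what makes the exponential kernel convenient; a nonparametric triggering density would require numerical integration here.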

Incorporating the branching structure into the log-likelihood function, the event association is added as a random variable, *χ*_{i,j} such that

\chi_{i,j} = \begin{cases} 1 & \text{if event } i \text{ caused event } j \text{ and } i \neq j \\ 1 & \text{if event } i \text{ is a background event and } i = j \\ 0 & \text{otherwise.} \end{cases}

(3)

This branching allows us to separate the events associated with the background rate *μ*_{k} from those associated with the response g(t) = \alpha_k \omega_k e^{-\omega_k t}. This leads to the altered log-likelihood function

\ell_k(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k) = \sum_{i=1}^{M_k} \chi_{i,i} \log(\mu_k) - \int_0^T \mu_k \, dt + \sum_{i=1}^{M_k} \left\{ \sum_{j=i+1}^{M_k} \chi_{i,j} \log\left(\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}\right) - \int_0^{T - t_i} \alpha_k \omega_k e^{-\omega_k s} \, ds \right\}.

(4)

Taking the expectation of \ell_k(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k) with respect to \chi_{i,j} results in

E_{\chi}\left[\ell_k(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k)\right] = \sum_{i=1}^{M_k} P_{i,i}^{k} \log(\mu_k) - \int_0^T \mu_k \, dt + \sum_{i=1}^{M_k} \left\{ \sum_{j=i+1}^{M_k} P_{i,j}^{k} \log\left(\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}\right) - \int_0^{T - t_i} \alpha_k \omega_k e^{-\omega_k s} \, ds \right\}.

(5)

In the EM algorithm, the quantity E_{\chi}[\ell_k(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k)] is maximized with respect to each of the parameters \mu_k, \alpha_k, \omega_k given the data H_{\tau,k}. This leads to the EM estimates

\mu_k = \frac{\sum_{i=1}^{M_k} P_{i,i}^{k}}{T}, \qquad \alpha_k = \frac{\sum_{i<j}^{M_k} P_{i,j}^{k}}{M_k - \sum_{i=1}^{M_k} e^{-\omega_k (T - t_i)}}

(6)

\omega_k = \frac{\sum_{i<j}^{M_k} P_{i,j}^{k}}{\sum_{i<j} (t_j - t_i) P_{i,j}^{k} + \alpha_k \sum_{i=1}^{M_k} (T - t_i) e^{-\omega_k (T - t_i)}}.

(7)

Here {P}_{i,j}^{k} is defined by

P_{i,j}^{k} = \frac{\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}}{\lambda_k(t_j \mid H_{\tau,k})}, \qquad P_{i,i}^{k} = \frac{\mu_k}{\lambda_k(t_i \mid H_{\tau,k})},

(8)

for *t*_{i}<*t*_{j}. The EM algorithm then becomes a matter of iterating between estimating the probabilities and the parameters. It has been proven that this algorithm will converge under mild assumptions [29]. Further, Equation 6 adjusts for boundary effects.
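The iteration between Equations 6-8 can be sketched for a single process as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `em_hawkes`, the starting values, and the fixed iteration count are all choices made for the example.

```python
import numpy as np

def em_hawkes(t, T, iters=100):
    """EM iteration of Eqs. 6-8 for one process with g(s) = alpha*omega*e^{-omega s}."""
    M = len(t)
    dt = t[None, :] - t[:, None]               # dt[i, j] = t_j - t_i
    future = dt > 0                            # pairs with t_i < t_j
    mu, alpha, omega = 0.5 * M / T, 0.5, 1.0   # crude starting values
    for _ in range(iters):
        # E-step (Eq. 8): P[i, j] = g(t_j - t_i) / lambda(t_j),  P_ii = mu / lambda(t_i).
        G = np.where(future, alpha * omega * np.exp(-omega * np.where(future, dt, 0.0)), 0.0)
        lam = mu + G.sum(axis=0)               # intensity at each event time
        P = G / lam[None, :]
        P_ii = mu / lam
        # M-step (Eqs. 6-7), including the boundary-correction terms.
        tail = np.exp(-omega * (T - t))
        mu = P_ii.sum() / T
        alpha = P.sum() / (M - tail.sum())
        omega = P.sum() / ((dt * P).sum() + alpha * ((T - t) * tail).sum())
    return mu, alpha, omega
```

In practice one would iterate until the parameters stabilize rather than for a fixed count; the fixed loop keeps the sketch short.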

In the presence of events with unknown process affiliation in the network, we assign weights to the contribution of each event to the log-likelihood function. Specifically, each of the unknown events in process *k* has a weight *S*_{i,k} such that \sum_k S_{i,k} = 1. For the known events, *S*_{i,k}=1. These weights are incorporated for each process via

L_k(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k) = \sum_{i=1}^{M_k} P_{i,i}^{k} S_{i,k} \log(\mu_k) - \int_0^T \mu_k \, dt + \sum_{i=1}^{M_k - 1} \sum_{j=i+1}^{M_k} S_{i,k} S_{j,k} P_{i,j}^{k} \log\left(\alpha_k \omega_k e^{-\omega_k (t_j - t_i)}\right) - \sum_{i=1}^{M_k} S_{i,k} \int_0^{T - t_i} \alpha_k \omega_k e^{-\omega_k s} \, ds.

(9)

Note that L_k(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k) is no longer an EM log-likelihood in the presence of unknown data. Maximizing L_k(H_{\tau,k} \mid \mu_k, \alpha_k, \omega_k) with respect to each of the parameters, the estimates become

\mu_k = \frac{\sum_{i=1}^{M_k} P_{i,i}^{k} S_{i,k}}{T}, \qquad \alpha_k = \frac{\sum_{i<j}^{M_k} P_{i,j}^{k} S_{i,k} S_{j,k}}{\sum_{i=1}^{M_k} S_{i,k} - \sum_{i=1}^{M_k} S_{i,k} e^{-\omega_k (T - t_i)}}

(10)

\omega_k = \frac{\sum_{i<j}^{M_k} P_{i,j}^{k} S_{i,k} S_{j,k}}{\sum_{i<j} (t_j - t_i) P_{i,j}^{k} S_{i,k} S_{j,k} + \alpha_k \sum_{i=1}^{M_k} S_{i,k} (T - t_i) e^{-\omega_k (T - t_i)}}.

(11)

When all of the events are known, i.e., *S*_{i,k}=1 if event *i* belongs to process *k* and *S*_{i,k}=0 otherwise, these estimates are identical to the EM parameter estimates.
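That reduction is easy to check numerically. The sketch below implements one weighted M-step from Equations 10-11 (the name `weighted_m_step` and the inputs are ours, for illustration) and, with all weights set to one, reproduces the unweighted updates of Equations 6-7.

```python
import numpy as np

def weighted_m_step(t, T, P, P_ii, S, omega):
    """One M-step from Eqs. 10-11; setting S_i = 1 for all i recovers Eqs. 6-7.
    P must hold the triggering probabilities P_{i,j} with zeros for j <= i."""
    W = np.outer(S, S) * P                 # S_{i,k} S_{j,k} P_{i,j}^k
    dt = t[None, :] - t[:, None]           # dt[i, j] = t_j - t_i
    tail = np.exp(-omega * (T - t))        # boundary-correction factors
    mu_new = (P_ii * S).sum() / T
    alpha_new = W.sum() / (S.sum() - (S * tail).sum())
    omega_new = W.sum() / ((dt * W).sum() + alpha_new * (S * (T - t) * tail).sum())
    return mu_new, alpha_new, omega_new
```

With S ≡ 1 the denominators reduce to M_k − Σ e^{−ω(T−t_i)} and the unweighted sums, exactly as in Equations 6 and 7.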

### Updating weights

At the start of the Estimation & Score algorithm, all of the weights for the unknown events are *S*_{i,k}=1/*K*. Once the parameters are estimated using the altered EM algorithm described in Equations 10 and 11, the weights *S*_{i,k} are updated; see Figure 2. Here we present four different score functions and the Stomakhin-Short-Bertozzi method [24], used to define the intermediate process affiliations *q*_{i,k}. Each of these score functions synthesizes information from a different portion of the data set. Given an event early in the data set, a score function that uses future events would be ideal; for later events, a score function using previous events is desired. Similar considerations apply if some portions of the data are more incomplete than others. After all of these intermediate weights *q*_{i,k} have been calculated, they are re-normalized as a probability via {S}_{i,k}=\frac{{q}_{i,k}}{\sum _{k}{q}_{i,k}}. For simplicity we consider a response function of the form {g}_{k}(t)={\alpha}_{k}{\omega}_{k}{e}^{-{\omega}_{k}t}.

#### Ratio Score Function

The *Ratio* score function considers the ratio of the summed response over all future events, \sum_{i<j} g_k(t_j - t_i), to the background rate *μ*_{k}. Mathematically, the score is determined by

q_{i,k}^{\text{Ratio}} = \frac{\sum_{i<j} g_k(t_j - t_i)}{\mu_k}.

(12)

#### Lambda Score Function

The *Lambda* score function uses only previous information by taking the ratio of the intensities evaluated at the unknown event time *t*_{i}:

q_{i,k}^{\text{Lambda}} = \frac{\lambda_k(t_i \mid H_{\tau,k})}{\sum_{m=1}^{K} \lambda_m(t_i \mid H_{\tau,m})}.

(13)

#### Stomakhin-Short-Bertozzi (SSB) method

The method defined in [24] is summarized by

\max \left\{ \sum_k \sum_{i,j} \delta_{i,j} \mu_k q_{i,k}^{\text{SSB}} + \frac{1}{2} (1 - \delta_{i,j}) \alpha_k \omega_k e^{-\omega_k |t_i^k - t_j^k|} q_{i,k}^{\text{SSB}} q_{j,k}^{\text{SSB}} \right\},

(14)

subject to

\sum_{k=1}^{K} \left( q_{i,k}^{\text{SSB}} \right)^2 = 1.

(15)

This method is motivated by the Hawkes process defined in Equation 1.

#### Probability Score Function

The *Probability* score function uses the approximation of the branching structure of the underlying process. The idea behind this method is that a background event with no corresponding response events should not belong to the process, whereas a background event with many response events, or an event that is itself a response to another event, should be part of that process.

q_{i,k}^{\text{Prob}} = \frac{\sum_{t_j > t_i} P_{i,j}^{k}}{P_{i,i}^{k}}

(16)

P_{i,i}^{k} = \frac{\mu_k}{\lambda_k(t_i \mid H_{\tau,k})}, \qquad P_{i,j}^{k} = \frac{g_k(t_j - t_i)}{\lambda_k(t_j \mid H_{\tau,k})}

(17)

#### Forward Backward Score Function

This method takes the ratio of the summed response over events in both the future and the past, \sum_{i \neq j} g_k(|t_i - t_j|), to the background rate *μ*_{k}:

q_{i,k}^{\text{FB}} = \frac{\sum_{i \neq j} g_k(|t_i - t_j|)}{\mu_k}

(18)
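The score functions above (excluding the SSB method, which requires a separate constrained optimization) can be sketched for a single event as follows. This is an illustrative implementation under stated assumptions, not the paper's code: the function names `event_scores` and `normalize` are ours, and the *Lambda* score is returned as its numerator, to be normalized across processes afterward.

```python
import numpy as np

def g(s, alpha, omega):
    # Exponential response g_k(s) = alpha_k * omega_k * exp(-omega_k * s).
    return alpha * omega * np.exp(-omega * s)

def intensity(s, t, mu, alpha, omega):
    # lambda_k(s | history): background rate plus responses to earlier events.
    return mu + g(s - t[t < s], alpha, omega).sum()

def event_scores(i, t, mu, alpha, omega):
    """Ratio (Eq. 12), Lambda numerator (Eq. 13), Probability (Eq. 16),
    and Forward-Backward (Eq. 18) scores for event t[i] in one process."""
    future, past = t[t > t[i]], t[t < t[i]]
    q_ratio = g(future - t[i], alpha, omega).sum() / mu
    q_lambda = intensity(t[i], t, mu, alpha, omega)   # normalized across k later
    # Branching probabilities from Eq. 17.
    P_ii = mu / intensity(t[i], t, mu, alpha, omega)
    P_ij = np.array([g(tj - t[i], alpha, omega) / intensity(tj, t, mu, alpha, omega)
                     for tj in future])
    q_prob = P_ij.sum() / P_ii
    q_fb = g(np.abs(np.concatenate([past, future]) - t[i]), alpha, omega).sum() / mu
    return q_ratio, q_lambda, q_prob, q_fb

def normalize(q):
    # Re-normalize intermediate scores: S_{i,k} = q_{i,k} / sum_k q_{i,k}.
    q = np.asarray(q, dtype=float)
    return q / q.sum()
```

Note that the Forward-Backward score always dominates the Ratio score for the same event, since it adds the (nonnegative) responses from past events to the future ones.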