Automatic detection of cyber-recruitment by violent extremists
© Scanlon and Gerber; licensee Springer 2014
Received: 16 January 2014
Accepted: 29 May 2014
Published: 13 August 2014
Growing use of the Internet as a major means of communication has led to the formation of cyber-communities, which have become increasingly appealing to terrorist groups due to the unregulated nature of Internet communication. Online communities enable violent extremists to increase recruitment by allowing them to build personal relationships with a worldwide audience capable of accessing uncensored content. This article presents methods for identifying the recruitment activities of violent groups within extremist social media websites. Specifically, these methods apply known techniques within supervised learning and natural language processing to the untested task of automatically identifying forum posts intended to recruit new violent extremist members. We used data from the western jihadist website Ansar AlJihad Network, which was compiled by the University of Arizona’s Dark Web Project. Multiple judges manually annotated a sample of these data, marking 192 randomly sampled posts as recruiting (Yes) or non-recruiting (No). We observed significant agreement between the judges’ labels; Cohen’s κ=(0.5,0.9) at p=0.01. We tested the feasibility of using naive Bayes models, logistic regression, classification trees, boosting, and support vector machines (SVM) to classify the forum posts. Evaluation with receiver operating characteristic (ROC) curves shows that our SVM classifier achieves an 89% area under the curve (AUC), a significant improvement over the 63% AUC performance achieved by our simplest naive Bayes model (Tukey’s test at p=0.05). To our knowledge, this is the first result reported on this task, and our analysis indicates that automatic detection of online terrorist recruitment is a feasible task. We also identify a number of important areas of future work including classifying non-English posts and measuring how recruitment posts and current events change membership numbers over time.
Keywords: Cyber; Recruitment; Extremist; Terrorism; Darkweb; Machine learning; Natural language processing
In the last decade, the modern landscape of extremism has expanded to encompass the Internet and online social media. In particular, extremist organizations have increasingly used these technologies to recruit new members. Recent research by Torok shows that cyber tools are most influential at the onset of a future member’s extremist activity: the recruitment and radicalization phase. Terrorist groups use the free and open nature of the Internet to form online communities and disseminate literature and training materials without having to rely on traditional media outlets, which might censor or change their message. Terrorist organizations engage in directed communication and advertisement, recruiting members on social websites like Second Life, Facebook, and radicalized religious web forums. The intelligence community would benefit from knowledge of how terrorist organizations conduct online recruitment and whom they may be targeting.
The investigation report on FBI counterterrorism intelligence failures leading up to the Ft. Hood shooting on November 5, 2009 cited a “data explosion” and “workload” as contributing factors to analyst and agent oversights. At “nearly 20,000 Aulaqi-related [electronic documents],” keeping up with workload demands was clearly a challenge for the two reviewers assigned to the case at the time. Considering this large volume of possibly relevant text data requiring review by a limited number of FBI agents, automated classification methods would be useful for pre-screening text documents, reducing the workload of human analysts.
Within this article, a violent extremist (VE) group is an organization that uses violent means, like terrorism, to disrupt a legitimate authority. Insurgents and terrorists are common types of violent extremist groups that act with the specific goal of influencing public opinion or inciting political change. A radical religious group organizing inflammatory yet peaceful protests or a politically motivated person engaging in civil disobedience is not considered a violent extremist under these definitions. Many modern groups, like the Westboro Baptist Church, have radical religious views, but these beliefs alone are not sufficient to classify them as violent extremists without the intent to carry out or advocate for specific acts of violence. Within this article, VE recruitment is any attempt by a group or individual involved in VE to recruit, radicalize, or persuade another person to aid a violent movement. Cyber-recruitment is therefore recruitment activity that makes use of computers and the Internet.
This article presents data and analytic methods for automatically identifying recruitment activities of violent extremist organizations within online social media. Specifically, these methods identify messages recruiting individuals for participation in violent extremism. For these classification purposes, a VE cyber-recruitment message is any message that attempts to persuade the reader to join a violent extremist organization. These recruitment messages must assist readers in finding violent movements to join, or describe ways to become more active or provide material aid. By developing and evaluating an automatic system for identifying such messages, we demonstrate an important and feasible method for identifying the intention/incitement of violent activity within online communities.
The rest of this article is organized as follows. In the section “Related work”, we compare offline violent extremist recruitment with the recent increase in cyber-recruitment efforts. Additionally, we discuss previous counterinsurgency efforts and contemporary research that outlines the challenges associated with analyzing VE activities, like recruitment; we emphasize the specific gaps that our article addresses. The “Data collection and annotation” section describes our data requirements and the specific data sources we used, followed by the pre-processing and annotation steps required for supervised learning of VE recruitment. Following the annotation steps, we present our agreement analysis results of the VE recruitment annotations. In the section “Analytic approach”, we propose a probabilistic model employing natural language features for automatically classifying VE recruitment in forum posts. We also describe the classification functions used in our supervised learning experiments, such as naive Bayes, logistic regression, and support vector machines. A description of the results obtained from the experiments along with our interpretation is provided in the “Results and discussion” section. Finally, the “Conclusions and future work” section discusses future directions and potential for our proposed techniques alongside privacy concerns related to automated monitoring and analysis of ubiquitous communication.
Related work
Offline recruitment and manual social network analysis
The modern jihadist insurgencies in Iraq and Afghanistan operate among the local civilian population and engage in both legal and illegal activities in order to achieve their strategic and political goals. However, the illegal acts are only effective when carried out by an organized and well-manned group. Recruiting new members is thus a critical activity for both daily operations and the underlying political cause. An average terrorist group has a life expectancy of less than a year, so groups wishing to extend their lifespan must replace members lost through arrests, deaths, and defections. Several studies have tried to understand why some people join violent rebellions, while others only sympathize or cooperate in a non-violent capacity. This article facilitates such understanding by providing methods that identify examples of active recruitment activity within a population of individuals who may passively sympathize with violent groups.
Ralph McGehee observed VE recruitment first hand during his 1967 work to identify communist insurgents in the rural villages along the northern border of Thailand. His efforts enabled the joint CIA-Thai counterinsurgency (COIN) to provide targeted aid to at-risk villages and persons, and in doing so simultaneously thwart communist recruitment efforts and improve regional support for the Thai government. The success of McGehee’s program can be attributed to his intelligence teams collecting information on nearly every person in the villages, not just the communist sympathizers he was specifically targeting. This provided a more complete picture of the community and allowed this early social network analysis (SNA) effort to better infer the community’s support for the communists and successfully identify active members of the insurgency. Although our research problem specifically targets online communities, strong parallels exist between these virtual worlds and the physical communities addressed by McGehee because both contain violent extremist groups that operate within, hide among, and recruit from a passive majority population.
Cyber-recruitment, social network analysis, and data mining
The primary danger of cyber-recruitment is its ability to quickly expose large online communities to a substantial amount of engaging, multimedia content. COIN experts are increasingly concerned with the potential of these cyber-communities for illegal purposes. Most literature has focused on how violent extremist groups use legitimate social networking websites along with online discussion forums for recruitment and other activities. This prior research largely provides evidence and case studies of real online VE activity and suggests ways that virtual worlds may be used by these groups in the future. Recent research has evaluated the use of political tools for shutting down websites or shaming material supporters. Some researchers have suggested the use of web-crawling and analysis techniques to monitor for VE activities including recruitment; however, we are not aware of any implementations of such techniques on recruitment specifically. This article presents new research that fills this gap, addressing the need to detect cyber-recruitment in online social media forums.
Computer-based social network analysis is a large field of research, one objective of which is to identify the organizational structure of VE networks. With objectives similar to McGehee’s manual SNA work, present research hopes to detect the presence of VE groups and their influence within large-scale networks based on the number of interconnections among VEs and influential community members. There have also been preliminary attempts to profile individual users using text mining techniques. However, this prior research has typically focused on violent extremist activity in general without focusing on a particular activity like recruitment. Although much COIN literature has covered cyber-recruitment, and data/text mining techniques have been used in an early capacity to collect/analyze Internet data, no published research has applied such techniques to specifically examine the cyber-recruitment activities of extremist groups in online environments. The present research complements the research surveyed above by building on recent data collection efforts, focusing on online recruitment specifically, and applying current techniques from natural language processing to automatically identify recruitment activities.
Data collection and annotation
The need for cyber-COIN tools has increased interest in methods that analyze so-called “dark web” content. Dark web content is defined as information from typically private social websites where extremists interact. Many early efforts focused on locating, accessing, extracting, and storing data from dark web forums. The present research builds on these key efforts. In the section “Data requirements and sources”, we describe requirements that must be met by data sources supporting our objectives along with specific data sources used in our study. In the “Data pre-processing and annotation” section, we describe our manual annotation effort, which analyzed individual posts for recruitment content.
Data requirements and sources
This article leverages prior data collection efforts by using pre-compiled forum post data to model violent extremist recruitment within online social media. The following data requirements are needed to support our research objectives.
Violent extremist activity - The collected data should come from sources that are popular among violent extremist groups and their sympathizers and contain overt recruitment for such groups.
Contemporary time-frame - The collected data should cover a contemporary time-frame (e.g., the last decade) in order to be considered relevant to contemporary anti-extremist efforts.
Language - The collected data must use the English language or be translatable to English using an automatic process like Google’s machine-translation service.
Forums used in our study, extracted from the Ansar AlJihad Network via the Dark Web Portal (date ranges covered: 11/2008 - 5/2012 and 12/2008 - 1/2010)
The Ansar AlJihad Network is a set of invitation-only jihadist forums in Arabic and English that are known to be popular with western jihadists. The Dark Web Project compiled 299,040 total messages posted on Ansar AlJihad between 2008 and 2012. Fewer posts were compiled from the English forums, called Ansar1, than from the Arabic portion of the site; however, the English subset was sufficiently large for our study and contained contemporary, original-English discussions between jihadists and jihadist sympathizers. We used this subset in all of our experiments. The structured data annotations discussed below are the only data elements not originating from this pre-compiled Ansar AlJihad source.
Data pre-processing and annotation
We read in a sample of raw Ansar1 forum posts and compiled the message text and respective message IDs into an initial corpus. We then automatically removed duplicates (same message ID) and empty documents (no message text) from the corpus.
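The corpus-building step above can be sketched in a few lines. This is only an illustration in Python, not the code used in the study; the function name `build_corpus`, the `(message_id, text)` tuple layout, and the sample posts are assumptions made for the example.

```python
def build_corpus(raw_posts):
    """Keep the first non-empty post seen for each message ID."""
    corpus = {}
    for message_id, text in raw_posts:
        if text is None or not text.strip():
            continue  # empty document: no message text
        if message_id in corpus:
            continue  # duplicate: this message ID was already kept
        corpus[message_id] = text.strip()
    return corpus


# Hypothetical raw posts as (message_id, text) pairs.
raw_posts = [
    (101, "First post text"),
    (102, ""),                 # empty, dropped
    (101, "First post text"),  # duplicate ID, dropped
    (103, "Another post"),
]
corpus = build_corpus(raw_posts)
```

Keying the corpus on the message ID makes the duplicate check a constant-time lookup, which matters little for a few hundred posts but scales to the full forum dump.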
Most posts contain exclusively English text as Ansar1 is the English-language forum for the Ansar AlJihad Network. However, occasional posts include non-English words or phrases; these are commonly Arabic passages from the Koran. In these cases the non-English passages were converted to English using Google Translate. We left slang words written in Latin characters intact under the assumption that they were meant to be readable by an English language speaker. For example, “Kuffar” is a derogatory Arabic term for unbeliever.
You have been provided with 192 forum posts sampled from a Jihadist forum.
Read each post carefully and determine whether that post has the intent to recruit violent extremists to some group or movement. For the purpose of annotation, violent extremist recruitment is defined as any attempt by a group or individual to recruit, radicalize, or persuade another person into aiding a violent movement aimed at disrupting a legitimate authority.
Annotate each post by marking it as either (a) contains violent extremist recruitment, or (b) does not contain violent extremist recruitment.
Example text of Ansar1 forum posts and the respective annotations
A Golden chance to join Jihad in Somalia. Abo Dojana invited those who want to participate in jihad to join the militants in Somalia to form what he called a base of martyrdom-seekers who would from there spread to the entire world. Somalia could actually be an ideal base for physical and weapons training...
Representing the militant Islamic group Shebab, Abu Mansour makes a pitch for new overseas recruits after praising one militant fighter killed in an apparent ambush. ‘So, if you can encourage more of your children and more of your neighbors and anyone around to send people like him to this jihad (holy war), it would be a great asset for us,’ he says.
I have now added him as a friend on Facebook. But something tells me that he isn’t going to answer to my request. LOL, you had me rolling on the floor man!!!!! So this attack was done my ‘Jaish al Mujihadeen’ How it that possible?, did they have problems with bounced-checks from the US?
A court in the German city of Koblenz sentenced a German of Pakistani origin to eight years in prison Monday on a conviction of assisting the international Al-Qaeda terror network. The man gave the group financial aid and tried to recruit new members in German territory, according to the indictment
Did Mansoor join the emerat? I heard he is still fighting for Ichkira Republic
Agreement matrix of proportions for recruitment categories
Observed agreement: p_o = 0.82 + 0.11 = 0.93
Chance agreement: p_c = 0.74 + 0.02 = 0.76
Table 3 shows the agreement results for our manually annotated Ansar1 data. As shown, the two judges found that approximately 11% of the posts contained VE recruitment. The two judges agreed on the labels for 93% of the posts, with an expected chance agreement of 76%, producing a κ of 0.70. Significant non-random agreement was observed with a confidence interval of (0.5,0.9) at p=0.01; however, interpreting strength of agreement is a common problem with agreement metrics. Some studies have attempted to provide a scale and would describe κ=0.70 as “substantial” strength of agreement, but despite considerable debate among statisticians this issue has never been definitively addressed. Considering both the significance and magnitude of agreement, these results adequately justify using the annotated Ansar1 messages in our analytic approach. To increase the final size of our experimental dataset, one of the judges annotated an additional 100 posts from the Ansar1 collection following the same protocol described above. In total, we observed that 13% of forum posts contained recruitment according to our definition.
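The κ value follows from the observed and chance agreement by Cohen’s formula, κ = (p_o − p_c) / (1 − p_c). The short Python sketch below reproduces that arithmetic from the rounded proportions reported here; the function name is an assumption for illustration.

```python
def cohen_kappa(p_o, p_c):
    """Cohen's kappa from observed agreement p_o and chance agreement p_c."""
    return (p_o - p_c) / (1.0 - p_c)


p_o = 0.82 + 0.11   # proportion of posts on which both judges agreed
p_c = 0.74 + 0.02   # agreement expected by chance from the marginals
kappa = cohen_kappa(p_o, p_c)
```

With the rounded inputs the result is roughly 0.71, which agrees with the reported κ up to rounding of the table proportions.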
Analytic approach
In Equation 1, Recruitment is a binary classification label, d_i ∈ D is a forum post, and w_j is a feature function of d_i. In the following section, we discuss the features used and then we present different formulations of the classification function F.
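Read together, these definitions suggest a model of the following general form. This is a reconstruction from the symbols defined in the text (the Recruitment label, the post d_i, the feature functions w_j, their number n, and the classification function F), so the exact layout should be treated as an assumption rather than a verbatim copy of Equation 1:

```latex
\Pr(\text{Recruitment} \mid d_i) = F\bigl(w_1(d_i),\, w_2(d_i),\, \ldots,\, w_n(d_i)\bigr)
```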
Text classification features
We employed a bag-of-words, or unigram only, feature space by parsing each forum post in the corpus into a term-by-document matrix. This matrix of term frequency (tf) features was created using the RTextTools and tm text mining packages in R, which also performed basic normalization and feature reduction through the removal of URL web addresses, numbers, punctuation, stopwords, and whitespace. The number of features was further reduced through stemming using the Porter Stemming Algorithm. Under this representation, w_j(d_i) is equal to the raw frequency of a stemmed word form, with n (the number of feature functions) equal to the number of distinct words remaining after document processing.
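As a rough illustration of this representation, the Python sketch below builds a small matrix of raw term frequencies with one row per post and one column per term. It is not the RTextTools/tm pipeline: the stopword list is a tiny placeholder, the function names are assumptions, and Porter stemming is omitted to keep the example dependency-free.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase, strip URLs and non-letter characters, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URL web addresses
    text = re.sub(r"[^a-z\s]", " ", text)  # numbers, punctuation, other symbols
    return [tok for tok in text.split() if tok not in STOPWORDS]


def term_frequencies(posts):
    """Return (vocabulary, rows); rows[i][j] counts term j in post i."""
    counts = [Counter(tokenize(p)) for p in posts]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[t] for t in vocabulary] for c in counts]
    return vocabulary, rows


posts = [
    "Visit http://example.com/forum for 3 new posts about the forum!",
    "New posts, new members.",
]
vocabulary, tf = term_frequencies(posts)
```

Each row of `tf` plays the role of the feature values w_1(d_i), ..., w_n(d_i) for one post, with n equal to the vocabulary size.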
|D|, corpus cardinality, is the total number of posts in the training corpus, and the denominator represents the number of posts containing at least one occurrence of the jth feature (i.e. word). In order to keep the test data unbiased we used IDF terms computed only from posts in the training portion of the corpus.
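A minimal sketch of this weighting in Python follows, assuming the common logarithmic form idf_j = log(|D| / df_j), where df_j is the number of training posts containing term j. The logarithm, the function names, and the toy matrices are assumptions for illustration; the point shown is that test posts reuse IDF values learned from training posts only.

```python
import math


def fit_idf(train_tf):
    """idf_j = log(|D| / df_j), with df_j counted over training posts only."""
    n_docs = len(train_tf)
    n_terms = len(train_tf[0])
    idf = []
    for j in range(n_terms):
        df = sum(1 for row in train_tf if row[j] > 0)
        # A term absent from every training post gets weight 0 (assumption).
        idf.append(math.log(n_docs / df) if df > 0 else 0.0)
    return idf


def apply_tfidf(tf_rows, idf):
    """Weight raw term frequencies by the training-set IDF values."""
    return [[tf * w for tf, w in zip(row, idf)] for row in tf_rows]


train_tf = [[2, 0, 1], [1, 1, 0], [3, 0, 0], [1, 0, 0]]
test_tf = [[1, 2, 1]]

idf = fit_idf(train_tf)                 # learned from training posts only
test_tfidf = apply_tfidf(test_tf, idf)  # test posts reuse the training IDF
```

A term that occurs in every training post, like the first column here, receives an IDF of zero and so contributes nothing to the weighted features.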
We conducted supervised learning over our annotated posts using a variety of classification functions: naive Bayes, logistic regression, classification trees, boosting, and support vector machines (SVM).
Our implementation of naive Bayes was adapted from the R package e1071 for use with the sparse training data typical in a term-by-document matrix. We fit a naive Bayes model using the default settings of Laplace (add one) smoothing and priors taken from the training data.
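To make the roles of the smoothing and the priors concrete, here is a small multinomial naive Bayes sketch in Python. It is an assumption that a multinomial event model is a fair stand-in; the adapted e1071 model used in the study may differ in its likelihood. The function names and toy data are hypothetical.

```python
import math


def train_naive_bayes(tf_rows, labels):
    """Fit class priors and add-one smoothed term probabilities (log scale)."""
    classes = sorted(set(labels))
    n_terms = len(tf_rows[0])
    model = {}
    for c in classes:
        rows = [r for r, y in zip(tf_rows, labels) if y == c]
        term_totals = [sum(r[j] for r in rows) for j in range(n_terms)]
        denom = sum(term_totals) + n_terms  # add one per vocabulary term
        model[c] = {
            "log_prior": math.log(len(rows) / len(labels)),  # prior from data
            "log_lik": [math.log((t + 1) / denom) for t in term_totals],
        }
    return model


def predict_proba(model, row):
    """Posterior probability of each class for one term-frequency row."""
    scores = {
        c: m["log_prior"] + sum(x * ll for x, ll in zip(row, m["log_lik"]))
        for c, m in model.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


tf_rows = [[2, 0, 0], [1, 1, 0], [0, 0, 3], [0, 1, 2]]
labels = ["Yes", "Yes", "No", "No"]
model = train_naive_bayes(tf_rows, labels)
posterior = predict_proba(model, [1, 0, 0])
```

Add-one smoothing keeps a term that never appears in one class from driving that class’s posterior to zero, which matters in a sparse term-by-document matrix.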
where C>0 is the regularization cost parameter.
We used the LiblineaR package in R to minimize Equation 4 and then to predict the VE recruitment classification of testing data. All GLM results shown in the “Results and discussion” section are for L2-regularized logistic regression models fit with the default settings for this R package and an L2-regularization cost parameter C equal to the ratio of negative to positive class labels.
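For reference, the LIBLINEAR documentation gives the primal objective for L2-regularized logistic regression as 0.5·wᵀw + C·Σ_i log(1 + exp(−y_i·wᵀx_i)), with labels y_i in {−1, +1}. The Python sketch below evaluates that objective and sets C from the class ratio as described above. Treating this as the form of Equation 4 is an assumption, and the function names and toy data are hypothetical.

```python
import math


def cost_from_labels(labels):
    """Cost parameter C set to the ratio of negative to positive labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == -1)
    return negatives / positives


def l2_logistic_objective(w, X, y, C):
    """0.5 * w.w + C * sum_i log(1 + exp(-y_i * w.x_i)), y_i in {-1, +1}."""
    penalty = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        loss += math.log1p(math.exp(-margin))
    return penalty + C * loss


X = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 1.0]]
y = [1, -1, -1, -1]                  # one recruiting post, three non-recruiting
C = cost_from_labels(y)              # 3 negatives / 1 positive
value_at_zero = l2_logistic_objective([0.0, 0.0], X, y, C)
```

Because recruiting posts are the minority class, the ratio is greater than one, so training errors are weighted more heavily relative to the penalty on w than under a unit cost.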
We applied the probability model in Equation 1 to a classification tree by calculating the posterior probability of the recruitment classes at each node of the tree. The R package tree was used to train classifiers grown using recursive partitioning with a deviance criterion to select features at each node. We used the default package parameters to control tree growth, including: minimum within-node deviance = 0.01 × deviance_root, minimum allowable node size = 10, and minimum observations to a candidate child node = 5.
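The deviance criterion and the growth thresholds can be illustrated with a short Python sketch. It uses the multinomial node deviance −2·Σ_k n_k·log(n_k/n) and one reading of the three thresholds quoted above; the function names and toy labels are assumptions, and the R tree package’s exact stopping logic may differ in detail.

```python
import math
from collections import Counter


def node_deviance(labels):
    """Multinomial deviance of a node: -2 * sum_k n_k * log(n_k / n)."""
    n = len(labels)
    return -2.0 * sum(c * math.log(c / n) for c in Counter(labels).values())


def may_split(node_labels, root_deviance, mindev=0.01, minsize=10):
    """A node is eligible for splitting if it is large and impure enough."""
    return (len(node_labels) >= minsize
            and node_deviance(node_labels) >= mindev * root_deviance)


def split_gain(parent, left, right, mincut=5):
    """Deviance reduction of a candidate split, or None if a child is too small."""
    if len(left) < mincut or len(right) < mincut:
        return None
    return node_deviance(parent) - node_deviance(left) - node_deviance(right)


root = ["No"] * 14 + ["Yes"] * 6
left = ["No"] * 13 + ["Yes"] * 1    # e.g. posts where a term is absent
right = ["No"] * 1 + ["Yes"] * 5    # e.g. posts where the term is present
gain = split_gain(root, left, right)
```

At each node, the feature whose split yields the largest deviance reduction would be selected, and growth stops once no eligible split remains.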
Support vector machines
Finally, we trained a recruitment classifier using the support vector machine (SVM) algorithm implemented in the R package e1071. SVMs do not fit into a probability model like Equation 1; however, the R package provides a method for estimating class probabilities if they are required for things like performance comparisons with receiver operating characteristic (ROC) curves. All SVM results shown below were produced using default package parameters, constraint violation cost = 100, and a radial basis function as the kernel.
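For readers unfamiliar with the kernel named above, the radial basis function compares two feature vectors through k(x, z) = exp(−γ·‖x − z‖²). The Python sketch below evaluates it directly; the value of γ shown follows what we understand to be e1071’s default of one over the number of features, which is stated here as an assumption.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x = [1.0, 0.0, 2.0]
z = [0.0, 0.0, 2.0]
gamma = 1.0 / len(x)   # assumption: default of 1 / (number of features)
k_same = rbf_kernel(x, x, gamma)   # identical posts give the maximum value 1
k_diff = rbf_kernel(x, z, gamma)   # similarity decays with squared distance
```

The cost parameter then controls how strongly the SVM penalizes training posts that violate the margin in this kernel-induced space.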
Results and discussion
Confusion matrix used to assess the recruitment model’s classification performance
                       Pr(Rec_j | d_i) ≥ θ    Pr(Rec_j | d_i) < θ    Total
Recruitment = True     TP                     FN                     P = TP + FN
Recruitment = False    FP                     TN                     N = FP + TN
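The confusion matrix depends on the threshold θ, and sweeping θ over the predicted probabilities traces out the ROC curve whose area (AUC) is used to compare classifiers. The following Python sketch shows that calculation on made-up scores; the function names and data are assumptions, and the trapezoidal rule is one standard way to compute the area.

```python
def confusion_counts(scores, labels, theta):
    """Count TP, FN, FP, TN when predicting recruitment if score >= theta."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= theta)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < theta)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= theta)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < theta)
    return tp, fn, fp, tn


def roc_points(scores, labels):
    """(FPR, TPR) pairs for every distinct threshold, strictest first."""
    points = [(0.0, 0.0)]
    for theta in sorted(set(scores), reverse=True):
        tp, fn, fp, tn = confusion_counts(scores, labels, theta)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities and true labels (True = recruitment).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [True, False, True, False, False, False]
points = roc_points(scores, labels)
area = auc(points)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a classifier that scores every recruiting post above every non-recruiting post.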
95% confidence intervals for multiple comparisons of bootstrapped mean AUCs using Tukey’s range test
Tukey’s test comparisons: SVM – Logit; SVM – Boost; SVM – nBayes; SVM – Tree; Logit – Boost; Logit – nBayes; Logit – Tree; Boost – nBayes; Boost – Tree; nBayes – Tree
Time performance benchmark results* (a) training time using 292 posts, and (b) mean classification time per post
Typically classification research compares results against prior methods as a benchmark for improvements in accuracy; however, we were unable to find any previously published methods for the specific task of identifying violent extremist recruitment using text classification techniques. Thus, our results serve as initial performance benchmarks against which future methods can be compared.
The most discriminating term features as weighted by the cross-validated logistic regression models
Conclusions and future work
This work was motivated by increasing online activities of violent extremist organizations along with the lack of automated approaches to analyze such activity. Our research built upon recent data collection and analysis efforts to develop supervised learning and natural language processing methods that automatically identify cyber-recruitment by violent extremists. The results presented in this article support the conclusion that automatic VE recruitment detection is a feasible goal. As the first reported results on this task, our classifiers serve as initial performance benchmarks against which future VE recruitment classifiers can be compared.
In the future, our VE recruitment detection methods could be improved by including support for non-English languages. Whether such future methods use automatic translation or non-English features, support for other languages is an important task considering that violent extremist groups frequently operate in non-English speaking communities. Incorporating non-English text and features could be accomplished through the use of experts to perform the manual annotation. Expert judges might also improve annotation quality if agreement remains strong. Future work could also analyze classifier behavior in depth, and test the effectiveness of more advanced feature selection and modeling techniques. Methods like latent semantic analysis perform singular value decomposition transformations on the feature space and may be employed to further reduce both dimensionality and the effect of non-discriminating terms. Latent Dirichlet allocation may be used to substitute the terms in a high-dimensional feature space with a smaller set of latent topics that represent the major subjects appearing in the corpus. Such latent variable modeling techniques could serve as feature selection and replacement methods while preserving the statistical relationships that are essential for text classification tasks.
By testing the effectiveness of our methods in a proxy for real-world settings we demonstrated that such automated classification tools would clearly fit into the workflow of counterterrorism intelligence teams like the FBI’s information review analysts. The current workflow tasks human analysts with manually reviewing and annotating “the ever-increasing [volume of investigative] information” stored in data warehouses like the Electronic Surveillance Data Management System (DWS-EDMS) used by the FBI. An automated classification system using our methods for detecting VE recruitment could serve as a pre-screening step in the current review workflow tasked with reducing the volume of documents requiring human attention. Our automated approach could also complement current lead management systems like eGuardian by automatically detecting potential terrorist recruitment events so they can be efficiently compiled into leads for current investigations or used as evidence to open new terrorism-related investigations.
More generally our automated classification methods could be used as part of a VE recruitment identification and tracking methodology that would enable the study of recruitment efforts and the membership dynamics of violent organizations. Such a method might be able to measure the effectiveness of extremist and counterinsurgency efforts on new membership by correlating specific recruitment activities and current events with changes in the VE population of a community. As a future research path, this proposed methodology requires (1) an automated system for classifying whether a forum user is a member of a violent extremist group, and (2) time series methods for analyzing recruitment and membership along a timeline.
In light of the still unfolding news regarding the NSA’s Boundless Informant and PRISM programs, we address some ethical implications of our work. Given that such a comprehensive and intrusive source of text data does exist, there is clearly a potential for abusing a recruitment and membership classification method to target non-combative individuals. Such tracking methods could thwart perfectly legal recruitment efforts of peaceful protesters, or radical yet law-abiding religious sects. These groups might fit the profile of a VE organization in every way except the critical ingredient of violence. Furthermore, recruitment alone rarely necessitates a violent act even though a recruiter may refer to or even encourage such acts. Because of these possible unethical repercussions, we proposed classification methods that target not just extremist groups, but specifically violent groups engaged in acts like terrorism. We hope that tuning the learning algorithms in this way will reduce some risk of misuse.
Abbreviations
AUC: Area under the curve (ROC curve)
FPR: False positive rate
GLM: Generalized linear model
IDF: Inverse document frequency
Logit: Logistic regression algorithm
nBayes: Naive Bayes algorithm
Rec: Recruitment (of violent extremists)
ROC: Receiver operating characteristic
SNA: Social network analysis
SVM: Support vector machines
TPR: True positive rate
This research was financially supported by a grant from the United States Army Research Laboratory (ARL).
- Overbey LA, McKoy G, Gordon J, McKitrick S: Automated sensing and social network analysis in virtual worlds. In Intelligence and Security Informatics (ISI). IEEE, Vancouver, BC, Canada; 2010:179–184.
- Torok R: “Make A Bomb In Your Mums Kitchen”: Cyber Recruiting And Socialisation of ‘White Moors’ and Home Grown Jihadists. In Australian Counter Terrorism Conference. School of Computer and Information Science, Edith Cowan University, Perth, Western Australia; 2010:54–61.
- Rogers M: Chapter 4: The Psychology of Cyber-Terrorism. In Terrorists, Victims and Society: Psychological Perspectives on Terrorism and its Consequences. Edited by: Silke A. John Wiley & Sons, Chichester, West Sussex, England; 2003:77–92.
- O’Rourke S: Virtual radicalisation: Challenges for police. In 8th Australian Information Warfare and Security Conference. School of Computer and Information Science, Edith Cowan University, Perth, Western Australia; 2007:29–35.
- Mandal S, Lim E-P: Second life: Limits of creativity or cyber threat. In IEEE Conference on Technologies for Homeland Security. IEEE, Waltham, MA; 2008:498–503.
- WH Webster, DE Winter, L Adrian, J Steel, WM Baker, RJ Bruemmer, KL Wainstein, Final report of the William H.Webster Commission on the Federal Bureau of Investigation, counterterrorism intelligence, and the events at Fort Hood, Texas on November 5, 2009. Technical report, Federal Bureau of Investigation (2012).Google Scholar
- Tomes RR: Waging war on terror relearning counterinsurgency warfare. Parameters 2004, 34(1):16–28.Google Scholar
- F Gutiérrez, in Santa Fe Institute, Mimeo. Recruitment in a civil war: a preliminary discussion of the colombian case, (2006).Google Scholar
- Humphreys M, Weinstein JM: Who fights? the determinants of participation in civil war. Am. J. Pol. Sci 2008, 52(2):436–455. 10.1111/j.1540-5907.2008.00322.xView ArticleGoogle Scholar
- Lichbach MI: The Rebel’s Dilemma . University of Michigan Press, Ann Arbor; 1998.Google Scholar
- Peters K, Richards P: ‘Why we fight’: Voices of youth combatants in Sierra Leone. Africa 1998, 68(02):183–210. 10.2307/1161278View ArticleGoogle Scholar
- Weinstein JM: Inside Rebellion: The Politics of Insurgent Violence . Cambridge University Press, New York; 2007.Google Scholar
- Petersen RD: Resistance and Rebellion: Lessons from Eastern Europe . Cambridge University Press, New York; 2001.View ArticleGoogle Scholar
- Popkin S: The rational peasant. Theory Soc 1980, 9(3):411–471. 10.1007/BF00158397View ArticleGoogle Scholar
- Scott JC: The Moral Economy of the Peasant: Rebellion and Subsistence in Southeast Asia . Yale University Press, New Haven & London; 1976.Google Scholar
- Wood EJ: Insurgent Collective Action and Civil War in El Salvador . Cambridge University Press, New York; 2003.View ArticleGoogle Scholar
- McGehee RW: Deadly Deceits: My 25 Years in the CIA . Sheridan Square Publications, Inc., New York; 1983.Google Scholar
- Conway M: Terrorism and the internet: new media–new threat? Parliamentary Aff 2006, 59(2):283–298. 10.1093/pa/gsl009View ArticleGoogle Scholar
- Torok R: Developing an explanatory model for the process of online radicalisation and terrorism. Secur. Informatics 2013, 2(1):1–10. 10.1186/2190-8532-2-1View ArticleGoogle Scholar
- Bowman-Grieve L: A psychological perspective on virtual communities supporting terrorist & extremist ideologies as a tool for recruitment. Secur. Informatics 2013, 2(1):1–5. 10.1186/2190-8532-2-1View ArticleGoogle Scholar
- Kohlmann EF: Al-Qaida’s MySpace: terrorist recruitment on the internet. CTC Sentinel 2008, 1(2):8–9.MathSciNetGoogle Scholar
- LA Overbey, G McKoy, J Gordon, S McKitrick, Jr MH, L Buhler, L Casassa, S Yaryan, Virtual DNA: Investigating cyber-behaviors in virtual worldsL. Technical Report 33–09 E, Space and Naval Warfare System Center Atlantic Charleston, SC (2009).Google Scholar
- McNeal GS: Cyber embargo: Countering the internet jihad. Case West. Reserv. Univ. J. Int. Law 2008, 39: 789–826.Google Scholar
- Chen H, Thoms S, Fu T: Cyber extremism in web 2.0: An exploratory study of international jihadist groups. In IEEE International Conference on Intelligence and Security Informatics (ISI) . IEEE, Taipei; 2008:98–103.View ArticleGoogle Scholar
- Yang M, Kiang M, Chen H, Li Y: Artificial immune system for illicit content identification in social media. J. Am. Soc. Inf. Sci. Technol 2012, 63(2):256–269. 10.1002/asi.21673View ArticleGoogle Scholar
- Basu A: Social network analysis of terrorist organizations in India. In North American Association for Computational Social and Organizational Science (NAACSOS) Conference. NAACSOS, Notre Dame, Indiana; 2005:26–28.
- Carley KM: Destabilization of covert networks. Comput. Math. Organ. Theory 2006, 12(1):51–66. 10.1007/s10588-006-7083-y
- Chau M, Xu J: Using web mining and social network analysis to study the emergence of cyber communities in blogs. In Terrorism Informatics. Springer, New York; 2008:473–494.
- Diesner J, Carley KM: Using network text analysis to detect the organizational structure of covert networks. In Proceedings of the North American Association for Computational Social and Organizational Science (NAACSOS) Conference. NAACSOS, Pittsburgh; 2004.
- Chen Z, Liu B, Hsu M, Castellanos M, Ghosh R: Identifying intention posts in discussion forums. In Proceedings of NAACL-HLT. Association for Computational Linguistics, Atlanta, Georgia; 2013:1041–1050.
- Chen H, Chung W, Qin J, Reid E, Sageman M, Weimann G: Uncovering the dark web: A case study of jihad on the web. J. Am. Soc. Inf. Sci. Technol 2008, 59(8):1347–1359. 10.1002/asi.20838
- Fu T, Abbasi A, Chen H: A focused crawler for dark web forums. J. Am. Soc. Inf. Sci. Technol 2010, 61(6):1213–1231.
- Google Inc., Google Translate (2014). http://translate.google.com/
- Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security. Springer, New York; 2008.
- Artificial Intelligence Laboratory, University of Arizona, Dark Web Forum Portal: Ansar AlJihad Network English Website (2014). http://cri-portal.dyndns.org
- Cohen J: A coefficient of agreement for nominal scales. Educ. Psychol. Meas 1960, 20(1):37–46. 10.1177/001316446002000104
- Landis JR, Koch GG: The measurement of observer agreement for categorical data. Biometrics 1977, 33(1):159–174. 10.2307/2529310
- Fleiss JL, Levin B, Paik MC: The measurement of interrater agreement. Stat. Methods Rates Proportions 1981, 2: 212–236.
- I Feinerer, K Hornik, Tm: Text Mining Package. (R Foundation for Statistical Computing, 2014). R package version 0.5–10. http://CRAN.R-project.org/package=tm
- TP Jurka, L Collingwood, AE Boydstun, E Grossman, W van Atteveldt, RTextTools: Automatic Text Classification Via Supervised Learning (2014). R package version 1.4.2. http://CRAN.R-project.org/package=RTextTools
- CJ Van Rijsbergen, SE Robertson, MF Porter, New Models in Probabilistic Information Retrieval (British Library Research and Development Dept, 1980).
- Cavnar W: Using an n-gram-based document representation with a vector processing retrieval model. In Overview of the Third Text Retrieval Conference. Edited by: Harman DK. Computer Systems Laboratory, National Institute of Standards and Technology, Gaithersburg, MD; 1995:269–277.
- Duda RO, Hart PE, Stork DG: Pattern Classification. John Wiley & Sons, Inc, New York; 2001.
- D Meyer, E Dimitriadou, K Hornik, A Weingessel, F Leisch, E1071: Misc Functions of the Department of Statistics (e1071), TU Wien (2014). R package version 1.6–2. http://CRAN.R-project.org/package=e1071
- Lin C-J, Weng RC, Keerthi SS: Trust region newton method for logistic regression. J. Mach. Learn. Res 2008, 9: 627–650.
- T Helleputte, LiblineaR: Linear Predictive Models Based On The Liblinear C/C++ Library (2013). R package version 1.80–7. http://CRAN.R-project.org/web/packages/LiblineaR
- B Ripley, Tree: Classification and Regression Trees (2014). R package version 1.0–35. http://CRAN.R-project.org/package=tree
- J Tuszynski, caTools: ROC AUC Tools, Moving Window Statistics (2013). R package version 1.16. http://CRAN.R-project.org/package=caTools
- Friedman J, Hastie T, Tibshirani R: Additive logistic regression: a statistical view of boosting. Ann. Stat 2000, 28(2):337–407. 10.1214/aos/1016218223
- Fenech AP: Tukey’s method of multiple comparison in the randomized blocks model. J. Am. Stat. Assoc 1979, 74(368):881–884. 10.1080/01621459.1979.10481048
- Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci 1990, 41(6):391–407. 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
- Blei DM, Ng AY, Jordan MI: Latent dirichlet allocation. J. Mach. Learn. Res 2003, 3: 993–1022.
- G Greenwald, NSA Collecting Phone Records of Millions of Verizon Customers Daily (2013). http://www.guardian.co.uk/world/2013/jun/06/nsa-phone-records-verizon-court-order
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.