Can we analyze word associations in online solicitation transcripts using online software Overview? Part 2

Hollie Richardson

Last year Ian Elliott began investigating the use of the free, open-source online text analysis tool Overview (read Ian’s post here) to examine online grooming transcripts. The tool – originally designed for investigative journalists and more recently used by researchers – searches and analyzes huge sets of documents simultaneously and visualizes the broad trends and patterns across them in the form of ‘topic trees’. This post describes the findings of an updated analysis with a larger sample.

The first level of a topic tree shows the key words used most frequently across all of the analyzed documents. The tree then splits into superordinate ‘clusters’, or folders, of documents grouped by their similarity. These are further divided into sub-folders, so each additional level contains fewer documents per folder and the folders become more refined. Each folder is labelled with the key words that make the documents within it similar – resulting in trees like the one below.
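The folder structure described above can be sketched in a few lines of Python. This is a rough illustration only, not Overview's actual algorithm: it assumes TF-IDF word weights, cosine similarity and a simple two-way split seeded by the least similar pair of documents. All function names and the neutral toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed idf (an assumption) so that no weight is ever zero.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def top_terms(vectors, k=3):
    """Label a folder with its k highest-weighted terms."""
    totals = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] += weight
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def split(indices, vectors):
    """Split a folder in two, seeded by its least similar pair of documents."""
    pairs = [(cosine(vectors[i], vectors[j]), i, j)
             for pos, i in enumerate(indices) for j in indices[pos + 1:]]
    _, seed_a, seed_b = min(pairs)
    left, right = [], []
    for i in indices:
        closer_to_a = cosine(vectors[i], vectors[seed_a]) >= cosine(vectors[i], vectors[seed_b])
        (left if closer_to_a else right).append(i)
    return left, right

def build_tree(indices, vectors, min_size=3):
    """Recursively divide folders until they hold fewer than min_size documents."""
    node = {"docs": indices,
            "label": top_terms([vectors[i] for i in indices]),
            "children": []}
    if len(indices) >= max(min_size, 2):
        left, right = split(indices, vectors)
        if left and right:
            node["children"] = [build_tree(left, vectors, min_size),
                                build_tree(right, vectors, min_size)]
    return node

def show(node, depth=0):
    """Print each folder's label and document indices, indented by level."""
    print("  " * depth + ", ".join(node["label"]), node["docs"])
    for child in node["children"]:
        show(child, depth + 1)

# Neutral placeholder documents, not real transcripts.
docs = ["cat purrs cat sleeps", "cat chases mouse",
        "bread oven flour", "flour oven yeast bread"]
tree = build_tree(list(range(len(docs))), tfidf_vectors(docs))
show(tree)
```

With these toy documents the two cat documents fall into one folder and the two baking documents into the other, which mirrors the way folders become more specific at each level of a topic tree.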


I extended this research with the help of Ian Elliott and Anthony Beech using a larger dataset of 150 transcripts. My main aim was to investigate whether Overview could be used to analyze word associations in a large sample of online solicitation transcripts, and in doing so identify linguistic themes within these transcripts.

I used the 150 “most slimy” transcripts on the Perverted Justice website, removing any commentary before uploading the transcripts to Overview and running the analysis. Perverted Justice is an independent organization whose aim is to provide a deterrent effect by acting as ‘decoy’ children (an adult simulating the responses of a hypothetical ‘child’) in online chat rooms and passing evidence of illegal activity on to law enforcement agencies. In a second analysis I divided the transcripts based on the gender of the decoy so I could compare the themes identified. There were far fewer male decoy transcripts in the analysis (19 compared to 131 female decoy transcripts).

“When all of the transcripts were analyzed together the key words revealed – perhaps unsurprisingly – an overarching theme of sexualization across the transcripts.”

Overview allows you to ignore certain words, so I excluded the usernames in each of the transcripts (which appear on every line) and some words that came up frequently in the analysis but weren’t relevant, such as “yeh”. There’s also the option to give extra weighting to ‘important’ words. In two separate analyses I gave extra weighting to a list of sexual words and to a brief list of risk assessment words, which I identified by reading through the documents and running a quick internet search. Once all of the transcripts were uploaded and the analysis had run, I examined the clusters in the topic trees to identify linguistic themes.
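As a rough illustration of what ignoring and weighting words amounts to, here is a minimal Python sketch. It is not how Overview implements these options: the `weighted_counts` function, the `boost` factor of 3 and the toy transcript lines are all assumptions made for the example.

```python
from collections import Counter

def weighted_counts(lines, ignored, important, boost=3.0):
    """Count words, skipping ignored ones and boosting 'important' ones."""
    counts = Counter()
    for line in lines:
        for token in line.lower().split():
            word = token.strip(".,!?:;\"'()")  # drop surrounding punctuation
            if not word or word in ignored:
                continue
            # The boost factor is a hypothetical stand-in for 'extra weighting'.
            counts[word] += boost if word in important else 1.0
    return counts

# Invented placeholder lines, not taken from any real transcript.
transcript = [
    "user_a: yeh we could meet at the park",
    "user_b: my mom is home today",
    "user_a: yeh ok, is your mom strict?",
]
ignored = {"user_a", "user_b", "yeh"}   # usernames and filler words
important = {"mom", "meet"}             # e.g. a short risk assessment list
scores = weighted_counts(transcript, ignored, important)
print(scores.most_common(3))
```

Usernames and filler words never reach the counts, while the listed risk assessment words count three times as much as ordinary words, so they are more likely to surface as key words.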

When all of the transcripts were analyzed together regardless of decoy gender and extra weighting was given to sexual words, the key words revealed – perhaps unsurprisingly – an overarching theme of sexualization across the transcripts. Two superordinate folders were identified, which I interpreted and labelled Complimentary/descriptive (with key words such as “sexy”, “love” and “naked”) and Male-oriented (with key words such as “boy” and “dick”).

Three meaningful sub-themes were also evident. Kindness and flattery (with key words such as “sweetie”) and Authoritative (with key words such as “master”) were identified within the Complimentary/descriptive theme, and Violent/aggressive (with key words such as “rape”) within the Male-oriented theme. It’s worth noting that the output Overview provided had multiple sub-folders; however, many contained very few documents and were therefore discarded from the analysis, as any interpretation of these folders would have been superficial at best.


Model of themes when extra weighting is given to sexually descriptive words

Overview also allows for a deeper analysis of the context in which the key words in each folder were used within the transcripts. For example, contextual analysis of the ‘kindness and flattery’ sub-folder revealed the use of sexual compliments and the offender acting as a friend and supporting the child. The identification of a theme of kindness and flattery in both analyses suggests this is a fundamental strategy used by offenders. This is supported by previous literature suggesting offenders use kindness and flattery to build trust and an emotional bond from victim to offender, to aid the transition into more sexually explicit conversation.

“The identification of a theme of kindness and flattery in both analyses suggests this is a fundamental strategy used by offenders.”

When extra weighting was given to risk assessment words the analysis revealed considerably different results, with key words referring to parents and the police. I included this analysis purely for exploratory purposes, based on O’Connell’s descriptions of risk assessment within her seven-stage online grooming process. Overview again produced multiple folders and sub-folders, with superordinate themes labelled Security (“mom”, “dad”, “cop”) and The next step. The latter refers to suggestions of a physical meeting and was labelled in light of previous research by Juliane Kloess and colleagues.

Overview also pairs words which frequently appear together; for example, “meet” and “sex” often co-occur in The next step, so Overview joins them (meet_sex) to show that the two words are meaningful as a whole. Multiple sub-themes were also identified in this analysis: Acknowledgement of illegal activity and Kindness and flattery within Security, and Telephone communication within The next step.
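The pairing described above can be approximated by counting adjacent word pairs. The sketch below is an assumption about the general idea rather than Overview's own method: it treats "together" as "directly next to each other", and the `frequent_pairs` function, the `min_count` threshold and the toy documents are hypothetical.

```python
from collections import Counter

def frequent_pairs(docs, min_count=2):
    """Join adjacent word pairs that occur at least min_count times overall."""
    pair_counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent pairs only
    return {f"{a}_{b}": n for (a, b), n in pair_counts.items() if n >= min_count}

# Invented placeholder documents, not real transcripts.
docs = ["phone call tonight", "phone call tomorrow", "call me tomorrow"]
pairs = frequent_pairs(docs)
print(pairs)
```

Only “phone” and “call” appear side by side more than once here, so they are the only words joined into a single underscore-separated key word.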


Model of themes when extra weighting is given to risk assessment words

When transcripts were stratified by decoy gender and analyzed separately, the key words for both male and female decoy transcripts shared a central theme of sexualization and affection; however, the linguistic themes identified differed greatly. Male decoy transcripts revealed a focus on creating exclusivity between decoy and offender. I hypothesized that this is to aid the transition into a more sexually explicit conversation; this hypothesis is supported by the sub-theme identified in which the offender asks for sexually explicit images or the use of a webcam, labelled ‘Supplementing sexual stimulation’, again based on prior research by Kloess and colleagues. In contrast, the female decoy analysis showed a focus on flattery and arranging physical meetings.

“Male decoy transcripts [when compared to female decoys] revealed a focus on creating exclusivity between decoy and offender.”

The study itself was not without limitations. Overview does have a key flaw: it is sensitive to the slang and misspellings, by both offenders and victims, that are frequent in online solicitation transcripts. Because of this, some key word associations and linguistic themes may have been missed. In addition, the number of male decoy transcripts was much lower than that of female decoy transcripts, all of the offenders in the sample were male, and the transcripts analyzed involved a decoy rather than a real child victim, so the analyses may lack validity.

Nevertheless, Overview appears overall to be a valid and reliable tool with which to analyze word associations and identify linguistic themes in online solicitation transcripts. Overview could prove useful in the identification of dynamic (treatable) factors within these conversations (who is driving the conversation, etc.). The themes identified also add to previous research and may be useful in raising awareness and educating parents, teachers and children about internet safety, in turn reducing the prevalence of online sex offences.

Hollie Richardson is an undergraduate student at the University of Birmingham (UK) currently studying for a Master’s degree in Psychology and Psychological Practice. Her third-year undergraduate research project focused on forensic psychology, and investigated the possibility of analyzing word associations in online solicitation transcripts using the online software Overview. Hollie will soon begin a forensic psychology placement year as an assistant psychologist in a medium secure forensic unit, and hopes to pursue a career in forensic psychology after university.

Suggested citation:
Richardson, H. (2015, September 27). Can we analyze word associations in online solicitation transcripts using online software Overview? Part 2 [Weblog post]. Retrieved from

