Sensitivity and specificity in phallometric tests for pedophilia and hebephilia
Developing and testing measures of sexual interests in children is a key clinical and scientific endeavor in sexology and forensic psychology. Without valid ways of testing for, establishing the presence of, examining correlates of, and tracking change in sexual interest in children, our fields are rather at a standstill.
One method of measuring sexual interest in males that has a long, and at times controversial, history is phallometry. Phallometric assessment involves measuring changes in penile volume or circumference while presenting sexual stimuli to the individual. The more arousal an individual shows to a particular stimulus (e.g., prepubescent children), the more we interpret him to be sexually interested in that age or gender of person. Throughout the history of phallometry, measurement issues have been a main topic of discussion and a focus of research on this method of assessing sexual interests.
One key issue regarding validity is the sensitivity and specificity of phallometric tests for pedophilia and hebephilia. Just so we are all on the same page, the sensitivity of a test is the percentage of individuals with pedohebephilic interests whom the test correctly identifies as pedohebephilic. Specificity is the other side of the coin: the percentage of teleiophilic individuals whom the test correctly identifies as teleiophilic (or, if you prefer, as non-pedohebephilic). These two statistics are important, especially if a phallometric test result is going to be used to inform diagnosis. If a phallometric test indicates that an individual does or does not have a sexual interest in prepubescent children, we want to be confident that this is indeed the case for that individual, because treatment decisions are based on such findings and, perhaps more importantly, court decisions may be informed in part by a phallometric test result.
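To make the two definitions concrete, here is a minimal sketch in Python. The counts are hypothetical and chosen only for illustration; they are not taken from any of the studies discussed in this post, and the function name is my own.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from classification counts.

    true_pos / false_neg: pedohebephilic individuals the test did / did not flag.
    true_neg / false_pos: teleiophilic individuals the test did not / did flag.
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(true_pos=60, false_neg=40,
                                     true_neg=95, false_pos=5)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Both statistics depend entirely on who is placed in the "truly pedohebephilic" and "truly teleiophilic" groups, which is why the composition of the reference samples matters so much in what follows.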
To date, a number of studies have provided sensitivity and specificity estimates for different phallometric tests for sexual interest in children. As part of ongoing research that colleagues and I are conducting, I have been reviewing this literature, and data for approximately 25 studies can be found in an Excel spreadsheet I have created for this post. In almost all of the studies cited in the spreadsheet, the sensitivity/specificity analysis was an “add-on”, or at least appears to be a secondary analysis that forms a minor focus of the study. To my knowledge, only a few studies (Blanchard et al., 2001; Byrne, 2001; Freund & colleagues’ studies) acknowledged the methodological issues associated with estimating the sensitivity and specificity of a diagnostic test. None of the other studies referenced any of the many articles in the medical diagnostic literature that elaborate on the issues inherent in diagnostic accuracy research.
I find these omissions concerning, since examining the diagnostic accuracy of a test requires us to consider the methodological issues that will affect the estimates our research produces. Indeed, some research in the medical test literature suggests that certain methodological features of diagnostic accuracy research might systematically bias the estimates of sensitivity and specificity (see this article as well). In published reviews of the phallometric research completed to date, the twin issues of how much and in what direction the estimates of sensitivity and specificity have been biased seem not to have been considered. As a result, we are lost in a confusing stew of findings that might be biased a little or a lot, and biased toward higher or lower sensitivity than is actually the case. The wide range of sensitivity/specificity estimates found in the literature might be considered support for the notion that methodological features are pulling estimates in both directions and to varying degrees.
One overarching criticism that I would make of this literature is that, among these methodological features of diagnostic research, little attention has been paid to quantifying known or self-reported sexual behaviour in the samples used to estimate the sensitivity/specificity of phallometric tests. Quantifying this behaviour is an idea that Ray Blanchard and colleagues suggested over a decade ago. In a sample that one wants to use to estimate sensitivity (i.e., accuracy at identifying pedophilic individuals), a researcher will need to consider the amount of sexual behaviour that the sample as a whole is known to have exhibited towards children.
The idea is rather simple and works to alleviate one main methodological problem for this type of research. Since establishing a sample as purely pedophilic is a difficult task because there is not a “gold standard” reference for pedophilia, a researcher will need to establish a sample that is the most likely to be as close to 100% pedophilic as possible. The more known or self-reported sexual behaviour involving children that members of a sample have exhibited, the more confident a researcher can be that a sample is close to 100% pedophilic. The same goes for a sample of teleiophilic individuals used to estimate specificity: a researcher will need to establish quantitatively how much sexual activity a sample has had with adult women and also that they have had no sexual contacts with children.
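The reasoning above can be sketched numerically. The snippet below is only an illustration under a simplifying assumption of my own: that a fraction of the reference sample (called `purity` here) is truly pedophilic, and that the remaining members respond to the test like teleiophilic men, so they test positive at the false positive rate (1 minus specificity). All values are hypothetical.

```python
def observed_sensitivity(purity, true_sensitivity, specificity):
    """Apparent sensitivity when the 'pedophilic' sample is only partly pedophilic.

    Assumption (not from the studies discussed): non-pedophilic members of the
    sample test positive at the false positive rate, 1 - specificity.
    """
    false_positive_rate = 1 - specificity
    return purity * true_sensitivity + (1 - purity) * false_positive_rate


# Hypothetical test properties, for illustration only.
TRUE_SENSITIVITY = 0.75
SPECIFICITY = 0.95

for purity in (1.0, 0.8, 0.6):
    apparent = observed_sensitivity(purity, TRUE_SENSITIVITY, SPECIFICITY)
    print(f"purity = {purity:.0%}: apparent sensitivity = {apparent:.0%}")
```

Under this assumption, the apparent sensitivity falls as the sample moves further from 100% pedophilic, even though the test itself has not changed. This is the sense in which a less well-characterized sample will tend to underestimate sensitivity.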
So in this type of research with sexual offending samples, using the number of underage victims as a means of grouping offenders against children becomes a pivotal consideration. A recent study by Dr. James Cantor and me has provided further support for the notion that the number of known or self-reported victims will affect the sensitivity of phallometric testing. Our findings are consistent with a few prior studies showing that the sensitivity of phallometric testing increases when the members of a sample have multiple victims. Applying this issue to the research presented in the Excel sheet I assembled, studies in which samples have fewer than 5 victims are likely to produce underestimates of the sensitivity of the phallometric test scores examined. For instance, Barsetti et al. (1998) may have underestimated the sensitivity of the Quinsey and Chaplin stimulus set used in the research, given that the samples in the study had 2.3 and 1.5 victims on average.
A second issue that has recently re-emerged is the use of individuals who admit to pedophilic sexual interests to estimate the sensitivity of phallometric tests. While this group of individuals seems to be an ideal sample to estimate sensitivity of phallometric tests, this might not be the case. Individuals who admit to pedophilic sexual interests, if they were to present in a clinical setting where diagnosis of pedophilia was a question under examination, would not receive a phallometric test. This is because using a diagnostic test for pedophilia is typically moot if a person admits to having such an interest; in these cases, the diagnosis is established and further testing is unnecessary. In diagnostic accuracy research, one of the main concerns is to use samples of individuals with whom a diagnostic test is typically used in a clinical setting.
If a sample of admitting pedophiles is used to establish the sensitivity of a phallometric test score, then this sensitivity estimate is only valid for admitting pedophiles. This places a sharp limit on how generalizable such a finding is to clinical samples that deny pedophilic interests. For clinical forensic psychology, diagnostic phallometric testing will typically be used with those individuals who deny pedophilic interests in order to establish those who do and those who do not have these sexual interests. Further, those who admit to pedophilic interests may differ in real and important ways from those who deny such interests. These considerations leave me doubting the appropriateness of using samples of admitting individuals to estimate the sensitivity of a phallometric test.
In a soon-to-be-submitted paper, Dr. Cantor and I have examined what impact using a sample of individuals admitting to pedophilic interests might have on the sensitivity of the test. We found that the estimate of sensitivity in the admitting sample is significantly higher than the sensitivities found in samples of denying sexual offenders against children with differing numbers of victims (1, 2, 3, 4, or 5+ victims). The use of admitting samples might therefore overestimate the sensitivity of a phallometric test to a significant degree. Given this finding, researchers wanting to estimate the sensitivity of a phallometric test will likely need to track the number of admitting individuals in a sample in order to provide some indication of how much inflation might be expected in their estimates due to these individuals.
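A rough way to see why tracking the proportion of admitters matters is to treat the pooled sensitivity of a mixed sample as a weighted average of the sensitivities in its admitting and denying members. The sketch below does exactly that; the function name and all values are hypothetical illustrations, not results from our paper.

```python
def pooled_sensitivity(share_admitting, sens_admitting, sens_denying):
    """Sensitivity in a mixed sample, as a weighted average of the two subgroups."""
    return (share_admitting * sens_admitting
            + (1 - share_admitting) * sens_denying)


# Hypothetical subgroup sensitivities, for illustration only.
SENS_ADMITTING = 0.90
SENS_DENYING = 0.60

for share in (0.0, 0.25, 0.50):
    pooled = pooled_sensitivity(share, SENS_ADMITTING, SENS_DENYING)
    inflation = pooled - SENS_DENYING
    print(f"{share:.0%} admitters: pooled = {pooled:.1%}, "
          f"inflation over deniers = {inflation:.1%}")
```

If the test is to be used clinically with individuals who deny pedophilic interests, the denying-group figure is the relevant one, and the gap between it and the pooled figure grows with the share of admitters in the sample. Reporting that share lets readers gauge the likely size of the gap.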
I think these two methodological features, number of known/self-reported prepubescent victims and admitting status, are likely two key considerations for research aiming to establish the sensitivity and specificity of a phallometric test score. There may be more methodological features to consider when undertaking this type of research, but any researchers examining this issue are strongly urged to take these two considerations into account in order to produce the most accurate estimates, which, by extension, are the fairest estimates for our clients and the public.
McPhail, I. V. (2015, October 25). Sensitivity and specificity in phallometric tests for pedophilia and hebephilia [Weblog post]. Retrieved from http://wp.me/p2RS15-bV.