There is evidence to support the use of Western developed violent risk assessment in China: Responding to Zhou et al. (2015)

Seung Chan Lee and Karl Hanson

The recent article published by Zhou and his colleagues (2015) concluded that there was little evidence to support the use in China of violent risk assessment instruments developed in Western countries. They made two claims: 1) the predictive validity estimates (AUCs) were noticeably lower in China than in Western countries, and 2) the values of predictive validity found in Chinese studies were poor. We believe that the evidence presented in the article does not support either of their claims. This post outlines our rationale for this belief.

We compared eight effect sizes (AUCs) from Zhou et al. (2015) with nine effect sizes from Western countries: 2 meta-analyses and 6 single studies (see the table at the bottom of the post; Appendices 1 and 2 provide links). We omitted Yao et al. (2014b) because of its overlap with Yao et al. (2012), which had a larger sample size and a longer follow-up period. The Area Under the Receiver Operating Characteristic Curve (AUC) can vary between 0 and 1, with .50 indicating chance-level prediction. Values further above .50 indicate greater accuracy, and an AUC is statistically significant if its 95% confidence interval does not include .50. The AUC can be interpreted as the probability that a randomly selected recidivist has a higher score on the scale than a randomly selected non-recidivist.
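The probability interpretation of the AUC can be made concrete with a small sketch. The function name and the scores below are made up for illustration and do not come from any of the studies discussed; ties are counted as half a "win", which is one common convention.

```python
from itertools import product


def auc_pairwise(recidivist_scores, non_recidivist_scores):
    """Estimate the AUC as the share of recidivist/non-recidivist pairs
    in which the recidivist has the higher score (ties count as 0.5)."""
    pairs = list(product(recidivist_scores, non_recidivist_scores))
    if not pairs:
        raise ValueError("both groups need at least one score")
    wins = sum(1.0 if r > n else 0.5 if r == n else 0.0 for r, n in pairs)
    return wins / len(pairs)


# Hypothetical risk-scale totals, invented for this example only.
recidivists = [7, 9, 5, 8]
non_recidivists = [3, 5, 6, 2, 4]

print(auc_pairwise(recidivists, non_recidivists))
```

If the two groups had the same distribution of scores, the result would hover around .50; the more consistently recidivists outscore non-recidivists, the closer it gets to 1.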


As can be seen in the figure above, the AUC values among Chinese samples were largely equivalent to those found in Western countries. Specifically, the predictive accuracy of the Historical, Clinical, Risk Management-20 (HCR-20) among Chinese samples was almost identical to that in Western meta-analytic studies (Mdn = .703 vs. .705). The AUC values of the Brøset Violence Checklist (BVC) from Chinese studies were a little lower than, but similar to, those from Western countries (Mdn = .82 vs. .865). It is worth noting, however, that the BVC (developed in Norway) had unusually high predictive accuracy (AUC = .94) among Norwegian samples. In contrast, the values among other Western samples (i.e., Switzerland and Australia) were similar to or even lower than the values among Chinese studies (Mdn = .82 vs. .815 without the Norwegian sample).

Predictive validity estimates of the Violence Risk Screening-10 (V-RISK-10) were noticeably lower in China than in Western samples (Mdn = .68 vs. .83). It is hard to tell, however, whether the Chinese values are low or the Western values are unusually high. All three predictive validity studies of the V-RISK-10 were conducted among Norwegian samples by the same authors. In addition, as seen in the funnel plot for detecting publication bias below, there is a risk of publication bias for the Western studies (e.g., BVC and V-RISK-10): all the large effect sizes are based on small samples, and there are no small-sample studies with below-average effect sizes.

There are also reasons to question the findings of Yao et al., which found the lowest AUC value (.63). First, the raters received only a two-hour briefing, which resulted in low Intraclass Correlation Coefficient (ICC) values (ICC = .35 for the total score). Low reliability of an assessment scale places an upper limit on its predictive validity. Second, by hand-calculation from the data reported in Table 1 on page 442 of Yao et al. (2012), we found that the AUC value based on the final judgment was higher than the value reported in the text (AUC = .70 instead of .63). Consequently, it is difficult to make any judgment about the V-RISK-10 from these results.
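The point about reliability capping validity can be illustrated with the classical test theory attenuation bound, under which the correlation between an observed score and any criterion cannot exceed the square root of the score's reliability. This is a rough sketch, not a reanalysis: it assumes the ICC of .35 can be treated as the reliability coefficient, it assumes the criterion is measured without error, and it works in correlation units rather than AUCs. The function names are hypothetical.

```python
import math


def max_validity_correlation(reliability):
    """Classical attenuation bound: an observed score cannot correlate
    with a criterion more strongly than sqrt(reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


def attenuated_correlation(true_correlation, reliability):
    """Expected observed correlation when only the predictor is unreliable
    (criterion reliability is assumed to be 1 for this illustration)."""
    return true_correlation * max_validity_correlation(reliability)


icc_total_score = 0.35  # ICC for the total score, as reported in the post

print(round(max_validity_correlation(icc_total_score), 3))
# A hypothetical "true" validity of r = .50 would shrink considerably.
print(round(attenuated_correlation(0.50, icc_total_score), 3))
```

Under these assumptions, even a scale with strong underlying validity would appear much weaker when scored this unreliably.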


Note. H = HCR-20, B = BVC, V = V-RISK-10, C = CRAP-T, and VS = VS-CM. The triangle indicates the region where Western studies were expected but not found.

The predictive accuracy of the Chinese modified version of the Violence Scale (VS-CM) and the Chinese Risk Assessment Tool for Perpetrators (CRAP-T) has not been evaluated in Western countries. The predictive accuracy of these instruments among Chinese samples was very similar to the values found in Western studies (AUC = .80 for VS-CM and .76 for CRAP-T; see table below).


Note. HCR-20 = Historical, Clinical, Risk Management-20, BVC = Brøset Violence Checklist, V-RISK-10 = Violence Risk Screening-10, CRAP-T = The Chinese Risk Assessment Tool for Perpetrators, VS-CM = Chinese modified version of Violence Scale.

To respond to the second claim, that the values of predictive validity found in Chinese studies were poor, we need to clarify how good is good enough for the predictive accuracy of violent risk assessment tools. Currently, there is limited consensus on how to report and describe the predictive accuracy of risk assessment tools. Two features of risk tools need to be considered: 1) discrimination, or how well the scale distinguishes between recidivists and non-recidivists, and 2) calibration, or the correspondence between observed and expected recidivism rates. In this review, we restricted our discussion to discrimination because there are no specific recidivism rates associated with scores on these instruments.

AUC values can be interpreted in the context of diagnosis or of prognosis. In the diagnostic context, AUC values describe how well a test detects whether or not something exists (e.g., an x-ray for a brain tumor). In the prognostic context, AUC values relate to the likelihood of an event that has not yet happened and may never happen.

Because risk assessment is prognostic, forecasting a future event, it is unrealistic to expect accuracy similar to that in the diagnostic context. Consequently, the conventional standards for AUCs in the prognostic context are lower than those in the diagnostic context (i.e., AUCs of .56, .64, and .71 correspond to small, moderate, and large effect sizes). Using these benchmarks, the AUC values found among Chinese samples indicate moderate to large effect sizes (mostly in the .70s and .80s).
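The benchmarks quoted above match what one gets by converting Cohen's d values of 0.2, 0.5, and 0.8 to AUCs with AUC = Φ(d / √2), which holds when scores in both groups are normally distributed with equal variance. That mapping is an assumption here; the post does not state how the benchmarks were derived. The sketch below shows the conversion and a hypothetical helper that labels an AUC against the three cut-offs.

```python
import math


def auc_from_cohens_d(d):
    """Convert Cohen's d to an AUC, assuming normal scores with equal
    variance in both groups: AUC = Phi(d / sqrt(2))."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(d / sqrt(2)) simplifies to:
    return 0.5 * (1.0 + math.erf(d / 2.0))


def label_auc(auc, small=0.56, moderate=0.64, large=0.71):
    """Label an AUC using the prognostic benchmarks quoted in the post."""
    if auc >= large:
        return "large"
    if auc >= moderate:
        return "moderate"
    if auc >= small:
        return "small"
    return "below small"


for d in (0.2, 0.5, 0.8):
    print(d, round(auc_from_cohens_d(d), 2))

# AUC values mentioned in the post for Chinese samples.
for auc in (0.63, 0.68, 0.703, 0.76, 0.80, 0.82):
    print(auc, label_auc(auc))
```

By these cut-offs, an AUC of .63 falls just short of "moderate", while values in the .70s and .80s fall at or near "large" and above.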

We share Zhou et al.’s (2015) concern that it is important to evaluate predictive accuracy among different ethnic groups. Contrary to Zhou et al. (2015), however, we believe the available evidence suggests that violent risk assessment tools developed in Western countries show moderate to large effect sizes among Chinese samples, similar to the estimates found in Western countries. Given that there are no better alternatives, the use of violent risk assessment tools developed in Western countries with Chinese populations is justified.

Appendix 1: Singh, Grann, & Fazel (2011); Yang, Wong, & Coid (2010).

Appendix 2: Abderhalden et al. (2006); Almvik, Woods, & Rasmussen (2007); Chu, Daffern, & Ogloff (2013); Hartvig, Roaldset, Moger, Østberg, & Bjørkly (2011); Roaldset, Hartvig, & Bjørkly (2011); Roaldset, Hartvig, Linaker, & Bjørkly (2012).

Seung Chan Lee is a second-year Ph.D. student at Carleton University. He completed his B.A. (double major: Criminal Justice & Psychology) and M.A. in Forensic Psychology in the U.S. His primary research interest is evaluating the validity of risk assessment instruments for sexual offenders (e.g., Static-99R) across ethnic groups in North America. His further goal is to investigate risk-relevant characteristics of Asian sexual offenders and to establish the international generalizability of risk assessment tools for sexual offenders.

Karl Hanson, Ph.D., C. Psych., is one of the leading researchers in the field of sexual offender risk assessment and treatment. Originally trained as a clinical psychologist, he spent several years providing direct service to offenders before starting a research position with the Canadian Federal Government (Public Safety Canada). His mandate has been to advance policy-relevant knowledge concerning the assessment and treatment of offenders. Most of his research has focussed on sexual offenders, with a secondary interest in men who have been physically abusive to their intimate partners. He has published more than 150 articles, including several highly influential reviews, and has developed the most widely used risk assessment tools for sexual offenders (Static-99R; Static-2002R; STABLE-2007). He is a Fellow of the Canadian Psychological Association and the 2002 recipient of the Significant Achievement Award from the Association for the Treatment of Sexual Abusers.


Suggested citation:
Lee, S., & Hanson, R. K. (2015, August 30). There is evidence to support the use of Western developed violent risk assessment in China: Responding to Zhou et al. (2015) [Weblog post]. Retrieved from

