National Institute of Justice. Research. Development. Evaluation. Office of Justice Programs

How We Review and Rate a Practice From Start to Finish

To be included on the site, practices undergo a seven-step review and evidence-rating process. On this page you'll find details of each step in this process. You will also find information on how and when we determine to re-review a practice.

The seven steps: 1. Preliminary practice identification; 2. Initial practice screening; 3. Practice definition and literature search; 4. Initial evidence screening; 5. Expert review; 6. Outcome classification; 7. Outcome evidence rating.

1.  Preliminary practice identification

Practices are identified for potential inclusion through:

  • Literature searches of relevant databases, journals and publications, including:
    • Social science databases using keywords identified in the areas of criminal justice, juvenile justice, and victims of crime;
    • Journals (including peer-reviewed journals) and other relevant resources; and
    • Other web-based databases of effective programs and meta-analyses of evaluated programs.
  • Nominations from the field: See how to nominate a practice here: Nominate a Program or Practice.

2.  Initial practice screening

After practices are identified, research staff review practice materials to determine whether the goals of the practice fall within the scope of the site. To fall within the scope, the practice must:

  • Aim to prevent or reduce crime, delinquency or related problem behaviors (such as aggression, gang involvement or school attachment);
  • Aim to prevent, intervene in or respond to victimization;
  • Aim to improve justice systems or processes; and/or
  • Target an offender population or an at-risk population (that is, individuals who have the potential to become involved in the justice system).

Prevention practices not explicitly aimed at reducing or preventing a problem behavior must apply to a population at risk for developing problem behaviors.

3.  Practice definition and literature search

Based on the identified meta-analyses, Lead Researchers develop a clear definition that sets definite boundaries for what does and does not constitute the practice. Meta-analyses vary in scope: some address multiple practices or multiple populations. For example, the site often summarizes results separately for juveniles and adults, but both may be addressed within the same meta-analysis. The practice definition is used to develop search terms to identify additional meta-analyses and systematic reviews in the practice area. The site does not limit the number of eligible meta-analyses that may be reviewed for any given practice.

4.  Initial evidence screening

All meta-analyses included in the evidence base for practices must meet the following minimum criteria:

  1. Intervention. The meta-analysis must include at least two studies of the practice of interest.

  2. Aggregation. The meta-analysis must aggregate the results from at least two studies.

  3. Primary aim of the intervention. The programs included in the meta-analysis must aim to address the goals identified in the initial screening stage (Step 2 above).

  4. Literature search. The literature search that guided the inclusion of primary studies in the meta-analysis must include at least two sources and must provide evidence that unpublished literature was sought in the search.

  5. Primary outcomes. The meta-analysis must report on at least one eligible outcome related to crime, delinquency, overt problem behaviors (e.g., aggression, gang involvement, substance abuse), crime victimization, justice system practices or policies, or risk factors for crime and delinquency.

  6. Control groups. All studies included in the meta-analysis must include an appropriate control, comparison or counterfactual condition, or the meta-analysis must analyze these studies separately from those that lack appropriate counterfactuals.

  7. Reporting of results. The meta-analysis must report effect sizes that represent the magnitude of the treatment effect.

  8. Combining effect sizes. When an average effect size is reported for multiple studies, all effect sizes in the combination must address the same type of relationship.

  9. Publication date. At least 50 percent of the studies included in the meta-analysis must be published or otherwise available on or after 1980.

  10. Age of samples. Samples included in the meta-analysis must be restricted to either adults or juveniles, or mean effect sizes for adults and juveniles must be reported separately.
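The ten criteria above can be expressed as a simple checklist. The sketch below is illustrative only, not the site's actual screening tool; the field names and data structure are assumptions.

```python
# Hypothetical checklist mirroring the ten minimum screening criteria
# for a meta-analysis. Field names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class MetaAnalysis:
    num_studies: int                 # criteria 1-2: studies of the practice, aggregated
    addresses_screening_goals: bool  # criterion 3: primary aim matches Step 2 goals
    search_sources: int              # criterion 4: number of sources searched
    sought_unpublished: bool         # criterion 4: unpublished literature sought
    reports_eligible_outcome: bool   # criterion 5: at least one eligible outcome
    controls_handled: bool           # criterion 6: counterfactuals present or analyzed separately
    reports_effect_sizes: bool       # criterion 7: effect sizes reported
    combines_like_effects: bool      # criterion 8: combined effects address one relationship
    share_since_1980: float          # criterion 9: fraction of studies from 1980 onward
    ages_separated: bool             # criterion 10: adults/juveniles restricted or separate

def passes_initial_screening(m: MetaAnalysis) -> bool:
    """All ten minimum criteria must be met."""
    return (m.num_studies >= 2
            and m.addresses_screening_goals
            and m.search_sources >= 2 and m.sought_unpublished
            and m.reports_eligible_outcome
            and m.controls_handled
            and m.reports_effect_sizes
            and m.combines_like_effects
            and m.share_since_1980 >= 0.5
            and m.ages_separated)
```

A meta-analysis failing any single criterion is excluded from the evidence base.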

5.  Expert review

Once the Lead Researcher selects the eligible meta-analyses that will comprise the practice’s evidence base, Study Reviewers begin the practice evidence review using the practice scoring instrument to assess the quality, strength and extent to which the evidence indicates that the practice achieves its goals. Each meta-analysis within the evidence base is assessed by at least two Study Reviewers. The scoring instrument indicates the overall rating for each outcome that is reviewed.

See the Practices Scoring Instrument (PDF)

The scoring for practices consists of two parts: overall quality and internal validity.

Part 1: Overall Quality is assessed for each eligible meta-analysis included within the evidence base for a practice. The Study Reviewers make this assessment based on information about the methods and procedures used in each meta-analysis. Multiple items are scored to determine overall quality and these items are weighted differently based on their importance.

Meta-Analysis Overall Quality
Dimension | Overview | Elements¹
Methodological Quality Assesses the extent to which meta-analysis authors were attentive to the methodological quality of the primary studies included in the meta-analysis.
  • Methodological quality
Main Analysis Assesses multiple elements related to the quality of the statistical analysis used to calculate and report effect size estimates.
  • Handling dependent effect sizes
  • Effect size reporting
  • Weighting of results
  • Analysis model
  • Heterogeneity attentiveness
Eligibility and Search Assesses the degree to which the meta-analysis provides a clear statement of the inclusion and exclusion criteria for selecting studies to be included, and whether the literature search was comprehensive and not limited to commercial publishers.
  • Eligibility criteria
  • Comprehensive literature search
  • Grey literature search
Reliability, Outliers, and Publication Bias Assesses methods used in the meta-analysis to extract data from primary studies, account for extreme scores, and account for biases towards large and statistically significant effects in published findings.
  • Coder Reliability
  • Outlier Analysis
  • Publication Bias
¹ Scores for elements in each dimension are combined and then weighted according to their importance to arrive at an Overall Quality score.

Study Reviewers assign numerical values to each element in the practice scoring instrument. The elements include a definition and other guidance reviewers consider when rating the elements. See the Practice Scoring Instrument for guidance on each element.
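As a rough illustration of how element scores might roll up into an Overall Quality score, the sketch below averages element scores within each dimension and applies dimension weights. The weights shown are placeholders, not the instrument's actual values; see the Practice Scoring Instrument for the real weighting.

```python
# Hedged sketch: combine element scores into an Overall Quality score by
# averaging within each dimension, then applying dimension weights.
# The weights below are hypothetical placeholders for illustration.
DIMENSION_WEIGHTS = {
    "methodological_quality": 0.30,
    "main_analysis": 0.40,
    "eligibility_and_search": 0.15,
    "reliability_outliers_bias": 0.15,
}  # assumed to sum to 1.0

def overall_quality(element_scores: dict[str, list[float]]) -> float:
    """element_scores maps each dimension to its list of element scores."""
    total = 0.0
    for dim, weight in DIMENSION_WEIGHTS.items():
        scores = element_scores[dim]
        total += weight * (sum(scores) / len(scores))  # weighted dimension average
    return total
```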

Part 2: Internal Validity is assessed for each outcome within each eligible meta-analysis for a practice. Internal validity refers to the extent to which changes in the outcome can be attributed to the intervention, program or practice. Stated differently, internal validity refers to the extent that the research design is free from threats that may bias the estimated effect. There may be multiple outcomes that are independently assessed within each meta-analysis.

Randomized controlled trials (RCTs) have the strongest inherent internal validity. The site uses the extent to which average effect sizes are based on results from RCTs to assess internal validity. Meta-analyses based on a higher proportion of RCTs generally receive higher internal validity scores. Quasi-experimental designs may also be included in eligible meta-analyses, but internal validity ratings will generally be lower as the proportion of quasi-experimental designs increases. See the Practice Scoring Instrument for guidance on how internal validity is scored.

6.  Outcome classification

To arrive at the outcome rating, the internal validity score for each outcome coded above is combined with the overall quality rating for the meta-analysis from which it was extracted. This information, along with information about the direction and statistical significance of the mean effect size, is used to categorize each outcome into one of the following:

  • Class 1 – Strong evidence of a positive effect
    Highest quality evidence with statistically significant average effect size favoring the practice
  • Class 2 – Moderate evidence of a positive effect
    Moderate quality evidence with statistically significant average effect size favoring the practice
  • Class 3 – Negative effect
    Moderate to high quality evidence with statistically significant average effect size in the opposite direction of the intended effect for the practice
  • Class 4 – Non-significant or null effect
    Moderate to high quality evidence with an average effect size that does not differ significantly from the comparison group
  • Class 5 – Insufficient evidence
    Limitations in the study design preclude the site from reporting further on these average effect sizes, regardless of their direction or statistical significance.
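The five-class scheme can be sketched as decision logic. The categorical quality labels below are assumptions for illustration; the actual classification combines the numeric overall quality and internal validity scores from the scoring instrument.

```python
# Illustrative decision logic for the five outcome classes. The inputs are
# assumed categorical summaries ("high"/"moderate"/"low" quality), not the
# instrument's actual numeric cutoffs.
def classify_outcome(quality: str, significant: bool, favors_practice: bool) -> int:
    """Return the outcome class (1-5) per the five categories above."""
    if quality == "low":
        return 5  # Class 5: insufficient evidence, regardless of direction
    if significant and favors_practice:
        return 1 if quality == "high" else 2  # Class 1 or 2: positive effect
    if significant and not favors_practice:
        return 3  # Class 3: negative effect
    return 4      # Class 4: non-significant or null effect
```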

Discrepancies among Reviewers: In the event that there is a classification discrepancy between the Study Reviewers, the Lead Researcher will work to achieve a consensus classification. If necessary, the Lead Researcher will also review the study and make a final determination on the classification.

For more information about scoring and classifying outcomes, see the Practice Scoring Instrument.

7.  Outcome evidence rating

In cases where only one eligible meta-analysis addresses an outcome, the conversion from the outcome classification described above to the final outcome evidence rating is straightforward.

Outcome Evidence Rating | Icon* | Determination of Outcome Evidence Rating
Effective | single-study icon | Outcomes in Class 1
Promising | single-study icon | Outcomes in Class 2
No Effects | single-study icon | Outcomes in Class 3 or 4
Insufficient Evidence | N/A (no icon) | Outcomes in Class 5

In some cases, multiple meta-analyses provide information about the same outcome. For each meta-analysis, that outcome is first rated separately for quality and validity as described above and then classified according to the five classes in Step 6 above. An additional step is needed to arrive at a final evidence rating for that outcome: outcome findings are combined from multiple meta-analyses using the process below. The site does not limit the number of eligible meta-analyses that may be included.

Each class of outcomes is weighted as follows:

  • Class 1 = 3
  • Class 2 = 1
  • Class 3 = -3
  • Class 4 = 0
  • Class 5 is not counted

Ratings for the same outcome from multiple meta-analyses are summed and averaged. The final evidence rating for the outcome is assigned based on this score.

  • If the averaged points are ≥ 1.50, the final outcome rating is Effective.
  • If the averaged points are ≥ 0.50 and ≤ 1.49, the final outcome rating is Promising.
  • If the averaged points are ≤ 0.49, the final outcome rating is No Effects.
Outcome Evidence Rating | Icon* | Determination of Outcome Evidence Rating
Effective | multi-study icon | The averaged points ≥ 1.50
Promising | multi-study icon | The averaged points ≥ 0.50 and ≤ 1.49
No Effects | multi-study icon | The averaged points ≤ 0.49

*A single-study icon identifies practice outcomes that have been subjected to one meta-analysis. A multiple-studies icon represents a greater extent of evidence supporting the evidence rating: it depicts practice outcomes that have more than one meta-analysis in the evidence base demonstrating effects in a consistent direction.
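The weighting, averaging, and threshold rules above translate directly into code. This sketch follows the stated class weights and cutoffs; the function name is an assumption.

```python
# Direct transcription of the stated weighting and threshold rules.
CLASS_WEIGHTS = {1: 3, 2: 1, 3: -3, 4: 0}  # Class 5 is not counted

def final_outcome_rating(classes: list[int]) -> str:
    """classes: the outcome's classification from each eligible meta-analysis."""
    counted = [CLASS_WEIGHTS[c] for c in classes if c != 5]
    if not counted:
        return "Insufficient Evidence"  # only Class 5 outcomes available
    avg = sum(counted) / len(counted)   # points are summed and averaged
    if avg >= 1.50:
        return "Effective"
    if avg >= 0.50:
        return "Promising"
    return "No Effects"
```

For example, an outcome classified as Class 1 in one meta-analysis and Class 4 in another averages (3 + 0) / 2 = 1.50, yielding a final rating of Effective.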

Read more about the Practices Scoring Instrument.

Re-Reviewing a Practice and Updating a Rating

We consider re-reviewing a practice's rating when:

  • New meta-analyses, or meta-analyses not previously identified, are found that meet the criteria. This may include meta-analyses that have been updated since the original review.

The new materials may or may not be sufficient to warrant a new evidence rating. If a Lead Researcher determines that there is sufficient evidence in the new materials to warrant another review, then the new information is sent to Practice Reviewers for assessment. Even if the outcome rating does not change, the new evidence and materials may be included or referenced on the profile page.

For more information: Inquiring About or Appealing an Evidence Rating.