National Institute of Justice. Research. Development. Evaluation. Office of Justice Programs


Program Review and Rating from Start to Finish

To be included on this site, programs undergo an eight-step review and evidence rating process conducted through a contract with Development Services Group, Inc. (DSG):

1.  Preliminary program identification

Programs are identified for potential inclusion on this site via:

  • Literature searches of relevant databases, journals, and publications: This includes searching:
    • Social science databases using keywords identified in the areas of criminal justice, juvenile justice, and victims of crime;
    • Journals (including peer-reviewed journals) and other relevant resources; and
    • Other Web-based databases of effective programs (e.g., the Department of Education’s What Works Clearinghouse and the Substance Abuse and Mental Health Services Administration’s National Registry of Evidence-Based Programs and Practices) and meta-analyses of evaluated programs (e.g., those produced by the Campbell Collaboration or published in peer-reviewed journals).
  • Nominations from the field: Nominations of programs to be reviewed can be submitted by anyone, including experts, practitioners, organizations, and program developers. To see how to nominate a program, please review: Nominate a Program.

2.  Initial program screening

After programs are identified, research staff review program materials to determine whether the goals and/or activities of the program fall within the scope of this site. To fall within the scope, the program must:

  • Aim to prevent or reduce crime, delinquency, or related problem behaviors (such as aggression, gang involvement, or school attachment);
  • Aim to prevent, intervene in, or respond to victimization;
  • Aim to improve justice systems or processes; and/or
  • Target an offender population or an at-risk population (that is, individuals who have the potential to become involved in the justice system).

Prevention programs not explicitly aimed at reducing or preventing a problem behavior must apply to a population at risk for problem behaviors.

3.  Literature search

If the program’s scope meets criteria, research staff then expand the search for evaluations, research, and program materials to identify all relevant information needed for Lead Researcher and Study Reviewer consideration. Non-experimental, qualitative, ethnographic, and case study research is collected if it adds contextual information to the program description, but is not used to determine the evidence rating for a program.

4.  Initial evidence screening

Once the literature search is complete, research staff review the newly identified studies to determine whether they meet the criteria for evidence. To be considered for expert review, the program’s evaluation evidence must meet the following minimum requirements:

  • The program must be evaluated with at least one randomized field experiment or quasi-experimental research design (with a comparison condition).
  • The outcomes assessed must relate to crime, delinquency, or victimization prevention, intervention, or response.
  • The evaluation must be published in a peer-reviewed publication or documented in a comprehensive evaluation report.
  • The date of publication must be 1980 or after.
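The four minimum requirements above can be read as a simple screening predicate. The following is a minimal sketch, not the reviewers' actual tooling: the `Evaluation` record, its field names, and the design labels are hypothetical, and it assumes that a program passes when at least one evaluation satisfies all four requirements and that the comparison-condition requirement applies to quasi-experimental designs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Evaluation:
    """Hypothetical record of one evaluation study (field names are illustrative)."""
    design: str                    # e.g. "randomized" or "quasi-experimental"
    has_comparison: bool           # a comparison condition is present
    justice_outcomes: bool         # outcomes relate to crime, delinquency, or victimization
    documented: bool               # peer-reviewed publication or comprehensive evaluation report
    year: int                      # year of publication


def meets_minimum_evidence(e: Evaluation) -> bool:
    """Check one evaluation against the four minimum requirements."""
    # Assumption: the comparison condition is checked only for quasi-experiments.
    design_ok = e.design == "randomized" or (
        e.design == "quasi-experimental" and e.has_comparison
    )
    return design_ok and e.justice_outcomes and e.documented and e.year >= 1980


def program_passes_screening(evaluations: Iterable[Evaluation]) -> bool:
    """Assumption: one qualifying evaluation is enough to move to expert review."""
    return any(meets_minimum_evidence(e) for e in evaluations)
```

Under these assumptions, a program whose only evaluation is a case study, or whose only experiment was published before 1980, would not move on to expert review.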

5.  Selection of evidence base

A Lead Researcher, with subject matter and research methodology expertise, selects up to three studies representing the most rigorous study designs and methodologies from all available evaluations of the program (some programs may only have one or two studies available). The studies selected comprise the program’s evidence base and will be scored by Study Reviewers, ultimately to be used as the basis for the program’s evidence rating. (See the FAQ on why a maximum of three evaluations is used to determine an evidence rating for a program.) Any additional studies identified through the literature search, but not included in the evidence base, will serve as supporting documentation.

The criteria used to determine the three most rigorous studies include:

  • Strength of research design
  • Breadth of documentation
  • Type of analytic procedures used
  • Sample size
  • Independence of evaluator
  • Year of publication

If a Study Reviewer believes there is a compelling reason to review more than three studies, he/she may contact the Lead Researcher to request additional studies for review. The Lead Researcher will then make the final determination. In addition, multiple articles and publications that report on various aspects of a single study are generally treated as one study for purposes of the review; however, two studies that utilize the same dataset, but include different follow-up periods or analyses may be considered separately, on a case-by-case basis.

6.  Expert review

Once the Lead Researcher selects the studies that will comprise the program’s evidence base, trained and certified Study Reviewers begin the program evidence review using the scoring instrument. Each study within the evidence base is assessed by at least two Study Reviewers.

Scoring Instrument

The scoring instrument consists of two parts:

Part 1 of the Scoring Instrument: Conceptual Framework is assessed only once for each program, regardless of the number of studies in the evidence base. The Study Reviewers make this assessment based on information from all of the studies and program materials they have received. These additional program materials may include non-experimental, qualitative, ethnographic, and case study research as well as implementation materials.

Program’s Conceptual Framework

Dimension: Conceptual Framework
Overview: Assesses the degree to which the program is grounded in the research literature.
Elements:
  • Prior research
  • Theoretical base
  • Program description
Part 2 of the Scoring Instrument: Quality, Outcomes, and Fidelity is completed for each evaluation study that is included as part of the evidence base (up to three studies). It includes the research design quality, outcome evidence, and program fidelity.

Study Quality, Outcomes, and Fidelity

Dimension: Design Quality
Overview: Assesses the quality of the research design. The Study Reviewers are also required to note specific information, such as threats to validity.
Elements:
  • Type of research design
  • Sample size
  • Statistical adjustment (if applicable)
  • Instrumentation
  • Internal validity
  • Follow-up period
  • Displacement/diffusion (if applicable)

Dimension: Outcome Evidence
Overview: Assesses the quality of the results. (Note: Outcomes are considered and rated separately within this dimension because programs may target multiple outcomes. In addition, the assessment focuses on the programs’ primary, intended outcomes.)
Elements:
  • Substantive program effects
  • Behavior change
  • Outcomes

Dimension: Program Fidelity
Overview: Assesses the degree to which the program is delivered as designed and intended.
Elements:
  • Documentation
  • Adherence

Study Reviewers assign a numerical value to each element in the scoring instrument. Each element includes a definition and other guidance that Reviewers consider when rating it. In the program review information, the Reviewers also note any other information that should be highlighted as being of particular importance. See the Scoring Instrument for guidance on each element.

The Study Reviewer is responsible for making a reasonable determination (i.e., supported or justified by fact or circumstance) as to the strength of the conceptual framework, research design, outcome evidence, and program fidelity based on the provided documentation and his/her specialized knowledge with regard to program evaluation.

Reviewer Confidence: As a final step on the scoring instrument, Study Reviewers provide an assessment as to their overall confidence in the study design. If both Study Reviewers agree, and the Lead Researcher concurs, that there is a fundamental flaw in the study design (not captured in the Design Quality dimension) that raises serious concerns about the study’s results, the study is removed from the evidence base and not factored into the evidence rating. This final determination serves as an additional safeguard to ensure that only the most rigorous studies comprise the evidence base. The study citation will be listed among the program’s additional references.
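The Reviewer Confidence rule amounts to a filter over the evidence base. The sketch below illustrates one reading of it, with hypothetical names throughout (`ReviewedStudy`, `reviewer_flags`, `lead_concurs`): a study is removed only when every Study Reviewer flags a fundamental flaw and the Lead Researcher concurs, and a removed study is moved to the additional references.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReviewedStudy:
    """Hypothetical record of one scored study (names are illustrative)."""
    citation: str
    reviewer_flags: List[bool]   # one entry per Study Reviewer; True = fundamental flaw
    lead_concurs: bool           # Lead Researcher agrees the flaw is fundamental


def apply_confidence_check(
    studies: List[ReviewedStudy],
) -> Tuple[List[str], List[str]]:
    """Split citations into (evidence_base, additional_references)."""
    evidence_base: List[str] = []
    additional_references: List[str] = []
    for study in studies:
        # Removal needs at least two reviewers, all of them flagging the flaw,
        # plus the Lead Researcher's concurrence.
        removed = (
            len(study.reviewer_flags) >= 2
            and all(study.reviewer_flags)
            and study.lead_concurs
        )
        if removed:
            additional_references.append(study.citation)
        else:
            evidence_base.append(study.citation)
    return evidence_base, additional_references
```

A flag from a single reviewer, or flags from both reviewers without the Lead Researcher's concurrence, leaves the study in the evidence base.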

7.  Study classification

The score for each of the four dimensions (Conceptual Framework, Design Quality, Outcome Evidence, and Program Fidelity) is calculated separately and used to assess each study. Based on the scores, the study is classified as one of the following:

  • Class 1 Studies are very rigorous and well-designed and find significant, positive effects on justice-related outcomes.
  • Class 2 Studies are well-designed, but slightly less rigorous and/or there may be limitations in their design. They find significant, positive effects on justice-related outcomes.
  • Class 3 Studies are very rigorous and well-designed and find significant, harmful effects on justice-related outcomes.
  • Class 4 Studies are very rigorous and well-designed and find no significant effects on justice-related outcomes.
  • Class 5 Studies do not provide enough information or have significant limitations in study design such that it is not possible to establish a causal relationship to the justice-related outcomes.

Discrepancies among Reviewers: In the event that there is a classification discrepancy between the Study Reviewers, the Lead Researcher will work to achieve a consensus classification. If necessary, the Lead Researcher will also review the study and make a final determination on the classification.

For more information about scoring and classifying studies, see the Scoring Instrument.

8.  Program evidence rating

To reach an evidence rating for each program, the study-level information is aggregated.

All evidence ratings based on 1-3 studies are classified as follows:

Study classifications:

  • Class 1: Strong Evidence of Positive Effect
  • Class 2: Some Evidence of Positive Effect
  • Class 3: Strong Evidence of Negative Effect
  • Class 4: Strong Evidence of Null Effect
  • Class 5: Insufficient Information

Evidence ratings* and the study classifications each requires:

Effective: Programs have strong evidence to indicate they achieve their intended outcomes when implemented with fidelity.
  • Class 1: Must have at least 1 study
  • Class 2: May have up to 2 studies
  • Class 3: Must have 0 studies
  • Class 4: May have up to 1 study
  • Class 5: Studies do not determine the evidence rating

Promising: Programs have some evidence to indicate they achieve their intended outcomes.
  • Class 1: Must have 0 studies
  • Class 2: Must have at least 1 study
  • Class 3: Must have 0 studies
  • Class 4: May have up to 1 study
  • Class 5: Studies do not determine the evidence rating

No Effects: Programs have strong evidence indicating that they had no effects or had harmful effects when implemented with fidelity.
  • Class 1: Must have 0 studies
  • Class 2: Must have 0 studies
  • Class 3 or Class 4: Must have at least 1 study in either class
  • Class 5: Studies do not determine the evidence rating

*A single study icon is used to identify programs that have been evaluated with only one study.

A multiple studies icon is used to represent a greater extent of evidence supporting the evidence rating. The icon depicts programs that have more than one study in the evidence base demonstrating effects in a consistent direction.

Inconsistent evidence. In some cases, the evidence for a program may be inconsistent, for example, if there is one study indicating a statistically significant positive effect (i.e., Class 1 or Class 2); one study indicating a statistically significant null effect (Class 4); and no third study is available for consideration. In such cases, the Lead Researcher will also review both studies and make a final determination on whether a final evidence rating can be assigned.

Insufficient evidence. The site periodically updates a static list of programs that have been reviewed by Study Reviewers but not assigned an evidence rating due to lack of evidence. A program is placed on the insufficient evidence list if the study (or studies) reviewed received only Class 5 study ratings, indicating that there were significant limitations in the study design such that it was not possible to establish a causal relationship to the program’s justice-related outcomes (as outlined in the above program evidence rating chart). See the List of Programs with Insufficient Evidence.
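The aggregation rules in this step can be sketched as a small function that maps the classes of the one to three studies in the evidence base to a rating. This is an illustrative reading, not the official procedure. The function name and the returned labels are hypothetical. It assumes that "no third study is available" means the evidence base holds exactly two studies, and it returns an "Unresolved" label for combinations the rating chart does not address (for example, one Class 1 study alongside one Class 3 study).

```python
from collections import Counter
from typing import Sequence


def rate_program(study_classes: Sequence[int]) -> str:
    """Aggregate study classes (1-5) for one program into an evidence rating."""
    if not 1 <= len(study_classes) <= 3:
        raise ValueError("the evidence base holds one to three studies")
    if any(c not in (1, 2, 3, 4, 5) for c in study_classes):
        raise ValueError("study classes must be integers from 1 to 5")

    counts = Counter(study_classes)
    c1, c2, c3, c4 = (counts[k] for k in (1, 2, 3, 4))

    # Class 5 studies never determine a rating; if nothing else was
    # reviewed, the program goes on the insufficient evidence list.
    if c1 + c2 + c3 + c4 == 0:
        return "Insufficient Evidence"

    # One positive study, one null study, and no third study
    # (assumption: exactly two studies in the evidence base).
    if len(study_classes) == 2 and c1 + c2 == 1 and c4 == 1:
        return "Inconsistent (Lead Researcher review)"

    if c1 >= 1 and c2 <= 2 and c3 == 0 and c4 <= 1:
        return "Effective"
    if c1 == 0 and c2 >= 1 and c3 == 0 and c4 <= 1:
        return "Promising"
    if c1 == 0 and c2 == 0 and c3 + c4 >= 1:
        return "No Effects"

    # Combination not covered by the rating chart (assumption).
    return "Unresolved (Lead Researcher review)"
```

For example, `rate_program([1, 2, 4])` yields "Effective" under these assumptions, while `rate_program([1, 4])` is flagged for Lead Researcher review as inconsistent.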

Read more about: Researchers and Reviewers or see the Scoring Instrument.

Read more about the connection between this site and the Office of Juvenile Justice and Delinquency Prevention (OJJDP)’s Model Programs Guide of juvenile programs.
