A method for analyzing the differences in the means of two or more groups. Specifically, this procedure partitions the total variation in the dependent variable into two components: between-group variation and within-group variation. It allows researchers to determine if the differences between a control group and a treatment group are attributed to the independent variable or treatment.
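As a rough illustration of the partition described above, the following sketch splits the total sum of squares into between-group and within-group parts for a one-way design and forms the F ratio. The function name `anova_partition` and the outcome scores are invented for this example and are not drawn from any CrimeSolutions.gov study.

```python
from statistics import mean

def anova_partition(groups):
    """Split total variation into between-group and within-group sums of squares."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group: how far each observation sits from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, ss_total, f_stat

# Hypothetical outcome scores, invented for illustration only
control = [4.0, 5.0, 6.0]
treatment = [7.0, 8.0, 9.0]
ss_b, ss_w, ss_t, f_stat = anova_partition([control, treatment])
print(ss_b, ss_w, ss_t, f_stat)
```

The two components add up to the total (here 13.5 + 4.0 = 17.5). A large F ratio suggests that the group means differ by more than the within-group spread alone would produce.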
Occurs when the effects of a program are observed prior to the implementation of the program, generally because the target population believes the program has already started. This element is reviewed along with diffusion and displacement on the CrimeSolutions.gov Scoring Instrument. These elements are typically considered in evaluations of community-level crime prevention efforts. See Program Review and Rating from Start to Finish for more information.
Evidence that documents a relationship between an activity, treatment, or intervention (including technology) and its intended outcomes, including measuring the direction and size of a change, and the extent to which a change may be attributed to the activity or intervention. Causal evidence depends on the use of scientific methods to rule out, to the extent possible, alternative explanations for the documented change. This differs from descriptive evidence.
A statistical test used to compare differences between observed, categorical data and expected data (based on a specific hypothesis) to determine whether any difference that occurred is larger than would be expected by chance.
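A minimal sketch of the test statistic, assuming a simple goodness-of-fit setting with two categories: each squared gap between an observed and an expected count is divided by the expected count, and the results are summed. The counts are hypothetical, and 3.841 is the standard tabled critical value for one degree of freedom at the 0.05 level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, invented for illustration only
observed = [30, 20]
expected = [25, 25]  # what the hypothesis of "no difference" would predict

stat = chi_square_statistic(observed, expected)
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # standard table value, 1 degree of freedom
exceeds_chance = stat > CRITICAL_VALUE_DF1_ALPHA_05
print(stat, exceeds_chance)
```

Here the statistic is 2.0, which falls below the critical value, so a difference of this size could plausibly occur by chance.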
Comparative effectiveness research (CER): An evaluation approach to show the relative strengths and weaknesses of two (or more) programs on the same outcome. This approach generally compares a target program with another program, instead of a true control condition [i.e., treatment as usual (TAU) or no treatment]. The CrimeSolutions.gov review process evaluates the effectiveness of a target program, by comparing a group that receives the program with a group that receives TAU or no treatment. Currently, CER studies are not eligible for review for CrimeSolutions.gov because the comparison group is considered to receive more than TAU.
A group of individuals whose characteristics are similar to those of a treatment group. Comparison group individuals may not receive any services, or they may receive a different set of services, treatment, or activities than the treatment group. In no instance do they receive the same services as the individuals being evaluated (the treatment group). Comparison groups are used in quasi-experimental designs where random assignment is not possible or practical.
A group of individuals whose characteristics should be almost identical to those of the treatment group but do not receive the program services, treatments, or activities being evaluated. In experimental designs, individuals are placed into control groups and treatment groups through random assignment.
A statistical term that measures the degree of the relationship between two variables. A correlation has two components, magnitude and direction. Magnitude is a measure of strength and ranges from 0, no correlation, to 1, perfect correlation. Direction determines whether a correlation is positive or negative. A positive correlation means that the two variables move in the same direction: as one variable, X, increases so does another variable, Y, and as X decreases so does Y. A negative (or inverse) correlation means that as one variable, X, increases the other variable, Y, decreases and vice versa. For example, if variables X and Y have a correlation of 0.7 this means they have a strong, positive relationship. Correlation does not imply a causal relationship between variables.
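One common way to compute such a value is the Pearson correlation coefficient, sketched below; the glossary does not name a specific coefficient, so this choice is an assumption. The sign of the result gives the direction and its absolute value gives the magnitude. The data are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
y_rises_with_x = [2, 4, 6, 8, 10]
y_falls_as_x_rises = [10, 8, 6, 4, 2]

r_same_direction = pearson_r(x, y_rises_with_x)          # close to +1
r_opposite_direction = pearson_r(x, y_falls_as_x_rises)  # close to -1
print(r_same_direction, r_opposite_direction)
```

Neither result says anything about whether X causes Y; it only summarizes how closely the two move together.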
A variable whose outcome is influenced or changed by some other variable, usually the independent variable or the treatment. It is the “effect” or outcome variable in a cause and effect relationship.
Evidence used to characterize individuals, groups, events, processes, trends, or relationships using quantitative statistical methods, correlational methods, or qualitative research methods. This differs from causal evidence.
A standardized, quantitative index representing the magnitude and direction of an empirical relationship. More specifically, the effect size is a value that reflects the magnitude of the treatment effect. An effect size from an outcome evaluation represents the change in an outcome measure from before a program is implemented to the follow-up period. The effect size of the treatment group can be compared to the effect size from the control group to determine if there are any differences, and if so, whether those differences are statistically significant (which allows for greater confidence that the difference was due to the program). See Statistical Significance for more information. The most common types of effect sizes in the criminal justice and delinquency literature are the standardized mean difference effect size; odds ratios and risk ratios; and correlation coefficients.
In program evaluation, the effect size is typically hypothesized a priori to guide decisions about needed sample size and the likelihood of Type I and Type II errors (See Type I Error and Type II Error for more information). In a meta-analysis, the effect sizes from the various evaluation studies are standardized to be in the same form. By representing the findings of each study included in a meta-analysis in the same form, this permits a synthesis of those findings across studies. After evaluation data are analyzed, an actual effect can usually be estimated from the data, and this value is often used as a basis for comparative effectiveness research on alternative interventions.
The magnitude of an effect size is often judged using “rules of thumb” from social science research. For example, standardized mean difference effect sizes (Cohen’s d or Hedges’ g) are judged using the following rules: small=0.20; medium=0.50; large=0.80. These are not hard cut-off points but rather approximations. There are different standards for each type of effect size.
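As an illustration, a standardized mean difference such as Cohen’s d can be sketched as the difference in group means divided by a pooled standard deviation. The pooled-variance form below is the usual textbook version and is an assumption here, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)

# Hypothetical outcome scores, invented for illustration only
treatment_scores = [6, 7, 8, 9, 10]
control_scores = [5, 6, 7, 8, 9]

d = cohens_d(treatment_scores, control_scores)
print(round(d, 3))
```

With these numbers d is about 0.63, which under the rules of thumb above falls between a medium and a large effect.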
The strength of the evidence demonstrating that a program achieves its intended outcomes.
Information about a question that is generated through systematic data collection, research, or program evaluation using accepted scientific methods that are documented and replicable. Evidence may be classified as either descriptive or causal.
Refers to one of three designations on CrimeSolutions.gov indicating the extent of the evidence that a program works. The three designations are: "Effective", "Promising", and "No Effects". A single study icon is used to identify programs that have been evaluated with only one study. A multiple studies icon is used to represent a greater extent of evidence supporting the evidence rating. The icon depicts programs that have more than one study in the evidence base demonstrating effects in a consistent direction. For practices, the rating designations take a slightly different meaning. Read more About CrimeSolutions.gov or about Program Review and Rating from Start to Finish or about Practice Review and Rating from Start to Finish.
The National Institute of Justice considers programs to be evidence-based when their effectiveness has been demonstrated by causal evidence obtained through high quality outcome evaluations and when they have been replicated and evaluated in at least three sites.
NIJ defines high quality outcome evaluations as those using rigorous, randomized controlled trials on programs implemented with fidelity.
The degree to which a program’s core services, components, and procedures are implemented as originally designed. Programs replicated with a high degree of fidelity are more likely to achieve consistent results. See Program Review and Rating from Start to Finish for more information.
The length of time that the study period continues after the program ends to determine the program’s sustained or continued effects. This is a dimension in the CrimeSolutions.gov Scoring Instrument.
Research and evaluations that are not controlled by commercial publishers (i.e., not published in a peer-review journal or a book). Sources of grey literature or unpublished studies include dissertations, theses, government reports, technical reports, conference presentations, and other unpublished sources. This is a dimension in the CrimeSolutions.gov Practices Scoring Instrument that assesses the extent to which a meta-analysis includes results from unpublished or “grey” literature sources. A meta-analysis should always attempt to include grey literature due to consistent evidence that the nature and direction of research findings is often related to publication status. See Publication Bias for more information.
Note: If the literature search does not include an effort to locate unpublished studies, or is explicitly restricted to published literature, it is not eligible for inclusion as a practice on CrimeSolutions.gov.
Refers to the variability of the effect sizes from the different evaluation studies included in a meta-analysis (e.g., some evaluations may show strong, significant effects while other evaluations show small or no effects). This is a dimension in the CrimeSolutions.gov Practices Scoring Instrument that rates a meta-analysis on whether the authors were aware of and attentive to heterogeneity (i.e., variability) in the effect sizes from the studies in the meta-analysis. Heterogeneity statistics include tau (τ), tau-squared (τ²), Q, or I-squared (I²).
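Two of these statistics can be sketched with the standard inverse-variance formulas: Q is the weighted sum of squared deviations of each study's effect size from the pooled mean, and I² expresses the share of Q that exceeds its degrees of freedom. The effect sizes and variances below are hypothetical, and the fixed-effect pooled mean is an assumption made to keep the example short.

```python
def heterogeneity(effects, variances):
    """Return the inverse-variance pooled effect, Cochran's Q, and I-squared (percent)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variability beyond what chance alone would give
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i_squared

# Hypothetical study-level effect sizes and their variances, for illustration only
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.04, 0.04]

pooled, q, i_squared = heterogeneity(effects, variances)
print(round(pooled, 3), round(q, 3), round(i_squared, 1))
```

With these inputs the pooled effect is 0.5, Q is 4.5 on 2 degrees of freedom, and I² is roughly 56 percent, meaning that about half of the observed variability would be read as real differences between studies rather than sampling error.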
An event that takes place between the pretest (data collected prior to the treatment beginning) and the posttest (data collected after the treatment ends) that has nothing to do with the treatment but may impact observed outcomes. History is a potential threat to Internal Validity. See Program Review and Rating from Start to Finish for more information.
Refers to implementing a program in the same or similar manner targeting the same or similar population in order to achieve the same results that occurred when a program was originally implemented.
Programs or practices with inconclusive evidence are those that have been reviewed by CrimeSolutions.gov Study Reviewers, but were not assigned an evidence rating due to limitations of the studies included in the programs' evidence base. Programs are placed on the inconclusive evidence list if the study (or studies) reviewed (1) had significant limitations in the study design or (2) lacked sufficient information about program fidelity so that it was not possible to determine if the program was delivered as designed.
Note that these programs and practices were previously referred to as "insufficient evidence."
A variable that changes or influences another variable, usually the dependent variable. This is often the treatment in experimental designs and precedes the outcome variable in time. It is the “cause” in a cause and effect relationship.
The measures used in a study. The instrumentation quality is dependent on the measures’ reliability and validity. Reliability refers to the degree to which a measure is consistent or gives very similar results each time it is used, and validity refers to the degree to which a measure is able to scientifically answer the question it is intended to answer. Instrumentation is a component considered within Internal Validity. See Program Review and Rating from Start to Finish for more information.
The results that a program deliberately sets out to achieve by its design (i.e., the program’s goals). For example, a reentry program’s intended outcomes might be to reduce recidivism among program participants.
An analysis based on the initial treatment intent, not on the treatment eventually administered. For example, if the treatment group has a higher attrition rate than the control or comparison group, and outcomes are compared only for those who completed the treatment, the study results may be biased. An intent-to-treat design ensures that all study participants are followed until the conclusion of the study, irrespective of whether the participant is still receiving or complying with the treatment.
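The difference between the two approaches can be sketched with a small, entirely hypothetical data set in which half of the treatment group drops out. The record layout and the function names are invented for this example.

```python
# Each record: (assigned_group, completed_program, reoffended: 1 = yes, 0 = no)
# Hypothetical participants, invented for illustration only
participants = [
    ("treatment", True, 0),
    ("treatment", True, 0),
    ("treatment", False, 1),
    ("treatment", False, 1),
    ("control", True, 0),
    ("control", True, 1),
    ("control", True, 1),
    ("control", True, 0),
]

def reoffense_rate(rows):
    """Share of the given participants who reoffended."""
    return sum(reoffended for _, _, reoffended in rows) / len(rows)

def by_group(rows, group):
    """Everyone originally assigned to the group, regardless of completion."""
    return [r for r in rows if r[0] == group]

# Intent-to-treat: compare everyone as originally assigned
itt_treatment = reoffense_rate(by_group(participants, "treatment"))
itt_control = reoffense_rate(by_group(participants, "control"))

# Completers-only: drops the treatment participants who left the program
completers = [r for r in by_group(participants, "treatment") if r[1]]
completers_rate = reoffense_rate(completers)

print(itt_treatment, itt_control, completers_rate)
```

In this made-up case the intent-to-treat comparison shows identical reoffense rates (0.5 versus 0.5), while the completers-only rate of 0.0 would make the program look far more effective than the assignment-based comparison supports.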
The degree to which observed changes can be attributed to the program. The validity of a study depends on both the research design and the measurement of the program activities and outcomes. Threats to internal validity may affect the extent to which observed effects may be attributed to a program or intervention. The threats considered on CrimeSolutions.gov’s Scoring Instrument include Attrition, Maturation, Instrumentation, Regression toward the Mean, Selection Bias, Contamination, and History, as well as other factors. See Program Review and Rating from Start to Finish for more information.
On CrimeSolutions.gov’s Scoring Instrument for practices, internal validity is measured by the number of randomized controlled trials used to calculate the mean effect size. Mean effect sizes calculated using only randomized controlled trials are considered to have fewer threats to internal validity than mean effect sizes calculated using only quasi-experimental designs. See Practice Review and Rating from Start to Finish for more information.
Subject matter and research methodology experts who serve a leadership role in selecting the studies that comprise the evidence base for a program or practice and who coordinate the review process for a given topic area on CrimeSolutions.gov. They also ensure that any scoring discrepancies between Study Reviewers are resolved and consensus is achieved prior to a program or practice being assigned a final Evidence Rating. Read more about CrimeSolutions.gov Researchers and Reviewers.
In general terms, meta-analysis is a social science method that allows us to look at effectiveness across numerous evaluations of similar, but not necessarily identical, programs, strategies, or procedures. Meta-analysis examines conceptually similar approaches and answers the question, "on average, how effective are these approaches?" On CrimeSolutions.gov, we use the term "practices" to refer to these categories of similar programs, strategies, or procedures and meta-analyses form the evidence-base for practices.
A more precise definition for meta-analysis is that it is the systematic quantitative analysis of multiple studies that address a set of related research hypotheses in order to draw general conclusions, develop support for hypotheses, and/or produce an estimate of overall program effects.
A statistical method that allows researchers to estimate separately the variance between subjects within the same setting, and the variance between settings. For example, when evaluating a school-based program it is important to know the variation of students within the same school as well as the variation of students between different schools. This ensures that when programs are evaluated, the effects are not attributed to the program when there could be underlying differences between schools or between the students in those schools.
Research strategy and analytic technique that involves the investigation of more than two variables at the same time or within the same statistical analysis. For example, in a multiple regression analysis, the effects of two or more independent variables are assessed in terms of their impact on the dependent variable.
Refers to a research design in which participants are not assigned to treatment and control/comparison groups (randomly or otherwise). Such designs do not allow researchers to establish causal relationships between a program or treatment and its intended outcomes. Non-experimental designs are sometimes used when ethics or circumstances limit the ability to use a different design or because the intent of the research is not to establish a causal relationship. Examples of non-experimental designs include case studies, ethnographic research, or historical analysis.
An agency of the U.S. Department of Justice, the Office of Justice Programs works in partnership with the justice community to identify the most pressing crime-related challenges confronting the justice system and to provide information, training, coordination, and funding of innovative strategies and approaches to address these challenges. The following bureaus and offices are part of the Office of Justice Programs: the Bureau of Justice Assistance (BJA), the Bureau of Justice Statistics (BJS), the National Institute of Justice (NIJ), the Office of Juvenile Justice and Delinquency Prevention (OJJDP), the Office for Victims of Crime (OVC), and the Office of Sex Offender Sentencing, Monitoring, Apprehending, Registering, and Tracking (SMART). Read more About the Office of Justice Programs.
A formal study that seeks to determine if a program is working. An outcome evaluation involves measuring change in the desired outcomes (e.g., changes in behaviors or changes in crime rates) before and after a program is implemented, and determines if those changes can be attributed to the program. Outcome evaluations can use many different research designs: randomized controlled trials, quasi-experimental designs, time-series analysis, simple pre/posttest, etc. For CrimeSolutions.gov, a program must be evaluated with at least one randomized controlled trial or quasi-experimental research design (with a comparison condition) in order for the outcome evaluation to be included in the program’s evidence base. See Program Review and Rating from Start to Finish for more information.
The intended results of a program’s activities or operation and a dimension in the CrimeSolutions.gov Scoring Instrument. Primary outcomes refer to the primary or central intended effects of a program. Within the scope of CrimeSolutions.gov, those primary outcomes must also relate to criminal justice, juvenile justice, or victim services. Secondary outcomes are the ancillary effects of a program. Outcomes are considered and rated separately within this dimension because programs may target multiple outcomes. Examples of outcomes include: reducing drug use, increasing system response to crime victims, and reducing fear of crime.
The intended results of a practice’s activities or operation and a dimension in the CrimeSolutions.gov Scoring Instrument. Tier 1 outcomes refer to the general outcome constructs (e.g., crime/delinquency, drugs and substance abuse, mental/behavioral health, education, victimization, family, etc.). Tier 2 outcomes refer to the specific outcome constructs (e.g., property offenses, sex-related offenses, or violent offenses under crime/delinquency; alcohol, cocaine/crack cocaine, and heroin/opioids under drugs and substance abuse; internalizing behavior, externalizing behavior, and psychological functioning under mental/behavioral health; etc.). On CrimeSolutions.gov’s Scoring Instrument for practices, all effect sizes are coded to a Tier 2 outcome construct.
An unusually high or low effect size. When combining effect sizes from various evaluations, extreme outliers can potentially distort the overall mean effect size. This is a dimension in the CrimeSolutions.gov Practices Scoring Instrument that assesses whether the meta-analysis checks for effect size outliers in the data. Note that this item refers to outlying effect sizes included in the meta-analysis, not the outlying data in the evaluation studies that contributed to the meta-analysis.
Refers to the practical importance of an effect size. For example, an outcome evaluation may show that the treatment group performed statistically significantly better than the control group following participation in a program, but if the effect size is very small and the program costs are very high, the results may not be practically significant. Practical significance can be subjective, and can be assessed by looking at the magnitude of the effect size, the costs and resources of the program, and various other factors.
A general category of programs, strategies, or procedures that share similar characteristics with regard to the issues they address and how they address them. CrimeSolutions.gov uses the term “practice” in a very general way to categorize causal evidence that comes from meta-analyses of multiple program evaluations. Using meta-analysis, it is possible to group program evaluation findings in different ways to provide information about effectiveness at different levels of analysis. Therefore, practices on CrimeSolutions.gov may include the following:
- Program types – A generic category of programs that share similar characteristics with regard to the matters they address and how they do it. For example, family therapy is a program type that could be reported as a practice in CrimeSolutions.gov.
- Program infrastructures – An organizational arrangement or setting within which programs are delivered. For example, boot camps may be characterized as a practice.
- Policies or strategies – Broad approaches to situations or problems that are guided by general principles but are often flexible in how they are carried out. For example, hot spots policing may be characterized as a practice.
- Procedures or techniques – More circumscribed activities that involve a particular way of doing things in relevant situations. These may be elements or specific activities within broader programs or strategies. For example, risk assessment.
On the CrimeSolutions.gov website, a practice is distinguished from a program. Whereas the evidence base for a practice is derived from one or more meta-analyses, the evidence base for a program is derived from one to three individual program evaluations.
To determine if a program works, most of the outcome evidence must indicate effectiveness. This is part of a dimension in the CrimeSolutions.gov Scoring Instrument.
A study that seeks to determine if a program is operating as it was designed to. Process evaluations can be conducted in a number of ways, but may include examination of the service delivery model, the performance goals and measures, interviews with program staff and clients, etc. Process evaluations are not included in a program’s evidence base and therefore do not determine a program’s evidence rating, but may be used as supporting documentation. See Program Review and Rating from Start to Finish for more information.
A planned, coordinated group of activities and processes designed to achieve a specific purpose. A program should have specified procedures (e.g., a defined curriculum, an explicit number of treatment or service hours, and an optimal length of treatment) to ensure the program is implemented with fidelity to its model. It may have, but does not necessarily need, a “brand” name and may be implemented at single or multiple locations.
On the CrimeSolutions.gov website, a program is distinguished from a practice. Whereas the evidence base for a program is derived from one to three individual program evaluations, the evidence base for a practice is derived from one or more meta-analyses.
Broadly refers to the idea that published evaluations are more likely to show large and/or statistically significant program effects, whereas unpublished evaluations are more likely to show null, small, or “negative” (i.e., opposite of what would be predicted) program effects. This is a dimension in the CrimeSolutions.gov Scoring Instrument that rates the extent to which a meta-analysis investigates the potential for publication bias in the sample of included studies.
A research design that resembles an experimental design, but in which participants are not randomly assigned to treatment and control groups. Quasi-experimental designs are generally viewed as weaker than experimental designs because threats to validity cannot be as thoroughly minimized. This reduces the level of confidence that observed effects may be attributed to the program and not other variables.
Refers to an experimental research design in which participants are randomly assigned to a treatment or a control group. Most social scientists consider random assignment to lead to the highest level of confidence that observed effects are the result of the program and not other variables.
To receive a “randomized controlled trial” tag on CrimeSolutions.gov, a program must include in the evidence base at least one study that (1) allocates groups via a valid random assignment procedure; (2) is rated highly for overall design quality by CrimeSolutions.gov Study Reviewers; and (3) has outcome evidence consistent with the overall program rating.
The plan for how a study’s information is gathered that includes identifying the data collection method(s), the instrumentation used, the administration of those instruments, and the methods to organize and analyze the data. The quality of the research design impacts whether a causal relationship between program treatment and outcome may be established. Research designs may be divided into three categories: experimental, quasi-experimental, and non-experimental. See the Program Review and Rating from Start to Finish for more information.
As a final step on the Scoring Instrument, Study Reviewers provide an assessment as to their overall confidence in the study design. If both Study Reviewers agree, and the Lead Researcher concurs, that there is a fundamental flaw in the study design (not captured in the Design Quality dimension) that raises serious concerns about the study’s results, the study is removed from the evidence base and not factored into the Evidence Rating. This final determination serves as an additional safeguard to ensure that only the most rigorous studies comprise the evidence base. The study citation will be listed among the program’s additional references. See Program Review and Rating from Start to Finish for more information.
A sample is the subset of the entire population that is included in a research study. Typically, all else being equal, a larger sample size leads to increased precision in estimates of various properties of the population. The sample size affects the statistical power of a study and the extent to which a study is capable of detecting meaningful program effects. It is included as an element with Statistical Power in the CrimeSolutions.gov Scoring Instrument. See Program Review and Rating from Start to Finish for more information.
The method by which aspects, strengths, and weaknesses of programs and practices are consistently and objectively rated for evidence. For programs, the scoring instrument is a compilation of the dimensions and elements of a research study that are reviewed and assigned a numerical score by the CrimeSolutions.gov Study Reviewers in order to assess the evidence of a program’s effectiveness. The instrument provides a standard method to assess the quality of each program’s evidence base, while also reflecting Study Reviewers’ judgment and expertise. A similar method of scoring the aspects of meta-analyses is used for practices. See Program Review and Rating from Start to Finish or Scoring Instrument for more information.
Occurs when study participants are assigned to groups such that pre-existing differences (unrelated to the program, treatment, or activities) impact differences in observed outcomes. Selection bias threatens the study’s Internal Validity. Even if the subjects are randomly assigned, this threat is of particular concern with studies that have small samples. See Program Review and Rating from Start to Finish for more information.
The ability of a statistical test to detect meaningful program effects. It is a function of several factors, including: 1) the size of the sample; 2) the magnitude of the expected effect; and 3) the type of statistical test used. Statistical power is an element within Sample Size on the CrimeSolutions.gov Scoring Instrument. See Program Review and Rating from Start to Finish for more information.
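The way sample size and expected effect size combine can be sketched with a normal approximation for a two-sided comparison of two equally sized groups. This formula is a common textbook approximation and an assumption for illustration; it is not something the Scoring Instrument prescribes, and it ignores the negligible chance of a significant result in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test of a standardized mean difference."""
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the true effect equals effect_size
    expected_z = effect_size * sqrt(n_per_group / 2)
    return standard_normal.cdf(expected_z - z_critical)

# Hypothetical planning scenario: a medium expected effect (0.5)
power_small_sample = approx_power(0.5, 20)
power_large_sample = approx_power(0.5, 64)
print(round(power_small_sample, 2), round(power_large_sample, 2))
```

Under these assumptions, 64 participants per group gives power of roughly 0.8 for a medium effect, while 20 per group leaves the study more likely than not to miss the same effect.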
In an evaluation, statistical significance refers to the probability that any differences found between the treatment group and control group are not due to chance but are the result of the treatment group’s participation in the program or intervention being studied. For example, if an outcome evaluation finds that after participating in a substance abuse program, the treatment group was statistically significantly less likely to abuse substances compared with the control group, this means that the difference between the two groups is likely due to the program and not due to chance.
In social science, researchers generally use a p-value of 0.05 or less, which means the probability that the difference between the treatment group and control group is due to chance is less than 5 percent. The p=0.05 is the cut-off point that CrimeSolutions.gov Expert Reviewers use to score whether an outcome is statistically significant. If the p-value is larger than 0.05, the outcome is not statistically significant, and the difference between the treatment and control group could be due to chance. See Program Review and Rating from Start to Finish for more information.
Subject matter and research methodology experts who review and assess the individual evaluation studies (for programs) or meta-analyses (for practices) that comprise the evidence base upon which CrimeSolutions.gov ratings are based. All Reviewers must complete training and receive certification prior to becoming a Study Reviewer. Read more about CrimeSolutions.gov Researchers and Reviewers.
A subset of the full study sample other than “a priori” treatment or control groups. Researchers sometimes conduct analyses on subgroups when they want to examine whether the program or practice works better for one type of participant or another (for example, whether a juvenile prevention program works better for boys than girls, or whether a prison-based program works better for high-risk than low-risk offenders).
Analysis that involves dividing the analyzed full study sample into a subset of study participants, most often to make comparisons between them. While subgroup analyses can provide valuable information, they are most often observational, or correlational, analyses, as no proper comparison/control groups are included in these analyses. Under the CrimeSolutions.gov review process, only the full study sample is scored (even if the study authors state clearly an “a priori” theoretical rationale for why the program or practice would be expected to work for a given subgroup and not another).
Analyses of subgroups are described in the Additional Information sections of CrimeSolutions.gov program profiles, but the results do not impact the program’s overall evidence rating. Examples of subgroups that may be reported in program profiles include those categorized by sex (e.g., male versus female); race/ethnicity (e.g., black, Hispanic); age (e.g., older versus younger participants); setting (e.g., urban versus suburban versus rural); risk status (e.g., high-risk versus low-risk); family structure (e.g., single-parent versus two-parent household); delivery setting (e.g., community versus institutional); dosage (e.g., partial versus full implementation); and offense types (e.g., violent versus nonviolent offenders).
A process by which the research evidence from multiple studies on a particular topic is reviewed and assessed using systematic methods to reduce bias in selection and inclusion of studies. A systematic review is generally viewed as more thorough than a non-systematic literature review, but does not necessarily involve the quantitative statistical techniques of a meta-analysis.
An analytic technique that uses a sequence of data points, measured typically at successive, uniform time intervals, to identify trends and other characteristics of the data. For example, a time series analysis may be used to study a city’s crime rate over time and predict future crime trends.
The subjects or program participants of the set of services, treatment, or activities being studied or tested.
The probability of a Type I error, usually signified as “alpha,” is often used to indicate the chance of rejecting a null hypothesis that is actually true (e.g., concluding that a program works when in fact it does not, also called a false positive).
The probability of a Type II error, usually signified as “beta,” is often used to indicate the chance that an actual effect goes undetected (e.g., concluding that a program doesn’t work when in fact it does, also called a false negative).
A statistical measure of how far a set of data points are dispersed from the mean or average for a population or a sample. It is the average squared deviation of outcomes from the mean of outcomes for a group. It is used as a step in determining the effect of an intervention or treatment on a population.
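A short sketch of the calculation, using invented data: the variance is the mean of the squared deviations from the group mean. Python's standard library distinguishes the population form (divide by n) from the sample form (divide by n - 1).

```python
from statistics import mean, pvariance, variance

# Hypothetical outcome values, invented for illustration only
outcomes = [2, 4, 4, 4, 5, 5, 7, 9]

group_mean = mean(outcomes)
squared_deviations = [(x - group_mean) ** 2 for x in outcomes]

# Population variance: average of the squared deviations
population_variance = sum(squared_deviations) / len(outcomes)
# Sample variance: divides by n - 1 to correct for estimating the mean from the sample
sample_variance = sum(squared_deviations) / (len(outcomes) - 1)

print(population_variance, round(sample_variance, 3))
```

For these values the mean is 5 and the population variance is 4.0; the hand calculations should agree with `statistics.pvariance` and `statistics.variance`.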