ACT Learning and Professional Services

Technical Assessment Tools and Services for K-12 Education, Higher Education, and Workforce

CRASE+®

ACT’s Automated Essay Scoring Engine

Automated essay scoring uses computers to reliably emulate how humans score writing assessment responses.

CRASE+ (the Constructed Response Automated Scoring Engine) is ACT’s automated essay scoring engine. It accepts open-ended text from examinees and evaluates their responses according to predefined rubrics. Originally developed in 2007, CRASE+ has been used to score many writing assessments; ACT acquired the engine in 2017.

CRASE+ uses natural language processing (NLP) and machine learning to understand and model the behavior of human scorers. In large-scale writing assessments, it can help control costs and increase scoring efficiency while maintaining the highest standards of quality.

CRASE+ includes three main components: a preprocessor, a feature extractor, and a machine learning component.

During preprocessing, CRASE+ imports examinee essays and standardizes their format: sentence spacing, word spacing, and paragraph breaks are made consistent across essays. Alternate forms of each essay, such as spell-corrected versions, are also produced at this stage.
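The exact preprocessing steps are internal to CRASE+, but a minimal sketch of this kind of text normalization, with illustrative function names and rules rather than CRASE+’s actual implementation, might look like the following in Python:

    import re

    def normalize_essay(text: str) -> str:
        """Standardize spacing and paragraph breaks in a raw essay (illustrative only)."""
        # Normalize line endings, then collapse runs of blank lines into a
        # single paragraph break.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
        text = re.sub(r"\n\s*\n+", "\n\n", text)
        # Collapse repeated spaces and tabs within lines.
        text = re.sub(r"[ \t]+", " ", text)
        # Alternate forms, such as a spell-corrected version, would be produced
        # here with a spell-checking library (not shown).
        return text.strip()

    raw = "The  quick  brown fox.\r\n\r\n\r\nIt jumped   over   the dog."
    print(normalize_essay(raw))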

The feature extractor calculates various numeric properties of the essays, called features. Features range from the simple, such as the average length (in words) of an essay’s sentences, to the complex, such as measures of lexical diversity. All features analyzed by CRASE+ were selected by English language arts (ELA) experts and aligned to established writing rubrics.
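For illustration, two of the simpler features mentioned above, average sentence length and a basic lexical diversity measure (the type-token ratio), could be computed as in the sketch below; the operational CRASE+ feature set is considerably richer and was specified by ELA experts:

    import re

    def extract_features(essay: str) -> dict:
        """Compute a few simple, illustrative essay features."""
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        words = re.findall(r"[A-Za-z']+", essay.lower())
        # Average sentence length, in words.
        avg_sentence_length = len(words) / len(sentences) if sentences else 0.0
        # Type-token ratio: unique words divided by total words, one simple
        # (length-sensitive) measure of lexical diversity.
        type_token_ratio = len(set(words)) / len(words) if words else 0.0
        return {
            "avg_sentence_length": avg_sentence_length,
            "type_token_ratio": type_token_ratio,
            "word_count": len(words),
        }

    print(extract_features("The quick brown fox jumps. It jumps again. Then the fox rests."))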

The machine learning component uses statistical models to relate an essay’s features to its expected human-assigned score. During training, CRASE+ determines the statistical model that best reproduces the human scores; during operational scoring, an essay’s features are entered into that model to produce the CRASE+ score.

For assessments where the examinee’s score is based on multiple writing traits (domains), such as the ACT writing test, an automated scoring model is produced for each trait.
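ACT has not published the specific model form CRASE+ uses; the sketch below, which uses ridge regression from scikit-learn as a stand-in and entirely made-up data, illustrates the general idea of fitting one model per trait from essay features and human scores:

    import numpy as np
    from sklearn.linear_model import Ridge

    # Hypothetical feature matrix: one row per training essay
    # (e.g., average sentence length, type-token ratio, word count).
    X_train = np.array([
        [14.2, 0.61, 310],
        [ 9.8, 0.48, 180],
        [17.5, 0.70, 420],
        [12.1, 0.55, 260],
    ])

    # Human-assigned scores on two hypothetical writing traits.
    human_scores = {
        "ideas":        np.array([4, 2, 5, 3]),
        "organization": np.array([3, 2, 5, 3]),
    }

    # Fit one model per trait, as described above.
    models = {trait: Ridge(alpha=1.0).fit(X_train, y)
              for trait, y in human_scores.items()}

    # Score a new essay: its features go into each trait model, and the
    # prediction is rounded and clipped to the rubric's score range.
    new_essay = np.array([[13.0, 0.58, 290]])
    for trait, model in models.items():
        predicted = model.predict(new_essay)[0]
        print(trait, int(round(np.clip(predicted, 1, 6))))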

In late 2022, ACT began using CRASE+ to help score international administrations of the ACT writing test. CRASE+ replaced one of the two human scorers traditionally used for this test. Human scoring continues to be used to resolve scoring discrepancies and to conduct quality control, and every essay receives at least one human review.

ACT’s Scoring Operations team has implemented several procedures for quality control to ensure that the CRASE+ scores are as accurate as possible. In rare cases where an essay cannot be reliably assessed by the CRASE+ model, Scoring Operations staff can replace a CRASE+ score with a human score.

In spring 2023, ACT also began using CRASE+ to score online district administrations of the ACT writing test. The use of CRASE+ for district testing follows the rules described above: CRASE+ is paired with at least one human rater to evaluate each essay.

One way to evaluate the accuracy of CRASE+ is to send a sample of essays through the engine and determine the percentage of essays that received the same score from both CRASE+ and the human scorer. This percentage is called the exact agreement rate. ACT requires that the exact agreement rate between any two raters (human or computer) be at least 60%.
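The calculation itself is straightforward; a minimal sketch with made-up scores is shown below:

    def exact_agreement_rate(scores_a, scores_b):
        """Proportion of essays that received identical scores from two raters."""
        if len(scores_a) != len(scores_b):
            raise ValueError("score lists must be the same length")
        matches = sum(a == b for a, b in zip(scores_a, scores_b))
        return matches / len(scores_a)

    human  = [4, 3, 5, 2, 4, 3, 4, 5, 2, 3]
    engine = [4, 3, 4, 2, 4, 3, 4, 5, 3, 3]
    rate = exact_agreement_rate(human, engine)
    print(f"Exact agreement: {rate:.0%}")   # 80% in this made-up example
    assert rate >= 0.60                     # ACT's minimum requirement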

For the CRASE+ models created for the ACT writing test, all human-CRASE+ exact agreement rates exceeded 68%, surpassing ACT’s requirement.

In a separate study, the CRASE+ research team compiled the human-human and human-CRASE+ exact agreement rates for 173 essay scoring models produced for various customers between 2016 and 2019. A significant majority of the human-CRASE+ exact agreement rates were higher than their human-human counterparts, and 85% of the human-CRASE+ rates were within 5.125 percentage points of their human-human counterparts. (The 5.125 percentage point threshold is commonly used in automated scoring research.)
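For context, that comparison amounts to checking, for each model, whether the human-CRASE+ rate exceeds the human-human rate and whether the two rates differ by no more than 5.125 percentage points; the sketch below uses invented rates, not the study data:

    THRESHOLD = 5.125  # percentage points, the threshold cited above

    # Invented (human-human, human-CRASE+) exact agreement rates, in percent.
    model_rates = [(72.0, 74.5), (65.0, 63.1), (70.0, 61.8), (68.0, 69.2)]

    within = sum(abs(hc - hh) <= THRESHOLD for hh, hc in model_rates)
    higher = sum(hc > hh for hh, hc in model_rates)
    print(f"{within} of {len(model_rates)} models within {THRESHOLD} points of human-human agreement")
    print(f"{higher} of {len(model_rates)} models where human-CRASE+ agreement is higher")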

Finally, CRASE+ was included in a widely cited automated scoring study described by Mark D. Shermis and Ben Hamner in the Handbook of Automated Essay Evaluation: Current Applications and New Directions (2013). CRASE+ performed comparably to other major automated scoring engines on a set of eight essay prompts.

Some news articles have highlighted the potential for bias to exist in automated scoring engines. Such bias may inadvertently advantage or disadvantage an examinee because of their gender or ethnicity.

The CRASE+ research team takes these concerns seriously, and our responses with respect to the CRASE+ models developed for the ACT writing test include the following:

  • We reviewed examinee demographic information throughout engine training to ensure that selected subgroups were fairly represented in the model-building process.

  • We computed various agreement metrics for selected subgroups to confirm that scoring accuracy thresholds were met for all studied subgroups (a simplified sketch of this kind of check follows this list).

  • We used a new evaluation technique called differential feature functioning to determine whether the essay features used in scoring were behaving consistently across studied subgroups.

  • We continue to research new methods for identifying potential subgroup differences in automated scoring models.
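The subgroup analyses ACT performs are more extensive than can be shown here; the sketch below illustrates the per-subgroup agreement check referenced in the second bullet, using invented group labels and scores:

    from collections import defaultdict

    def subgroup_agreement(records):
        """Exact agreement rate between human and engine scores, by subgroup."""
        totals, matches = defaultdict(int), defaultdict(int)
        for group, human, engine in records:
            totals[group] += 1
            matches[group] += int(human == engine)
        return {g: matches[g] / totals[g] for g in totals}

    # (subgroup, human score, engine score) -- invented data
    records = [
        ("group_a", 4, 4), ("group_a", 3, 3), ("group_a", 5, 4), ("group_a", 2, 2),
        ("group_b", 3, 3), ("group_b", 4, 4), ("group_b", 2, 3), ("group_b", 5, 5),
    ]

    for group, rate in subgroup_agreement(records).items():
        flag = "OK" if rate >= 0.60 else "REVIEW"   # 60% operational threshold noted earlier
        print(f"{group}: {rate:.0%} {flag}")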

All analysis to date has shown that any observed differences between subgroups are minimal and should not directly advantage or disadvantage the diverse population of examinees taking the ACT writing test.

CRASE+ can automatically identify certain types of unusual responses, such as essays that are blank, essays written in a language other than English, and essays containing certain types of “non-attempts.”
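The detection rules themselves are not published; the sketch below shows, with purely illustrative heuristics (a 20-word cutoff and a tiny English stop-word list), how such flags might be implemented:

    def flag_unusual_response(essay: str) -> list:
        """Return simple flags for blank, off-language, or non-attempt essays.

        These heuristics are illustrative only, not CRASE+'s actual rules.
        """
        words = essay.split()
        if not words:
            return ["blank"]
        flags = []
        if len(words) < 20:
            flags.append("possible_non_attempt")   # e.g., "I don't know."
        common_english = {"the", "a", "and", "of", "to", "in", "is", "it", "that", "was"}
        if not any(w.lower().strip(".,!?") in common_english for w in words):
            flags.append("possible_non_english")
        return flags

    print(flag_unusual_response(""))          # ['blank']
    print(flag_unusual_response("No sé."))    # flagged as possible non-attempt and non-English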

For examinees, the best approach is to write without any concern about AI scoring. After all, in most cases, both a human scorer and CRASE+ will read and evaluate the response—and in all cases, at least one human will read and score the essay. Examinees should always be sure to follow the instructions in the prompt.

Selected References about Automated Scoring and CRASE+

The CRASE+ team at ACT follows best practices in producing automated scoring models. The following references shape those practices.

The Standards for Educational and Psychological Testing, by the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (2014) 

Guidelines for Technology-Based Assessment, by the International Test Commission and the Association of Test Publishers (2022) 

Establishing Standards of Best Practice in Automated Scoring, by Scott Wood, Erin Yao, Lisa Haisfield, and Susan Lottridge (2021) 

Public Perception and Communication around Automated Essay Scoring, from Handbook of Automated Scoring: Theory into Practice, by Scott Wood (2020) 

Best Practices for Constructed-Response Scoring, by ETS (2021) 

A Framework for Evaluation and Use of Automated Scoring, from Educational Measurement: Issues and Practice, by David M. Williamson, Xiaoming Xi, and F. Jay Breyer (2012) 

Selected CRASE+ References 

The following references illustrate uses of the CRASE+ engine on writing assessments. 

CRASE Essay Scoring Model Performance Based on Proof-of-Concept and Operational Engine Trainings, by Scott Wood (2023)

Anchoring Validity Evidence for Automated Essay Scoring, from the Journal of Educational Measurement, by Mark D. Shermis (2022) 

Communicating to the Public About Machine Scoring: What Works, What Doesn’t, by Mark D. Shermis and Susan Lottridge (2019) 

Establishing a Crosswalk between the Common European Framework for Languages (CEFR) and Writing Domains Scored by Automated Essay Scoring, from Applied Measurement in Education, by Mark D. Shermis (2018) 

The Impact of Anonymization for Automated Essay Scoring, from the Journal of Educational Measurement, by Mark D. Shermis, Sue Lottridge, and Elijah Mayfield (2015) 

An Evaluation of Automated Scoring of NAPLAN Persuasive Writing, by the ACARA NASOP Research Team (2015) 

NAPLAN Online Automated Scoring Research Program: Research Report, by Goran Lazendic, Julie-Anne Justus, and Stanley Rabinowitz (2018) 

Using Automated Scoring to Monitor Reader Performance and Detect Reader Drift in Essay Scoring, from Handbook of Automated Essay Evaluation: Current Applications and New Directions, by Susan Lottridge, E. Matthew Schulz, and Howard Mitzel (2013) 

Contrasting State-of-the-Art Automated Scoring of Essays, from Handbook of Automated Essay Evaluation: Current Applications and New Directions, by Mark D. Shermis and Ben Hamner (2013) 

To learn more, contact ACT at crase@act.org.