Project Activities
The research team will review items from the 5th grade Massachusetts Comprehensive Assessment System (MCAS) to study linguistic aspects of multiple-choice science items that may impede the valid measurement of science knowledge among English language learners (ELLs). The team will create and test modified items that avoid linguistic structures that may interfere with ELLs' ability to complete an assessment item accurately, such as lexical complexity, syntactic complexity, discourse complexity, unfamiliar context, and atypical perspective. In addition, the team will examine other assessments for the prevalence of the identified linguistic structures. Finally, the team intends to create guidance for school and district managers, as well as teachers, on how to avoid using assessment items with linguistic structures that may result in invalid measures of science knowledge among ELLs.
Structured Abstract
Setting
This study will take place in Massachusetts, New Jersey, and Utah.
Sample
Participants will include approximately 75,000 fifth graders in Massachusetts in 2009. The item tryout study will include a total of 1,080 fifth-grade students from five urban school districts in Massachusetts. Studies to confirm the identified linguistic features will include a large sample of students from across the United States, as well as all fourth-grade students in New Jersey and all fifth-grade students in Utah.
The research team will develop a coding system for characterizing the linguistic complexity of multiple-choice science items and identify linguistic features associated with differential item functioning (DIF) for ELLs. The validity of these features as sources of difficulty for ELLs will be demonstrated by administering assessment items modified to remove the identified features and comparing item difficulty for ELL and non-ELL students.
Research design and methods
Researchers will begin with a literature review to identify linguistic features of items that similar studies have associated with invalid measurement for ELLs; examples include lexical complexity, syntactic complexity, discourse complexity, unfamiliar context, and atypical perspective. Three coders will independently review 175 multiple-choice items from the 5th grade MCAS administered between 2004 and 2010 for the presence of these features. The relationship between the identified features and differential item functioning will be studied by comparing the performance of ELLs and non-ELLs who took the MCAS in 2009. The research team will then conduct an empirical test of the relationship between item modification and ELL status by creating three test forms containing both modified and unmodified items and administering them to 180 ELLs and 180 non-ELLs. In addition, researchers will investigate the frequency of the identified item features, and their relationship to ELL status, on the MCAS science tests in 8th and 10th grade, on elementary-level science assessments in New Jersey and Utah, and on the 4th grade NAEP in science.
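The abstract does not describe how consistency among the three independent coders will be evaluated. Purely as an illustrative sketch, with simulated data and hypothetical coder names, the snippet below shows one common way such agreement could be quantified, using pairwise Cohen's kappa on a single binary feature code.

```python
# Illustrative only: gauging agreement among three independent coders on a binary
# feature code (e.g., "syntactic complexity present") across a set of items.
# The project abstract does not specify this procedure; pairwise Cohen's kappa is
# simply one standard way to quantify inter-coder agreement.
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(1)

n_items = 175  # number of 5th grade MCAS items reviewed in the study
base = rng.integers(0, 2, size=n_items)  # simulated "true" presence of the feature

# Simulate three coders who each agree with the underlying code about 90% of the time.
codes = {
    f"coder_{i + 1}": np.where(rng.random(n_items) < 0.9, base,
                               rng.integers(0, 2, size=n_items))
    for i in range(3)
}

# Pairwise Cohen's kappa between each pair of coders.
for a, b in combinations(codes, 2):
    kappa = cohen_kappa_score(codes[a], codes[b])
    print(f"{a} vs {b}: kappa = {kappa:.2f}")
```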
Control condition
There is no control condition.
Key measures
The key measures used in this project are the MCAS science assessments in 5th, 8th, and 10th grade; the 4th grade NAEP science assessment; the 4th grade New Jersey Assessment of Skills and Knowledge; and the 5th grade Utah Core Criterion-Referenced Test.
Data analytic strategy
A total feature score will be computed for each item by placing the individual feature codings on a common scale and then combining them into a single score. The test of the identified linguistic features will use logistic regression modeling to estimate differential item functioning (DIF); items for which the coefficient predicting item passage is negative and statistically significant will be considered to exhibit DIF. Analyses will determine whether there are statistically significant relationships between DIF level and total feature scores, between DIF level and individual feature scores, and between DIF level and combinations of feature scores. Based on these findings, the initial feature hypotheses will be revised into a refined set of hypotheses about features that act as sources of construct-irrelevant difficulty for ELLs. The validity of the identified linguistic features will be tested using analysis of variance to separate the effects of modified versus unmodified items for ELLs and non-ELLs. To explore the effects of individual item modifications, researchers will conduct independent-samples t-tests comparing overall scores on the original and modified forms of each item, and a two-way analysis of variance to measure the effects of modifications to individual items across student groups. Similar analytic approaches will be used with the Massachusetts standardized science tests in 8th and 10th grade, the elementary-level state science tests in New Jersey and Utah, and NAEP science data.
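The project's actual analysis code is not part of this abstract. As a rough sketch only, the example below uses simulated data and the common logistic-regression formulation of DIF (item response regressed on total score, group membership, and their interaction) to show how an item with a negative, statistically significant group coefficient might be flagged; all variable names and cutoffs are illustrative assumptions.

```python
# Hypothetical sketch of a logistic-regression DIF screen (not the project's actual code).
# Assumes the common formulation P(correct) ~ total score + group + score*group,
# where a negative, significant group coefficient flags the item as harder for ELLs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data: ell = 1 for ELLs, 0 for non-ELLs; total_score is the matching criterion.
n = 2000
ell = rng.integers(0, 2, size=n)
total_score = rng.normal(0, 1, size=n)

# Simulate an item that is harder for ELLs at the same total score (uniform DIF).
logit = 0.2 + 1.0 * total_score - 0.6 * ell
p_correct = 1 / (1 + np.exp(-logit))
correct = rng.binomial(1, p_correct)

# Fit the DIF model: item response regressed on score, group, and their interaction.
X = sm.add_constant(np.column_stack([total_score, ell, total_score * ell]))
fit = sm.Logit(correct, X).fit(disp=False)

# Columns of X: constant, total_score, ell, interaction -> index 2 is the ELL coefficient.
coef_ell, p_ell = fit.params[2], fit.pvalues[2]
flags_dif = (coef_ell < 0) and (p_ell < 0.05)
print(f"ELL coefficient = {coef_ell:.3f}, p = {p_ell:.4f}, flagged for DIF: {flags_dif}")
```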
Products and publications
Products: Published papers and reports on aspects of science assessment items that result in invalid measurement for ELLs, as well as guidelines for coding linguistic features of science assessment items. Two handbooks will also be developed: one for state and district personnel that describes linguistic aspects of science assessment items that should be avoided, and one for teachers that explains how to help ELLs cope with poorly written assessment items.
Journal article, monograph, or newsletter
Kachchaf, R., Noble, T., Rosebery, A., O'Connor, C., Warren, B., & Wang, Y. (2016). A Closer Look at Linguistic Complexity: Pinpointing Individual Linguistic Features of Science Multiple-Choice Items Associated with English Language Learner Performance. Bilingual Research Journal, 39(2): 152-166.
Noble, T., Rosebery, A., Suarez, C., Warren, B., & O'Connor, M.C. (2014). Science Assessments and English Language Learners: Validity Evidence Based on Response Processes. Applied Measurement in Education, 27(4): 248-260.
Noble, T., Suarez, C., Rosebery, A., O'Connor, M. C., Warren, B., & Hudicourt-Barnes, J. (2012). "I Never Thought of It As Freezing": How Students Answer Questions on Large-Scale Science Tests and What They Know About Science. Journal of Research in Science Teaching, 49(6): 778-803.
Nongovernment report, issue brief, or practice guide
Noble, T., Rosebery, A., Kachchaf, R., & Suarez, C. (2015). Lessons Learned and Implications for Practice from the English Learners and Science Tests Project. TERC.
Noble, T., Rosebery, A., Kachchaf, R., & Suarez, C. (2015). A Handbook for Improving the Validity of Multiple Choice STE Items for English Learners.