Using Multiple-Variable Matching to Identify EFL Ecological Sources of Differential Item Functioning

Document Type: Research Paper


University of Isfahan



Context is a vague notion with numerous building blocks, which makes inferences from language test scores quite convoluted. This study used a model of item responding that seeks to theorize the contextual infrastructure of differential item functioning (DIF) research and to help specify the sources of DIF. The study proceeded in two steps. First, gender DIF was identified via logistic regression modeling, and an inventory of the most frequently cited DIF sources was compiled; on this basis, a set of demographic items was appended to a TOEFL reading test administered to intermediate-level Iranian undergraduates. Second, using multiple-variable matching regression (Wu & Ercikan, 2006), a fixed sequence was followed in which each potential DIF source was entered as a covariate, over and above the conditioning variable, to determine whether that ecological variable reduced the magnitude or changed the status of DIF. All significant variables were then analyzed together to identify the final DIF predictors. The same individual and collective analyses were repeated after the test was purified. The results identified three ecological predictors of DIF both before and after purification: income, administration convenience, and socioeconomic status (SES). These final predictors were used to construct an EFL configuration of the ecological model of item responding. The implications for validity arguments are also discussed.
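The matching logic described above can be illustrated with a minimal sketch. It is not the authors' code or data: it assumes a single binary item, a continuous matching score standing in for the total score, a hypothetical 0/1 gender code, and one hypothetical binary covariate standing in for an ecological variable such as income. The helper names (`fit_logistic`, `uniform_dif_chi2`, `simulate`) are illustrative only. The sketch tests uniform DIF with a 1-df likelihood-ratio chi-square for the group term, first conditioning on the matching score alone and then on the matching score plus the covariate; Zumbo's (1999) framework also includes a nonuniform (interaction) term and an R-squared effect size, both omitted here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small k x k system a x = b.
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=50, tol=1e-8):
    # Newton-Raphson maximum likelihood; each row already includes an intercept.
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX (observed information)
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        p = min(max(sigmoid(sum(b * v for b, v in zip(beta, x))), 1e-12), 1 - 1e-12)
        loglik += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return beta, loglik


def uniform_dif_chi2(y, total, group, covariates=()):
    # 1-df likelihood-ratio chi-square for the group term, conditioning on the
    # matching score plus any extra covariates (multiple-variable matching).
    base = [[1.0, t] + [c[i] for c in covariates] for i, t in enumerate(total)]
    full = [r + [float(g)] for r, g in zip(base, group)]
    _, ll_base = fit_logistic(base, y)
    _, ll_full = fit_logistic(full, y)
    return 2.0 * (ll_full - ll_base)


def simulate(n=3000, seed=7):
    # Hypothetical data: the covariate is more common in group 1 and raises the
    # chance of success; group membership has no direct effect on the item.
    rng = random.Random(seed)
    y, total, group, cov = [], [], [], []
    for _ in range(n):
        g = rng.randint(0, 1)
        c = 1.0 if rng.random() < (0.75 if g else 0.25) else 0.0
        theta = rng.gauss(0.0, 1.0)
        p = sigmoid(1.2 * theta + 1.5 * c - 0.75)
        y.append(1 if rng.random() < p else 0)
        total.append(theta)
        group.append(g)
        cov.append(c)
    return y, total, group, cov


y, total, group, cov = simulate()
chi_total_only = uniform_dif_chi2(y, total, group)
chi_with_cov = uniform_dif_chi2(y, total, group, covariates=[cov])
print(f"total only: {chi_total_only:.1f}; total + covariate: {chi_with_cov:.1f}")
```

In this simulated setting, the group term should be flagged (chi-square above 3.84, the .05 critical value for 1 df) when only the matching score is used, and should shrink markedly once the covariate is added, which is the pattern the multiple-variable matching approach reads as the covariate accounting for DIF.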


Abbott, M. (2007). A confirmatory approach to differential item functioning on an ESL reading assessment. Language Testing, 24(1), 7-36.

Ahmadi, A., & Darabi Bazvand, A. (2016). Gender differential item functioning on a national field-specific test: The case of PhD entrance exam of TEFL in Iran. Iranian Journal of Language Teaching Research, 4(1), 63-82.

Ahmadi, A., & Jalili, T. (2014). A confirmatory study of differential item functioning on EFL reading comprehension. Applied Research on English Language, 3(2), 55-68.

Alderson, J. C. (2000). Assessing reading. Cambridge: Cambridge University Press.

Allalouf, A., & Abramzon, A. (2008). Constructing better second language assessments based on differential item functioning analysis. Language Assessment Quarterly, 5(2), 120-141.

Allalouf, A., Hambleton, R. K., & Sireci, S. G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185-198.

Aryadoust, V., Goh, C. C. M., & Kim, L. O. (2011). An investigation of differential item functioning in the MELAB listening test. Language Assessment Quarterly, 8, 361-385.

Banks, K. (2012). Are inferential reading items more susceptible to cultural bias than literal reading items? Applied Measurement in Education, 25, 220-245.

Barati, H., Ketabi, S., & Ahmadi, A. (2006). Differential item functioning in high-stakes tests: The effect of field of study. IJAL, 19(2), 27-42.

Bolt, S., & Thurlow, M. (2007). Item-level effects of the read-aloud accommodation for students with reading disabilities. Assessment for Effective Intervention, 33, 15-28.

Brantmeier, C. (2001). Second language reading research on passage content and gender: Challenges for the intermediate-level curriculum. Foreign Language Annals, 34(4), 325-333.

Brantmeier, C. (2003). Beyond linguistic knowledge: Individual differences in second language reading. Foreign Language Annals, 36(1), 33-43.

Brantmeier, C. (2007). Adult second language reading in the USA: The effects of readers’ gender and test methods. Forum on Public Policy, 14, 1-34.

Chen, Z., & Henning, G. (1985). Linguistic and cultural bias in language proficiency tests. Language Testing, 2(2), 155-163.

Cheong, Y. F. (2006). Analysis of school context effects on differential item functioning using hierarchical generalized linear models. International Journal of Testing, 6(1), 57-79. 

Cheong, Y. F., & Kamata, A. (2013). Centering, scale indeterminacy, and differential item functioning detection in hierarchical generalized linear and generalized linear mixed models. Applied Measurement in Education, 26, 233-252.

Cho, H-J., Lee, J., & Kingston, N. (2012). Examining the effectiveness of test accommodation using DIF and a mixture IRT model. Applied Measurement in Education, 25, 281-304.

Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(2), 133-148.

Cohen, A., & Macaro, E. (Eds.). (2007). Language learner strategies: Thirty years of research and practice. Oxford, UK: Oxford University Press.

Dörnyei, Z. (2003). Questionnaires in second language research. Mahwah, NJ: Lawrence Erlbaum Associates.

Dörnyei, Z., & Skehan, P. (2003). Individual differences in L2 learning. In C. Doughty & M. Long (Eds.), The handbook of second language acquisition (pp. 589-630). Malden, MA: Blackwell Publishing.

Elder, C., McNamara, T. F., & Congdon, P. (2003). Understanding Rasch measurement: Rasch techniques for detecting bias in performance assessment: An example comparing the performance of native and non-native speakers on a test of academic English. Journal of Applied Measurement, 4, 181-197.

Elosua, P., & Lopez-Jauregui, A. (2007). Potential sources of differential item functioning in the adaptation of tests. International Journal of Testing, 7(1), 39-52.

Ercikan, K. (2002). Disentangling sources of differential item functioning in multilanguage assessments. International Journal of Testing, 2(3-4), 199-215.

Ercikan, K., Roth, W., Simon, M., Sandilands, D., & Lyons-Thomas, J. (2014). Inconsistencies in DIF detection for sub-groups in heterogeneous language groups. Applied Measurement in Education, 27, 273-285.

Ferne, T., & Rupp, A. A. (2007). A synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4(2), 113-148.

Finch, W. H., Hernández Finch, M. E., & French, B. F. (2016). Recursive partitioning to identify potential causes of differential item functioning in cross-national data. International Journal of Testing, 16, 21-53.

Gómez-Benito, J., Sireci, S., Padilla, J.-L., Hidalgo, M. D., & Benítez, I. (2018). Differential item functioning: Beyond validity evidence based on internal structure. Psicothema, 30(1), 104-109.

Harding, L. (2011). Accent, listening assessment and the potential for a shared-L1 advantage: A DIF perspective. Language Testing, 29(2), 163-180.

Helwig, R., Rozek-Tedesco, M. A., Tindal, G., Heath, B., & Almond, P. J. (1999). Reading as an access to mathematics problem solving on multiple-choice tests for sixth-grade students. Journal of Educational Research, 93, 113-125.

Hidalgo, M. D., & Gómez-Benito, J. (2010). Differential item functioning. In P. Peterson, E. Baker, & B. McGaw (Eds.), International encyclopedia of education (Vol. 4, pp. 36-44). Oxford: Elsevier.

Hidalgo, M. D., & López-Pina, J. A. (2004). DIF detection and effect size: A comparison between logistic regression and Mantel-Haenszel variation. Educational and Psychological Measurement, 64, 903-915.

Jang, E. E., & Roussos, L. (2009). Integrative analytic approach to detecting and interpreting L2 vocabulary DIF. International Journal of Testing, 9(3), 238-259.

Jodoin, M. G., & Gierl, M. J. (2001). Type-one error and power rates using an effect size measure with the logistic regression for DIF detection. Applied Measurement in Education, 14, 329-349.            

Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18(1), 89-114.

Koo, J., Becker, B. J., & Kim, Y-S. (2014). Examining differential item functioning trends for English language learners in a reading test: A meta-analytical approach. Language Testing, 31(1), 89-109.

Kunnan, A. J. (1990). DIF in native language and gender groups in an ESL placement test. TESOL Quarterly, 24, 741-746.     

Le, L. T. (2009). Investigating gender differential item functioning across countries and test languages for PISA science items. International Journal of Testing, 9(2), 122-133.

Lee, H., & Geisinger, K. F. (2014). The effect of propensity scores on DIF analysis: Inference on the potential cause of DIF. International Journal of Testing, 14, 313-338.

Lynch, B. K., & McNamara, T. F. (1998). Using G-theory and Many-Facet Rasch measurement in the development of performance assessments of the ESL speaking skills of immigrants. Language Testing, 15(2), 158-180.  

Mazor, K. M., Kanjee, A., & Clauser, B. E. (1995). Using logistic regression and the Mantel-Haenszel with multiple ability estimates to detect differential item functioning. Journal of Educational Measurement, 32(2), 131-144.

McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Malden, MA & Oxford: Blackwell.

Mendes-Barnett, S., & Ercikan, K. (2006). Examining sources of gender DIF in mathematics assessments using a confirmatory multidimensional model approach. Applied Measurement in Education, 19(4), 289-304.

Newman, M. L., Groom, C. J., Handelman, L. D., & Pennebaker, J. W. (2008). Gender differences in language use: An analysis of 1400 text samples. Discourse Processes, 45, 211-236.

Oliveri, M. E., Ercikan, K., & Zumbo, B. D. (2014). Effects of population heterogeneity on accuracy of DIF detection. Applied Measurement in Education, 27(4), 286-300.

O’Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255-276). Hillsdale, NJ: Lawrence Erlbaum Associates.

Oshima, T. C., Raju, N. S., Flowers, C. P., & Slinde, J. A. (1998). Differential bundle functioning using the DFIT framework: Procedures for identifying possible sources of differential functioning. Applied Measurement in Education, 11(4), 353-369.

Pae, T. I. (2004b). Gender effect on reading comprehension with Korean EFL learners. System, 32(3), 265-281.

Pae, T. I. (2012). Causes of gender DIF on an EFL language test: A multiple-data analysis over nine years. Language Testing, 29(4), 533-554.

Roever, C. (2005). That’s not fair: Fairness, bias, and differential item functioning in language testing. SLS Brownbag, 1-14.

Roth, W. M., Oliveri, M. E., Sandilands, D. D., & Lyons-Thomas, J. (2013). Investigating linguistic sources of differential item functioning using expert think-aloud protocols in science achievement tests. International Journal of Science Education, 35(4), 546-576.  

Ryan, K., & Bachman, L. F. (1992). Differential item functioning on two tests of EFL proficiency. Language Testing, 9(1), 12-29.

Santelices, M. V., & Wilson, M. (2012). On the relationship between differential item functioning and item difficulty: An issue of methods? Item response theory approach to differential item functioning. Educational and Psychological Measurement, 72(1), 5-36.

Sasaki, M. (1991). A comparison of two methods for detecting differential item functioning in an ESL placement test. Language Testing, 8(2), 95-111.

Shermis, M. D., Mao, L., Mulholland, M., & Kieftenbeld, V. (2017). Use of automated scoring features to generate hypotheses regarding language-based DIF. International Journal of Testing, 17(4), 351-371.

Shimizu, Y., & Zumbo, B. D. (2005). Logistic regression for differential item functioning: A primer. Japan Language Testing Association Journal, 7, 110-124.

Sireci, S. G., & Rios, J. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation, 19(2-3), 170-187.  

Stricker, L. J., & Emmerich, W. (1999). Possible determinants of differential item functioning: Familiarity, interest, and emotional reaction. Journal of Educational Measurement, 36(4), 347-366.

Suh, Y., & Talley, A. E. (2015). An empirical comparison of DDF detection methods for understanding the causes of DIF in multiple-choice items. Applied Measurement in Education, 28, 48-67.  

Takala, S., & Kaftandjieva, F. (2000). Test fairness: A DIF analysis of an L2 vocabulary test. Language Testing, 17(3), 323-340.

Taylor, C. S., & Lee, Y. (2012). Gender DIF in reading and mathematics tests with mixed item formats. Applied Measurement in Education, 25, 246-280.

Tsaousis, I., Sideridis, G., & Al-Saawi, F. (2018). Differential distractor functioning as a method for explaining DIF: The case for a national admissions test in Saudi Arabia. International Journal of Testing, 18(1), 1-26.

Uiterwijk, H., & Vallen, T. (2005). Linguistic sources of item bias for second generation immigrants in Dutch tests. Language Testing, 22(2), 211-234.

Vermunt, J. K. (2008). Latent class and finite mixture models for multilevel data sets. Statistical Methods in Medical Research, 17, 33-51.

Widdowson, H. (2001). Communicative language testing: The art of the possible. In C. Elder, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, T. McNamara, & K. O'Loughlin (Eds.), Experimenting with uncertainty: Essays in honour of Alan Davies (pp. 12-21). Cambridge: Cambridge University Press.

Wu, A. D., & Ercikan, K. (2006). Using multiple-variable matching to identify cultural sources of differential item functioning. International Journal of Testing, 6(3), 287-300.

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Zumbo, B. D. (2007). Three generations of DIF analysis: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233.

Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65-82). Charlotte, NC: IAP-Information Age Publishing, Inc.

Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological/community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5(1), 1-23.

Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12, 136-151.