Ary, D., Jacobs, L. C. & Sorensen, C. (2010). Introduction to research in education (8th Ed.). New York, NY: Wadsworth.
Azmoon.Net. (2014). PhD entrance examination news. Retrieved 2014, October, 15th from www. Phd.Azmoon.Net. www. PhD Test.
Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning: A preliminary theory of action for summative and formative assessment. Measurement: Interdisciplinary Research and Perspectives, 8, 70-91.
Cohen, L., Manion, L., & Morrison, K. (2007). Research methods in education (sixth Ed.) London: Routledge.
Cronbach, L. J. (1980). Validity on parole: How can we go straight? New directions for testing and measurement: Measuring achievement over a decade. In Proceedings of the 1979 ETS Invitational Conference (pp. 99-108). San Francisco, CA: Jossey- Bass.
Douglas, D. (2014). Understanding language testing. Oxon.Hodder Education.
Dörnyei, Z. (2007). Research methods in applied linguistics: quantitative, qualitative and mixed methodologies. Oxford: Oxford University Press.
Drasgow, F. (1987). Study of the measurement bias of two standardized psychological tests. The Journal of Applied Psychology, 72, 19–29.
Farhady, H., Jafarpur, A. J., & Birjandi, P. (2014). Testing Language Skills from Theory to Practice. Tehran: SAMT.
Glaser, B. G., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine.
Green, A. (2007). Washback to the learners: Learners and teacher perspectives on IELTS preparation course expectation and outcomes. Assessing Writing, 11, 113 -134.
Haertel, E. (2013). How is testing supposed to improve schooling? Measurement: Interdisciplinary Research and Perspectives, 11(1-2), 1-18.
Johnson, R.C., & Riazi, M. (2013). Assessing the assessments: Using an argument-based validity framework to assess the validity and use of an English placement system in a foreign language context. Papers in Language Testing and Assessment. 2(1), 31-58.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
Kane, M., Crooks, T., & Cohen, A. (1999). Validating measures of performance. Educational Measurement: issues and practice, 18(2), 5-17.
Kane, M. T. (2006). Validation. Educational Measurement, 4, 17-64.
Kane, M.T. (2011). Validating score interpretations and uses. Language Testing 29(1), 3– 17.
Kane, M.T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement. 50(1), 1–73
Kiany, R., Shayestefar, P., Ghafar Samar, R., Akbari, R. (2013). High-rank stakeholders’ perspectives on high- stakes University entrance examinations reform: priorities and problems. High Educ 65, 325–340
Kline, P. (2000). The handbook of psychological testing (2nd Ed.). London: Routledge.
Kunnan, A. J. (2000). Fairness and justice for all. In A. J. Kunnan, (Ed.). Fairness and Validation in Language Assessment: Selected Papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 1-14). Cambridge: Cambridge University Press.
Kunnan, A. J. (2003). Test fairness. In M. Milanovic & C. Weir (Eds.), Select Papers from the European Year of Languages Conference, Barcelona. Cambridge: Cambridge University Press.
Maxwell, J. A. (1996). Qualitative Research Design: An Interactive Approach. Thousand Oaks, California: Sage Publications.
Monk, T H. (1990). The relationship of chronobiology to sleep schedules and performance demands. Work and Stress, 4(3), 227-236.
NOET. (2013). PhD entrance examination news. Retrieved 2013, December, 20th from http://www.eao.ir/eao/Full Story.aspx? gid=1&id=730
Shulman, H C., Boster, F J., & Carpenter, C J. (2011). Do data collection procedures influence political knowledge test performance? Paper presented at the annual meeting of the Midwestern Political Science Association in Chicago, IL. Oaks, CA: Sage.
Sireci, S.G., & Rios, J.A. (2013). Decisions that make a difference in detecting differential item Functioning. Educational Research and Evaluation, 19, 170–187. DOI: 10.1080/13803611.2013.767621.
Takala, S., & Kaftandjieva, F. (2000). Test fairness: A DIF analysis of an L2 vocabulary test. Language Testing, 17, 323–40.
Teddlie, C. & Tashakkori, A. (2003).Major Issues and Controversies in the Use of Mixed Methods in the Social and Behavioral Sciences. In Tashakkori, A. & Teddlie, C. Handbook of mixed methods in social and behavioral research. Thousand Oaks, CA: Sage.
Teddlie, Ch. & Tashakkori, A. (2006). A general typology of research designs featuring mixed methods. Research in Schools, 13 (1), 12-28.
Weir, C. J. (2005).Language testing and validation. Hampshire: Palgrave McMillan.
Wise, S L., Kingsbury, G., Hauser, C., & Ma, L. (2010). An investigation of the relationship between time of testing and test-taking effort. Paper presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147- 170.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense
Zumbo, B. D. (2008, July). Statistical methods for investigating item bias in self-report measures. Florence Lectures on DIF and Item Bias. Lectures Conducted from Universita degli Studi di Firenze, Florence, Italy.