- Abedi, J. 2004. The No Child Left Behind Act and English Language Learners: Assessment and Accountability Issues
Educational Researcher 33, 1, 4 - 14.
- Abedi, J., Hofstetter, C. H., and Lord, C. (2004). Assessment Accommodations for English Language Learners: Implications for Policy-Based Empirical Research
Review of Educational Research 74, 1, 1 - 28.
- American Council on the Teaching of Foreign Languages 2012. ACTFL Proficiency Guidelines. Alexandria, VA: ACTFL.
- Agard, F. B. and Dunkel, H. B. (1948). An Investigation into Second Language Teaching. Boston: Ginn & Company.
- Alderson, J. C. 2009. Air safety, language assessment policy, and policy implementation: The case of aviation English.
Annual Review of Applied Linguistics, 29, 168 - 187.
- Alderson, J. C. and Banerjee, J. 2001.
Language testing and assessment (Part 1).
Language Teaching, 34, 213 - 236.
- Alderson, J. C. and Banerjee, J. 2002. Language testing and assessment (Part 2).
Language Teaching, 35, 79 - 113.
- Alderson, J. C. and Hughes, A. (1981).
Issues in Language Testing. ELT Documents 111. London: British Council.
- Amrein, A. L., Berliner, D. C. & Rideau, S. 2010. Cheating in the first, second, and third degree: Educators' responses to high-stakes testing.
Education Policy Analysis Archives, 18, 14.
- Amrein-Beardsley, A. L. and Berliner, D. C. 2002. High Stakes Testing, Uncertainty, and Student Learning. Education Policy Analysis Archives, 10, 18.
- Anonymous, 2009. Computer-based and paper-pencil test comparability.
Pearson Education: Test, Measurement and Research Services Bulletin 9
- Assessment Reform Group. (1999). Beyond the Black Box.
- Assessment Reform Group. (2002). Testing, Motivation and Learning. Cambridge: University of Cambridge Faculty of Education.
- Atkinson, T. and Davies, G. 2000.
Computer Aided Assessment and Language Learning. ICLT4LT.
- Atkinson, R. C. and Geiser, S. 2010. Reflections on a century of College Admissions Tests. Educational Researcher 38, 9, 665 - 667.
- Au, W. 2007.
High Stakes Testing and Curricular Control.
Educational Researcher, 36, 5.
- Campbell, D. T. and Fiske, D. W. 1959. Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix
Psychological Bulletin 56, 2, 81 - 105.
- Camilli, G. 1996. Standard Errors in Educational Assessment. Education Policy Analysis Archives 4, 4.
- Canagarajah, S. 2006.
Changing Communicative Needs, Revised Assessment Objectives: Testing English as an International Language
Language Assessment Quarterly, 3, 3, 229 - 242.
- Canale, M. and Swain, M. 1980.
Theoretical Bases of Communicative Approaches to Second Language Teaching and Testing. Applied Linguistics 1, 1, 1 - 47.
- Carr, N. T. 2014.
Automated Scoring of Written Responses. In Kunnan, A. J. (Ed.) The Companion to Language Assessment (pp. 1063 - 1078). London: John Wiley and Sons.
- Carrell, P. L. 2007.
Notetaking strategies and their relationship to performance on listening comprehension and communicative assessment tasks.
TOEFL Monograph No. MS-35. Princeton, NJ: Educational Testing Service.
- Carrell, P. L. and Dunkel, P. A. 2004.
The effects of notetaking on listening comprehension.
Applied Language Learning 14, 1, 83 - 105.
- Carrell, P. L., Dunkel, P. A. and Mollaun, P. 2002.
The effects of notetaking, lecture length, and topic on the listening component of TOEFL 2000.
TOEFL Monograph No. MS-23. Princeton, NJ: Educational Testing Service.
- Celik, M. 1999.
Testing Some Suprasegmental Features of English Speech. The Internet TESL Journal, 5, 8.
- Chalhoub-Deville, M. 1993.
Performance Assessment and the Components of the Oral Construct Across Tasks and Rater Groups. ERIC.
- Chalhoub-Deville, M. 2001.
Language Testing and Technology: Past and Future
Language Learning and Technology, Vol 5, No. 2, May 2001, 95 - 98.
- Chalhoub-Deville, M. and Fulcher, G. 2003.
The Oral Proficiency Interview: A Research Agenda
Foreign Language Annals, 36, 4, 498 - 506.
- Chapman, M. 2003.
TOEIC: Tried but Undertested.
JALT Testing and Evaluation SIG Newsletter, 7, 3, 2 - 5.
- Cimbricz, S. 2002.
State-mandated testing and teachers' beliefs and practice. Education Policy Analysis Archives 10, 2.
- Clapham, C. 2000.
Assessment and Testing.
Annual Review of Applied Linguistics, 20, 147 - 161.
- Cohen, A. D., & Upton, T. A. 2006.
Strategies in responding to new TOEFL reading tasks.
TOEFL Monograph No. MS-33. Princeton, NJ: Educational Testing Service.
- Committee on Assessment and Evaluation in Education. 2005.
The Knowledge Base for Assessment and Evaluation in Education.
Israel Academy of Sciences and Humanities; Ministry of Education, Culture and Sport;
Rothschild Foundation (Yad Hanadiv).
- Coniam, D. and Falvey, P. 1999.
Assessor training in a high-stakes test of speaking: The Hong Kong English language benchmarking initiative. Melbourne Papers in Language Testing 8, 2.
- Cronbach, L. J. and Meehl, P. E. 1955.
Construct Validity in Psychological Tests
Psychological Bulletin, 52, 281 - 302.
- Cumming, A. 1994.
Does Language Assessment Facilitate Recent Immigrants' Participation in Canadian Society?
TESL Canada Journal, 11, 2, 117 - 133.
- Cumming, A., Grant, L., Mulcahy-Ernt, P., & Powers, D. E. 2005.
A teacher-verification study of speaking and writing prototype tasks for a new TOEFL Test.
TOEFL Monograph No. MS-26. Princeton, NJ: Educational Testing Service.
- Cumming, A., Kantor, R., Baba, K., Eouanzoui, K., Erdosy, U., & James, M. 2006.
Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for the new TOEFL.
TOEFL Monograph No. MS-30. Princeton, NJ: Educational Testing Service.
- Cunningham, C. R. 2002.
The TOEIC test and communicative competence: Do test score gains correlate
with increased competence? A preliminary study. University of Birmingham,
UK: MA dissertation.
- Davidson, F. 1988.
An Exploratory Modeling Survey of the Trait Structures of Some Existing Language Tests. Unpublished PhD Dissertation, University of California at Los Angeles.
- Davidson, F. and Fulcher, G. 2007.
Flexibility is proof of a good 'framework'.
Guardian Weekly, 17th November.
- Davidson, F. and Fulcher, G. (2007).
"The Common European Framework of Reference (CEFR) and the design of language tests: A Matter of Effect."
Language Teaching 40, 3, 231 - 241.
- Davies, A. 1984.
Computer Assisted Language Testing.
CALICO Journal 1, 5.
- Davies, A. 1997.
The education (and training) of language testers. Melbourne Papers in Language Testing 6, 1.
- Davies, A. 2014.
Fifty Years of Language Testing. In Kunnan, A. J. (Ed.) The Companion to Language Assessment (pp. 1 - 21). London: John Wiley and Sons.
- de Jong, H.A.L. 1990.
Standardization in Language Testing. AILA Review 7.
This is the complete text of the edited volume, and contains the following papers:
- Guest-editor's Preface
John H. A. L. DE JONG 3-5
- Language Testing in Research and Education: The Need for Standards
Peter J. M. GROOT 6-23
- The Cambridge-TOEFL Comparability Study: An example of the Cross-National Comparison of Language Tests
Fred DAVIDSON & Lyle BACHMAN 24-45
- The Australian Second Language Proficiency Ratings (ASLPR)
David E. INGRAM 46-61
- Cross-National Standards: A Dutch-Swedish Collaborative Effort in National Standardized Testing
John H.A.L. DE JONG & Mats OSCARSON 62-78
- The Hebrew Speaking Test: An Example of International Cooperation in Test Development and Validation
Elana SHOHAMY & Charles W. STANSFIELD 79-90
- EUROCERT: An International Standard for Certification of Language Proficiency
Alex OLDE KALTER & Paul VOSSEN 91-105
- Response to Alex Olde Kalter and Paul Vossen
John READ 106-107
- Derwing, T. M., Rossiter, M. J., Munro, M. J. and Thomson, R. I. 2004.
Second Language Fluency: Judgments on Different Tasks. Language Learning, 54, 4, 655 - 679.
- Dikli, A. 2006.
An Overview of Automated Scoring of Essays. Journal of Technology, Learning, and Assessment, 5, 1.
- Dooey, P. 1999.
An investigation into the predictive validity of the IELTS Test as an indicator of future academic success
In K. Martin, N. Stanley and N. Davison (Eds), Teaching in the Disciplines/ Learning in Context, 114-118.
Proceedings of the 8th Annual Teaching Learning Forum, The University of Western Australia, Perth.
- Dorans, N. J. 2008.
The practice of comparing scores on different tests. R&D Connections 6. Princeton, NJ: Educational Testing Service.
- Dunkel, P. A. 1997. Computer-Adaptive Testing of Listening Comprehension: A Blueprint for CAT Development
The Language Teacher Online, 21, 10.
- Dunkel, P. A. 1999.
Considerations in developing or using second/foreign language proficiency computer-adaptive tests.
Language Learning & Technology 2, 2, 77 - 93.
- Dunkin, M. J. 1997.
Assessing Teachers' Effectiveness. Issues in Educational Research, 7(1), 1997, 37-51.
- Educational Testing Service.
ETS Fairness Review & ETS Standards for Quality and Fairness.
- Elder, C. (1998).
What counts as bias in language testing? Melbourne Papers in Language Testing 7, 1.
- Emmerich, W., Enright, M. K., Rock, D. A. and Tucker, C. 1991.
The Development, Investigation, and Evaluation of New Item Types for the GRE Analytical Measure.
Educational Testing Service, Princeton NJ, ETS Research Report 91-16.
- Ennis, R. H. 1999.
Test Reliability: A Practical Exemplification of
Ordinary Language Philosophy. Philosophy of Education
- Erdosy, M. U. (2004). Exploring Variability in Judging Writing Ability in a Second Language: A Study of Four Experienced Raters of ESL Compositions. TOEFL Research Report 70. Princeton, NJ: Educational Testing Service
- ETS (2010). Linking TOEFL iBT Scores to IELTS Scores - A Research Report. Princeton, NJ: Educational Testing Service.
Read this in relation to the Score Comparison Tool and the Supplementary Comparison Tables.
- Feast, V. 2002.
The Impact of IELTS scores on performance at university.
International Education Journal, 3, 4, 70 - 85.
- Fives, H. and DiDonato-Barnes, N. 2013. Classroom Test Construction: The Power of a Table of Specifications. Practical Assessment, Research & Evaluation 18(3).
- Frain, T. J. 2009. A Comparative Study of Korean University Students before and after a Criterion Referenced Test. Unpublished MEd. Thesis, University of Southern Queensland, Australia.
- Frary, R. B. 1995. More Multiple Choice Item Writing Do's and Don'ts. ERIC/AE Digest Series EDO-TM-95-4.
- Frary, R. B. 1996. Hints for Designing Effective Questionnaires. Practical Assessment, Research and Evaluation, 5(3).
- Frary, R. B. 2002. A Brief Guide to Questionnaire Development.
- Fox, J. and Courchene, R. (2005). "The Canadian Language Benchmarks (CLB): A Critical Appraisal." Contact 31, 2, 7 - 28.
- Fulcher, G. (1987). "Tests of Oral Performance: the need for data-based criteria." English Language Teaching Journal 41, 4, 287 - 291.
- Fulcher, G. (1993).
The Construction and Validation of Rating Scales for Oral Tests in English as a Foreign Language. Unpublished PhD Dissertation, University of Lancaster, UK.
- Fulcher, G. (1996). "Invalidating validity claims for the ACTFL Oral Rating Scale." System 24, 2, 163 - 172.
- Fulcher, G. (1996). "Testing tasks: issues in task design and the group oral." Language Testing 13, 1, 23 - 51.
- Fulcher, G. (1996). "Does thick description lead to smart tests? A data-based approach to rating scale construction". Language Testing 13, 2, 208 - 238.
- Fulcher, G. (1997). "An English Language Placement Test: Issues in reliability and validity." Language Testing 14, 2, 113 - 139.
- Fulcher, G. (1998). "Widdowson's model of communicative competence
and the testing of reading: An exploratory study." System 26, 3, 281 - 302.
- Fulcher, G. 1999.
Ethics in Language Testing. TAE SIG Newsletter - Special Conference Issue, Volume 1, No. 1.
- Fulcher, G. (1999). "Assessment in English for Academic Purposes: Putting content validity in its place." Applied Linguistics 20, 2, 221 - 236.
- Fulcher, G. (1999). "Computerizing an English language placement test." English Language Teaching Journal 53, 4, 289 - 299.
- Fulcher, G. (2000). "Computers in language testing." In Brett P. and Motteram, G. (Eds.) A special interest in computers: Learning and teaching with information and communications technologies. Manchester: IATEFL publications, 93 - 107. Reprinted with the kind permission of IATEFL.
- Fulcher, G. (2000). "The 'communicative' legacy in language testing." System, 28, 483 - 497.
- Fulcher, G. 2001.
Machines get clever at testing. Education Guardian, 17 May.
- Fulcher, G. 2003.
Few ills cured by setting scores. Education Guardian, 17 April.
- Fulcher, G. 2003. Interface design in computer-based language testing. Language Testing 20, 4, 384 - 408.
- Fulcher, G. 2004.
Are Europe's tests being built on an 'unsafe' framework? Education Guardian, 18 March.
Read the response from Brian North
- Fulcher, G. (2004). "Deluded by artifices? The Common European Framework and harmonization." Language Assessment Quarterly, 1, 4, 253 - 266.
- Fulcher, G. 2008. "Testing Times Ahead?"
Liaison Magazine, Issue 1: July, 20 - 24.
Published by the UK Subject Centre for Languages, Linguistics and Area Studies, University of Southampton.
- Fulcher, G. 2009. Test use and political philosophy.
Annual Review of Applied Linguistics, 29, 3 - 20.
- Fulcher, G. (2011). Cheating gives lie to our test dependence. Guardian Weekly, 11th October 2011.
- Fulcher, G. 2014.
Philosophy and Language Testing. In Kunnan, A. J. (Ed.) The Companion to Language Assessment (pp. 1431 - 1451). London: John Wiley and Sons.
- Fulcher, G. and Bamford, R. (1996). "I didn't get the grade I need. Where's my solicitor?" System 24, 4, 437 - 448.
- Fulcher, G. and Davidson, F. (2008).
"Tests in Life and Learning: A Deathly Dialogue."
Educational Philosophy and Theory, 40, 3, 407 - 417.
- Fulcher, G. and Davidson, F. (2009). "Test Architecture, Test Retrofit."
Language Testing 26, 1, 123 - 144.
- Fulcher, G. & Marquez Reiter, R. 2003. Task difficulty in speaking tests. Language Testing 20, 3, 321 - 344.
- Gebril, A. and Plakans, L. 2009.
Investigating source use, discourse features, and process in integrated writing tasks. Spaan Fellow Working Papers in Second or Foreign Language Assessment 7, 47 - 84.
- Geisinger, K. F. and Carlson, J. F. 1995.
Testing Students with Disabilities.
- Gibson, E. J., Brewer, P. W., Dholakia, A., Vouk, M. A., and Bitzer, D. L. 1995.
A Comparative Analysis of Web-Based Testing and Evaluation Systems. North Carolina University.
- Gilfert, S. 1996. A Review of TOEIC. The Internet TESL Journal 11, 8.
- Ginther, A. 2001.
Effects of the presence and absence of visuals on performance on TOEFL CBT listening-comprehension stimuli
TOEFL Research Report 66, Princeton, N.J.: Educational Testing Service.
- Glass, G. V. 1978. Standards and criteria. Journal of Educational Measurement 15, 4, 237 - 261.
- Godwin-Jones, B. 2001.
Language Testing Tools and Technology. Language Learning & Technology,
Vol. 5, No. 2, May 2001, 8 - 12.
- Gorsuch, G. J. and Cox, T. 2000.
Something Old, Something New, Something Borrowed, Something....: Piloting a Computer Mediated Version of the Michigan Listening Comprehension Test
TESL-EJ 4, 4.
- Grabe, W. & Jiang, X. 2014.
Assessing Reading. In Kunnan, A. J. (Ed.) The Companion to Language Assessment (pp. 185 - 200). London: John Wiley and Sons.
- Grant, S. G. 2000. Teachers and Tests: Exploring Teachers' Perceptions of Changes in the New York State Testing Program. Education Policy Analysis Archives, 8, 14.
- Gorin, J. S. 2007.
Reconsidering Issues in Validity Theory. Educational Researcher 36, 8, 456 - 462.
- Grabowski, K. C. 2007.
Reconsidering the measurement of pragmatic knowledge using a reciprocal written task format. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 7, 1.
- Gruba, P. A. 1999.
The role of digital video media in second language listening comprehension. University of Melbourne: Unpublished PhD thesis.
- Gu, L., Drake, S., and Wolfe, E. W. 2006.
Differential Item Functioning of GRE Mathematics Items Across Computerized and Paper-and-Pencil Testing Media. Journal of Technology, Learning, and Assessment 5, 4.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 1.
JALT Testing and Evaluation SIG Newsletter, 5, 3, 2 - 5.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 2.
JALT Testing and Evaluation SIG Newsletter, 6, 1, 2 - 5.
- Haji pour Nezhad, G. R. 2002.
Reading complexity judgments, Episode 3.
JALT Testing and Evaluation SIG Newsletter, 6, 2, 2 - 5.
- Haladyna, T. M. and Downing, S. M. (1989).
A Taxonomy of Multiple-Choice Item-Writing Rules. Applied Measurement in Education, 2(1), 37 - 50.
- Hamilton, L. S., Klein, S. P., and Lorie, W. No Date.
Using Web-Based Testing for Large-Scale Assessment. Rand Education.
- Hansen, E. G., Forer, D. C., & Lee, M. J. 2004.
Toward accessible computer-based tests: Prototypes for visual and other disabilities.
TOEFL Research Report RR-78. Princeton, NJ: Educational Testing Service.
- Harding, L. 2008.
Accent and academic listening assessment: A study of test-taker perceptions.
Melbourne Papers in Language Testing 13, 1.
- Harlen, W. H. and Crick, R. D. 2002.
A Systematic Review of the impact of summative assessment and tests on students'
motivation for learning.
London: Institute of Education, Evidence for Policy and Practice Information
and Co-ordinating Centre.
- Hong, W-P. 2008.
Does high-stakes testing increase cultural capital among low-income and racial minority students?
Education Policy Analysis Archives, 16, 6.
- Huitt, B., Hummel, J. and Kaeck, D. 1995. Assessment, Measurement, Evaluation and Research. Valdosta State University.
- Hutchison, D. and Benton, T. 2009.
Parallel Universes and Parallel Measures: Estimating the Reliability of Test Results.
London: OFQUAL and the National Foundation for Educational Research.
- Huhta, A. 2009.
An analysis of the quality of English testing for aviation purposes in Finland. Australian Review of Applied Linguistics 32(3).
- Jacobsen, M., Kremer, R., and Flores, R. 1999.
WebCT in Computer Science. New Currents in Teaching and Learning, 6, 3.
- Jamieson, J., Jones, S., Kirsch, I., Mosenthal, P., Taylor, C. 2000.
TOEFL 2000 Framework: A Working Paper
Educational Testing Service, Princeton NJ.
- Jia, Y., and Zhang, W. 2007.
Evaluating the construct validity of an EFL test for PhD candidates: A quantitative analysis of two versions
Shiken, 11, 1, 2 - 16.
- Joint Committee on Testing Practices. 2004.
Code of Fair Testing Practices in Education.
American Psychological Association.
- Kane, M. 2001.
Current Concerns in Validity Theory.
Journal of Educational Measurement, 38, 4, 319 - 342.
- Kane, M. 2010.
Errors of Measurement, Theory, and Public Policy.
12th Annual William H. Angoff Memorial Lecture. Princeton, NJ: Educational Testing Service.
- Kang, O. 2008.
Ratings of L2 oral performance in English: Relative impact of rater characteristics and acoustic measures of accentedness. Spaan Fellow Working Papers in Second or Foreign Language Assessment 6, 181 - 205.
- Karavas, E., and Delieza, X. 2009.
On-site observation of KPG oral examiners: Implications for oral examiner training and evaluation.
Journal of Applied Language Studies 3, 1, 51 - 77.
- Kehoe, J. 1995.
Basic Item Analysis for Multiple-Choice Tests.
- Kehoe, J. 1995.
Writing Multiple Choice Test Items.
- Kenworthy, R. 2006.
Timed versus At-home Assessment Tests: Does Time Affect the Quality of Second Language Learners' Written Compositions?
TESL-EJ 10, 1.
- Kenyon, D. M. and Malabonga, V. 2001.
Comparing examinee attitudes toward computer-assisted and other oral proficiency assessments.
Language Learning and Technology, Vol 5, No. 2, May 2001, 60 - 83.
- Kim, H. J. and Shin, H. W. 2006.
A reading and writing placement test: Design, evaluation, and analysis. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 6, 2.
- Kirsch, I., Jamieson, J., Taylor, C., and Eignor, D. 1998.
Computer Familiarity Among TOEFL Examinees
TOEFL Research Report 59. Princeton, NJ: Educational Testing Service.
- Kitao, S. K. and Kitao, K. 1996.
Testing Communicative Competence. The Internet TESL Journal, 2, 5.
- Kitao, S. K. and Kitao, K. 1996.
Testing Grammar. The Internet TESL Journal, 2, 6.
- Kluitmann, S. (2008).
Testing English as a Foreign Language. Two EFL-Tests used in Germany. Philologische Fakultät, Albert-Ludwigs-Universität Freiburg.
- Kirkpatrick, R. (2011).
The Negative Backwash of Exam-Oriented Education on Chinese High School Students. Language Testing in Asia 1(3), 55 - 71.
- Kitao, S. K. and Kitao, K. 1996.
Testing Listening. The Internet TESL Journal, 2, 7.
- Knoch, U. 2008.
Collaborating with ESP Stakeholders in Rating Scale Validation: The case of the ICAO Rating Scale.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 7, 21 - 46.
- Knoch, U. 2009.
The assessment of academic style in EAP writing: The case of the rating scale.
Melbourne Papers in Language Testing 13, 1.
- Koizumi, R. 2006.
Relationships Between Productive Vocabulary Knowledge and Speaking Performance of Japanese Learners of English at the Novice Level. Unpublished PhD thesis, University of Tsukuba, Japan.
- Koizumi, R., and Hirai, A. (2012).
Comparing the story retelling speaking test with other speaking tests. JALT Journal 34(1).
- Koretz, D., Russell, M., Shin, C. D., Horn, C. and Shasby, K. 2002. Testing and diversity in postsecondary
education: The case of California Education Policy Analysis Archives, 10, 1.
- Kunnan, A. J. 2000.
Fairness and Justice for All. In A. J. Kunnan (Ed.) Fairness and Validation in Language Assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Studies in Language Testing 9. pp. 1 – 14. Cambridge: Cambridge University Press.
- Kyllonen, P. C. 2005.
The case for noncognitive assessments. R&D Connections 3. Princeton, NJ: Educational Testing Service.
- Laborda, J. G. 2007.
From Fulcher to PLEVALEX: Issues in Interface design, validity and reliability in Internet based Language Testing. CALL-EJ Online 9, 1.
- Laborda, J. G. 2007.
On the Net: Introducing Standardized EFL/ESL Exams. Language Learning & Technology 11, 2, 3 - 9.
- Laborda, J. G. 2012.
Preliminary Findings of the PAULEX Project: A Proposal for the Internet-based Valencian University Entrance Examination. Journal of Language Teaching and Research 3, 2, 250 - 255.
- Lane, S. 1999.
Validity Evidence for Assessments. Reidy Interactive Lecture Series.
- Lazaraton, A. and Wagner, S. (1996).
The Revised TSE test: Discourse Analysis of Native Speaker and Nonnative Speaker Data. Research Report 96-10. Princeton, NJ: Educational Testing Service.
- Lee, Y-W. 2005.
Dependability of Scores for a New ESL Speaking Test: Evaluating Prototype Tasks. TOEFL Monograph Series MS-28. Princeton, NJ: Educational Testing Service.
- Lee, Y.-W., Breland, H., & Muraki, E. 2004.
Comparability of TOEFL CBT writing prompts for different native language groups.
TOEFL Research Report RR-77. Princeton, NJ: Educational Testing Service.
- Lewkowicz, J. A. 2000.
Authenticity in language testing: some outstanding questions. Language Testing 17, 1, 43 - 64.
- Liao, C-W, Hatrak, N. and Yu, G. (2010)
Comparison of Content, Item Statistics and Test-Taker Performance for the Redesigned and Classic TOEIC Listening and Reading Tests. Princeton NJ: Educational Testing Service
- Lightstone, K. and Smith, S. M. 2009.
Student Choice between Computer and Traditional Paper-and-Pencil University Tests: What Predicts Preference and Performance?
Revue internationale des technologies en pedagogie universitaire / International Journal of Technologies in Higher Education, vol. 6, 1, 2009, p. 30-45.
- Lim, G. S. 2009.
Prompt and Rater Effects in Second Language Writing Performance Assessment. University of Michigan: Unpublished PhD Thesis.
- Lim, G. S. 2014.
Assessing English in Europe. In Kunnan, A. J. (Ed.) The Companion to Language Assessment (pp. 1431 - 1451). London: John Wiley and Sons.
- Linn, R. L. 2003. Performance Standards: Utility for Different Uses of Assessments. Education Policy Analysis Archives, 11, 31.
- Linn, R. L. 2010. Comments on Atkinson and Geiser: Considerations for College Admissions Tests. Educational Researcher 38, 9, 677 - 679.
- Linn, R. L., Baker, E. L. and Dunbar, S. B. 1991.
Complex, Performance-Based Assessment: Expectations and Validation Criteria. CSE Technical Report 331.
- Livingston, S. A. 2009.
Constructed-response test questions: Why we use them; how we score them. R&D Connections 11. Princeton, NJ: Educational Testing Service.
- Liu, O. L. 2009.
Measuring learning outcomes in higher education. R&D Connections 10. Princeton, NJ: Educational Testing Service.
- Loevinger, J. 1957.
Objective tests as instruments of psychological theory. Psychological Reports 3, 635 - 694. Southern Universities Press, Monograph Supplement 9.
- Loulou, D. 1995.
Making the A: How To Study for Tests.
ERIC/AE Digest Series EDO-TM-95-10
- Low, G. No date.
Communicative Testing as an Optimistic Activity.
Manuscript from the Language Centre, University of Hong Kong.
- Lynch, B. K. and Davidson, F. 1994.
Criterion-Referenced Language Test Development: Linking Curricula, Teachers and Tests.
TESOL Quarterly 28, 4, 727 - 743.
- Malone, M. 2000.
Simulated Oral Proficiency Interviews: Recent Developments. ERIC Digest.
- May, L. 2006.
An examination of rater orientations on a paired candidate discussion task through stimulated verbal recall. Melbourne Papers in Language Testing 11, 1.
- May, L. 2010.
Developing speaking assessment tasks to reflect the 'social turn' in language testing. University of Sydney Papers in TESOL 5, 1 - 30.
- McAulay, A. 2002.
Peer and Self-evaluation in Spoken Tests: Tools and Methods. The Internet TESL Journal, September.
- McClellan, C. 2010.
Constructed-Response Scoring - Doing it Right. R&D Connections 13. Princeton, NJ: Educational Testing Service.
- McLean, L., Myers, M., Smillie, C., and Vaillancourt, D. 1997. Qualitative Research Methods: An essay review. Education Policy Analysis Archives, 5, 13.
- McNamara, T. (1997). "Problematising content validity: the Occupational English Test (OET) as a measure of medical communication." Melbourne Papers in Language Testing 6(1) 19 - 43.
- Mehrens, A. A. No Date.
Preparing Students to Take Standardized Achievement Tests
- Messerklinger, J. 1997. Evaluating Oral Ability. The Language Teacher Online, 21, 11.
- Messick, S. (1988). Consequences of Test Interpretation and Use: The Fusion of Validity and Values in Psychological Assessment. Research Report 48. Princeton, NJ: Educational Testing Service.
- Mills, A., Swain, L. and Weschler, R. 1996.
The Implementation of a First Year English Placement System. The Internet TESL Journal, 2, 11.
- Milton, J. 2006.
French as a Foreign Language and the Common European Framework of Reference for Languages.
Proceedings from the Crossing Frontiers: Languages and International Dimension conference, Cardiff University, 6 - 7 July.
- Mislevy, R. J. 1992.
Linking Educational Assessments: Concepts, Issues, Methods, and Prospects. Princeton, NJ: Educational Testing Service.
- Mislevy, R. J., Behrens, J. T., Bennett, R. E., Demark, S. F., Frezzo, D. C., Levy, R., Robinson, D. H., Rutstein, D. W., Shute, V. J., Stanley, K. & Fielding, I. W. 2010.
On the roles of external knowledge representations in assessment design. Journal of Technology, Learning, and Assessment 8, 2.
- Mislevy, R. J., Chapelle, C., Chung, Y-R. and Xu, J. 2008.
Options for Adaptivity in Computer-Assisted Language Learning and Assessment. In Chapelle, C. A., Chung, Y-R., and Xu, J. (Eds.) Towards adaptive CALL: Natural language processing for diagnostic language assessment. Ames, IA: Iowa State University, 9 - 24.
- Mislevy, R. J., Steinberg, L. S., and Almond, R. G. (2002).
Design and analysis in task-based language assessment. Language Testing 19, 4, 477 - 496.
- Mislevy, R. J. & Yin, C. 2009.
If Language is a Complex Adaptive System, What is Language Assessment? Paper presented at Language as a Complex Adaptive System conference at the University of Michigan, Ann Arbor, 7 - 9th November, 2008.
- Monaghan, W. 2006.
The facts about subscores. R&D Connections 4. Princeton, NJ: Educational Testing Service.
- Monaghan, W. and Bridgeman, B. 2005.
E-rater as a quality control on human scores. R&D Connections 2. Princeton, NJ: Educational Testing Service.
- Moritoshi, P. 2001.
The Test of English for International Communication (TOEIC): necessity, proficiency levels,
test score utilization and accuracy. University of Birmingham, UK: MA assignment.
- Moritoshi, P. 2002.
Validation of the Test of English Conversation Proficiency.
University of Birmingham: MA dissertation.
- Moodie, I. 2008.
Using Pair Work Exams for Testing in the ESL/EFL Conversation Classes.
Internet TESL Journal XIV, 8.
- Mueller, J. 2003.
Authentic Assessment Toolbox. North Central College, Naperville, IL.
- Mousavi, S. A. 2007.
Computer Package for the Assessment of Oral Proficiency of Adult ESL Learners: Implications for Score Comparability. Griffith University: Unpublished PhD Thesis.
- Newfields, T. 2005.
TOEIC Washback Effects on Teachers: A Pilot Study at One University Faculty.
Toyo University Keizai Ronshu, 31, 1, 83 - 106.
- Newfields, T. 2006.
Teacher development and assessment literacy.
Authentic Communication: Proceedings of the 5th Annual JALT Pan-SIG Conference. Shizuoka, Japan: Tokai University College of Marine Sciences, 48 - 73.
- Nichols, S. L. and Glass, G. V. 2006. High-Stakes Testing and Student Achievement: Does Accountability Pressure Increase Student Learning?
Education Policy Analysis Archives, 14, 1.
- Norris, J. M. 2004.
Validity Evaluation in Foreign Language Assessment. Unpublished PhD thesis: University of Hawaii.
- North, B. 2004.
'Europe's framework promotes language discussion, not directives'. Education Guardian, 15 April.
A reply to Glenn Fulcher
- Norris, J. M. 2001.
Concerns with computerized adaptive oral proficiency assessment.
Language Learning and Technology, Vol 5, No. 2, May 2001, 99 - 105.
- Ohkubo, N. 2009.
Validating the integrated writing task of the TOEFL internet-based test (iBT): Linguistic Analysis of test takers' use of input material.
Melbourne Papers in Language Testing 14, 1.
- O'Loughlin, K. 2006.
Learning about second language assessment: Insights from a postgraduate student on-line subject forum. University of Sydney Papers in TESOL 1, 71 - 85.
- O'Loughlin, K. 2009.
Does it measure up? Benchmarking the written examination of a university English pathway program. Melbourne Papers in Language Testing 14, 1.
- O'Neil, H. F. and Schacter, J. 1997.
Test Specifications for Problem-Solving Assessment.
CRESST/University of California, Los Angeles: CSE Technical Report 463.
- O'Sullivan, B. 2007.
Testing Speaking in Larger Classes
Humanising Language Teaching 9, 4.
- O'Sullivan, B., Weir, C. J., and Saville, N. 2002.
Using observation checklists to validate speaking-test tasks. Language Testing 19, 1, 33 - 56.
- Papageorgiou, S. (2007). Relating the Trinity College London GESE and ISE exams to the Common European Framework of Reference: Piloting of the Council of Europe draft Manual. London: Trinity College London.
- Papajohn, D. 2006.
Standard setting for next generation TOEFL Academic Speaking Test (TAST): Reflections on the ETS Panel of International Teaching Assistant Developers
TESL-EJ 10, 1.
- Park, T. 2004.
An investigation of an ESL placement test of writing using Many-facet Rasch Measurement
Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics 4, 1.
- Peirce, B. N., and Stewart, G. 1997.
The Development of the Canadian Language Benchmarks Assessment. TESL Canada Journal 14, 2, 17 - 31.
- Penfield, R. D. (2010).
Test-based grade retention: Does it stand up to professional standards for fair and appropriate test use? Educational Researcher, 39, 2, 110 - 119.
- Perea, L. (2010).
Benefits of Teachers' Feedback to Reverse-Engineering Item Language Test Specifications from an Existing Item Bank. Texas Papers in Foreign Language Education 15(1), 30 - 54.
- Phakiti, A. 2006.
Modeling cognitive and metacognitive strategies and their relationship to EFL reading test performance. Melbourne Papers in Language Testing 11, 1.
- Poole, G. 2003.
Assessing Japan's Institutional Entrance Requirements. Asian EFL Journal 5, 1.
- Poonpon, K. 2010.
Expanding a Second Language Speaking Rating Scale for Instructional and Assessment Purposes.
Spaan Fellow Working Papers in Second or Foreign Language Assessment 8, 69 - 94.
- Popham, J. W. 2012.
Assessment Bias: How to Banish It. Boston MA: Pearson Education.
- Powers, D. E. 2010.
The case for a comprehensive, four-skills assessment of English-language proficiency. R&D Connections 14. Princeton, NJ: Educational Testing Service.
- Praphal, K. 1990.
The relevance of language testing research in the planning of language programmes.
Thailand: Chulalongkorn University.
- Ranali, J. M. 2002.
Comparing scoring procedures on a cloze test.
University of Birmingham, UK: MA assignment.
- Read, J. 2004.
Second Language Vocabulary Testing: Taking a Broader Perspective. Paper delivered at the International Conference on English Instruction and Assessment.
- Read, J. 2007.
Second language vocabulary assessment: Current practices and new directions. Journal of English Studies, 7, 2, 105 - 125.
- Robb, T. N. & Ercanbrack, J. 1999.
A Study of the Effect of Direct Test Preparation on
the TOEIC Scores of Japanese University Students
TESL-EJ, 3, 4.
- Roever, C. 2001.
Web based language testing.
Language Learning and Technology, Vol 5, No. 2, May 2001, 84 - 94.
- Roever, C. and Powers, D. E. 2005.
Effects of language administration on a self-assessment of language skills.
TOEFL Monograph No. MS-27. Princeton, NJ: Educational Testing Service.
- Rosenfeld, M., Leung, S., & Oltman, P. K. 2001.
The reading, writing, speaking, and listening tasks important for academic success at the undergraduate and graduate levels.
TOEFL Monograph No. MS-21. Princeton, NJ: Educational Testing Service.
- Rosenshine, B. 2003. High Stakes Testing: Another analysis.
Education Policy Analysis Archives
Volume 11 Number 24
- Ross, J. A. 2006.
The Reliability, Validity, and Utility of Self-Assessment.
Practical Assessment, Research and Evaluation
Volume 11 Number 10
- Rudner, L. 1994.
Questions to ask when evaluating tests.
ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 1998.
An Online, Interactive, Computer Adaptive Test Tutorial.
ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 2001.
Reliability. ERIC Clearinghouse on Assessment and Evaluation.
- Rudner, L. 2006.
An evaluation of IntelliMetric Essay Scoring System. Journal of Technology, Learning, and Assessment 4, 4.
- Runnels, J. 2014.
An exploratory reliability and content analysis of the CEFR-Japan's A-Level Can-Do Statements. JALT Journal 36(1).
- Russell, M. 1999. Testing On Computers: A Follow-up Study Comparing Performance On Computer and On Paper. Education Policy Analysis Archives, 7, 20.
- Russell, M. and Haney, W. 1997.
Testing Writing on Computers: An Experiment Comparing Student Performance on Tests Conducted via Computer and via Paper-and-Pencil. Education Policy Analysis Archives, 5, 3.
- Russell, M. and Haney, W. 2000. Bridging the Gap between Testing and Technology in Schools. Education Policy Analysis Archives, 8, 19.
- Sanders, W. and Horn, S. P. 1995. Educational Assessment Reassessed: The Usefulness of Standardized and Alternative Measures of Student
Achievement as Indicators for the Assessment of Educational Outcomes. Education Policy Analysis Archives, 3, 6.
- Sarle, Warren S. 1995. Measurement theory:
Frequently asked questions. From the Disseminations of the International Statistical Applications Institute, 4th edition, Wichita: ACG Press, 61-66.
Also available at: ftp://ftp.sas.com/pub/neural/measurement.html
- Sasaki, M., and Hirose, K. 1996.
Explanatory Variables for EFL Students' Expository Writing. Language Learning 46, 1, 137 - 174.
- Sawaki, Y. 2001.
Comparability of Conventional and Computerized Tests of Reading in a Second Language. Language Learning and Technology
Vol. 5, No. 2, May 2001, pp. 38-59.
- Sawaki, Y. and Nissan, S. 2009.
Criterion-related validity of the TOEFL iBT Listening Section. TOEFL Research Report 09-02. Princeton, NJ: Educational Testing Service.
- Scharber, C., Dexter, A. and Riedel, E. 2008.
Students' Experiences with an Automated Essay Scorer. The Journal of Technology, Learning, and Assessment.
- Sheehan, K. M., Kostin, I., Futagi, Y. & Flor, M. 2010.
Generating Automated Text Complexity Classifications that are Aligned with Targeted Text Complexity Standards. ETS Research Report 10-28. Princeton NJ: Educational Testing Service.
- Shohamy, E. 2007.
Language Tests as Language Policy Tools. Assessment in Education 14, 1, 117 - 130.
- Sireci, S. G. 2007.
On Validity Theory. Educational Researcher 36, 8, 477 - 481.
- Sokolik, M. and Duber, J. 2002.
Grow Your Own: Online Placement Testing. TESL-EJ, 6, 1.
- Spolsky, B. 1968. What does it mean to know a language? Or how do you get someone to perform his competence? Washington DC: ERIC Database.
- Stansfield, C. W. 1992. ACTFL Speaking Proficiency Guidelines. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Stansfield, C. W. 1993. Ethics, Standards, and Professionalism in Language Testing. Issues in Applied Linguistics 4(2), 189 - 206. Reproduced with kind permission of Charles Stansfield.
- Stansfield, C. W. 1996. Content Assessment in the Native Language. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Stansfield, C. W. & Kenyon, D. 1996. Simulated Oral Proficiency Interviews: An Update. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Stricker, L. J. 2002.
The Performance of Native Speakers of English and ESL Speakers on the Computer-Based TOEFL
and the GRE General Test.
Princeton NJ: Educational Testing Service, TOEFL Research Report 69.
- Suvorov, R. and Hegelheimer, V. 2014.
Computer-Assisted Language Testing. In Kunnan, A. J. (Ed.) The Companion to Language Assessment (pp. 594 - 613). London: John Wiley and Sons.
- Swain, M., Huang, L-S, Barkaoui, K., Brooks, L., and Lapkin, S. 2009.
The Speaking Section of the TOEFL iBT: Test-takers' Reported Strategic Behaviors.
Princeton NJ: Educational Testing Service, TOEFL iBT Research Report 09-30.
- Tannenbaum, J. 1996. Practical Ideas On Alternative Assessment For ESL Students. Washington D.C.: ERIC Clearinghouse on Languages and Linguistics.
- Tannenbaum, R. J. and Wylie, E. C. 2008. Linking English language test scores onto the Common European Framework of Reference: An application of standard setting methodology. TOEFL iBT Report iBT-06. Princeton, N.J: Educational Testing Service.
- Tasdemir, M., Tasdemir, A., and Yildirim, K. (2009). Influence of Portfolio Evaluation in Cooperative Learning on Student Success.
Journal of Theory and Practice in Education, 5, 1, 53 - 66.
- Taylor, C. S. and Nolan, S. B. 1996.
What does the psychometrician's classroom look like? Reframing assessment concepts in the context of learning.
Education Policy Analysis Archives, 14, 7.
- Taylor, C., Jamieson, J., Eignor, D., & Kirsch, I. 1998.
The relationship between computer familiarity and performance on computer-based TOEFL test tasks.
TOEFL Research Report RR-61. Princeton, NJ: Educational Testing Service.
- Taylor, L. (2009).
Developing Assessment Literacy. Annual Review of Applied Linguistics 29, 21 - 36.
- Templer, B. 2004. High-Stakes Testing at High Fees: Notes and Queries on the International English Proficiency Assessment Market.
Journal for Critical Education Policy Studies, 2, 1.
- Thompson, G. 2009.
Reevaluating the Test Specifications for an Oral Proficiency Test? The Journal of Kanda University of International Studies 21, 233 - 260.
- Tsang, S. L., Katz, A. and Stack, J. 2008. Achieving Testing for English Language Learners, Ready or Not? Education Policy Analysis Archives, 16, 1.
- Tuzi, F. 1997. Using Microsoft Word to Generate Computerized Tests. Internet TESL Journal, 3, 11.
- Wagner, E. 2002.
Video listening tests: A pilot study.
Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 2, 1.
- Wagner, E. 2007.
Are They Watching? Test-Taker Viewing Behavior During an L2 Video Listening Test.
Language Learning and Technology, 11, 1.
- Walker, M. E. 2007.
Is test score reliability necessary? R&D Connections 5. Princeton, NJ: Educational Testing Service.
- Wall, D., & Horak, T. 2006.
The impact of changes in the TOEFL examination on teaching and learning in central and eastern Europe. Phase I: The baseline study.
TOEFL Monograph No. MS-34. Princeton, NJ: Educational Testing Service.
- Wall, D., & Horak, T. 2008.
The impact of changes in the TOEFL examination on teaching and learning in central and eastern Europe. Phase 2: Coping with change.
TOEFL iBT Report No. iBT-05. Princeton, NJ: Educational Testing Service.
- Wall, D., & Horak, T. 2011.
The Impact of Changes in the TOEFL Exam on Teaching in a Sample of Countries in Europe: Phase 3, The Role of the Coursebook, Phase 4, Describing Change. TOEFL iBT Report No. iBT-17. Princeton, NJ: Educational Testing Service.
- Wang, J., and Brown, M. S. 2007.
Automated Essay Scoring Versus Human Scoring: A Comparative Study. Journal of Technology, Learning, and Assessment, 6, 2.
- Weideman, A. 2006.
Assessing Academic Literacy in a Task-Based Approach.
Language Matters 37, 1, 81 - 101.
- Wendler, C. and Powers, D. 2009.
What does it mean to repurpose a test? R&D Connections 9. Princeton, NJ: Educational Testing Service.
- Wigglesworth, G. & Keegan, P. 2014.
Assessing Australian and New Zealand Indigenous Languages. In Kunnan, A. J. (Ed.) The Companion to Language Assessment (pp. 1949 - 1960). London: John Wiley and Sons.
- Wilson, N. 1998.
Educational Standards and the Problem of Error. Education Policy Analysis Archives, 6, 10.
- Wolfe, E. W., Matthews, S., and Vickers, D. 2010.
The effectiveness and efficiency of distributed online, regional online, and regional face-to-face training for writing assessment raters. Journal of Technology, Learning, and Assessment 10, 1.
- Wolfe, E. W. and Manalo, J. R. 2004.
Composition Medium Comparability in a Direct Assessment of Non-native English Speakers.
Language Learning and Technology, 8, 1, 52 - 65.
- Wright, P. W. D. and Wright, P. D. 2004.
Understanding Tests and Measurements for the Parent and Advocate.
- Wylie, E.
An overview of the International Second Language Proficiency Ratings (ISLPR).
Australia: Griffith University Centre for Applied Linguistics and Languages.
- Yen, D. A. and Kuzma, J. No date.
Higher IELTS score, higher academic performance? The validity of IELTS in predicting the academic performance of Chinese students. Mimeo: University of Worcester.
- Yerkes, R. M. (1921).
Psychological Examining in the United States Army. Memoirs of the National Academy of Sciences, Volume 15.
- Yoff, L. 1997. 'An overview of ACTFL proficiency interviews. A test of speaking ability.' JALT Testing and Evaluation SIG Newsletter,
1, 2, 3 - 9.
- York, T. T., Gibson, C. and Rankin, S. 2015. 'Defining and measuring academic success.' Practical Assessment, Research & Evaluation 20(5).
- Young, J. W. 2008.
Ensuring valid content tests for English language learners. R&D Connections 8. Princeton, NJ: Educational Testing Service.
- Young, V. M. and Kim, D. H. 2010.
Using Assessments for Instructional Improvement: A Literature Review. Education Policy Analysis Archives 18, 19.
- Yu, E. 2006. A Comparative Study of the Effects of a Computerized English Oral Proficiency Test Format and a Conventional Speak Test Format. Unpublished PhD Thesis: Ohio State University.
- Zechner, K. and Xi, X. 2008.
Towards automatic scoring of a test of spoken language with heterogeneous task types. Proceedings of the Third ACL Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics, Columbus Ohio, 98 - 106.
- Zimmerman, D. W. and Zumbo, B. D. 2009.
Hazards in choosing between pooled and separate variances t tests. Psicologica 30, 371 - 390.
- Zumbo, B. D. 2009.
Validity as Contextualized and Pragmatic
Explanation, and Its Implications for Validation Practice. In Robert
W. Lissitz (Ed.) The Concept of Validity: Revisions, New Directions
and Applications, (pp. 65-82). IAP - Information Age Publishing,
Inc.: Charlotte, NC.