Language Testing Bytes is a podcast to accompany the SAGE journal Language Testing. Three or four times per year, we will release a podcast in which we discuss topics related to a particular issue of the journal. This may be an interview with a contributor to the journal, or another expert in the field. You can download the podcast from this website, from ltj.sagepub.com, or you can subscribe to the podcast through iTunes.
Coming Soon: In issue 9, scheduled for early spring, we will be joined by Luke Harding of Lancaster University to talk about the role played by speaker accent in listening tests.
Current Journal Content
Validating score interpretations and uses
by Kane, M.
The argument-based approach to validation involves two steps: specification of the proposed interpretations and uses of the test scores as an interpretive argument, and evaluation of the plausibility of that interpretive argument. More ambitious interpretations and uses tend to involve an extended network of inferences and assumptions and require extensive evidence for their support. Simpler interpretations claim less and therefore may not require much evidential support. The evaluation of score-based decisions generally requires an evaluation of the consequences of the decision rule. In any case, the claims being made need to be justified.
Validity argument for language assessment: The framework is simple...
by Chapelle, C. A.
Grounding the argument-based framework for validating score interpretations and uses
by Oller, J. W.
Kane’s argument-based framework is summarized and examined. He implicitly appeals to the backgrounded concepts of fairness and justice. From there it is a short distance to grounding the whole system in the mundane notion of truth. In fact, valid argument systems must depend on representations that are ‘true’ by virtue of agreement with purported facts. As a friendly amendment, therefore, I argue that (provided the ceteris paribus, all else being equal, requirement is met) agreement with known facts in testing, experimental research, and scientific measurement counts for a great deal more than disagreement. It follows by Peircean ‘exact logic’ that higher test scores (if the tests have any validity at all) are invariably more informative (interpretable in general) and thus more useful than lower scores. Why? Because higher scores show more agreement between the test-makers and the higher scoring test-takers about whatever facts (or performances) may be at issue. Exceptions are cases where the ceteris paribus requirement is not met. Necessary (but testable) inferences follow for interpretations and uses of ‘cutscores.’
Kane, validity and soundness
by Davies, A.
Confidence scoring of speaking performance: How does fuzziness become exact?
by Jin, T., Mak, B., Zhou, P.
The fuzziness of assessing second language speaking performance raises two difficulties in scoring: indistinction between adjacent levels and overlap between scales. To address these two problems, this article proposes a new approach, confidence scoring, which yields confidence scores between two adjacent levels on each of three scales. Since confidence scores have to be transformed into an exact score for test interpretation and use, membership functions and rule bases are applied and a confidence scoring algorithm is developed. The paper illustrates confidence scoring with a worked example and then describes a pilot study conducted to try out the confidence scoring design. Initial results reveal, first, that confidence scoring is as feasible as traditional scoring and, second, that it performs better in scoring dependability and in correlations with established benchmarks. The article closes by calling for further studies to build a validity argument and to refine the confidence scoring method described here.
Note-taking quality and performance on an L2 academic listening test
by Song, M.-Y.
This study investigated the relationships among the quality of L2 test takers’ notes evaluated in terms of different levels of information and test takers’ performance on open-ended listening tasks tapping into different comprehension subskills. In addition, this study examined the invariance of the structural relationships among the variables across two different note-taking formats, that is, a blank format and an outline format, by employing a multi-group structural equation modeling (SEM) approach. The results indicated that note quality measures, in particular the number of topical ideas found in the notes and the organization of these notes, may be good indicators of test takers’ second language academic listening proficiency. It was also found that despite the invariance of structural relationships among variables across the two note-taking formats, the associations between the open-ended listening measures and note quality measures were slightly stronger in the outline format than in the blank format. The implications of these results for L2 academic listening assessment are considered.
TOEFL iBT speaking test scores as indicators of oral communicative language proficiency
by Bridgeman, B., Powers, D., Stone, E., Mollaun, P.
Scores assigned by trained raters and by an automated scoring system (SpeechRater™) on the speaking section of the TOEFL iBT™ were validated against a communicative competence criterion. Specifically, a sample of 555 undergraduate students listened to speech samples from 184 examinees who took the Test of English as a Foreign Language Internet-based test (TOEFL iBT). Oral communicative effectiveness was evaluated both by rating scales and by the ability of the undergraduate raters to answer multiple-choice questions that could be answered only if the spoken response was understood. Correlations of these communicative competence indicators from the undergraduate raters with speech scores were substantially higher for the scores provided by the professional TOEFL iBT raters than for the scores provided by SpeechRater. Results suggested that both expert raters and SpeechRater are evaluating aspects of communicative competence, but that SpeechRater fails to measure aspects of the construct that human raters can evaluate.
Re-fitting for a different purpose: A case study of item writer practices in adapting source texts for a test of academic reading
by Green, A., Hawkey, R.
The important yet under-researched role of item writers in the selection and adaptation of texts for high-stakes reading tests is investigated through a case study involving a group of trained item writers working on the International English Language Testing System (IELTS). In the first phase of the study, participants were invited to reflect in writing, and then audio-recorded in a semantic-differential-based joint discussion, on the processes they employed to generate test material. The group were next observed at a simulated item writers' editing meeting to refine their texts and items for an IELTS reading test module. The participants' written descriptions and recorded discussions provided rich data on how source texts were perceived, selected and adapted for the test. The study reports findings from textual analyses comparing indices of readability and lexical density for the original material sourced by the item writers and for their adapted versions. Results from qualitative and quantitative analyses are discussed in terms of the implications for the IELTS reading module of editing actions such as reducing redundancy and technical language, changing styles, deciding on potentially sensitive issues, and managing relationships between texts and test items. The important issue of text authenticity in tests such as IELTS is also broached.
Factor structure of the revised TOEIC® test: A multiple-sample analysis
by In'nami, Y., Koizumi, R.
This study examined the factor structure of the listening and reading sections of the revised Test of English for International Communication (TOEIC®) test. The data from the TOEIC IP (institutional program) test taken by 569 English learners were randomly split into two samples (n = 285 vs. 284). Four models (higher-order, correlated, uncorrelated, and unitary) were hypothesized on the basis of the literature and were tested with each sample. The results from confirmatory factor analysis suggested that the correlated model fit the data best in both samples. Further, multiple-sample analysis using the two samples supported an invariance of factor loadings, measurement error variances, factor variances, and factor covariances for the correlated model in the revised TOEIC test. The presence of distinctive factors of listening and reading skills supports the reporting of separate scores for each skill, whereas the relatively high correlation between the two factors may support single score reporting. This is in accordance with the formats used to report revised TOEIC test scores. The results of the current study provide empirical support for the reporting practice of the revised TOEIC test and thus for test interpretation based on the test scores.
Norman Segalowitz, Cognitive Bases of Second Language Fluency
by Dimova, S.
Z.H. Han and T. Cadierno (Eds), Linguistic Relativity in SLA: Thinking for Speaking
by Davies, A.
Language Testing is an international peer reviewed journal that
publishes original research on language testing and assessment. Since
1984 it has featured high impact papers covering theoretical issues,
empirical studies, and reviews. The journal's wide scope encompasses
first and second language testing and assessment of English and other
languages, and the use of tests and assessments as research and
evaluation tools. Many articles also contribute to methodological
innovation and the practical improvement of testing and assessment
internationally. In addition, the journal publishes submissions that
deal with policy issues, including the use of language tests and
assessments for high stakes decision making in fields as diverse as
education, employment and international mobility. The journal welcomes
the submission of papers that deal with ethical and philosophical issues
in language testing, as well as technical matters. Also of concern is
research into the washback and impact of language test use, and
ground-breaking uses of assessments for learning. Additionally, the
journal wishes to publish replication studies that help to embed and
extend our knowledge of generalisable findings in the field. Language
Testing is committed to encouraging interdisciplinary research, and is
keen to receive submissions which draw on theory and methodology from
different fields of applied linguistics, as well as educational
measurement, and other relevant disciplines.
Podcasts
Issue 8: Tan Jin and Barley Mak on Confidence Scoring
In issue 29(1) of the journal, three authors from the Chinese University of Hong Kong present a paper on the application of fuzzy logic to scoring speaking tests. This is termed 'confidence scoring', and the first two authors join us on Language Testing Bytes to explain a little more about their novel approach.
Download:
Confidence Scoring
Previous Issues
Issue 7: Mark Wilson on Measurement Models
Mark Wilson delivered the Messick Memorial Lecture at the Language Testing Research Colloquium in Melbourne, 2006, on new developments in measurement models to take into account the complexity of language testing. In Language Testing 28(4) we publish the paper based on this lecture, and Mark joins us on Language Testing Bytes to talk about his work in this area.
Download:
Standards-Based Testing
Issue 6: Craig Deville and Micheline Chalhoub-Deville on Standards-Based Testing
Standards-Based Testing is highly controversial, both for its social and educational impact on schools and bilingual communities and for its technical aspects, which rely to a significant extent on expert judgment. In issue 28(3) we discuss the issues surrounding Standards-Based Testing in the United States with the guest editors of a special issue on this topic. The collection of papers they have brought together, along with reviews of recent books on the topic and a test review, constitutes a state-of-the-art volume for the field.
Download:
Standards-Based Testing
Issue 5: John Read on Vocabulary
The journal has seen a flurry of articles on vocabulary testing in recent months, and issue 28(2) is no exception, with Marta Fairclough's paper on the lexical recognition task. It seemed an appropriate moment to consider why vocabulary is receiving so much attention, so we turned to Professor John Read of the University of Auckland, New Zealand, to give us an overview of current research and activity within the field.
Download:
John Read on Vocabulary
Issue 4: Khaled Barkaoui and Melissa Bowles on Think Aloud Protocols
In Language Testing 28(1), 2011, Khaled Barkaoui has an article on the use of think-alouds to investigate rater processes and decisions as they rate essay samples. The focus is not on the raters, but on whether the research method is a useful tool for the purpose. In this podcast he explains his findings and their importance. We are then joined by Melissa Bowles, who has recently published The Think-Aloud Controversy in Second Language Research, to explain precisely what the problems and possibilities of think-alouds are in language testing research.
Download:
Khaled Barkaoui and Melissa Bowles on Think Aloud Protocols
Issue 3: Jim Purpura on Grammar
Language Testing 27(4), 2010, contains an article by Carol Chapelle and colleagues on testing productive grammatical ability. We thought this would be an excellent opportunity to look at what is going on in the field of assessing grammar, and what issues currently face the field. Jim Purpura agreed to talk to us on Language Testing Bytes.
Download:
Jim Purpura on Testing Grammar
Issue 2: Xiaoming Xi on Automated Scoring
Language Testing 27(3), 2010, is a special issue guest edited by Xiaoming Xi on the automated scoring of writing and speaking tests. In this podcast she talks about why the automated scoring of speaking and writing tests is such a hot topic, and explains the possibilities, limitations and current research issues in the field.
Download:
Xiaoming Xi on Automated Scoring
Issue 1: Mike Kane on Validation
In Language Testing 27(2), 2010, Mike Kane contributed a response to an article on fairness in language testing. We thought this was an excellent opportunity to ask him about his approach to validation, and how he sees 'fairness' fitting into the picture.
Download:
Mike Kane on Validation
How to put the podcast onto your iPod
- Decide which of the podcasts above you would like to listen to. Right-click on the link and select 'Save target as' to download it into a folder on your computer.
- Open iTunes. Click on 'file' and then 'new playlist'. Name your playlist 'Language Testing Bytes'.
- Click on the playlist from the iTunes menu.
- Open the folder in which you saved the podcast, then drag the podcast from the folder and drop it into the playlist.
- Synchronize your iPod.
- When you next access your iPod go to the Language Testing Bytes playlist to play the podcast.
Alternatively, just put it on whichever MP3 player you currently
use, or subscribe to the SAGE Podcast on iTunes.