The TOEFL iBT (Test of English as a Foreign Language, internet-based test) employs a structured scoring system that converts performance across four sections into both individual section scores and a single total score reported on a 0–120 point scale. Each of the four sections (Reading, Listening, Speaking, and Writing) is scored independently on a 0–30 scale, and these four section scores are summed to produce the final total. Understanding these mechanics is essential for candidates who wish to set evidence-based score targets, interpret their performance reports accurately, and allocate preparation time in proportion to each section's weight in the overall score. This article provides a section-by-section breakdown of how scores are generated, what the scaled scoring model actually measures, and how candidates can use this knowledge to build a more efficient preparation programme.
The TOEFL iBT scoring framework at a glance
The TOEFL iBT reports five distinct scores: four section scores and one total score. Each section is worth a maximum of 30 points, yielding a combined maximum total of 120 points. ETS (Educational Testing Service), the organisation that administers the test, uses a combination of human raters and automated scoring algorithms to ensure consistency and reliability across the millions of tests administered globally each year. The scoring process is designed to evaluate academic English communication ability across four core dimensions: reading comprehension, listening comprehension, spoken production and interaction, and written production.
The total score is not simply a raw count of correct answers; it is built from transformed section scores that account for the relative difficulty of the questions encountered. This transformation, known as scaling, ensures that a score of 25 on the Reading section means the same level of ability regardless of which specific set of questions a candidate answered. This feature is particularly important in a large-scale testing programme where different test forms may vary slightly in difficulty. Candidates should understand that the scoring system is engineered for fairness and comparability, not merely for ranking.
- Four sections, each scored 0–30
- Total score = sum of four section scores, maximum 120
- Human raters used for Speaking and Writing; automated scoring supports consistency
- Scaled scoring enables comparison across different test forms
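The arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration of the summation rule described in this article; the function name `total_score` and the example scores are invented for the sketch.

```python
SECTIONS = ("reading", "listening", "speaking", "writing")


def total_score(section_scores: dict[str, int]) -> int:
    """Sum four 0-30 scaled section scores into a 0-120 total."""
    missing = [name for name in SECTIONS if name not in section_scores]
    if missing:
        raise ValueError(f"missing section scores: {missing}")
    for name in SECTIONS:
        score = section_scores[name]
        if not 0 <= score <= 30:
            raise ValueError(f"{name} score {score} is outside 0-30")
    return sum(section_scores[name] for name in SECTIONS)


# Illustrative scores only, not real candidate data.
example = {"reading": 25, "listening": 27, "speaking": 23, "writing": 24}
print(total_score(example))  # 99
```

Because every section has the same 30-point ceiling, a one-point gain in any section moves the total by exactly one point.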
How each TOEFL iBT section is scored
Each section of the TOEFL iBT employs a distinct scoring methodology tailored to the language skill being assessed. Understanding the specific scoring criteria for each section allows candidates to align their practice with the exact dimensions that raters evaluate.
Reading section scoring
The Reading section consists of 35–36 questions based on academic passages. Rather than scoring each question equally, ETS assigns differential point values based on question type and difficulty. Multiple-choice questions typically carry one point each, while more complex items such as prose summary and fill-in-a-table questions may carry two or more points. The raw score, that is, the number of points earned, is then converted to a scaled score on the 0–30 scale through a statistical process called equating, which adjusts for variations in test form difficulty.
Candidates frequently ask whether all Reading questions are weighted equally. The answer is nuanced: while the total raw point value is normalised, certain item formats contribute disproportionately to the raw score ceiling. Familiarity with the full range of Reading question types—including inference questions, vocabulary-in-context items, and text insertion tasks—provides a strategic advantage during preparation.
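For candidates who score their own practice sets, the idea of differential point values can be sketched as follows. The point caps in `MAX_POINTS` and the sample answers are assumptions chosen for illustration; they are not an official ETS table, and the sketch stops at the raw score because the equating step is not public.

```python
# Hypothetical point caps per item type (illustrative, not an official table).
MAX_POINTS = {"multiple_choice": 1, "prose_summary": 2}


def raw_score(items: list[tuple[str, int]]) -> tuple[int, int]:
    """Return (points earned, points possible) for a practice set.

    Each item is a pair: (item type, points earned on that item).
    """
    earned = 0
    possible = 0
    for item_type, points in items:
        cap = MAX_POINTS[item_type]
        if not 0 <= points <= cap:
            raise ValueError(f"{points} points is invalid for {item_type}")
        earned += points
        possible += cap
    return earned, possible


# Eight multiple-choice items (six correct) and one prose summary
# answered with partial credit.
practice = (
    [("multiple_choice", 1)] * 6
    + [("multiple_choice", 0)] * 2
    + [("prose_summary", 1)]
)
earned, possible = raw_score(practice)
print(f"{earned}/{possible}")  # 7/10
```

Tracking points rather than a simple count of correct items makes it visible when a multi-point item accounts for a large share of the points lost.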
Listening section scoring
The Listening section comprises 28–39 questions based on audio recordings of academic lectures and conversations. Like the Reading section, scoring begins with a raw point total that is subsequently converted to a scaled score of 0–30. The questions test comprehension of main ideas, supporting details, speaker attitude, and pragmatic inferences. Multiple-choice questions with a single correct answer typically earn one point, while multiple-answer questions may earn up to two points depending on the number of correct selections.
The Listening section is particularly sensitive to note-taking quality and the ability to track speaker transitions. Because the audio cannot be replayed in the standard test format, developing efficient aural comprehension and recording skills during preparation directly impacts the raw score achievable in this section.
Speaking section scoring
The Speaking section contains four tasks: one independent task (Personal Preferred Topics) and three integrated tasks that combine listening, reading, and speaking skills. Each response is rated by both a human rater and ETS's SpeechRater automated scoring engine. The human rater evaluates the response holistically on dimensions including delivery, language use, and topic development. SpeechRater evaluates acoustic and linguistic features such as pronunciation, fluency, vocabulary complexity, and syntactic variety.
The final score for each task ranges from 0 to 4, and these task scores are converted to a 0–30 scaled section score. The independent task and the three integrated tasks carry different weights in the conversion table. Understanding this weighting helps candidates allocate their response-planning time appropriately across tasks of varying significance.
Common evaluation criteria across Speaking tasks include: clear pronunciation and natural pacing, appropriate vocabulary selection for academic contexts, coherent organisation with identifiable introduction and conclusion, and substantive elaboration beyond minimal responses.
Writing section scoring
The Writing section comprises two tasks: an Integrated Writing task that requires candidates to read, listen, and then write a response synthesising the two sources, and an Independent Writing task that asks candidates to state and defend a personal opinion in essay form. Both responses are scored by a human rater and e-rater, ETS's automated essay-scoring engine. Each essay receives a score of 0 to 5, and these task scores are converted to a 0–30 scaled section score using a weighted formula.
Key evaluation dimensions for Writing include: development and support of ideas, logical organisation and coherence, appropriate and accurate vocabulary use, and grammatical accuracy in sentence construction. The Integrated Writing task additionally measures the ability to synthesise information from two sources and accurately represent the listening passage's relationship to the reading passage.
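The two-rater, two-task structure described in this section can be sketched in the same spirit. Everything numeric here is an assumption: averaging the human and automated ratings, the equal task weights, and the linear rescaling from the 0–5 rubric to 0–30 are placeholders for a conversion this article does not specify.

```python
import math


def essay_score(human: float, automated: float) -> float:
    """Combine two 0-5 ratings for one essay (assumed: simple average)."""
    for rating in (human, automated):
        if not 0 <= rating <= 5:
            raise ValueError("ratings must lie between 0 and 5")
    return (human + automated) / 2


def writing_estimate(integrated: tuple[float, float],
                     independent: tuple[float, float],
                     weights: tuple[float, float] = (0.5, 0.5)) -> int:
    """Estimate a 0-30 section score from (human, automated) rating pairs.

    The weights and the linear rescaling are hypothetical.
    """
    scores = (essay_score(*integrated), essay_score(*independent))
    weighted = (
        sum(score * weight for score, weight in zip(scores, weights))
        / sum(weights)
    )
    return math.floor(weighted * 30 / 5 + 0.5)  # round half up


# Integrated essay rated 4 and 5; Independent essay rated 3 and 4.
print(writing_estimate((4, 5), (3, 4)))  # 24
```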
Understanding scaled scores versus raw scores
The distinction between raw scores and scaled scores is fundamental to understanding the TOEFL iBT reporting system. A raw score represents the unadjusted number of points earned on a specific test form. Because different test forms contain different numbers of questions and varying proportions of item difficulties, a raw score from one form is not directly comparable to the same raw score from another form.
Scaled scores resolve this comparability problem through a statistical procedure that adjusts raw scores based on the difficulty characteristics of the specific test form administered. This procedure ensures that a scaled score of 25 on the Reading section reflects the same underlying ability regardless of whether the candidate took an easier or harder version of the test. The scaling process is applied independently to each section, and the resulting scaled scores are the values reported to candidates and institutions.
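The effect of form-specific conversion can be illustrated with a small lookup. Both tables below are invented for the sketch (real conversion tables are not published in this form), and the names `FORM_EASIER`, `FORM_HARDER`, and `to_scaled` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical tables: (minimum raw score, scaled score), sorted by raw score.
FORM_EASIER = [(0, 0), (12, 10), (24, 20), (32, 25), (40, 30)]
FORM_HARDER = [(0, 0), (10, 10), (21, 20), (29, 25), (38, 30)]


def to_scaled(raw: int, table: list[tuple[int, int]]) -> int:
    """Return the scaled score of the highest threshold not exceeding raw."""
    thresholds = [minimum for minimum, _ in table]
    index = bisect_right(thresholds, raw) - 1
    if index < 0:
        raise ValueError("raw score is below the lowest threshold")
    return table[index][1]


# Different raw scores on different forms can map to the same scaled score.
print(to_scaled(33, FORM_EASIER))  # 25
print(to_scaled(30, FORM_HARDER))  # 25
```

In this toy example the harder form asks for fewer raw points to reach the same scaled score, which is the kind of adjustment the scaling procedure is described as making.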
Candidates should note that the relationship between raw scores and scaled scores is not linear: the same raw-score gain can produce a different scaled-score gain depending on where on the scale it occurs and on the test form taken. This non-linear relationship has practical implications for score improvement strategies: moving from 20 to 25 on a section may require a different number of additional correct answers than moving from 25 to 29, depending on the specific section and the candidate's current ability level.
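To make the non-linearity concrete, the sketch below compares the raw points needed for two score jumps. The thresholds in `MIN_RAW_FOR_SCALED` are invented for illustration and do not come from any real conversion table.

```python
# Hypothetical minimum raw score needed for selected scaled scores.
MIN_RAW_FOR_SCALED = {20: 24, 25: 31, 29: 39}


def extra_raw_needed(current_scaled: int, target_scaled: int) -> int:
    """Raw points separating two scaled scores under the table above."""
    return MIN_RAW_FOR_SCALED[target_scaled] - MIN_RAW_FOR_SCALED[current_scaled]


def raw_per_scaled_point(current_scaled: int, target_scaled: int) -> float:
    """Average raw points required per scaled point gained."""
    gain = target_scaled - current_scaled
    return extra_raw_needed(current_scaled, target_scaled) / gain


print(extra_raw_needed(20, 25), extra_raw_needed(25, 29))  # 7 8
print(raw_per_scaled_point(20, 25), raw_per_scaled_point(25, 29))  # 1.4 2.0
```

Under these assumed thresholds, each scaled point between 25 and 29 costs more raw points than each scaled point between 20 and 25, so the same amount of extra accuracy buys less improvement near the top of the scale.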
The scaled score system ensures fairness across test forms, but it also means that raw-score targets alone are insufficient for strategic preparation. Candidates who track only the number of correct answers—without accounting for item difficulty and question weighting—may misjudge their readiness for the actual test.