The TOEFL iBT Writing Task 2, often called "Writing for an Academic Discussion," asks a candidate to contribute one written post to a simulated online discussion alongside two classmates' posts. The candidate has ten minutes to read the prompt, take a position, and write a single response of about 120 words. As the second writing task, it comes after the integrated Writing Task 1. The task looks small, but the rubric is dense: a single 1-to-6 score is produced by a human rater plus an e-rater engine, and a single point on the writing scale can move the overall iBT score. Most candidates lose marks not because their grammar is poor, but because they misread what the rater is actually scoring. The rest of this article walks through the rubric levers, the planning budget, the scaffolding patterns, and the tactical errors that determine whether a response lands at 4 or climbs to 5.
The shape of the prompt: what the candidate actually sees on screen
Every TOEFL Writing Task 2 prompt presents the same skeleton. A short professor prompt frames a question on an academic topic, usually inside higher-education territory such as pedagogy, campus policy, or research methods. Two short posts from named classmates follow; each states a position with one or two reasons. A text box waits for the candidate's own contribution. Candidates are told to read the question and the two posts, take a clear position, and support it with reasons and examples. No minimum word count is enforced, but the e-rater engine is sensitive to length, and the response is expected to be substantive enough to demonstrate stance, support, and synthesis. The whole exchange is described in the official materials as an online discussion in an academic class.
Three features of this prompt format are worth memorising before a practice session. First, the question is always framed in a way that has at least two defensible sides, so a candidate cannot fake expertise by restating a fact — they must choose, defend, and add value. Second, the two classmates' posts are written in deliberately distinct voices, so a response that simply agrees with both, or contradicts both without reasoning, loses the synthesis element the rubric rewards. Third, the prompt appears on the screen with a built-in timer that begins the moment the task is revealed; the candidate cannot pause it. That last point is the one that turns a 4 into a 5 in practice: the ten minutes have to be spent on planning before they are spent on sentences.
Many candidates treat the prompt as a mini-essay. It is closer to a focused argument: one position, two or three supporting reasons, and a visible engagement with at least one of the classmates' posts. The rater's eye is trained to look for the spine of the argument, not for decorative vocabulary. Trying to write a long response usually produces a string of loosely related sentences that drift away from the classmate posts; trying to write a very short one usually starves the rubric of evidence that the candidate can develop ideas at all. A target in the 120-to-180-word window is what most strong candidates converge on after a few practice rounds.
How the 1-to-6 score is actually built
The TOEFL iBT Writing Task 2 response is scored on a 1-to-6 scale rather than the older 0-to-30 scale. Contrary to a common assumption, the response is not read by a panel of three human raters: a single human rater assigns the overall score, while the e-rater engine produces a separate automated score. The two are combined through a rule specified by ETS, and the higher of the combined score and the human score is reported. Knowing this changes the way a candidate should write, because the e-rater is not impressed by clever idioms; it scans for sentence-level features, vocabulary range, and the structural fingerprints of an organised response. The human rater reads for content, development, and the visible quality of the thinking.
Three rubric dimensions drive both scores, even though the official materials describe them as a single holistic judgement. The first is task fulfilment, meaning the response takes a clear position, addresses the question, and engages with the classmates' posts. The second is development, meaning the position is supported with reasons and examples rather than asserted as a bare claim. The third is language use, covering grammar, vocabulary, and the range of sentence structures a candidate can deploy without losing accuracy. Candidates who score 5 typically show all three at a working academic level, while 4-level responses tend to satisfy one or two strongly and the third only weakly.
The rater's reading order
In my experience marking or simulating marking with students, the rater's eyes do something predictable. The first five seconds go to the opening sentence: does the candidate state a position, or do they hedge into nothing? The next ten seconds look for the body: are there visible reasons, or is the candidate padding with restatements? The final pass checks language: is the range wide enough that the response would not be flagged as formulaic? A response that fails the first check rarely recovers, even if its grammar is clean. This is why a forceful opening sentence is the single highest-leverage move in the ten-minute budget.
The 10-minute planning budget: where most candidates lose their score
Ten minutes is not a long time. Reading the prompt, planning, drafting, and checking inside that window forces a candidate to make deliberate trade-offs. The most common trade-off I see, and the one that costs a band point more often than any grammar problem, is the choice to start writing before planning. Candidates who jump straight into the text box produce responses that look like a list of opinions, with a stance buried in the third sentence and the engagement with classmates reduced to a token mention. The same ten minutes, split into roughly three minutes of reading and outlining and seven minutes of drafting and checking, produces a noticeably tighter response. Three to four minutes of reading and outline time is the practical sweet spot for a 120-to-180-word response.
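For timed practice, the split can be written down as plain arithmetic. The sketch below is a hypothetical helper, not anything taken from official materials: the function name, the 30 percent outline share, and the 30 seconds reserved for a final check are all illustrative assumptions that a candidate would tune after a few practice rounds.

```python
def split_budget(total_seconds=600, outline_share=0.3, check_seconds=30):
    """Split the task time into outline, draft, and final-check phases.

    outline_share and check_seconds are assumed defaults for illustration.
    """
    outline = round(total_seconds * outline_share)
    draft = total_seconds - outline - check_seconds
    if draft <= 0:
        raise ValueError("no time left for drafting")
    return {"outline": outline, "draft": draft, "check": check_seconds}


budget = split_budget()
for phase, seconds in budget.items():
    print(f"{phase}: {seconds // 60} min {seconds % 60} s")
```

Raising `outline_share` toward 0.4 models the four-minute end of the range; the drafting phase shrinks by the same amount, which makes the trade-off visible before the timer starts.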
The outline itself does not need to be elaborate. A workable plan has four ingredients: a clear stance in one sentence, two or three supporting reasons, one piece of evidence or example for the strongest reason, and a short note on which classmate's post the candidate will engage with and how. Candidates who skip the example step almost always end up repeating the same reason twice in different wording, which the rubric reads as low development. Candidates who skip the engagement note end up ignoring the classmates entirely, which the rubric reads as low task fulfilment. The four-line outline is what separates these two failure modes from a 5-level response.
A worked outline for a representative prompt
Consider a prompt asking whether professors should record and post their lectures. Two classmates post: one argues yes, for accessibility reasons; the other argues no, because students stop attending in person. A strong four-line outline would read:
- Stance: recordings should be posted, with a clear caveat.
- Reasons: (1) posting supports students with health or work conflicts; (2) it reduces note-taking pressure and lets students focus on comprehension.
- Example: a specific scenario such as a part-time worker who cannot attend every session.
- Engagement: agree with the accessibility classmate, partially concede the attendance point, and add a fix.
A draft built from this outline usually lands between 140 and 170 words and contains every rubric ingredient the rater looks for.
Scaffolding templates that hold the response together
Strong responses tend to share three scaffolding patterns, and weaker responses tend to share three failure patterns. Recognising both is faster than memorising sample essays. Below are the patterns ranked by how often they appear in scored responses; each pattern is something a candidate can practise into muscle memory without sounding formulaic, provided the inside of the template is filled with topic-specific content.
- Stance-first opening. Sentence 1 names the position. Sentence 2 previews the strongest reason. The reader knows within ten words where the response is going.
- Reason-and-example body. Each supporting reason is followed by an example, an analogy, or a concrete scenario. Reasons never repeat each other in different words.
- Synthesis closer. The final sentence refers back to at least one classmate by name or by content, and either concedes a point, extends the classmate's reasoning, or proposes a compromise.
Weak responses tend to do the opposite on all three. They open with a long setup sentence that delays the stance to sentence three. They list reasons without examples. They close with a generic summary sentence such as "in conclusion, I agree" that ignores the classmates. None of these failure patterns destroys the response on its own, but the three together reliably cap a response at 4 even when the language is excellent.
Three failure patterns to avoid
The first failure pattern is stance drift, where the candidate starts by agreeing with one classmate, drifts into the language of the other, and finishes with a sentence that contradicts the opening. The second is reason repetition, where two reasons are stated as separate sentences but are actually the same idea in different words; the rubric reads this as a single reason padded out. The third is classmate erasure, where the response is well-organised and well-written but the classmates' posts are never referred to; the rubric treats this as failure to engage with the discussion format, which is the specific task being tested. Avoiding all three is a matter of planning, not of language.
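Two of these checks are mechanical enough to automate during practice. The sketch below is a hypothetical self-check, not a model of how ETS or the e-rater scores anything: it counts words against the 120-to-180-word window mentioned earlier and flags classmate erasure when no classmate name appears. The function name and the classmate names "Maya" and "Daniel" are invented for illustration, and a name match cannot detect engagement by content alone, stance drift, or reason repetition.

```python
import re


def check_draft(draft, classmates, low=120, high=180):
    """Return simple warnings for a practice draft; an empty list means no flags."""
    warnings = []
    # Rough word count: runs of letters, digits, and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", draft)
    if len(words) < low:
        warnings.append(f"too short: {len(words)} words (target {low}-{high})")
    elif len(words) > high:
        warnings.append(f"too long: {len(words)} words (target {low}-{high})")
    # Classmate erasure: no classmate is referred to by name anywhere.
    lowered = draft.lower()
    if not any(name.lower() in lowered for name in classmates):
        warnings.append("classmate erasure: no classmate is mentioned by name")
    return warnings


# Hypothetical classmate names, as they might appear in a practice prompt.
for warning in check_draft("I agree with Maya that recordings help.", ["Maya", "Daniel"]):
    print(warning)
```

A clean result only means the draft passed these two surface checks; whether the stance holds from the first sentence to the last still has to be read for.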
Language use: the rubric's most forgiving dimension
Language use is the dimension candidates worry about most and the one the rubric is most forgiving on. The 1-to-6 scale does not require native-like fluency. It requires that the candidate's grammar and vocabulary are accurate enough to communicate ideas without distracting the rater, and that the range of sentence structures is wide enough to show control. A response written entirely in short, simple sentences can score 4 on language if the ideas are strong; the same response written in long, complex sentences with frequent errors can score 3 because the errors interrupt the rater's reading. Range matters, but accuracy matters more.
Practically, this means candidates should aim for a mix of sentence lengths rather than a single register. Two short sentences, a medium sentence, and a longer complex sentence, repeated across the response, produce a readable rhythm that signals control. Candidates who try to sound academic by using long noun phrases and rare vocabulary often introduce agreement errors and article errors that pull the language score down. In my experience the cleanest path to a 5 is to write in the register the candidate is most comfortable with, then add one or two slightly more complex structures where the candidate is sure the grammar is correct.
Vocabulary that helps without showing off
Some vocabulary moves pay off cheaply. Replacing "I think" with a stronger stance phrase such as "I would argue," "the evidence suggests," or "in my view" raises the perceived confidence of the response without raising the difficulty of the grammar. Replacing "very important" with a more precise word such as "essential," "central," or "decisive" raises the perceived range of the vocabulary. Replacing "a lot of students" with a more academic phrase such as "a significant number of students" raises the perceived register. None of these moves is risky in terms of grammar, and together they lift a response from sounding conversational to sounding academic. They are the cheapest upgrades available inside a ten-minute budget.
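When reviewing practice drafts, these swaps can be spotted with a small lookup. The sketch below is a hypothetical review aid: the `UPGRADES` table holds only the phrases named above, and the function name is invented for illustration. It suggests alternatives rather than replacing text automatically, since the surrounding sentence sometimes needs a small adjustment to stay grammatical.

```python
import re

# Conversational phrase -> alternatives taken from the paragraph above.
UPGRADES = {
    "i think": ["I would argue", "the evidence suggests", "in my view"],
    "very important": ["essential", "central", "decisive"],
    "a lot of students": ["a significant number of students"],
}


def suggest_upgrades(draft):
    """List (phrase, alternatives) pairs for conversational phrases found in the draft."""
    found = []
    for phrase, alternatives in UPGRADES.items():
        # Whole-phrase, case-insensitive match.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, draft, flags=re.IGNORECASE):
            found.append((phrase, alternatives))
    return found


for phrase, alternatives in suggest_upgrades("I think this is very important."):
    print(f"'{phrase}' could become: {', '.join(alternatives)}")
```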