IELTS Speaking Part 2, commonly referred to as the long turn or cue card task, requires candidates to speak continuously for up to two minutes on a given topic. Unlike Part 1, which involves short exchanges guided by the examiner, and Part 3, which invites discussion at a conceptual level, Part 2 places the candidate in the role of primary speaker. The examiner's task shifts to active listening and assessment rather than questioning. This structural difference has direct consequences for how candidates should prepare their responses. Understanding the four assessment criteria (fluency and coherence, lexical resource, grammatical range and accuracy, and pronunciation) is essential for every candidate aiming for band 7 or above. This article presents a systematic four-strand framework for planning and delivering the long turn, equipping candidates with a repeatable method that aligns directly with what examiners mark.
Understanding the IELTS Speaking Part 2 format and assessment stakes
The cue card task presents candidates with a prompt that typically asks them to describe a person, place, object, experience, or activity. One minute is allocated for note-making before the candidate speaks for one to two minutes. The examiner does not intervene during the speaking turn unless the candidate falls silent for an extended period. The prompt typically includes three or four sub-questions that anchor the description. For example, a cue card about a book might ask the candidate to name the book, describe its plot or content, explain why it was significant, and describe how it influenced the candidate's thinking.
The scoring stakes are significant because Part 2 accounts for a substantial portion of the overall Speaking band. The four criteria are weighted equally, meaning that lexical sophistication alone cannot compensate for disjointed organisation. Equally, a well-organised but lexically limited response will plateau at a lower band. The challenge for candidates lies in simultaneously managing content organisation, language production, and time pressure within a single sustained turn. Most preparation programmes focus heavily on vocabulary and grammar, often neglecting the explicit teaching of discourse-level structuring skills that Part 2 demands.
The examiner assesses the response holistically across the four criteria. In Part 2 specifically, coherence is evaluated through the logical sequencing of ideas, the use of cohesive devices, and the completeness with which the candidate addresses all aspects of the cue card. A response that covers only one or two sub-questions will be penalised on coherence regardless of language quality. Similarly, a response that rambles without clear direction will score lower on fluency, which encompasses the pace, continuity, and purposeful progression of speech.
The four-strand framework: a systematic approach to planning and delivery
The most reliable method for achieving a high band in Part 2 is to adopt a four-strand framework. Each strand corresponds to a paragraph or thematic segment of the two-minute talk, and each is designed to address a specific dimension of the cue card prompt. This framework transforms the anxiety-inducing blank page of the one-minute preparation period into a structured note-taking exercise with predictable slots. The four strands are: introduction and key descriptor, supporting detail and example, significance or consequence, and conclusion or personal reflection.
The first strand—the introduction and key descriptor—occupies approximately 20 to 25 seconds of the response. In this strand, the candidate states the subject clearly and provides one or two immediate identifying details. If describing a person, the candidate might state the person's role and one defining characteristic. If describing a place, the candidate names it and establishes its nature. This strand answers the most fundamental question posed by the cue card and provides the listener with a clear anchor for everything that follows. The language here should be precise but not elaborate; the goal is clarity rather than impressiveness.
The second strand—supporting detail and example—fills approximately 50 to 60 seconds and forms the substantive core of the response. Here the candidate expands on the subject by describing attributes, events, or characteristics in greater depth. Sensory details, specific moments, and concrete examples elevate this strand from vague generalisation to vivid description. For instance, rather than stating that a city is beautiful, the candidate might describe the sound of rain on tin rooftops in a particular neighbourhood or the way light filtered through morning mist over a river. Concrete imagery is scored positively under lexical resource because it demonstrates range and precision rather than the repetition of generic adjectives.
The third strand—significance or consequence—requires the candidate to step back from description and analyse why the subject matters. This strand answers the implicit why question that underlies most cue card prompts. It might explore what the experience taught the candidate, how it changed their perspective, or what it revealed about the person or place being described. This strand distinguishes band 7 responses from band 6 responses because it demonstrates a capacity for reflection and evaluation, qualities that the speaking rubric explicitly rewards. Candidates who omit this strand risk presenting a purely descriptive response that lacks depth.
The fourth strand—conclusion or personal reflection—occupies approximately 20 to 30 seconds and provides a natural closing. This strand does not need to introduce new substantive content. Instead, it offers a brief personal statement or forward-looking comment that signals the end of the talk. Phrases such as "what I remember most," "that experience remains important to me," or "I hope to return someday" serve this function effectively. The examiner uses this signal to begin forming the assessment judgment, and a clean conclusion contributes positively to the fluency score by demonstrating purposeful speech organisation.
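The timings given for the strands can be checked with a little arithmetic. The sketch below is illustrative only: the ranges for strands one, two, and four are taken from the text, while the range for strand three is not stated there and is derived here as whatever remains of a full two-minute talk. The names `TALK_SECONDS`, `stated`, and `remaining_range` are hypothetical.

```python
# Illustrative time budget for the four strands of a two-minute long turn.
# The ranges for strands one, two, and four come from the text; the range
# for strand three is NOT stated there and is derived as the remainder.

TALK_SECONDS = 120  # upper limit of the long turn

# (min_seconds, max_seconds) for the strands with stated durations
stated = {
    "introduction and key descriptor": (20, 25),
    "supporting detail and example": (50, 60),
    "conclusion or personal reflection": (20, 30),
}


def remaining_range(total, ranges):
    """Return the (min, max) seconds left once the stated strands are spoken."""
    shortest = sum(low for low, _ in ranges.values())
    longest = sum(high for _, high in ranges.values())
    return total - longest, total - shortest


significance_range = remaining_range(TALK_SECONDS, stated)
print(significance_range)  # (5, 30)
```

On these figures, strand three has between 5 and 30 seconds available. A candidate who wants real room for reflection would therefore need to keep the other strands near the lower end of their ranges.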
Effective note-taking within the one-minute preparation window
The one-minute preparation period is not a test of memory; it is a structured opportunity to map the response before speaking begins. Effective use of this window separates candidates who speak confidently for the full two minutes from those who exhaust their material within 40 seconds and resort to repetition or silence. The note-taking approach must be deliberate, abbreviated, and aligned with the four-strand framework.
Begin by identifying the key word or phrase in each sub-question on the cue card. Underline or circle the action verbs (describe, explain, discuss, mention), as these indicate the type of information required. Then map the sub-questions onto the four strands. Not every sub-question requires its own strand; some strands answer more than one sub-question. For example, with a cue card that has three sub-questions, the candidate might assign the first two sub-questions to strand one, the third to strand two, and the significance element to strand three.
Notes should be written in abbreviated form using key words, not full sentences. The candidate's own memory and language competence will generate the full sentences during the speaking turn. Writing out complete sentences during preparation wastes time and creates a false sense of security, because the candidate will inevitably deviate from the scripted text once speaking begins. Instead, use single words or short phrases to trigger the relevant content. A cluster of three to four keywords per strand provides sufficient scaffolding without creating dependency on the notes.
Time management during preparation is critical. Allocate approximately 15 seconds to reading the cue card and identifying the sub-questions, 30 seconds to writing notes in the four-strand format, and 15 seconds to reviewing the notes mentally before speaking. Candidates who spend too long writing and too little time mentally rehearsing the overall shape of the response often lose direction mid-sentence. A brief mental run-through of the four strands—naming each strand silently to oneself—ensures continuity before the first word is spoken.
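The preparation routine described above can be written down as a simple checklist. The sketch below is a hypothetical illustration, not an official IELTS tool: it records the 15/30/15-second split and a set of keyword notes, then checks that the budget adds up to one minute and that each strand has three to four keywords. The example notes for a cue card about a book, and the names `prep_budget`, `notes`, and `check_plan`, are invented for illustration.

```python
# Hypothetical sketch of the one-minute preparation plan.
PREP_SECONDS = 60

# Time split suggested in the text (seconds)
prep_budget = {
    "read cue card and identify sub-questions": 15,
    "write notes in the four-strand format": 30,
    "review notes mentally": 15,
}

# Invented example notes for a cue card about a book
notes = {
    "introduction": ["title", "author", "genre"],
    "detail": ["plot", "main character", "favourite scene"],
    "significance": ["changed view", "lesson", "why it mattered"],
    "conclusion": ["reread", "recommend", "remember most"],
}


def check_plan(budget, notes, total=PREP_SECONDS, min_keys=3, max_keys=4):
    """Return a list of problems; an empty list means the plan fits the guidance."""
    problems = []
    spent = sum(budget.values())
    if spent != total:
        problems.append(f"budget is {spent}s, expected {total}s")
    for strand, keywords in notes.items():
        if not min_keys <= len(keywords) <= max_keys:
            problems.append(f"{strand}: {len(keywords)} keywords")
    return problems


problems = check_plan(prep_budget, notes)
print(problems)  # []
```

A plan with too few triggers is flagged by strand: `check_plan(prep_budget, {"introduction": ["title"]})` returns `["introduction: 1 keywords"]`.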
Coherence and cohesion: the discourse-level markers that examiners track
Coherence in IELTS Speaking Part 2 operates at two levels: the macro level of idea sequencing and the micro level of linguistic cohesion. Candidates who understand only one level risk producing responses that are locally fluent but globally disorganised. The examiner assesses coherence holistically, meaning that the overall arc of the response matters as much as the individual sentences within it.
At the macro level, coherence is achieved through the logical progression from the introduction of the subject to its significance and conclusion. Each strand should flow naturally into the next, and the listener should never be left wondering why a particular detail was introduced or where the response is heading. The four-strand framework provides this macro structure automatically, but candidates must consciously signal transitions between strands. Verbal markers such as "the thing that struck me most was," "another aspect worth mentioning," and "looking back, I think what mattered most was" alert the listener to a shift in direction and reinforce the sense of purposeful organisation.
At the micro level, cohesion is achieved through the use of reference, substitution, and linking words. Reference devices—pronouns such as he, she, it, and they—prevent unnecessary repetition of nouns and maintain the thread of the narrative. Substitution avoids repetition of verbs and adjectives. Linking words—because, although, since, which, and where—create complex sentences that demonstrate grammatical range while simultaneously connecting ideas. However, candidates must use linking words naturally rather than inserting them artificially. Overusing connectors such as "moreover" or "in addition" creates a mechanical rhythm that sounds prepared rather than spontaneous.
The pronunciation criterion includes the assessment of discourse features such as pausing, stress, and intonation. Effective pausing (not hesitation, but deliberate silence at the end of a strand before transitioning to the next) signals planning and organisation. It is a positive feature that demonstrates the candidate is in control of the material. Intonation patterns that reflect the structure of the response, such as a slight rise at the beginning of a new strand and a settling tone at the conclusion, reinforce the sense of a well-shaped talk.