PTE Academic Speaking

PTE Academic is a computer-based English language proficiency test widely accepted by universities, governments, and professional bodies across the globe. Its Speaking section presents candidates with a range of task types, each placing distinct demands on working memory, oral production, and time management. Two of the most cognitively demanding items in the Speaking module are Repeat Sentence and Describe Image. While both require verbal output and are scored on pronunciation, fluency, and content, the mental operations they demand differ substantially. Understanding these differences is not an academic exercise — it is a practical preparation strategy that allows candidates to allocate attention and rehearsal time where they will yield the greatest score improvement.

What makes Repeat Sentence and Describe Image cognitively different

Repeat Sentence presents candidates with a recorded sentence, typically between three and nine seconds in length, which must be reproduced immediately after the audio ends. The task taps into auditory-verbal working memory: the capacity to encode acoustic information, hold it temporarily, and retrieve it for oral reproduction. Describe Image, by contrast, requires candidates to view a static image on screen and, after 25 seconds of preparation, produce a spoken description within 40 seconds. Here, the cognitive load shifts from auditory encoding to visual processing, lexical retrieval, and structured oral composition.

In Repeat Sentence, the content is fully supplied by the stimulus. Candidates do not choose what to say; they must reproduce it with high fidelity. The challenge lies in the brevity of the listening window and the speed at which encoding must occur. In Describe Image, the challenge is the inverse: the candidate must generate content from a visual prompt, organise it into a coherent narrative, and deliver it fluently — all within a constrained time window. These are fundamentally different cognitive operations, and approaching them with the same strategy is a common error.

The table below summarises the primary cognitive demands of each task type.

Dimension	Repeat Sentence	Describe Image
Primary stimulus	Auditory (recorded sentence)	Visual (static image)
Cognitive operation	Encoding, storage, retrieval, reproduction	Visual parsing, content generation, organisation, production
Content source	Provided by audio stimulus	Must be generated by candidate
Time to prepare	None — immediate reproduction required	25 seconds preparation + 40 seconds response
Working memory demand	High (auditory buffer)	Moderate to high (generative planning)

The three scoring pillars: pronunciation, fluency, and content

PTE Academic employs automated scoring across three integrated dimensions for Speaking tasks: pronunciation, oral fluency, and content. Each contributes a weighted portion to the overall Speaking score, and understanding how they interact is essential for targeted preparation.

Pronunciation is scored by the algorithm's analysis of vowel and consonant production, stress patterns, and intonation contour. A score close to the 90-point maximum indicates near-native production; a score below 50 suggests significant deviation from expected acoustic models. Candidates whose first language has a substantially different sound system (for example, tonal languages or languages whose consonant clusters differ markedly from those of English) should prioritise targeted pronunciation drilling.

Oral fluency measures the smoothness, rhythm, and natural pacing of speech. Self-corrections, repetitions, false starts, and prolonged pauses all reduce the fluency score. The scoring algorithm rewards continuous, unhurried speech that mirrors natural English prosody. In Describe Image, maintaining fluency is particularly challenging because candidates must simultaneously generate and articulate content under time pressure.

Content in Repeat Sentence is assessed against the original stimulus — the closer the reproduction to the source, the higher the content score. In Describe Image, content is evaluated against a checklist of key elements present in the image. Missing significant elements incurs content penalties, while irrelevant additions do not contribute positively.
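
For self-study, the idea that Repeat Sentence content depends on closeness to the stimulus can be approximated with a simple word-level comparison between the original sentence and a typed transcript of a practice attempt. The sketch below is only a practice aid: the function names `tokenize` and `word_match_ratio` are hypothetical, and the in-order word-match ratio is an assumed stand-in, not the scoring method PTE Academic actually uses.

```python
import re
from difflib import SequenceMatcher


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def word_match_ratio(stimulus: str, attempt: str) -> float:
    """Share of stimulus words reproduced in the same order (0.0 to 1.0).

    Assumption: in-order word overlap is a rough proxy for content
    fidelity. It is NOT the official PTE Academic scoring algorithm.
    """
    source = tokenize(stimulus)
    spoken = tokenize(attempt)
    if not source:
        return 0.0
    matcher = SequenceMatcher(a=source, b=spoken, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(source)


if __name__ == "__main__":
    original = "She decided to leave early"
    print(word_match_ratio(original, "She decided to leave early"))
    print(word_match_ratio(original, "She decided to go early"))
```

Tracking this ratio across practice sessions gives a rough sense of whether encoding is improving, independently of pronunciation and fluency.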

Memory encoding strategies for Repeat Sentence

Since Repeat Sentence places its primary cognitive load on working memory, candidates benefit from understanding how auditory encoding operates and how to optimise it during the three-to-nine-second listening window.

The first principle is holistic listening. Attempting to transcribe the sentence mentally word by word during playback fragments attention and reduces encoding efficiency. Instead, candidates should aim to perceive the sentence as a prosodic unit, noting its rhythm, stress patterns, and intonation contour alongside the lexical content. English sentences carry meaning not only through word choice but also through stress and phrasing. The sentences "She decided to leave early" and "She decided to LEAVE early" place their emphasis differently, and capturing this prosodic information supports more accurate reproduction.

The second principle is immediate chunking and rehearsal. Once the audio ends, candidates have no separate preparation time, only the brief pause before the recording starts. Effective candidates use that pause to rehearse the sentence silently, holding it as a few meaningful chunks rather than a string of isolated words. This covert rehearsal leverages the phonological loop component of working memory to maintain the acoustic trace until production begins.

A practical exercise involves shadowing practice — listening to English audio (podcasts, news broadcasts, academic lectures) and repeating sentences aloud immediately after hearing them. This trains the auditory-motor connection that Repeat Sentence demands, building automaticity in the encoding-to-production pipeline.

Common pitfalls in Repeat Sentence and how to avoid them

One of the most frequent errors is prioritising word-for-word accuracy at the expense of fluency. Candidates who pause mid-sentence to correct themselves sacrifice fluency points that often outweigh the marginal content gain. The scoring algorithm penalises interruptions more heavily than minor lexical substitutions, provided the overall meaning is preserved. Suppose the original sentence ends "between supply and demand in 2020" and the candidate says "between supply and demand in the year twenty-twenty". Letting the slip stand costs negligible content credit; stopping to correct it costs far more through the hesitation and restart.

Another pitfall is starting the response before the recording indicator appears. The system captures speech only while the microphone is active, so anything said before that point is lost. Speaking too early and then pausing to wait for the indicator wastes the opening of the response, which the algorithm cannot capture.

Structured output frameworks for Describe Image

Describe Image presents a different cognitive challenge: content generation under time pressure. Candidates must scan the image, identify key elements, organise them into a logical sequence, and articulate the description within 40 seconds. Without a structured framework, candidates risk rambling, missing critical elements, or running out of time before covering the image adequately.

The most effective framework for Describe Image is a four-part structure: identify the type of image, describe the main element, note key details and trends, and offer a brief conclusion or implication. This structure is applicable across the range of image types — bar charts, line graphs, pie charts, maps, photographs, and process diagrams — though the specific content of each section varies.

For a bar chart, the candidate identifies the chart type, names the axes and the variables being compared, describes the most notable bar(s), notes any trends or anomalies, and concludes with the overall takeaway. For a line graph, the candidate identifies the trend direction, notes any peak, trough, or inflection points, mentions specific values if they are clearly visible, and summarises the pattern. For a pie chart, the candidate identifies the segments, highlights the largest and smallest, and draws a comparative observation. For a process diagram, the candidate describes the starting point, the sequence of stages, the end point, and any notable features of the process.
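
One way to drill the four-part structure is to fill in a fixed template for each practice image before speaking it aloud. The sketch below is a minimal illustration under stated assumptions: the `ImagePlan` class, the `build_description` function, the sentence openers, and the sample chart details are all hypothetical and are not prescribed by the test.

```python
from dataclasses import dataclass


@dataclass
class ImagePlan:
    """The four parts of the framework, filled in for one image."""
    image_type: str         # part 1: e.g. "bar chart", "line graph"
    main_element: str       # part 2: what the image is mainly about
    key_details: list[str]  # part 3: complete sentences on details and trends
    conclusion: str         # part 4: overall takeaway, without final full stop


def build_description(plan: ImagePlan) -> str:
    """Join the four parts into one spoken-style description."""
    sentences = [f"This {plan.image_type} shows {plan.main_element}."]
    sentences.extend(plan.key_details)
    sentences.append(f"Overall, {plan.conclusion}.")
    return " ".join(sentences)


if __name__ == "__main__":
    # Hypothetical practice image, invented for illustration only.
    plan = ImagePlan(
        image_type="bar chart",
        main_element="monthly sales for three products",
        key_details=[
            "Product A has the tallest bar in every month.",
            "Product C declines after March.",
        ],
        conclusion="Product A consistently outsells the others",
    )
    print(build_description(plan))
```

Keeping the openers fixed means the planning effort goes into the image-specific content rather than into sentence construction.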

The 25-second preparation window should be used to mentally run through this framework twice: once to identify the image type and key elements, and a second time to plan the specific vocabulary and phrasing. Candidates should avoid writing notes on the erasable booklet during this phase — the cognitive cost of handwriting diverts attention from visual scanning and planning.

Microphone technique and acoustic considerations

PTE Academic uses a head-mounted microphone to capture spoken responses. Microphone technique is an often-overlooked factor in scoring, yet acoustic quality directly influences the algorithm's ability to evaluate pronunciation and fluency accurately.

The microphone should be positioned approximately two to three centimetres from the corner of the mouth, angled slightly downward. This placement captures the direct sound of articulation while minimising breath noise and plosive bursts that can overload the input signal. Candidates who position the microphone directly in front of the mouth risk capturing excessive airflow on stops such as /p/, /t/, and /k/.

Volume and clarity matter more than pitch or pace. The algorithm is trained on a wide range of speaker characteristics, but a response recorded at very low volume may fall below the signal threshold for reliable analysis. Candidates should speak at a comfortable, steady volume, slightly louder than they would use for conversation in a quiet room.

Room acoustics can also influence the signal. Test centres are designed to minimise reverberation, but candidates should be aware that speaking too loudly in a small testing booth can introduce echo effects. Consistent, moderate volume with the microphone at the recommended distance provides the cleanest input.

Managing time pressure across both task types

Time pressure operates differently in Repeat Sentence and Describe Image, and effective pacing strategies must account for these differences.

In Repeat Sentence, the time is externally controlled — the audio duration determines the listening window, and the recording begins automatically. The candidate's control is limited to the quality of encoding and the smoothness of production. Preparation strategy for Repeat Sentence therefore focuses on speed of encoding and reduction of production hesitations, rather than on time management per se.

In Describe Image, the candidate has a preparation phase of 25 seconds and a response window of 40 seconds. Within the 25-second preparation phase, experienced candidates allocate approximately 10 seconds to visual scanning and element identification, 10 seconds to framework selection and mental drafting, and 5 seconds to final vocabulary planning. This allocation prevents over-planning, which wastes preparation time, and under-planning, which leads to disorganised output.
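
The 10/10/5 split above can be turned into a simple schedule for timed practice. The following is a sketch only: the names `PREP_PLAN` and `prep_schedule` are hypothetical, and the check that the phases add up to the 25-second window is a convenience for anyone who wants to experiment with a different split.

```python
PREP_SECONDS = 25  # Describe Image preparation window, as stated above

# Phase lengths taken from the allocation described in the text.
PREP_PLAN = [
    ("visual scanning and element identification", 10),
    ("framework selection and mental drafting", 10),
    ("final vocabulary planning", 5),
]


def prep_schedule(plan, total=PREP_SECONDS):
    """Return (start, end, phase) tuples covering the preparation window."""
    planned = sum(seconds for _, seconds in plan)
    if planned != total:
        raise ValueError(f"phases total {planned}s, expected {total}s")
    schedule = []
    start = 0
    for phase, seconds in plan:
        schedule.append((start, start + seconds, phase))
        start += seconds
    return schedule


if __name__ == "__main__":
    for start, end, phase in prep_schedule(PREP_PLAN):
        print(f"{start:>2}s to {end:>2}s: {phase}")
```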

Within the 40-second response window, a useful guideline is to aim for continuous speech of 30 to 35 seconds, leaving a natural pause at the end. Responses that end abruptly at 20 seconds rarely score well on content because they leave too little time to cover the image comprehensively. Responses still in progress at the 40-second mark are cut off by the system, which truncates the final words and disrupts the fluency score.
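
When reviewing recorded practice responses, the pacing guideline can be applied as a quick check on each recording's duration. This is a hedged sketch: `assess_response_length` and its labels are hypothetical, and treating everything under 30 seconds as "below target" is an assumption drawn from the guideline, not an official threshold.

```python
RESPONSE_WINDOW = 40    # seconds available for the Describe Image response
TARGET_RANGE = (30, 35)  # suggested length of continuous speech


def assess_response_length(seconds: float) -> str:
    """Label a practice response by how its duration fits the guideline."""
    low, high = TARGET_RANGE
    if seconds >= RESPONSE_WINDOW:
        return "cut off"          # still speaking when the window closed
    if seconds < low:
        return "below target"     # likely too short to cover the image
    if seconds <= high:
        return "on target"
    return "close to cut-off"     # finished, but with little margin


if __name__ == "__main__":
    for duration in (20, 32, 38, 40):
        print(duration, assess_response_length(duration))
```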

Building a targeted preparation programme

Effective preparation for Repeat Sentence and Describe Image requires a structured approach that addresses each task's specific demands. The following programme components are recommended for candidates at the intermediate-to-advanced level seeking to improve their Speaking scores.

Baseline assessment — Complete a full-length PTE Academic practice test to establish current scores in pronunciation, fluency, and content for each Speaking task type. This identifies which dimension is the primary constraint.
Targeted drilling — Isolate the weakest task type and spend focused practice sessions exclusively on it. For Repeat Sentence, use shadowing exercises with graded difficulty. For Describe Image, practise the four-part framework with a wide variety of image types, including those outside the candidate's familiar subject area.
Feedback loops — Use PTE Academic official practice materials or reputable third-party platforms that provide automated scoring. Compare scores across multiple attempts to track improvement in each dimension.
Pacing simulation — Practise under timed conditions that mirror the actual test, including the transition between tasks. This trains the cognitive switching cost between auditory processing and visual generation.
Pronunciation mapping — Identify the specific phonemic contrasts that differ between your first language and English. Use minimal pair exercises and IPA reference charts to systematically address persistent errors.

Conclusion and next steps

Repeat Sentence and Describe Image are both integral components of the PTE Academic Speaking section, yet they place markedly different cognitive demands on candidates. Repeat Sentence challenges auditory-verbal working memory and immediate oral reproduction accuracy, while Describe Image tests visual parsing, generative planning, and structured oral composition under time pressure. By understanding the distinct cognitive architecture of each task, candidates can adopt targeted strategies — holistic listening and covert rehearsal for Repeat Sentence, a four-part descriptive framework and disciplined time allocation for Describe Image — rather than applying generic speaking advice to both.

Preparation should be approached systematically: establish a baseline score, identify the binding constraint in pronunciation, fluency, or content, and design a drilling programme that isolates and strengthens the specific skill in question. Consistent practice under realistic timing conditions, combined with attention to microphone technique and acoustic factors, will yield more reliable score improvements than unfocused repetition.


Frequently asked questions

What is the difference in scoring weight between pronunciation, fluency, and content in PTE Academic Repeat Sentence?

In Repeat Sentence, all three scoring dimensions (pronunciation, oral fluency, and content) contribute to the final Speaking score. Content is evaluated against the original stimulus, meaning the closer your reproduction is to the source sentence, the higher your content score. Pronunciation and oral fluency are scored independently of content, and their combined contribution often outweighs a minor content reduction. This means that even if you miss a small detail, maintaining smooth delivery and clear pronunciation is strategically important.

How should I allocate the 25-second preparation time in PTE Academic Describe Image?

A practical allocation is approximately 10 seconds for initial visual scanning and element identification, 10 seconds for selecting and customising the four-part framework (type identification, main element, key details, conclusion), and 5 seconds for final vocabulary planning. This approach prevents both over-planning, which wastes preparation time, and under-planning, which leads to disorganised or incomplete responses during the 40-second speaking window.

Does speaking faster improve my fluency score in PTE Academic Describe Image?

No. Oral fluency in PTE Academic is not measured by words per minute but by the smoothness and natural rhythm of speech. Self-corrections, repetitions, false starts, and prolonged pauses all reduce the fluency score. The optimal pace is a natural, unhurried conversational speed — slightly faster than relaxed everyday speech — delivered continuously without interruptions. Speed alone without smoothness produces a choppy acoustic profile that scores poorly.

How does the PTE Academic microphone position affect my speaking score?

The microphone should be positioned approximately two to three centimetres from the corner of your mouth, angled slightly downward. This placement captures your articulation clearly while minimising breath noise and plosive bursts. Speaking too softly can push your signal below the algorithm's reliable detection threshold, potentially reducing accuracy in both pronunciation and fluency scoring. Consistent, moderate volume at the recommended distance provides the cleanest acoustic input for automated evaluation.

Can I use the same preparation strategy for Repeat Sentence and Describe Image in PTE Academic?

While both are Speaking tasks scored on pronunciation and fluency, Repeat Sentence and Describe Image require different cognitive operations and should be prepared separately. Repeat Sentence benefits from auditory encoding drills such as shadowing, while Describe Image requires structured output practice using a consistent descriptive framework across different image types. Applying a single strategy to both tasks is a common error that limits score improvement in one or both areas.
