PTE Academic Speaking: Repeat Sentence & Describe Image

PTE Academic Speaking evaluates a candidate's ability to produce spoken English across six item types, each with distinct input conditions and response demands. Among these, Repeat Sentence and Describe Image occupy opposite poles of the speaking assessment spectrum: the former requires accurate auditory reproduction within a tight time window, while the latter demands sustained, structured oral output drawn from visual input under time pressure. Understanding how the three scoring dimensions — content, fluency, and pronunciation — operate across these two tasks is the foundation of effective preparation.

The three scoring dimensions in PTE Academic Speaking

Before examining individual item types, it is essential to establish how the automated scoring engine evaluates every spoken response in the PTE Academic Speaking section. Each response is assessed simultaneously across three dimensions, each contributing a portion of the total marks awarded for that item.

The content dimension measures how accurately and comprehensively a response covers the required subject matter. For Repeat Sentence, content refers to reproducing the full sentence that was heard. For Describe Image, content refers to naming the main elements visible in the image. In both cases, the content score is not a simple binary pass-or-fail — partial credit may be available when certain elements are captured but others are omitted.

The fluency dimension evaluates the rhythm and continuity of speech delivery. It is not a simple words-per-minute count. Instead, the scoring engine detects hesitation markers such as false starts, repetitions, and elongated pauses, and assigns a score that reflects the overall smoothness of the response. A measured, unhurried pace that avoids these markers generally scores well.

The pronunciation dimension operates on a scale of 0 to 90 and reflects how closely a candidate's vowel sounds, consonant precision, and word-level stress patterns align with those of standard Australian, British, or American English. The engine analyses acoustic features rather than lexical complexity, meaning vocabulary depth does not directly influence this dimension.

These three dimensions apply to all six PTE Academic Speaking item types, but their relative importance shifts considerably depending on the task type. The sections that follow examine how this plays out in Repeat Sentence and Describe Image.

Repeat Sentence: listening, memory, and faithful reproduction

Repeat Sentence is arguably the most demanding item type in the PTE Academic Speaking section because it combines receptive listening comprehension with productive oral output in a single seamless task. You hear a sentence of between 3 and 20 words, and you must reproduce it as faithfully as possible within a time window of 3 to 9 seconds depending on the length of the sentence.

The challenge is that the audio plays only once. There is no rewind, no pause, and no second chance. You must process, hold in auditory memory, and reproduce the sentence with accuracy, natural delivery, and clear pronunciation — all within the same tight window.

The scoring breakdown for Repeat Sentence reflects three simultaneous assessments: content accuracy (how much of the sentence you reproduce), fluency (how smoothly you deliver it), and pronunciation (how clearly your speech is parsed by the automated scoring engine). The content dimension is graded rather than all-or-nothing: reproducing the complete sentence earns full content marks, while omitting or altering words reduces that score in proportion to how much is lost.

A critical update to the PTE Academic scoring model means that content now carries the largest single share of the Repeat Sentence score — candidates who produce an accurate, complete sentence while maintaining good fluency and pronunciation achieve the highest overall marks. Those who sacrifice content accuracy for delivery elegance tend to score lower. This has direct implications for preparation strategy.

The listening phase: capturing the full sentence before you speak

Success in Repeat Sentence begins with the listening phase. When the audio begins, your priority is to hear the complete sentence without interruption. You need to identify the subject, the verb, and any modifiers or objects. You cannot write anything down. Everything must be held in auditory memory.

Short sentences of fewer than 8 words demand particular attention to precision. One missed word constitutes a large proportion of the total content and will reduce the content score significantly. Sentences of 8 to 15 words offer a slightly more forgiving listening window but still carry a risk of missing the sentence-ending element. For these mid-length sentences, it helps to anchor key words using rhythmic markers — a mental beat pattern that helps you retain the sequence of words.
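The effect of a single missed word can be made concrete with simple arithmetic. The sketch below assumes, purely for illustration, that every word counts equally toward content; the function name is hypothetical and the real scoring engine's weighting is not described here.

```python
def share_per_word(sentence_length: int) -> float:
    """Fraction of the sentence one word represents, assuming equal weight per word."""
    if sentence_length <= 0:
        raise ValueError("sentence_length must be positive")
    return 1 / sentence_length


# Compare how much one missed word costs at different sentence lengths.
for n in (5, 8, 15, 20):
    print(f"{n:>2} words: one missed word = {share_per_word(n):.1%} of the sentence")
```

Under this equal-weight assumption, one missed word in a 5-word sentence is a fifth of the content, while in a 20-word sentence it is a twentieth.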

Sentences longer than 15 words require chunking. Break the sentence into groups of 4 to 6 words and identify the main subject and verb first, then layer in the descriptors. This reduces the cognitive load of retaining a long string of information in short-term memory.
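For practice material, the chunking idea can be sketched as a small helper that splits a written sentence into groups of a chosen size. This is only a mechanical approximation: it counts words and ignores grammar, whereas in the test you would group by natural phrase boundaries. The function name and the sample sentence are illustrative assumptions.

```python
def chunk_sentence(sentence: str, chunk_size: int = 5) -> list[str]:
    """Split a sentence into consecutive groups of chunk_size words.

    The final group may be shorter than chunk_size.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    words = sentence.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# Hypothetical 19-word practice sentence.
sample = (
    "The university library will extend its opening hours during the final "
    "examination period to support students preparing for assessments"
)
for chunk in chunk_sentence(sample, chunk_size=5):
    print(chunk)
```

Printing the chunks of a transcript before listening again can help you check whether the groups you retained line up with sensible phrase units.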

The speaking phase: close imitation without overthinking

When the audio ends, you begin speaking immediately. The goal is faithful reproduction — close imitation of what was heard, not paraphrase. You do not add transitional phrases, substitute synonyms, or rephrase the sentence for stylistic reasons. You repeat what was said.

Maintain a steady speaking pace. Do not rush to finish early. A natural phrase-level rhythm with appropriate word-level stress is preferable to accelerated delivery that sounds forced. If you reach a point of hesitation mid-sentence, pause silently rather than inserting filler words such as "uh" or "um", which the scoring engine treats as hesitation markers and penalises under fluency.

Consciously leaving a small buffer of approximately 10 percent of the available time at the end of your response helps ensure you have finished naturally without cutting the sentence short. This does not mean speaking slowly on purpose; it means maintaining a pace that allows you to complete the sentence without rushing the final words.
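The 10 percent buffer translates into a target finishing time and an average pace. The sketch below is a rough planning aid under the assumption that you know the response window and the sentence length; both function names and the sample values are illustrative.

```python
def target_finish_time(window_seconds: float, buffer_fraction: float = 0.10) -> float:
    """Point, in seconds, by which to finish speaking so the buffer remains."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    if not 0 <= buffer_fraction < 1:
        raise ValueError("buffer_fraction must be in [0, 1)")
    return window_seconds * (1 - buffer_fraction)


def required_pace(word_count: int, window_seconds: float,
                  buffer_fraction: float = 0.10) -> float:
    """Average words per second needed to finish by the target time."""
    return word_count / target_finish_time(window_seconds, buffer_fraction)


# Illustrative values: a 15-word sentence and a 9-second window.
print(round(target_finish_time(9), 1), "seconds")
print(round(required_pace(15, 9), 2), "words per second")
```

If the required pace looks faster than your comfortable speaking rate, the fix is a prompt start and tighter phrasing, not a rushed delivery.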

Fluency in Repeat Sentence: what the scoring engine actually measures

The fluency dimension of Repeat Sentence is scored by the engine on the basis of hesitation and continuation patterns. A common misconception is that speaking faster earns a higher fluency score. The opposite is often true. The algorithm detects unnatural acceleration — fast, clipped delivery that breaks phrase-level grouping — and rewards consistent, unhurried speech with clear phrase boundaries.

Develop a speaking rhythm that prioritises smooth phrasing over speed. Read short sentences aloud at a comfortable pace, focusing on connecting words within a phrase while taking brief pauses at natural phrase boundaries. This builds the muscle memory for phrase-level fluency without sacrificing content accuracy.
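When you review transcripts of your own practice recordings, a rough tally of hesitation markers can show whether your delivery is improving. The sketch below counts filler words and immediately repeated words in a typed transcript. It is a self-review aid only and makes no claim to mirror the scoring engine; the filler list and function name are assumptions.

```python
import re

# Assumed list of filler words; extend it to match your own habits.
FILLERS = {"uh", "um", "er", "erm"}


def hesitation_counts(transcript: str) -> dict[str, int]:
    """Count filler words and back-to-back repeated words in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    fillers = sum(1 for w in words if w in FILLERS)
    repetitions = sum(
        1
        for prev, cur in zip(words, words[1:])
        if prev == cur and cur not in FILLERS
    )
    return {"fillers": fillers, "immediate_repetitions": repetitions}


print(hesitation_counts("The the report um shows uh a rise"))
```

Pauses cannot be detected from text alone, so this tally covers only two of the hesitation markers; listening back to the recording remains necessary.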

Pronunciation: vowels, consonants, and stress patterns

The pronunciation dimension is scored on a 0-to-90 scale and reflects the acoustic characteristics of your spoken output. The engine evaluates vowel sounds, consonants, and word-level stress. Non-native speakers often struggle with vowel sounds that differ significantly from their first language, for example confusing the short /ɪ/ in "ship" with the long /iː/ in "sheep," or failing to reduce unstressed syllables to the schwa /ə/. Minimal pair drills, in which you practise pairs of words that differ by a single sound, are among the most targeted exercises for improving this dimension.

Consonant clarity is equally important, particularly at the ends of words. Dropping or weakening final consonants, for example pronouncing "bag" as "ba", reduces the engine's ability to parse your final syllable correctly. Focused practice on final consonant sounds improves clarity and contributes directly to a higher pronunciation score.

Describe Image: structure, content coverage, and time management

Describe Image presents a different challenge. You are shown an image — a bar chart, line graph, pie chart, table, map, diagram, or photograph — and you have 25 seconds to study it before the recording window opens. Once it opens, you have 40 seconds to describe the image in your own words.

Unlike Repeat Sentence, where the input is auditory and the output is fixed, Describe Image requires you to process visual information, select the most relevant elements, organise a logical verbal response, and deliver it within a fixed time window — all while the clock is running.

Content in Describe Image: identifying and naming the main elements

The content dimension in Describe Image is awarded based on how many of the main image elements you identify and name. The scoring guide identifies a set of content points for each image: typically the title, the key figures or categories, the trend direction, any significant comparative relationships, and the numerical values shown on the axes or in the legend.

During the 25-second preparation window, your goal is to identify and mentally map all significant content points. Scan the image in a consistent order: first identify the title or overall topic, then read the axis labels and legend, then locate the highest and lowest values, then identify any trends or comparative patterns. This systematic scan ensures you do not overlook any major content element.

When the recording window opens, you begin describing these elements in a clear, organised sequence. Cover the title or overall topic first, then describe the key figures and categories, then explain the trend or relationship shown, and finish with a conclusion about what the image demonstrates. This arc — from identification to analysis to conclusion — gives your response a logical structure that covers all content points.
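The scan order and the response arc described above can be captured as a fill-in outline. The sketch below is one hypothetical way to organise practice notes, with invented field names and sample data; it structures what to say and is not a script to recite.

```python
from dataclasses import dataclass


@dataclass
class ImageNotes:
    """Notes gathered during the 25-second preparation window."""
    topic: str
    key_figures: list[str]
    trend: str
    conclusion: str


def outline(notes: ImageNotes) -> list[str]:
    """Return the four talking points in the order they should be delivered."""
    return [
        f"Topic: {notes.topic}",
        "Key figures: " + "; ".join(notes.key_figures),
        f"Trend: {notes.trend}",
        f"Conclusion: {notes.conclusion}",
    ]


# Invented sample data for a bar chart.
notes = ImageNotes(
    topic="monthly visitors to a city museum",
    key_figures=["highest in July", "lowest in January"],
    trend="visitor numbers rise through spring and fall after summer",
    conclusion="attendance is strongly seasonal",
)
for point in outline(notes):
    print(point)
```

Because the slots are filled from each individual image, the wording changes every time while the order of the four points stays the same.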

Fluency and pronunciation in Describe Image: plain language is sufficient

Fluency and pronunciation scoring in Describe Image follow the same engine-based criteria as all other speaking items. Strong delivery — natural, unhurried, without hesitation markers — earns high fluency marks. Pronunciation is assessed using the same acoustic criteria.

One important point: the scoring guide does not reward vocabulary complexity in Describe Image. Plain, clear language that accurately describes the image scores as well on the fluency and pronunciation dimensions as more sophisticated vocabulary. The emphasis is on coverage and clarity, not lexical sophistication. Focus your preparation on structuring a complete response rather than on acquiring elaborate descriptive vocabulary.

Common pitfalls and how to avoid them

Despite the apparent simplicity of both item types, candidates consistently make errors that significantly reduce their scores. Recognising these pitfalls before they become habits is far more effective than trying to unlearn them later.

Pitfall 1: No structured plan for Describe Image. The most common error in Describe Image is starting to speak without a clear mental structure, which leads to repetition, circular descriptions, and awkward pauses that waste the 40-second response window. The remedy is to spend the entire 25-second preparation window mapping out what you will say. Identify the type of image, name its main elements, note the trend or key relationship, and plan a brief concluding statement. By the time the recording starts, you should have a verbal roadmap ready to execute.

Pitfall 2: Treating the template as a script. Some candidates learn a fixed sequence of words and repeat it verbatim for every Describe Image. This produces a recognisable template response that the scoring engine may treat as lacking genuine content coverage. More importantly, it scores poorly on the fluency dimension because the delivery sounds mechanical. The correct approach is to use a structural template — an organisational framework rather than a word-for-word script — and fill it with specific content drawn from each individual image.

Pitfall 3: Freezing on dense or unfamiliar images. When an image is visually complex or contains unfamiliar subject matter, some candidates produce very brief responses because they feel they lack the vocabulary to describe it. The scoring system rewards content coverage, not vocabulary complexity. If you can see numbers, state them. If you can see an arrow indicating an upward trend, say so. Plain language that accurately maps the image scores more points than complex vocabulary used imprecisely.

Pitfall 4: Over-relying on speed in Repeat Sentence. Candidates who speak quickly to finish early tend to sacrifice pronunciation clarity and increase their hesitation marker count. The scoring engine detects unnatural acceleration and treats it as a fluency penalty. There is no advantage to finishing ahead of the time limit. A measured pace that completes the sentence clearly and naturally produces a higher combined score across all three dimensions.

A focused four-week study programme for speaking item types

A structured preparation programme that targets both the specific task demands of Repeat Sentence and Describe Image and general oral fluency development produces better results than practising task types in isolation. The following framework is designed to be implemented over four weeks with consistent daily practice.

For Repeat Sentence, allocate 10 to 15 items per day. Begin with audio materials that contain sentences of moderate length, spoken at normal conversational pace. Podcasts and academic lecture recordings are effective sources. Play each sentence once, reproduce it immediately, then compare your response against the original. Use a recording device to track your content accuracy and fluency patterns over time.
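The comparison step can be made more systematic if you type out what you said and check it against the original sentence. The sketch below uses Python's standard difflib module to align the two word sequences and list the words that were missed or altered. It is a practice aid under the assumption that you have both texts in written form; the function names are hypothetical and the result is not an official content score.

```python
import re
from difflib import SequenceMatcher


def normalise(text: str) -> list[str]:
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def content_match(original: str, response: str) -> dict:
    """Align response words with the original and report what was missed."""
    ref, hyp = normalise(original), normalise(response)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    missed = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # Original words that were dropped or swapped for something else.
        if tag in ("delete", "replace"):
            missed.extend(ref[i1:i2])
    ratio = matched / len(ref) if ref else 0.0
    return {"matched": matched, "total": len(ref), "ratio": ratio, "missed": missed}


result = content_match(
    "The lecture will begin at nine o'clock tomorrow morning",
    "The lecture will begin at nine tomorrow",
)
print(result)
```

Logging the ratio and the missed words for each day's items makes it easier to see whether errors cluster at the ends of sentences or on particular word types.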

For Describe Image, conduct 5 to 8 practice items per session. Study the image during the full 25-second preparation window, then deliver your response within the 40-second window. Review each response by checking whether your structure covered the title, main figures, trend, and conclusion. Refine your structural template based on what you observe across sessions.

In parallel with these task-specific sessions, integrate daily general oral fluency practice: shadowing exercises where you repeat audio immediately after hearing it, minimal pair drills targeting your specific pronunciation challenges, and sustained read-alouds of academic texts at a measured pace. This builds the underlying fluency and pronunciation capacity that supports strong performance across all speaking item types.

After four weeks of this combined approach, candidates typically report that their fluency patterns feel more natural, their content strategies for each task type have become automatic, and their Describe Image responses cover the required elements consistently within the 40-second window. The key is consistent daily practice combined with task-specific focus rather than generic speaking exercises.

Next steps

The strategies outlined above reflect general principles of PTE Academic Speaking assessment. Scoring criteria and item formats are updated periodically by the test administrator, and candidates are advised to consult the official PTE Academic scoring guide for the most current information before their test date.

Approaching each speaking item type with a clear understanding of the specific skills being assessed, and building a targeted preparation plan that develops both task-specific competence and underlying oral fluency, positions candidates for strong performance across the full PTE Academic Speaking section. TestPrep's complimentary diagnostic assessment offers a practical starting point for candidates who wish to evaluate their current performance and develop a sharper preparation plan.

Frequently asked questions

How does the PTE Academic Speaking scoring system evaluate my response?

Every spoken response in the PTE Academic Speaking section is assessed simultaneously across three dimensions: content, fluency, and pronunciation. The content dimension measures how accurately and comprehensively the response covers the required subject matter. The fluency dimension evaluates the rhythm and continuity of speech delivery, penalising hesitation markers such as false starts, repetitions, and elongated pauses. The pronunciation dimension operates on a 0-to-90 scale, reflecting how closely the candidate's vowel sounds, consonant clarity, and word-level stress align with standard English acoustic norms. The relative weight of each dimension varies by item type.

Can strong fluency and pronunciation compensate for weaker content in Repeat Sentence?

Not fully. Following the 2023 scoring update, content accuracy carries the largest single share of the Repeat Sentence score. While fluency and pronunciation contribute to the overall mark, producing a response that is fluent and well-pronounced but omits or distorts significant portions of the original sentence will result in a lower total score. Effective preparation must therefore prioritise faithful reproduction of the full sentence while simultaneously maintaining good delivery quality.

How did the 2023 PTE Academic scoring update affect Repeat Sentence?

The 2023 update to the PTE Academic scoring model increased the relative weight of the content dimension across speaking item types, with the change most visibly affecting Repeat Sentence. Content accuracy now represents the dominant share of the item score, meaning that reproducing the complete sentence accurately is the primary determinant of performance. This update shifted preparation priorities away from pure delivery elegance and towards a balanced approach that values content accuracy alongside fluency and pronunciation.

What should I do when Describe Image shows a complex or unfamiliar image type?

When faced with a dense or unfamiliar image, candidates should focus on describing what is visually present using clear, plain language rather than searching for specialist vocabulary. The scoring system rewards content coverage — naming the main elements, stating values, and identifying trends — rather than lexical complexity. A structured template that covers the title, main figures, trend, and conclusion ensures a complete response regardless of the image type. With practice, the 25-second preparation window becomes sufficient to map all significant content points before the recording begins.

Is a faster speaking pace rewarded in the PTE Academic Speaking fluency dimension?

No. A common misconception is that speaking faster earns a higher fluency score. The automated scoring engine detects hesitation markers and unnatural acceleration patterns, and rewards consistent, unhurried delivery with clear phrase-level boundaries. Measured speech at a comfortable pace generally scores better than accelerated delivery that sounds forced or choppy. For Describe Image, the 40-second response window is long enough to deliver a complete, well-structured response at a natural pace without rushing.
