The GMAT Data Insights section is the youngest part of the GMAT Focus Edition, and for most candidates reading this it is also the section where score gains arrive earliest — provided the right mistakes are corrected. Twenty questions, 45 minutes, a mix of Data Sufficiency, Multi-Source Reasoning, Table Analysis, Graphics Interpretation, Two-Part Analysis, and the odd Data Sufficiency variant built around a chart. The pattern of mistakes candidates make is unusually stable across cohorts, which is good news: a small number of recurring error patterns, once named, can be trained out within a focused three-week block.
This article walks through the recurring error patterns I see most often in candidate error logs, why each one survives even when topic knowledge is solid, and the tactical fix that closes the gap. The aim is to leave you with a working diagnostic: when a Data Insights mock comes back flat, you should be able to point at the specific fault line rather than re-doing the whole section in a blur.
The seven recurring error families on GMAT Data Insights
Most candidates sitting the GMAT Focus assume Data Insights errors come from unfamiliar chart types, weak statistics, or bad arithmetic. Those are real, but they account for under a third of the loss in a typical error log. The dominant loss comes from a small cluster of behavioural and methodological mistakes that survive intact across months of practice. Before drilling content, a serious candidate should map their own errors to the families below and rank them by frequency.
How to read your own error log against this taxonomy
Take the last 60 Data Insights questions you have attempted in timed conditions. Tag each error with one of the seven families. If a single family accounts for 40% or more of the loss, that is your first three-week project. In my experience the distribution tracks background: candidates with engineering backgrounds tend to over-represent the arithmetic slip family, candidates from humanities or consulting backgrounds over-represent the skim-and-snap family, and most retakers over-represent the stem misread family because they have trained themselves to move fast and not check the stem.
- Stem misread: answering the wrong question, including missing the word "EXCEPT", "most likely", "inferred", "could be true", or treating a question as numerical when it is logical.
- Data overload: treating every data point as load-bearing, when the chart is designed so that 70% of the visible numbers are decorative.
- Skim-and-snap: locking onto the first plausible answer before checking whether a second screen, a footnote, or a unit conversion contradicts it.
- Arithmetic slip: clean method, dirty execution — a percentage misapplied, a denominator inverted, a per-thousand ratio read as a percent.
- DS logic gap: treating the two statements as a single block rather than testing Statement 1 alone, then Statement 2 alone, then both.
- MSR tunnel vision: ignoring the secondary tab, the email, the exhibit on the second page — questions are designed to require two or three sources.
- Pacing panic: the 2:15 average per question collapses to under 90 seconds for the last four items, and two correct answers are lost in the closing minutes.
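The tagging exercise described above can be sketched in a few lines. This is a minimal illustration, assuming each error has already been tagged by hand with one of the seven family names. The function names `rank_families` and `first_project` and the sample log are hypothetical, not taken from any real candidate data.

```python
from collections import Counter

# Family names follow the list above.
FAMILIES = [
    "stem misread", "data overload", "skim-and-snap", "arithmetic slip",
    "DS logic gap", "MSR tunnel vision", "pacing panic",
]


def rank_families(error_tags):
    """Return (family, count, share) tuples, most frequent family first."""
    unknown = set(error_tags) - set(FAMILIES)
    if unknown:
        raise ValueError(f"unrecognised tags: {sorted(unknown)}")
    total = len(error_tags)
    if total == 0:
        return []
    counts = Counter(error_tags)
    return [(family, n, n / total) for family, n in counts.most_common()]


def first_project(error_tags, threshold=0.40):
    """Return the dominant family if it reaches the threshold, else None."""
    ranked = rank_families(error_tags)
    if ranked and ranked[0][2] >= threshold:
        return ranked[0][0]
    return None


# Hypothetical log: 20 errors found in the last 60 timed questions.
sample_log = (
    ["stem misread"] * 9
    + ["arithmetic slip"] * 4
    + ["skim-and-snap"] * 3
    + ["data overload"] * 2
    + ["pacing panic"] * 2
)

for family, count, share in rank_families(sample_log):
    print(f"{family:<18} {count:>2}  {share:.0%}")
print("first three-week project:", first_project(sample_log))
```

The share here is computed over errors, not over all 60 questions, which matches the "40% or more of the loss" rule. If no family reaches the threshold, the sketch returns `None` and the ranking itself becomes the guide.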
The next sections take each of these families apart, show a representative example, and describe the tactical fix that actually moves the score. None of the fixes are exotic — they are habits, not hacks.
Stem misread: answering a different question than the one asked
Stem misread is the single most expensive family in most candidate logs, and the one most likely to be invisible to the candidate themselves. The question is read, the chart is read, an answer is selected — but on review the candidate sees the question they thought they were answering, not the one in front of them. The word that does the damage is often small: EXCEPT, most likely, must be true, could be true, or a question mark that flips the polarity of the entire prompt.
Why the family survives practice
Most practice happens on autopilot. The candidate builds a habit of reading the first eight words of the stem, the chart, and the answer choices, then pattern-matching. Under time pressure this habit compounds: the eye skips over the operator that controls direction, and the brain answers the question it expected. The fix is not "read more carefully" — that is unteachable. The fix is to install a small physical habit: underline the operator of the question with the cursor or a finger, every time, for the first two weeks of practice. The mark forces a 200-millisecond pause that the brain uses to re-anchor.
What the fix looks like in practice
For a Table Analysis item that asks which row is least likely to show a margin contraction in the next period, the operator is "least likely". Underline it, then read the table, then read the answer choices in full. The cost is roughly 8 seconds per question, or about 160 seconds across 20 questions. Each misread the habit prevents is worth 90 to 120 seconds of total time and one correct answer, so on time alone the habit pays for itself once it prevents two misreads, and in score terms it pays after the first, even on a tight 45-minute clock. For Multi-Source Reasoning prompts that end with a compound condition (for example, "if and only if revenue per active user is below the period median"), the operator is the compound condition, not the noun. Underline the whole conditional.
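The trade-off can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (8 seconds per question, 20 questions, 90 to 120 seconds per misread). The name `net_seconds` is illustrative, and treating the quoted range as pure time recovered per prevented misread is an assumption.

```python
QUESTIONS = 20
SECTION_SECONDS = 45 * 60      # 45-minute section
UNDERLINE_COST_S = 8           # per-question cost of the habit (figure from the text)
MISREAD_VALUE_S = (90, 120)    # time value of one prevented misread (range from the text)

HABIT_COST_S = QUESTIONS * UNDERLINE_COST_S


def net_seconds(misreads_prevented, value_per_misread):
    """Seconds gained (positive) or spent (negative) over one full section."""
    return misreads_prevented * value_per_misread - HABIT_COST_S


print(f"habit cost: {HABIT_COST_S} s "
      f"({HABIT_COST_S / SECTION_SECONDS:.1%} of the section clock)")
for prevented in (1, 2, 3):
    low, high = (net_seconds(prevented, v) for v in MISREAD_VALUE_S)
    print(f"{prevented} misread(s) prevented: net {low:+d} to {high:+d} s")
```

Under these assumptions the habit is a small time cost when it prevents a single misread and a time gain from the second one onward. The extra correct answer arrives with the first prevented misread either way.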
One diagnostic that catches this family cleanly: take ten untimed questions, and for each one write the operator in your own words before reading the chart. If your wording differs from the stem's actual meaning in more than one of the ten, the family is dominant and needs the underlining habit, not more practice questions.
Data overload: treating every cell as load-bearing
Data Insights questions are designed to be data-rich and question-thin. A typical Graphics Interpretation item presents four lines, two axes, ten data points per line, a legend, and a small footnote — and then asks one focused question that hinges on two of those numbers. Candidates who try to internalise the whole screen pay a triple penalty: time lost to reading, working memory cluttered, and a higher chance of mis-extracting the numbers that actually matter.
The triage rule
Before reading any data, read the question stem to completion and identify the variable being asked about, the unit, the time period, and the condition (greater than, less than, equal to, rank). Only then look at the chart, and look only for those four things. If the stem asks for the percent change in revenue between two periods for a specific segment, ignore the cost line, ignore the second axis, ignore the footnote about regional split. Find revenue, find the two periods, find the segment. Three reads, one calculation.
Worked example
Consider a Multi-Source Reasoning set with an email describing a pricing change, a table showing quarterly revenue for two product lines, and a chart showing conversion rates by channel. The question asks: "By what percent did revenue per converted user change for Product Line A in the channel flagged in the email?" The trap for the data-overload candidate is to read the email, the table, and the chart in full, then start computing. The triage candidate reads the stem, identifies the four anchors (Product Line A, revenue per converted user, the channel in the email, the change), and then opens the email only to identify the channel, then the table for revenue and conversions, then the chart only if a channel-level breakdown is needed. That is two source-opens in the usual case, one division per period to get revenue per converted user, and one percent change between the two results. The time saving is 60 to 90 seconds, and the error rate drops because each step depends on fewer remembered values.
| Source opened | What you look for | What you ignore |
|---|---|---|
| Email / prompt | Channel named in the email | Background narrative |
| Table | Revenue and converted users for Product Line A | Product Line B rows, cost columns |
| Chart | Channel-level conversion if not in the table | Other channels, time trends |
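The calculation in the worked example can be written out as a short sketch. All figures below are invented for illustration, and the function names are hypothetical. The only thing taken from the example is the shape of the computation: revenue per converted user in each period, then the percent change between the two.

```python
def revenue_per_converted_user(revenue, converted_users):
    """Revenue divided by converted users for one period."""
    if converted_users <= 0:
        raise ValueError("converted_users must be positive")
    return revenue / converted_users


def percent_change(old, new):
    """Percent change from old to new, relative to old."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


# Hypothetical table values for Product Line A in the flagged channel.
before = {"revenue": 480_000, "converted_users": 1_200}
after = {"revenue": 495_000, "converted_users": 1_100}

rpu_before = revenue_per_converted_user(**before)
rpu_after = revenue_per_converted_user(**after)
change = percent_change(rpu_before, rpu_after)

print(f"revenue per converted user: {rpu_before:.2f} -> {rpu_after:.2f}")
print(f"percent change: {change:+.1f}%")
```

Only four numbers are pulled from the sources, which is the point of the triage rule: everything else on screen stays unread.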
For most candidates reading this, the data-overload fix is the single highest-leverage change in the first two weeks, because it unlocks speed that the rest of the section depends on.
Skim-and-snap: locking onto the first plausible answer
Skim-and-snap is the cousin of stem misread, but with a different signature. The stem is read correctly, the chart is read correctly, the first answer choice that fits is selected, and the second, third, fourth, and fifth are not seriously evaluated. The cost is most visible on questions where the question writer has placed a near-miss distractor in choice A and a more precise answer in choice C or D. Choice A is selected because it is close enough to feel right.
Why skim-and-snap survives content review
Content review reinforces the pattern. When a candidate reviews a question they got right by snapping, the review confirms the snap was right. When they review a question they got wrong by snapping, they conclude they need more practice, not a habit change. The family is therefore self-concealing, which is why it tends to be the third or fourth most common family in a log rather than the first — candidates have to be told to look for it.
The "three-second rejection test"
For every wrong answer choice, force a written or spoken rejection reason of at least one clause. "Wrong because the unit is thousands, not millions." "Wrong because the question asks for a difference, not a ratio." "Wrong because the period is the wrong one." If you cannot produce a rejection reason in three seconds, the choice is not actually rejected — it is just not selected. Unselected wrong answers return on the next practice set. The test is cheap, and over a week of practice it converts skim-and-snap into a more deliberate elimination habit.
Arithmetic slip: clean method, dirty execution
Arithmetic slips are the family candidates are most willing to admit, because they look like bad luck. A percentage of 18 read as 81. A denominator of 240 read as 420. A unit conversion from millions to thousands missed by a factor of 1,000. The method was right; the execution was wrong. Under test conditions, an arithmetic slip costs the same as a content gap: one question, no points.
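The size of these slips is easy to underestimate, so a small sketch may help. The helper names and sample figures below are hypothetical. They only illustrate the unit relationships mentioned above (per-thousand versus percent, millions versus thousands) and the effect of a transposed denominator.

```python
def per_thousand_to_percent(rate_per_thousand):
    """Convert a per-thousand rate to a percentage (18 per thousand is 1.8%)."""
    return rate_per_thousand / 10


def millions_to_thousands(value_in_millions):
    """Express a figure quoted in millions in thousands."""
    return value_in_millions * 1_000


def share_percent(part, whole):
    """Part as a percentage of whole; the denominator must be positive."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole * 100


# Hypothetical figures, chosen to echo the slips described above.
print(per_thousand_to_percent(18))        # a rate of 18 per thousand, as a percent
print(millions_to_thousands(3))           # 3 million, expressed in thousands
print(round(share_percent(60, 240), 1))   # intended denominator
print(round(share_percent(60, 420), 1))   # transposed denominator
```

Reading a per-thousand rate as a percent overstates it tenfold, and the transposed denominator moves the share by roughly ten percentage points in this invented case. Either gap is wide enough to land on a different answer choice.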