GMAT Critical Reasoning inference questions are the items where a short argument, description, or dialogue is presented and the candidate is asked what follows from it. The phrasing varies ("which of the following can be logically drawn," "which must also be true," "the argument implies which of the following"), but the underlying demand is identical: extract a conclusion that is provable from the text, and reject anything that is merely consistent with it. On the GMAT Focus Edition's Verbal section these items sit alongside strengthen, weaken, assumption, evaluate, and plan questions, and they are the family on which candidates who over-read the stimulus or under-justify their choice most often lose points. This article lays out a working method for inference stems: how to read the stimulus, what to test for, how to triage the five answer choices, and how to build a daily drill that turns inference items from a coin-flip into a steady source of points.
What the GMAT Focus Edition actually asks of an inference question
An inference question on the GMAT Focus Edition Verbal section is a question in which the correct answer is a statement that the stimulus logically compels. The candidate is not asked to agree with the argument, to attack it, to support it, or to fill a gap in it. The candidate is asked to identify a proposition that is guaranteed by what the stimulus already says. A useful working definition: an inference is something that is true in every possible world in which the stimulus is true. If a counter-example can be constructed in which the stimulus still holds but the proposed answer does not, the answer is not a valid inference.
This standard is stricter than everyday usage. In conversation, "infer" often means "suspect" or "guess with some basis." On the exam, an inference is a logical entailment, not a hunch. The difference is the source of most of the wrong answers. A choice that is plausible, that is consistent with the tone of the stimulus, that picks up vocabulary from it, and that an attentive reader would not be surprised to see — none of that matters. The only thing that matters is whether the stimulus forces the choice to be true.
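The "true in every possible world" standard can be made concrete for stimuli small enough to reduce to a few true/false claims. The sketch below assumes the stimulus and each choice have already been hand-translated into Python predicates (the function names and the study example are hypothetical, not taken from any GMAT material). It enumerates every world and reports the first one in which the stimulus holds but the choice does not; if no such world exists, the choice is a valid inference.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence

World = Dict[str, bool]          # one possible world: a truth value for each atomic claim
Prop = Callable[[World], bool]   # a statement, evaluated inside a given world


def find_counterexample(variables: Sequence[str], stimulus: Prop, choice: Prop) -> Optional[World]:
    """Return a world where the stimulus holds but the choice fails, or None if no such world exists."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if stimulus(world) and not choice(world):
            return world
    return None


def is_valid_inference(variables: Sequence[str], stimulus: Prop, choice: Prop) -> bool:
    """A choice is a valid inference exactly when no counterexample exists."""
    return find_counterexample(variables, stimulus, choice) is None


# Hypothetical stimulus: "If the study is funded, it will run. The study is funded."
VARS = ["funded", "runs", "succeeds"]


def stimulus(w: World) -> bool:
    return (not w["funded"] or w["runs"]) and w["funded"]


def choice_runs(w: World) -> bool:       # "The study will run."
    return w["runs"]


def choice_succeeds(w: World) -> bool:   # "The study will succeed."
    return w["succeeds"]


print(is_valid_inference(VARS, stimulus, choice_runs))        # → True
print(find_counterexample(VARS, stimulus, choice_succeeds))   # → {'funded': True, 'runs': True, 'succeeds': False}
```

The brute force grows as 2^n in the number of claims, which is harmless for a handful of atoms; the point is the shape of the test, not its speed. "The study will succeed" is plausible and on-topic, but one world satisfies the stimulus while making it false, and that single world is enough to reject it.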
Three structural facts about the GMAT Focus Edition shape how an inference item is built. First, the Verbal section is computer-adaptive at the question level, which means the difficulty of each inference question the candidate sees depends on performance on the questions before it. Second, the question pool is finite and curated, so the same six or seven stem phrasings recur again and again; learning those phrasings is faster than learning to "think critically." Third, the candidate must answer each question before moving on, and questions left unanswered when time runs out are penalised. On a hard inference, therefore, an educated guess followed by spending the saved time on a stronger item downstream is often the correct call, but only when the candidate has actually triaged the hard item rather than panicked.
The four stem shapes you will meet
- Must be true / can be inferred / is implied. A statement that the stimulus guarantees. The correct answer is provable from the text alone.
- Must also be true / would be true if the argument were true. A variant of the same demand: the candidate accepts the argument as true and looks for a further consequence.
- Could be true / could be logically drawn. The trickiest family. The correct answer is one that is consistent with the stimulus; it need not be forced by it. Every distractor must be contradicted by the text, because a choice that is merely unsupported could still be true.
- Must be false / cannot be true / each of the following could be true EXCEPT. A negation-style inference. The candidate looks for a choice that the stimulus rules out.
For most candidates, the must-be-true stem is the cleanest, and the could-be-true stem is where points are dropped. Building comfort with both is the first tactical priority.
Reading the stimulus for inference, not for argument
Most GMAT Critical Reasoning stimuli are short — between 60 and 130 words in the Focus Edition — and the candidate has between 90 seconds and two minutes to handle the question. The first tactical decision is how to read the passage. Inference stimuli are not arguments in the classical sense. Some of them are arguments, some are dialogues, some are descriptions of a study, and some are explanations of a phenomenon. Trying to force every stimulus into a conclusion-plus-premises template wastes clock and produces a diagram that does not match the question's demand.
For an inference question, the reading task is narrower. The candidate needs to extract three things: the subject of the stimulus (a person, a group, a theory, a study, a market), the central claim or finding about that subject, and the limits on that claim. The limits are where most inference answers live. If a study of 200 corporate lawyers in a single city is described, the only inferences available are about those 200 lawyers in that city. Any answer that generalises to lawyers in general, to the city's wider workforce, or to white-collar workers is almost certainly a distractor. Reading for limits is the single highest-leverage habit a candidate can build.
A useful drill: after reading the stimulus, write down the narrowest possible restatement of the claim. Then look at the answer choices and ask which one is forced by that narrow claim. Choices that require the claim to be wider, deeper, or more general than the restatement are eliminable on sight. In my experience this single step eliminates two of the five choices on most inference items before the candidate has read the choices in full.
A worked micro-example
Stimulus: "A survey of 1,200 apartment dwellers in Capital City found that 58 per cent were dissatisfied with the speed of their internet service. Capital City has approximately 240,000 apartment dwellings." Inference stem: "which of the following can be logically drawn from the passage?" A high-scoring candidate does not jump to the choices. They note: the claim is about 1,200 surveyed apartment dwellers in one city; it is not about all apartment dwellers, not about Capital City residents generally, not about internet services outside apartments. A choice such as "Most apartment dwellers in Capital City are dissatisfied with their internet service" is unsupported because the survey is a sample, not a census. A choice such as "At least 600 of the surveyed apartment dwellers were dissatisfied" is provable because 58 per cent of 1,200 is 696, and at least 600 follows. That second choice is the kind of inference the test rewards.
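The two choices in the example differ in exactly the way the survey's limits predict. A minimal arithmetic sketch makes this concrete. It assumes, hypothetically, one dweller per dwelling (so 240,000 apartment dwellers, since the stimulus counts dwellings rather than people) and that all 1,200 respondents are among them. Under those assumptions the sample fixes the count inside the survey but leaves the citywide share almost entirely open.

```python
from fractions import Fraction

SURVEYED = 1_200
PERCENT_DISSATISFIED = 58
CITY_DWELLERS = 240_000  # ASSUMPTION: one dweller per dwelling; the stimulus counts dwellings, not people

# What the stimulus fixes: the count inside the sample.
dissatisfied = SURVEYED * PERCENT_DISSATISFIED // 100   # 69,600 / 100 = 696, an exact division
satisfied = SURVEYED - dissatisfied                     # 504

# Choice: "At least 600 of the surveyed apartment dwellers were dissatisfied."
print(dissatisfied >= 600)  # True whatever the rest of the city looks like, so the choice is forced

# Choice: "Most apartment dwellers in Capital City are dissatisfied."
# The unsurveyed dwellers are unconstrained, so bound the citywide share both ways
# (ASSUMPTION: every respondent lives in Capital City).
lowest_share = Fraction(dissatisfied, CITY_DWELLERS)                # every other dweller satisfied
highest_share = Fraction(CITY_DWELLERS - satisfied, CITY_DWELLERS)  # every other dweller dissatisfied
print(float(lowest_share), float(highest_share))
print(lowest_share < Fraction(1, 2) < highest_share)  # True: "most" is neither forced nor ruled out
```

Under these assumptions the citywide share can sit anywhere from roughly 0.3 per cent to 99.8 per cent, so "most" is neither provable nor refutable. That is why it is a distractor on a must-be-true stem.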
The must-be-true test, applied in 60 seconds
The must-be-true test is the operational form of the inference definition. A statement is a valid inference if the candidate can construct a brief logical argument of the form "because the stimulus says X, the statement must be true." If the argument requires an additional premise, a probability claim, a generalisation, or a value judgement that the stimulus does not supply, the statement is not a valid inference. The test is mechanical and runs in roughly 30 to 45 seconds once internalised.
The mechanics are these. The candidate reads the proposed answer and asks four questions, in order. First, does the statement contradict the stimulus on any axis — quantitative, qualitative, temporal, or causal? If yes, eliminate. Second, does the statement require a generalisation that the stimulus does not authorise (a sample to a population, a single case to a category, a finding to a recommendation)? If yes, eliminate. Third, does the statement introduce a causal claim that the stimulus only describes as a correlation or a coincidence? If yes, eliminate. Fourth, is the statement provable from the stimulus alone, using only the most literal reading of the words? If yes, the statement survives; if the candidate has to interpret, soften, or extend a single word to make it work, the statement is suspect.
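The four questions form an ordered filter in which the first failure decides. The sketch below records a candidate's own yes/no judgements for each choice in a hypothetical `ChoiceReview` record. The code does not read English; it only shows the order of the checks and the verdict each one produces, using the two survey choices from the worked example.

```python
from dataclasses import dataclass


@dataclass
class ChoiceReview:
    """A candidate's yes/no answers to the four questions for one choice (hypothetical structure)."""
    text: str
    contradicts_stimulus: bool   # Q1: clashes on a quantitative, qualitative, temporal, or causal axis
    needs_generalisation: bool   # Q2: sample to population, case to category, finding to recommendation
    adds_causal_claim: bool      # Q3: turns a reported correlation or coincidence into a cause
    provable_literally: bool     # Q4: follows from the most literal reading, with no word stretched


def verdict(c: ChoiceReview) -> str:
    """Apply the four questions in order; the first failure decides."""
    if c.contradicts_stimulus:
        return "eliminate: contradiction"
    if c.needs_generalisation:
        return "eliminate: unauthorised generalisation"
    if c.adds_causal_claim:
        return "eliminate: correlation read as cause"
    if c.provable_literally:
        return "survives"
    return "suspect: needs interpretation"


choices = [
    ChoiceReview("Most apartment dwellers in Capital City are dissatisfied", False, True, False, False),
    ChoiceReview("At least 600 of the surveyed apartment dwellers were dissatisfied", False, False, False, True),
]
for c in choices:
    print(f"{verdict(c)}: {c.text}")
```

The final fall-through branch mirrors step four: a choice that clears the first three checks but needs a word softened or extended is flagged as suspect rather than accepted.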
Step four is the most common failure point. Inference answers should be boring. They should read like a paraphrase of part of the stimulus with a small, almost arithmetic consequence drawn from it. The most attractive distractors are the answers that require a candidate to honour the spirit of the passage while bending a single word — "residents" becomes "citizens," "may" becomes "will," "some" becomes "most." These are exactly the choices the test uses to harvest points from candidates who are reading thoughtfully but not literally.
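The quantifier and modal cases of word-bending can be sketched as a lookup on small strength scales: flag a choice whenever it swaps in a stronger word from the same scale as the stimulus. The scales and the `is_upgrade` function are hypothetical and cover only the "some"/"most" and "may"/"will" cases named above. A substitution such as "residents" to "citizens" sits on no scale and has to be caught by reading.

```python
# Hypothetical strength scales: a higher number is a stronger claim.
QUANTIFIERS = {"some": 1, "most": 2, "all": 3}
MODALS = {"may": 1, "could": 1, "will": 2, "does": 2}


def is_upgrade(stimulus_word: str, choice_word: str) -> bool:
    """True if the choice swaps in a stronger word from the same scale as the stimulus word."""
    for scale in (QUANTIFIERS, MODALS):
        if stimulus_word in scale and choice_word in scale:
            return scale[choice_word] > scale[stimulus_word]
    return False  # different scales or unknown words: this check says nothing either way


for pair in [("some", "most"), ("may", "will"), ("most", "some"), ("could", "may")]:
    print(pair, "upgrade: suspect" if is_upgrade(*pair) else "no upgrade")
```

A downgrade, such as "most" in the stimulus and "some" in the choice, is not flagged, because "most" entails "some".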
Common pitfalls and how to avoid them
- The plausible paraphrase. A choice that captures the spirit of the stimulus but adds a word the stimulus does not contain. Reject whenever the new word is doing real work.
- The sample-to-population slide. A choice that converts a survey, a study, or a single case into a universal claim. Reject unless the stimulus explicitly states that the sample is representative.
- The probability slide. A choice that converts a finding of "may," "could," or "is associated with" into a finding of "does" or "will." Reject unless the stimulus supplies the probability as a number.
- The reversed polarity. A choice that flips the direction of the stimulus — "more A than B" becomes "more B than A," "disagree" becomes "agree." Reject by reading the polarising word carefully.
- The out-of-scope import. A choice that names a new actor, location, or mechanism that the stimulus does not mention. Reject on first pass.
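A crude first pass at two of these pitfalls, the plausible paraphrase and the out-of-scope import, is to list the content words a choice uses that the stimulus never does. The sketch below is a hypothetical helper with a deliberately short stop-word list. It flags harmless synonyms too, and it cannot see a reversed polarity, since "more A than B" and "more B than A" use the same words. Treat its output as a prompt to reread, not as a verdict.

```python
import re

# Hypothetical, deliberately short stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "are",
              "was", "were", "that", "with", "their", "by", "for"}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic words minus stop words (digits and punctuation are ignored)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def new_words(stimulus: str, choice: str) -> set[str]:
    """Content words the choice uses that the stimulus never does."""
    return content_words(choice) - content_words(stimulus)


STIMULUS = ("A survey of 1,200 apartment dwellers in Capital City found that 58 per cent "
            "were dissatisfied with the speed of their internet service.")
CHOICE = "Most residents of Capital City are dissatisfied with their broadband."
print(sorted(new_words(STIMULUS, CHOICE)))  # → ['broadband', 'most', 'residents']
```

Each flagged word maps to a pitfall in the list: "most" is a quantifier the stimulus never uses, "residents" is a paraphrase of "apartment dwellers" that widens the group, and "broadband" is an import that narrows "internet service" to one technology.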
Could-be-true stems: the family that punishes over-reading
The could-be-true stem is, in my experience, the family where Verbal scores above 80 most often stall. The phrasing resembles must-be-true in tone ("which of the following could be logically drawn," "which could be true"), but the demand is reversed. The correct answer does not have to be forced; it only has to be consistent. The candidate has to find the one choice that does not contradict the stimulus, which means each of the four distractors must contradict it outright. A choice that goes beyond the text, or one that the text forces, could still be true and therefore cannot be a wrong answer.
The mental move is different. On a must-be-true stem, the candidate hunts for a choice that the stimulus guarantees. On a could-be-true stem, the candidate hunts for a choice that the stimulus does not rule out. Only the contradiction check carries over from the must-be-true test; the generalisation, causal, and out-of-scope checks do not, because an unsupported choice is still possible. This is where candidates lose the thread: they keep eliminating unsupported choices out of habit, or they start looking for the "best" answer instead of the answer that holds in at least one world in which the stimulus is true.
A practical way to handle this is the negative-search method. The candidate reads each choice and asks: can I construct a brief scenario, consistent with the stimulus, in which this choice is true? If no such scenario exists, because the choice collides with a stated number, direction, or fact, the choice is eliminated; naming the collision should take under 10 seconds. The one choice the candidate cannot rule out is the answer. This sounds slow; in practice it runs faster than a positive search for the "best" answer, because each distractor on a could-be-true stem is eliminable on a single contradiction, and the negative method forces the candidate to name that contradiction explicitly.
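The could-be-true demand stated above, finding the one choice that does not contradict the stimulus, amounts to asking whether at least one world makes the stimulus and the choice true together. A minimal sketch assumes the stimulus and choices can be hand-translated into true/false atoms; the names `find_witness` and `negative_search` and the applicant example are hypothetical. It eliminates every choice for which no such world exists.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence

World = Dict[str, bool]
Prop = Callable[[World], bool]


def find_witness(variables: Sequence[str], stimulus: Prop, choice: Prop) -> Optional[World]:
    """Return a world where the stimulus and the choice both hold, or None if they never do."""
    for values in product([False, True], repeat=len(variables)):
        world = dict(zip(variables, values))
        if stimulus(world) and choice(world):
            return world
    return None


def negative_search(variables: Sequence[str], stimulus: Prop, choices: Dict[str, Prop]) -> List[str]:
    """Eliminate each choice that has no witness world; return the labels that survive."""
    return [label for label, choice in choices.items()
            if find_witness(variables, stimulus, choice) is not None]


# Hypothetical stimulus: "Every applicant who passed the test was interviewed. Lee, an applicant, was not interviewed."
VARS = ["passed", "interviewed", "reapplied"]


def stimulus(w: World) -> bool:
    return (not w["passed"] or w["interviewed"]) and not w["interviewed"]


CHOICES = {
    "A: Lee passed the test": lambda w: w["passed"],
    "B: Lee was interviewed": lambda w: w["interviewed"],
    "C: Lee passed but was not interviewed": lambda w: w["passed"] and not w["interviewed"],
    "D: Lee reapplied": lambda w: w["reapplied"],  # never mentioned: unsupported, but not ruled out
}
print(negative_search(VARS, stimulus, CHOICES))  # → ['D: Lee reapplied']
```

Choice D survives even though nothing in the stimulus supports it, which is exactly the case that must-be-true habits get wrong. Replacing `stimulus(world) and choice(world)` with `stimulus(world) and not choice(world)` turns the same loop into the must-be-true counterexample search.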
Negative-search in practice
Stimulus: "Editorial: The city's new bicycle lane on Main Street has been in place for 18 months. During that time, accidents involving cyclists on Main Street fell by 22 per cent." Stem: "which of the following could be true?" A candidate using the negative search asks of each choice: can I build a small world, consistent with the editorial, in which this choice is true? For a choice such as "accidents involving cyclists on Main Street rose after the lane opened," the answer is no: the editorial reports a 22 per cent fall. For "the lane has been in place for less than a year," the answer is no: the editorial says 18 months. For "the number of accidents involving cyclists on Main Street did not change over the period," the answer is again no. Now take "the rate of accidents per cyclist on Main Street rose during the 18-month period." A candidate who over-reads the editorial as proof that Main Street became safer eliminates it, but the editorial says nothing about ridership. If the number of cyclists fell by more than 22 per cent, fewer accidents and a higher rate per cyclist go together. That world contradicts nothing in the stimulus, so the choice could be true, and it is the answer. By contrast, a choice such as "the new lane caused the reduction in accidents," which the must-be-true test rejects because the editorial reports only a correlation, would also survive here, since nothing rules the cause out; a well-built could-be-true item will not offer it as a distractor.
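Whether a 22 per cent fall in accidents implies a lower accident rate per cyclist depends on ridership, which the editorial never mentions. A quick arithmetic sketch with hypothetical counts (100 accidents falling to 78, and two invented ridership figures) shows the per-cyclist rate moving in either direction while the stimulus stays true.

```python
from fractions import Fraction


def per_cyclist_rate_fell(accidents_before: int, accidents_after: int,
                          cyclists_before: int, cyclists_after: int) -> bool:
    """Compare accidents per cyclist across the two periods using exact fractions."""
    return Fraction(accidents_after, cyclists_after) < Fraction(accidents_before, cyclists_before)


# Hypothetical counts: 100 accidents falling to 78 is the editorial's 22 per cent drop.
ACCIDENTS_BEFORE, ACCIDENTS_AFTER = 100, 78

# The editorial is silent on ridership, so try two hypothetical worlds.
for label, riders_before, riders_after in [("ridership up 10%", 10_000, 11_000),
                                           ("ridership down 30%", 10_000, 7_000)]:
    fell = per_cyclist_rate_fell(ACCIDENTS_BEFORE, ACCIDENTS_AFTER, riders_before, riders_after)
    print(f"{label}: per-cyclist rate {'fell' if fell else 'did not fall'}")
```

The break-even point is a 22 per cent fall in ridership. Any larger drop in cyclists leaves fewer accidents alongside a higher rate per cyclist, so a choice about the per-cyclist rate, in either direction, is possible but not forced.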