What determines the magnitude of attentional capture by deviant sound events? We combined the cross-modal oddball distraction paradigm with sequence learning to address this question. Participants responded to visual targets, each preceded by tones that formed a repetitive cross-trial standard sequence. In Experiment 1, with the standard tone sequence …-660-440-660-880-… Hz, either the 440 Hz or the 880 Hz standard was occasionally replaced by one of two deviant tones (220 Hz and 1100 Hz), which differed either slightly (by 220 Hz) or markedly (by 660 Hz) from the replaced standard. In Experiment 2, with the standard tone sequence …-220-660-440-660-880-660-1100-… Hz, either the 440 Hz or the 880 Hz standard was occasionally replaced by a 220 Hz or a 1100 Hz pattern deviant. In both experiments, a high-pitch deviant was more captivating when it replaced a low-pitch standard, and a low-pitch deviant was more captivating when it replaced a high-pitch standard. These results indicate that the magnitude of attentional capture by deviant sound events depends on the discrepancy between the deviant event and the expected event, not on perceived local change.
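
The arithmetic behind the Experiment 1 design can be made explicit with a minimal sketch. It rests on two assumptions that are not stated as such in the text: discrepancy is taken as the absolute frequency difference in Hz, and the tone preceding each replaced standard is 660 Hz, as read from the sequence …-660-440-660-880-…. All names in the code are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only; names are hypothetical and not taken from the study.
# Assumption: discrepancy is the absolute frequency difference in Hz.
# Assumption: each replaced standard is preceded by the 660 Hz tone,
# as read from the standard sequence ...-660-440-660-880-... Hz.

PRECEDING_STANDARD_HZ = 660
REPLACED_STANDARDS_HZ = (440, 880)  # standards a deviant could replace
DEVIANTS_HZ = (220, 1100)           # the two deviant tones


def discrepancies(deviant_hz, expected_hz, preceding_hz=PRECEDING_STANDARD_HZ):
    """Return both candidate measures of how much a deviant stands out."""
    return {
        # Distance from the tone the learned sequence predicts at this position.
        "expectation_discrepancy": abs(deviant_hz - expected_hz),
        # Distance from the tone that was actually heard just before.
        "local_change": abs(deviant_hz - preceding_hz),
    }


# One entry per (deviant, replaced standard) combination.
table = {
    (deviant, standard): discrepancies(deviant, standard)
    for deviant in DEVIANTS_HZ
    for standard in REPLACED_STANDARDS_HZ
}

for (deviant, standard), measures in table.items():
    print(
        f"deviant {deviant} Hz replacing {standard} Hz: "
        f"expectation discrepancy = {measures['expectation_discrepancy']} Hz, "
        f"local change = {measures['local_change']} Hz"
    )
```

Under these assumptions, the local change is the same for every combination, whereas the expectation discrepancy takes the two values named in the text (220 Hz and 660 Hz). This is why the two accounts can be told apart by comparing which standard a given deviant replaced.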