
Cost of ungrammatical predictions during online sentence processing: evidence against surprisal

The surprisal metric (Hale, 2001; Levy, 2008) successfully predicts syntactic complexity in a large number of online studies (e.g., Demberg and Keller, 2009; Levy and Keller, 2013). Surprisal assumes a probabilistic grammar that drives expectations about upcoming linguistic material. Consequently, wrong predictions lead to a processing cost, presumably due to reranking-related computations (Levy, 2013). Critically, surprisal assumes that the predicted parses generated by the probabilistic grammar are grammatical. However, it has been found that syntactic predictions can be ungrammatical (e.g., Apurva & Husain, 2018). If so, then just as incorrect (grammatical) predictions incur a reranking cost, ungrammatical predictions should also incur a cost. Evidence for such a cost during comprehension would not be explained by the surprisal metric. To test the ecological validity of the surprisal metric, it is therefore critical to investigate whether ungrammatical predictions incur a cost.

In this study, we investigate this issue in Hindi (a verb-final language) using a cloze task followed by a self-paced reading (SPR) study. All analyses were carried out in R using linear mixed models. Log-transformed reading times (RTs) were used for the RT analyses.

In the cloze study (N=30), participants were asked to complete sentences (such as 1a and 1b) meaningfully using the SPR paradigm. The two conditions differed in the case markers on the three nouns. 12 sets of experimental items along with 64 fillers were used. Participants’ responses were coded for the predicted verb class and the overall grammaticality of the completion (grammatical prediction vs ungrammatical prediction).

1a. hari-ne geeta-se umesh-ko …
    Hari-ERG Geeta-ABL Umesh-ACC
1b. hari-ko geeta-ne umesh-ko …
    Hari-ACC Geeta-ERG Umesh-ACC

Grammaticality analysis of the completion data showed that participants made more ungrammatical completions in condition (b) than in (a) (z=5.25). Overall, 96% of completions were grammatical in condition (a), compared with 60% in (b). In addition, the verb class analysis showed that in both conditions participants most frequently completed the sentences with a transitive non-finite verb followed by a ditransitive matrix verb (hereafter T.NF-DT.M). T.NF-DT.M was predicted in 33% of instances in condition (a) and 34% in condition (b) (z=0.18). Given the similar cloze probabilities, the surprisal metric predicts no difference in RT at T.NF-DT.M between the two conditions during online processing (cloze probabilities can be used to compute surprisal; see Levy and Keller, 2013). If RTs at T.NF-DT.M are nevertheless shorter in condition (a) than in (b), the difference would be better explained by a higher cost due to ungrammatical predictions in (b).

To ascertain this, we conducted an SPR study (N=50) using items similar to those used in the cloze study (see 2a and 2b). The critical region was T.NF-DT.M. 24 sets of items along with 72 fillers were constructed.

2a. hari-ne geeta-se umesh-ko milne ko kaha,
    Hari-ERG Geeta-ABL Umesh-ACC meet-inf(T.NF) told(DT.M)
2b. hari-ko geeta-ne umesh-ko milne ko kaha, ...
    Hari-ACC Geeta-ERG Umesh-ACC meet-inf(T.NF) told(DT.M)

While the cloze probability of T.NF-DT.M is essentially the same in the two conditions, the percentage of ungrammatical predictions is higher in (b) than in (a). Results show that RTs at the critical region were shorter in (a) than in (b) (t=2.32). This goes against the surprisal metric and reflects the cost incurred due to ungrammatical predictions.

Our work establishes that the cost of ungrammatical predictions indeed appears during online processing. This processing cost is not predicted by a metric like surprisal and highlights its limitations. The study also provides evidence against the claim that predictions in head-final languages are robust. It suggests that the prediction mechanism in such languages is more nuanced and points to the need to study the nature of ungrammatical predictions during processing.
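The claim that similar cloze probabilities imply similar surprisal can be checked with a short calculation. The sketch below is illustrative only: it uses the standard definition of surprisal as the negative log probability of a continuation and plugs in the two cloze proportions reported above (33% and 34%). The choice of base-2 logarithms (bits) and the helper name `surprisal_bits` are assumptions made for this example; the abstract does not state how the authors computed surprisal.

```python
import math


def surprisal_bits(p: float) -> float:
    """Return surprisal in bits, -log2(p), for a probability p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


# Cloze proportions for the T.NF-DT.M continuation, as reported in the abstract.
cloze = {"a": 0.33, "b": 0.34}

# Surprisal per condition (log base 2 is an assumption of this sketch).
surprisal = {cond: surprisal_bits(p) for cond, p in cloze.items()}
difference = surprisal["a"] - surprisal["b"]

for cond, s in surprisal.items():
    print(f"condition ({cond}): {s:.2f} bits")
print(f"difference (a - b): {difference:.3f} bits")
```

Both conditions come out at roughly 1.6 bits, differing by less than 0.05 bits, which is why a purely surprisal-based account expects no RT difference at the critical region.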
Center for Open Science

Related Results

Study on Electromagnetic Shielding of Infrared /Visible Optical Window
In allusion to electromagnetic radiation damage that existed in daily life, social safety and military field, electromagnetic shielding technology of infrared and infrared optical ...
Do evidence summaries increase health policy‐makers' use of evidence from systematic reviews? A systematic review
This review summarizes the evidence from six randomized controlled trials that judged the effectiveness of systematic review summaries on policymakers' decision making, or the most...
Thematic Roles of Sentence Elements Found in "Me Before You" Movie
Sentence is very important in learning language. Sentence is used in every language activity. For understanding sentence, we must study structure of the sentence, elements that for...
Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
Abstract The Physical Activity Guidelines for Americans (Guidelines) advises older adults to be as active as possible. Yet, despite the well documented benefits of physical a...
Parsing errors in Hindi: Investigating limits to verbal prediction in an SOV language
The role of prediction during sentence comprehension is widely acknowledged to be very critical in SOV languages. Robust clause-final verbal prediction and its maintenance have be...
Identifying Links Between Latent Memory and Speech Recognition Factors
Objectives: The link between memory ability and speech recognition accuracy is often examined by correlating summary measures of performance across various tasks, but i...
Cash‐based approaches in humanitarian emergencies: a systematic review
This Campbell systematic review examines the effectiveness, efficiency and implementation of cash transfers in humanitarian settings. The review summarises evidence from five studi...
Initial Experience with Pediatrics Online Learning for Nonclinical Medical Students During the COVID-19 Pandemic 
Abstract Background: To minimize the risk of infection during the COVID-19 pandemic, the learning mode of universities in China has been adjusted, and the online learning o...
