Disfluencies as intra-utterance dialogue moves ∗

Although disfluent speech is pervasive in spoken conversation, disfluencies have received little attention within formal theories of grammar. The majority of work on disfluent language has come from psycholinguistic models of speech production and comprehension and from structural approaches designed to improve performance in speech applications. In this paper, we argue for the inclusion of this phenomenon in the scope of formal grammar, and present a detailed formal account which: (a) unifies disfluencies (self-repair) with Clarification Requests, without conflating them, (b) offers a precise explication of the roles of all key components of a disfluency, including editing phrases and filled pauses, and (c) accounts for the possibility of self-addressed questions in a disfluency.


Introduction
Although disfluencies are pervasive in spoken conversation, they have typically been viewed by theoretical linguists as the "untouchables" of language: elements not fit to populate the grammatical domain. Their very existence is a significant motivation for the competence/performance distinction (Chomsky 1965) and for the assumption that spoken language is not the input for language acquisition (Chomsky 1972). Indeed, even quite recently researchers highly skeptical of the competence/performance distinction could suggest that "[t]he competence approach uncontroversially excludes performance mishaps such as false starts, hesitations, and errors from the characterization of linguistic knowledge" (Seidenberg 1997: 1599). In contrast to this malign attitude to disfluencies, Schegloff, Jefferson & Sacks (1977) initiated the study of such utterances among conversation analysts, showing that self-corrections share many properties with clarificational and correctional utterances made by the other interlocutor. Over the last twenty years there has been increasing interest in the study of self-corrections, hesitations, and other disfluencies among psycholinguists (e.g., Levelt 1983, Herbert Clark & Fox Tree 2002, Bailey & Ferreira 2007), phoneticians (e.g., Candea, Vasilescu & Adda-Decker 2005, Horne 2012), and computational linguists and researchers on speech processing (e.g., Shriberg 1994, Heeman & Allen 1999, Johnson & Charniak 2004). In this paper, we present a detailed formal grammatical account which: i. unifies disfluencies (self-repair) with Clarification Requests (CRs), without conflating them, ii. offers a precise explication of the roles of all key components of a disfluency, including editing phrases and filled pauses, iii. accounts for the possibility and range of self-addressed questions in a disfluency.
1 Even in the realm of terminology there is no shortage of controversy. NLP and speech researchers tend to use disfluency, in contrast to self-repair or self-correction, used by conversation analysts, who avoid the former term given its negative implicatures. The more medically-oriented literature uses dysfluency to refer inter alia to stuttering, as Robert Eklund (p.c.) alerted us; see also Eklund 2004: chap. 2. As will become clear, our choice of disfluency is not intended to disparage or impute "abnormality" to this ubiquitous class of utterances.

Beyond the need for assuming an incremental perspective towards language processing, an assumption that has in any case become increasingly influential in recent years (Kempson, Meyer-Viol & Gabbay 2000, H. Rieser & Schlangen 2011a), our account will involve positing no additional mechanisms beyond those already needed for the interpretation of dialogue. We will see that disfluencies manifest precisely the characteristics one expects of a grammatical phenomenon: they exhibit both significant cross-linguistic variation at all linguistic levels and also potential universals and, far from constituting meaningless "noise", participate in semantic and pragmatic processes such as anaphora, conversational implicature, and discourse particles, as illustrated in (1). In all three utterances in this example, the semantic process is dependent on the reparandum (the phrase to be repaired) as the antecedent:

(1) a. Peter was, well, he was fired. (Example from Heeman & Allen 1999; the anaphor refers to material in the reparandum.)

b. A: Because I, any, anyone, any friend, anyone, I give my number to is welcome to call me (Example from the Switchboard corpus, Godfrey, Holliman & McDaniel 1992; implicature based on the contrast between repair and reparandum: it's not just her friends that are welcome to call her when A gives them her number.)

c. The other one did, no, other ones did it. (Example from the BNC (file KB8, line 1705); material negated by no originates in the reparandum.)

The structure of the paper is the following: in Section 2 we review the "syntax" of disfluencies, give a classification of types of disfluencies, make some observations about the desiderata for a discourse theory of disfluencies, in particular arguing that it needs to be grounded within a grammar, and critically review previous work on disfluencies.
Section 3 provides background about the formal dialogue theory we utilize, KoS2 (Ginzburg 2012), and in particular explains how it can be used to analyze clarification interaction. In Section 4 we offer an informal sketch of our analysis of disfluencies. Section 5 spells out this analysis for the two classes of disfluencies that we argued earlier need to be distinguished. Section 6 offers some brief conclusions.
2 KoS is not an acronym, despite emphasizing a Konversationally Oriented Semantics.

General pattern of self-repair

As has often been noted (see e.g., Levelt 1983, and references therein for earlier work), speech disfluencies follow a fairly regular pattern. The elements of this pattern are shown in Figure 1, annotated with the labels introduced by Shriberg 1994, who was building on earlier work of Levelt 1983. Of these elements, all but the moment of interruption and the continuation are optional. The presence of elements and their relations can be used as the basis for classifying disfluencies into different types (McKelvie 1998, Heeman & Allen 1999):

• If the alteration differs strongly from the reparandum and does not form a coherent unit together with the start, or if alteration and continuation are not present at all, the disfluency can be classified as an aborted utterance, or false start.
• If the alteration "replaces" the reparandum, the disfluency is a repair.
• If the alteration elaborates on the reparandum, it is a reformulation.
The following gives examples for these three classes, in the order they were mentioned.3

(2) a.

3 These examples, and most others in this section, are taken from the Switchboard corpus (Godfrey, Holliman & McDaniel 1992), with disfluencies annotated according to Godfrey, Holliman & McDaniel 1992: "+" marks the moment of interruption and separates reparandum from alteration, "{ }" brackets editing terms and filled pauses, and "[ ]" brackets the disfluency as a whole.

Within the class of repairs, a further distinction can be made (Levelt 1983):

• appropriateness-repairs replace material that is deemed inappropriate by the speaker given the message she wants to express (or has become so, after a change in the speaker's intentions or in the state of the world that is being described), while

• error-repairs repair material that is deemed erroneous by the speaker.
Finally, these types of disfluencies can be, with a nod to the similarly named distinction in the DAMSL annotation scheme (Core & Allen 1997), labelled backward looking disfluencies, as here the moment of interruption is followed by an alteration that refers back to an already uttered reparandum. We can distinguish from these types those disfluencies where the moment of interruption is followed not by an alteration, but just by a completion of the utterance which is delayed by a filled or unfilled pause (hesitation) or by a repetition of a previously uttered part of the utterance (repetition). We will call this kind of disfluency forward looking;4 the following gives some examples of such disfluencies.
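The general pattern and the classification just described can be summarized in a small data structure. The following is a toy sketch under our own simplifying assumptions: the field names follow Shriberg 1994's labels, but the coarse two-way classifier is our illustration, not an implementation from the disfluency literature.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Disfluency:
    """Elements of the general self-repair pattern (labels after Shriberg 1994).
    All but the moment of interruption and the continuation are optional."""
    start: List[str] = field(default_factory=list)        # fluent material before the reparandum
    reparandum: List[str] = field(default_factory=list)   # material to be repaired
    editing_term: List[str] = field(default_factory=list) # e.g. "uh", "I mean", "no"
    alteration: List[str] = field(default_factory=list)   # material replacing/elaborating the reparandum
    continuation: List[str] = field(default_factory=list) # resumption of the utterance

def classify(d: Disfluency) -> str:
    """Coarse classification: backward looking disfluencies contain an
    alteration referring back to a reparandum; forward looking ones are
    hesitations/repetitions that merely delay the continuation."""
    if d.alteration and d.reparandum:
        return "backward-looking"   # repair, reformulation, or false start
    return "forward-looking"        # hesitation or repetition

# Example (cf. (1c)): "The other one did, no, other ones did it."
d = Disfluency(start=["the"], reparandum=["other", "one", "did"],
               editing_term=["no"], alteration=["other", "ones", "did"],
               continuation=["it"])
print(classify(d))  # backward-looking
```

A full classifier distinguishing repairs from reformulations and false starts would of course need access to syntactic and semantic information about how the alteration relates to the reparandum.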

Desiderata for a theory of disfluencies
We now make some observations about disfluencies that a theory of their semantics and pragmatics must address.

4 Levelt 1983 refers to such disfluencies as covert repairs.

Disfluencies are recognized incrementally
As with many kinds of linguistic structure, the structure of a disfluency (as indicated in Figure 1) is not given en bloc, but rather must be recognized incrementally. The listener faces what Levelt 1983 called the continuation problem, which is roughly the problem of how to integrate the material from the alteration into the previous material; the solution of this problem requires computing what the reparandum is. Levelt (1983: 492) proposes rules based on lexical identity (word identity convention) and categorial identity (category identity convention). We will be proposing to add to these rules content-based conventions for identifying the reparandum. The semantics of the reparandum can also be more directly relevant to the semantics of the alteration, namely in cases where anaphora in the alteration involves reference to an entity introduced in the reparandum which is not meant to be repaired or corrected (i.e., the antecedent is part of the anticipatory retracing), as in the following examples:

(4) From Shriberg 1994: Our dog likes- he loves the beach.
(5) From Heeman & Allen 1999 (repeated from (1) above): "[in the fresh start in utterances 13.4-5] S replaces the proposal introduced in 9.1-13.2 with a new one, but in doing so he assumes that the engine at Avon, engine E1, is part of the common ground. If the repair process were to take place before discourse referents are established and reference resolution is performed, the referent would be removed, and we would end up with a pronoun without antecedent."
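Levelt's word identity convention lends itself to a simple procedural rendering: scan the original material backwards for a word identical to the first word of the alteration. This is a toy sketch under our own assumptions (the function name and fall-back behaviour are ours); the category identity convention would additionally require part-of-speech information, and the content-based conventions we propose require interpretation.

```python
from typing import List, Optional

def reparandum_onset(original: List[str], alteration: List[str]) -> Optional[int]:
    """Word identity convention (Levelt 1983): if the first word of the
    alteration is identical to a word in the original utterance, take the
    last such occurrence as the onset of the reparandum."""
    if not alteration:
        return None
    first = alteration[0]
    for i in range(len(original) - 1, -1, -1):
        if original[i] == first:
            return i
    return None  # no identical word: fall back to category identity, etc.

# (4) "Our dog likes- he loves the beach": the alteration starts with "he",
# which has no match in the original material, so the convention fails.
print(reparandum_onset(["our", "dog", "likes"], ["he", "loves"]))  # None
# A constructed retracing, "to the left side of the- to the right side":
print(reparandum_onset(["to", "the", "left", "side", "of", "the"],
                       ["to", "the", "right"]))  # 0
```

The failure on (4) is exactly the kind of case that motivates the category identity convention and the content-based conventions we add.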

Disfluencies have significant discourse effects
Recent psycholinguistic studies have shown that both the simple fact that a disfluency is occurring and its content can have significant discourse effects, which show up in listeners' behaviour. Bailey & Ferreira 2007 found that "filled pauses may inform the resolution of whatever ambiguity is most salient in a given situation", and Brennan & Schober 2001 found that in a situation with two possible referents, the fact that a description was self-corrected enabled listeners to draw the conclusion that the respective other referent was the correct one, before the correction was fully executed. Similarly, Arnold, Kam & Tanenhaus 2007 showed that during reference resolution what we call forward looking disfluencies allow listeners to infer that the speaker is having difficulty with lexical retrieval, which in a reference identification task leads listeners to look at those objects that are more difficult to name, a finding that has been replicated in a corpus study on more naturalistic dialogues reported in Schlangen, Baumann & Atterer 2009. (Interestingly, as Arnold, Kam & Tanenhaus 2007 report, the effect of disfluencies in making reference to difficult-to-describe objects more likely goes away if listeners are told their partners suffer from aphasia and have problems finding words.)

Disfluencies are related to other dialogue moves

Figure 2 illustrates the continuity between more typically described types of (discourse) correction and clarification on the one hand and disfluencies on the other. It shows (constructed) examples of "normal" discourse correction (a), two uses of clarification requests (b & c), correction within a turn (d), other-correction mid-utterance (e), and two examples of self-correction as discussed above (f & g). The first four examples clearly are instances of phenomena within the scope of discourse theories. What about the final two?
There are clear similarities between all these cases: (i) material is presented publicly and hence is open for inspection; (ii) a problem with some of the material is detected and signalled (i.e., there is a "moment of interruption"); (iii) the problem is addressed and repaired, leaving (iv) the incriminated material with a special status, but within the discourse context. That (i)-(iii) describe the situation in all examples in Figure 2 should be clear; that (iv) is the case also for self-corrections can be illustrated by the next example (repeated from above), which shows that self-corrected material is also available for later reference. We take this as evidence that it would be desirable to have a model that brings out these similarities between these phenomena, while respecting their differences.

Disfluencies are in the grammar
In the introduction, we already mentioned that grammarians have usually assumed that an analysis of disfluencies is outside the scope of the grammar; indeed their existence is an important motivation for the competence/performance distinction. The question of whether to include a set of linguistic utterance types X within the grammar has frequently preoccupied grammarians, but has rarely been addressed systematically. 6 We offer here various arguments for why the view of a disfluency-free grammar is untenable, though, as will become clear, the discussion raises some deep issues that we cannot resolve here.
For a start, it is instructive to think about disfluencies by analogy with friction. Non-disfluent speech is analogous to frictionless motion. Some of the time it is useful to ignore the effects of friction, but the theory of motion is required to explicate the existence and quantitative effects of friction. While it seems plausible that not all disfluencies are consciously produced by the speaker, for the addressee they always form part of the perceived string of phonemes, which needs to be parsed and interpreted.
More concretely, disfluencies display an important characteristic of grammatical processes, namely cross-linguistic variation. This has been documented in some detail in comparative work on morphosyntactic aspects of repair across a wide range of languages by Fox and collaborators (e.g., Fox, Hayashi & Jasperson 1996, Wouk et al. 2009, Fox, Maschler & Uhmann 2010)7 and in phonetic analysis of hesitation markers (Candea, Vasilescu & Adda-Decker 2005).8 Here we briefly note some evidence concerning hesitation markers and editing phrases. Concerning the former, we note that there is some variation in how hesitation is typically expressed in various languages. With respect to the latter, a child acquiring English needs to discover that no can be used in a self-correction, but, for instance, the closely related word nope cannot.

7 In a study of seven languages with significantly different typological characteristics, Wouk et al. 2009 find important correlations between the diversity of length in a language's lexicon and the site of repair initiation: for instance, Chinese displays a strong preference for initiating repair in monosyllabic words, in contrast to Japanese where the preference is for initiation in multisyllabic words. Fox, Maschler & Uhmann 2010 demonstrate significant differences across English, Hebrew, and German in the distribution of words where recycling (reutterance of a word, typically as a hesitation device) and replacement (repairs where the alteration is distinct from the reparandum, used in self-correction) occur: for instance, English's majoritarian category for recycling is the subject pronoun, whereas for both German and Hebrew it is the preposition; German replacement favours verbs and determiners, in marked contrast to English and Hebrew, which favour nouns. Patterns such as these seem strongly related to word order and complexity of inflectional morphology.
Similarly, a trilingual acquiring English, German, and French will need to learn that enfin can be used in a self-correction, whereas finally and schließlich, which are often interchangeable with enfin, cannot be so used. Conversely, we suggest that disfluencies are also involved in grammatical universals. We postulate the following:

(14) a. If NEG is a language's word that can be used as a negation and in cross-turn correction, then NEG can be used as an editing phrase in backward looking disfluencies.

These considerations indicate that the elements participating in disfluencies are subject to phonological, syntactic, and semantic constraints internal to individual languages, as well as exhibiting universal properties common to many languages. They strongly suggest, then, that disfluencies are part and parcel of grammatical systems of natural languages. Of course, part of the reluctance to accord disfluency-containing utterances the status of utterances internal to the grammatical system derives from the assumption that the task of grammar is to characterize the "well formed" utterances of a given language, which apparently implicates inter alia the fluency of such utterances. The force of this view has weakened with the increasing recognition that "grammaticality" is a gradable rather than a classifying notion (Keller 2000). Thus, A. Clark, Giorgolo & Lappin 2013 propose a gradient notion of grammaticality that arises via a set of scoring procedures for mapping the logprob value of a sentence on the basis of the properties of the sentence and the corpus containing the sentence. Such a view can be generalized into a view of grammar as a mechanism that enables us to characterize the coherently interpretable conversational events.

Previous work on disfluencies
Disfluencies have received a fair amount of attention both in psycholinguistics and in computational linguistics. In this section we give a brief overview of the most prominent approaches in these fields. To the best of our knowledge, none of the existing approaches has studied disfluencies from a semantic point of view, incorporated them into the grammar, and proposed a general framework that offers a treatment of disfluencies alongside other dialogue moves, as we shall propose here.
It is not surprising that computational linguists have been concerned with disfluencies, because automatic natural language understanding systems that deal with spoken input cannot succeed unless disfluencies can be handled. The main concern of computational linguists has been to detect and process disfluencies automatically. To this end, many corpus studies have been performed, which have provided very valuable information concerning the structural properties, the distributional characteristics, and the frequency of different types of disfluencies (Godfrey, Holliman & McDaniel 1992, Shriberg 1994, Besser & Alexandersson 2007). This information has been exploited to recognize disfluencies automatically, either by means of rules (McKelvie 1998, Core 1999) or by leveraging statistical information (Stolcke & Shriberg 1996, Heeman & Allen 1999).

Detecting the presence of disfluencies is of course only the first step in being able to handle them appropriately. In computational linguistics, the predominant approach to processing disfluencies after they have been detected has been to filter them out before or during parsing, prior to any process of semantic interpretation (Stolcke & Shriberg 1996, Heeman & Allen 1999, Charniak & Johnson 2001). While this kind of filtering approach may have practical advantages (as the interpretation module does not have to deal with disfluencies), theoretically such a model is implausible, given that rather long segments can be self-corrected, so that this model would entail the claim that interpretation can lag behind for arbitrarily long intervals, running against much evidence in psycholinguistics for the immediacy of interpretation (as we mentioned in Section 2.2.1). The filtering approach has therefore received strong criticism from authors in psycholinguistics (Lickley 1994).

Within psycholinguistics, researchers have looked into a wide variety of aspects related to disfluencies. From the point of view of language production, the main concern has been how speakers monitor and correct their speech (Levelt 1983, 1989, Van Wijk & Kempen 1987). Regarding language comprehension, some authors have investigated the pragmatic effects triggered by disfluencies (we have already mentioned several studies in Section 2.2.2 showing that disfluencies can lead listeners to draw inferences on the information state of the speaker), while others have been concerned with how disfluencies are recognized and processed by the human parser (e.g., Levelt 1983, Ferreira, Lau & Bailey 2004, Bailey & Ferreira 2007).
Clark initiated a line of research to which we add here, where disfluencies are considered genuine communicative acts used by speakers as part of their repertoire of strategies to achieve synchronisation (Herbert Clark 2002). For instance, Herbert Clark & Fox Tree 2002 claimed that filled pauses (in our terminology, forward looking disfluencies) are lexical items with the conventionalised meaning a short / slightly longer break in fluency is coming up.10 However, no semantic formalisation of Clark's seminal work has been given.
As we mentioned in Section 2.2.1, Levelt 1983 suggested syntactic conventions that would allow listeners to solve the continuation problem they face when a repetition or a repair (what we are calling backward looking disfluencies) is processed: what is the reparandum and where does the repair start? He proposes two syntactic constraints, word identity and category identity, that would guide listeners in identifying the onset of the reparandum. Word identity applies when the first word of the repair is identical to a word in the original utterance, which would then be taken as the point where the reparandum starts. Category identity is meant to apply in cases where there isn't an identical word but only a match in the syntactic category of a word in the original utterance and the first word of the repair. Levelt sees the interruption moment as a sort of coordinating connective: "The original utterance and the repair are, essentially, delivered as two conjuncts. The syntax of repairing is governed by a rule of well-formedness, which acknowledges this coordinating character of repairs." Levelt 1989, p. 499. Ferreira, Lau & Bailey 2004, building in part on Levelt's ideas, propose a more concrete model cast in the formalism of Tree Adjoining Grammar. Their "disfluency reanalysis" approach centres around a parsing operation of "Overlay". According to this approach, the incremental parser, upon encountering new material that cannot be attached to an existing node in the syntactic tree being constructed, attempts to overlay the tree corresponding to the alteration material on top of the reparandum tree. For this, the parser relies on recognizing root node identities between the syntactic trees of the reparandum and the alteration. The new tree prevails but, crucially, "[t]he reparandum tree has some effect on processing because it was not deleted but rather covered up with the replacement/repair tree. 
The unique bits of that tree are therefore still somewhat visible to the processor, and so they can affect its operations" (Ferreira, Lau & Bailey 2004: 742). This arguably accounts for some processing effects such as a "lingering" effect of the argument structure of a repaired verb.11 Since these proposals are strictly concerned with syntactic constraints, it is difficult to judge whether they could allow for some degree of transparency to reach the interpretation processing module. Nevertheless they are interesting because they leave open the possibility that the meaning of the disfluency and the reparandum could indeed influence the process of disfluency recognition (hence fulfilling one of our desiderata discussed above). However, both Levelt's and Ferreira and colleagues' models also seem to miss the similarities between self-correcting disfluencies and other types of corrections we have discussed above; they also cannot explain why it seems possible to take over the turn both in backward looking disfluencies and in forward looking ones, as was shown above.
As will become clear below, our approach incorporates the insights of these models regarding structural parallelism and makes a clear step forward by adding an account of the semantics of disfluencies which, in addition, connects them to other dialogue moves. We start by providing in the next section background on the dialogue framework we use here, namely KoS, describing in particular how this framework deals with "between-utterance" clarification moves (of the types (a)-(c) from Figure 2). In Section 4 we then sketch the (very few) extensions that are needed to capture disfluencies as well, which we will develop formally in section 5. We defer to future work the important tasks of specifying a grammar that can incorporate incremental parsing and interpretation of disfluency-containing utterances and the identification of reparanda.

Dialogue gameboards
KoS is formulated within the framework of Type Theory with Records (TTR; Cooper 2005, Cooper & Ginzburg 2014), a model-theoretic descendant of Martin-Löf Type Theory (Ranta 1994) and of situation semantics (Barwise & Perry 1983, Cooper & Poesio 1994, Ginzburg & Sag 2000). TTR enables one to develop a semantic ontology, including entities such as events, propositions, and questions. With the same means TTR enables the construction of a grammatical ontology consisting of utterance types and tokens, and of an interactional domain in which agents utilize utterances to talk about the semantic universe. What makes TTR advantageous for our dialogical aims is that it provides access to both types and tokens at the object level. This plays a key role in developing an account of metacommunicative interaction, as we shall see below, in that it enables simultaneous reference to both utterances and utterance types.
For current purposes, the key notions of TTR are the notion of a judgement and the notion of a record.
• The typing judgement: a : T, classifying an object a as being of type T.
• Records: A record is a set of fields assigning entities to labels of the form (16a), partially ordered by a notion of dependence between the fields: dependent fields must follow fields on which their values depend. A concrete instance is exemplified in (16b). Records are used here to model events and states, including utterances, and dialogue gameboards.12

12 Cooper & Ginzburg 2014 suggest that for events with even a modicum of internal structure, one can enrich the type theory using the "String theory" developed by Tim Fernando (e.g., Fernando 2007).

• Record Types: a record type is simply a record where each field represents a judgement rather than an assignment, as in (17).

The basic relationship between records and record types is that a record r is of type RT if each value in r assigned to a given label l_i satisfies the typing constraints imposed by RT on l_i. More precisely, the record {l1 = a1, l2 = a2, ..., ln = an} is of the type {l1 : T1, l2 : T2, ..., ln : Tn} iff a1 : T1, a2 : T2, ..., an : Tn. To exemplify this, (19a) is a possible type for (16b), assuming the conditions in (19b) hold. Record types are used to model utterance types (also known as signs) and to express rules of conversational interaction.

Armed with these basic logical notions, let us return to characterizing conversational states. On the approach developed in KoS, there is actually no single context: instead, analysis is formulated at the level of information states, one per conversational participant. The type of such information states is given in (20a), which shows the split into a dialogue gameboard and a private part of the information state. We leave the structure of the private part unanalyzed here (for details on this, see e.g., Larsson 2002) and focus on the dialogue gameboard, which represents information that arises from publicized interactions. Its structure is given in (20b). In this view of context:

• The spkr/hearer roles serve to keep track of turn ownership.
• FACTS represents the shared knowledge conversationalists utilize during a conversation. More operationally, this amounts to information that a conversationalist can use embedded under presuppositional operators.
• Pending: represents information about utterances that are as yet ungrounded.13 Each element of Pending is, for reasons explained below, a locutionary proposition, a proposition individuated by an utterance event and a grammatical type that classifies that event.
• Moves: represents information about utterances that have been grounded. The main motivation is to segregate from the entire repository of presuppositions information on the basis of which coherent reactions to the latest conversational move can be computed.

13 Here grounding (in the sense of Herbert Clark & Schaefer 1989, Herbert Clark 1996) refers to the process of establishing presuppositions that utterances are mutually understood.
• QUD (mnemonic for Questions Under Discussion): questions that constitute a "live issue". That is, questions that have been introduced for discussion at a given point in the conversation and have not yet been downdated. A query q updates QUD with q, whereas an assertion p updates QUD with p?. There are additional, indirect ways for questions to get added into QUD, the most prominent of which is during metacommunicative interaction (see below). Being maximal in QUD (MaxQUD) corresponds to being the current "discourse topic" and is a key component in the theory.
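A rough computational rendering of these notions may help fix intuitions: records as Python dicts, record types as mappings from labels to constraints, and a conversational rule as a function from gameboards to gameboards. This is a loose sketch of our own, not the TTR formalism itself, and the toy constraints grossly simplify the actual types:

```python
def of_type(record: dict, record_type: dict) -> bool:
    """A record r is of type RT iff each value in r assigned to a label
    satisfies the constraint RT imposes on that label (cf. the text)."""
    return all(label in record and check(record[label])
               for label, check in record_type.items())

# Toy rendering of the dialogue gameboard type (20b); the lambdas stand in
# for the real TTR types.
DGBType = {
    "spkr":    lambda v: isinstance(v, str),   # turn ownership
    "addr":    lambda v: isinstance(v, str),
    "facts":   lambda v: isinstance(v, set),   # shared assumptions
    "pending": lambda v: isinstance(v, list),  # ungrounded utterances
    "moves":   lambda v: isinstance(v, list),  # grounded utterances
    "qud":     lambda v: isinstance(v, list),  # questions under discussion
}

# A conversational rule maps gameboards to gameboards; here a rough
# rendering of Ask QUD-incrementation: if the latest move is a query,
# its question becomes (a) maximal element of QUD.
def ask_qud_incrementation(dgb: dict) -> dict:
    act, spkr, q = dgb["moves"][0]           # LatestMove, e.g. ("ask", "A", q)
    assert act == "ask"                      # precondition
    return {**dgb, "qud": [q] + dgb["qud"]}  # effect

dgb = {"spkr": "A", "addr": "B", "facts": set(),
       "pending": [], "moves": [("ask", "A", "who left?")], "qud": []}
print(of_type(dgb, DGBType))                 # True
print(ask_qud_incrementation(dgb)["qud"])    # ['who left?']
```

In KoS proper the preconditions and effects are themselves record types, and rule application is type checking rather than a Python predicate; the sketch merely mimics that precondition/effect anatomy.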
A conversational state c1 will be a record r1 such that (21) holds; in other words, r1 should have the make-up in (21a) and the constraints in (21b) need to be met.14 The basic units of change are mappings between dialogue gameboards that specify how one gameboard configuration can be modified into another on the basis of dialogue moves. We call a mapping between DGB types a conversational rule.15 The types specifying its domain and its range we dub, respectively, the preconditions and the effects, both of which are supertypes of DGBType. Examples of such rules, needed to analyze querying and assertion interaction and whose use is exemplified in (23) below, are given in (22):16

• QSPEC: this rule characterizes the contextual background of reactive queries and assertions: if q is MaxQUD, then subsequent to this either conversational participant may make a move constrained to be q-specific (i.e., either About or Influencing q).17

• Accept move: specifies that the background for an acceptance move by B is an assertion by A and the effect is to modify LatestMove.

• Fact Update / QUD Downdate: given an acceptance of p by B, p can be unioned into FACTS, whereas QUD is modified by the function NonResolve. NonResolve is a function that maps a partially ordered set of questions poset(q) and a set of propositions P to a partially ordered set of questions poset'(q).

We exemplify how these rules work in (23). Three comments on (23b) should be added, two specific and one methodological.

14 In the sequel we omit utterance times for simplicity.

15 We view the conversational rules as embodying the conversationalists' knowledge of dialogical semantics. However, as discussed in detail in Ginzburg 2012, some rules are clearly parameterized by indubitably pragmatic information, viz. information originating from the private part of the information state, for instance the conditions under which a question is downdated from QUD, exemplified in (23) below. This view of there being "dialogical semantics" seemingly deviates from certain conceptions of the semantics/pragmatics border, as pointed out to us by an anonymous reviewer for Semantics and Pragmatics, where traditionally semantics stopped at the turn boundary. We return to this issue, albeit briefly, in footnote 21.

16 These rules employ a number of abbreviatory conventions. First, instead of specifying the full value of the list Moves, we record merely its first member, which we call LatestMove. Second, the preconditions can be written as a merge of two record types DGBType− ∧merge PreCondSpec, one of which, DGBType−, is a supertype of DGBType and therefore represents predictable information common to all conversational rules; PreCondSpec represents information specific to the preconditions of this particular interaction type. Similarly, the effects can be written as a merge of two record types DGBType0 ∧merge ChangePrecondSpec, where DGBType0 is a supertype of the preconditions and ChangePrecondSpec represents those aspects of the preconditions that have changed. So we can abbreviate conversational rules as in (i); the unabbreviated version of Ask QUD-incrementation would be as in (ii).

17 We notate the underspecification of the turn holder as TurnUnderspec, an abbreviation for a specification which gets unified together with the rest of the rule.
One minor point is that B's acceptance is vague: we have assumed it involves accepting (3b) and (3a) and is neutral with respect to whether q0 has been exhaustively discussed. But clearly, it could also be interpreted as only accepting (3b), or as closing the discussion completely. A more significant point, which will apply to other examples we consider below, concerns the ordering on QUD. (23b) illustrates why QUD should not be viewed as a stack, but rather as a partially ordered set: (3b) addresses the initial question posed, not (directly) the issue of whether Peter is a good candidate, the most recently introduced issue. Data such as these, as well as data from multi-party dialogue, motivated Ginzburg 2012 to propose that when a question q is pushed onto QUD it does not subsume all existing questions in QUD, but rather only those on which q does not depend:

(24) q is QUD-maximal mod(dependence) iff for any q0 in QUD such that ¬Depend(q, q0): q ≻ q0.
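The insertion policy in (24) can be rendered as a toy procedure. This is a sketch under stated assumptions: questions are plain strings and `depends` is a hypothetical stand-in for the Depend relation, not the paper's semantic objects.

```python
# Sketch of (24): pushing q onto QUD as a partially ordered set rather
# than a stack. q comes to outrank only those questions on which it does
# not depend; any q0 with Depend(q, q0) keeps its rank above q.

def push_qud(qud, q, depends):
    """Insert q so that q precedes exactly the questions it does not depend on."""
    i = 0
    while i < len(qud) and depends(q, qud[i]):
        i += 1                       # stay below questions q depends on
    return qud[:i] + [q] + qud[i:]

# A polar question p? raised by an assertion depends on the wh-question
# it addresses, so it does not subsume that wh-question:
depends = lambda q, q0: (q, q0) == ("is Peter a good candidate?",
                                    "who should we hire?")
qud = push_qud(["who should we hire?"], "is Peter a good candidate?", depends)
```

With no dependence, the behaviour reduces to the familiar stack push; with dependence, the earlier wh-question remains maximal, as in the discussion of (23b).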

Ginzburg, Fernández, & Schlangen
This is conceptually attractive because it reinforces the assumption that the order in QUD has an intuitive semantic basis. One effect is to ensure that any polar question p? introduced into QUD, whether by an assertion or by a query, subsequent to a wh-question q on which p? depends, does not subsume q.18 A final, methodological point: (23b) exemplifies (an initial version of) KoS's theory of conversational relevance. Pretheoretically, conversational relevance relates an utterance u to an information state I just in case there is a way to successfully update I with u. Ginzburg 2010, 2012 defines two notions of relevance, a simpler one at the level of moves, i.e., illocutionary contents of utterances, as above, and a somewhat more complex one at the level of utterances.19 Thus, given the rules posited so far, (25b) is recognized as relevant as a follow-up to (25a), whereas (25c) is not; the theory discussed in Section 3.2 will accommodate the latter as relevant as well. Thus, one of the empirical tests of KoS, as with other theories of dialogue, is the class of utterances it can classify as relevant, akin to notions of generative capacity for theories of syntax.20

Grounding and clarification
Given a setup with DGBs as just described and associated update rules, distributed among the conversationalists, it is relatively straightforward to provide a unified explication of grounding conditions and the potential for Clarification Requests (or CRification) and (metacommunicative) correction.21 We explain how this can be done, while motivating in particular the information associated with the contextual field Pending. Schegloff 1987 points out

20 We thank an anonymous reviewer for Semantics and Pragmatics for raising this issue.

21 In line with our earlier discussion and responding to a query by an anonymous reviewer, we view the knowledge of the grounding conditions and potential clarification moves for a particular utterance type as part of an interlocutor's dialogical competence. Given that this competence draws heavily on grammatical knowledge, as will become clear below, we believe this justifies viewing this as semantic, for whatever it is worth.

How to characterize the relevance of such responses? The data we have just seen in (26)-(29) indicates that the search space for potential clarification questions is small. We will suggest that this can be modelled in terms of a small number of schemas of the form: "if u is an utterance and u0 is a constituent of u, add the clarification question CQ(u0) into QUD." To understand why, we first need to consider how utterances are integrated into the DGB.

In terms of the Dialogue GameBoard the issue can be formulated as follows: what information needs to be associated with Pending to enable the formulation of grounding conditions/CR potential? The requisite information needs to be such that it enables the original speaker to interpret and recognize the coherence of the range of possible clarification queries that the original addressee might make.
Ginzburg 2012 offers detailed arguments on this issue, including considerations of the phonological/syntactic parallelism exhibited between CRs and their antecedents and the existence of CRs whose function is to request repetition of (parts of) an utterance, see (26) above. Taken together with the obvious need for Pending to include values for the contextual parameters specified by the utterance type, Ginzburg concludes that the type of Pending combines tokens of the utterance, its parts, and of the constituents of the content with the utterance type associated with the utterance. An entity that fits this specification is the locutionary proposition defined by the utterance: in the immediate aftermath of a speech event u, Pending gets updated with a record of the form in (30a), of type locutionary proposition (LocProp). Here T_u is a grammatical type for classifying u that emerges during the process of parsing u. In the most general case, given the need to accommodate structural ambiguity, it should be thought of as a chart (Cooper 2012), but in the cases we consider here it can be identified with a sign in the sense of Head-Driven Phrase Structure Grammar (HPSG). The relationship between u and T_u, describable in terms of the proposition p_u given in (30b), can

be utilized in providing an analysis of grounding/CRification conditions, as shown in (31):23

(31) a. Grounding: p_u is true: the utterance type fully classifies the utterance token.
b. CRification: p_u is false, either because T_u is weak (e.g., incomplete word recognition) or because u is incompletely specified (e.g., incomplete contextual resolution, such as problems with reference resolution or sense disambiguation).
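The dichotomy in (31) can be illustrated with a minimal sketch: a speech event is a record of observed fields, a grammatical type a map from fields to tests, and the locutionary proposition is true just in case every required field is witnessed. The field names (`phon`, `georges_referent`) are our own illustrative assumptions, not the paper's sign structure.

```python
# Sketch of (31): p_u pairs a speech event u (a record) with a grammatical
# type T_u (field -> test). p_u is true iff every field T_u requires is
# present in u and passes its test; otherwise CRification targets the
# failing field.

def check_locutionary(u, T_u):
    """Return ('grounded', None) or ('CRify', failing_field)."""
    for field, test in T_u.items():
        if field not in u or not test(u[field]):
            return ("CRify", field)
    return ("grounded", None)

T_u = {"phon": lambda p: p == "is georges here",
       "georges_referent": lambda x: x is not None}

# Full witnessing: the type classifies the token, so u is grounded.
ok_status, _ = check_locutionary(
    {"phon": "is georges here", "georges_referent": "g27"}, T_u)

# Missing referent witness: incomplete contextual resolution, so the
# addressee would launch a CR about the corresponding sub-utterance.
cr_status, cr_field = check_locutionary({"phon": "is georges here"}, T_u)
```

The returned failing field is what, on the account in the text, determines which sub-utterance the clarification question is about.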
It is useful to conceive of the integration of an utterance in an information state as a potentially cyclic process. Instantiation of some, perhaps all, contextual parameters will occur as soon as an utterance has taken place, assuming T_u is uniquely specified; if this is not the case, then CRification can occur at that level. Parameter instantiation can also take place subsequently, as when more information is provided as a consequence of CRification. Given this, utterance integration can be broken into three components. We exemplify this series of contextual updates in (32):

a. An utterance type akin to an HPSG sign; we subsequently call this type IGH.

b. A locutionary proposition whose situational component is u0 (with four sub-utterances u_is, u_georges, u_here, u_is-georges-here) and whose type component is IGH, with sit-type In(l, g).

c. A DGB in the immediate aftermath of an utterance classified by the type IGH; we note for future reference also certain utterance-related presuppositions that must be in place: the fact that u0 is the most recent utterance and the existence of appropriate witnesses for the contextual parameters l and g, corresponding to the sub-utterances here and Georges; FACTS thus includes In(l, {A,B}) and Named(Georges, g).

e. A witness w0 for the contextual parameters of IGH.

f. The evolution of the DGB after using the rule of Contextual extension with the witness w0.

We concentrate here on characterizing the range of possible CRs, specifically intended content CRs (28); analogous remarks apply to other types of CRs. The non-sentential CRs in (33a) and (33b) are interpretable as in the parenthesized readings. This provides justification for the assumption that the context that emerges in clarification interaction involves the accommodation of an issue, one that for A's utterance in (33), assuming the sub-utterance Bo is at issue, could be paraphrased as (33c). The accommodation of this issue into QUD could be taken to licence any utterances that are co-propositional with this issue, where co-propositionality is the relation between utterances defined in (34). This will also allow as relevant responses corrections, as in (33d):

(34) Co-propositionality
a. Two utterances u0 and u1 are co-propositional iff the questions q0 and q1 they contribute to QUD are co-propositional.
Co-propositionality for two questions means that, modulo their domain, the questions involve similar answers. For instance whether Bo left, who left, and which student left (assuming Bo is a student) are all co-propositional. In the current context, co-propositionality amounts to either a CR which differs from MaxQUD at most in terms of its domain, or a correction: a proposition that instantiates MaxQUD.24

24 Recall from the assertion protocol that asserting p introduces p? into QUD.

We also note one fairly minor technical modification to the DGB field QUD, motivated in detail in Fernández 2006 and Ginzburg 2012, assuming one wishes to exploit QUD to specify the resolution of non-sentential utterances such as short answers, sluicing, and various other fragments. QUD tracks not simply questions qua semantic objects, but pairs of entities: a question and an antecedent sub-utterance. This latter entity provides a partial specification
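The "modulo their domain" clause of (34) can be given a toy rendering in which a question is a pair of a domain (its abstracted variables) and a body. This is a deliberately crude sketch: real co-propositionality compares answerhood conditions, whereas here bodies are just required to unify once variables are ignored, and all names are hypothetical.

```python
# Toy rendering of (34): questions as (domain, body) pairs, where body is
# (predicate, argument) and domain lists the abstracted variables
# (marked "?"). Modulo the domain, co-propositional questions involve
# similar answers; here we approximate that by unifying the bodies.

def co_propositional(q1, q2):
    (_, (pred1, arg1)), (_, (pred2, arg2)) = q1, q2
    def is_var(a):
        return a.startswith("?")
    return pred1 == pred2 and (is_var(arg1) or is_var(arg2) or arg1 == arg2)

whether_bo_left = ([], ("leave", "bo"))          # polar question
who_left        = (["?x"], ("leave", "?x"))      # wh-question
which_student   = (["?x"], ("leave", "?x"))      # restricted wh-question
whether_bo_sang = ([], ("sing", "bo"))
```

On this rendering whether Bo left, who left, and which student left all come out co-propositional, matching the example in the text, while questions with distinct predicates do not.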

of the focal sub-utterance, and hence it is dubbed the focus establishing constituent (FEC) (cf. the parallel element in higher-order unification-based approaches to ellipsis resolution, e.g., Gardent & Kohlhase 1997). Thus, the FEC in the QUD associated with a wh-query will be the wh-phrase utterance, the FEC in the QUD emerging from a quantificational utterance will be the QNP utterance, whereas the FEC in a QUD accommodated in a clarification context will be the sub-utterance under clarification. Hence the type of QUD is InfoStruc, as defined in (35).

Parameter Identification (36) underpins CRs such as (37b)-(37c) as follow-ups to (37a). We can also deal with corrections, as in (37d): B's corrective utterance is co-propositional with λx.Mean(A, u0, x), and hence allowed by the specification.25

25 In the case of singleton values for the FEC we will typically abuse notation and identify the set by its single member.

To exemplify our account of how CRs get integrated in context, we show in Figure 3 how the same input leads to distinct outputs on the "public level" of information states. In this case this arises due to differential ability to anchor the contextual parameters. The utterance u0 has three sub-utterances, u1, u2, u3, given in Figure 3 with their approximate pronunciations. A can ground her own utterance since she knows the values of the contextual parameters, which we assume here for simplicity include the speaker and the referent of the sub-utterance Bo. This means that the locutionary proposition associated with u0 (the proposition whose situational value is a record that arises by unioning u0 with the witnesses for the contextual parameters, and whose type is given in Figure 3) is true. This enables the "canonical" illocutionary update to be performed: the issue whether b left becomes the maximal element of QUD. In contrast, let us assume that B lacks a witness for the referent of Bo.
As a result, the locutionary proposition associated with u0 which B can construct is not true. Given this, B uses the CCUR Parameter Identification to build a context appropriate for a clarification request: B increments QUD with the issue λx.Mean(A, u2, x), and the locutionary proposition associated with u0 which B has constructed remains in Pending. The final generalizations we need to make are along two dimensions. First, whereas for semantically based CRification it is sufficient to think

26 This is modelled after the proposal of Purver 2004 for analyzing cases such as (i), which he calls fillers:

(Figure 3 contrasts the speaker's and the addressee's witnesses for the dgb-params and their resulting DGB updates.)

So far, the only non-grounding action we have considered is clarification interaction, in which there is a missing witness for a contextual parameter or phonological type. This triggers a query for that information and a unification of the required information into the representation of the utterance. (Metacommunicative) corrections are a variant on this theme: instead of a missing witness, they involve (pointing out) an incorrect witness, which needs to be replaced by the correct value. As we pointed out above, we have an account for the coherence of content-oriented corrections (see (23)) and metacommunicative ones (see (37d)); what remains to specify for the latter is the effect on the DGB.27 One possible means of unifying the update and downdate/replacement associated with clarification interaction and corrections, respectively, would be to use an operation such as asymmetric unification, in which later information takes precedence. Such a logical operation, named priority union, is specified by Grover et al. 1994, who exemplify a number of its uses. Given the complexity of this operation, however, we postulate an additional update operation, which effects replacements of the desired kind:

27 We do not offer an account here of how dialogue participants actually decide the intended content of a correction if more than a single interpretation is possible in principle. Our basic strategy is to assume that it is sufficient to be able to represent all possible choices, leaving the actual mechanism of choice to an external processing account.
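The replacement update can be sketched very compactly: in place of full priority union over feature structures, a field-wise asymmetric merge over flat records already shows the intended behaviour, namely that later information overrides earlier information. The record fields below are illustrative assumptions.

```python
# Sketch of the replacement update: later information takes precedence
# field-by-field in the pending utterance record (a flat-dict stand-in
# for priority union in the sense of Grover et al. 1994).

def pending_replace(pending_record, correction):
    """Asymmetric merge: values from `correction` take precedence."""
    return {**pending_record, **correction}

u0 = {"phon": "is georges here", "referent": "georges"}
# A correction "I meant Jacques" replaces the offending witness:
v0 = pending_replace(u0, {"phon": "is jacques here", "referent": "jacques"})
```

Note that the operation is non-destructive: the original record u0 survives unchanged, matching the observation below that the corrected utterance remains available in FACTS for subsequent reference.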

To exemplify this, we consider the cross-turn self-correction example in (41). In more detail: after the utterance of Is Georges here, A's FACTS will include the presuppositions that the most recent speech event is u0 (Is Georges here), which includes as sub-utterance u_georges, and that u0 is classified by the type IGH; the DGB's FACTS thus include In(l, {A,B}), Named(Georges, g), Classify(IGH, u0), and so on. This allows Parameter Identification to be used: the issue What did A mean by u_georges becomes MaxQUD, with Georges as FEC. This licences as LatestMove I meant Jacques, which in turn leads to an update of QUD; FACTS now include Named(Georges, georges), Named(Jacques, jacques), Classify(I meant Jacques, u1), and so on.
Accepting this gives rise to an application of Pending replacement, which modifies the original locutionary proposition: u0 is modified to a record v0 with the referent jacques replacing georges, and the utterance type is now IJH (Is Jacques here?), whose phon includes the form jacques; the maximal element of Pending, MaxPending, is modified accordingly, with FACTS including Classify(I meant Jacques, u1), Named(Jacques, jacques), and so on. As can be readily observed, the utterance u0 is still a component of facts in FACTS, and hence so is its sub-utterance u_georges. Neither utterance is a component of Pending, whose content will be subject to uptake in the next utterance. Given that they are in FACTS, referential possibilities to those two utterances (Is Georges here and Georges), and to the referent of Georges, are not eliminated.
4 From clarification requests to disfluency: Informal sketch

The approach described above for CRs and self/other-corrections at a cross-turn level extends relatively seamlessly to self-corrections, hesitations, and other types of intra-turn disfluencies. Before going into the technical details, we sketch the account at an informal level, indicating some of its main consequences.
As we pointed out above, the main idea underlying KoS's theory of CRs is that in the aftermath of an utterance u a variety of questions concerning u, definable from u and its grammatical type, become available to the addressee of the utterance. These questions regulate the subject matter and ellipsis potential of CRs concerning u and generally have a short lifespan in context.
We propose that a very similar account applies to disfluencies. As the utterance unfolds incrementally, there arise questions about what has happened so far (e.g., what did the speaker mean with sub-utterance u1?) or about what is still to come (e.g., what word does the speaker mean to utter after sub-utterance u2?). Slightly more technically, we suggest that certain utterance monitoring and utterance planning questions can be pushed onto QUD incrementally.
By making this assumption we obtain a number of positive consequences. We can: i. explain similarities to other-corrections: the same mechanism is at work, differentiated only by the questions that get accommodated.
ii. explain how the other can take over & do the second part of the disfluency: if what did I want to say / what do I want to say next is indeed a question under discussion, then it should in principle also be possible for the interpreter to address that.
iii. explain how inferences can be drawn from the disfluency: once the question what do I want to say next has been pushed onto QUD, the addressee can ask why did he raise that question?, just as she can with any other question that someone raises. Often a good answer is because he really doesn't know, and a good reason for that could be that it is indeed difficult to know, which makes sense for this thing here, which doesn't really have a good name, as opposed to that thing over there, which can be named easily. This would also explain the finding of Arnold, Kam & Tanenhaus 2007, namely that if you explain to subjects that the speaker has a pathology that makes it hard for them to remember names for things, the inference that uh uh means that they are trying to describe the thing that is hard to describe (largely) goes away (see Section 2.2.2). On our approach, this would then simply no longer be a good answer to the question why did he raise that question.
iv. explain the internal coherence of disfluencies: #I was a little bit + swimming is an odd disfluency; it can never mean I was swimming in the way that I was a little bit + actually, quite a bit shocked by that means I was quite a bit shocked by that. Why? Because swimming is not a good answer to What did I mean to say when I said a little bit?.
v. explain why a reformulation can implicate that the original use was unreasonable: examples like (45) involve quantity implicatures. These can be explicated based on reasoning such as the following: I could have said (reparandum), but on reflection I said (alteration), which differs only in filtering away the requisite entailment.

An incremental perspective
As we have seen, quite a number of benefits accrue from integrating CRs and disfluencies within one explanatory framework. Still, attractive as this might be, there is some technical work to be done.
In fact, the only modification we make is to extend Pending to incorporate utterances that are in progress, and hence incompletely specified semantically and phonologically. This presupposes the use of a grammar which can associate syntactic types and contents on a word-by-word basis. For dialogue this is a move that has extensive motivation (for a review see e.g., H. Rieser & Schlangen 2011a, and for detailed evidence the papers in H. Rieser & Schlangen 2011b). There is by now a long tradition within certain grammatical frameworks of specifying grammars to ensure incremental processing, emanating from Categorial Grammar and Lexicalized Tree Adjoining Grammar, and continued in frameworks such as Dynamic Dependency Grammar (Milward 1994) and Dynamic Syntax (Kempson, Meyer-Viol & Gabbay 2000). From a semantic point of view, as emphasized by Milward 1994, one of the main requirements is that

a non-trivial semantic representation is built word by word . . . What constitutes a non-trivial representation is debatable. The position taken here is that it must use all the information given so far. Thus, an acceptable representation for the sentence fragment John likes would be λx.like(john, x), but not a semantic product such as john * λ(x, y). (Milward 1994: 569)

Specifying a grammatical framework of the required kind constitutes a paper in its own right. Nonetheless, the closest in spirit is recent work on incremental semantic construction for dialogue by Peldszus, Buß, et al. 2012, based on the framework of Robust Minimal Recursion Semantics (RMRS; Copestake 2007), which enables predicate-argument structure to be underspecified. Peldszus and Schlangen formulate and implement an algorithm for interpreting an incrementally provided syntactic representation in a top-down, left-to-right fashion. They argue for this strategy (as opposed to, e.g., a bottom-up one) as it provides monotonic semantic interpretation that gets further specified as each word is encountered.
Concretely for us, this means that the elements of constits, the potential objects of repair, have their syntactic and semantic classifications constructed monotonically, as long as no repair act occurs.
Here we illustrate their account with one of their examples, reformulated using TTR, simplified and modified in various respects, in particular abstracting away from one of their main contributions, the semantic combinatorics.28 In the example that follows (syntax in Figure 4, semantics in (46)), semantic material added by a given word after the initial word is in boldface. The imperative verb take introduces both illocutionary force and a predicate with two roles, one of which is identified with the addressee; the demonstrative determiner introduces a contextual parameter which is identified with the role of the object taken (the label y); book introduces a restriction on that contextual parameter; in introduces a descriptive predicate with two roles, one of which is identified with y.
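The word-by-word growth of content just described can be mimicked with a toy lexicon in which each word contributes a few fields to a growing record and nothing is ever retracted. This is a hypothetical, drastically simplified rendering of (46), not RMRS: the field names and the flat-record representation are our own assumptions.

```python
# Simplified, hypothetical rendering of the incremental construction in
# (46): each word monotonically adds fields to a growing content record,
# never retracting earlier material (as long as no repair occurs).

LEXICON = {
    "take": {"force": "imperative", "pred": "take", "taker": "addressee"},
    "that": {"object": "?y"},                    # contextual parameter y
    "book": {"restriction": "book(?y)"},         # restriction on y
    "in":   {"loc_pred": "in", "loc_arg": "?y"},
}

def incremental_content(words):
    """Return the sequence of partial contents, one per word processed."""
    content, stages = {}, []
    for w in words:
        content = {**content, **LEXICON[w]}      # monotonic extension
        stages.append(dict(content))
    return stages

stages = incremental_content(["take", "that", "book", "in"])
```

The monotonicity check in the test below is the point of the example: every stage's content is preserved in the next, mirroring the top-down left-to-right strategy's guarantee that interpretation only gets further specified.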
For our current purposes, the decisions we need to make can be stated independently of the specific grammatical formalism used. The main assumptions we are forced to make concern Pending instantiation and contextual instantiation and more generally, the testing of the fit between the speech events and the types assigned to them. We assume that this takes place incrementally. For concreteness we will assume further that this takes place word by word, though examples like (47), which demonstrate the existence of word-internal monitoring, show that this is occasionally an overly strong assumption.

Backward looking disfluencies
Our analysis now distinguishes between backward looking disfluencies (BLDs) and forward looking disfluencies (FLDs). BLDs, we assume, are possible essentially at any point where there is "correctable material"; technically this amounts to Pending not being empty. We assume that editing phrases are, at least in some cases, contentful constituents of the repair. This is implemented by the rule in (48), Backward Looking Appropriateness Repair. Given that u0 is a constituent in MaxPending, it is possible to accommodate as MaxQUD the following InfoStruc: the issue is what did A mean by u0, whereas the FEC is u0; this specifies that the follow-up utterance needs to be co-propositional with MaxQUD.
(48) Backward Looking Appropriateness Repair

In short, this rule, which is equivalent to Parameter Identification (36) apart from underspecifying the turn holder, allows us to analyse the alteration (and the editing terms, if present) of a BLD as providing an answer to an issue that has been accommodated as MaxQUD and whose FEC corresponds to the reparandum of the disfluency. Since the rule leaves the next turn-taker underspecified, it can also deal with other-corrections and content CRs, such as those in (37b)-(37d).
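The effect of (48) can be sketched as a single gameboard update. As before, this is an illustrative toy over dict-based gameboards with hypothetical field names, not the TTR rule itself; the point is only that the accommodated issue carries the reparandum as its FEC and that the turn holder is left open.

```python
# Sketch of (48): given a sub-utterance u0 in MaxPending, accommodate as
# MaxQUD the issue "what did A mean by u0?" with u0 as FEC. The next
# turn-holder is left underspecified, so one rule covers self-repair,
# other-correction, and content CRs alike.

def backward_looking_repair(dgb, u0):
    assert u0 in dgb["max_pending"]["constits"], "u0 must be correctable"
    issue = {"q": f"what did A mean by '{u0}'?", "fec": u0}
    new = dict(dgb)
    new["qud"] = [issue] + dgb["qud"]        # accommodated as MaxQUD
    new["turn"] = "underspecified"           # self- or other-repair
    return new

dgb = {"max_pending": {"constits": ["take", "that", "book", "in"]},
       "qud": [], "turn": "A"}
dgb2 = backward_looking_repair(dgb, "in")
```

An alteration such as I meant from then counts as coherent simply because it is co-propositional with the accommodated MaxQUD, exactly as a short answer is.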
To make all this clearer, we consider an example in detail. We emphasize that this treatment is almost identical to example (41) we discussed in Section 3.2; the sole difference here is that the self-correction occurs mid-utterance and, hence, necessitates using an incremental content (the one from (46d)).

(49)
Take that book in I mean from the shelf

A utters Take that book in. Backward Looking Appropriateness Repair licences the accommodation of What did A mean by uttering in? as MaxQUD, which in turn licences I meant from as an utterance co-propositional with MaxQUD. Subsequent to this, Pending Replacement applies and the utterance continues.
In detail: after the utterance of Take that book in, A's FACTS will include the presuppositions that the most recent speech event is u0 (Take that book in), which includes as sub-utterance u_in; the DGB is essentially the one in (50), whose MaxPending has sit-type T_Take-that-book-in.
This allows Backward Looking Appropriateness Repair to be used. Its effects are shown in (52): the issue What did A mean by u_in becomes MaxQUD, with the reparandum in as FEC. This licences as LatestMove I meant from.
Accepting this gives rise to an application of Pending replacement, which modifies the original locutionary proposition: u0 is modified to a record v0 with the relation from replacing in, and the utterance type is now T_Take-that-book-from, whose phon includes the form from; MaxPending is modified accordingly. We now turn to a slightly different example that can be analysed in essentially the same way as (49). Whereas in (49) the editing terms I mean plus the alteration from the shelf form a canonical sentential structure, in (54) the alteration headphones is non-sentential. We assume this non-sentential utterance is interpreted in precisely the same way as a short answer like (55) (see e.g., Ginzburg & Sag 2000, Fernández 2006, Ginzburg 2012). After the application of Backward Looking Appropriateness Repair, the issue What did A mean with the utterance earphones becomes QUD-maximal, with earphones as FEC. This licences the bare fragment headphones, which gets the reading I mean headphones. This analysis would extend to the following example, due to Levelt (1989), with MaxQUD.q = what did A mean by FEC? and the FEC = to the right (the occurrence after and):

(56) To the right is yellow, and to the right-further to the right is blue.

Our analysis presupposes that the addressee is able to compute the question to be accommodated and its FEC once she has processed the reparandum, on the basis of (syntactic) parallelism between reparandum and alteration. The rule-governed nature of this process has been argued for previously by Levelt 1989, who posited a well-formedness (coordination) rule which he argued disfluencies need to observe29 (see also Hindle 1983, Morrill 2000). That this task facing the addressee is computable is clear given that one can automatically filter disfluencies with rule-based disfluency parsers that essentially rely on identifying (and removing) the reparandum (see e.g., Charniak 2004 and Schuler 2008).
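The core operation such filters perform, dropping the reparandum together with the editing phrase, is trivial once the spans are identified; the hard part, which real parsers automate and which we simply assume here as annotated input, is locating those spans. A minimal sketch:

```python
# Toy reparandum filter in the spirit of rule-based disfluency parsers:
# given (annotated) spans for the editing phrase and the reparandum,
# remove both, leaving the repaired utterance. Span detection, the hard
# part for real systems, is assumed as input here.

def filter_disfluency(words, edit_start, edit_len, reparandum_len):
    """Drop the reparandum and the editing phrase from a word list."""
    start = edit_start - reparandum_len
    return words[:start] + words[edit_start + edit_len:]

utt = "take that book in i mean from the shelf".split()
repaired = filter_disfluency(utt, edit_start=4, edit_len=2, reparandum_len=1)
```

Applied to (49), the filter replaces the reparandum in with the alteration from, yielding the repaired utterance take that book from the shelf; as the discussion of (57) below shows, this purely syntactic substitution does not cover all cases.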

Some more BLD examples
We consider some more examples which do not, we think, require any modification to our basic analysis, but which point to some other interesting empirical issues. The first example we consider is (57). This differs from (49) in one significant way: a different editing phrase is used, namely no, which has distinct properties from I mean.30

(57) From yellow down to brown - no - that's red. (from Levelt 1989)

Whereas I mean is naturally viewed as a syntactic constituent of the alteration, no cannot be so analyzed. There are two obvious ways to analyze no's role. The most parsimonious would be to assimilate it to uses like (58), where the resolution is based on a contextually available polar question or proposition.31

29 Though see Van Wijk & Kempen 1987 and Cori, De Fornel & Marandin 1997 for evidence that this rule can be overridden, as well as our own discussion of this issue below.
30 An anonymous reviewer for Semantics and Pragmatics points to a potentially tricky (constructed) example involving I mean as editing phrase, namely (i).
(i) A: What flavour is it? B: It's bl- I mean, it's raspberry.
S/he suggests that "[I]t's not clear that there is a "sub-utterance" bl-in any interesting sense", thereby raising the issue how our approach would handle this, e.g., by considering what the speaker meant by it's bl-. We are not convinced that there isn't a sub-utterance to serve as an antecedent in this case. If B stops after bl, A could follow up and ask What did you start saying? or even Blackberry? or perhaps Blackcurrant? Rather, the grammatical type characterizing this sub-utterance is of necessity very underspecified, an underspecification that is, in principle, straightforward to effect in the typed sign-based grammar assumed here.
31 Recall the conversational rules (22a).

In order to adopt such an analysis, we would need to motivate the emergence of the requisite polar question or proposition, e.g., Is u0 what I meant to say?. The most obvious way of doing that would be to postulate a variant of (48) where this was the MaxQUD. There is nothing clearly wrong with such an approach, which would have the benefit of capturing the widespread use of negative discourse particles across languages for this function too. Nonetheless, apart from being somewhat ad hoc, this approach would also require some additional machinery to explain the coherence of the part of the alteration following no. In the case of (58a), one can appeal to two explanations for why Mary is is uttered: in some cases Bill is accented, and this justifies the independent assumption that the issue of who is coming is MaxQUD; there are also (complementary) considerations of cooperativity relative to A's original query. The former consideration does not apply in the case of (58b), whereas the latter does, with cooperativeness being replaced by goal persistence: persisting in producing the utterance for whatever reason motivated it in the first place. An alternative analysis, which would avoid postulating an additional conversational rule, would involve instead positing an additional meaning for no, which is arguably needed for other uses. This would, in particular, allow no to be used to express a negative attitude towards an unintended utterance event. We could analyze (57) as involving the utterance brown. Following this, the rule (48) is triggered with the specification MaxQUD.q = what did A mean by FEC? and the FEC = brown. The analysis then proceeds like the earlier cases. Nonetheless, there is an additional issue which this case does bring out: the alteration (that's red) is sentential rather than directly parallel to the reparandum. This fits nicely with viewing the alteration as an answer to a question.
It is indeed a counterexample to an overly syntactic view of self-correction, as embodied in Levelt's rule. This also means that the repaired utterance is not, in fact, a grammatical utterance if one filters away the reparandum (*From yellow down to that's red).32 Hence, just as with a clarification interaction case such as (61), one has to assume an additional inference process that leads from the provision of the answer to the triggering of Pending replacement (Pending extension in the case of (61) after and), and to interpreting the alteration as a short answer. What is interesting about this case is that the reparandum it is is not a constituent. This exemplifies our earlier suggestion that the elements of Pending need not always be viewed as constituents, but rather as elements of a chart.

Forward looking disfluencies
Forward Looking Disfluencies are distinct from their backward-looking cousins in one significant way, on our view: they require an editing phrase, one whose import is the existence of a soon-to-be-uttered word. We will presently offer a lexical entry for um, inspired in part by Herbert Clark & Fox Tree 2002 and Horne 2012, who argue that filled pauses are conventionally used interjections.
We specify FLDs with the update rule in (65): given a context where the LatestMove is a forward looking editing phrase by A, the next speaker, underspecified between the current one and the addressee, may address the issue of what A intended to say next by providing a co-propositional utterance.36

(65) Forward Looking Utterance Rule

(65) differs from its BLD analogue in two ways. First, its preconditions involve the LatestMove having as its content what we describe as an FLDEdit move, which we elucidate shortly. Words like uh and thee will be assumed to have such a force; hence the utterance of such a word is a prerequisite for an FLD. A second difference concerns parallelism: for BLDs it is intuitive that parallelism exists between reparandum and alteration (with caveats, as with example (57)), given that one is replacing one sub-utterance with another that is essentially of the same type. For FLDs, however, there is no such intuition: what is taking place is a search for the word after the reparandum, which has no reason to be parallel to the reparandum. Hence in our rule (65), the FEC is specified as the empty set.
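The two differences from the BLD rule, the editing-phrase precondition and the empty FEC, can be made concrete in the same dict-based toy used earlier. All field names and the sample utterance content are hypothetical illustrations, not the paper's formal rule.

```python
# Sketch of (65): an FLD editing word (e.g. "uh") is a precondition; the
# effect pushes "what did A mean to say after u0?" onto QUD with an EMPTY
# FEC (no reparandum/alteration parallelism), and the next turn-holder is
# underspecified, so the addressee may complete the speaker's search.

FLD_EDIT_WORDS = {"uh", "um", "thee"}

def forward_looking_rule(dgb):
    last = dgb["moves"][0]                   # LatestMove
    assert last["word"] in FLD_EDIT_WORDS, "LatestMove must be an FLD edit"
    issue = {"q": f"what did A mean to say after '{last['after']}'?",
             "fec": set()}                   # empty FEC: no parallelism
    new = dict(dgb)
    new["qud"] = [issue] + dgb["qud"]
    new["turn"] = "underspecified"
    return new

dgb = {"moves": [{"word": "uh", "after": "take that"}],
       "qud": [], "turn": "A"}
dgb2 = forward_looking_rule(dgb)
```

Contrasting this with the BLD sketch, the only substantive differences are exactly the two named in the text: the precondition on the editing word and the empty FEC.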
To make things explicit, we assume that uh could be analyzed by means of the lexical entry in (66). We demonstrate how to analyze (67). This means we could unpack (69) in a number of ways, most obviously by making explicit the utterance-to-be-produced u1, representing this roughly as in (70). This opens the way for a more "pragmatic" account of FLDs, one in which (65) could be derived rather than stipulated. Once a word is uttered that introduces FLDEdit(A,B,u0) into the context - in other words, has an import like (70) - this leads to a context akin to ones like (71). Such contexts licence inter alia elliptical constructions like sluicing and pronominal anaphora, tied as they are to an existential quantifier in the semantic representation. Indeed a nice consequence of (65), whether we view it as basic or derived, is that it offers the potential to explain cases like (72) where, in the aftermath of a filled pause, an issue along the lines of the one we have posited as the effect of the conversational rule (65) actually gets uttered. On our account such utterances are licenced because these questions are co-propositional with the issue what did A mean to say after u0. This suggests that a different range of such questions will occur depending on the identity of (the syntactic/semantic type of) u0. To test whether this is indeed the case, we ran a corpus study on the spoken language section of the BNC, using the search engine SCoRE (Purver 2001) to search for all self addressed queries. Representative examples are in (73) and the distribution is summarized in Table 1. Table 1 indicates that self addressed queries occur in a highly restricted set of contexts, above all where an NP is anticipated and after the.
Moreover, the distribution of such queries across these contexts varies manifestly: the anticipated NP contexts involve predominantly a search for a name or for what the person/thing is called, with some who-questions as well, whereas the post-the contexts only allow what questions, predominantly of the form what does X call Y; anticipated location NP contexts predominantly involve where questions. The final two classes identified are somewhat smaller, so generalizations there are less robust; nonetheless, the anticipated predicative phrase and post-say contexts seem to involve quite distinct distributions from the other classes mentioned above. [We are grateful to an anonymous reviewer for alerting us to this issue and to the related issue of whether any question, in principle, would do, as long as it would ultimately lead to the right answer.]

Table 1: Self addressed questions in disfluencies in the British National Corpus

With respect to self addressed queries, we have so far suggested that their coherence is accounted for directly on the basis of the conversational rule that licences utterances that are co-propositional with the question what did A mean to say after u0. This captures an analogy with the coherence of clarification questions by B after a (completed) utterance by A.
Self addressed queries also highlight another feature of KoS's dialogue semantics: the fact that a speaker can straightforwardly answer their own question; indeed, in these cases the speaker is the "addressee" of the query. Such cases are handled easily in KoS because turn taking is abstracted away from querying: the conversational rule QSPEC, introduced earlier as (22b), allows either conversationalist to take the turn given the QUD-maximality of q. This contrasts with a view of querying derived from Speech Act Theory (e.g., Searle 1969), still widely assumed (see e.g., Asher & Lascarides 2003), where there is a very tight link to intentional categories of 2-person dialogue (... Speaker wants Hearer to provide an answer ... Speaker does not know the answer ...).
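The turn-neutrality of QSPEC can be brought out with a minimal sketch, again our own simplification rather than the rule's actual TTR formulation: QSPEC constrains what may be said (something specific to the QUD-maximal question), not who may say it.

```python
# Minimal sketch of QSPEC's turn-neutrality: the rule checks that a move
# addresses the QUD-maximal question q; it imposes no condition of the
# Speech Act Theory kind ("Speaker wants Hearer to provide an answer").
def qspec_allows(qud, speaker, move):
    """License a move iff it is about the maximal question under
    discussion. The speaker parameter is deliberately unused: either
    conversationalist may take the turn."""
    return bool(qud) and move["about"] == qud[0]

qud = ["what's it called?"]   # a self addressed query uttered by A
# Crucially, A may answer her own question, just as B may:
for speaker in ("A", "B"):
    assert qspec_allows(qud, speaker, {"about": "what's it called?"})
```

The design point is that turn allocation and question semantics are factored apart, which is precisely what makes the speaker-as-"addressee" cases unproblematic.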

Conclusions
In this paper we have developed an account of the semantics of disfluencies. Our account distinguishes two types of disfluencies. Backward Looking Disfluencies (BLDs) are disfluencies where the moment of interruption is followed by an alteration that refers back to an already uttered reparandum; Forward Looking Disfluencies (FLDs) are disfluencies where the moment of interruption is followed by a completion of the utterance that is delayed by a filled or unfilled pause (hesitation) or by a repetition of a previously uttered part of the utterance (repetition). In both cases the mechanisms involved are minor refinements of rules proposed in earlier work to deal with clarificational interaction. The only substantive assumption we take on board relative to this earlier work is that of incremental interpretation: the assumption that the grammar provides types which enable word-by-word parsing and interpretation. In fact, for cross-turn disfluencies, we demonstrated that our account applies without any assumptions of intrasentential incremental processing. The need for incremental processing is supported by a wealth of recent work in psycholinguistics and is incorporated in a number of current grammatical frameworks.

Ginzburg, Fernández, & Schlangen

Our account, within the KoS framework, underpinned by the logical framework of Type Theory with Records, offers a precise explication of the roles of all key components of a disfluency, including editing phrases and filled pauses, capturing the parallelism between reparandum and alteration, while also allowing for instances where it is relaxed, as in sentential alterations. It directly predicts the possibility of self addressed questions, a class of queries that occurs in a very restricted range of syntactic/semantic contexts and that has not been described or analyzed in previous work. More generally, it provides a unified analysis of repair and correction that incorporates disagreement at illocutionary and metacommunicative levels, as well as self-correction across and within turns. There is no existing account with this coverage, to the best of our knowledge.
The current work is clearly "proof of concept". What remains to be done is to develop a detailed incremental semantics, as well as to consider in detail the range of disfluencies evinced in actual and potential conversations. It is important to do this across a wide range of languages, given the range of cross-linguistic variation with regard to disfluency constructions surveyed in Section 2.2.4. Finding a principled explanation for the syntactic/semantic contexts in which self addressed questions occur, one presumably tied to common areas of difficulty in the utterance planning process, is also important. Indeed, in line with the aforementioned work on cross-linguistic variation, we hypothesize that the syntactic/semantic contexts in which self addressed questions occur should vary significantly across languages. We hope to pursue all this in future work.
The account we provide has significant methodological import and forces a number of foundational issues to be addressed. As we have seen, disfluencies are an utterly ubiquitous phenomenon in language use, one that interacts with a variety of linguistic phenomena (including anaphora, ellipsis, implicature, and discourse particles) and is subject to phonological, syntactic, and semantic constraints internal to individual languages. Nonetheless, they can only be analyzed in frameworks where metacommunicative interaction is integrated into the linguistic context. This partitions frameworks where such integration is effected (e.g., KoS, PTT (Poesio & H. Rieser 2010)) or at least addressed (e.g., Dynamic Syntax (Purver, Gregoromichelaki, et al. 2010)) from work in most current formal semantic accounts of context where such integration is missing (e.g., standard DRT (van Eijck & Kamp 1997), SDRT (Asher & Lascarides