Probabilities and logic in implicature computation: Two puzzles with embedded disjunction *

Sentences are standardly assumed to trigger scalar and ignorance implicatures because there are alternative utterances the speaker could have said. The central question in modeling these inferences is thus: what counts as an alternative utterance for a given sentence in a given context? In this paper, I will present two families of novel empirical observations related to inference and deviance patterns of embedded disjunction, based on which I will argue that (i) probabilistic informativeness plays a role in selecting the set of alternatives; and (ii) the role of prior world knowledge in evaluating probabilistic informativeness of alternatives is limited.


Introduction
The sentence in (1-a) typically triggers the inference (scalar implicature) that (1-b) is false.
(1) a. John ate a cookie or a muffin.
    b. John ate a cookie and a muffin.
There are different approaches to how scalar implicatures of sentences such as (1-a) are computed (Grice 1975, Sauerland 2004, van Rooij & Schulz 2004, Schulz & van Sentence 2). We will see that the data can be accounted for with a probabilistic, but not with an entailment-based, notion of informativeness. I will, however, argue that not all prior world knowledge is incorporated in probabilistic informativeness evaluation. This argument relates to a second family of novel observations, which will be referred to as the deviance puzzle. An example of this puzzle is that (5-a) and other structurally similar sentences with an embedded disjunction are degraded. This is surprising: (5-a) should be able to convey the same meaning as (5-b), yet clearly it cannot be used to do so.

Deviance puzzle:
(5) a. #Each of these three girls is Mary, Susan, or Jane.
    b. These three girls are Mary, Susan, and Jane.
I will propose that the degraded status of (5-a) is due to the inferences it triggers, which contradict prior world knowledge. This proposal exploits an independently motivated decoupling of prior world knowledge and implicature computation argued for in Magri 2009, Meyer 2013, Marty 2017, and Marty & Romoli 2022: these authors have argued that implicatures of a sentence can be derived even when they contradict prior world knowledge, thus causing the sentence to be degraded. Importantly, this explanation of the deviance of (5-a) will have consequences for the proposal put forward to account for the inference puzzle: evaluating whether an alternative sentence satisfies a probabilistic informativeness criterion will need to be opaque to (certain varieties of) prior world knowledge.
Finally, I would like to make a note about the theoretical framework of this paper. The discussion will largely be couched in the grammatical (i.e., exhaustification-based) approach to implicature computation (e.g., Chierchia et al. 2012). The motivation for this is that the solution to the deviance puzzle (that the deviance of sentences such as (5-a) is caused by implicatures contradicting prior world knowledge) can be straightforwardly accommodated within the grammatical approach to implicature computation, while it is challenging for (neo-)Gricean approaches (e.g., Grice 1975, Sauerland 2004, Franke 2011, Bergen et al. 2016). It is worth pointing out, however, that the solution I will propose for the inference puzzle (that alternatives need to be informative enough to enter implicature computation) can in principle be plugged into both grammatical and (neo-)Gricean approaches.

Inference puzzle
Suppose that what's being discussed is where Mary's friends are from. Consider the example (6), which will be referred to as ALL-20-OR¹ henceforth. When the disjunction is in the scope of a universal quantifier as in ALL-20-OR, it typically triggers the distributive inferences in (6-a) (Chemla 2009, Chemla & Spector 2011, Crnič et al. 2015, Chierchia et al. 2012, Fox 2007, Klinedinst 2007, Spector 2006, a.o.). Accordingly, the ignorance inferences in (6-b) are typically absent: even though the speaker can in principle both believe ALL-20-OR and be in the epistemic state as in (6-b), we do not typically infer (6-b) upon hearing ALL-20-OR.²
(6) All 20 of Mary's friends are French or Spanish. (ALL-20-OR)
    a. ⇝ At least one of them is French.
       ⇝ At least one of them is Spanish.
    b. ⇝̸ The speaker is ignorant about whether at least one of them is French.
       ⇝̸ The speaker is ignorant about whether at least one of them is Spanish.
A novel observation³ is that, strikingly, this inference pattern is sensitive to the cardinality of the restrictor of the universal quantifier. Consider the sentence (7), which will be referred to as ALL-2-OR henceforth. ALL-2-OR is minimally different from ALL-20-OR in that the cardinality of the restrictor of the universal quantifier is 20 in ALL-20-OR and two in ALL-2-OR. This change in cardinality reverses the inference pattern of ALL-2-OR as compared to ALL-20-OR: ALL-2-OR no longer seems to trigger the distributive inferences in (7-a). Instead, it is naturally interpreted as suggesting (7-b).⁴
(7) Both of Mary's friends are French or Spanish. (ALL-2-OR)
    a. ⇝̸ At least one is French.
       ⇝̸ At least one is Spanish.
    b. ⇝ The speaker is ignorant about whether at least one of them is French.
       ⇝ The speaker is ignorant about whether at least one of them is Spanish.
In other words, there is a relationship between the cardinality of the restrictor of the universal quantifier and inferences triggered by the sentence: the naturalness of distributive inferences is higher when the cardinality of the restrictor is large; the naturalness of ignorance inferences is higher when the cardinality of the restrictor is small.
Another novel observation is that, in addition to the effect of the cardinality of the restrictor, the inference pattern is also sensitive to the number of disjuncts in the sentence. Consider (8), which will henceforth be referred to as SIMPLE-DISJ, and (9), which will henceforth be referred to as COMPLEX-DISJ. The restrictor of the universal quantifier has the same cardinality in these two examples (four), but the number of disjuncts is different: there are two disjuncts in SIMPLE-DISJ, and four disjuncts in COMPLEX-DISJ. SIMPLE-DISJ is reported to be more naturally interpreted with distributive inferences than COMPLEX-DISJ; COMPLEX-DISJ is reported to be more naturally interpreted with ignorance inferences than SIMPLE-DISJ.
(8) All four of Mary's friends are French or Spanish. (SIMPLE-DISJ)
(9) All four of Mary's friends are French, Spanish, German, or Dutch. (COMPLEX-DISJ)
In other words, there is a relationship between the number of disjuncts in a universally quantified sentence and inferences triggered by the sentence: the naturalness of distributive inferences is higher with fewer disjuncts; the naturalness of ignorance inferences is higher with more disjuncts. How does the interaction of the cardinality of the restrictor of the universal quantifier and the number of disjuncts influence the inference pattern? The contrasts between ALL-20-OR and ALL-2-OR and between SIMPLE-DISJ and COMPLEX-DISJ are compatible with at least two different empirical generalizations, in (10) and (11). More empirical work is required to determine which of the two generalizations is on the right track.
the speaker doesn't necessarily believe that both of Mary's friends come from the same country), ALL-2-OR doesn't seem to naturally trigger distributive inferences.

early access
Milica Denić
(10) Threshold generalization: When the ratio of the cardinality of the restrictor to the number of disjuncts exceeds some threshold T (T ≥ 1), distributive inferences are preferably derived; otherwise, ignorance inferences are preferably derived.
(11) Gradient generalization: The larger the ratio of the cardinality of the restrictor to the number of disjuncts, the greater the preference for distributive instead of ignorance inferences.
Note again that the judgments and generalizations above are reported for a context in which the question being discussed is where Mary's friends are from, which is a natural context for uttering ALL-20-OR and ALL-2-OR, as well as SIMPLE-DISJ and COMPLEX-DISJ. We, however, leave open the possibility that there may be contexts in which uttering these sentences leads to a different inference pattern.
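To make the difference between (10) and (11) concrete, the following minimal sketch computes the ratio of restrictor cardinality to number of disjuncts for the four example sentences. The function names and the value T = 1 are illustrative assumptions only: the text leaves the threshold unspecified beyond T ≥ 1.

```python
from fractions import Fraction

# (restrictor cardinality, number of disjuncts) for the examples in the text
EXAMPLES = {
    "ALL-20-OR": (20, 2),
    "ALL-2-OR": (2, 2),
    "SIMPLE-DISJ": (4, 2),
    "COMPLEX-DISJ": (4, 4),
}

def ratio(cardinality, disjuncts):
    """Ratio of restrictor cardinality to number of disjuncts."""
    return Fraction(cardinality, disjuncts)

def threshold_prediction(cardinality, disjuncts, T=1):
    """Generalization (10): distributive iff the ratio exceeds T (T=1 is hypothetical)."""
    return "distributive" if ratio(cardinality, disjuncts) > T else "ignorance"

def gradient_ranking(examples):
    """Generalization (11): order sentences by decreasing preference for distributive inferences."""
    return sorted(examples, key=lambda name: ratio(*examples[name]), reverse=True)

for name, (n, k) in EXAMPLES.items():
    print(name, ratio(n, k), threshold_prediction(n, k))
print(gradient_ranking(EXAMPLES))
```

With this hypothetical T, both generalizations agree on the reported contrasts. They differ in that (10) treats ALL-20-OR and SIMPLE-DISJ alike, whereas (11) predicts a stronger preference for distributive inferences with ALL-20-OR.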

Inference puzzle within the exhaustification approach to implicatures
We will start by introducing the exhaustification approach to implicature computation (Chierchia et al. 2012), and discuss the challenges posed for it by the inference puzzle. Importantly, the challenges are not specific to the exhaustification approach: we will see that any approach in which implicatures are a function of (solely) entailment relations between a sentence and its alternatives, possibly in addition to considerations of contextual relevance, will face similar challenges.⁵

Implicatures of unembedded disjunction
We have introduced two types of implicatures so far: distributive and ignorance inferences. Distributive inferences are usually assumed to be a type of scalar implicature. Let us see how scalar and ignorance inferences are derived according to the exhaustification approach to implicature derivation (Chierchia et al. 2012).

(12) John is French or Spanish.
    a. The speaker is ignorant about whether John is French (Spanish).⁶
    b. John isn't French and Spanish.
According to the exhaustification approach to implicatures, scalar implicatures are not the result of pragmatic reasoning. They are assumed to be a part of the semantic content of the sentence as a result of the semantics of a silent exhaustivity operator exh. This operator is assumed to be present in the logical form of a sentence, as in (13).
(13) [
The semantics of exh is given in (14). It is very similar to that of the focus operator only (Chierchia 2006, Fox 2007, Chierchia et al. 2012). In short, the semantic import of exh when it attaches to a sentence S is to negate alternatives activated by S, ALT(S). There is, however, a restriction on the alternatives which can be negated: only those alternatives which are innocently excludable (IE) can be negated. IE alternatives of a sentence S are those which appear in every maximal subset
What alternatives does a sentence activate? Simplifying somewhat, the formal alternatives (FA) of a sentence S are standardly assumed to be obtained by replacing the constituents of S with another expression of the same syntactic category and of smaller or equal structural complexity (Katzir 2007, Fox & Katzir 2011). The final set of alternatives a sentence S activates in a given context, ALT(S), consists of all those formal alternatives which are relevant in that context (cf. (15)) (Fox & Katzir 2011).
(15) Alternatives of a sentence S:
     ALT(S) = FA(S) ∩ {Y : Y expresses a contextually relevant proposition}
Let us now see how the inferences of (12) are derived under this approach. We assume that ALT((12)) is in (16).
(16) Relevant formal alternatives of (12):
    a. John is French (Spanish).
    b. John is French and Spanish.
(17) a. {John is French, John is French and Spanish}
     b. {John is Spanish, John is French and Spanish}
The only IE alternative of (12) is thus 'John is French and Spanish', as it is the only alternative which appears in both (17-a) and (17-b). (12), parsed as (13), is thus interpreted as in (18).
(18) John is French or Spanish and he isn't French and Spanish.
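The derivation above can be checked mechanically. The sketch below is a minimal illustration, assuming the standard definition from Fox (2007) on which an alternative is innocently excludable iff it belongs to every maximal set of alternatives that can be negated together consistently with the sentence. The possible-worlds encoding and all function names are illustrative assumptions, not part of the proposal.

```python
from itertools import combinations, product

# A world fixes whether John is French and whether he is Spanish.
WORLDS = list(product([False, True], repeat=2))

def prop(pred):
    """A proposition is the set of worlds in which pred holds."""
    return frozenset(w for w in WORLDS if pred(*w))

def maximal_excludable_sets(prejacent, alts):
    """Maximal sets of alternatives that can all be false while the prejacent is true."""
    names = list(alts)
    def negatable(subset):
        return any(all(w not in alts[n] for n in subset) for w in prejacent)
    ok = [set(c) for r in range(len(names) + 1)
          for c in combinations(names, r) if negatable(c)]
    return [s for s in ok if not any(s < t for t in ok)]

def innocently_excludable(prejacent, alts):
    # Assumes a non-contradictory prejacent, so at least one maximal set exists.
    return set.intersection(*maximal_excludable_sets(prejacent, alts))

S = prop(lambda f, s: f or s)  # sentence (12)
ALTS = {                       # alternatives in (16)
    "John is French": prop(lambda f, s: f),
    "John is Spanish": prop(lambda f, s: s),
    "John is French and Spanish": prop(lambda f, s: f and s),
}

ie = innocently_excludable(S, ALTS)
exhaustified = frozenset(w for w in S if all(w not in ALTS[n] for n in ie))
print(ie)            # the IE alternatives
print(exhaustified)  # worlds where John has exactly one of the two nationalities
```

In this encoding the two maximal sets correspond to (17-a) and (17-b), and the exhaustified meaning corresponds to (18).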
How about the derivation of ignorance inferences? One approach⁷ to ignorance inferences is pragmatic in nature: ignorance inferences of unembedded disjunction are a consequence of the maxim of quantity (Grice 1975), according to which the speaker should convey all of the relevant information they have. We will adopt the version of the maxim of quantity in (19), adapted from Fox 2007:
(19) Maxim of quantity: If two sentences S and S′ are both relevant to the topic of conversation, S′ is more informative than S, and the speaker believes both S and S′ to be true, the speaker should say S′ rather than S.
Let us see how ignorance inferences of (12) (parsed as (13)) follow from the maxim of quantity in (19). Assume that in a context in which (13) is uttered and relevant, the sentence A = John is French is also relevant. Assume further, together with von Fintel & Heim (1997), Fox (2007), Fox & Katzir (2011), that relevance is closed under conjunction and negation: this means that (13) ∧ A and (13) ∧ ¬A are also relevant. As (13) ∧ A and (13) ∧ ¬A are more informative than (13) (because they asymmetrically entail (13)), the maxim of quantity licenses the inferences in (20).
(20) a. The speaker doesn't believe (13) ∧ A.
     b. The speaker doesn't believe (13) ∧ ¬A.
Assuming that the speaker believes their own utterance (13) (maxim of quality), the inferences in (20) amount to ignorance inferences about A = John is French, as in (21):
(21) The speaker doesn't believe that John is French and the speaker doesn't believe that John is not French (i.e., the speaker is ignorant about whether John is French).
Ignorance inferences about B = John is Spanish would be derived in a similar vein from (13). More generally, assuming as above that relevance is closed under conjunction and negation, ignorance inferences are predicted to be derived about any relevant sentence S′ whose truth is not settled by the utterance S. The reason is that if S is relevant and S′ is relevant, so is S ∧ S′, as well as S ∧ ¬S′. As both of these are more informative than S, the maxim of quantity licenses inferences that the speaker doesn't believe S ∧ S′ or S ∧ ¬S′: together with the maxim of quality this amounts to the ignorance inference about S′.

Embedded disjunction: A problem
Let us now see what implicatures are predicted under the exhaustification approach for ALL-20-OR and for ALL-2-OR.
The predictions of any theory of implicatures for a given sentence depend on the alternatives that the sentence is assumed to activate. ALL-20-OR and ALL-2-OR have two scalar items, both of which can activate alternatives: the universal quantifier (all, both), and the disjunction or. If both of these scalar items activate their alternatives, the set of formal alternatives consists of all sentences in which the universal quantifier, the disjunction, or both, are replaced by alternative expressions they activate. Concretely, for ALL-20-OR and ALL-2-OR, this means that the alternatives are in (22) and (23) respectively: we will henceforth refer to the set of alternatives in (22) and (23) as ALT-all-or. For presentational purposes, we will focus only on alternatives without connectives, as the alternatives with connectives don't play a role in distributive and ignorance inference derivation.
Another possibility is that only one of the two scalar items activates alternatives. If only the disjunction activates its alternatives, the formal alternatives of ALL-20-OR and ALL-2-OR are in (24) and (25) respectively.⁹ We will henceforth refer to the set of alternatives in (24) and (25) as ALT-or.
No alternative appears in all three sets in (26), hence no alternative is IE, and hence no distributive inferences are predicted. Assuming that the alternatives 'All 20 are French (Spanish)' and 'Some are French (Spanish)' are relevant, ignorance inferences about them are derived as a consequence of the maxim of quantity in (19). The same applies to ALL-2-OR.
Let us now see what the predictions are if ALL-20-OR activates the alternatives ALT-or. All alternatives in ALT-or can be negated consistently with ALL-20-OR, that is, they are all IE. This would result in distributive inferences: ALL-20-OR together with the negation of (24-a) entails that some of Mary's friends are Spanish, and ALL-20-OR together with the negation of (24-b) entails that some of Mary's friends are French. The same applies to ALL-2-OR.
9 Note that if only the universal quantifier activates its alternatives, the only alternative of ALL-20-OR and ALL-2-OR would be 'Some are French or Spanish': if the restrictor of the universal quantifier is non-empty, this alternative is entailed by the original sentence, so no implicatures are derived.
10 The actual set of alternatives ALT-or in (24) would contain 'All 20 are French and Spanish', which can be shown to be IE with no consequences for distributive or ignorance inferences. Similarly for ALT-or in (25).
11 Fox (2007) and Magri (2009) assume that the ALT-or alternatives are the only alternatives that sentences such as ALL-20-OR and ALL-2-OR activate; see also the discussion in Bar-Lev & Fox (2017), fn. 7.

How do these predictions match the actual inferences of ALL-20-OR and of ALL-2-OR? We have seen that ALL-2-OR preferably triggers ignorance inferences while ALL-20-OR preferably triggers distributive inferences. Assuming that sentences ALL-20-OR and ALL-2-OR activate the alternatives ALT-all-or, correct inferences are predicted for ALL-2-OR but not for ALL-20-OR. Assuming alternatively that sentences ALL-20-OR and ALL-2-OR activate the alternatives ALT-or, correct inferences are predicted for ALL-20-OR but not for ALL-2-OR.¹²
To summarize, ALL-20-OR triggers distributive inferences more naturally than ALL-2-OR; ALL-2-OR triggers ignorance inferences more naturally than ALL-20-OR. The exhaustification approach to implicature derivation, as it stands, cannot capture this difference. The reason for this is fully general. According to the exhaustification approach to implicatures, similarly to many other approaches (e.g., Grice 1975, Sauerland 2004), implicatures are a function of the entailment relations between a sentence and its alternatives (in the exhaustification approach, this follows from the semantic entry of exh). Crucially, to the extent that ALL-20-OR and ALL-2-OR activate comparable sets of alternatives, they stand in the same entailment relations to them, and will thus necessarily be predicted to have the same implicatures.
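The two sets of predictions can be reproduced in a small possible-worlds model. The sketch below is illustrative only: it assumes a toy domain of three friends (any domain of at least two individuals should give the same pattern), assumes the Fox (2007) definition of innocent exclusion, and, like the text, leaves out alternatives containing connectives. All names in the code are hypothetical.

```python
from itertools import combinations, product

N = 3  # toy restrictor size (an assumption of this sketch)

# A world assigns each of the N friends a pair (is_french, is_spanish).
WORLDS = list(product(product([False, True], repeat=2), repeat=N))

def prop(pred):
    """A proposition is the set of worlds in which pred holds."""
    return frozenset(w for w in WORLDS if pred(w))

def innocently_excludable(prejacent, alts):
    """Alternatives contained in every maximal consistently negatable set."""
    names = list(alts)
    def negatable(subset):
        return any(all(w not in alts[n] for n in subset) for w in prejacent)
    ok = [set(c) for r in range(len(names) + 1)
          for c in combinations(names, r) if negatable(c)]
    maximal = [s for s in ok if not any(s < t for t in ok)]
    return set.intersection(*maximal)

def exhaustify(prejacent, alts):
    """Return the strengthened meaning and the set of IE alternatives."""
    ie = innocently_excludable(prejacent, alts)
    meaning = frozenset(w for w in prejacent if all(w not in alts[n] for n in ie))
    return meaning, ie

ALL_F_OR_S = prop(lambda w: all(f or s for f, s in w))
SOME_F = prop(lambda w: any(f for f, _ in w))
SOME_S = prop(lambda w: any(s for _, s in w))
ALT_OR = {
    "All F": prop(lambda w: all(f for f, _ in w)),
    "All S": prop(lambda w: all(s for _, s in w)),
}
ALT_ALL_OR = {**ALT_OR, "Some F": SOME_F, "Some S": SOME_S}

for label, alts in [("ALT-or", ALT_OR), ("ALT-all-or", ALT_ALL_OR)]:
    meaning, ie = exhaustify(ALL_F_OR_S, alts)
    # Distributive inferences hold iff the strengthened meaning entails both existentials.
    distributive = meaning <= SOME_F and meaning <= SOME_S
    print(label, sorted(ie), "distributive inferences:", distributive)
```

Because the computation only consults entailment relations among the propositions, changing N from 3 to 2 or to a larger number leaves the outcome for each alternative set unchanged, which is the source of the problem described above.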

How about relevance?
In the previous section, we have explained why the contrast in the inference pattern of ALL-20-OR and ALL-2-OR is problematic for the exhaustification-based approach, as well as for any approach in which implicatures are a function of the entailment relations between a sentence and its alternatives: if ALL-20-OR and ALL-2-OR activate comparable sets of alternatives, they stand in the same entailment relations to them, and will thus necessarily be predicted to have the same implicatures.
It is, however, standardly assumed that contextual relevance plays a role in which alternatives enter implicature computation. In other words, some of the formal alternatives can sometimes be 'ignored' when implicatures are computed because they don't convey contextually relevant information; this is called alternative pruning (Horn 1972, Fox & Katzir 2011, Katzir 2014, Crnič et al. 2015, Bar-Lev 2018). This assumption is incorporated in the definition of the set of alternatives of a sentence S, ALT(S), as the set of formal alternatives which express contextually relevant propositions (cf. (15), repeated below): formal alternatives expressing contextually irrelevant propositions are pruned from ALT(S).

(15) Alternatives of a sentence S:
     ALT(S) = FA(S) ∩ {Y : Y expresses a contextually relevant proposition}
Could it then be that ALL-20-OR and ALL-2-OR do not activate comparable sets of alternatives because of contextual relevance, which would eliminate the problem for the exhaustification approach?
Recall the discussion from Section 3.2: if ALL-20-OR and ALL-2-OR activate alternatives ALT-all-or, ignorance inferences are derived, and if they activate alternatives ALT-or, distributive inferences are derived. Importantly, note that ALT-all-or is a superset of ALT-or. This allows for the following theoretical possibility. ALL-20-OR and ALL-2-OR have as their formal alternatives ALT-all-or. When no alternatives are pruned from this set of formal alternatives, ignorance inferences are derived. When the alternatives obtained by replacing the universal quantifier with the existential (that is, 'Some are French' and 'Some are Spanish'; we will refer to these as existential alternatives henceforth) are pruned, distributive inferences are derived.
Crucially, if existential alternatives could be preferably pruned from the alternative set of ALL-20-OR but not from the alternative set of ALL-2-OR (i.e., if the alternatives of ALL-20-OR and ALL-2-OR were preferably as in Table 1 and Table 2 respectively), this would resolve the tension between the exhaustification approach to implicatures and the contrast between ALL-20-OR and ALL-2-OR. However, as discussed in the remainder of this section, such a contrast in pruning preferences is not predicted by existing approaches to pruning due to relevance considerations.
Namely, it has been recognized that we cannot prune just any alternative: that would create a massive overgeneration problem (Fox & Katzir 2011). Take for instance unembedded disjunction, as in (12): John is French or Spanish.
Recall that its formal alternatives are {John is French, John is Spanish, John is French and Spanish}. If we could simply prune the alternative 'John is French', (12) would have as implicature that John is not Spanish. This implicature arguably never arises. This example, among many others, motivated developing explicit proposals about what kind of alternatives can be pruned due to relevance considerations. An influential proposal by Fox & Katzir (2011) is that the set of formal alternatives FA(S) can be restricted via pruning to the set ALT(S) if and only if the following conditions are met:
    a. No member of FA(S) \ ALT(S) is exhaustively relevant given ALT(S) (where p is exhaustively relevant given ALT(S) iff exhaustifying p with respect to ALT(S) is in the Boolean closure of ALT(S))
This proposal allows for ALT-all-or to be restricted to ALT-or by pruning for both ALL-20-OR and ALL-2-OR.¹³ Crucially, however, there is nothing in the proposal which would predict that the restriction from ALT-all-or to ALT-or should be more often done with ALL-20-OR than with ALL-2-OR.
In a different approach to constraints on pruning, Crnič et al. (2015) propose that one can only prune alternatives of S if the exhaustification of S with respect to the set of alternatives after pruning results in a weaker interpretation than the exhaustification of S with respect to the set of alternatives before pruning. This approach cannot account for the contrast between ALL-2-OR and ALL-20-OR either: pruning existential alternatives results in distributive inferences, not pruning them results in ignorance inferences. The two interpretations are logically independent: the constraint on pruning by Crnič et al. (2015) is thus incompatible with restricting ALT-all-or to ALT-or via pruning. Bar-Lev (2018) argues for a stronger version of Crnič et al. 2015 (he argues that additional criteria need to be satisfied for pruning to be possible); his proposal is thus incompatible with restricting ALT-all-or to ALT-or via pruning for the same reason as that of Crnič et al. (2015).

Interim conclusion
Let's take stock. The exhaustification approach (and any approach in which implicatures are a function of entailment relations between a sentence and its alternatives), coupled with existing approaches to pruning due to relevance considerations, cannot account for the contrast between ALL-2-OR and ALL-20-OR or other observations pertaining to the inference puzzle.

Proposal and the inference puzzle
In this section, I will propose that evaluation of informativeness of sentences is incorporated into alternative pruning, which can account for the inference puzzle.
Recall that the contrast between ALL-20-OR and ALL-2-OR could be derived if for ALL-20-OR but not for ALL-2-OR it were possible to preferably derive the set of alternatives ALT-or from ALT-all-or by pruning existential alternatives.
The proposal we will put forward that achieves this has two components. The first component is that alternative pruning is, in addition to contextual relevance, also sensitive to how informative alternatives are. The set of alternatives of a sentence S, ALT(S), is thus defined in (27).
This is not in the Boolean closure of {All are French or Spanish, All are French, All are Spanish, All are French and Spanish}; in other words, 'Some are French' is not exhaustively relevant with respect to this set of alternatives. It can similarly be shown that 'Some are Spanish', as well as 'Some are French and Spanish' and 'Some are French or Spanish', are not exhaustively relevant with respect to {All are French or Spanish, All are French, All are Spanish, All are French and Spanish}.

(27) Alternatives of a sentence S: proposal
     ALT(S) = FA(S) ∩ {Y : Y expresses a contextually relevant proposition} ∩ {Z : Z expresses an informative proposition}
The second component of the proposal states that informativeness employed for pruning is probabilistic: the more unlikely the proposition expressed by an alternative sentence is, the more informative the alternative sentence is (cf. Shannon 1948). For presentational purposes, we start with a very simple version of the proposal which only cares about the informativeness of the alternatives, and not about the informativeness of the original utterance.
(28) Informative propositions and pruning: proposal (to be revised)
     Let A be a formal alternative of S, and P(A) the probability that A is true. The probability of pruning A from ALT(S) increases with P(A) (and thus decreases with the informativeness of A).
We will for the time being adopt a version of the proposal in (28) according to which the function mapping P(A) to the probability of pruning A from ALT (S) is strictly increasing, and discuss other options at a later stage.
An intuitive reason for why (28) might hold of pruning is that the more likely an alternative A is to be true, the less pressure there is for the speaker to utter A, and thus the less pressure there is to consider it as an alternative utterance the speaker could have said instead of their original utterance S.
Let us first see how this proposal accounts for the contrast between ALL-20-OR and ALL-2-OR. In particular, for some domain size n, for a sentence of the form (29
Let us make the intuitively plausible assumptions that (i) for any domain of individuals D, for any predicate A, the larger the cardinality of D, the more likely it is that someone in D is in A, and that (ii) if |D| > 1, then it is more likely that someone in D is in A than that everyone in D is in A. These assumptions plausibly hold when the interlocutors possess little prior world knowledge about individuals in the domain D (apart from how many of them there are) and about the property A. Obviously, these assumptions are not always met in the actual world. We will, however, adopt them for the time being; we will see in Section 7 arguments that implicature computation proceeds as if these assumptions are met (i.e., as if access to prior world knowledge is limited).
Under these assumptions, an alternative of the form 'Some of the n individuals are A' is more likely to be true for larger ns, and therefore it is more likely to be pruned from some set of alternatives for larger ns.In addition, a sentence of the form 'All of the n individuals are A' is less likely to be pruned from some set of alternatives than the sentence of the form 'Some of the n individuals are A' as soon as the cardinality of the domain of individuals is larger than 1.It thus follows that we are more likely to end up with the restricted set of alternatives which yields distributive inferences for larger ns than for smaller ns, and that we are more likely to end up with the full set of alternatives which yields ignorance inferences for smaller ns than for larger ns.
Concretely, this means that ALL-20-OR is more likely to have the set of alternatives as in Table 1 than ALL-2-OR is to have a parallel set of alternatives: with such a set of alternatives, distributive inferences are derived. Furthermore, ALL-2-OR is more likely to have the set of alternatives as in Table 2 than ALL-20-OR is to have a parallel set of alternatives: with such a set of alternatives, ignorance inferences are derived (cf. Section 3.2). Distributive and not ignorance inferences are thus more likely to be derived with ALL-20-OR than with ALL-2-OR, and ignorance and not distributive inferences are more likely to be derived with ALL-2-OR than with ALL-20-OR.
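One simple way to see that assumptions (i) and (ii) can hold together is an independence model. This model is an illustrative assumption of the sketch below and not part of the proposal: suppose that each of the n individuals in D has property A independently, with the same probability p, where 0 < p < 1.

```latex
\begin{align*}
P(\text{Some of the } n \text{ individuals are } A) &= 1 - (1-p)^n \\
P(\text{All of the } n \text{ individuals are } A) &= p^n \\[4pt]
\text{(i)}\quad & 1 - (1-p)^n \text{ is strictly increasing in } n,
  \text{ since } 0 < 1-p < 1. \\
\text{(ii)}\quad & \text{For } n \ge 2:\;
  p^n + (1-p)^n \le p^2 + (1-p)^2 = 1 - 2p(1-p) < 1, \\
  & \text{hence } p^n < 1 - (1-p)^n .
\end{align*}
```

For instance, with the hypothetical value p = 1/2, the existential alternative has probability 3/4 for n = 2 and 1 - 2^{-20} (more than 0.999999) for n = 20, so under (28) it is more likely to be pruned in the latter case.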
The proposal in (28) can thus capture the contrast between ALL-20-OR and
The problem with the proposal in (28) is the following: we have related the pruning of 'Some of Mary's friends are French' to P(Some of Mary's friends are French), which in turn depends on the total number of Mary's friends, but nothing we have said so far relates the pruning of 'Some of Mary's friends are French' to the number of disjuncts in the original utterance. In other words, whether alternatives such as 'Some of Mary's friends are French' are pruned, and hence whether distributive or ignorance inferences are derived, is expected to vary as a function of the total number of Mary's friends, rather than as a function of the number of disjuncts in the sentence.
A very minor refinement of the proposal, in (30), solves this problem.
are omitting from Tables 3 and 4 alternatives in which disjunction is replaced by conjunction, which can be shown to be IE with no consequences for distributive or ignorance inferences.

(30) Informative propositions and pruning: proposal (final)
     Let A be a formal alternative of S, and P(A|S) the conditional probability that A is true given that S is true. The probability of pruning A from ALT(S) increases with P(A|S) (and thus decreases with the informativeness of A given S).
As before, we will for the time being adopt a version of the proposal in (30) according to which the function mapping P(A|S) to the probability of pruning A from ALT (S) is strictly increasing, and discuss other options at a later stage.
An intuitive reason for why a constraint on pruning such as (30) might hold is related to what has been said to conceptually motivate (28): the more likely an alternative A is to be true given the utterance S (the closer it is to being entailed by S), the less pressure there is for the speaker to utter A instead of S, and thus the less pressure there is to consider it as an alternative utterance the speaker could have said instead of their original utterance S.
How does the proposal in (30) capture the contrast between ALL-20-OR and ALL-2-OR, and between SIMPLE-DISJ and COMPLEX-DISJ?
Let us start with the contrast between ALL-20-OR and ALL-2-OR. We have already seen that, under the intuitive assumptions discussed above, as n increases, so does P(Someone is A | n people are A or B). Alternatives such as 'Some of Mary's friends are French (Spanish)' are thus more likely to be pruned from ALT(ALL-20-OR) than from ALT(ALL-2-OR). This will result in distributive inferences being more likely to be derived in the case of ALL-20-OR than in the case of ALL-2-OR, and ignorance inferences being more likely to be derived in the case of ALL-2-OR than in the case of ALL-20-OR.
Let us now see how the revised proposal also captures the difference between SIMPLE-DISJ and COMPLEX-DISJ.
Let us make another intuitively plausible assumption¹⁵ that in a domain D of n people it is less likely that there is someone in D who is A₁ when it is known that everyone in D is A₁, A₂, A₃, ..., or Aₖ, than when it is known that everyone in D is A₁, A₂, A₃, ..., or Aₘ, with m < k.
Because of this we may conclude that the likelihood of pruning alternatives of the form 'Some of the n individuals are A' decreases as the number of disjuncts of the original sentence increases (again, this is true for any A).It thus follows that the set of alternatives that we will end up with is more likely to be the set without the existential alternatives (i.e., without the alternatives such as 'Some of the n individuals are A') for sentences with smaller numbers of disjuncts.Concretely, this means that we will be more likely to derive distributive inferences for SIMPLE-DISJ than for COMPLEX-DISJ, and more likely to derive ignorance inferences with COMPLEX-DISJ than with SIMPLE-DISJ.
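The joint effect of domain size and number of disjuncts can be illustrated with a toy probability model. The model is an assumption made purely for illustration: given that everyone is A₁, ..., or Aₖ, each of the n individuals is taken to have exactly one of the k properties, independently and with equal probability. Under that assumption the conditional probability of the existential alternative has a closed form, which the sketch cross-checks by brute-force enumeration; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_some_given_all(n, k):
    """P(at least one of n individuals is A_1 | each individual is one of A_1..A_k)."""
    return 1 - Fraction(k - 1, k) ** n

def p_some_given_all_bruteforce(n, k):
    """Same quantity, obtained by enumerating all k**n equiprobable assignments."""
    worlds = list(product(range(k), repeat=n))
    hits = sum(1 for w in worlds if 0 in w)  # property index 0 plays the role of A_1
    return Fraction(hits, len(worlds))

for name, n, k in [("ALL-20-OR", 20, 2), ("ALL-2-OR", 2, 2),
                   ("SIMPLE-DISJ", 4, 2), ("COMPLEX-DISJ", 4, 4)]:
    print(f"{name}: {float(p_some_given_all(n, k)):.4f}")
```

In this toy model the probability is 3/4 for ALL-2-OR, 15/16 for SIMPLE-DISJ, and 175/256 for COMPLEX-DISJ, while for ALL-20-OR it is within one millionth of 1. The value grows with n and shrinks with k, in line with the direction of the two contrasts.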
The proposal in (30) thus accounts for the contrast between ALL-20-OR and ALL-2-OR, and between SIMPLE-DISJ and COMPLEX-DISJ. There is, however, an important piece of the proposal that is left underspecified. How exactly does P(A|S) (the conditional probability that the alternative A is true given that the sentence S is true) map to the probability of pruning A from the set of alternatives of S? The formulation of the proposal in (30) states only that the function from the first set of probabilities to the second is some increasing function. We have for concreteness assumed that the probability of pruning A from the set of alternatives of S strictly increases with P(A|S) (for instance, it may increase linearly, that is, P(pruning A from ALT(S)) = a · P(A|S) + b, with a > 0). However, other increasing functions are compatible with the data we have so far, too. For instance, there may be a threshold θ such that A is pruned from the set of alternatives of S iff P(A|S) ≥ θ. That would entail, for instance, that P('Someone is French' | 'Both are French or Spanish') is lower than θ when 'Both are French or Spanish' triggers ignorance inferences.
To specify this part of the proposal (i.e., how exactly P(A|S) maps to the probability of pruning A from the set of alternatives of S), experimental and computational work is necessary. Many questions remain outstanding. In addition to determining the nature of the mapping (e.g., is it a strictly increasing linear or nonlinear function, or some other increasing function?), how are the parameters of the function (i.e., θ, a, b) computed? Can they be affected by certain aspects of the context, and if so, which? Can they be affected by aspects of the semantic content of the sentence, and if so, which? Do they vary across people?
There are two important empirical questions brought up in Section 2 that relate to this.
First, in Section 2, we discussed inferences of ALL-20-OR and ALL-2-OR in the context where what's under discussion is where Mary's friends are from. We left open, however, the possibility that there may be contexts in which these sentences have a different inference pattern. A different inference pattern could be due to relevance considerations (e.g., there may be contexts where no alternatives are relevant, so that neither ALL-20-OR nor ALL-2-OR triggers any inferences), but it could also be due to the informativeness criterion shifting with context. For instance, if a context can be found where ALL-20-OR preferably triggers ignorance inferences, this would suggest that some of the parameters discussed above can indeed be affected by context (e.g., in such a context, existential alternatives of ALL-20-OR would be sufficiently informative to qualify for implicature computation).
Second, we discussed in Section 2 whether the judgments about distributive or ignorance inferences of universally quantified sentences with embedded disjunction are better described by a threshold or by a gradient generalization. According to the threshold generalization, when the ratio of the cardinality of the restrictor to the number of disjuncts exceeds a certain threshold T (T ≥ 1), distributive inferences are preferably derived; otherwise, ignorance inferences are preferably derived. According to the gradient generalization, the larger the ratio of the cardinality of the restrictor to the number of disjuncts, the greater the preference for distributive over ignorance inferences. The proposal we put forward is in principle compatible with both of these generalizations: which generalization turns out to be correct will constrain the set of possible functions mapping P(A|S) to the probability of pruning A from the set of alternatives of S. Investigating empirically how exactly the judgments vary with the ratio of the cardinality of the restrictor to the number of disjuncts is thus crucial for inferring properties of this function. Specifying the function, and answering the conceptual and empirical questions it would raise, remains open for future work.
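The two kinds of mapping mentioned above can be written down side by side. The following is a minimal sketch, not a claim about which mapping is correct: the function names are hypothetical, and the default parameter values (a = 1, b = 0, θ = 0.9) are arbitrary placeholders, since the text leaves open how these parameters are set.

```python
def linear_pruning_probability(p_a_given_s: float, a: float = 1.0, b: float = 0.0) -> float:
    """Gradient option: P(pruning A from ALT(S)) = a * P(A|S) + b, with a > 0.

    The result is clipped to [0, 1] so that it remains a probability.
    """
    if a <= 0:
        raise ValueError("a must be positive for the mapping to be increasing")
    return min(1.0, max(0.0, a * p_a_given_s + b))


def threshold_pruning_probability(p_a_given_s: float, theta: float = 0.9) -> float:
    """Threshold option: A is pruned from ALT(S) iff P(A|S) >= theta."""
    return 1.0 if p_a_given_s >= theta else 0.0


# Compare the two mappings on a few hypothetical values of P(A|S).
for p in (0.5, 0.75, 0.95, 1.0):
    print(p, linear_pruning_probability(p), threshold_pruning_probability(p))
```

A step-like mapping of the second kind would be expected to yield judgments in line with the threshold generalization, whereas a strictly increasing mapping of the first kind would be expected to yield the gradient pattern.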
To summarize, the proposal in (30) accounts for the two aspects of the inference puzzle: the influence of the cardinality of the restrictor and of the number of disjuncts on distributive and ignorance inference derivation in quantified sentences with embedded disjunction. As a reminder, the proposal relies on the assumption that alternatives can be pruned under certain considerations (Horn 1972, Fox & Katzir 2011, Katzir 2014, Crnič et al. 2015, Bar-Lev 2018). The core of the proposal is that alternative pruning is sensitive to the informativeness of an alternative conditioned on the original utterance. The proposal is relatively independent of the specifics of the mechanism which derives implicatures: we have demonstrated how it can be implemented with the exhaustification-based approach to implicature derivation, but it is in principle also compatible with other approaches to implicature derivation.

Deviance puzzle
We will now move to the deviance puzzle.
When a disjunction of definite noun phrases is embedded in the scope of a universal quantifier, the result is sometimes unexpectedly deviant. The deviance depends on the predicate that embeds the disjunction. To see this, consider (31), which will be referred to as DEVIANT-BE, (32), which will be referred to as NON-DEVIANT-CALLED, (33), which will be referred to as DEVIANT-WRITE, and (34), which will be referred to as NON-DEVIANT-READ. When the predicate in question is the identity copula as in DEVIANT-BE or the predicate to write in DEVIANT-WRITE, the result is deviant. When the predicate in question is minimally different, as the predicate to be called in NON-DEVIANT-CALLED or the predicate to read in NON-DEVIANT-READ, the result is acceptable. To see why the deviance of DEVIANT-BE and of DEVIANT-WRITE is surprising, note that DEVIANT-BE is contextually equivalent (in the sense of Stalnaker 1973, 1978, 2002, a.o.) to (35), assuming that it is common knowledge that Mary, Susan, and Jane have to be three different individuals. Likewise, DEVIANT-WRITE is contextually equivalent to (36), assuming that it is common knowledge that for any book there can be exactly one singular or plural individual who wrote it.16 Yet, surprisingly, DEVIANT-BE cannot be naturally used to convey the meaning of (35), and neither can DEVIANT-WRITE to convey the meaning of (36).
(35) One of those three girls is Mary, another one is Susan, and yet another one is Jane.
(36) One of those three writers wrote Anna Karenina, another one wrote Germinal, and yet another one wrote Harry Potter.
Note that the deviance observed in DEVIANT-BE and DEVIANT-WRITE is not specific to each: the pattern is the same with every and all.17
We have observed the deviance of an embedded disjunction with certain predicates, such as the identity copula or to write, but not with others, such as to be called or to read. Which property makes a predicate pattern with one group or the other? We will argue that the essential property that the identity copula and to write have in common is that, when their internal arguments are, respectively, a specific individual (e.g., to be Mary) and a specific book (e.g., to write Anna Karenina), they can only be true of a unique (singular or plural) individual given common knowledge. In other words, given common knowledge, these (complex) predicates necessarily denote a singleton (henceforth, they are singleton-denoting).
To see that the identity copula (with an internal argument from a domain of individuals) and to write (with an internal argument from a domain of books) are singleton-denoting, whereas to be called (with an internal argument from a domain of names) and to read (with an internal argument from a domain of books) are not, observe that the continuations in (37-a) and (37-c) sound contradictory, while those in (37-b) and (37-d) do not.
(37)
To see that whether a (complex) predicate is singleton-denoting is indeed relevant for the deviance puzzle, consider what happens when to write and its internal argument do not form a singleton-denoting predicate. An example of this is when to write's internal argument is from a domain of letters of the alphabet (e.g., to write the letter A). To write the letter A is not singleton-denoting, and note that (38), which is structurally similar to DEVIANT-BE and DEVIANT-WRITE, is not deviant (it could perfectly well be used in a situation in which, for instance, each of John's three students wrote a number of letters on the board):
(38) Each of John's three students wrote the letter A, the letter D, or the letter K on the board.

Proposal and deviance puzzle
Why would the property of predicates being singleton-denoting be relevant for the deviance pattern of universally quantified sentences with embedded disjunction? I will suggest that this is connected to the inference pattern we have observed in Section 2. We have seen that quantified sentences with embedded disjunction trigger ignorance inferences under certain conditions. Consider now what happens if DEVIANT-BE triggers ignorance inferences, which are in fact expected given the ratio between the cardinality of the restrictor and the number of disjuncts in these sentences (cf. empirical generalizations in Section 2). These ignorance inferences are paraphrased in (39).
(39) The speaker is ignorant about whether at least one of these three girls is Mary (Susan, Jane).
These inferences are problematic for the following reason. Assuming that the speaker believes their own utterance in DEVIANT-BE, and given that the predicates to be Mary (Susan, Jane) are singleton-denoting, the speaker cannot be in the ignorance state in (39): they must know that one of the girls is Mary, that one is Susan, and that one is Jane. (Ignorance inferences of DEVIANT-WRITE would be similarly problematic.)
On the other hand, ignorance inferences of NON-DEVIANT-CALLED are not problematic. As the predicates to be called Mary (Susan, Jane) are not singleton-denoting, the speaker can believe their utterance NON-DEVIANT-CALLED and still be ignorant about whether at least one is called Mary (Susan, Jane) (e.g., the speaker may consider it possible that all three of them are called Mary, that all three of them are called Susan, and that all three of them are called Jane). (Ignorance inferences of NON-DEVIANT-READ would be similarly non-problematic.)
The ignorance inferences that sentences DEVIANT-BE and DEVIANT-WRITE might trigger thus contradict common knowledge, while those of NON-DEVIANT-CALLED and NON-DEVIANT-READ do not. I propose that this is the reason why DEVIANT-BE and DEVIANT-WRITE are deviant (cf. (40)).
(40) Deviance due to ignorance inferences: Sentences DEVIANT-BE and DEVIANT-WRITE are deviant because they trigger ignorance inferences which contradict common knowledge.
The core assumption of the proposal in (40) is that implicatures generally, and ignorance inferences specifically, are derived blindly to common knowledge. What is meant by this is that, once the set of alternatives is determined, implicatures are derived even if they contradict common knowledge (note, however, that there are ways for common knowledge to influence which alternatives feed implicature computation due to relevance or salience considerations, for which there is ample empirical evidence (Matsumoto 1995, Fox & Katzir 2011, Degen & Tanenhaus 2016, a.o.)). The idea that the procedure which derives implicatures is blind to common knowledge has in fact already been defended by Magri (2009) for the case of scalar implicatures in order to account for the deviance of (41) (cf. also Meyer 2013, Marty 2017, Marty & Romoli 2022 for related data and ideas).
(41) #Some Italians come from a warm country.
The crux of Magri's proposal is that (41) is deviant because the conjunction of (41) and its scalar implicature in (42) contradicts common knowledge.
(42) Not all Italians come from a warm country.
Additionally, there are other data suggesting that ignorance inferences which contradict common knowledge may render the sentences that trigger them deviant. One such data point relates to the ignorance inferences of the modified numeral at least n. We provide in (43) an example from Buccola & Haida 2019; similar empirical observations were first made by Nouwen (2010). Given the context in (43), (43-a) and (43-b) are contextually equivalent; yet (43-a) is deviant and (43-b) is not. A possible explanation for why (43-a) is deviant is that it triggers the inference that the speaker is ignorant about whether Ann scored exactly 3 points, which contradicts common knowledge.
(43) Context: Ann played a card game in which, given the rules, the final score is always an even number of points. Bob knows this, and reports to Carl:
a. #Ann scored at least 3 points.
b. Ann scored at least 4 points.
In light of the data from this section, a proposal aiming to account for the inference pattern in Section 2 needs to allow for implicatures to be derived blindly to common knowledge (or alternatively to put forward a different account for the data presented in this section).The proposal for how the deviance puzzle is to be resolved thus has important implications for certain aspects of the proposal for the inference puzzle developed in Section 4, and for implicature derivation more generally.We discuss these implications in turn in the following section.
7 Combining proposals for the inference and deviance puzzles

7.1 Grammatical approach to ignorance inferences
Within the context of the inference puzzle, we have been working with the grammatical approach to scalar implicatures, but with the pragmatic approach to ignorance inferences as in Fox 2007. However, within the context of the deviance puzzle, the derivation of ignorance inferences which is blind to common knowledge (cf. Sections 5 and 6) suggests that they too need to be derived in grammar (cf. Meyer 2013, 2014, Buccola & Haida 2019). According to these approaches, ignorance inferences, just like scalar implicatures, end up being part of the semantic content of the sentence.
While defending a specific version of a grammatical theory of ignorance inferences is beyond the scope of this paper, we will for concreteness discuss how distributive and ignorance inferences of a sentence such as All are A or B can be derived within the grammatical theory of ignorance implicatures put forward in Meyer 2013, 2014, and how our pruning proposal from Section 4 can be combined with this theory to account for the contrast between ALL-20-OR and ALL-2-OR. Simplifying somewhat, according to Meyer (2013, 2014), there is a silent modal operator K_speaker in language, and every asserted sentence φ is parsed as 'K_speaker φ'. The meaning of 'K_speaker φ' can be informally paraphrased as 'the speaker believes that φ' (and the formalization is in (44)).
(44) iff given the beliefs of the speaker in w, w′ could be the actual world
Meyer (2013, 2014) further assumes that the exhaustivity operator exh can attach to any propositional node, including above K_speaker. Finally, while Meyer adopts the structural approach to alternatives as in Katzir 2007, Fox & Katzir 2011,
All the alternatives in (46) are IE, amounting to inferences that the speaker doesn't believe that all are A, that all are B, that some are A, and that some are B (note that for any X, exh(X) is at least as strong as X, so not believing X entails not believing exh(X)). Together, these inferences amount to ignorance inferences: if the speaker believes that all are A or B, but doesn't believe that all are B, and doesn't believe that some are A, they must be ignorant about whether some are A and about whether all are B (similarly for 'some are B' and 'all are A').
On the other hand, if existential alternatives are pruned from the set of alternatives of the embedded exh, distributive inferences are derived in the scope of K_speaker in (45) for reasons discussed in Section 3.2, and ignorance inferences cannot be derived at the matrix level.
If we assume that sentences such as ALL-20-OR and ALL-2-OR are by default parsed as in (45) in contexts where what's under discussion is where Mary's friends are from, the approach to ignorance inferences in Meyer 2013, 2014 is compatible with the pruning proposal developed in Section 4.
Note, however, that to obtain this result with the parse in (45), we have assumed that the embedded exh can be deleted from alternatives (cf. the alternative set in (46)), which is in line with the approach to formal alternatives in Katzir 2007. Whether this is possible has been questioned, however (e.g., Meyer 2013, but not Meyer 2014, proposes that this shouldn't be possible; see also the discussion of Crnič et al. 2015 in Section 8.1.3 and of Bar-Lev & Fox 2016 in Section 9.3).
There may be other ways to combine the grammatical theory of ignorance implicatures by Meyer (2013, 2014), or other grammatical theories of ignorance implicatures for that matter, with the proposal developed in Section 4: the discussion above is intended as an illustration rather than a final proposal.

Obligatory implicatures
If problematic inferences are behind the deviance of sentences such as DEVIANT-BE and DEVIANT-WRITE, in addition to being derived in grammar, it must be the case that these inferences are obligatory.Within the exhaustification approach to implicatures, this would entail that sentences are obligatorily parsed with the exhaustivity operator exh at the matrix level.This assumption is arguably needed for any account aiming to explain deviance of certain sentences as a consequence of their problematic inferences.For instance, this assumption is already present in Magri 2009, who argued that sentences such as (41), repeated here, are deviant due to the problematic scalar implicatures (Not all Italians come from a warm country).
(41) #Some Italians come from a warm country.
There is a related challenge for these accounts: even if the implicature computing mechanism is triggered whenever we interpret a sentence, why can't pruning the alternatives which would lead to problematic inferences save the sentence?In other words, why can't pruning the alternative All Italians come from a warm country from the set of alternatives of (41), or pruning the existential alternatives from the set of alternatives of DEVIANT-BE or DEVIANT-WRITE, save these sentences from deviance?
This is an important challenge to which we don't have a complete answer.There are two directions one could pursue.
One option would be to try to propose additional constraints on what type of alternatives can or cannot be pruned: for instance, Magri (2009) proposes one such constraint which prohibits pruning the alternative All Italians come from a warm country when we compute implicatures of (41).
Another possibility is that there is something about the architecture of implicature computation that disallows potential deviance of the sentence to influence pruning.In other words, pruning may be guided solely by relevance and informativeness considerations, and the information about whether the implicatures of a sentence contradict common knowledge might not be accessible for guiding the decision about which alternatives to prune.

Probabilistic informativeness and blindness
If sentences such as DEVIANT-BE, DEVIANT-WRITE, or Magri's cases such as (41) are indeed deviant due to implicatures they trigger, there is another important consequence for the proposal that probabilistic informativeness considerations guide pruning.This is that not only does the derivation of ignorance inferences have to proceed in a blind manner, but the mechanism which calculates the informativeness of alternatives must be blind to common knowledge too.The reason is simply that, given common knowledge, P(All Italians come from a warm country | Some Italians come from a warm country) = 1: this means that the alternative 'All Italians come from a warm country' should be pruned due to its lack of informativeness from ALT (Some Italians come from a warm country), and that the problematic implicature should not arise.
Likewise, informativeness-based pruning that is blind to common knowledge is necessary to account for the deviance of DEVIANT-BE and DEVIANT-WRITE within our approach. Let us see why, using the example of DEVIANT-BE (similar considerations apply to DEVIANT-WRITE). Recall that in order to derive ignorance inferences of DEVIANT-BE, the alternatives in (47) must not be pruned from ALT(DEVIANT-BE).
(47) Someone is Mary (Susan, Jane), Someone is Mary or Susan (Susan or Jane, Mary or Jane)
However, given common knowledge, P(At least one (i.e., some) of the girls is Mary | Each of the girls is Mary, Susan, or Jane) = 1, and similarly for all of the alternatives in (47).
This means that, if our proposal is on the right track and the alternatives in (47) aren't pruned from the alternative set of DEVIANT-BE, the computation of informativeness according to the proposal in (30) has to be blind to (most of) common knowledge: the only things that seem to matter are domain size and logical words (quantifiers, disjunctions, etc.) in a sentence. In other words, this means that
This of course raises an important conceptual question to which we don't have an answer at this point: why should pruning be sensitive to informativeness computed blindly to common knowledge?
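The difference between informativeness computed with and without common knowledge can be made concrete for DEVIANT-BE. The sketch below is only an illustration under stated assumptions: worlds are assignments of an identity to each of the three girls, all worlds compatible with the sentence are treated as equally likely (an assumption made here for simplicity), and common knowledge is modeled as the requirement that the three girls are three different individuals. The names `blind_worlds`, `informed_worlds`, and `prob_some_girl_is` are hypothetical.

```python
from fractions import Fraction
from itertools import product

NAMES = ("Mary", "Susan", "Jane")


def prob_some_girl_is(name, worlds):
    """P(at least one girl is `name`), with all worlds in `worlds` equally likely."""
    hits = sum(1 for w in worlds if name in w)
    return Fraction(hits, len(worlds))


# Worlds compatible with DEVIANT-BE alone: each of the three girls is
# Mary, Susan, or Jane (27 assignments, repetitions allowed).
blind_worlds = list(product(NAMES, repeat=3))

# Worlds that also respect common knowledge: no two girls are the same person,
# so the three girls must be Mary, Susan, and Jane in some order (6 assignments).
informed_worlds = [w for w in blind_worlds if len(set(w)) == 3]

print(prob_some_girl_is("Mary", blind_worlds))     # 19/27
print(prob_some_girl_is("Mary", informed_worlds))  # 1
```

With common knowledge factored in, the existential alternative is certain given the sentence and should be pruned; computed blindly, its conditional probability stays well below 1, so it can survive pruning and feed the problematic ignorance inference.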

Empirical challenges
In this section, we discuss empirical challenges to the proposal according to which probabilistic informativeness plays a role in pruning (Section 8.1), and to the proposal that sentences such as DEVIANT-BE and DEVIANT-WRITE are deviant because of the ignorance inferences they trigger which contradict common knowledge (Section 8.2).

The symmetry problem
According to our proposal, there are two sources of alternative pruning in implicature computation. The first is pruning due to contextual relevance considerations, whose existence has been argued for in much previous work on implicatures. The second is pruning due to informativeness considerations, which we have argued for in the present paper.
Recall from the discussion in Section 3.3 that previous work has established that pruning needs to be constrained; in other words, not all alternatives are prunable. A representative example is (12), repeated here, which can never be interpreted as 'John is French' or 'John is Spanish' (these interpretations would be available if it were possible to prune one of the alternatives 'John is French', 'John is Spanish' without the other, as discussed in Section 3.3).
(12) John is French or Spanish.
This data point belongs to a larger data pattern according to which, when a sentence has two alternatives which are symmetric, it is not possible to prune one without pruning the other. Katzir (2014) proposes the following definition of symmetry: alternative sentences in a set A of a sentence S are symmetric if no element of A is in IE(S, A). In the case of (12), the alternatives 'John is French' and 'John is Spanish' are symmetric, and it is thus not possible to prune one without the other (i.e., to 'break' symmetry). This data pattern is one aspect of the so-called symmetry problem; see Fox & Katzir 2011, Katzir 2014, and Breheny et al. 2018 for more comprehensive discussions of the symmetry problem.
Why can't pruning due to contextual relevance considerations break symmetry? To our knowledge, this question hasn't yet received a complete answer, although various proposals exist for how pruning should be constrained (see discussion in Section 3.3).
Importantly, it appears that pruning based on informativeness considerations cannot break symmetry either. To see this, consider (48).
(48) All of Mary's 5 cousins or all of her 20 friends are French.
Formal alternatives of (48) are {All of Mary's 5 cousins are French, All of Mary's 20 friends are French, All of Mary's 5 cousins and all of her 20 friends are French}.
As before, the disjunct alternatives 'All of Mary's 5 cousins are French' and 'All of Mary's 20 friends are French' are symmetric alternatives. If prior knowledge (dis)connecting Mary's family or friends to France doesn't enter into the informativeness evaluation but the cardinality of individuals (domain size) does, P(All of Mary's 5 cousins are French | (48)) > P(All of Mary's 20 friends are French | (48)). Our proposal may thus (in principle19) predict that we may be able to prune 'All of Mary's 5 cousins are French' due to informativeness considerations, and that (48) could thus have as an implicature that not all of Mary's 20 friends are French. This reading is intuitively unavailable. This suggests that informativeness considerations cannot prune one of two symmetric alternatives without pruning the other. How can this be accounted for within the present proposal? A possible route would be to propose a set of constraints on informativeness-based pruning that would prevent it from breaking symmetry. This would, however, miss the generalization that symmetry cannot be broken by either of the two types of pruning (informativeness-based and contextual relevance-based). We leave explaining this generalization as an important direction for future work.
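The inequality between the two conditional probabilities for (48) can be checked under a simple model. The sketch assumes, purely for illustration, that every individual is French independently with the same probability p and that nothing else about cousins, friends, or France is consulted; the function name and the choice of p are hypothetical.

```python
from fractions import Fraction


def conditional_on_disjunction(p: Fraction, n_cousins: int = 5, n_friends: int = 20):
    """Return (P(C | C or F), P(F | C or F)), where
    C = 'all n_cousins cousins are French' and
    F = 'all n_friends friends are French'.

    Illustrative assumption: each individual is French independently with
    probability p (0 < p < 1).
    """
    p_c = p ** n_cousins
    p_f = p ** n_friends
    # Inclusion-exclusion; C and F are independent under the assumption above.
    p_c_or_f = p_c + p_f - p_c * p_f
    # Each disjunct entails the disjunction, so P(X and (C or F)) = P(X).
    return p_c / p_c_or_f, p_f / p_c_or_f


p_c_given_s, p_f_given_s = conditional_on_disjunction(Fraction(1, 2))
print(float(p_c_given_s), float(p_f_given_s))
```

Because p^5 > p^20 for any p strictly between 0 and 1, the alternative about the smaller domain is always the more probable one given (48) under these assumptions, which is what would make it the better candidate for informativeness-based pruning.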

Extension to other empirical domains
The proposal according to which probabilistic informativeness plays a role in pruning has been motivated by the novel empirical generalizations discussed in Section 2. An important challenge for future work is to look for further corroboration of the proposal in other empirical domains. This, however, requires establishing how exactly informativeness maps to pruning, which the proposal at present doesn't offer (cf. discussion in Section 4). For instance, for some (but not all) of such conceivable mappings, there would be an expectation that the alternative of (49-a), which is in (49-b), is more likely to be pruned for smaller domain sizes, and thus that (49-a) is more likely to trigger the implicature that Not all of Mary's students are French for larger domain sizes (because P(All of the n individuals are A | Some of the n individuals are A) for n > 0 decreases as n increases).
In order to derive the inferences in (50-b) without deriving the inferences in (50-c), Crnič et al. (2015) propose that the exhaustification operator applies at two positions in a sentence such as (50-a). More specifically, they propose that the logical form of (50-a) is (51). Importantly, they assume that the conjunctive alternative ('x contains A and B'), which would have been the only IE alternative in the domain of the embedded exh, is pruned. Because of this, the embedded exh doesn't affect the meaning of (52-a); in other words, the meaning of (52-a) is (the meaning without implicatures of) (52-b).
(54) a. It's not the case that every box contains an A and not a B.
⇝ Some box contains a B.
b. It's not the case that every box contains a B and not an A.
⇝ Some box contains an A.
It can be shown that if alternatives obtained by replacing the universal quantifier with an existential were added, no alternatives feeding the matrix exh in (51) would be IE. In that case, distributive inferences wouldn't be derived; ignorance inferences would be derived instead (assuming that ignorance inferences are derived about all of the alternatives whose truth is not settled by the utterance as in the pragmatic approach; we leave open what parse is needed for ignorance inferences to be derived grammatically in an approach as in Meyer 2013, 2014 which would preserve the results of both Crnič et al. 2015 and the pruning proposal). This means that, with exh applying at two positions as in (51), similarly to what was the case with the standard approach to implicatures of embedded disjunction discussed in Section 3.2, pruning existential alternatives would lead to distributive inferences and not pruning them would lead to ignorance inferences.
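The effect of keeping or pruning existential alternatives can be verified mechanically for the simpler, single-exh parse of 'All are A or B' (the standard approach referred to above, not the two-exh parse in (51)). The sketch below assumes innocent exclusion in the sense of Fox 2007: an alternative is IE if it belongs to every maximal set of alternatives that can be jointly negated consistently with the prejacent. The world model (two individuals, each of whom may or may not be A and may or may not be B) and all function names are assumptions made for illustration.

```python
from itertools import combinations, product

# A world assigns each of two individuals a pair (is_A, is_B).
WORLDS = list(product(product((False, True), repeat=2), repeat=2))


def all_are(i):
    """'All are A' (i = 0) or 'All are B' (i = 1)."""
    return lambda w: all(ind[i] for ind in w)


def some_are(i):
    """'Some are A' (i = 0) or 'Some are B' (i = 1)."""
    return lambda w: any(ind[i] for ind in w)


def prejacent(w):
    """'All are A or B'."""
    return all(ind[0] or ind[1] for ind in w)


def innocently_excludable(prejacent, alternatives):
    """Names of the alternatives contained in every maximal set that can be
    jointly negated consistently with the prejacent."""
    def consistent(names):
        return any(prejacent(w) and not any(alternatives[n](w) for n in names)
                   for w in WORLDS)

    candidates = [frozenset(c)
                  for r in range(len(alternatives) + 1)
                  for c in combinations(alternatives, r)
                  if consistent(c)]
    maximal = [s for s in candidates if not any(s < t for t in candidates)]
    return frozenset.intersection(*maximal)


universal = {"all A": all_are(0), "all B": all_are(1)}
existential = {"some A": some_are(0), "some B": some_are(1)}

# Existential alternatives kept: nothing is innocently excludable.
print(sorted(innocently_excludable(prejacent, {**universal, **existential})))  # []
# Existential alternatives pruned: both universal alternatives are excludable.
print(sorted(innocently_excludable(prejacent, universal)))  # ['all A', 'all B']
```

In the second case, negating both 'all A' and 'all B' together with the prejacent entails that some are A and some are B, which is the distributive inference; in the first case no alternative is negated, leaving room for ignorance inferences.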
Importantly, however, there is an aspect of the proposal in Crnič et al. (2015) that is at odds with pruning existential alternatives of sentences such as (51). In order to motivate the possibility of pruning the conjunctive alternative from the domain of the embedded exhaustivity operator in (51), while avoiding optional pruning of the conjunctive alternative in any sentence with disjunction (which would lead to an overgeneration problem), they propose the constraint on pruning discussed in Section 3.3: simplifying somewhat, they propose that one can only prune alternatives if that results in a weaker interpretation than not pruning them. This constraint is not met for existential alternatives of (51), as explained in Section 3.3: their constraint is thus not compatible with the proposal put forward in Section 4. One way to resolve this tension would be to find an alternative way to avoid the overgeneration problem that led Crnič et al. (2015) to postulate their constraint on pruning. Pursuing this is left for future work.

Empirical challenges for the proposed solution to the deviance puzzle
We will now discuss three additional observations of deviant sentences with embedded disjunction, whose deviance is not straightforwardly accounted for by problematic ignorance inferences.We discuss how the deviance of those cases may be accounted for, but we acknowledge that further work on those cases is needed.

Modal contrast
Sentences such as DEVIANT-BE can be saved if the possibility modal is inserted below the universal quantifier and above disjunction, but not if the necessity modal is: (55-a) is reported deviant, while (55-b) is perfectly felicitous.20
(55) a. #Each of these three girls must be Mary, Susan, or Jane.
b. Each of these three girls might be Mary, Susan, or Jane.
Can this contrast be explained by inferences triggered by (55-a) which contradict common ground and which are not triggered by (55-b)? To the extent that (55-a) triggers ignorance implicatures about (56), the deviance of (55-a) could be explained in the same way as the deviance of DEVIANT-BE or DEVIANT-WRITE (note that (55-b) triggering ignorance implicatures about (56) wouldn't be problematic). More work is needed, however, to establish under which assumptions ignorance implicatures about (56) can be derived for (55-a) without predicting any inferences contradicting common knowledge for (55-b).
(56) At least one of these three girls is Mary (Susan, Jane...)

Larger domain size
The intuitions about (57) appear to be more subtle than those for DEVIANT-BE or DEVIANT-WRITE, but at least some speakers find the sentence deviant.
(57) ?Each of the twenty girls in this photo is Lisa or one of our neighbors.
If the sentence in (57) triggered ignorance inferences, we could explain its deviance in the same way as we did for the sentences DEVIANT-BE and DEVIANT-WRITE. However, we have established that universally quantified sentences with the cardinality of the restrictor and the number of disjuncts as in (57) are naturally interpreted with distributive and not ignorance inferences (cf. ALL-20-OR).
We would thus like to point to an alternative approach to explaining the deviance of (57), which is nonetheless in the same spirit as the current proposal. Spector (2018) observes that sentences such as ALL-20-OR trigger not only distributive inferences according to which at least one of the twenty girls is French and at least one is Spanish, but also an inference about (approximately) how many of the twenty girls are French and how many are Spanish (we will refer to this below as the distribution estimate inference). The content of this inference for a sentence such as ALL-20-OR seems to be that approximately as many of the twenty girls are French as are Spanish.
Such a distribution estimate inference in the case of (57) would amount to (58), which clearly contradicts common knowledge and could thus explain the deviance of (57).21
(58) Approximately the same number of the girls in the photo are Lisa as the number of girls in the photo who are our neighbors.
21 Extending the exhaustification approach to capture the distribution estimate inference is straightforward. The only necessary components are (i) to assume that sentences such as ALL-20-OR activate not the alternatives in which the universal quantifier is substituted with an existential, but the alternatives in which the universal quantifier is substituted with the full range of numeric expressions between (at least) 1 and (at least) n − 1, with n being the cardinality of the restrictor, and (ii) to assume that there is a threshold numeral such that all and only alternatives headed by numerals lower than the threshold numeral are not informative enough and are thus pruned. Taking as an example the numeral at least 12 as the threshold numeral for ALL-20-OR, alternatives in (i) are sufficiently informative not to be pruned. They can all be negated consistently with ALL-20-OR, giving rise to inference in (ii
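The arithmetic behind this footnote can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's own implementation: it assumes that the kept alternatives are 'at least k of the girls are French (Spanish)' for every k from the threshold up to n − 1, and that all of them are negated alongside the literal meaning of ALL-20-OR. The names `worlds`, `survives`, `french_counts` and `THRESHOLD` are hypothetical.

```python
N = 20          # cardinality of the restrictor in ALL-20-OR
THRESHOLD = 12  # hypothetical threshold numeral, as in the footnote's example


def worlds(n):
    """Yield (french_only, spanish_only, both) splits of n girls.

    Every girl is French or Spanish in each split, so the literal meaning
    of the universally quantified disjunction holds in every world.
    """
    for french_only in range(n + 1):
        for spanish_only in range(n + 1 - french_only):
            yield french_only, spanish_only, n - french_only - spanish_only


def survives(world, threshold, n):
    """True if every kept alternative 'at least k are French/Spanish' is false."""
    french_only, spanish_only, both = world
    french, spanish = french_only + both, spanish_only + both
    return all(french < k and spanish < k for k in range(threshold, n))


def french_counts(n=N, threshold=THRESHOLD):
    """Numbers of French girls compatible with negating the kept alternatives."""
    return sorted({f + b for f, s, b in worlds(n) if survives((f, s, b), threshold, n)})


print(french_counts())  # [9, 10, 11]
```

On these assumptions, between 9 and 11 of the twenty girls are French (and likewise Spanish), which is one way of cashing out 'approximately as many'.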

Downward-entailing contexts
Finally, we discuss sentences such as (59), in which the universally quantified sentence with a disjunction in its scope is embedded under a downward-entailing operator such as negation. As with (57), the intuitions about (59) appear to be more subtle than those for DEVIANT-BE or DEVIANT-WRITE, but at least some speakers find the sentence deviant.
(59) ?It's not the case that both of these girls are Susan or Jane.
This empirical pattern is entirely parallel to the deviance cases discussed by Magri (2009), which motivated the proposal that scalar implicatures are derived blindly to common knowledge. Consider (60), which is deviant just like (41), repeated here in (61).
(60) #It's not the case that some Italians come from a cold country.
(61) #Some Italians come from a warm country.
To explain the deviance of (60), Magri (2009) proposes that in such cases implicatures are derived locally rather than globally, that is, at the embedded level below negation rather than at the matrix level.
Furthermore, it is possible to construct deviant cases with the modified numeral at least n in downward-entailing contexts (recall that this modified numeral triggers ignorance inferences in upward-entailing contexts, which may cause the sentence to be deviant, cf. (43)). We can slightly adapt the scenario reported in (43) from Buccola & Haida 2019 to (62).
(62) Context: Ann played a card game in which, given the rules, the final score is always an even number of points. According to the rules, if a person scores 2 or 4 points, they get a small prize, and if they score 6 or more, they get a big prize. Carl and Bob are wondering how many points Ann scored. Bob sees that Ann is awarded a small prize, and reports to Carl:
a. ?Ann got a small prize, so it can't be the case that she scored at least 5 points.
b. Ann got a small prize, so it can't be the case that she scored at least 6 points.
It thus seems to be the case that, quite generally, sentences which are deviant (arguably because they trigger certain problematic inferences) remain deviant when embedded under a downward-entailing operator. Whether this is because the problematic inferences are derived locally or because the deviance in such cases has a different source remains to be understood.
We will now introduce four alternative directions one may attempt to pursue as competing accounts for the inference and deviance puzzles.We will point out the difficulties and open questions for each of these alternative directions.

Alternative 1: Iterated rationality models
We demonstrated that the inference puzzle poses fundamental challenges to the exhaustification approach to implicatures, and more generally, to any approach in which implicatures are a function (solely) of entailment relations between a sentence and its alternatives (possibly in addition to considerations of contextual relevance).
We proposed a solution to the inference puzzle according to which probabilistic informativeness plays a role in pruning.
Shortcomings of iterated rationality models (IRMs) for certain types of implicatures, such as scalar implicatures and exhaustivity implicatures, have already been discussed (Franke & Bergen 2020, Fox & Katzir 2021, Cremers et al. 2023). IRMs may, however, still be appropriate models of various other inferences we draw when we interpret language. Could they be used to model inference patterns of embedded disjunction?
An underlying assumption of IRMs is that speakers and listeners are rational agents: the speaker reasons about how the listener will interpret the utterance, and the listener in turn reasons about how the speaker selects utterances, which results in inferences enriching the literal meaning of the speaker's sentences. In most IRMs, the inferences listeners draw depend heavily on prior world knowledge (common knowledge). The data pertaining to the deviance puzzle suggest, however, that inference patterns of embedded disjunction are largely independent of prior world knowledge. This in turn suggests that most existing IRMs wouldn't be appropriate to account for the deviance puzzle, even if an IRM account for the inference puzzle were to be developed.
There is, however, a version of IRMs developed in Degen et al. 2015, in which listeners reason about the prior world knowledge based on the speaker's utterance: if the utterance has certain properties, the listeners can suspend some of their prior beliefs in the process of implicature computation. It may be possible to develop a version of such a model which would account for the inference and the deviance puzzles: this would require working out a proposal for why prior world knowledge is systematically suspended in sentences with embedded disjunction.

9.2 Alternative 2: Domain-general reasoning about the world based on the literal meaning of sentences
One may further wonder whether the inference puzzle can be resolved by invoking domain-general reasoning about how the world might be based on the literal meaning of sentences (that is, based on the meaning of sentences without implicatures resulting from consideration of alternative sentences that could have been used). Take ALL-20-OR and ALL-2-OR for instance: if all we know about Mary's friends is that each of them is either French or Spanish (i.e., literal meaning), the more friends she has, the more likely it is that at least one of them is French and the more likely it is that at least one of them is Spanish. Such domain-general reasoning about how the world might be may thus account for the observation that inferences that at least one of Mary's friends is French and at least one is Spanish are more prominent for ALL-20-OR than for ALL-2-OR (although something more would need to be said under such an approach to explain that ignorance inferences are more prominent for ALL-2-OR than for ALL-20-OR).
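The domain-size effect invoked by this kind of reasoning can be illustrated with a short calculation. The sketch below rests on assumptions that are not in the text: each friend is French with some fixed probability `p_french` and Spanish otherwise, independently of the others. The function name `p_distributive` is hypothetical.

```python
def p_distributive(n, p_french=0.5):
    """P(at least one French and at least one Spanish | all n are French or Spanish).

    Assumes each of the n individuals is independently French with
    probability p_french and Spanish otherwise (an illustrative prior).
    """
    p_spanish = 1.0 - p_french
    # Complement of the two 'uniform' outcomes: all French, or all Spanish.
    return 1.0 - p_french ** n - p_spanish ** n


for n in (2, 20):
    print(n, p_distributive(n))
```

With an even prior, the probability that both nationalities are represented is 0.5 for two friends and very close to 1 for twenty, which is the asymmetry that a domain-general account would appeal to.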
Here is one argument that the contrast between ALL-20-OR and ALL-2-OR is not (solely) a product of domain-general reasoning about how the world might be based on the literal meaning of sentences.
Suppose that we are discussing where Mary's office-mates are from. Consider (63).
(63) Both of Mary's office-mates are American or British.
Imagine that Mary works in the US, and that this fact translates into high prior probability that her office-mates are from the US (and hence low prior probability that they are from elsewhere). If we simply reason about how the world might be based on the literal meaning of the sentence, and given that domain-general reasoning is expected to be sensitive to prior world knowledge, we may expect that (63) should suggest that at least one of Mary's office-mates is American, and that it shouldn't suggest that at least one is British (in other words, a salient interpretation of the sentence should be that at least one and possibly both of Mary's office-mates are American). This interpretation does not seem to be readily available: even in such a context, (63) is reported to be preferably interpreted with ignorance inferences and without distributive inferences, similarly to ALL-2-OR.
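To see what purely prior-driven reasoning would predict here, suppose (hypothetically) that each office-mate is American with prior probability 0.9 and British otherwise, independently of the other. The value 0.9 is an illustrative assumption, not a figure from the paper. Conditional on the literal meaning of (63):

```latex
\begin{align*}
P(\text{at least one is American}) &= 1 - (1 - 0.9)^2 = 0.99,\\
P(\text{at least one is British})  &= 1 - 0.9^2 = 0.19.
\end{align*}
```

On these assumptions the two inferences come apart sharply, and this is the asymmetry that the reported judgments fail to show.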
Moving away from the inference puzzle, if our explanation of the deviance puzzle is on the right track (i.e., if DEVIANT-BE and DEVIANT-WRITE are deviant because the inferences they trigger contradict common knowledge), this is an additional argument that distributive and ignorance inferences of sentences such as ALL-20-OR and ALL-2-OR are not the result of domain-general reasoning about how the world might be based on the literal meaning of sentences.
In other words, the situation is the following: considering that sentences with the disjunction embedded in the scope of a universal quantifier activate the alternatives as in ALT-all-or, on the assumption that recursive exhaustification at the matrix level is possible, exhaustifying a sentence like (64) once derives ignorance inferences via the maxim of quantity, and exhaustifying it twice derives distributive inferences.
If this is indeed the way distributive and ignorance inferences of sentences such as (64) are derived, one can put forward an alternative to pruning to account for the inference puzzle, one that would possibly relate the informativeness of a sentence to a propensity to parse it with recursive matrix exhaustification. The idea in brief would be that, given that ALL-20-OR is more informative than ALL-2-OR, and that SIMPLE-DISJ is more informative than COMPLEX-DISJ24, we are more likely to parse ALL-20-OR with recursive matrix exhaustification as compared to ALL-2-OR, and likewise SIMPLE-DISJ as compared to COMPLEX-DISJ. In other words, the disambiguation process (between a parse with a single exh and a parse with double exh) would have to be guided by the informativeness of a sentence.
A problem with this approach, however, is the deviance puzzle. If the deviance pattern reported in Section 5 is indeed due to problematic inferences contradicting common knowledge, one would need to propose that sentence disambiguation is not sensitive to whether one of the meanings is unlikely (or, in the extreme case, contradictory) given prior world knowledge. This seems to be wrong: how we interpret the sentence 'I like banks' will likely differ depending on whether the sentence is uttered in a bank (financial institution) or at a bank (riverside).

Alternative 4: Implicature suspension
Our proposal according to which probabilistic informativeness plays a role in pruning is close in spirit to the proposal in Chemla & Romoli 2015, which was developed for other purposes. In their framework, implicatures of a sentence are eliminated if the informativeness of the implicature is too high. According to our proposal, alternatives are eliminated if the informativeness of the alternative given the original utterance is too low. The two ideas 'co-vary' in most cases, since in most cases the implicature is a consequence of the negation of an alternative.
Importantly, however, pruning an alternative from the whole process of implicature derivation (as in the current proposal) may have radically different effects than eliminating an implicature that arises from the presence of this alternative. To give a concrete example from the empirical domain explored in this paper, under the exhaustification approach to implicature derivation, pruning certain alternatives of ALL-20-OR and ALL-2-OR derives distributive inferences, and not pruning them derives ignorance inferences. Crucially, however, eliminating ignorance inferences (e.g., because they are too informative) would not immediately lead to the derivation of distributive inferences, or vice versa. This fact makes it possible to differentiate our proposal from that of Chemla & Romoli (2015) on empirical grounds. This is not to say, however, that the proposal in Chemla & Romoli 2015 cannot be extended to capture the data discussed here. In particular, there are free parameters in Chemla & Romoli 2015 to be set (e.g., which set of alternatives is assumed, which approach to implicature derivation is taken) in order to be able to fully compare it to the current proposal. We leave this comparison for future work.

Conclusion
In this paper, two novel empirical puzzles with embedded disjunction have been explored: the inference puzzle and the deviance puzzle.
The inference puzzle taught us that quantified sentences with embedded disjunction trigger inferences which are sensitive in some way to the informativeness of the utterance and its alternatives (as evidenced by the effect of the domain size and the number of disjuncts on whether the ignorance or the distributive inferences are derived). The account we have put forward to capture this effect is that pruning of alternatives is sensitive to how much information the alternative carries given the original utterance: the more informative the alternative is, the more likely it is to be kept in the alternative set in the computation of implicatures. Importantly, even if the specifics of the pruning account turn out to be incorrect, the data pattern that the account aims to capture strongly suggests that informativeness other than logical or contextual entailment plays a role in some way in implicature computation.
The deviance puzzle is about a novel case of deviance of sentences with embedded disjunction, which we have argued to be caused by ignorance inferences. Importantly, if the proposed account is on the right track, ignorance inferences need to be derived blindly to common knowledge (much like scalar implicatures have been argued to be derived blindly to common knowledge by Magri 2009), and crucially, the computation of informativeness of alternatives needs to be blind to (at least) some aspects of common knowledge (i.e., there has to be some level of modularity when informativeness is calculated).
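One way to make 'how much information the alternative carries given the original utterance' concrete is surprisal. The sketch below is an assumption-laden illustration, not the measure defined in the paper: it takes the existential alternative 'Some are French', conditions on the literal meaning of the universal sentence, and assumes that each individual independently satisfies exactly one disjunct, with all disjuncts equally likely. The function name and the choice of ten disjuncts are hypothetical.

```python
import math


def surprisal_of_existential(n, n_disjuncts=2):
    """Bits carried by 'Some are French' given 'All n are D1 or ... or Dm'.

    Assumes each individual independently satisfies exactly one disjunct,
    with all disjuncts equally likely (a hypothetical uniform prior).
    """
    p_none_french = ((n_disjuncts - 1) / n_disjuncts) ** n
    return -math.log2(1.0 - p_none_french)


print(surprisal_of_existential(2))                   # small domain, two disjuncts
print(surprisal_of_existential(20))                  # large domain, two disjuncts
print(surprisal_of_existential(20, n_disjuncts=10))  # large domain, many disjuncts
```

Under these assumptions the existential alternative carries about 0.42 bits for a domain of two, almost nothing for a domain of twenty, and more again when the number of disjuncts grows. This matches the direction of both effects summarized above, with low-information alternatives being the ones that are candidates for pruning.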
The two main conclusions of the paper are thus that probabilistic informativeness plays a role in implicature derivation, and that it is computed in a modular way.
), let us consider what happens with its alternatives of the form (29-a,b).
(29) All of n people are A or B.
a. Some of n people are A (B)
b. All of n people are A (B)
those three girls is Mary, Susan, or Jane.
(32) (Context: Peter invited three girls to the party.) NON-DEVIANT-CALLED
Each of those three girls is called Mary, Susan, or Jane.
(33) (Context: Tolstoy, Zola and Rowling are great writers.) DEVIANT-WRITE
#Each of those three writers wrote Anna Karenina, Germinal, or Harry Potter.
(34) (Context: Ann, John, and Bob are great students.) NON-DEVIANT-READ
Each of those three students read Anna Karenina, Germinal, or Harry Potter.
of Mary's students are French.
b. All of Mary's students are French.

8.1.3 Crnič et al.'s 2015 approach to distributive inferences
Crnič et al. (2015) provide experimental results showing that sentences such as (50-a) trigger the distributive inferences in (50-b) without necessarily triggering the inference in (50-c). This suggests that negating the disjunct alternatives, which is how distributive inferences are standardly derived, as discussed in Section 3.2, may not be the (only) way to derive distributive inferences.
(50) a. Every box contains an A or a B.
b. Some box contains an A and some box contains a B.
c. Not every box contains an A and not every box contains a B.
(51) exh [Every box x exh [x contains an A or a B]]
box x exh [x contains an A or a B]]
b. Every box contains an A or a B.
Furthermore, they assume that two types of alternatives are absent: (i) alternatives obtained by replacing the universal quantifier with an existential; (ii) alternatives where the embedded exh is deleted. Therefore, according to the parse (51), the alternatives on which the matrix exh operates are in (53):
(53) a. [Every box x exh [x contains an A]] = Every box contains an A and not a B
b. [Every box x exh [x contains a B]] = Every box contains a B and not an A
c. [Every box x exh [x contains an A and a B]] = Every box contains an A and a B
All of these alternatives can be negated consistently with (52-a), that is, with the original proposition which is an argument to the matrix exh. Negating (53-a) obtains the inference in (54-a), and negating (53-b) the inference in (54-b).
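The effect of negating the alternatives in (53) can be checked by brute force over a small model. The sketch below is illustrative only and makes two assumptions that go beyond the text: there are just two boxes, and the prejacent of the matrix exh is read as plain 'Every box contains an A or a B'. All names (`CONTENTS`, `prejacent`, `ALTERNATIVES`, `matrix_exh`) are hypothetical.

```python
from itertools import product

# Possible contents of a single box: nothing, only A, only B, or both.
CONTENTS = [frozenset(), frozenset("A"), frozenset("B"), frozenset("AB")]


def every(world, pred):
    return all(pred(box) for box in world)


def some(world, pred):
    return any(pred(box) for box in world)


def prejacent(world):
    """Assumed prejacent: every box contains an A or a B."""
    return every(world, lambda box: len(box) > 0)


# The three alternatives in (53), stated via their paraphrases.
ALTERNATIVES = {
    "53a": lambda w: every(w, lambda box: box == frozenset("A")),   # A and not B
    "53b": lambda w: every(w, lambda box: box == frozenset("B")),   # B and not A
    "53c": lambda w: every(w, lambda box: box == frozenset("AB")),  # A and B
}


def matrix_exh(n_boxes=2):
    """Worlds where the prejacent is true and every alternative is false."""
    return [w for w in product(CONTENTS, repeat=n_boxes)
            if prejacent(w) and not any(alt(w) for alt in ALTERNATIVES.values())]


survivors = matrix_exh()
# (50-b): some box contains an A and some box contains a B.
distributive = all(some(w, lambda b: "A" in b) and some(w, lambda b: "B" in b)
                   for w in survivors)
# (50-c): not every box contains an A and not every box contains a B.
entails_50c = all(not every(w, lambda b: "A" in b) and not every(w, lambda b: "B" in b)
                  for w in survivors)
print(len(survivors), distributive, entails_50c)  # 6 True False
```

In this toy model every surviving world verifies (50-b), while a world in which one box holds only an A and the other holds both an A and a B survives and falsifies (50-c).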

Table 1
Left: Alternatives of ALL-20-OR with the existential alternatives pruned (in strike-through text). Right: Inferences of ALL-20-OR which result from the alternatives on the left-hand side of the table. Distributive inferences are derived (good outcome for ALL-20-OR).

Table 2
Left: Alternatives of ALL-2-OR (no alternatives are pruned). Right: Inferences of ALL-2-OR which result from the alternatives on the left-hand side of the table. Ignorance inferences are derived (good outcome for ALL-2-OR).

, the proposal does not yet capture that the inference pattern is sensitive to the number of disjuncts. As a reminder, consider SIMPLE-DISJ and COMPLEX-DISJ, repeated below: SIMPLE-DISJ is more naturally interpreted with distributive inferences than COMPLEX-DISJ, and COMPLEX-DISJ more naturally with ignorance inferences than SIMPLE-DISJ.
DISJ. COMPLEX-DISJ is predicted to trigger distributive inferences when the existential alternatives of the form 'Some are French', 'Some are French or Spanish' etc. are pruned, as in Table 3, and ignorance inferences when no alternatives are pruned, as in Table 4.14 In other words, correct inferences are predicted when existential alternatives aren't pruned.

Table 3
Left: Alternatives of COMPLEX-DISJ with the existential alternatives pruned (in strike-through text). Right: Inferences of COMPLEX-DISJ which result from the alternatives on the left-hand side of the table. Distributive inferences are derived (bad outcome for COMPLEX-DISJ).
However

Table 4
Left: Alternatives of COMPLEX-DISJ (no alternatives are pruned). Right: Inferences of COMPLEX-DISJ which result from the alternatives on the left-hand side of the table. Ignorance inferences are derived (good outcome for COMPLEX-DISJ).
) a. This girl is my sister Susan. #That other girl is my sister Susan too.
b. This girl is called Susan. That other girl is called Susan too.
c. John wrote this book. #Peter wrote this book too.
d. John read this book. Peter read this book too.
she assumes that in deriving alternatives one cannot delete K_speaker. (45) is one possible parse of the sentence All are A or B according to this theory. Importantly for our purposes, this parse results in grammatical ignorance implicatures if existential alternatives aren't pruned, and in distributive inferences if they are pruned. Let us see how.
speaker [exh[All are A or B]]]
If existential alternatives (i.e., 'Some are A (B)') aren't pruned from the set of alternatives of the embedded exh in (45), neither the alternatives 'Some are A (B)' nor the alternatives 'All are A (B)' are IE at the embedded level for reasons discussed in Section 3.2. What happens at the level of matrix exh in that case? The alternative set of the matrix exh is in (46).