Probability and implicatures: A unified account of the scalar effects of disjunction under modals

Sentences involving disjunction under epistemic modal adjectives — such as possible, likely, and certain — give rise to the inference that the disjuncts are epistemically possible. Inferences of this sort are often classified and treated differently, depending on the force of the embedding modal. Those triggered by possibility modals are singled out as ‘free choice inferences’ (Kratzer & Shimoyama 2002, Klinedinst 2007, Fox 2007, Chierchia 2013, a.o.), while those triggered by stronger modals are accounted for in a different way (Sauerland 2004, Fox 2007, Crnič, Chemla & Fox 2015 a.o.). In this paper, we pursue two goals. First, we develop and defend a degree semantics for epistemic modal adjectives, building on much recent work on the topic (Yalcin 2010, Lassiter 2011, 2014, Moss 2015a, Swanson 2015, a.o.). Second, we show that this semantics, in combination with the assumption that scalar implicatures can arise in embedded position (Fox 2007, Chierchia, Fox & Spector 2012, a.o.), can predict all the inferences triggered by disjunction under modals, including free choice ones, via a uniform mechanism. We conclude by outlining how the proposal can be extended to epistemic modal items in other syntactic categories, and to modals of different flavor.


The inferences of disjunction under epistemic modals: An overview
Disjunctions in the scope of epistemic modal expressions give rise to scalar inferences to the effect that each disjunct is epistemically possible. For example, all the sentences in (1a)-(1c) are naturally heard as suggesting that the sentences in (2a) and (2b) are true.
(1) a. It's possible that we will hire Mary or Sue. b. It's likely that we will hire Mary or Sue. c. It's certain that we will hire Mary or Sue.
(2) a. It's possible that we will hire Mary. b. It's possible that we will hire Sue.
Part of the evidence for these effects is that all of (1a)-(1c) are infelicitous in a context where one of (2a) or (2b) is false.
(3) I talked to the committee, it's impossible that we will hire Mary but . . . #It's possible/likely/certain that we will hire Mary or Sue.
These observations generalize to epistemic modal expressions in different syntactic categories, including adverbs like possibly, probably, certainly and auxiliaries like might, should and must. They also arise with some modal expressions with non-epistemic flavor, in particular with deontic modal adjectives like allowed and required, and with the corresponding auxiliaries may and have to. 1 Some examples are below in (4)-(7).
(4) It might rain or snow tomorrow.
a. It might rain tomorrow

John might have taken Logic last year

(6) It is allowed to smoke or drink in this room.
a. It's allowed to smoke in this room
b. It's allowed to drink in this room

(7) John has to take Syntax or Logic this year.
a. John may take Syntax this year
b. John may take Logic this year

1 Disjunction embedded under quantificational DPs like all students and some students also appears to give rise to similar inferences (Klinedinst 2007, Fox 2007 among others):
(i) All students/Some students took Syntax or Logic this year.
a. some student took Syntax
b. some student took Logic
We focus on the case of modals in this paper and we leave to further work the task of investigating extensions of our account to cases like those in (i). See also footnote 8 below, where we point to arguments from Crnič, Chemla & Fox 2015, which call into question whether we should treat at least the case of all in the same way.

All these effects are unexpected from the viewpoint of classical modal semantics, which descends from modal logic and treats natural language modals as quantifiers over worlds. This analysis, in combination with the hypothesis that or simply corresponds to Boolean disjunction, predicts that the following inference patterns are invalid: 2

There is broad (though not universal) agreement that a Boolean analysis of disjunction and classical modal semantics are essentially correct, and that the scalar effects we described should be captured as implicatures that arise on top of literal meaning (Kratzer & Shimoyama 2002, Simons 2005, Fox 2007, Klinedinst 2007, Chemla 2008, van Rooij 2010, Franke 2011, Alonso Ovalle 2005, Chierchia 2013, Crnič, Chemla & Fox 2015). 3 The main argument for this approach is that the problematic effects disappear in downward entailing contexts, a signature of scalar implicatures in general. For example, consider (8):

(8) It's not possible that we will hire Mary or Sue.

If the inferences in (2a) and (2b) arose also under negation, (8) would have the meaning in (9a), represented schematically in (9b):

2 The same point holds for likely; see §2 for details.
3 For some semantic accounts of free choice inferences, see Higginbotham 1991, Zimmerman 2000, Geurts 2005, Barker 2010, Starr 2016, among others. In this paper, we lack the space to set up a comparison between our analysis and these accounts.

Paolo Santorio and Jacopo Romoli

(9) a. It's not true that: it's possible that we will hire Mary or Sue and it's possible that we will hire Mary and it's possible that we will hire Sue.
b. ¬[♦(p ∨ q) ∧ ♦p ∧ ♦q]

Notice that (9a) is compatible with it being possible that (say) we hire Mary. But this is clearly incorrect. (8) says that it's certain both that we will not hire Mary and that we will not hire Sue. This shows that the inferences we're interested in don't arise under negation.
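The contrast between (8) and (9b) can be checked by brute force. The following Python sketch is only an illustration under stated assumptions: a world is a pair of truth values for p and q, an epistemic state is any non-empty set of worlds, and ♦ is classical existential quantification over the state. All names (WORLDS, STATES, possible, reading_9b, reading_8) are hypothetical and introduced here for the example.

```python
from itertools import combinations, product

# Assumption: a world is a pair (p, q) of truth values; an epistemic state
# is a non-empty set of worlds compatible with what is known.
WORLDS = list(product([True, False], repeat=2))
STATES = [frozenset(c) for r in range(1, len(WORLDS) + 1)
          for c in combinations(WORLDS, r)]


def possible(state, prop):
    """Classical possibility: some world in the state verifies prop."""
    return any(prop(w) for w in state)


def p(w):
    return w[0]


def q(w):
    return w[1]


def p_or_q(w):
    return w[0] or w[1]


def reading_9b(state):
    """(9b): not [possible(p or q) and possible(p) and possible(q)]."""
    return not (possible(state, p_or_q)
                and possible(state, p)
                and possible(state, q))


def reading_8(state):
    """Plain negation of the literal meaning: not possible(p or q)."""
    return not possible(state, p_or_q)


# States that verify (9b) while p is still possible.
witnesses_9b = [s for s in STATES if reading_9b(s) and possible(s, p)]
# States that verify the plain negation while some disjunct is possible.
witnesses_8 = [s for s in STATES
               if reading_8(s) and (possible(s, p) or possible(s, q))]
```

In this toy model `witnesses_9b` is non-empty (a state containing only a p-but-not-q world is one witness), while `witnesses_8` is empty, which matches the observation that (8) excludes both disjuncts.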
To sum up: the scalar effects introduced above can be captured via a simple generalization:

Possibility Implicatures of Modals (PIM)
Sentences of the form mod[p ∨ q] (where 'mod' stands for any modal that is not downward entailing in its prejacent position) give rise to the inferences that each of the disjuncts is possible.

mod[p ∨ q] ⇝ ♦p, ♦q

Let us notice right away that PIM applies to modal expressions of various flavors and syntactic categories. Throughout the core part of the paper, we focus on epistemic modal adjectives. This choice is dictated by a practical concern: our proposal makes crucial use of a degree semantics, and the most developed degree-based analyses of modal terms concern just items of this sort. We are going to sketch how the proposal may be extended to other items in §8.

The project: A uniform account
Strikingly, the great majority of analyses in the literature do not predict PIM via a uniform mechanism. 4 Instead, the scalar effects covered by PIM are generally classified and treated differently. The effects triggered by nonexistential modals are sometimes called 'distributive inferences', the intuition being that they 'distribute' the two propositions corresponding to each disjunct across the worlds in the quantificational domain. These inferences can be handled straightforwardly by some standard accounts of scalar implicature (e.g. Sauerland 2004 and Fox 2007). We go through the details in §2, but the basic algorithm consists simply in conjoining the basic meaning of the sentence with the negation of some of its stronger alternatives. This strengthened meaning entails the relevant possibility claims. Schematically:

The possibility inferences triggered by possibility modals, on the other hand, are called 'free choice inferences' and are usually derived via a more complex route. One increasingly popular account (though by no means the only live option) involves exploiting a recursive mechanism for computing implicatures (see Kratzer & Shimoyama 2002, Fox 2007, Chierchia 2013 among others).

4 Two notable exceptions to this are Chemla 2008 and Klinedinst 2007. Our proposal is in particular very similar in spirit to that of Klinedinst 2007, although we do not make the assumption that modals quantify over pluralities of worlds. See §9.2 below for a brief comparison with Klinedinst's (2007) approach. We leave a detailed comparison with Chemla's (2008) proposal for future research. Notice also that Chierchia (2013) calls 'free choice inferences' both distributive and free choice inferences, but derives them in very different ways.
In this paper, we show that, by adopting a degree semantics for epistemic modal expressions, we can derive all the effects covered by PIM, in a uniform way, via the simple schematic process outlined above. The idea builds on recent work on the semantics of epistemic modals. Recent literature on modality has defended the claim that a degree semantics for modals might be empirically and theoretically superior to a classical quantificational semantics (Swanson 2006 and Lassiter 2011; see also Swanson 2015, Yalcin 2007, Lassiter 2014, Moss 2015a for related work). On one popular implementation, epistemic modals work as measures of probability, mapping propositions to a value on a probability scale. For concreteness, in this paper we adopt a probabilistic semantics for modals, though in principle other implementations of a degree semantics are possible. As mentioned, we focus on epistemic modal adjectives like possible, likely and certain as our main case study.
The rest of the paper is organized as follows. In §2, we set up some background on the semantics of modals and scalar implicature. In §3, we show how this account can straightforwardly derive distributive inferences, but not free choice ones. After a brief overview of the proposal in §4, in §5 we give some background on degree semantics for adjectives in general, while in §6 we discuss in detail the case for a degree semantics for epistemic adjectives. In §7, we put forward our unified account of all possibility inferences and in §8 we show how it can be extended beyond epistemic modals. In §9, we discuss other remaining issues and make some brief comparisons with alternative proposals.
2 General background

The classical semantics of modals and probability operators
The contemporary benchmark for the semantics of modality is set by Kratzer's analysis (1981, 1991, 2012; see also Portner 2009). In outline, Kratzer treats modals as quantifiers over a contextually restricted domain of worlds. This domain of quantification is determined via two contextually provided sets of propositions (more precisely: functions from worlds to sets of propositions), which Kratzer calls modal base and ordering source. The modal base is used to single out the domain of quantification of the modal: worlds in the domain are all and only those that satisfy all the propositions in the modal base. The ordering source is (simplifying somewhat) used to induce an ordering on the worlds in this domain, singling out a set of 'best' worlds, along some relevant dimension. 5 What properties of worlds matter for the ordering depends on the flavor of modality involved. In particular, Kratzer takes epistemic modals to use an ordering that ranks worlds on the basis of how much they conform to stereotypical assumptions.

5 One standard way to extract an ordering from a set of propositions S is straightforward (as pointed out by Lewis 1981). We say that, for any two worlds w₁, w₂, w₁ ⪯ w₂ iff, for any p ∈ S, if p(w₂) = 1, then p(w₁) = 1. The set of 'best' worlds is the set of worlds such that there is no world that is better than them; i.e., best = {w : ¬∃w′ : w′ ≺ w}. For simplicity, here we are assuming that deontic modals satisfy the so-called limit assumption (Lewis 1973, Stalnaker 1984). The limit assumption says that we can single out a set of 'best' worlds starting from any world in the modal base. Formally (and following the formulation in Kaufmann & Kaufmann 2015): A pair of a modal base f and an ordering source g satisfies the Limit Assumption iff for all possible worlds w, for all v ∈ f(w) there is a u ∈ best_{g(w)}(f(w)) such that u ⪯_{g(w)} v, where best_{g(w)}(f(w)) is the subset of f(w) such that for all its elements there isn't a strictly g(w)-better world in f(w).

Using 'f' and 'g' to pick out respectively the modal base and the ordering source, an analysis of possible and certain, under this approach, is in (10) and (11). 6

(10) possible

As Kratzer emphasizes, her view is well-equipped to handle graded modal expressions, i.e. modal expressions with intermediate force between possibility and necessity. likely, in particular, is treated as follows: likely φ is true iff, for every non-φ-world in the modal base, there is a φ-world in the modal base that is more highly ranked. Formally:

The analysis naturally generalizes to modal expressions in other syntactic categories, as well as expressions of different strength, modulo changes in quantificational force. Some examples of epistemically modalized expressions covered by Kratzer's analysis are modal auxiliaries like might, should, and must and adverbs like possibly, probably, and certainly.
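To make the ordering-based picture concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Kratzer's own formulation: worlds are valuations of two hypothetical atomic sentences (rain, wind), the modal base and ordering source are plain lists of propositions rather than functions from worlds, and the entries for possible and certain are assumed to be the usual existential and universal quantification over the best worlds. Only the clause for likely follows the prose statement above.

```python
from itertools import product

# Assumption: worlds are pairs (rain, wind) of truth values.
WORLDS = list(product([True, False], repeat=2))


def rain(w):
    return w[0]


def wind(w):
    return w[1]


def at_least_as_good(w1, w2, ordering_source):
    """w1 verifies every ordering-source proposition that w2 verifies."""
    return all(prop(w1) for prop in ordering_source if prop(w2))


def strictly_better(w1, w2, ordering_source):
    return (at_least_as_good(w1, w2, ordering_source)
            and not at_least_as_good(w2, w1, ordering_source))


def domain_of(modal_base):
    """Worlds that satisfy all the propositions in the modal base."""
    return [w for w in WORLDS if all(prop(w) for prop in modal_base)]


def best(domain, ordering_source):
    """Worlds in the domain with no strictly better world in the domain."""
    return [w for w in domain
            if not any(strictly_better(v, w, ordering_source) for v in domain)]


def possible(phi, modal_base, ordering_source):
    # Assumed entry: some best world verifies phi.
    return any(phi(w) for w in best(domain_of(modal_base), ordering_source))


def certain(phi, modal_base, ordering_source):
    # Assumed entry: every best world verifies phi.
    return all(phi(w) for w in best(domain_of(modal_base), ordering_source))


def likely(phi, modal_base, ordering_source):
    """Every non-phi world in the domain is outranked by some phi world."""
    domain = domain_of(modal_base)
    return all(
        any(strictly_better(v, w, ordering_source) for v in domain if phi(v))
        for w in domain if not phi(w)
    )
```

With an empty modal base and the single stereotypical assumption `rain` as ordering source, the best worlds are the two rain worlds, `likely(rain, [], [rain])` holds, and `likely(wind, [], [rain])` fails because no wind world outranks the rain-without-wind world.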

Scalar implicatures
Before discussing distributive and free choice inferences, it is helpful to introduce scalar implicatures in general. Grice (1975) first singled out implicatures as a category of interpretive effects going beyond the basic literal meaning of a sentence. Scalar implicatures are a subkind of implicatures. Roughly, scalar implicatures involve the denial of a logically stronger alternative to a sentence. Here is a classical example:
(13) a. We will hire Mary or Sue.
b. We won't hire both Mary and Sue
Grice treated all implicatures as purely pragmatic in nature. On his view, implicatures are produced via general principles of reasoning, which take as input the final result of the compositional computation of semantic value. Much recent work has questioned this account. On the alternative view, scalar implicatures are generated by semantic processes, i.e. processes that are implemented compositionally. The debate is far from being settled and we will not enter this discussion here. As we point out below, however, we need a view that allows scalar implicatures to arise from embedded positions. More precisely, we need a theory that allows for 'intermediate' scalar implicatures, in the sense of Sauerland (2012, 2014). A natural way to do this is to adopt a semantic view of scalar implicatures (Fox 2007, Chierchia, Fox & Spector 2012, Chierchia 2013), though any theory that allows for intermediate scalar implicatures would work for our purposes.

Chierchia, Fox & Spector (2012) suggest that scalar implicatures are generated by a silent operator, which they represent as 'exh' (for 'exhaustification'). exh is roughly akin in meaning to natural language only. Like only, exh combines with a sentence φ and it returns the proposition resulting from conjoining the meaning of φ with the negation of some of the alternatives of φ, ALT(φ): the ones we call excludable alternatives. Schematically:

We take an alternative to be excludable just in case (a) negating it doesn't contradict the literal meaning of the sentence asserted, and (b) negating it doesn't force us to accept other alternatives on the list (Sauerland 2004, Fox 2007; see also Gazdar 1979). Here is the formal definition. 7

The idea behind the functioning of exh is this: we try to strengthen the sentence as much as possible, while at the same time avoiding both contradictions and arbitrary choices. For an example, consider the sentence in (13a), which has the alternatives in (16).
We will hire Mary or Sue      M ∨ S
We will hire Mary             M
We will hire Sue              S
We will hire Mary and Sue     M ∧ S

Of these alternatives, only the conjunctive one (M ∧ S) is excludable. As for the other two: if we excluded both, we would get a contradictory meaning (since, combining the implicatures with the assertion, we would have that one of Mary and Sue will be hired, but not Mary, and not Sue); if we excluded one, we would have to arbitrarily select one of the two disjuncts as the true one (e.g., if we ruled We will hire Mary as false, given the content of (13a) we would have to conclude that We will hire Sue is true). Hence only the conjunctive alternative is ruled out, and the strengthened meaning of the sentence is: We will hire Mary or Sue and it's not true that we will hire both.
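The reasoning about (13a) can be sketched computationally. The following Python code is an illustration only: it assumes a Fox (2007)-style notion of innocent exclusion (an alternative is excludable iff it belongs to every maximal set of alternatives whose joint negation is consistent with the assertion) as a stand-in for the formal definition, and it models propositions as sets of worlds. The names prop, consistent_exclusion, innocently_excludable and exh are hypothetical.

```python
from itertools import combinations, product

# Assumption: a world records who gets hired, as a pair (mary, sue).
WORLDS = frozenset(product([True, False], repeat=2))


def prop(test):
    """Model a proposition as the set of worlds where it holds."""
    return frozenset(w for w in WORLDS if test(w))


M_OR_S = prop(lambda w: w[0] or w[1])
M = prop(lambda w: w[0])
S = prop(lambda w: w[1])
M_AND_S = prop(lambda w: w[0] and w[1])


def consistent_exclusion(prejacent, excluded):
    """True iff the prejacent can hold while every excluded proposition fails."""
    remaining = set(prejacent)
    for alt in excluded:
        remaining -= alt
    return bool(remaining)


def innocently_excludable(prejacent, alternatives):
    """Alternatives contained in every maximal consistently excludable set."""
    candidates = [a for a in alternatives if a != prejacent]
    consistent = [frozenset(c)
                  for r in range(len(candidates) + 1)
                  for c in combinations(candidates, r)
                  if consistent_exclusion(prejacent, c)]
    maximal = [c for c in consistent
               if not any(c < other for other in consistent)]
    return frozenset.intersection(*maximal)


def exh(prejacent, alternatives):
    """Prejacent conjoined with the negation of each excludable alternative."""
    result = set(prejacent)
    for alt in innocently_excludable(prejacent, alternatives):
        result -= alt
    return frozenset(result)


ALTERNATIVES = [M_OR_S, M, S, M_AND_S]
strengthened = exh(M_OR_S, ALTERNATIVES)
```

Here the two maximal excludable sets are {M, M ∧ S} and {S, M ∧ S}, so only M ∧ S is excluded, and `strengthened` contains exactly the two worlds in which one candidate is hired and the other is not.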
The kind of strengthening induced by exh is generally called exhaustification, and sentences that have gained a stronger meaning in this way are called exhaustified. So far, we haven't said what sentences enter the set of alternatives used to compute exhaustified meanings. This is a controversial issue in the literature, and one that cross-cuts the line between pragmatic accounts and semantic ones. This issue doesn't matter much for our purposes. For concreteness, we assume the complexity-based account given by Katzir 2007 and Fox & Katzir 2011. In outline, on this account the alternatives to S are those sentences that are no more complex than S, and that can be obtained from S by replacing S's constituents with its subconstituents or relevant items from the lexicon.
3 Deriving distributive inferences (but not free choice ones)

Distributive inferences with certain and likely
We can now go back to sentences like (18a) and (18b) and show how we can derive the distributive inferences in (19a) and (19b) as straightforward scalar implicatures.
(18) a. It is certain that we will hire Mary or Sue. b. It is likely that we will hire Mary or Sue.

(19) a. It is possible that we will hire Mary
b. It is possible that we will hire Sue

Consider first (18a); its alternatives are in (20). All of the alternatives in (20) (aside from the one corresponding to the assertion) are excludable: they are all stronger than the basic meaning of (18a), and there is no alternative such that negating it compels us to accept another alternative.

(20)
It is certain that we will hire Mary or Sue (M ∨ S)
It is certain that we will hire Mary (M)
It is certain that we will hire Sue (S)
It is certain that we will hire Mary and Sue

By conjoining the basic meaning with the negated excludable alternatives, we obtain the distributive inferences that we are looking for, since (21) entails (22). (Here and in what follows, to avoid clutter we ignore the conjunctive alternative, which is superfluous since its negation is entailed by the negation of the other two alternatives.) 8

This can easily extend to the case of likely in (18b), the alternatives of which are in (23). To our knowledge, the scalar inferences triggered by likely have not been discussed in the literature. But, as the reader can check, by adopting the meaning in (12) and conjoining the sentence with negated excludable alternatives, as in (24), we predict the distributive effect (since (24) entails (25)). 9

(23)
It is likely that we will hire Mary or Sue (M ∨ S)
It is likely that we will hire Mary (M)
It is likely that we will hire Sue (S)
It is likely that we will hire Mary and Sue

In sum: we can derive the distributive inferences triggered by certain and likely in a straightforward way by adopting a standard account of scalar implicature. 10

Before moving on, let us notice a potential problem concerning the effects triggered by likely. The account predicts that, whenever likely (φ ∨ ψ) triggers possibility inferences, it will also trigger the inference that likely φ and likely ψ are false. In fact, the derivation of the former inferences essentially relies on the latter. This may be problematic. Several reviewers have pointed out to us that they can distinguish an intermediate reading of likely (φ ∨ ψ) that triggers the relevant possibility inferences, but not the inferences that likely φ and likely ψ are false. Here is an example. 11 Suppose that I have removed some cards from a standard 52-card deck, and I ask you to draw a card at random. I add:

(26) It is likely that you will draw a spade or a number.

(26) clearly triggers the inference in (27):

8 As mentioned above, this account of distributive inferences is generally applied also to the parallel inferences of disjunction embedded in the scope of a universal quantifier like (i).
(i) Every student took Syntax or Logic
a. some student took Syntax
b. some student took Logic
Again, the inferences in (ia) and (ib) can be obtained by negating the alternative corresponding to each disjunct embedded under the universal quantifier: every student took Syntax and every student took Logic. Crnič, Chemla & Fox 2015, however, show that this derivation is problematic. In particular, they show that a sentence like (i) can be interpreted as giving rise to the distributive inferences even in contexts in which one of the two alternatives mentioned above is actually true (e.g., a context in which every student took Syntax). They then propose a different way of deriving distributive inferences, involving two exhaustifications and a stipulation about alternatives. Crucially, as they notice, the problem they identify does not extend to the case of modals.
9 We are using here to represent the meaning of likely in (12).
10 Notice that a scalar implicature account of these inferences can also naturally account for why they tend to be absent in downward entailing contexts, given that this is a general property of scalar implicatures. One way to capture this property is assuming that the distribution of exh obeys an economy condition along the lines of (i): that is, it is not added if it leads to an overall weaker meaning (this condition is a version of the Strongest Meaning Hypothesis; see Chierchia, Fox & Spector 2012 among others for discussion).
(i) Do not weaken!: Do not insert exh in S if the overall resulting meaning is weaker than S.
11 This example is a refinement of one suggested by an anonymous referee.

(27) a. It is possible that you will draw a spade
b. It is possible that you will draw a number

While the judgment is subtle, (26) does not seem to trigger those in (28):

(28) a. It is not likely that you will draw a spade
b. It is not likely that you will draw a number

If this judgment is correct, this is an obvious problem for either standard theories of likely, or standard theories of distributive implicatures. One of the advantages we are going to claim for our proposal is that it improves on this prediction. 12
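The derivation of the distributive inferences for certain earlier in this section can be checked mechanically. The Python sketch below is an illustration under simplifying assumptions: an epistemic state is a non-empty set of worlds, certain is universal and possible existential quantification over that set (ordering sources are ignored), and the strengthened meaning is the assertion conjoined with the negation of the two single-disjunct alternatives. The names are hypothetical.

```python
from itertools import combinations, product

# Assumption: a world records who gets hired, as a pair (mary, sue).
WORLDS = list(product([True, False], repeat=2))
STATES = [frozenset(c) for r in range(1, len(WORLDS) + 1)
          for c in combinations(WORLDS, r)]


def certain(state, prop):
    """Universal quantification over the epistemic state."""
    return all(prop(w) for w in state)


def possible(state, prop):
    """Existential quantification over the epistemic state."""
    return any(prop(w) for w in state)


def mary(w):
    return w[0]


def sue(w):
    return w[1]


def mary_or_sue(w):
    return w[0] or w[1]


def strengthened(state):
    """certain(M or S) and not certain(M) and not certain(S)."""
    return (certain(state, mary_or_sue)
            and not certain(state, mary)
            and not certain(state, sue))


# Every state verifying the strengthened meaning makes both disjuncts possible.
verifying = [s for s in STATES if strengthened(s)]
distributive = all(possible(s, mary) and possible(s, sue) for s in verifying)
```

The check succeeds because a state that verifies the disjunction throughout but contains a world where Mary is not hired must contain a world where Sue is, and vice versa.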

The trouble with possible
Given that the account of distributive inferences sketched above covers both certain and likely, one might expect that it also covers the free choice inferences generated under possible. But this is not so if we stick to the classical semantics for possible. Consider the counterpart of (18a) and (18b) involving possible. This sentence triggers the usual inferences in (29a) and (29b).

(29) It is possible that we will hire Mary or Sue.
a. It is possible that we will hire Mary
b. It is possible that we will hire Sue

Suppose we try to derive the free choice effect as a kind of implicature in the way we did above. Our alternatives are:

It is possible that we will hire Mary or Sue      ♦(M ∨ S)
It is possible that we will hire Mary             ♦(M)
It is possible that we will hire Sue              ♦(S)
It is possible that we will hire Mary and Sue     ♦(M ∧ S)

Here we get stuck. Differently from previous cases, the two single-disjunct alternatives, ♦(M) and ♦(S), are not excludable, hence we cannot strengthen the assertion with their denial. If we deny ♦(M), given the content of the assertion (i.e., ♦(M ∨ S)), ♦(S) must be true. Vice versa if we deny ♦(S). So the derivation of (29a) and (29b) is blocked. 13

12 A referee and the editor point out an additional issue, distinct from the availability of what we've called 'intermediate reading'. They suggest that a sentence corresponding to the strengthened meaning sounds quasi-contradictory. Compare (i) to (ii), which involves certain and is unproblematic.
(i) ?It's likely that we will hire Mary or Sue and it's not likely that we will hire Mary and it's not likely that we will hire Sue.
(ii) It's certain that we will hire Mary or Sue and it's not certain that we will hire Mary and it's not certain that we will hire Sue.
We think the source of this contrast lies in the fact that likely, but not certain, is a neg-raising predicate (see Horn 1978, Gajewski 2007, Romoli 2013). This means that (i) is typically interpreted as in (iii). While (iii) is consistent, it is intuitively not easy to think of a situation that would make it true and this could be the reason why the sentence sounds slightly deviant.
(iii) ?It's likely that we will hire Mary or Sue and it's likely that we will not hire Mary and it's likely that we will not hire Sue.

4 Overview of the proposal

One might wonder whether there is something wrong with our assumptions. Throughout §3, we presented a general kind of scalar reasoning that allowed us to derive distributivity implicatures under certain and likely. Why should that reasoning fail just for possible?
Why, indeed? In the next sections, we show that the simple scalar reasoning that we use to derive distributive inferences also gets us free choice, provided that we tweak the meaning of possible. On a standard, Kratzer-style semantics, possible is an existential quantifier. Among other things, this means that possible works as a scalar endpoint: possible is the weakest quantifier over epistemically possible worlds. As a result, when we deny possible φ we obtain a very strong claim, i.e. the claim that there are no φ-worlds. We suggest remedying the problem by introducing more structure into the semantics of possible. In particular, a degree-based treatment makes a difference to the alternatives generated by possible-sentences.
13 To be sure, we still derive the implicature that it is not possible that we hire both Mary and Sue, via the conjunctive alternative. But this is not enough to get the free choice effect: from ¬♦(M ∧ S) it doesn't follow that ♦(M) or ♦(S).

The next sections develop the proposal in detail, but here we give an overview. Following standard theories of gradable adjectives, we assume that likely, certain and possible work as measure functions. In particular, they all map propositions to sets of degrees of probability. For example, this is the schematic meaning for likely φ:

In addition, we are going to assume that gradable adjectives in the positive form combine with a covert morpheme, dubbed 'pos'. Roughly, pos sets a standard on the scale that the adjective operates on. For the case of likely, this standard is merely a degree of probability that is salient in the context. Hence It is likely that φ is true just in case the probability of φ is greater than the contextual standard for likelihood (which we denote as 's_likely').

(32) ⟦It is pos likely that φ⟧ = 1 iff Pr(⟦φ⟧) > s_likely

For the case of possible, we assume that the standard is the minimum of the scale. As a result, It is possible that φ is true just in case there is some non-zero degree of probability such that the probability of φ is greater than that. 14

(33) It is pos possible that

Our key observation is that we can derive possibility implicatures for all relevant cases by giving the exhaustivity operator intermediate scope between pos and the adjective. I.e., adopting a degree semantics for modal adjectives, the following configuration yields free choice:

For a schematic example, consider:

Informally, here is how we derive free choice from (35): if the probability of a disjunction is at least some degree d, and the probability of each disjunct is lower than d, then each disjunct must have positive probability. Given the meaning we're assuming for possible, this is equivalent to the claim that each disjunct is possible. 15

The next three sections develop and defend the proposal in detail.
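The informal argument rests on a simple fact about probability: since Pr(p ∨ q) ≤ Pr(p) + Pr(q), if Pr(p ∨ q) ≥ d and Pr(q) < d, then Pr(p) ≥ Pr(p ∨ q) − Pr(q) > 0, and symmetrically for q. The Python sketch below checks this exhaustively on a coarse grid. It is a sanity check under stated assumptions, not a proof: probabilities and degrees are restricted to multiples of 1/4, worlds are valuations of p and q, possible is assumed to hold iff the probability exceeds 0, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: four worlds fix the truth values of p and q; weights and
# degrees come from a coarse grid so that the search is finite and exact.
WORLDS = list(product([True, False], repeat=2))
GRID = [Fraction(k, 4) for k in range(5)]


def distributions():
    """All assignments of grid weights to worlds that sum to 1."""
    for weights in product(GRID, repeat=len(WORLDS)):
        if sum(weights) == 1:
            yield dict(zip(WORLDS, weights))


def pr(dist, prop):
    return sum(weight for world, weight in dist.items() if prop(world))


def p(w):
    return w[0]


def q(w):
    return w[1]


def p_or_q(w):
    return w[0] or w[1]


def pos_possible(dist, prop):
    """Assumed entry: the probability exceeds the scale minimum 0."""
    return pr(dist, prop) > 0


def exhaustified_degree_claim(dist, d):
    """Pr(p or q) >= d, with both disjunct alternatives excluded at d."""
    return pr(dist, p_or_q) >= d and pr(dist, p) < d and pr(dist, q) < d


def strengthened_possible(dist):
    """pos scoping over exh: some non-zero degree verifies the claim."""
    return any(exhaustified_degree_claim(dist, d) for d in GRID if d > 0)


def free_choice_holds():
    """Every distribution verifying the strengthened claim makes p, q possible."""
    return all(pos_possible(dist, p) and pos_possible(dist, q)
               for dist in distributions()
               if strengthened_possible(dist))
```

For instance, a distribution that puts weight 1/2 on the p-only world and 1/2 on the q-only world verifies the strengthened claim at d = 3/4, and both disjuncts have positive probability.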
The jar is almost full.
15 Note that this readily extends to disjunctions with more than two disjuncts, provided the assumption that each disjunct is an alternative to the whole disjunction.
and Pr(p ∨ q ∨ r) ≥ d and Pr(p) < d and Pr(q) < d and Pr(r) < d

The scale type associated with an adjective correlates with the availability or lack of availability of certain modifiers. For example, proportional modifiers like partially, half, or 60% may only combine with closed scale adjectives, as (39) and (40) show.

For a comprehensive survey of the relationship between modifiers and scale structure, we refer the reader to McNally 2005 and Kennedy 2007.

Semantics of gradable adjectives: Degrees and pos
We assume a semantics for gradable adjectives in the style of Cresswell 1976, von Stechow 1984, Heim 1985. On this semantics, gradable adjectives denote functions from degrees and individuals to truth values (type ⟨d, et⟩). 16 As an example, here is the lexical entry for tall. 17

Informally, tall maps an individual and a degree to true just in case the individual's degree of height is equal to or greater than that degree. Following the literature, we assume that the degree argument of the adjective is provided by a separate morpheme, which appears in a syntactic position labeled 'Deg(ree)P(hrase).' When the adjective appears in the positive form, we assume the presence of a covert morpheme 'pos' (for 'positive form'), which relates the degree argument of the adjective to a salient standard of comparison.
Following Heim 2000, we assume that all degree phrases, including pos, are of type ⟨dt, t⟩ and that syntactically they are generated as a sister of the adjective. This produces a type mismatch with adjectives (which are of type ⟨d, et⟩). As a result, DegP moves, leaving a trace of type d, and combines with the rest of the sentence after lambda abstraction. 18 For illustration: the LF of the simple sentence in (42a) after movement is in (42b).
As Heim points out, DegP-movement has to be severely constrained. In particular, it cannot outscope quantificational DPs and negation (Kennedy 1997, Heim 2000, Beck 2012, Romero 2015). To see this, notice that, if pos were allowed to scope over the quantifier, as in (44), we would predict that (43) has the reading in (45) (where s_tall represents the contextual standard for tallness).
(45) says that there is a degree of height d above the standard of tallness such that no student is d-tall. This requires merely that we can find a degree of height (say, 9 feet) such that no student is tall to that degree. Obviously (43) does not have this reading. From now on, we are going to assume that all syntactic constraints applying to gradable adjectives in general will carry over to the case of epistemic modal adjectives. We refer the reader to Heim's paper for extended discussion of these issues.
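The contrast between the attested reading and the unattested wide-scope reading can be illustrated with a small Python sketch. It rests on assumptions that go beyond the text: we take the relevant sentence to be of the form 'No student is tall', heights and degrees are integers in inches, the standard is 72 inches, and the student heights are invented for illustration. The function names are hypothetical.

```python
def is_d_tall(height, d):
    """tall(d)(x): x's height is equal to or greater than degree d."""
    return height >= d


def surface_scope(heights, standard, degrees):
    """No student is d-tall for any degree d above the standard."""
    return not any(is_d_tall(h, d)
                   for h in heights
                   for d in degrees if d > standard)


def pos_wide_scope(heights, standard, degrees):
    """Some degree d above the standard is such that no student is d-tall."""
    return any(not any(is_d_tall(h, d) for h in heights)
               for d in degrees if d > standard)


# Hypothetical data: two of the three students exceed the 72-inch standard.
heights_in = [66, 77, 82]
standard_in = 72
candidate_degrees_in = range(48, 120)  # includes 108 inches, i.e. 9 feet

surface = surface_scope(heights_in, standard_in, candidate_degrees_in)
wide = pos_wide_scope(heights_in, standard_in, candidate_degrees_in)
```

With these data `surface` is false, as intuition requires, while `wide` is true because no student reaches, say, 108 inches. This is why the wide-scope configuration yields truth conditions that are far too weak.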
Let us say more about pos. Following Kennedy & McNally 2005, Kennedy 2007 (see also Bochnak 2015), we assume that the denotation of pos involves a contextually provided function R, which restricts the possible denotations of d depending on the clause that pos takes as one of its arguments.
As an example, the truth conditions for a sentence like Mary is tall are in (47), where 's_tall' indicates the standard of comparison for tall in the context.
18 For an alternative perspective on degree phrases that doesn't use movement, see Alrenga & Kennedy 2014. We think our account could be made compatible with this alternative treatment, though we don't develop this variant here.

The standard of comparison is fixed in different ways for different adjectives. Adjectives that exploit a totally open scale, like tall, are evaluated relative to a contextually given threshold (and are hence dubbed 'relative standard adjectives'). Adjectives that exploit closed scales are evaluated relative to the endpoints, and are hence dubbed 'minimum standard' or 'maximum standard' adjectives. Overall, the correlations between scale structure and adjective type are mapped in the following diagram:

Notice that, for the case of adjectives exploiting totally closed scales, some adjectives are ambiguous between a minimum and a maximum standard reading, while some others invariably pick out one of the endpoints of the scale. Discussing the correlation between scale structure and adjective type would take us too far from our main target. So we are simply going to assume that R takes a degree and an adjective and restricts the possible denotations of d on the adjective scale in the following ways: R applied to a degree d and a ⟨d, t⟩ property involving possible makes sure that d is higher than the bottom of the scale (d > 0); in the case of certain, d is restricted to be the maximum of the scale (d = 1); and with likely, d is restricted to be larger than a contextually salient degree (d > s_likely). 19 Presumably, these facts should be explained, but providing this explanation is orthogonal to issues specific to modal adjectives. 20

6 Degree semantics for epistemic modal adjectives

This section lays out our analysis of the epistemic modal adjectives likely, certain and possible. We start from a degree-based probabilistic semantics for likely (inspired by Yalcin 2010 and Lassiter 2011) and extend it, with some changes, to certain and possible. The idea of using a degree semantics for these modals is controversial, so we will spend substantial space defending our assumptions.

The arguments for a degree semantics
Existing arguments for a degree semantics for epistemic modal adjectives fall into two categories. 21 The first concerns the logical properties of these adjectives; the second concerns their ability to combine with scalar modifiers.
The item that has received greatest attention recently is likely. 22 Kratzer's semantics for likely in (12) predicts the validity of inference patterns that are obviously incorrect. For example, it predicts the validity of the following:
(48) φ is as likely as ¬φ
⊨ φ is as likely as ψ
19 … has to be explained via differences in the value of the R parameter. Crucially, this is not a problem arising specifically with modal adjectives. Rather, it extends to all adjectives that exploit the same scale (e.g. hot and warm) and is part of the general problem of correlating scale structures and adjective types.
20 So far as we know, the only attempt at providing a systematic account is due to Kennedy 2007, who (building also on Kennedy & McNally 2005) hypothesizes that the correlations we described are generated by a principle of 'Interpretive Economy', which roughly dictates using the endpoints of a scale whenever they are available. We should note that even this account cannot explain in full the range of variability observed for closed scale adjectives.
21 For relevant work, see Swanson 2006, Yalcin 2007, Lassiter 2010, Moss 2015b,a. See also Cariani, Santorio & Wellwood 2016 for an up-to-date summary of the arguments for a probabilistic semantics for likely. For concreteness, we are going to adopt as our benchmark the approach in Yalcin 2010.
22 Arguments that the logical features of likely demand a probabilistic semantics are put forward by Yalcin (2010) as well as Lassiter (2011), though the formal work that undergirds them precedes them (see in particular Halpern 1997, 2003).

This is incorrect, since it entails that any proposition that is as likely as its negation is at least as likely as any other proposition (and hence, that it has probability 1). Conversely, a probabilistic semantics invalidates the problematic pattern. 23
Arguments from compositional interactions focus on the fact that likely may combine with degree modifiers and may appear in comparatives.
(49) It is very likely that Mary will hand in her paper in time.
(50) It is more likely that Mary gets an A than that Sue gets an A.
In addition, likely also appears in combination with certain proportional modifiers, namely percentage modifiers of the form n%, as in (51):
(51) It is 20/50/80/100% likely that it will snow.
The presence of adjectival modifiers is the hallmark of degree semantics. In particular, the fact that likely can combine with proportional modifiers as in (51) is evidence that the relevant scale is closed on both ends. In sum, there appears to be a happy convergence between the two strands of evidence for a semantics for likely that exploits degrees of probability. A probability function maps each proposition in a given space to a real number lying in the closed interval [0,1]. Hence a probability function can be characterized as a measure function mapping propositions to a closed scale, exactly as the data about modifiers suggests. Thus the logical and the compositional properties of likely seem to be predicted, in one stroke, by a degree semantics based on probability.
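The point about the logical properties can be illustrated with a minimal Python sketch. It is an illustration under simplifying assumptions, not part of the formal proposal: the two-world model and the names `WORLDS`, `pr` and `at_least_as_likely` are hypothetical, and 'as likely as' is rendered as 'at least as probable as'. The sketch builds a probability measure over a finite set of worlds and exhibits a counterexample to the pattern in (48): φ is as likely as ¬φ, but φ is not as likely as a tautologous ψ.

```python
from fractions import Fraction

# Hypothetical toy model: two equiprobable worlds (a fair coin toss).
WORLDS = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

def pr(prop):
    """Probability of a proposition, modelled as a set of worlds."""
    return sum((WORLDS[w] for w in prop), Fraction(0))

def at_least_as_likely(p, q):
    """Probabilistic rendering of 'p is as likely as q' (an assumption)."""
    return pr(p) >= pr(q)

phi = frozenset({"heads"})
not_phi = frozenset(WORLDS) - phi
psi = frozenset(WORLDS)  # a tautology: true in every world

# Premise of (48): phi is as likely as its negation.
premise = at_least_as_likely(phi, not_phi)
# Conclusion of (48): phi is as likely as psi.
conclusion = at_least_as_likely(phi, psi)
```

In this model the premise holds while the conclusion fails, so the pattern is not valid on the probabilistic rendering.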
We should note that, even if we accept a degree analysis, the claim that likely invokes a probabilistic scale is not uncontroversial. In particular, Klecha (2014) has argued that likely is gradable but exploits an open scale. Given that Klecha's semantics still retains some relevant logical properties, 24 the choice between Klecha's semantics and a probabilistic one is irrelevant for us. So we leave an evaluation of Klecha's proposal to other work. For concreteness (and because we think it is ultimately empirically superior), in this paper we opt for a fully probabilistic semantics.
23 Holliday & Icard (2013) point out that there are semantics that are based on a qualitative ordering of worlds (different from Kratzer's) that manage to capture the logical desiderata laid out by Yalcin. But these qualitative semantics suffer from other logical problems (as pointed out by Lassiter 2014; see also the discussion in Cariani, Santorio & Wellwood 2016).
24 In particular, that semantics still vindicates the following principle: p is more likely than ⊥ (p ∨ q) is more likely than q (given that q p)

Semantics for likely
We assume that the denotation of likely is analogous to the denotation of standard gradable adjectives, modulo a different argument type (propositions rather than individuals), and a switch to a probability scale. 25 Following the semantics for probably in Yalcin 2010, we assume that semantic values are relativized to functions e from worlds to probability spaces (besides being relativized to worlds, as is customary). Probability spaces are pairs ⟨E, Pr⟩ of a set of possible worlds E and a probability measure Pr. Officially, Pr takes as input subsets of E, but for simplicity we will assume that Pr directly maps each world in E to a numerical value in the interval [0,1] in accordance with the standard constraints on probability distributions. 26 Below is our semantics for likely (type ⟨d, ⟨st, t⟩⟩).
(Here Pr_e(w) is the function Pr in e(w).) We will use the simplified entry in (53).
25 A semantics that appeals to a notion of probability needs to explain how that notion is interpreted. (For an overview of the options, see Hájek 2012). For current purposes, we simply assume that this is a notion of subjective probability capturing the credences of the speaker. This assumption is simplistic and runs into a series of problems (including the arguments from retractions and disagreements put forward by MacFarlane 2011, and the puzzles generated by so-called epistemic contradictions discussed by Yalcin 2007). But the point is orthogonal to our main concerns in this paper.
26 Specifically, the constraints we need to impose are: (a) Pr(E) = 1; (b) if p and q are disjoint sets of worlds, Pr(p ∪ q) = Pr(p) + Pr(q). For simplicity, we also assume with Yalcin that the space of all possible worlds W is finite.
Differently from existing analyses of likely, we assume that the meaning of likely does not involve the specification of a threshold on the scale. (This would make it difficult to account for the compositional interactions of likely.) Rather, we assume that, on a par with other gradable adjectives, the standard of comparison used by likely is fixed by a DegP item, and in particular by pos for the positive form. So the LF of (say) (54) is in (55), with pos moving out for the usual type-mismatch reasons.
(54) It is likely that we will hire Mary.
As explained, we assume that the value s_likely is simply a degree of probability that stands out in the context. 27 The truth conditions resulting from these assumptions appear adequate: a sentence like (54) is true iff the probability that we will hire Mary is higher than the contextual threshold for likely.
In the next subsection, we show how this analysis of likely can straightforwardly be extended to certain.

Certain
It is uncontroversial that certain is gradable: it may appear in comparative constructions, as in (57), and compose with degree modifiers, as in (58).
(57) It is more certain that we will hire Mary than that we will hire Sue.
(58) It is pretty certain that we will hire Mary.
We suggest that certain and likely work on scales that are overlapping, if not identical. Here are three pieces of evidence for this claim. First, It is certain that p works as a congruent answer to a question asking about the probability of p. 28
(59) a. A: How likely is it that we will hire Mary?
b. B: It is certain that we will hire Mary.
Second, certain entails likely.
(60) It is certain that we will hire Mary
⊨ It is likely that we will hire Mary.
27 This choice departs from several existing semantics for likely (e.g., Yalcin's official semantics in his 2010), which set this threshold at .5. But there is evidence that the threshold can be lower or higher than this, depending on context, as discussed by Yalcin (2010) himself.
28 We learned of examples of this kind from Klecha 2014. (Klecha uses a variant of (59) to argue that likely and must are not scalemates.)

Third, modified instances of likely may entail modified instances of certain. For example, extremely likely seems to entail almost certain, as witnessed by the awkwardness of (62).
(61) It is extremely likely that we will hire Mary
⊨ It is almost certain that we will hire Mary.
(62) ?? It is extremely likely, but not almost certain, that we will hire Mary.
These data motivate a degree semantics for certain that uses a scale that overlaps with that of likely, and hence a probabilistic scale. 29
Extending our probabilistic semantics to certain is easy. The entry (in (63), simplified version in (64)) is analogous to that of likely. The only difference is that the meaning of certain dictates that the maximum degree on the scale 'stands out,' in Kennedy's sense.
The resulting semantics yields adequate truth conditions for sentences involving certain: such a sentence is true iff the probability of the prejacent of certain is 1. That is, (65) is true iff the probability that we will hire Mary is 1 (a very similar analysis is defended in Lassiter 2010, 2016 among others).
(65) It is certain that we will hire Mary.

Possible
This section lays out our analysis of possible. From a formal point of view, this analysis is unproblematic and closely connected to our analysis of likely. But the analysis is empirically controversial; in particular, Klecha 2014 has argued extensively that possible is not gradable and that hence it doesn't have a degree semantics. Here we take up the empirical challenge. We are going to agree with Klecha that the evidence presented so far in the literature is unsatisfactory, but we are going to produce new data in favor of gradability from both English and Italian. We will conclude that a gradable analysis is not problem-free, but it is our best bet.

Skepticism about gradability
We start from Lassiter's (2010, 2011, 2016) defense of the idea that possible is gradable. Lassiter treats possible as a minimum standard adjective exploiting a closed scale. This places possible in a category of items that includes adjectives like acquainted, protected, and documented (Kennedy & McNally 2005); though, interestingly, Lassiter doesn't draw a direct comparison with adjectives in this group. The data Lassiter uses to support this claim include the following:
In fact, it is more possible that tomorrow is the zombie apocalypse than people magically floating away into the clouds. (Lassiter 2016; web data)
Lassiter's empirical claims have been criticized from a number of directions. A first, flat-footed challenge simply concerns sentences like (67)-(69). A number of informants judge that these sentences are not fully felicitous. In addition, Klecha (2014) provides both corpus and experimental evidence against Lassiter's gradability claim. The experimental evidence is particularly significant for us: Klecha tests the acceptability of possible in combination with four modifiers: very, pretty, too, and the comparative form (more than). The finding is that modification negatively affects acceptability, and crucially does so for possible more than for likely.
We agree that these are substantial challenges. But we think that an investigation of other modifiers provides evidence that possible is gradable after all.

Evidence for the gradability of possible
To our knowledge, the debate about possible has focused on a relatively restricted set of modifiers: very, slightly, comparatives like more than, and proportional modifiers like 60%. But this choice seems arbitrary. We have independent knowledge that some of these modifiers do not combine with closed scale adjectives. Conversely, some modifiers that combine specifically with closed scale adjectives have so far been ignored. Kennedy & McNally (2005) discuss a class of minimum standard adjectives that exploit a closed scale. This class includes the adjectives aware, able, acquainted, documented, understood, publicized. Kennedy and McNally notice that the default intensifier for these adjectives is well. Conversely, most of them may not combine with slightly and very, as the following sentences show.
(71) ??John is slightly/very able to do his homework.
(72) ??The election is slightly/very documented/publicized.
On a probabilistic analysis, possible turns out to be exactly a minimum standard adjective operating on a closed scale. As a result, we expect it to pattern with (70)-(72). Hence it's not surprising that slightly possible and very possible are awkward. (Let us hasten to point out that the adjectives appearing in (70)-(72) do combine with proportional modifiers, hence if we had a full analogy we would expect, say, 60% possible to be felicitous. More on this shortly.) Conversely, we expect that possible combines with the kind of intensifiers that are acceptable with the adjectives in (70)-(72). In the next paragraphs, we suggest that this prediction is in part borne out, if we look at crosslinguistic data. In particular, we claim that: (i) possible combines with modifiers that are clearly scalar in meaning, and in fact select specifically for closed scales; (ii) the truth-conditional effects of possible-modification involve shifting a threshold along a scale that interacts with the scale used by likely. The emerging pattern of modification is still spotty, but on balance supports a degree analysis for possible.
The items we focus on are the Italian modifier ampiamente and the English modifiers well and very well. We argue that, on the most natural analysis, these modifiers work by shifting a degree parameter introduced by possible.
Italian ampiamente
Our first example is the Italian intensifier ampiamente (translatable as 'amply'). Ampiamente possibile is grammatical and intuitively conveys that the relevant event is both possible and also somewhat likely. In addition, there is truth-conditional evidence that modification with ampiamente has the effect of shifting a threshold on a degree scale that is at least overlapping with that used by probabile (which translates likely). In particular, the use of ampiamente possibile suggests that the degree of likelihood of a proposition is not low. This is shown by the contrast between (81) and (82). This contrast survives when possibile appears in combination with modifiers different from ampiamente. For example, the following case shows that del tutto possibile (which roughly translates entirely possible) is compatible with low probability of the prejacent.

Scenario: Giovanni is fond of buying lottery tickets, despite the fact that his chances of winning the lottery are very low. Maria is irritated by this behavior and comments: "What an idiot! It's obvious that he will never win".
In this scenario, (83) is a fine rebuttal to Maria, but (84) sounds contradictory. Given these data, the natural suggestion is that ampiamente has a scalar meaning, and that, on a par with modifiers like very, it works by shifting upwards a threshold on a degree scale.

Back to English: well, scarcely and impossible
Can we find a counterpart of Italian ampiamente in English? The natural candidate, especially given the pattern identified by Kennedy and McNally, is well itself. And indeed several of our informants (in particular, native speakers of British English from some areas of Northern England and Northern Ireland) find modifications of possible with well or very well, as in (85), grammatical:
(85) It is (very) well possible that it will rain.
Moreover, well and very well are very commonly used to modify epistemic might:
(86) It might well be that it rains.
(87) Mary might very well bring her girlfriend to the party.
There are two other suggestive pieces of evidence that English possible has a scalar semantics. The first is that possible combines, at least for some speakers, with another modifier that selects for fully closed scales, namely scarcely.
(88) It is scarcely possible that Rubio will win at this point in the race.

The licensing pattern of scarcely follows closely that of well, and of ampiamente in Italian. On the one hand, scarcely is infelicitous with all adjectives that don't employ a closed scale.
On the other, scarcely seems to be licensed with all the adjectives in the Kennedy and McNally closed scale list.
Therefore scarcely appears to be a genuine scalar minimizer which combines with possible. 31
Second, we observe that possible has a gradable antonym, which can be formed by in-prefixation, namely impossible. Impossible picks out a maximum standard on a scale, as the following data show:
(93) It is almost impossible that Rubio will win the nomination.
(94) It's completely impossible that we will hire a semanticist this year.
On the plausible assumption that in-prefixation triggers polarity reversal, this suggests that possible also has a scalar meaning.

Spotty gradability
In summary: possible may combine with genuine scalar modifiers and this modification has the truth-conditional effects we expect on a scalar semantics. At the same time, possible is somewhat awkward with a number of other modifiers, including comparatives and proportional modifiers. So there seems to be gradability, although in an uncharacteristically spotty way. We think that, in this predicament, the best solution is adopting a genuine degree semantics. A semantics that is not based on degrees has two major disadvantages. First, it must treat scalar modifiers like well and ampiamente as ambiguous. These modifiers would have a scalar meaning, which would be the one used in compounds with genuinely gradable adjectives like documented/documentato, and a nonscalar one, which would be the one used in combination with possible. This is obviously a stipulation. Second, it's just unclear how a non-degree semantics can account for the truth-conditional effects of modification by well and ampiamente. In particular, it's unclear how to account in any straightforward way for the fact that well possible φ and ampiamente possibile φ entail that φ is somewhat likely. Conversely, we think that a theory that treats possible as using degrees will have the resources to explain the spotty pattern we observe. Here we tentatively follow Lassiter 2016 in hypothesizing that there is a constraint dictating a preference for relative standard adjectives over minimum standard adjectives, whenever the two can be used to express the same meaning. On this view, more possible is blocked by the competition with more likely; similarly 40/50/60% possible would be blocked by 40/50/60% likely.
31 Although it should be noticed that scarcely can also apply to verbs, as in 'John scarcely ran.'
Finally, let us flag another route that deserves investigation. We may revise some of our background assumptions about scale structure. In particular, it might be that the degree scale exploited by possible has a different structure from the one we assume; for example, it might not be a total ordering. While this would be incompatible with a fully probabilistic semantics, it would be in principle compatible with our account of the scalar inferences triggered by possible. This said, for current purposes we set this hypothesis aside and proceed with a probabilistic analysis.

Semantics for possible
We use the semantics in (95) (simplified version in (96)). In this analysis, possible, exactly like likely and certain, has type ⟨d, ⟨st, t⟩⟩. As usual, the only difference with respect to likely and certain is that possible makes salient the lowest degree on the probability scale. Composition with pos works in the usual way.

(96) $[\![\text{possible}]\!] = \lambda d.\, \lambda p.\, [Pr(p) \geq d]$
As a result, a sentence like (97) has the truth-conditions in (98): it is true just in case the probability of us hiring Mary is non-zero.
(97) It is possible that we will hire Mary.

Summary
We have argued that certain, likely and possible are gradable adjectives. We model their semantics on the semantics of, respectively, maximum, relative, and minimum standard adjectives. They all operate on a closed scale, which is a probability scale. We also assume that the positive form of likely, certain, and possible involves a covert morpheme pos, which sets the standard of comparison using either information contained in lexical entries or (when the latter is not available) contextual information.
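The division of labor between the adjectives and pos can be sketched in Python as follows. This is a rough illustration only: the function name `pos_true`, the default standard of 1/2 for likely, and the reduction of existential closure over degrees to a single comparison are assumptions of the sketch. Each adjective contributes the condition Pr(p) ≥ d, and the degree d is closed off subject to the restriction imposed by R (d > 0 for possible, d > s_likely for likely, d = 1 for certain).

```python
from fractions import Fraction

def pos_true(adjective, prob, s_likely=Fraction(1, 2)):
    """Truth value of 'It is pos-ADJ that p', given Pr(p) = prob.

    Assumption of this sketch: the adjective contributes Pr(p) >= d and
    the degree d is existentially closed under the restriction set by R.
    """
    if adjective == "possible":
        # Some d with 0 < d <= prob exists iff prob > 0.
        return prob > 0
    if adjective == "likely":
        # Some d with s_likely < d <= prob exists iff prob > s_likely.
        return prob > s_likely
    if adjective == "certain":
        # d is fixed to the scale maximum, so we need prob >= 1.
        return prob >= 1
    raise ValueError(f"unknown adjective: {adjective}")

# A small grid of probabilities, used to check the entailments
# certain => likely => possible on this rendering.
GRID = [Fraction(i, 10) for i in range(11)]
certain_entails_likely = all(
    pos_true("likely", p) for p in GRID if pos_true("certain", p)
)
likely_entails_possible = all(
    pos_true("possible", p) for p in GRID if pos_true("likely", p)
)
```

On this rendering the entailment from certain to likely noted in (60) falls out as long as the contextual standard for likely is below 1.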

A unified account of possibility implicatures
In this section, we first show how our account replicates the results of the standard analysis, straightforwardly predicting the distributive inferences triggered by likely and certain. We then show how, unlike the standard analysis, our account naturally generalizes to the free choice inferences triggered by possible.

Possibility implicatures with likely and certain
To start, consider a sentence with likely whose prejacent includes a disjunction like (99). Suppose that we exhaustify it as in (100), via an exhaustivity operator taking scope over the whole sentence.
(99) It's likely that we will hire Mary or Sue.
It is easy to see that all the alternatives are excludable (e.g., the negation of it's likely that we will hire Mary does not entail that it's likely that we will hire Sue). By adding their negation to the sentence, we get (102). 32
(102) It is likely that we will hire Mary or Sue and it is not likely that we will hire Mary and it is not likely that we will hire Sue.
Given our semantics for likely, (102) is equivalent to the schematic (103), which we can simplify as in (104).
From (104) we can infer that the probability that we will hire Mary is non-zero and the probability that we will hire Sue is non-zero.
(105) It is likely that we will hire Mary or Sue.
a. The probability that we will hire Mary is non-zero
b. The probability that we will hire Sue is non-zero
But now, recall our semantics for possible: possible φ is true iff there is a degree of probability d such that the probability of φ is greater than d. This means that (105a) and (105b) entail, respectively:
(106) a. It's possible that we will hire Mary
b. It's possible that we will hire Sue
Hence we predict that likely (φ ∨ ψ) generates standard possibility implicatures.
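The step from the exhaustified meaning to the non-zero probability of each disjunct can be spelled out as a short worked derivation. This is a sketch under stated assumptions: $s$ abbreviates the contextual threshold for likely, $M$ and $S$ stand for the propositions that we will hire Mary and that we will hire Sue, the exhaustified meaning of (102) is rendered as the three conditions on the first line, and the only property of $Pr$ used is finite additivity, which yields $Pr(M \lor S) \le Pr(M) + Pr(S)$.

```latex
\begin{align*}
& Pr(M \lor S) > s, \quad Pr(M) \le s, \quad Pr(S) \le s \\
\Rightarrow\ & Pr(S) \;\ge\; Pr(M \lor S) - Pr(M) \;>\; s - s \;=\; 0 \\
\Rightarrow\ & Pr(M) \;\ge\; Pr(M \lor S) - Pr(S) \;>\; s - s \;=\; 0
\end{align*}
```

The same reasoning goes through for any threshold, which is why the choice of $s$ does not matter for the possibility inferences.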
The derivation that we just illustrated can be replicated straightforwardly when we replace likely with other epistemic modal expressions. First, it can be replicated with certain. By following analogous steps, we derive (107), from which it follows again that the probability of each disjunct has to be non-zero, and therefore that both disjuncts are possible.
(107) $Pr(M \lor S) = 1 \land Pr(M) < 1 \land Pr(S) < 1$
32 We are omitting the negation of the conjunctive alternative which is entailed by the negation of the other two alternatives.

Second, it can be replicated when likely appears in combination with various modifiers, as in the following cases. 33
(108) It is very likely that we will hire Mary or Sue
It is 60% likely that we will hire Mary or Sue
It is 8% likely that we will hire Mary or Sue
It is .00000001% likely that we will hire Mary or Sue
. . .
In all these cases, the only difference is the probability threshold that is involved in the meaning of the relevant modal adjective. But this doesn't affect the computation of implicatures; hence, in all these cases, we are able to derive probabilistic inferences, exactly as we did for likely. We leave it to the reader to work out the details.

Giving exh intermediate scope
So far, we have shown how a degree semantics can mimic the predictions of a Kratzer-style approach. Let us now start showing how we get divergent predictions. Recall the judgment we elicited at the end of §3: (26) seems to trigger the inferences in (27), but not those in (28).
(26) It is likely that you will draw a spade or a number.
(27) a. It is possible that you will draw a spade
b. It is possible that you will draw a number
(28) a. It is not likely that you will draw a spade
b. It is not likely that you will draw a number
Yet a Kratzer-style semantics, together with the view of implicature we're adopting throughout the paper, is unable to predict this. Conversely, we can predict this pattern in the framework that we have developed. The key maneuver is letting exh take intermediate scope between pos and likely. Suppose that, rather than parsing (99) as (100), we parse it as follows, letting exh take scope below the λ-abstractor over degrees:

that we will hire Mary or Sue]]]]
33 Here we make the (very natural) assumption that the semantic contribution of the modifiers is to shift the threshold appearing in the entry of likely.

Let us sketch the derivation. First, the alternatives that exh uses are in (110) (ignoring 'it is' here). Notice that, since pos is not in the scope of exh, the alternatives involve a free variable d. 34
Once more, all alternatives are excludable. The outcome of the computation is (111), where d is a free variable at this stage of the derivation. Assuming again that the relevant contextual standard is .5, the resulting truth conditions are in (112).
Notice that the truth conditions in (112) are crucially weaker than the ones we got by letting exh take wide scope (in (104)). The latter truth conditions ruled out that the probabilities of the individual disjuncts could be higher than the contextual threshold. The truth conditions in (112), conversely, only …
34 We are assuming that the value of the free variable d is fixed by the context at the relevant stage of the computation and that, once fixed, it remains constant across alternatives. Notice that the assumption that exh takes scope over clauses involving free variables is independently needed to account for some cases of embedded implicature. For example, consider (i), on the reading in which it is equivalent to (ii).
(i) a. Every student did some of her homework.
b. Every student did some of her homework and no student did all of her homework.
To account for the implicatures of (109), we need to assume that exh takes scope below the DP every student (see Chemla …
Hence a degree semantics for likely manages to predict that likely (φ ∨ ψ) triggers possibility inferences without also triggering the inferences that likely φ and likely ψ are false. The key assumption that we used to get this result is that pos can outscope exh. This same assumption will do substantial work in the derivation of the possibility inferences of possible. 35
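The contrast between the two scope options can be checked on the card scenario in (26) with a small Python sketch. It is an illustration under assumptions that go beyond the text: a standard 52-card deck with a uniform draw, 'number' cards taken to be the ranks 2 to 10, a contextual standard of .5 for likely, and the entry for likely rendered as Pr(p) ≥ d. All names (`DECK`, `pr`, `wide`, `intermediate`) are hypothetical.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]

def pr(pred):
    """Probability that a uniformly drawn card satisfies pred."""
    return Fraction(sum(1 for card in DECK if pred(card)), len(DECK))

def is_spade(card):
    return card[1] == "spades"

def is_number(card):
    # Assumption: number cards are the ranks 2-10.
    return card[0].isdigit()

def spade_or_number(card):
    return is_spade(card) or is_number(card)

S_LIKELY = Fraction(1, 2)  # assumed contextual standard for 'likely'

p_s, p_n, p_or = pr(is_spade), pr(is_number), pr(spade_or_number)

# Wide scope (exh above pos): the sentence is likely, and each
# disjunct alternative 'likely(spade)', 'likely(number)' is negated.
wide = p_or > S_LIKELY and not (p_s > S_LIKELY) and not (p_n > S_LIKELY)

# Intermediate scope (pos above exh): there is a degree d > S_LIKELY with
# Pr(spade or number) >= d, Pr(spade) < d and Pr(number) < d.
# Such a d exists iff the disjunction exceeds the standard and each disjunct.
intermediate = p_or > S_LIKELY and p_or > max(p_s, p_n)
```

On these assumptions the wide-scope parse comes out false in the scenario, since drawing a number is itself likely, while the intermediate-scope parse comes out true, matching the judgment that (26) triggers (27) but not (28).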

Predicting free choice under possible
Let us start by showing that, by letting exh take wide scope with respect to pos, we run into the same problems as standard modal semantics. Consider (113), and assume that it is parsed as in (114)
35 Notice that this assumption is in line with the syntactic constraints in §5. We noticed that, while there are constraints on the movement of items in DegP position (which include pos), those constraints are limited to certain categories of items, like DPs and negation. Moreover, even this relatively uncontroversial assumption will become unnecessary on slightly different accounts of local implicatures. For example, on a theory in the style of Chierchia 2004, on which exhaustified meanings are computed in parallel with basic meanings and carried along with them in the compositional computation, we don't need any syntactic assumptions at all.

It is easy to see that the alternatives involving the two disjuncts are not excludable. If we negate (say) it's pos possible that we'll hire Mary, we would get that the probability of us hiring Mary is zero, which, given the truth of the disjunctive sentence itself, in turn entails that it's pos possible that we'll hire Sue is true. As a result, no possibility inferences are computed if we parse the sentence as in (114). 36
36 Though the conjunctive alternative is innocently excludable, so (113), on the parse in (114), does give rise to a run-of-the-mill scalar implicature that we will not hire both Mary and Sue.
Now suppose that (113) is parsed as in (116) …
(119) says that the probability of the disjunction is greater than d, and that the probability of each of its disjuncts is less than or equal to d. It follows that the probability of each disjunct must be greater than zero. In other words, (119) entails (120):
(120) $Pr(M) > 0 \land Pr(S) > 0$
Given our semantics for possibility claims, the relevant possibility inferences follow from (120):
(121) a. It is possible that we will hire Mary
b. It is possible that we will hire Sue
Hence we predict free choice, using exactly the same mechanics we used to predict other possibility inferences.
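The entailment from the locally exhaustified meaning to the free choice inferences can be checked by brute force on a finite grid. The Python sketch below is an illustration only, with several assumptions of its own: probabilities and degrees are restricted to multiples of 1/6, the four cells of the partition are 'M and S', 'M only', 'S only' and 'neither', and the locally exhaustified meaning is rendered with the entry Pr(p) ≥ d, so that the negated alternatives read Pr(M) < d and Pr(S) < d. The entailment holds equally if the inequalities are read as 'greater than d' and 'less than or equal to d'. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 6  # grid resolution (an arbitrary choice for the sketch)
GRID = [Fraction(i, N) for i in range(N + 1)]

def distributions():
    """All grid distributions over (M and S, M only, S only, neither)."""
    for both, m_only, s_only in product(GRID, repeat=3):
        rest = 1 - both - m_only - s_only
        if rest >= 0:
            yield both, m_only, s_only, rest

def exh_local(d, both, m_only, s_only):
    """exh below pos: Pr(M or S) >= d while neither disjunct reaches d."""
    p_m, p_s = both + m_only, both + s_only
    p_or = both + m_only + s_only
    return p_or >= d and p_m < d and p_s < d

def free_choice_holds():
    """True iff every grid case verifying exh_local for some d > 0
    assigns non-zero probability to both disjuncts."""
    for both, m_only, s_only, _ in distributions():
        for d in GRID[1:]:  # R for 'possible' restricts d to d > 0
            if exh_local(d, both, m_only, s_only):
                if not (both + m_only > 0 and both + s_only > 0):
                    return False
    return True
```

A finite check of this kind is of course no substitute for the general argument, but it makes the role of additivity visible: a distribution that puts all the probability of the disjunction on one disjunct can never verify the locally exhaustified meaning.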
8 Extending the account

Extension to epistemic modals in other syntactic categories
Given the degree semantics adopted above, we can account for all the possibility inferences generated by epistemic modal adjectives (recapitulated below) via a uniform mechanism.
(122) a. It's possible that we will hire Mary or Sue.
b. It's likely that we will hire Mary or Sue.
c. It's certain that we will hire Mary or Sue.
(123) a. It's possible that we will hire Mary
b. It's possible that we will hire Sue
We have illustrated our analysis using the epistemic modal adjectives possible, likely and certain. But, in principle, what we say can be generalized to modals in other syntactic categories, including modal auxiliaries like might, should and must. For instance, we may adopt a semantics for might that is exactly analogous to that of possible, with a degree variable, which is then existentially quantified over.
(124) $[\![\text{might}_d\ \varphi]\!]^{w,e} = 1$ iff $Pr_{e(w)}(\{w' : [\![\varphi]\!]^{w',e} = 1\}) \geq d$
Whether we can run the same account of possibility implicatures will depend on the compositional details. In particular, it will depend on whether we can compositionally separate the degree element from the position in which existential closure would happen, and insert an exhaustivity operator in between. In part, this question depends on whether modal auxiliaries should be treated as gradable. Pursuing this question in full is beyond the scope of this paper. Let us just notice again that, at first sight, epistemic modal auxiliaries are perfectly felicitous when modified by well and very well.
(125) It might very well be that we will hire Sue.

Extension to deontic modals
So far, we have focused on free choice inferences arising from epistemic modals. But possibility inferences are generated also by other modals, in particular deontic modals; in fact, the problem of free choice was first pointed out just in connection with the latter (von Wright 1968, Kamp 1973). Below are some examples (see also (6) and (7) in §1).
(126) You are allowed to take Syntax or Logic.
a. You are allowed to take Syntax
b. You are allowed to take Logic
(127) You are required to take Syntax or Logic.
a. You are allowed to take Syntax
b. You are allowed to take Logic
The argument that we're running in this paper is, in part, an argument from generality: our account deserves consideration because it is able to predict the full array of data in a unified way. So it's important that we manage to extend the account to deontic modals as well. Can we do this? From a technical point of view, there is no difficulty. We can simply treat allowed, required, and the like as degree expressions, mapping propositions to a degree scale that has the same formal properties as a probability scale. This will allow us to derive probability inferences via exhaustification exactly as in the case of epistemic modals. At the same time, even granting that deontic modals have a degree semantics, it's unclear that the scale that they exploit has the right formal properties.
To illustrate the problem, take a naïve idea: deontic modal expressions map propositions to a scale of permissibility (set aside, for the moment, exactly what this notion of permissibility amounts to). At an intuitive level, it doesn't seem that the degree of permissibility of a disjunction like (126) is greater than or equal to the degree of permissibility of one of its disjuncts, like (126a) and (126b). 37 In fact, it's not even clear what could be meant by saying that a disjunction is 'more permissible' than its disjuncts. Yet this property is exactly what we need to derive possibility inferences via exhaustification. 38 In this section, we discuss the prospects for formulating a degree semantics of the right kind for deontic modals. Given that our main focus is elsewhere, our discussion will have two significant limits. First, we won't be able to present empirical evidence in support of a degree semantics for these modals. For evidence of this sort (including the observation that the deontic modal auxiliary should can appear in the comparative) see, among others, Portner & Rubinstein 2016. Second, since there is no ready-made semantics we can use, our proposal will be just an outline. Nevertheless, what we say should be enough to show that there are good prospects for generalizing our account to the deontic case.

Deontic modals as measures of expected utility?
One natural attempt at a degree semantics for deontic modals exploits a scale of expected utility. 39 Roughly, expected utility is a measure of how valuable a certain proposition is for an agent, given what else they know about the world. And in fact, an expected utility semantics has been proposed for deontic modals (see Goble 1996 and Lassiter 2011; see also Aloni 2005 for related work on imperatives). But this kind of semantics, at least without substantial tweaks, is a nonstarter for our purposes. Let us explain why.
The expected utility (henceforth, EU) of a proposition for an agent is defined on the basis of their credence function (represented, as usual, as an assignment of probabilities to propositions) and their utility function. Roughly, you can think of utility as a measure of how valuable a certain state of the world is for the agent. Since we won't be using EU in our positive proposal for deontic modals, we are not going to define EU explicitly here; rather, we refer the reader to any of the classical discussions or overviews. 40 Any mainstream way of defining EU will do for our purposes. What matters to us is that (as we're going to point out) EU accounts yield predictions that are problematic for a theory of the scalar features of modals.
38 In general, this property is a kind of additivity property. Probability measures satisfy a finite additivity property, but even a weaker additivity property will do; for discussion, see Holliday & Icard 2013.
39 In a sense, any semantics for deontic modals that exploits a notion of comparative closeness (starting from Lewis's semantics in Lewis 1973) counts as a degree semantics. But here we intend to pick out degree semantics that have a kind of quantitative structure, going beyond orderings.
40 For a classical account of expected utility theory, see Jeffrey 1983; for a more recent and very popular version of decision theory see Joyce 1999. See Briggs 2015 for a recent overview.
Paolo Santorio and Jacopo Romoli
Let us sketch two EU-based semantics for deontic modals. We are going to use required as our sample item. Since this example just serves the purpose of illustrating the problem for EU theories, we won't be concerned with the distinction between weak and strong necessity modals. 41 In its general form, an EU semantics for required analyzes required φ as saying that the EU of φ is higher than a certain threshold value. Formally (and introducing a utility function parameter u besides familiar ones):
(128) $[\![\text{required}\ \varphi]\!]^{w,e,u} = 1$ iff $EU_{e(w),u(w)}(\{w' : [\![\varphi]\!]^{w',e} = 1\}) > t_{ALT}$
$t_{ALT}$ is a 'threshold' value that is set in relation to a set of alternatives specifying mutually exclusive and jointly exhaustive courses of action. Different analyses diverge on what alternatives should be used. One simple option is that the only relevant alternative is the negation of the prejacent (as in the first account in Goble 1996; a similar idea, though not in the context of an EU semantics, is used by Jackson 1985, Jackson & Pargetter 1986). On this account, the schematic analysis in (128) becomes:
(129) $[\![\text{required}\ \varphi]\!]^{w,e,u} = 1$ iff $EU_{e(w),u(w)}(\{w' : [\![\varphi]\!]^{w',e} = 1\}) > EU_{e(w),u(w)}(\{w' : [\![\varphi]\!]^{w',e} = 0\})$
I.e., required φ is true iff the expected utility of φ is greater than the expected utility of ¬φ. A second option is to take the relevant alternatives to be a contextually supplied set of mutually exclusive and jointly exhaustive propositions, which specify alternative courses of action (as in the second proposal in Goble 1996, and in the vicinity of the proposal in Lassiter 2011). On this proposal, (128) is specified as:
(130) $[\![\text{required}\ \varphi]\!]^{w,e,u} = 1$ iff $\{w' : [\![\varphi]\!]^{w',e} = 1\} = \text{BEST}_{EU_{e(w),u(w)}}(ALT)$
(where $\text{BEST}_{EU_{e(w),u(w)}}(ALT)$ is the set of members of ALT that maximize expected utility, relative to e(w) and u(w))
Informally: required φ is true iff φ coincides with the union of the alternatives that maximize expected utility. The choice between different versions of (128) doesn't matter for our purposes, so we don't pursue the issue further. Expected utility accounts of deontic modality (henceforth, "EU accounts") have much to recommend them. But they also have drawbacks, some of which have been pointed out in existing literature (see, e.g., Cariani 2016a,b). For reasons of space, here we limit ourselves to pointing out two problems that relate closely to the scalar properties of modals.
41 As a reminder: it is widely acknowledged (see e.g. von Fintel & Iatridou 2008) that some deontic modals are stronger than others. For example, required and have to (which are so-called strong necessity modals) are stronger than ought (a so-called weak necessity modal). The following contrast illustrates the difference.
(i) You ought to wash the dishes. In fact you're required to do so/you have to.
(ii) #You're required to/you have to wash the dishes. In fact, you ought to do so.
The first problem is very simple and very general: EU accounts yield systematically wrong predictions when deontic modals are embedded in downward-entailing (DE) environments. (131) entails the claims in (131a) and (131b): 42
(131) Sally is not required to discuss her paper with John or Mary.
a. Sally is not required to discuss her paper with John.
b. Sally is not required to discuss her paper with Mary.
To better see the empirical point, notice that (132) sounds inconsistent.
(132) #Sally is not required to discuss her paper with John or Mary. But she is required to discuss it with Mary.
These facts are fully expected on classical analyses for required. These analyses are upward monotonic in the prejacent position, and hence validate the entailment required φ ⊨ required (φ or ψ). Since downward entailing contexts reverse the direction of entailment, these analyses predict that (131) entails (131a) and (131b). But all versions of (128) are nonmonotonic, and hence fail to validate the relevant entailment. As a result, EU accounts miss the entailment in (131), and predict that the discourse in (132) is consistent.
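The contrast between the two kinds of analysis can be illustrated with a small numerical sketch. The three-world model, its probabilities and its utilities are made up for illustration; `required_eu` follows the prose description of the negation-alternative version of the EU account (EU of φ greater than EU of ¬φ), with EU computed as a probability-weighted average of utilities, and `required_classical` is a bare-bones quantificational entry. None of this is meant as the official formulation of either account.

```python
from fractions import Fraction

# Hypothetical model: Sally discusses her paper with John, with Mary, or with neither.
WORLDS = ("john", "mary", "neither")
PR = {w: Fraction(1, 3) for w in WORLDS}  # assumed credences
UTIL = {"john": Fraction(-10), "mary": Fraction(10), "neither": Fraction(5)}  # assumed utilities


def expected_utility(prop):
    """Probability-weighted average utility of the worlds in a non-empty proposition."""
    mass = sum(PR[w] for w in prop)
    return sum(PR[w] * UTIL[w] for w in prop) / mass


def required_eu(prop):
    """EU-style entry: the prejacent beats its negation in expected utility."""
    complement = frozenset(WORLDS) - prop
    return expected_utility(prop) > expected_utility(complement)


def required_classical(prop, best=frozenset({"mary"})):
    """Quantificational entry: every best world is a prejacent world."""
    return best <= prop


mary = frozenset({"mary"})
john_or_mary = frozenset({"john", "mary"})

# EU entry: 'required Mary' holds but 'required (John or Mary)' fails,
# so 'not required (John or Mary), but required Mary' comes out consistent.
print(required_eu(mary), required_eu(john_or_mary))
# Classical entry: both hold, so the same discourse is contradictory.
print(required_classical(mary), required_classical(john_or_mary))
```

In this toy model the EU entry makes the discourse in (132) consistent, while the classical entry rules it out, which is the pattern described above.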
42 See also von Fintel 2012 for discussion and see Cariani 2013 for an alleged counterexample. While we feel the pull of Cariani's example, we hasten to point out that isolated counterexamples to the entailment we discuss can be easily accommodated via embedded implicatures. The key point is that EU accounts predict that entailments like that in (131) should fail systematically, since the semantics of required is nonmonotonic. We think that the overall data clearly speak against this prediction.
The second problem is less general, since it presupposes a Sauerland/Fox-style account of distributivity inferences, but still substantial. We saw that, on EU accounts, required φ is not in general stronger than required (φ or ψ). This is sometimes claimed to be an advantage of EU accounts (e.g., by Lassiter 2011). 43 But it creates problems for the computation of implicatures, since it blocks the derivation of the distributive inferences in (127a) and (127b).
(127) You are required to take Syntax or Logic.
a. You are allowed to take Syntax.
b. You are allowed to take Logic.
Recall: the inferences in (127) are derived by (i) assuming that (133a) and (133b) (below) express stronger propositions than (127), and (ii) assuming that implicatures are computed via exhaustification of stronger alternatives that are excludable.
(133) a. You are required to take Syntax. b. You are required to take Logic.
On EU accounts, the computation stops because step (i) is blocked. On the one hand, this seems a drawback of EU accounts. On the other, it makes these accounts look like nonstarters for our purposes. If it can't get us even standard distributivity inferences, a fortiori this analysis won't get us free choice.

A probabilistic semantics for deontic modals
We suggest a minimal variant of classical ordering semantics for deontic modals. (See Kratzer 1986, as well as Lewis 1973.) Recall a schematic Kratzer-style analysis of deontic required:
(134) $[\![\text{required}\ \varphi]\!]^{w,f,g} = 1$ iff $\forall w' \in best_{g(w)}(f(w))$, φ is true at $w'$
We suggest that we move to a degree semantics simply by replacing the quantificational element of (134) with a mapping to a probability scale. According to (134), required φ is true iff the set of best worlds entails φ. On our proposal, required φ is true iff the probability of φ, conditional on the world being one of the best worlds, is 1. 44 Here is a schematic entry (for simplicity, we incorporate the compositional contribution of pos directly in the truth conditions).
(135) $[\![\text{pos required}\ \varphi]\!]^{w,f,g} = 1$ iff $Pr(\{w' : [\![\varphi]\!]^{w',f,g} = 1\} \mid \{w' : w' \in best_{g(w)}(f(w))\}) = 1$
For current purposes, we leave open how the notion of probability in (135) should be interpreted. Plausibly, in some cases (broadly, subjective uses of deontic modals) it will represent the credences of the speaker or a group of agents, while in others (broadly, objective uses) it will capture an objective notion of probability (be it chance, evidential probability in the sense of Williamson 2000, or something else). 45 As usual, allowed can be defined as the dual of required: allowed φ is true just in case the probability of φ, conditional on the proposition including all best worlds in the modal base, is non-zero.
43 One of the reasons is that it blocks problematic-sounding inferences like the one from (i)-a to (i)-b, which are validated by quantificational semantics. (This is known in classical literature as 'Ross's puzzle,' after Ross 1944.)
(i) a. You are required to post this letter.
b. You are required to post this letter or burn it.
It's easy to check that this analysis renders modals monotonic on their prejacent, hence the two problems we pointed out in 8.2.1 are sidestepped.
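The monotonicity claim can be checked by brute force on a small model. The sketch below assumes a hypothetical four-world model with made-up probabilities and a stipulated set `BEST` standing in for the best worlds; `required` and `allowed` implement the conditional-probability truth conditions described above (probability 1 and non-zero probability, respectively).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy model: four worlds with a made-up probability measure.
WORLDS = ("w1", "w2", "w3", "w4")
PR = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
      "w3": Fraction(1, 8), "w4": Fraction(1, 8)}
BEST = frozenset({"w1", "w2"})  # assumed stand-in for the set of best worlds


def cond_pr(prop, given):
    """Pr(prop | given), defined as a ratio; `given` must have non-zero mass."""
    return sum(PR[w] for w in prop & given) / sum(PR[w] for w in given)


def required(prop):
    return cond_pr(prop, BEST) == 1


def allowed(prop):
    return cond_pr(prop, BEST) > 0


# Every proposition (set of worlds) over the toy model.
PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
         for c in combinations(WORLDS, r)]


def upward_monotonic(modal):
    """True iff whenever modal(p) holds and p is a subset of q, modal(q) holds too."""
    return all(modal(q) for p in PROPS if modal(p) for q in PROPS if p <= q)


print(upward_monotonic(required), upward_monotonic(allowed))
```

Both checks succeed because enlarging a proposition can never lower its conditional probability, so the entailment from required φ to required (φ or ψ) is validated.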

Predicting possibility inferences
Let us show that the analysis, like its counterpart for epistemic modals, predicts all the possibility inferences we observe via a unified mechanism. Let's focus first on required, which we take to have the entry:
(137) $[\![\text{required}]\!]^{w,f,g} = \lambda d.\ \lambda p.\ Pr(p \mid \{w' : w' \in best_{g(w)}(f(w))\}) \geq d$
Assuming composition of pos in the usual way, (127) is predicted to have the truth-conditions in (138):
(127) You are pos required to take Syntax or Logic.
(138) $[\![(127)]\!] = 1$ iff $Pr(S \lor L \mid best) = 1$
As usual, we derive distributive inferences via exhaustification. We assume that, on its strengthened reading, (127) is parsed as including an occurrence of exh (which we assume to be below pos, though the issue is of no consequence in this case). Given standard assumptions about alternatives, the truth conditions of (140) are: As above, (140), in turn, entails (141).
(141), given our meaning for allowed, yields the relevant possibility inferences.
(127) You are required to take Syntax or Logic.
a. You are allowed to take Syntax.
b. You are allowed to take Logic.
The same account holds for possibility inferences with allowed.
44 We are assuming a standard way of defining conditional probability as a ratio: $Pr(A \mid B) = Pr(A \cap B) / Pr(B)$.
45 Let us point out that our basic maneuver (replacing a quantifier with a probability function) is compatible with a more radical departure from Kratzer. Our semantics still singles out a set of 'best' worlds via an ordering source. But one might also single out best worlds in a different way, e.g., via expected utility. Our main reason for sticking to a more conservative setup here is that the latter already yields all that we need.
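As a sanity check on the derivation for required, the step from the exhaustified truth conditions to the distributive inferences can be verified over a grid of probability distributions. The sketch rests on assumptions that are not spelled out in this section: the probability mass conditional on the best worlds is split over four cells (Syntax only, Logic only, both, neither), the alternatives negated by exh are required Syntax and required Logic, and the strengthened meaning is therefore taken to be Pr(S ∨ L | best) = 1 together with Pr(S | best) < 1 and Pr(L | best) < 1. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

CELLS = ("S_only", "L_only", "both", "neither")


def distributions(steps=6):
    """All distributions over CELLS whose values are multiples of 1/steps."""
    for counts in product(range(steps + 1), repeat=len(CELLS)):
        if sum(counts) == steps:
            yield {c: Fraction(n, steps) for c, n in zip(CELLS, counts)}


def pr_syntax(d):
    return d["S_only"] + d["both"]


def pr_logic(d):
    return d["L_only"] + d["both"]


def pr_either(d):
    return d["S_only"] + d["L_only"] + d["both"]


def plain_required_or(d):
    """Unstrengthened reading: the disjunction has conditional probability 1."""
    return pr_either(d) == 1


def exh_required_or(d):
    """Assumed strengthened reading: additionally, neither disjunct is required."""
    return plain_required_or(d) and pr_syntax(d) < 1 and pr_logic(d) < 1


def allowed(probability):
    return probability > 0


strengthened = [d for d in distributions() if exh_required_or(d)]
distributive = all(allowed(pr_syntax(d)) and allowed(pr_logic(d))
                   for d in strengthened)
print(len(strengthened) > 0, distributive)
```

On the plain reading, by contrast, a distribution that puts all the mass on the Logic-only cell satisfies the truth conditions while making Syntax impermissible, so the inference is contributed by exhaustification rather than by the basic meaning.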
In analogy with what we have done so far, we assume the following entry: The basic LF of (126) (assuming that the modal takes wide scope with respect to the subject) is in (143). Given our assumptions about pos and the entry in (142), the schematic truth conditions of (126) are: Free choice is derived, exactly as it happens for epistemic modals, by using an exhaustivity operator that takes scope above the modal, but below pos. Hence, on the exhaustified reading, (126) has the LF: Given standard assumptions about alternatives, the truth conditions of (145) are: The truth conditions in (146) entail, among other things: which gives us the free choice inferences, repeated below in (148).
(148) You are allowed to take Syntax or Logic.
a. You are allowed to take Syntax.
b. You are allowed to take Logic.
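The free choice derivation can be checked numerically in the same way. The sketch below is only an approximation of the configuration described above, under assumptions that this section does not state explicitly: allowed is taken to denote a relation between a degree d and a proposition p that holds iff Pr(p | best) ≥ d; pos is taken to bind d existentially and to require d > 0; and exh, sitting between pos and the modal, negates the alternatives built from each disjunct at the same degree d. The function names and the grid of degrees are hypothetical.

```python
from fractions import Fraction
from itertools import product

CELLS = ("S_only", "L_only", "both", "neither")
STEPS = 6
# Candidate positive degrees. Every probability below is a multiple of 1/STEPS,
# so searching this grid is enough to find a witness degree whenever one exists.
DEGREES = [Fraction(k, STEPS) for k in range(1, STEPS + 1)]


def distributions():
    """All distributions over CELLS whose values are multiples of 1/STEPS."""
    for counts in product(range(STEPS + 1), repeat=len(CELLS)):
        if sum(counts) == STEPS:
            yield {c: Fraction(n, STEPS) for c, n in zip(CELLS, counts)}


def pr_s(dist):
    return dist["S_only"] + dist["both"]


def pr_l(dist):
    return dist["L_only"] + dist["both"]


def pr_or(dist):
    return dist["S_only"] + dist["L_only"] + dist["both"]


def plain_allowed_or(dist):
    """pos [allowed (S or L)]: some positive degree is reached by the disjunction."""
    return any(pr_or(dist) >= d for d in DEGREES)


def exh_allowed_or(dist):
    """pos [exh [allowed (S or L)]]: some positive degree is reached by the
    disjunction but by neither disjunct on its own."""
    return any(pr_or(dist) >= d and pr_s(dist) < d and pr_l(dist) < d
               for d in DEGREES)


free_choice_models = [dist for dist in distributions() if exh_allowed_or(dist)]
free_choice = all(pr_s(dist) > 0 and pr_l(dist) > 0 for dist in free_choice_models)
print(len(free_choice_models) > 0, free_choice)
```

The check succeeds because the disjunction can only outrank both disjuncts if each of them carries some probability mass; a disjunct with zero probability would leave the disjunction exactly as probable as the other disjunct.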

Other modals
Epistemic and deontic modals are the paradigm examples of modals giving rise to possibility inferences. But these inferences have been discussed also in connection with modals of other flavors. One interesting case is that of ability modals. Ability modals give rise to possibility inferences in some cases, but block them in others. For a case of the latter kind, consider (149).
Jenny can outsmart a lawyer
Here we don't have space to discuss these modals, in particular with respect to the contrast between (149), on the one hand, and (150) and (151), on the other (see Nouwen 2017 for a proposal). But let us point out that our account could in principle be extended to ability modals. The trick is, again, to decompose the meaning of ability modals into a measure function and a kind of quantifier over degrees, and insert an exhaustivity operator between the two, i.e., to treat ability modals that give rise to free choice as exploiting the usual configuration: Again, this can be done by replacing all-or-nothing logical notions like consistency and entailment with probabilistic notions. For a toy example, 47 suppose that we treat can as a possibility modal operating over an appropriately restricted set of possibilities. Much like the semantics for epistemic modals above, can φ will say that the probability of its prejacent is non-zero. The resulting truth conditions for (153) are in (154).
Up to now, we have only considered instances of free choice where the modal operator takes wide scope over disjunction. But it is well-known that free choice inferences are generated also in cases where disjunction appears to take scope over modals. E.g., (155) suggests that rain is possible, and that so is snow (Legrand 1975, Zimmermann 2000, Geurts 2005, Simons 2005, Fox 2007 among others).
(155) It's possible that it will rain or it's possible that it will snow.
a. It's possible that it will rain.
b. It's possible that it will snow.
The present proposal, like most other implicature-based accounts, doesn't address this puzzle. 48 But let us observe that it is compatible with independent accounts of wide scope free choice, and in particular with attempts at reducing wide scope free choice to narrow scope free choice via syntactic or semantic means. (See, e.g., the proposal in Simons 2005, on which in a sentence like (155) the modal undergoes covert movement and takes scope over disjunction; though see also Alonso Ovalle 2005, §4.13, for some substantial problems with Simons' proposal.)
Second, free choice obtains also with negated conjunction, as noticed by Fox 2007.
(156) It's not certain that we will hire Mary and Sue.
a. It's possible that we will not hire Mary.
b. It's possible that we will not hire Sue.
In our framework, the inferences in (156a) and (156b) can be derived by using two exhaustivity operators, one below and one above pos, as in (157). The prejacent is stronger than all alternatives, so exh is vacuous here. It does, however, change the alternatives which are then fed to the second exh (represented in (159)). In particular, the negated exhaustified disjunctive alternative is crucial once the second exhaustivity operator enters the derivation.
The second exh quantifies over the alternatives above and the result is in (160). That is, we conclude that the probability of hiring Mary and Sue is less than 1, that the probability of hiring one or the other is 1, and, crucially, that the probability of neither disjunct is 1. As a result, we derive the relevant possibility inferences.

Brief comparison with alternative proposals
We don't have the space to compare in detail the present proposal to other accounts in the literature. But let us make some brief remarks about its relation to two successful scalar approaches, i.e. those in Fox (2007) and Klinedinst (2007). 49 Let us start with Fox's account. Fox assumes a standard existential analysis of possibility modals. Free choice is derived by assuming that sentences like (161) are parsed as involving two exhaustivity operators, as in (162).
(161) It is possible that we will hire Mary or Sue.
The crucial element of the account is that the outermost exh exploits alternatives that have been already exhaustified. In particular, the alternatives for the outermost exh include the exhaustified disjuncts.
(163) exh[It is possible that we will hire Mary]    ♦M ∧ ¬♦S
exh[It is possible that we will hire Sue]    ♦S ∧ ¬♦M
. . .
This in turn means that the second exh winds up negating the exhaustified alternatives above, giving rise to free choice as in (164).
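The double exhaustification just described can be simulated on a finite model. The sketch below uses one common formulation of exhaustification via innocent exclusion, which is our assumption about the intended definition rather than something stated in this section: an alternative is negated only if it belongs to every maximal set of alternatives whose joint negation is consistent with the prejacent. Information states are modeled as non-empty sets of world types, and all names are hypothetical.

```python
from itertools import combinations

# World types: only Mary hired, only Sue hired, both hired, neither hired.
TYPES = ("M", "S", "MS", "none")
# An information state is a non-empty set of world types left open.
STATES = [frozenset(c) for r in range(1, len(TYPES) + 1)
          for c in combinations(TYPES, r)]


def diamond(types):
    """'It is possible that ...': the states containing a world of one of the given types."""
    return frozenset(s for s in STATES if s & set(types))


POSS_M = diamond({"M", "MS"})         # possible that we hire Mary
POSS_S = diamond({"S", "MS"})         # possible that we hire Sue
POSS_OR = diamond({"M", "S", "MS"})   # possible that we hire Mary or Sue
POSS_AND = diamond({"MS"})            # possible that we hire Mary and Sue
ALTS = [POSS_OR, POSS_M, POSS_S, POSS_AND]


def innocently_excludable(prejacent, alts):
    """Alternatives that occur in every maximal consistently negatable set."""
    alts = list(set(alts))

    def consistent(subset):
        remaining = prejacent
        for alt in subset:
            remaining = remaining - alt  # conjoin the negation of alt
        return bool(remaining)

    candidates = [set(c) for r in range(len(alts) + 1)
                  for c in combinations(alts, r) if consistent(c)]
    maximal = [c for c in candidates if not any(c < other for other in candidates)]
    return set.intersection(*maximal)


def exh(prejacent, alts):
    result = prejacent
    for alt in innocently_excludable(prejacent, alts):
        result = result - alt
    return result


first_layer = exh(POSS_OR, ALTS)                   # possible (M or S), not possible (M and S)
exhaustified_alts = [exh(alt, ALTS) for alt in ALTS]
second_layer = exh(first_layer, exhaustified_alts)

print(second_layer <= POSS_M, second_layer <= POSS_S)
```

In this model a single layer of exh only excludes the conjunctive alternative, while the second layer also negates the exhaustified disjuncts of (163), and the result entails both ♦M and ♦S.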
Fox's account extends to all types of modals and to nominal quantifiers. 50 The advantage that we claim for our account over Fox's is that we give a unified story for deriving all possibility inferences. This covers both distributivity inferences and free choice ones. By contrast, Fox has to resort to different mechanisms for the two cases. His account of distributive inferences is the one involving ordinary exhaustification, and sketched in §3; conversely, free choice requires double exhaustification. Of course, this is not enough to establish that our account is overall better. But it suggests that it deserves consideration as a competitor.
Let us move on to Klinedinst's (2007) proposal. This is probably the closest proposal to ours, at least in spirit. Klinedinst derives free choice inferences as a kind of embedded distributivity implicature, by making a key background assumption, i.e., that all modals (and, among them, possibility modals) are plural quantifiers over worlds containing a distributivity operator dist. 51 As a result, (165) says (roughly) that there is a plurality of worlds such that in each of them we will hire Mary or Sue.
(165) It is possible that we will hire Mary or Sue.
50 Notice also that Fox's (2007) account is in principle compatible with ours. That is, it is easy to show that global recursive exhaustification would give rise to free choice in the same way as above.
Against this background, Klinedinst's suggestion is that we embed an exhaustivity operator below the modal, but above dist. Like Fox's account, Klinedinst's (2007) extends to quantifiers over individuals. Hence Klinedinst can account for free choice-type inferences triggered by plural quantifiers (as in (170)) with the very same tools.
(170) Some students took three months or didn't finish at all.
a. Some students took three months.
b. Some students didn't finish at all.
A natural worry about Klinedinst's proposal is that it crucially relies on a plural semantics for modals, and moreover on the presence of a covert distributivity operator dist in modal sentences. While there is some evidence for the former, the latter assumption comes with no independent motivation. In fact, one can easily raise overgeneration worries about dist. A theory that assumes the presence of a distributivity operator predicts that various items should be able to take scope between the modal and dist, i.e., exactly where exh is located on Klinedinst's theory. To take a concrete example, consider (171).
(171) Nobody is allowed to enter.
Klinedinst predicts that (171) has, among others, a reading with the following schematic truth conditions:
(172) ∃ > Nobody > dist
(172) says that there is a plurality of (deontically ideal) worlds such that no individual entered in all of them. I.e., (172) is false only if there is some individual who enters in all (deontically ideal) worlds; hence, on this reading it should be equivalent to the most prominent reading of No one is required to enter. Obviously (171) doesn't have a reading of this sort. 52 Also in this case, this point alone is not enough to give our account a definitive advantage. For one thing, Klinedinst could appeal to scope constraints that are similar to the ones we postulate for pos. (Though the latter constraints are independently motivated in the literature, while to our knowledge no analogous arguments have been given about dist.) Let us also emphasize again that one advantage that, at present, both Fox's and Klinedinst's accounts have over our own account is that both naturally generalize to free choice-type inferences triggered by nominal quantifiers, like (170). Conversely, it remains to be seen how our account could be extended beyond modals. 53 This is one of the issues we have to put off to future work.
52 Thanks to an anonymous referee for suggesting this way of putting the point. Let us notice that Klinedinst could appeal to whatever constraints on scope disallow a similar reading with other cases involving distributive operators, like bare plurals. Similarly, (i) does not have a reading which would be true if there is a plurality of students such that no individual met all of the members of that plurality (but, say, each individual met one of the students).
(i) Nobody met students.
It is unclear that the parallelism between plural definites and modals holds when we move to other contexts. For instance, a non-monotonic case like (ii) can be shown not to have a reading in which exactly two scopes in between existential quantification and dist. But it is unclear that this reading is not there for its plural definite counterpart in (iii).
(ii) Exactly two people are allowed to enter.
(iii) Exactly two people talked to the students.

The optionality of free choice
It is an established point in the literature that all possibility inferences, including free choice ones, are optional effects. For example, consider: (173) Mary is allowed to take Syntax or Logic.
(173) has a reading that doesn't imply that Mary is allowed to take Syntax and that she is allowed to take Logic. Rather, it suggests that the speaker is simply ignorant about which disjunct is true. 54 This reading is sometimes called an 'ignorance reading,' and is brought out by continuations like that in (174).
(174) Mary is allowed to take Syntax or Logic, but I don't remember which one.
Ignorance readings of this kind are available also for epistemic modals, though they are somewhat harder to obtain. To generate them, we need a context that makes salient a body of information that entails, but is not entailed by, the speaker's information. For an example, consider the following scenario.
Department X has voted an offer to a candidate. The details are not public, but my friend, who is in Department X's search committee, has let slip information suggesting that the offer might have gone to a certain candidate. Unfortunately, I have forgotten whether this candidate is Mary or Sue.
Against this background, one says:
(175) It's possible that Department X made an offer to Mary or Sue, but I don't remember which.
53 Notice also that in principle our account is compatible with Klinedinst's account for the nominal quantifier cases, since the possibility of having a distributivity operator for nominal quantifiers is not in question.
54 Ignorance inferences standardly arise from a standard Gricean maxim of Quantity not based on alternatives; see Sauerland 2004, Fox 2007 among others for discussion.

(175) is perfectly appropriate in this scenario. Moreover, notice that (thanks again to the conjunct I don't remember which) the usual possibility inferences are blocked. Of course, upon hearing (175) we can infer that it's compatible with the speaker's knowledge both that Mary is the recipient of the offer and that Sue is. But possible in (175) evidently doesn't merely describe the speaker's knowledge.
Here we want to point out that our account, on a par with other exhaustivity-based accounts, has no difficulties predicting the optionality of possibility inferences. We predict that a sentence like (176) has a number of available parses.
(176) It's possible that Department X made an offer to Mary or Sue.
In 7.1, we have pointed out that (176)  In this case, we predict no implicature at all. Of course, we also need to say something about when the parse involving an exhaustivity operator is available. For concreteness, here we simply rely on the suggestion in Fox 2007. Fox's idea is that speakers adopt the exhaustified parses to avoid ignorance inferences. More precisely: Fox assumes that speakers will first attempt to parse a sentence without an exhaustivity operator. If this parse gives rise to ignorance inferences, the sentence is reparsed as involving one (or possibly more) occurrences of exh.

Conclusion
Disjunctions in the scope of epistemic modals give rise to scalar inferences to the effect that each disjunct is epistemically possible. These inferences are usually called and treated differently. Those generated by non-possibility modals are labeled 'distributive inferences,' and derived as scalar implicatures. Those generated by possibility modals are called 'free choice infer-