Compositional trace conversion *

In order to eliminate traces as stipulated grammatical objects, syntactic movement has been reformulated in terms of multiple-merge: it is the result of the same constituent being merged into the structure multiple times, using either copies or multidominance structures. In spite of their empirical and conceptual advantages, multiple-merge theories pose known challenges for the semantic interpretation of movement, as there are no variable-denoting traces in lower positions. The most common means of resolving this conundrum is trace conversion (Fox 2002, 2003), in which either a syntactic operation makes alterations at lower merge sites in order to generate trace-like interpretations, or the semantics behaves as if such a syntactic operation had occurred. In this paper I discuss problems faced by presently formulated versions of trace conversion and propose an alternative, compositional trace conversion, in which multiple-merge structures can be directly interpreted in a straightforwardly compositional manner. This approach is shown to generalize well, extending to modals and degree phrases as well as DPs.


Introduction
For much of the history of generative syntax, displacement (the apparent tendency for constituents to simultaneously occupy multiple syntactic locations) has been cast in terms of movement: a constituent seems to occupy multiple locations because it starts in one spot and moves to another, leaving a trace. For example, consider (1) on its inverse scope (every > a) interpretation:
(1) A student likes every teacher.
Here every teacher seems to simultaneously occupy both its overt position as the internal argument of like, and a higher position at which it outscopes a student. This is typically explained through quantifier raising (QR): every teacher covertly moves past a student, leaving a trace that is interpreted as a bound variable argument to like:
(2) $[_{\mathrm{TP}}\ \text{every teacher}\ \lambda_2\ [_{\mathrm{TP}}\ \text{a student}\ \lambda_1\ \mathrm{T}\ [_{\mathrm{VP}}\ t_1\ \text{like}\ t_2]]]$
However, the existence of traces as a distinct kind of syntactic object has come under fire in the past couple of decades. Traces have several seemingly undesirable properties, including that (i) they are stipulated as part of the language faculty instead of being derived from prior principles; (ii) they are non-lexical objects inserted into syntactic computations; and (iii) their insertion is countercyclic, since they are placed in locations vacated by moving constituents. For these and other reasons, one aim of the Minimalist Program (Chomsky 1995) has been to do away with traces, and to derive those empirical facts normally attributed to them from other grammatical operations and principles whose core motivations are clearer.
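To make the two construals of (1) concrete, the following is a minimal sketch in Python that evaluates the surface and inverse scope readings in a toy model. The model (the individuals and the `likes` relation) and all function names are invented for illustration; the nesting of the inverse scope reading mirrors the LF in (2), with the object quantifier binding its variable from above.

```python
# Toy model: every name and fact below is an assumption made for illustration.
students = {"s1", "s2"}
teachers = {"t1", "t2"}
likes = {("s1", "t1"), ("s2", "t2")}  # pairs of (liker, liked)


def every(restrictor):
    """Generalized quantifier: holds of a scope predicate true of all restrictor members."""
    return lambda scope: all(scope(x) for x in restrictor)


def a(restrictor):
    """Generalized quantifier: holds of a scope predicate true of some restrictor member."""
    return lambda scope: any(scope(x) for x in restrictor)


def like(subj, obj):
    """Truth value of 'subj likes obj' in the toy model."""
    return (subj, obj) in likes


# Surface scope (a > every): some student likes all of the teachers.
surface = a(students)(lambda x: every(teachers)(lambda y: like(x, y)))

# Inverse scope (every > a): each teacher is liked by some student or other.
inverse = every(teachers)(lambda y: a(students)(lambda x: like(x, y)))

print(surface, inverse)  # False True
```

In this model the two readings come apart: no single student likes both teachers, but each teacher is liked by some student, so only the inverse scope reading is true.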
Researchers have mostly coalesced around a single broad strategy for accomplishing this. We know that some sort of structure-building operation, which Chomsky (1995) calls Merge, is needed to generate syntactic structures to begin with, meaning that this structure-building operation can be taken for granted as part of the language faculty. So what if, when some constituent X undergoes "movement", X is in fact simply merged back into the structure again, this time at a higher point in the tree? This eliminates the need for traces and the stipulations that come with them: displacement is not movement-plus-trace-insertion, but rather a single constituent appearing in multiple syntactic locations, by means of an operation that is independently motivated on basic conceptual grounds. I will refer to analyses in this broad program as multiple-merge theories of movement.
Under the multiple-merge umbrella, two main candidate theories have emerged. The first is the copy theory of movement (Chomsky 1995), in which a "moving" constituent is simply copied, with the newly created copy merged at the destination of movement. Thus, the trace-laden LF in (2) might be replaced with (3): [fn. 2]
(3) $[_{\mathrm{TP}}\ \text{every}_2\ \text{teacher}\ \lambda_2\ [_{\mathrm{TP}}\ \text{a}_1\ \text{student}\ \lambda_1\ \mathrm{T}\ [_{\mathrm{VP}}\ \text{a}_1\ \text{student like every}_2\ \text{teacher}]]]$
Footnote 2: Since I eventually adopt the syntax in (3) for my own semantic analysis, a note on syntactic assumptions is in order. I assume that indices are semantically interpretable features that a constituent (e.g., DP) inherits from its head (D), and that when movement occurs, a lambda-abstractor is inserted that is co-indexed with the moving constituent, or equivalently, with its head. On multiple-merge theories this also means that the lambda-abstractor will be co-indexed with the "trace" (the lower iteration of the DP), as in traditional semantic analyses like Heim & Kratzer 1998. Finally, I assume that distinct operators are assigned distinct indices, though assigning multiple operators the same index could be an intriguing way to analyze across-the-board movement (cf. Fox & Johnson 2016). Whether lambda-abstractors can be inserted without being triggered by movement is an issue left for future work. For further discussion of issues related to indices and lambda-abstractors, see Section 6.
The second approach is the multidominance theory of movement (Starke 2001, Gärtner 2002, Johnson 2012). According to multidominance theories, it is possible for a single constituent to have multiple mothers, and this is precisely the configuration that arises in the case of movement: when constituent X moves from below Y to the specifier of ZP, no copies are made, and instead we end up with a structure in which X has both Y and ZP as mothers. In other words, rather than having two indistinguishable copies of X, it is quite literally the same constituent that sits in both positions. Thus, the translation of the LF in (2) into the multidominance theory of movement will look like Fig. 1.
[Fig. 1: Multidominance LF for inverse scope of A student likes every teacher.]
In spite of their empirical and conceptual advantages, multiple-merge theories pose well-known challenges for traditional approaches to semantic composition. A common means of interpreting LFs like (2) is to treat traces as denoting variables bound by the lambda-abstracting nodes $\lambda_1$ and $\lambda_2$, generating predicates that serve as arguments to their respective quantifiers (Heim & Kratzer 1998). But this view of compositionality seems untenable in the face of LFs like (3) and Fig. 1, as it would require that a DP be interpreted as a true quantifier at its highest merge site, and as a bound variable at lower merge sites. Put another way, there is an apparent tension between the following principles of semantic interpretation: (i) a quantificational DP introduces quantification at its highest merge site; (ii) quantificational DPs do not introduce quantification at lower merge sites, and are interpreted as bound variables; and (iii) structurally identical DPs are semantically identical, regardless of whether they are two indistinguishable copies (copy theory) or the same DP interpreted at multiple merge sites (multidominance).
The most commonly adopted means of dissolving this tension is trace conversion, proposed by Fox (2002, 2003) within the confines of the copy theory of movement. Trace conversion is a post-syntactic operation that replaces lower copies of determiners with bound definite determiners:
(4) (3) after trace conversion: $[_{\mathrm{TP}}\ \text{every}_2\ \text{teacher}\ \lambda_2\ [_{\mathrm{TP}}\ \text{a}_1\ \text{student}\ \lambda_1\ \mathrm{T}\ [_{\mathrm{VP}}\ \text{the}_1\ \text{student like the}_2\ \text{teacher}]]]$
We can then posit that the denotation of $\text{the}_i$, evaluated with respect to a variable assignment $g$, takes a predicate $P$ and returns $g(i)$ if $g(i)$ is a $P$, and is otherwise undefined. In other words, $[\![\text{the}_i]\!]^g(P)$ denotes a restricted variable:
(5) $[\![\text{the}_i]\!]^g = \lambda P : P(g(i)).\ g(i)$
For example, $\text{the}_1$ student denotes $g(1)$ iff $g(1)$ is a student (and is otherwise undefined), and $\text{the}_2$ teacher denotes $g(2)$ iff $g(2)$ is a teacher. Hence, replacing $\text{a}_1$ student and $\text{every}_2$ teacher with $\text{the}_1$ student and $\text{the}_2$ teacher permits precisely the sort of bound variable reading required for successful composition. An alternate version of trace conversion, mentioned as a possibility by Fox (2003), shifts the burden from the syntax to the semantics: there is no syntactic operation that modifies lower copies, but instead the compositional semantics interprets multiple-merge structures as if some such syntactic operation had taken place.
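The restricted-variable denotation in (5) can be sketched as a partial function. The Python below is only an illustration under stated assumptions: undefinedness is modeled by raising an exception, assignments are dictionaries from indices to individuals, and the individuals, the helper `is_defined`, and the exception name are all hypothetical.

```python
class PresuppositionFailure(Exception):
    """Signals that a denotation is undefined under the given assignment."""


def the(i):
    """Sketch of (5): the_i maps a predicate P to g(i), defined only if P(g(i))."""
    def under_assignment(g):
        def apply_to(P):
            x = g[i]
            if not P(x):
                raise PresuppositionFailure(f"g({i}) does not satisfy the restrictor")
            return x
        return apply_to
    return under_assignment


# Hypothetical toy predicates and assignment.
def student(x):
    return x in {"ann", "bo"}


def teacher(x):
    return x in {"lee"}


g = {1: "ann", 2: "lee"}

the_1_student = the(1)(g)(student)  # g(1), since g(1) is a student
the_2_teacher = the(2)(g)(teacher)  # g(2), since g(2) is a teacher


def is_defined(thunk):
    """Helper (an assumption of this sketch): True iff evaluating thunk does not fail."""
    try:
        thunk()
        return True
    except PresuppositionFailure:
        return False
```

With this assignment, `the(2)(g)(student)` is undefined, because g(2) is not a student; this is the sense in which the bound definite denotes a variable whose value is restricted by its NP.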
In Section 2 I discuss a specific semantic prediction of trace conversion: when a DP undergoes movement, its NP restrictor semantically contributes not only at the DP's highest merge site, but also at its lower merge sites in the form of a domain restriction. I will also briefly discuss some empirical arguments that have been made in favor of this hypothesis, and thus in favor of multiple-merge over trace-based theories of movement. However, in Section 3 I go over some critical downsides to trace conversion, based on a mixture of empirical observations, theory-internal considerations, and broader principles of theoretical simplicity and elegance. In brief, syntactic trace conversion is difficult to motivate on independent syntactic grounds and requires abandoning well-motivated syntactic principles (namely, Chomsky's (1995) Inclusiveness Condition), as well as additional stipulations to account for non-DP scope-taking. Meanwhile, semantic trace conversion runs afoul of basic principles of semantic compositionality, since the same DP must be interpreted differently in different locations. The goal in the rest of the paper will be to retain the benefits of trace conversion without these downsides.
In Section 4 I turn to my own analysis, which I refer to as compositional trace conversion, and which allows LFs like (3) and Fig. 1 to be interpreted compositionally and without syntactic modifications. Put simply, the semantic impact of trace conversion (that is, the "swapping out" of a quantificational interpretation for a bound definite interpretation at lower merge sites) is automatically triggered by the operation of lambda abstraction. In Section 5, I show how the analysis can easily be type-generalized and thus extended beyond DP quantification, accounting for the scope-taking behavior of both modals and degree phrases in comparatives. I offer some concluding remarks and lines for potential future research in Section 6. [fn. 4]

The Interpreted Lower Restrictor Hypothesis
Before discussing the downsides to trace conversion, it is worth going over one of its positives. Consider again the post-trace-conversion LF in (4). As discussed previously, $\text{the}_1$ student denotes a restricted variable: $g(1)$ iff $g(1)$ is a student, and otherwise undefined. Likewise for $\text{the}_2$ teacher:
(6) $[\![\text{the}_1\ \text{student like the}_2\ \text{teacher}]\!]^g$ is defined iff $\text{student}(g(1))$ and $\text{teacher}(g(2))$. Where defined, $[\![\text{the}_1\ \text{student like the}_2\ \text{teacher}]\!]^g = \text{like}(g(1), g(2))$
Because $\text{the}_1$ student denotes a restricted variable, when we lambda-abstract over that variable we get a predicate whose domain is restricted to students:
(7) $[\![\lambda_1\ \text{the}_1\ \text{student like the}_2\ \text{teacher}]\!]^g$ is defined iff $\text{teacher}(g(2))$. Where defined, $[\![\lambda_1\ \text{the}_1\ \text{student like the}_2\ \text{teacher}]\!]^g = \lambda x : \text{student}(x).\ \text{like}(x, g(2))$
In other words, because of the definition of $\text{the}_i$, we predict NP restrictors in the "traces" of DP movement to make semantic contributions in the form of domain restrictions. I will call this the Interpreted Lower Restrictor Hypothesis (ILRH):
(8) Interpreted Lower Restrictor Hypothesis (ILRH): When a DP of the form $[_{\mathrm{DP}}\ \mathrm{D}\ \mathrm{NP}]$ undergoes movement, NP is also semantically interpreted at the trace position, so that after lambda abstraction the resulting predicate is restricted to individuals in $[\![\mathrm{NP}]\!]$.
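The step from (6) to (7) can be sketched in Python as follows. This is an illustration under assumptions rather than an implementation of any published system: undefinedness is modeled with an exception, lambda abstraction is modeled as assignment modification in the style of Heim & Kratzer (1998), and the toy sets and function names are hypothetical.

```python
class PresuppositionFailure(Exception):
    """Signals that a denotation is undefined under the given assignment."""


# Hypothetical toy model.
STUDENTS = {"ann", "bo"}
TEACHERS = {"lee"}
LIKES = {("ann", "lee")}  # pairs of (liker, liked)


def bound_definite(i, noun):
    """The DP 'the_i NP' under g: g(i) if g(i) is in the NP set, otherwise undefined."""
    def denotation(g):
        if g[i] not in noun:
            raise PresuppositionFailure(f"g({i}) is outside the restrictor")
        return g[i]
    return denotation


def clause(g):
    """Sketch of (6): 'the_1 student like the_2 teacher' evaluated under g."""
    subj = bound_definite(1, STUDENTS)(g)
    obj = bound_definite(2, TEACHERS)(g)
    return (subj, obj) in LIKES


def abstract(i, body):
    """Lambda abstraction over index i: maps x to body evaluated under g[i -> x]."""
    return lambda g: (lambda x: body({**g, i: x}))


g = {2: "lee"}
predicate = abstract(1, clause)(g)  # sketch of (7)


def defined_for(pred, x):
    """Helper (an assumption of this sketch): True iff pred(x) has a value."""
    try:
        pred(x)
        return True
    except PresuppositionFailure:
        return False
```

Here `predicate("ann")` is true and `predicate("bo")` is false, while `predicate("lee")` has no value at all, since lee is not a student. The domain restriction on the abstracted predicate is inherited from the NP inside the lower bound definite, which is what ILRH describes.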
Several empirical arguments have been offered in favor of ILRH. For example, Erlewine (2014) provides evidence from association with focus (in short, it seems that focus-sensitive operators can associate with focused material inside of traces), and Romoli (2015) follows Chierchia (1995), Fox (2002), and Sportiche (2005) in using ILRH to account for the famed conservativity hypothesis (Barwise & Cooper 1981, Keenan & Stavi 1986). However, to keep things brief I will only discuss ILRH in relation to the treatment by Sauerland (1998, 2004) of certain puzzling facts pertaining to antecedent-contained deletion (ACD), augmented by Fox's (2002) proposal connecting ACD to extraposition. This choice is motivated by two factors: the empirical puzzle provides particularly compelling evidence for ILRH, and going over the analysis will help illustrate some key concepts that will prove useful in later discussion.
Footnote 4: For reasons of space I cannot discuss other attempts at a Minimalism-friendly semantics, which make more fundamental revisions to traditional assumptions about the syntax-semantics of quantification. These include Gotham's (2018) LF-less "Glue semantic" approach, as well as the analyses of Johnson (2012) and Fox & Johnson (2016) in which QR does not involve movement of quantificational heads. Comparing these theories is of course important work; my analysis can be thought of as an attempt to bring the traditional approach closer to its best form in order to facilitate such comparisons.

Setting the table: ACD and the Kennedy-Sauerland Puzzle
Antecedent-contained deletion refers to ellipsis sites that, at least by appearances, are contained within their own antecedents. Consider (9):
(9) Lisa read every book that Anna did.
The elided VP is inside the relative clause, and its antecedent is the matrix VP. The apparent containment of the ellipsis within its own antecedent is illustrated in (10):

]]
If the ellipsis is truly contained within its own antecedent, this poses an apparent problem: the ellipsis and antecedent VP cannot match without an infinite regress. To avoid this, traditional accounts of ACD posit that the DP headed by every QRs outside of the VP (Sag 1976, May 1985):
With the ellipsis site no longer contained within its antecedent, ellipsis resolution can be achieved without any infinite regress. The empirical evidence for ILRH that we will be discussing comes from a particular puzzle concerning ACD, which I will call the Kennedy-Sauerland Puzzle:
(12) Kennedy-Sauerland Puzzle (Sauerland 2004: p. 64):
a. * Polly visited every town that is near the lake Erik did.
b. Polly visited every town that is near the one Erik did.

Robert Pasternak
The first half of the puzzle, initially observed by Kennedy (1994), is the ill-formedness of (12a). This sentence is well-formed without the VP ellipsis, indicating that its ill-formedness is due specifically to a lack of ellipsis licensing.
(13) Polly visited every town that is near the lake Erik visited.
But on traditional analyses there is no reason to suspect that (12a) should be ill-formed: in the LF in (14), the ellipsis is outside of its antecedent, just as it is in (11).
(14) $[_{\mathrm{DP}}\ \text{every}_1$ town that is near the lake Erik did
The second half of the puzzle, noted by Sauerland (1998, 2004), is the fact that (12b), which is identical to (12a) except that lake is replaced by one, is well-formed. This indicates that whether or not ellipsis is licensed in examples like (12) must be sensitive to the choice of noun that the most deeply embedded relative clause adjoins to (lake vs. one), an even stranger result on traditional analyses. Sauerland (1998, 2004) offers an account of this puzzle that hinges on ILRH. But before going into the specifics of Sauerland's solution, it will help to first go over a particular proposal by Fox (2002) connecting ACD to extraposition.

Fox 2002 and the ACD-extraposition connection
Fox (2002) follows Baltin (1987) in arguing that ACD necessarily involves (often string-vacuous) extraposition, the same process separating the relative clause from book in (15).
Footnote 5: For arguments against Baltin's analysis, see Larson & May 1990. See Fox 2002 for counterarguments.
Footnote 6: Fox & Nissenbaum (1999) note that both nominal arguments and adjuncts can undergo extraposition. However, argument and adjunct extraposition seem to be two distinct processes; the empirical findings and analysis under discussion are specific to adjunct extraposition.
He adopts Fox & Nissenbaum's (1999) analysis of extraposition, which crucially relies on multiple-merge. An important empirical observation about extraposition is what Fox refers to as Williams's Generalization (after Williams (1974)), which states that any scope-taker must scope at least as high as any adjunct extraposed from it. An illustration of Williams's Generalization can be seen in (16):
(16) Illustration of Williams's Generalization (Fox 2002: p. 72):
a. I read every book that John had recommended before you did.
b. I read every book before you did that John had recommended.
The extraposition-less (16a) is ambiguous. If every scopes below before, the resulting interpretation is that I was the first to make my way through John's list, though certain books you might have finished first. If every outscopes before, the interpretation is stronger: each book was finished by me first. But (16b), in which the relative clause is extraposed past before, is unambiguous, and only the latter reading is available.
In their analysis of extraposition, Fox & Nissenbaum (1999) follow Lebeaux (1990) in positing that adjuncts can be late merged, i.e., adjoined to the constituent they modify after that constituent has already merged into a larger structure and undergone movement. Fox & Nissenbaum further propose that late merger can occur after QR, and that this is what happens in the case of extraposition. In (16b), for example, every book QRs past the before-phrase, with the relative clause adjoining to the higher, unpronounced copy of book. If QR is stipulated to be rightward, this generates the correct word order:
(17) [I read [$\text{every}_1$ book] before you did] [$\text{every}_1$ book that John…]
Importantly, by tying extraposition to QR in this fashion, Williams's Generalization can be accounted for. After all, in order to generate the extraposed structure, every book had to QR past the before-phrase. Trace conversion then converts this structure into one that is semantically interpretable, and the interpretation is one in which every outscopes before:
(18) [$\lambda_1$ I read $\text{the}_1$ book before you did] [$\text{every}_1$ book that John …]
Therefore, Fox & Nissenbaum (1999) rightly predict that the only way to generate the string in (16b) is by means of a structure in which every book outscopes the before-phrase, thereby eliminating the ambiguity seen in (16a). Fox (2002) proposes that this analysis of extraposition be extended to ACD examples like (9): the ellipsis-containing relative clause is late merged after QR, leading to (string-vacuous) extraposition and the approximate LF structure seen in (19), with gaps to be filled in shortly:
Note that on this analysis "antecedent-contained deletion" is in fact a misnomer: since the relative clause containing the ellipsis is late merged after QR, there is actually no point at which the ellipsis is contained within its antecedent.
Many of Fox's arguments for connecting ACD to extraposition in this manner are beyond the scope of this paper. However, one such argument comes precisely from its utility when combined with Sauerland's (1998, 2004) account of the Kennedy-Sauerland puzzle, to which we now turn.

Resolving the Kennedy-Sauerland Puzzle
Sauerland's solution to the Kennedy-Sauerland puzzle needs two additional ingredients. First, we adopt a PF deletion theory of ellipsis, in contrast to an LF copying theory: an ellipsis site is not an empty constituent that is subsequently "filled in" at LF, but a fully syntactically realized constituent that can be erased at PF under certain conditions. As for what these conditions are, Sauerland follows Rooth (1992) and Fox (1999) in connecting ellipsis licensing to contrastive focus. However, for our purposes we can adopt a simpler picture: antecedent X can license the ellipsis of Y only if X and Y are semantically identical, modulo indexation. Thus, see $\text{herself}_1$ and see $\text{herself}_2$ count as sufficiently similar, but not see $\text{her}_1$ friend and see $\text{her}_1$ enemy.
Second, Sauerland proposes a matching analysis of relative clauses: in an NP with an adjoined relative clause like lake that Erik visited, the head noun lake also occurs inside the relative clause. More specifically, the internal argument of visit is saturated by the DP Op lake, where Op is a wh-like determiner analogous to which. This DP then undergoes movement to the left periphery of the relative clause, whereupon its noun is deleted under identity with the adjoined-to lake:
After trace conversion this NP will look as in (21) at LF. For convenience I ignore the higher merge site of Op lake, since how it is or is not integrated into the compositional semantics is not relevant for the present discussion. (However, see Section 5.3 for some discussion of the relevance of this issue to the theory proposed in this paper.) Thus, by combining multiple-merge, ILRH, and the matching analysis of relative clauses we predict that the semantic interpretation of the head noun should also be visible inside the relative clause, in the form of a domain restriction.
This can then be combined with Fox's (2002) extraposition-based analysis of ACD, leading to the post-trace-conversion LF for (9) shown in (22).
Notice that the antecedent (read $\text{the}_1$ book) and the ellipsis site (read $\text{the}_2$ book) are indeed semantically identical modulo indexation. Hence, we rightly predict the availability of ellipsis and the grammaticality of (9). By adopting these assumptions, the Kennedy-Sauerland facts in (12) fall out immediately. Let us start with the ill-formed (12a), which we predict to have the following post-trace-conversion LF:
[$\text{every}_1$ town ($\text{Op}_2$ town) $\lambda_2$ that $\text{the}_2$ town is near [the lake ($\text{Op}_3$ lake) $\lambda_3$ Erik visited $\text{the}_3$ lake
This is a perfectly well-formed LF structure, hence (13). However, ellipsis is not licensed. The antecedent visited $\text{the}_1$ town is not semantically identical to the elided visited $\text{the}_3$ lake, regardless of indexation: the former has an argument saturated by a variable restricted to towns, while the latter has an argument saturated by a variable restricted to lakes. So (12a) is ill-formed. (12b), meanwhile, will have the same LF structure, but replacing lake with one. Regardless of whether one thinks of one as an NP pronoun anaphoric to town or as an instance of town that is converted to one at PF, it is clear that in this example $[\![\text{one}]\!] = [\![\text{town}]\!]$. Hence, the antecedent visited $\text{the}_1$ town and the elided visited $\text{the}_3$ one are semantically identical modulo indexation (in both, the internal argument is saturated by a variable restricted to towns), and ellipsis is licensed in (12b).
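The licensing condition at work in this account (identity modulo indexation) can be sketched as a simple comparison. The Python below is a deliberately crude illustration: it stands in for semantic identity with a lookup table of noun meanings, and the class names, the table, and the function `licenses_ellipsis` are all assumptions of the sketch, not part of Sauerland's or Fox's proposals.

```python
from typing import NamedTuple


class BoundDefinite(NamedTuple):
    """A converted lower copy such as 'the_1 town': an index plus a restrictor noun."""
    index: int
    noun: str


class VP(NamedTuple):
    """A verb phrase whose internal argument is a bound definite."""
    verb: str
    obj: BoundDefinite


# Assumption for illustration: in (12b), 'one' has the same meaning as 'town'.
NOUN_MEANING = {"town": "TOWN", "lake": "LAKE", "one": "TOWN"}


def licenses_ellipsis(antecedent: VP, elided: VP) -> bool:
    """True iff the two VPs match once indices are ignored (identity modulo indexation)."""
    same_verb = antecedent.verb == elided.verb
    same_restrictor = (
        NOUN_MEANING[antecedent.obj.noun] == NOUN_MEANING[elided.obj.noun]
    )
    return same_verb and same_restrictor


antecedent = VP("visit", BoundDefinite(1, "town"))
result_12a = licenses_ellipsis(antecedent, VP("visit", BoundDefinite(3, "lake")))
result_12b = licenses_ellipsis(antecedent, VP("visit", BoundDefinite(3, "one")))
```

Under these assumptions `result_12a` is false and `result_12b` is true, matching the contrast in (12). If the restrictor were left out of the comparison, as it would be with unrestricted traces, the two cases could not be told apart.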
Notice that ILRH is critical to this account of the Kennedy-Sauerland Puzzle. After all, both the antecedent and elided (or unelidable) constituents contain traces (of every town and Op lake/one, respectively), and the only way to semantically differentiate between these traces, thereby preventing (12a) and permitting (12b), is to posit that town/lake/one makes a semantic contribution at its lower merge site. Since trace conversion adheres to ILRH, it makes for a useful framework in which to formulate Sauerland's analysis. Therefore, given the arguments against trace conversion that will be offered in the next section, it will be important that whatever ends up replacing it similarly adhere to ILRH.

Trace conversion and its discontents
In this section I will discuss syntactic and semantic trace conversion in greater detail, as well as the critical issues that each version faces. This will pave the way for my own analysis later in the paper, which generates the semantic effects of trace conversion while avoiding those problems faced by prior implementations.

Syntactic trace conversion
The most popular approach to trace conversion, and the version assumed thus far, is syntactic: a post-syntactic operation replaces determiners at lower merge sites with the. Note that an additional positive of trace conversion beyond its semantic results is that it swaps in a syntactic object whose existence has already been motivated on independent grounds: simply put, there is such a thing as overt the. Thus, not only does syntactic trace conversion follow through on the Minimalist Program's avoidance of traces while allowing for straightforward semantic computation that adheres to ILRH, but it does so while also only making use of syntactic objects whose inclusion as a part of the grammar has been independently motivated.
However, syntactic trace conversion is not without its drawbacks. Notice that trace conversion is an operation performed only on lower copies of a DP, since the highest copy must retain its quantificational interpretation. But without any recourse to look-ahead there is no way of telling that a given copy is a lower copy until movement has already taken place, at which point the lower copy is already embedded in a larger structure. In other words, trace conversion is an inherently countercyclic operation that replaces certain constituents with semantically interpreted material absent from the numeration, a significant violation of Chomsky's (1995) otherwise robust Inclusiveness Condition. While Inclusiveness is an empirical hypothesis and should not be taken as gospel, on basic Minimalist principles the prospect of abandoning it should at least give us pause. After all, if we wish to determine the extent to which the language faculty is optimally designed (including how much of its behavior can be predicted by attributing to it only basic syntactic operations like Merge and Agree), then introducing further operations like syntactic trace conversion that make it suboptimal in this regard should not be done unless necessary.
Even if we put Inclusiveness aside, trace conversion remains a rather strange syntactic operation, and it is not entirely clear how its existence could be empirically motivated. It will help to compare and contrast it with another covert syntactic operation: covert movement. The fact that the standard approach to the semantics of quantification happens to require movement to effect scope reconfiguration does not, of course, constitute evidence for covert movement: this is an observation about a theory, not about language. This is why researchers have looked elsewhere for evidence of covert movement, including parallels between covert and overt movement, cross-linguistic variation in whether certain movement operations are covert or overt, effects of covert movement on overt structure (e.g., Fox & Nissenbaum's (1999) aforementioned account of extraposition), etc. Yet to my knowledge the only existing motivations for syntactic trace conversion involve its necessity under current semantic assumptions for generating appropriate interpretations (including ILRH). But again, this is an observation about a semantic theory, not an observation about language. Moreover, the kinds of evidence in favor of covert movement do not seem to be available for trace conversion: I am unaware of any evidence for the existence of an overt counterpart to trace conversion, nor of trace conversion affecting overt structure. Put succinctly, if contemporary semantic assumptions happened to mesh well with multiple-merge and generate the appropriate interpretations, there would be no reason to posit a syntactic operation of trace conversion at all.
This implementation of syntactic trace conversion also suffers from the fact that DPs are not the only type of syntactic constituent that takes scope by means of movement.
For instance, degree phrases in comparatives can give rise to scope ambiguities, as evidenced by examples (24) and (25) from Heim (2000: 48, paraphrases mine), in which the degree phrases exactly 5 pages -er than that and less than that can scope either above or below the intensional verb require:
(24) (This draft is 10 pages.) The paper is required to be exactly 5 pages longer than that.
a. The paper must be exactly 15 pages.
In addition, Iatridou & Zeijlstra (2013) argue that the relative scope of modals and negation is resolved by means of movement. More specifically, they argue that modals are merged under negation and move overtly past it; modals like can that scope under negation then reconstruct to their pre-movement positions, while those like must that scope over negation are interpreted in their post-movement positions, leaving a trace (or "trace") in their merge positions.
(26) a. Rivka cannot leave the party.
b. Rivka must not leave the party.
Thus, according to a straightforward implementation of trace conversion, degree morphemes like -er and less and modals like must have to be covertly replaced with bound definite determiners at lower copies. While one can of course simply bite the bullet and accept that modals and degree heads (and other scope-taking heads) can be replaced at LF with definite determiners, an operation that covertly inserts the in syntactic environments in which it cannot overtly appear is at face value an undesirable one to posit, at least without substantial further motivation.
A reasonable response to this empirical problem would be to posit that syntactic trace conversion does not insert the actual lexical determiner the, but rather something else that has a similar semantic contribution. Moulton (2015), for example, adopts such a view, proposing a rule of Category-Neutral Trace Conversion (CNTC):
$[\![[_{\mathrm{DP}}\ 3{:}\ 3\ \text{is a square}]]\!]^g = g(3)$ iff $\text{square}(g(3)) = 1$; undefined otherwise
But even though CNTC avoids the problem of determiners in places they do not belong, and even though Quantifier Removal itself is innocent enough (being a deletion operation), as a syntactic operation Index Interpretation is undesirable for the same reasons as the original formulation of trace conversion: it violates Inclusiveness and is motivated only on the grounds that it is necessary to maintain current semantic assumptions. In addition, while Moulton does not go through how the output of Index Interpretation is interpreted compositionally, it seems as though an altogether novel rule of semantic interpretation is required, along the following lines:
An alternate version of CNTC might avoid the stipulation of a new rule of semantic composition as follows: Index Interpretation inserts some object that is distinct from the (call it schme) that is syntactically category-neutral and semantically behaves like a type-generalized bound definite:
This would give us a version of trace conversion that is category-neutral, but that does not require a new compositional rule. But in addition to retaining the syntactic disadvantages of the original formulations of trace conversion and Index Interpretation, there is another reason why schme should give us pause. Recall that one advantage of the original formulation of trace conversion (that is, the one that used the instead of schme) is that we already know that the exists, so we are only using syntactic objects whose existence can be independently justified.
But there is no such justification for schme, since no language has an overt definite "determiner" that cuts across semantic types and syntactic categories. So what we are left with is a newly stipulated syntactic object whose existence cannot be justified on independent grounds, and that can only be inserted countercyclically at locations from which movement originates. In other words, once syntactic trace conversion is appropriately extended to account for QR of non-DPs, what we end up with is something that bears a suspicious resemblance to traces, which are the very thing that multiple-merge theories of movement have been trying to eliminate in the first place.
To summarize, while syntactic trace conversion has some semantic benefits, there is reason to think it is undesirable as a syntactic operation: it violates Inclusiveness and is motivated only by its necessity under certain contingent semantic assumptions. Moreover, once trace conversion is extended beyond DP quantification, we have to either permit the to be inserted in syntactic environments in which it otherwise cannot appear, or we have to replace it with a newly stipulated syntactic object (schme) that has no overt counterpart, and that closely resembles the very thing multiple-merge theories of movement seek to replace: namely, traces.

Semantic trace conversion
An alternative possibility mentioned by Fox (2003) is that trace conversion is not syntactic but semantic: no alterations occur at LF, but the semantics interprets quantificational DPs at lower merge sites as if some syntactic alteration had taken place. (See Ruys 2015 for a similar proposal.)
(31) Semantic Trace Conversion (Fox 2003: p. 110): In a structure formed by DP movement, $\mathrm{DP}_n\,[_{\varphi} \ldots \mathrm{DP}_n \ldots]$, the derived sister of DP, $\varphi$, is interpreted as a function that maps an individual, $x$, to the meaning of $\varphi[x/n]$.
straints, this grants more power to the semantic apparatus than has otherwise been empirically motivated. For example, there is no natural language determiner nonce that has a different interpretation depending on whether or not it undergoes, say, raising: Nonce dog seems to be here. (≈ Every dog seems to be here.) Similarly, there is no pronoun faux whose referent must be animate if it undergoes A′-movement, or inanimate if it does not: Though such observations are often left tacit, it is difficult to overstate their importance to semantic inquiry. Traditional representation-only approaches to compositionality immediately predict these observations, but approaches that incorporate a mixture of derivational and representational information do not: if identical constituents with different derivational histories count as distinct as far as compositionality is concerned, without freshly stipulated restrictions all bets are off.9
In summary, semantic trace conversion avoids the syntactic stipulations of syntactic trace conversion by shifting the work from the syntax to the semantics. However, this comes at the cost of either a partial abandonment of the principle of compositionality, or a weakening of the notion of compositionality in a way that has not been otherwise motivated. It is worth noting that the theory endorsed in this paper, and to which we next turn, bears a close resemblance to semantic trace conversion, in that the compositional apparatus is responsible for generating the semantic result of trace conversion. However, it differs by virtue of achieving this in a directly compositional manner. It thus might be thought of as a fully compositional implementation of semantic trace conversion. Hence, compositional trace conversion.
9 Another possibility, suggested by Ruys (2015), is that higher and lower copies can be differentiated featurally: a lower copy might have an unchecked feature that is checked in a higher copy (e.g., Case). However, this does not seem to militate against lexical items like nonce or faux, whose movements would presumably also be feature-driven. Moreover, this approach seems to conflict with basic Minimalist assumptions about the architecture of the grammar. If the feature that differentiates between higher and lower copies is interpretable at LF, it should not be deleted upon checking and should thus be present in all copies. If the feature is uninterpretable at LF, then since the presence of uninterpretable features leads to a crash, that feature must be deleted in all copies.

Compositional trace conversion and DP movement
Our task now is to define a semantics that generates the results of trace conversion, but does so compositionally and without any syntactic modifications. We first lay out the proposal for DP quantification. Notice that as far as the compositional semantics is concerned, the copy LF in (3) and the multidominance LF in Fig. 1 are identical: there is no difference between (i) distinct but indistinguishable copies at separate merge sites, and (ii) the same constituent merged at two separate locations. Any analysis that can interpret one LF directly can interpret either LF directly, including the analysis proposed in this paper. This in turn means that unlike syntactic trace conversion, which requires a distinction between higher copies that do not undergo trace conversion and lower copies that do, the present analysis is fully agnostic between copy theory and multidominance theory. However, for the sake of concreteness and simplicity I will often adopt the language of copy theory. In going over how the system works, we will use the LF in (3), repeated below, as our example:
(3) $[_{\mathrm{TP}}\ \text{every}_2\ \text{teacher}\ \lambda_2\ [_{\mathrm{TP}}\ \text{a}_1\ \text{student}\ \lambda_1\ \mathrm{T}\ [_{\mathrm{VP}}\ \text{a}_1\ \text{student like every}_2\ \text{teacher}]]]$

Swap states as a vehicle for trace conversion
In order for our compositional semantics to work, we need some mechanism that will allow us to "swap out" a determiner's quantificational interpretation for a the-like interpretation at lower copies, but in a straightforwardly bottom-up compositional fashion. In order to do this I will make use of what I call swap states, or states for short. A swap state is a function that first takes an index $i$, then what I will call an etett (any function of type $(et)(et)t$, the traditional type of quantificational determiners), and returns a (possibly identical) etett.10 For a given state $\sigma$, index $i$, and etetts $Q$ and $Q'$, if $\sigma(i)(Q) = Q'$, I will say that $\sigma$ swaps out $Q$ for $Q'$ at (index) $i$, or equivalently, $\sigma$ swaps in $Q'$ for $Q$ at (index) $i$. For readability's sake, I will rewrite $\sigma(i)(Q)$ as $[Q]^{\sigma}_{i}$.
So now that we have swap states, how are they actually used? In short, they serve a role analogous to variable assignments in approaches like that of Heim & Kratzer (1998). Tradition has it that variable assignments are a parameter of semantic interpretation, and lambda abstraction returns a predicate true of an individual iff the pre-abstraction interpretation is true relative to a suitably altered variable assignment. A version of this is presented in (36):
(36) Traditional Lambda Abstraction: $[\![\lambda_i\, X]\!]^{g} = \lambda x.\, [\![X]\!]^{g[i,x]}$, where $g[i,x]$ is the $g'$ identical to $g$ except that $g'(i) = x$.
Similarly, in the approach presented in this paper, interpretations are parameterized to swap states, and lambda abstraction generates a predicate true of an individual iff the pre-abstraction interpretation is true relative to a suitably altered swap state. A preview of what this will look like, with gaps to be filled in later, can be seen in (37):
(37) New Lambda Abstraction (Preview): $[\![\lambda_i\, X]\!]^{\sigma} = \lambda x.\, [\![X]\!]^{\sigma[i,?]}$, where $\sigma[i,?]$ is the $\sigma'$ such that…
The plan is that whatever $[\![X]\!]^{\sigma[i,?]}$ looks like, it will perform the semantic work typically assigned to the operation of trace conversion. To see how all of this works, let us start by building up the pre-abstraction VP. As always, we begin our bottom-up derivation by defining our lexical items.
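The denotations of teacher and student are as one might expect: they are state-insensitive $et$-type predicates. These can be seen in (38):
(38) a. $[\![\text{teacher}]\!]^{\sigma} = \lambda x.\, \mathrm{teacher}(x)$
b. $[\![\text{student}]\!]^{\sigma} = \lambda x.\, \mathrm{student}(x)$
I will often rewrite $\lambda x.\, \mathrm{teacher}(x)$ as teacher when convenient, and likewise for $\lambda x.\, \mathrm{student}(x)$. As for $[\![\text{like}]\!]^{\sigma}$, this is again more or less as one would expect, except that it must be assigned a higher type in order to allow it to directly compose with two $(et)t$-type quantificational arguments. This leads to the definition in (39).
This just leaves us with the determiners $\text{a}_1$ and $\text{every}_2$, and these are where swap states make their appearance in the lexical semantics. Let SOME be the traditional existential etett (i.e., $\lambda P \lambda P'.\, P \cap P' \neq \emptyset$), and likewise for EVERY and the universal etett ($\lambda P \lambda P'.\, P \subseteq P'$). Instead of simply being SOME, $[\![\text{a}_1]\!]^{\sigma}$ will be whatever etett $\sigma$ swaps in for SOME at index 1; similarly, $[\![\text{every}_2]\!]^{\sigma}$ will be whatever etett $\sigma$ swaps in for EVERY at index 2. That is, $[\![\text{a}_1]\!]^{\sigma} = [\mathrm{SOME}]^{\sigma}_{1}$ and $[\![\text{every}_2]\!]^{\sigma} = [\mathrm{EVERY}]^{\sigma}_{2}$. Composing our VP involves straightforward function application. First we combine $[\![\text{every}_2]\!]^{\sigma}$ with $[\![\text{teacher}]\!]^{\sigma}$, and then feed the result to $[\![\text{like}]\!]^{\sigma}$: Next we combine $[\![\text{a}_1]\!]^{\sigma}$ with $[\![\text{student}]\!]^{\sigma}$ and feed the result to $[\![\text{like every}_2\ \text{teacher}]\!]^{\sigma}$:
(42) a. $[\![\text{a}_1]\!]^{\sigma}([\![\text{student}]\!]^{\sigma}) = \lambda P'.\, [\mathrm{SOME}]^{\sigma}_{1}(\mathrm{student})(P')$
10 A note on notation: type $ab$ is what is traditionally written as $\langle a, b\rangle$. Types are right-associative, so $abc$ is what would traditionally be written as $\langle a, \langle b, c\rangle\rangle$, while $(ab)c$ is the same as $\langle\langle a, b\rangle, c\rangle$.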
Treating tense (T) as semantically vacuous for simplicity's sake, the next step is lambda abstraction via $\lambda_1$. As mentioned above, this entails manipulating the swap state in order to swap out the quantificational etett SOME for a bound variable etett. Whatever this bound variable etett is, it must be parameterized to an individual: namely, the entity argument that is lambda-abstracted over. In keeping with standard implementations of trace conversion, I will use $\mathrm{THE}_x$, defined below:
(43) $\mathrm{THE}_x := \lambda P \lambda P' : P(x).\, P'(x)$
Now that we have a bound variable etett to swap in for SOME when lambda abstracting over index 1, we next need to decide on how to actually perform this swap. In the traditional lambda abstraction in (36), this is done by replacing the variable assignment $g$ with a variable assignment $g[1,x]$ that is identical to $g$ except that $g[1,x](1) = x$. Similarly, for us this will involve replacing the swap state $\sigma$ with the state $\sigma[1,\mathrm{THE}_x]$, which is the state identical to $\sigma$ except that $\sigma[1,\mathrm{THE}_x]$ swaps out all etetts for $\mathrm{THE}_x$ at index 1. More generally, for state $\sigma$, index $i$, and etett $Q$, $\sigma[i,Q]$ is the state identical to $\sigma$ except that it swaps out all etetts for $Q$ at index $i$. With this in place, we now have the formal tools necessary in order to define lambda abstraction and fill in the blanks in (37). This can be seen in (45):
(45) New Lambda Abstraction: $[\![\lambda_i\, X]\!]^{\sigma} = \lambda x.\, [\![X]\!]^{\sigma[i,\mathrm{THE}_x]}$
Turning back to our example, the result of lambda abstraction is as follows:
(46) $[\![\lambda_1\ \text{a}_1\ \text{student like every}_2\ \text{teacher}]\!]^{\sigma} = \lambda x.\, [\![\text{a}_1\ \text{student like every}_2\ \text{teacher}]\!]^{\sigma[1,\mathrm{THE}_x]} = \ldots$
Notice that crucially, this analysis adheres to ILRH, an aforementioned desideratum: after lambda abstraction, the lower instance of student makes a semantic contribution in the form of a domain restriction.
11 As a reviewer notes, $[\![\text{like}]\!]^{\sigma}$ can also be defined as follows, while still allowing composition with two $(et)t$-type arguments: To my knowledge there is no meaningful difference between this definition and (39).
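Continuing in the same fashion through the higher copy of $\text{every}_2$ teacher, we arrive at (50):
(50) $[\![(3)]\!]^{\sigma} = [\mathrm{EVERY}]^{\sigma}_{2}(\mathrm{teacher})(\lambda x : \mathrm{teacher}(x).\ [\mathrm{SOME}]^{\sigma}_{1}(\mathrm{student})(\lambda y : \mathrm{student}(y).\ \mathrm{like}(y, x)))$
We have finished the derivation, but something is missing. At this point the interpretation we get is still relative to the swap state $\sigma$ that is our parameter of interpretation: because (50) uses $[\mathrm{SOME}]^{\sigma}_{1}$ and $[\mathrm{EVERY}]^{\sigma}_{2}$ instead of just SOME and EVERY, the truth conditions of (50) are at the whim of $\sigma$. We of course do not wish this to be the case, and instead would like to decline the opportunity to further swap. We can easily define a swap state that does precisely this: namely, Stay as defined in (51), which swaps out every etett for itself at every index:
(51) $\mathrm{Stay} := \lambda i \lambda Q.\, Q$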
We then simply say that every sentence is interpreted with Stay as its swap state parameter. In that case the compositional semantics up to this point will go exactly as before (nothing in the preceding discussion relied on any particulars about the parameter $\sigma$), and the final interpretation we get is as in (52):
(52) $[\![(3)]\!]^{\mathrm{Stay}} = \mathrm{EVERY}(\mathrm{teacher})(\lambda x : \mathrm{teacher}(x).\ \mathrm{SOME}(\mathrm{student})(\lambda y : \mathrm{student}(y).\ \mathrm{like}(y, x)))$
We thus have a semantic theory that generates the effects of trace conversion in a straightforwardly bottom-up compositional semantics, without either a syntactic or a semantic operation of trace conversion: instead, the semantic work of trace conversion is automatically performed by lambda abstraction. Before turning to the task of generalizing this analysis beyond DPs, I will first briefly discuss some other features of the analysis: how it can be used to define semantic reconstruction, and whether and how it might be replaced with an alternative with lower-type swap states.
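The derivation of the inverse scope reading of (3) can be mimicked in a small executable model. The following is only a sketch under stated assumptions, not the paper's formalization: the four-individual domain and the `LIKES` relation are invented for illustration, presupposition failure is modeled with a hypothetical `Undefined` exception, the helper names (`update`, `lam`, `vp`, `sentence`) are mine, and `like` is given one possible quantifier-taking definition.

```python
# Toy model (assumed for illustration): two students, two teachers.
DOMAIN = ["ann", "bo", "tia", "tom"]
STUDENTS, TEACHERS = {"ann", "bo"}, {"tia", "tom"}
LIKES = {("ann", "tia"), ("bo", "tom")}  # (liker, liked)

class Undefined(Exception):
    """Stands in for presupposition failure (hypothetical helper)."""

student = lambda x: x in STUDENTS
teacher = lambda x: x in TEACHERS

# Etetts: restrictor -> scope -> truth value. The scope is only ever
# applied to individuals that satisfy the restrictor.
def SOME(P):  return lambda P2: any(P2(x) for x in DOMAIN if P(x))
def EVERY(P): return lambda P2: all(P2(x) for x in DOMAIN if P(x))

def THE(x):
    """THE_x: defined only if x satisfies the restrictor; then applies the scope to x."""
    def restricted(P):
        def scoped(P2):
            if not P(x):
                raise Undefined(f"{x} does not satisfy the restrictor")
            return P2(x)
        return scoped
    return restricted

# Swap states: index -> etett -> etett. Stay swaps every etett for itself.
stay = lambda i: lambda Q: Q

def update(sigma, i, Q):
    """sigma[i, Q]: like sigma, except every etett is swapped out for Q at index i."""
    return lambda j: (lambda _Q: Q) if j == i else sigma(j)

# State-sensitive determiners.
def a(i):     return lambda sigma: sigma(i)(SOME)
def every(i): return lambda sigma: sigma(i)(EVERY)

def like(obj, subj):
    """One way (an assumption) of letting 'like' take two quantifier arguments."""
    return subj(lambda y: obj(lambda x: (y, x) in LIKES))

def vp(sigma):  # a_1 student like every_2 teacher
    return like(every(2)(sigma)(teacher), a(1)(sigma)(student))

def lam(i, X):
    """Lambda abstraction: evaluate X relative to sigma[i, THE_x]."""
    return lambda sigma: lambda x: X(update(sigma, i, THE(x)))

def sentence(sigma):  # every_2 teacher lam_2 [a_1 student lam_1 VP]
    inner = lambda s: a(1)(s)(student)(lam(1, vp)(s))
    return every(2)(sigma)(teacher)(lam(2, inner)(sigma))

inverse = sentence(stay)  # every teacher is liked by some student
surface = any(all((y, x) in LIKES for x in TEACHERS) for y in STUDENTS)
print(inverse, surface)   # True False
```

In this model each teacher is liked by a different student, so the every > a reading is true while the a > every reading (computed directly in the last lines for comparison) is false. Evaluating `sentence` with `stay` yields the former, with both lower copies interpreted as restrictor-presupposing bound variables.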

A possible extension: Semantic reconstruction
Reconstruction is the process whereby a constituent takes scope at a position below its highest landing site. This is exemplified by the weaker not > every reading of (53), according to which at least one student did not leave: (53) Every student didn't leave.
A common analysis of reconstruction is as a syntactic phenomenon. Following May (1977, 1985), traditional treatments use an operation of quantifier lowering, in which every student moves downward from its pronounced position to somewhere below negation: presumably, its VP- or vP-internal merge position. While syntactic lowering operations are typically no longer used, the results of quantifier lowering can still be obtained in multiple-merge theories. Thus, in a copy theory of movement, any copies of every student that outscope negation can be erased or ignored, furnishing an LF structure that looks the same as if every student had never moved past not in the first place. For example, while (53) would have an LF like (54a) on its surface scope interpretation, its LF would look like (54b) on its inverse scope interpretation:
(54) a. $\text{every}_1\ \text{student}\ \lambda_1\ \text{not every}_1\ \text{student leave}$
b. $\text{every}_1\ \text{student not every}_1\ \text{student leave}$ (with the higher copy of $\text{every}_1$ student erased)
Similarly, in a multidominance theory all connections can be severed between every student and points in the structure higher than negation, again generating an LF structure that looks like one in which every student never moves past negation. Such structures would compose in an entirely straightforward fashion on the analysis in this paper, and the current theory doesn't have much of interest to add.
However, more recently it has been argued that some instances of reconstruction -perhaps even all -are not realized syntactically, but only semantically: the movement of one operator past another is not "undone" in the syntax, but compositional mechanisms ensure that when the moved operator composes in its higher position, the semantic result is still one in which it takes low scope (von Stechow 1991, Cresti 1995, Rullmann 1995, Lechner 1998, Sharvit 1999, Wurmbrand 2010, Ruys 2015). I will not in this paper provide empirical arguments for or against the existence of semantic reconstruction, either in addition to or instead of syntactic reconstruction. However, it is worth noting that if semantic reconstruction exists, it can be captured fairly easily on the analysis proposed in this paper. To illustrate, I will show how an inverse scope reading of (53) can be derived by means of semantic reconstruction within the confines of the present analysis.
Traditional treatments of semantic reconstruction involve traces being interpreted as higher-type variables. Suppose that the LF for (53) looks as in (55):
(55) $\text{every}_1\ \text{student}\ \lambda_1\ \text{not}\ t_1\ \text{leave}$
If the trace is interpreted as an $e$-type variable, the result after lambda abstraction will be (56a); when every student takes this as an argument, the result will be a surface scope (every > not) interpretation. But if the trace is interpreted as an $(et)t$-type variable, then after lambda abstraction the result will be the higher-type (56b). This will then take every student as its argument, generating a semantically reconstructed inverse scope (not > every) interpretation.
(56) a. $\lambda x.\, \neg\mathrm{leave}(x)$
b. $\lambda \mathcal{P}.\, \neg\mathcal{P}(\mathrm{leave})$
The same result can be derived on the analysis in this paper. Suppose that in addition to the "normal" lambda-abstractor $\lambda$, there is a higher-type lambda-abstractor $\lambda'$, which performs semantically reconstructing lambda abstraction. (Alternatively, we could suppose that there is one lambda-abstractor $\lambda$, with optionality as to whether reconstructing or non-reconstructing lambda abstraction takes place.) Thus, the LF for (53) on its inverse scope interpretation will be as in (57):
(57) $\text{every}_1\ \text{student}\ \lambda'_1\ \text{not every}_1\ \text{student leave}$
Up to and excluding lambda abstraction, the interpretation would be exactly as we would predict based on the analysis developed thus far:
(58) $[\![\text{not every}_1\ \text{student leave}]\!]^{\sigma} = \neg[\mathrm{EVERY}]^{\sigma}_{1}(\mathrm{student})(\mathrm{leave})$
Next we define reconstructing lambda abstraction. First, a helpful abbreviation:
(59) For $\mathcal{P}$ of type $ab$, $\mathcal{P}^{\uparrow}$ is type $aab$, and $\mathcal{P}^{\uparrow} := \lambda x_a.\, \mathcal{P}$.
$\mathcal{P}^{\uparrow}$ is simply $\mathcal{P}$ with an extra, vacuous argument tacked on that is of the same type as $\mathcal{P}$'s first argument. Thus, if $\mathcal{P}$ is an $(et)t$-type quantifier, $\mathcal{P}^{\uparrow}$ will be an etett. Reconstructing lambda abstraction can then be defined as in (60):
(60) Reconstructing Lambda Abstraction: $[\![\lambda'_i\, X]\!]^{\sigma} = \lambda \mathcal{P}.\, [\![X]\!]^{\sigma[i,\mathcal{P}^{\uparrow}]}$
Much like on traditional approaches to semantic reconstruction, the result of lambda abstraction in (60) takes an $(et)t$-type argument instead of an $e$-type argument. As discussed above, since the argument $\mathcal{P}$ is type $(et)t$, $\mathcal{P}^{\uparrow}$ is an etett, meaning that it can be the output to a swap state.
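The contrast between ordinary and reconstructing lambda abstraction can be checked in a small executable model of (53). This is a sketch under assumptions of my own: the two-student model, the `Undefined` exception for presupposition failure, and the helper names `update`, `lift`, `lam`, and `lam_rec` are illustrative and not part of the analysis itself.

```python
# Toy model (assumed): two students, only one of whom left.
DOMAIN = ["ann", "bo"]
STUDENTS, LEFT = {"ann", "bo"}, {"ann"}

class Undefined(Exception):
    """Stands in for presupposition failure (hypothetical helper)."""

student = lambda x: x in STUDENTS
leave = lambda x: x in LEFT

def EVERY(P): return lambda P2: all(P2(x) for x in DOMAIN if P(x))

def THE(x):
    """THE_x: presupposes that x satisfies the restrictor, then applies the scope to x."""
    def restricted(P):
        def scoped(P2):
            if not P(x):
                raise Undefined(f"{x} does not satisfy the restrictor")
            return P2(x)
        return scoped
    return restricted

stay = lambda i: lambda Q: Q  # swaps every etett for itself

def update(sigma, i, Q):
    """sigma[i, Q]: like sigma, except every etett is swapped out for Q at index i."""
    return lambda j: (lambda _Q: Q) if j == i else sigma(j)

def every(i): return lambda sigma: sigma(i)(EVERY)

def body(sigma):  # not every_1 student leave
    return not every(1)(sigma)(student)(leave)

def lift(quantifier):
    """Tack a vacuous restrictor argument onto an (et)t quantifier, giving an etett."""
    return lambda _P: quantifier

def lam(i, X):
    """Ordinary abstraction: abstract over an individual, swap in THE_x."""
    return lambda sigma: lambda x: X(update(sigma, i, THE(x)))

def lam_rec(i, X):
    """Reconstructing abstraction: abstract over a quantifier, swap in its lifted form."""
    return lambda sigma: lambda quantifier: X(update(sigma, i, lift(quantifier)))

every_student = every(1)(stay)(student)
surface = every_student(lam(1, body)(stay))      # every > not
inverse = lam_rec(1, body)(stay)(every_student)  # not > every
print(surface, inverse)  # False True
```

Since one student left and one did not, the surface scope reading (no student left) is false and the reconstructed reading (not every student left) is true. The only difference between the two computations is which etett the abstractor swaps in at index 1.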
To see how this works, let's apply it to (58): Naturally, when we evaluate with respect to Stay, we get the desired inverse scope interpretation:
(63) $[\![\text{every}_1\ \text{student}\ \lambda'_1\ \text{not every}_1\ \text{student leave}]\!]^{\mathrm{Stay}} = \neg\mathrm{EVERY}(\mathrm{student})(\mathrm{leave})$
Thus, the higher-type lambda abstraction required in order to perform semantic reconstruction is perfectly compatible with the analysis adopted in this paper.12
12 Note that if semantic reconstruction is permitted, it must be closely regulated. Otherwise, Fox & Nissenbaum's (1999) […] inserting $\lambda'$ instead of $\lambda$ after the post-late merge QR, allowing the DP to scope below the extraposition site. This problem is not specific to my analysis and extends to any approach that permits semantic reconstruction.

Lower-type swap states and ILRH
In the analysis adopted here, swap states trade in objects of type $(et)(et)t$, i.e., etetts. However, a reviewer notes that this is not necessary in order for the semantics to work: swap states could just as easily trade in lower-type, $(et)t$-type quantifiers. Using $\varsigma$ as a variable over such lower-type swap states and adopting the same abbreviations as before, $[\![\text{every}]\!]$ can be defined as follows: Since EVERY is an etett, $\mathrm{EVERY}(P)$ is of type $(et)t$, and can thus serve as an input to our new lower-type swap states. Now consider the following simple structure:
(65) $\text{every}_1\ \text{student}\ \lambda_1\ \text{every}_1\ \text{student left}$
Up until lambda abstraction, we get the following: We thus have an $et$-type predicate that can recompose with $[\![\text{every}_1\ \text{student}]\!]$, and semantic composition can proceed as normal. But notice that this no longer satisfies ILRH, a claim for which we saw empirical evidence in Section 2. The restrictor student makes no semantic impact at lower merge sites: its semantic contribution is wiped out along with the determiner's through lambda abstraction. This can, however, be prevented by enforcing ILRH in the lexical semantics of every and other scope-taking heads: This gives us the following pre- and post-lambda-abstraction interpretations:
(70) a. $[\![\text{every}_1\ \text{student left}]\!]^{\varsigma} = \ldots$
But this of course comes at a cost in the form of stipulation: why is the denotation of every (69) and not the seemingly simpler (64), and likewise for other determiners? Meanwhile, when using higher-type swap states the picture is simpler: for any determiner D there is an etett $Q$ such that $[\![\mathrm{D}_i]\!]^{\sigma} = [Q]^{\sigma}_{i}$. It is worth emphasizing that in spite of invoking the empirical claims of ILRH, this is not an empirical argument in favor of using higher-type swap states: by using the somewhat more cumbersome definition in (69), ILRH is indeed still respected. I thus leave the choice between higher- and lower-type swap states as an open issue. However, regardless of one's preference between lower- and higher-type swap states, the fundamental point of the proposal adopted here remains the same: swap states of some sort are a useful apparatus for compositionally deriving the semantic results of trace conversion.13 With this in mind, I will continue to use higher-type swap states for the rest of this paper. We next turn to the task of extending our approach beyond DP quantification to include other types of scope-takers.
13 A reviewer notes that if not suitably constrained, swap states could potentially be powerful enough to define lexical items like faux and nonce from Section 3.2, something that was previously deemed problematic under certain views of compositionality. This is prevented on my analysis by making swap states a parameter of interpretation that can only be manipulated in very specific ways by lambda-abstractors: other heads can be sensitive to swap states, but cannot manipulate them in the ways required to define faux and nonce. It is worth noting that traditional variable assignments have to be similarly grammatically constrained, as they too are formally powerful devices that could be abused to create non-existent semantic interpretations. Thus, the need to constrain formally powerful devices in order to avoid non-existent interpretations is not specific to swap states.

Generalizing
In this section I will show how our proposal can be extended beyond $e$-type lambda abstraction, thereby generalizing to modals and degree phrases. We start in Section 5.1 with the formal extension, revising the mechanism in a way that will permit lambda abstraction over arbitrary types. In Section 5.2 the power of this type-generalized formalism is directed toward an analysis of modals, first operating under the assumption that modals have syntactically represented restrictors similar to determiner quantifiers, then showing that the theory is powerful enough as is to allow ourselves to eschew this assumption and treat modals as lacking syntactically represented restrictors. Finally, in Section 5.3 the analysis is extended to account for degree phrase scope-taking in comparatives.

Type-generalized state dependency
Let's go back to our traditional lambda abstraction, using variable assignments instead of swap states:
(36) Traditional Lambda Abstraction: $[\![\lambda_i\, X]\!]^{g} = \lambda x.\, [\![X]\!]^{g[i,x]}$, where $g[i,x]$ is the $g'$ identical to $g$ except that $g'(i) = x$.
Now suppose variable assignments are of type $ne$ (writing $n$ for the type of indices), i.e., (partial) functions from indices to individuals. Given that $e$ is not the only type over which lambda abstraction takes place, a reasonable followup question is how this might be generalized in a manner that will permit lambda abstraction over arbitrary types, or at least those types over which the compositional semantics requires us to lambda abstract.
There seem to be two obvious candidate paths for type-generalizing assignment-sensitivity. The first is to utilize a single, type-flexible variable assignment: $g$ can take an index and return an object of any (permissible) type, so $g(1)$ might be type $e$, $g(2)$ type $d$ (for degrees), $g(3)$ type $s$ (for worlds), etc. The second path is to replace a single variable assignment of type $ne$ with a cluster of variable assignments for different types: one of type $ne$, one of type $nd$, etc. Let's flesh out this second view a little more. Suppose the variable assignment $g$ is replaced with an assignment cluster $G$, a set (or tuple) containing exactly one function of type $n\tau$ for each lambda-abstractable type $\tau$. For any assignment cluster $G$, $G_{\tau}$ is $G$'s member of type $n\tau$.
This gives us enough to define lambda abstraction over arbitrary types, as shown in (72). Notice that lambda abstractors come with not only an index, but a type parameter to indicate what type is lambda-abstracted over. This also holds of the type-generalized version of swap state lambda abstraction, meaning that the lambda-abstractors $\lambda_1$ and $\lambda_2$ in the previous section must be replaced with $\lambda_{e,1}$ and $\lambda_{e,2}$.
(72) Traditional Lambda Abstraction (Type-Generalized): $[\![\lambda_{\tau,i}\, X]\!]^{G} = \lambda x_{\tau}.\, [\![X]\!]^{G\langle i,x\rangle}$, where $G$ is an assignment cluster and $G\langle i,x\rangle$ is the cluster identical to $G$ except that its type-$n\tau$ assignment maps $i$ to $x$.
The same general technique extends equally well to swap states. The swap states in the previous section took etetts (type $(et)(et)t$) and returned etetts. To generalize, let's say that for any type $\tau$, a $\tau$-swap state takes an index and an object of type $(\tau t)(\tau t)t$, and returns an object of type $(\tau t)(\tau t)t$. Thus, the swap states seen in the previous section were $e$-swap states; modals will use $s$-swap states, and degree phrases will utilize $d$-swap states. Much in the same way that we previously introduced assignment clusters, a swap state cluster (or state cluster for short) will include exactly one $\tau$-swap state for each lambda-abstractable type $\tau$, and for any state cluster $\Sigma$, $\Sigma_{\tau}$ will be $\Sigma$'s $\tau$-swap state. We can also keep the same abbreviation convention as before: for state cluster $\Sigma$, index $i$, and $Q$ of type $(\tau t)(\tau t)t$, $[Q]^{\Sigma}_{i} := \Sigma_{\tau}(i)(Q)$. Thus, we can keep the same definitions for a and every as before, e.g., $[\![\text{every}_i]\!]^{\Sigma} = [\mathrm{EVERY}]^{\Sigma}_{i}$. In the previous section, our definition of lambda abstraction made use of the etett $\mathrm{THE}_x$, with $x$ being the lambda-abstracted-over variable. In order to permit arbitrary-type lambda abstraction, this must be generalized, so that for $x$ of type $\tau$, $\mathrm{THE}_x$ will be type $(\tau t)(\tau t)t$. Luckily this is easy to define:
(73) For $x$ of type $\tau$, $\mathrm{THE}_x$ is type $(\tau t)(\tau t)t$, and $\mathrm{THE}_x := \lambda P_{\tau t} \lambda P'_{\tau t} : P(x).\, P'(x)$
Finally, there is the matter of defining $\Sigma\langle i,Q\rangle$:
(74) For state cluster $\Sigma$, index $i$, and $Q$ of type $(\tau t)(\tau t)t$, $\Sigma\langle i,Q\rangle$ is the $\Sigma'$ identical to $\Sigma$ except that $\Sigma'_{\tau} = \Sigma_{\tau}[i,Q]$.

We now have enough to define type-generalized lambda abstraction in our system:
(75) New Lambda Abstraction (Type-Generalized): $[\![\lambda_{\tau,i}\, X]\!]^{\Sigma} = \lambda x_{\tau}.\, [\![X]\!]^{\Sigma\langle i,\mathrm{THE}_x\rangle}$
Finally, recall that when we only had a single $e$-state $\sigma$, we always evaluated relative to the state Stay, essentially declining the chance to make any more swaps at the end of the derivation. Now that we are operating with state clusters rather than a single $e$-state, we will redefine Stay as a state cluster, as follows:
(76) Stay is the state cluster s.t. for all (lambda-abstractable) types $\tau$, $\mathrm{Stay}_{\tau} = \lambda i \lambda Q_{(\tau t)(\tau t)t}.\, Q$.
The proposal from the previous section has now been extended in a fully type-generalized manner. We next move on to seeing how the present theory fares when it comes to modals and degree quantifiers, starting with the former.
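The bookkeeping behind state clusters can be illustrated with a short executable sketch. The representation below is an assumption made for illustration only: a cluster is a Python dictionary from type labels to swap states, the type of the swapped-in object is passed explicitly rather than read off the object, and the helper names (`update_cluster`, `swap`, `lam`) and the two-world, two-individual model are invented.

```python
TYPES = ("e", "s", "d")  # lambda-abstractable types assumed for this sketch

class Undefined(Exception):
    """Stands in for presupposition failure (hypothetical helper)."""

def THE(x):
    """Type-general THE_x: works for x of any type, given predicates over that type."""
    def restricted(P):
        def scoped(P2):
            if not P(x):
                raise Undefined(f"{x} does not satisfy the restrictor")
            return P2(x)
        return scoped
    return restricted

def stay_state(i):
    return lambda Q: Q  # swaps every quantifier for itself

# Stay as a cluster: one identity swap state per type.
STAY = {tau: stay_state for tau in TYPES}

def update_state(sigma, i, Q):
    """sigma[i, Q] for a single swap state."""
    return lambda j: (lambda _Q: Q) if j == i else sigma(j)

def update_cluster(Sigma, tau, i, Q):
    """Sigma<i, Q>: only the tau-swap state changes; the input cluster is left intact."""
    new = dict(Sigma)
    new[tau] = update_state(Sigma[tau], i, Q)
    return new

def swap(Sigma, tau, i, Q):
    """[Q]^Sigma_i, with the type of Q passed explicitly as tau."""
    return Sigma[tau](i)(Q)

def lam(tau, i, X):
    """Type-generalized abstraction: swap in THE_x within the tau-swap state only."""
    return lambda Sigma: lambda x: X(update_cluster(Sigma, tau, i, THE(x)))

# Two quantifiers of different types over toy domains.
WORLDS, PEOPLE = ["w1", "w2"], ["ann", "bo"]
def MUST(p):  return lambda q: all(q(w) for w in WORLDS if p(w))
def EVERY(P): return lambda P2: all(P2(x) for x in PEOPLE if P(x))

the_w1 = THE("w1")
shifted = update_cluster(STAY, "s", 1, the_w1)

# A state-sensitive constituent using a world quantifier at index 1.
X = lambda Sigma: swap(Sigma, "s", 1, MUST)(lambda w: True)(lambda w: w == "w1")
world_predicate = lam("s", 1, X)(STAY)
print(world_predicate("w1"), world_predicate("w2"))  # True False
```

Updating the `"s"` component at index 1 leaves the `"e"` component and every other index untouched, which is the property that lets individual, world, and degree abstraction coexist in one derivation.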

Modals
I will use the sentences in (26), repeated below, to illustrate the analysis of modal scope: (26) a. Rivka cannot leave the party.
b. Rivka must not leave the party.
For the time being I will assume that these sentences have the LF structures in (77); note that I follow Iatridou & Zeijlstra (2013) in treating modals as merged below negation and overtly moving above it, with must scoping in its post-movement position and can syntactically reconstructing to its pre-movement position. As mentioned in the beginning of this section, I start by temporarily operating under the assumption that modals have a syntactically represented restrictor res, which can be thought of as filling the role that on traditional Kratzerian accounts is also played by if-clauses (Kratzer 1981, 1991a,b, 2012).
(77) a. $\text{not}\ [\text{can}_1\ \text{res}]\ \text{int Rivka leave the party}$
b. $[\text{must}_1\ \text{res}]\ \lambda_{s,1}\ \text{not}\ [\text{must}_1\ \text{res}]\ \text{int Rivka leave the party}$
Note that I assume that the reconstruction of can res is syntactic and not strictly semantic: rather than the overt movement being semantically undone by a $\lambda'$, it is undone in the syntax by removing the higher merge site. This is a contingent assumption: $\lambda'$ can be type-generalized just as well as regular $\lambda$, and the reconstruction of can res can equally well be formulated as semantic rather than syntactic. Our denotations for constituents will now be relative to a context parameter $c$ and a world of evaluation parameter $w$. Let us put aside state clusters temporarily and assume that $c$ and $w$ are the only parameters of evaluation. The denotation of Rivka leave the party (that is, the part of the sentence below int) is a truth value, true iff Rivka leaves the party in $w$:
(78) $[\![\text{Rivka leave the party}]\!]^{c,w} = 1$ iff Rivka leaves the party in $w$
As promised, int then lambda-abstracts over the world of evaluation, returning a proposition, i.e., a function from worlds to truth values (type $st$):
(79) $[\![\text{int}\ X]\!]^{c,w} = \lambda w'.\, [\![X]\!]^{c,w'}$
(80) $[\![\text{int Rivka leave the party}]\!]^{c,w} = \lambda w'.\ \text{Rivka leaves the party in } w'$
Next up are $\text{can}_1$ and res. As mentioned previously, res serves to restrict can, saturating an argument that in conditionals would be saturated by the antecedent. Thus, $[\![\text{res}]\!]^{c,w}$ will be a contextually-determined proposition $p_c$. This in turn makes can type $(st)(st)t$: in terms of semantic type, it is a quantifier over worlds in a manner parallel to how a is a quantifier over individuals. If we ignore issues of scope and multiple-merge composition, we thus might define $[\![\text{can}]\!]^{c,w}$ as in (81), where $\mathrm{CAN}^{c}_{w}$ is a $(st)(st)t$-type world-quantifier that is (i) context-sensitive, allowing for differences in modal flavors; and (ii) world-dependent, since what is permissible (for example) varies from world to world:
(81) $[\![\text{can}]\!]^{c,w} = \mathrm{CAN}^{c}_{w}$
14 I leave for future work the issue of how the present analysis might be integrated with an alternate theory in which possible worlds enter the compositional semantics through pronouns bound by lambda operators (Percus 2000). I see no reason to believe that any conflict should arise here.

Note that it does not matter for our purposes what $\mathrm{CAN}^{c}_{w}$ actually is: it might be a best-worlds quantifier à la Kratzer (1981, 1991a,b, 2012), or it might be defined in terms of probabilities and utility functions (see, e.g., Lassiter 2011). What matters for our purposes is that $\mathrm{CAN}^{c}_{w}$ is of type $(st)(st)t$. But since can is a modal, and at least by assumption modals can take scope via movement, we need to re-introduce state clusters into the mix: can must be state-sensitive in much the same way that a is. Since we have type-generalized our semantics by replacing swap states with state clusters, this is a simple matter: $[\mathrm{CAN}^{c}_{w}]^{\Sigma}_{i}$ will be the $(st)(st)t$-type world-quantifier that $\Sigma$ swaps in for $\mathrm{CAN}^{c}_{w}$ at index $i$. Let us continue with our derivation. Since $[\![\text{res}]\!]^{c,w,\Sigma}$ is the restrictor proposition $p_c$ (type $st$), while $[\![\text{can}_1]\!]^{c,w,\Sigma}$ is of type $(st)(st)t$, the former restricts the latter, leading to a $(st)t$-type world-quantifier:
(83) $[\![\text{can}_1]\!]^{c,w,\Sigma}([\![\text{res}]\!]^{c,w,\Sigma}) = \lambda q.\, [\mathrm{CAN}^{c}_{w}]^{\Sigma}_{1}(p_c)(q)$
This then composes with $[\![\text{int Rivka leave the party}]\!]^{c,w,\Sigma}$, which I will abbreviate as the proposition rleave:
(84) $[\![\text{can}_1\ \text{res}]\!]^{c,w,\Sigma}([\![\text{int Rivka leave the party}]\!]^{c,w,\Sigma}) = [\mathrm{CAN}^{c}_{w}]^{\Sigma}_{1}(p_c)(\mathrm{rleave})$
Next, this composes with not, which is of type $tt$ and simply contributes boolean negation ($[\![\text{not}]\!]^{c,w,\Sigma} = \lambda u.\, \neg u$):
(85) $[\![\text{not}]\!]^{c,w,\Sigma}([\![\text{can}_1\ \text{res int Rivka leave the party}]\!]^{c,w,\Sigma}) = \neg[\mathrm{CAN}^{c}_{w}]^{\Sigma}_{1}(p_c)(\mathrm{rleave})$
This gives us our final denotation relative to the state cluster $\Sigma$. We then evaluate relative to Stay:
(86) $[\![\text{not can}_1\ \text{res int Rivka leave the party}]\!]^{c,w,\mathrm{Stay}} = \neg\mathrm{CAN}^{c}_{w}(p_c)(\mathrm{rleave})$
Of course, regardless of one's favorite theory of CAN, these will naturally be the correct truth conditions, with negation outscoping can: it is not permissible that Rivka leave. We next move on to the LF in (77b), in which must replaces can and takes scope over negation:
(77b) $[\text{must}_1\ \text{res}]\ \lambda_{s,1}\ \text{not}\ [\text{must}_1\ \text{res}]\ \text{int Rivka leave the party}$
We can define must in a manner parallel to can, but with the world-quantifier MUST replacing CAN:
(87) $[\![\text{must}_i]\!]^{c,w,\Sigma} = \lambda p \lambda q.\, [\mathrm{MUST}^{c}_{w}]^{\Sigma}_{i}(p)(q)$
Up to and including negation, the derivation for (77b) runs precisely parallel to that for (77a), leading to the following result:
(88) $[\![\text{not must}_1\ \text{res int Rivka leave the party}]\!]^{c,w,\Sigma} = \neg[\mathrm{MUST}^{c}_{w}]^{\Sigma}_{1}(p_c)(\mathrm{rleave})$
We then lambda abstract via $\lambda_{s,1}$, following our revised rule for lambda abstraction: We then evaluate relative to Stay, giving us our final truth conditions:
(91) $[\![\text{must}_1\ \text{res}\ \lambda_{s,1}\ \text{not must}_1\ \text{res int Rivka leave the party}]\!]^{c,w,\mathrm{Stay}} = \mathrm{MUST}^{c}_{w}(p_c)(\lambda w' : p_c(w').\ \neg\mathrm{rleave}(w'))$
Once again, we derive the correct truth conditions, with must scoping over negation: it is obligatory that Rivka not leave.
Up to now, we have assumed that modals have a syntactically represented restrictor res, which saturates the same argument that in conditionals is saturated by the if-clause. However, it is not obvious that if-clauses restrict modals through argument saturation: for example, von Fintel (1994) argues at length against this view, and in favor of an analysis in which if-clauses effect their modal domain restriction through means other than argument saturation. If this is the case, then there is no longer any reason to assume that modals are of type $(st)(st)t$, with a silent head res serving to restrict the modal. Instead, the more plausible analysis would be that modals are simply of type $(st)t$, with no head res at all. In this case, the LF for (26b), rather than being (77b), would instead be the simpler (92):
(92) $\text{must}_1\ \lambda_{s,1}\ \text{not must}_1\ \text{int Rivka leave the party}$
This raises a conundrum for the present analysis. As things currently stand, all of the swap states in our state clusters trade in objects of some type $(\tau t)(\tau t)t$, i.e., quantifiers with restrictor arguments. If modals do not have restrictor arguments, and are thus of type $(st)t$ instead of $(st)(st)t$, how do they fit into the present system? One possibility is to posit that in addition to the $(\tau t)(\tau t)t$-type swap state clusters used thus far, the semantics can also make use of lower-type swap state clusters of the sort briefly discussed in Section 4.3. While this is certainly doable, it is unnecessary: quantificational operators without syntactic restrictors can already be accounted for without any revisions to the theory at hand.
Suppose that on a traditional, non-multiple-merge semantics, the denotation for must , is MST , which unlike MUST takes a single propositional argument, making it type ( ) . Now recall that for any of type , the abbreviation was defined as . , i.e., with a vacuous first argument tacked on. Thus, MST is a ( )( ) -type quantifier ( . MST ( )), meaning that it can be the input or output to for a given state cluster . We can then define must , , as follows: (93) must , , = .
[MST ] ( . 1)( ) As desired, the semantic type for must is ( ) rather than ( )( ) , but we still have something of type ( )( ) that is fed into the -swap state .
To see that this gets the right results, let's complete the derivation of (92). As before, the denotation up to and including the world-abstracting head int is the st-type proposition rleave. This is now fed to must 1, which is of type (st)t, giving us a truth value.

In summary, then, the present proposal extends equally well to lower-type quantifiers: in this case, must was of type (st)t, rather than (st)(st)t, but higher-type swap states were equally effective in allowing for direct composition.
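The idea of tacking a vacuous first argument onto a restrictor-less modal can be illustrated with a small toy model. What follows is only a sketch under stated assumptions, not the formal system of this paper: the finite set of accessible worlds, the names `mst`, `lift`, `identity_state`, and `must`, and the use of an identity function in place of a genuine swap state are all hypothetical choices made for illustration.

```python
# Toy model (assumption): the worlds accessible from the evaluation world.
ACCESSIBLE = ("w1", "w2")

def mst(p):
    """Restrictor-less necessity: takes a single proposition (a function
    from worlds to truth values) and checks it in every accessible world."""
    return all(p(w) for w in ACCESSIBLE)

def lift(q):
    """Tack a vacuous first argument onto q: the restrictor is accepted
    and then ignored, so the result has the shape of a restricted quantifier."""
    return lambda restrictor: q

def identity_state(quantifier):
    """Stand-in for a swap state that returns its input unchanged
    (an assumption; real swap states are defined elsewhere in the paper)."""
    return quantifier

def must(p, state=identity_state):
    """The modal itself takes one propositional argument, but what is
    handed to the state is the lifted, restrictor-taking quantifier."""
    trivial = lambda w: True  # the vacuous restrictor, true of every world
    return state(lift(mst))(trivial)(p)

# Hypothetical propositions for the demonstration.
leave = lambda w: w in ("w1", "w2")
stay = lambda w: w == "w1"

print(must(leave))  # leave holds in both accessible worlds
print(must(stay))   # stay fails in w2
```

Because `lift` discards whatever restrictor it receives, `lift(mst)(r)(p)` agrees with `mst(p)` for any `r`, which is the sense in which the restrictor argument is vacuous.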

Degree phrases in comparatives
Finally we turn to degree phrases in comparatives, for which I will develop an account based on that of Bhatt & Pancheva (2004). I start by introducing traditional, trace-based theories of the syntax-semantics of comparatives in Section 5.3.1. In Section 5.3.2 I introduce Bhatt & Pancheva's (2004) multiple-merge analysis of comparatives, which builds on Fox's (2002) treatment of antecedent-contained deletion. In Section 5.3.3 I discuss the relationship between Bhatt & Pancheva's (2004) analysis and so-called wholesale late merger of the sort that has been argued to arise in A-movement configurations (Takahashi & Hulsey 2009), as well as the semantic repercussions of this aspect of their analysis. Finally, in Section 5.3.4 I go through the compositional semantics of comparatives in full.

Classical accounts of comparatives
Before diving into Bhatt & Pancheva's (2004) analysis, it will help to discuss classical accounts of comparatives in the tradition of von Stechow (1984) and Heim (1985, 2000). I will use (99) as a sample sentence: (99) Jo is taller than Al is.
On traditional accounts (99) has an LF along the lines of (100), in which the degree phrase headed by -er undergoes QR: (100) [-er 1 Op 2 λ2 than Al is tall t 2 ] λ1 Jo is tall t 1

tall is a relation between a degree d and an individual x, true iff x is at least d-tall. Op is a wh-like operator that triggers lambda abstraction over elided tall's degree argument, while than is generally treated as semantically vacuous. Therefore, the denotation of the restrictor of -er is a degree predicate true of d iff Al is at least d-tall, and the denotation of the scope of -er is a degree predicate true of d iff Jo is at least d-tall. -er is thus of type (dt)(dt)t, i.e., a degree quantifier.
By using (103a), the predicted truth conditions are that the maximal degree not exceeding Al's height (that is, Al's height itself) is less than the maximal degree not exceeding Jo's height. In other words, Jo's height exceeds Al's.

Using (103b), meanwhile, leads to the following truth conditions: there is a degree that is not less than or equal to Al's height, and that is less than or equal to Jo's height. This will only be the case if Jo's height exceeds Al's.
(105) ⟦(99)⟧ (take 2) = ∃d[height(al) ≱ d ∧ height(jo) ≥ d]

For our purposes it will not matter which definition of -er we adopt. Rather, the important takeaway is that in (99), -er is a (dt)(dt)t-type degree quantifier that takes as its restrictor the set of degrees not exceeding Al's height, and as its scope the set of degrees not exceeding Jo's height.
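That the two definitions of -er converge on the same verdicts can be checked mechanically on a small model. The sketch below is an illustration only: the finite integer degree scale, the particular heights, and the function names `er_max` and `er_exists` are assumptions introduced here, and the two functions follow the truth conditions as paraphrased in the text rather than reproducing (103a) and (103b) verbatim.

```python
# Hypothetical finite degree scale (heights in cm) and hypothetical heights.
DEGREES = range(0, 251)
HEIGHT = {"jo": 180, "al": 170, "bo": 170}

def tall(d):
    """Relation between a degree and an individual: x is at least d-tall."""
    return lambda x: HEIGHT[x] >= d

def er_max(restrictor):
    """Maximality-based version: the maximal degree in the restrictor
    is less than the maximal degree in the scope."""
    def take_scope(scope):
        max_restrictor = max(d for d in DEGREES if restrictor(d))
        max_scope = max(d for d in DEGREES if scope(d))
        return max_restrictor < max_scope
    return take_scope

def er_exists(restrictor):
    """Existential version: some degree is in the scope
    but not in the restrictor."""
    return lambda scope: any(not restrictor(d) and scope(d) for d in DEGREES)

def comparative(er, subject, standard):
    """'subject is taller than standard is': the restrictor comes from the
    than-clause, the scope from the matrix clause."""
    restrictor = lambda d: tall(d)(standard)
    scope = lambda d: tall(d)(subject)
    return er(restrictor)(scope)

for er in (er_max, er_exists):
    print(er.__name__,
          comparative(er, "jo", "al"),   # Jo is taller than Al is
          comparative(er, "al", "jo"),   # Al is taller than Jo is
          comparative(er, "al", "bo"))   # equal heights
```

On this toy scale both versions judge the comparative true exactly when the subject's height strictly exceeds the standard's, which is the convergence the text relies on.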

Comparatives, ACD, and late-merged than-clauses
Notice that on the traditional analysis, clausal comparatives constitute a form of antecedent-contained deletion. After all, the elided AdjP is contained within its own antecedent (the matrix AdjP, taller than Al is), and so a quantificational constituent headed by -er must undergo QR to avoid an infinite regress. The parallels between "normal" ACD and comparative ACD are illustrated in (106); in the former, the antecedent and elided phrases are VPs and the quantifier that undergoes movement is a DP, while in the latter the antecedent and elided phrases are AdjPs and the quantifier that undergoes movement is a DegP.

We have already seen one theory of ACD in a multiple-merge theory of movement: namely that of Fox (2002), who argues that ACD involves (often string-vacuous) extraposition, with the relative clause late-merging with the noun it modifies after the DP has undergone (rightward) QR. As a result, ACD is a misleading name, as the ellipsis site is never actually contained within its antecedent. In fact, Bhatt & Pancheva (2004) argue at length that comparatives should be analyzed in a similar fashion, with the than-clause being extraposed by late-merging with -er after the latter has undergone (rightward) QR. This is illustrated in (107). After trace conversion, the LF structure that is fed to semantic interpretation is roughly as in (108). Note that they adopt a version of trace conversion that is slightly different from the one discussed earlier, but the semantic result is essentially the same. This structure derives the correct interpretation: just like on the traditional account, the restrictor of -er is the set of height degrees not exceeding Al's height, and the scope of -er is the set of height degrees not exceeding Jo's height. (Naturally, for our own analysis we will not be using syntactic trace conversion, meaning that the interpreted LF structure will look more like (107) than (108).)

The syntactic structure in (107) needs to be fleshed out a bit more.
One issue that Bhatt & Pancheva (2004) do not explicitly address is how the degree argument of the than-clause's tall is saturated and lambda-abstracted, i.e., what does the work of (100)'s Op. While there are multiple possibilities, I will explore an approach that takes direct inspiration from the matching theory of relative clauses discussed earlier. Suppose that the than-clause-internal tall has its degree argument saturated not by Op, but by another -er. This -er then moves and triggers lambda abstraction, and is deleted upon matching with the instance of -er with which the than-clause late merges. This is illustrated in (109).

In order for this to work, it must be the case that the higher -er 2 is deleted not only at PF, but also at LF (or is at least bleached of all semantic content there). More generally, on my analysis this must be said for any operator whose semantic function is to trigger lambda abstraction through movement. After all, that operator must have a well-defined denotation in order to compose at the lower merge site, but it must not make its contribution at the higher merge site and thereby "undo" the lambda abstraction it has triggered. Thus, the higher instance of the operator must be either semantically bleached or removed entirely.

This extends equally well to matching theories of relative clauses. Take, for example, the NP book that I like. Clearly, in order for composition to work within the relative clause, Op 1 must be of type (et)(et)t. But then the result of lambda abstraction must not be allowed to recompose with Op 1 book, or else we will end up with something of type t, which cannot compose with book. Thus, the higher Op 1 book must be stripped of semantic content or removed from the LF structure entirely after matching the two instances of book: in some form, deletion upon matching must apply equally well to both PF and LF.
For convenience, I will assume that the higher -er 2 is well and truly deleted after matching, rather than just semantically bleached. With this in mind, the final interpreted structure for (99) will be as in (111):

Comparatives and wholesale late merger
Before going through the compositional semantics in full, it is important to note one way in which the analyses of Bhatt & Pancheva (2004) and Fox (2002) diverge, which will have significant semantic repercussions. For Fox, the quantifier and its restrictor (e.g., every book) merge as a unit in the premovement position, and the relative clause late merges with the restrictor, thereby intersectively modifying it at the higher merge site. Meanwhile, for Bhatt & Pancheva the (dt)(dt)t-type quantifier -er is merged on its own at the lower site, with the than-clause restrictor itself being late merged after movement. This distinction is illustrated in (112) (111), this issue arises twice: both the lower matrix -er 1 and the than-clause-internal -er 2 appear without syntactic restrictors. For Bhatt & Pancheva (2004) this conundrum is resolved through trace conversion: while lower -er does not have a restrictor, the particular version of trace conversion they adopt inserts one for it -or rather, for the the that replaces it. Naturally, this path is not available to us, meaning that something else must be done to semantically restrict instances of -er that lack syntactically represented restrictors. 15

15 A reviewer suggests that this may be resolved by assigning -er a lower-type ((dt)t) denotation, obviating a than-clause restrictor at the lower merge site. But in this case it is not clear how the than-clause can then restrict -er after late merge does occur. In other words, the problem is not the need for lower -er to be interpreted as restrictor-less, but rather the mismatch between the required argument structures for higher and lower -er (restricted vs. restrictor-less, respectively).

In fact, Bhatt & Pancheva's (2004) analysis of comparatives is not the only proposal in which structures like (112b) crop up: Takahashi & Hulsey (2009) argue that such configurations also arise in the DP domain (see also Stanton 2016). To see why, note first that A′-movement often does not bleed Condition C when the R-expression (here, John) is contained within a nominal argument:
(113) a. * He was sitting in the sunny corner of John's room.
b. * Which corner of John's room was he sitting in? (Takahashi & Hulsey 2009: p. 391, attributed to David Pesetsky)
This can be attributed to multiple-merge: in (113b), John is both above and below the co-indexed he, and the lower position triggers a Condition C violation.
(114) * [which 1 corner of John's room] was he sitting in [which 1 corner of John's room]
However, A-movement does seem to bleed Condition C in parallel configurations:
(115) a. * It seems to him that every corner of John's room is messy.
b. Every corner of John's room seems to him to be messy.
To account for this, Takahashi & Hulsey (2009) propose that with A-movement the entire NP corner of John's room can be wholesale late merged after movement has taken place, leading to the structure in (116). Independent factors prevent this from happening with A′-movement, which is why only A-movement bleeds Condition C.

So without trace conversion, how can restrictors be lent to syntactically unrestricted lower copies? Perhaps the simplest option, and the one that I will adopt, is that a vacuous restriction is provided by means of a type-shift ⊤: (117) For Q of type (σt)τ, Q^⊤ ∶= Q(λx. 1)

The lower, syntactically unrestricted instances of -er or every first undergo this type-shift before composing with their sister, while the syntactically restricted higher copies compose as normal. This type-shift seems like one that we should closely regulate, given that in theory it could be used in many places to derive interpretations that do not exist. One way in which the application of ⊤ can be regulated is by treating it as a last resort mechanism: ⊤ can only apply if its non-application would lead to a type crash. Its application can be further restricted by requiring that its use as a last resort be determined strictly locally: ⊤ can only be used if it will prevent an immediate type crash at the step in the semantic derivation at which it applies. We will see that this is indeed the case for comparatives, so ⊤ can still apply in those places that we need it to. Whether different or additional restrictions on the application of ⊤ are appropriate is a matter I leave for future work.
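The effect of the type-shift on an ordinary restricted quantifier can be seen in a toy extensional model. This is a minimal sketch under assumptions of my own choosing: the three-element domain, the predicates `corner` and `messy`, and the function names `every` and `top` are hypothetical, and neither the swap-state machinery nor the last-resort condition on ⊤ is modeled here.

```python
# Hypothetical finite domain of individuals.
DOMAIN = ("c1", "c2", "c3")

def every(restrictor):
    """Restricted universal quantifier: takes a restrictor, then a scope."""
    return lambda scope: all(scope(x) for x in DOMAIN if restrictor(x))

def top(q):
    """The vacuous-restriction shift: saturate the restrictor slot of q
    with the predicate that is true of everything."""
    return q(lambda x: True)

# Hypothetical predicates for the demonstration.
corner = lambda x: x in ("c1", "c2")
messy = lambda x: x in ("c1", "c2")

# Syntactically restricted copy: composes with its restrictor as normal.
print(every(corner)(messy))

# Syntactically unrestricted copy: shifted first, then composed with its scope.
print(top(every)(messy))
```

The shifted quantifier ranges over the whole domain, so it is stricter than the restricted one whenever something outside the restrictor fails the scope predicate, as `c3` does here.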

Semantic composition of comparatives
We now have all of the tools needed for a bottom-up composition of Jo is taller than Al is, with the syntactic structure in (111), repeated below.

What these truth conditions end up being depends on how one defines ER. If we use (103a) we get (104), and if we use (103b) we get (105). Either way, the desired truth conditions obtain: namely, the same ones derived on traditional analyses.
Summing up, we have seen that Bhatt & Pancheva's (2004) analysis of comparatives can be translated into the theory proposed in this paper, so long as something is done to ensure that the degree-quantifier is semantically restricted at its lower merge sites. This was accomplished with a type-shift ⊤, which can also be used to interpret structures with wholesale late merger of the sort that has been argued to be a possibility in A-movement configurations.

Concluding remarks
In this paper I have outlined a form of compositional trace conversion that generates the desired semantic effects of trace conversion, but without the syntactic stipulations or compositional difficulties of its syntactic and semantic variants. This analysis was shown to extend beyond quantificational DPs, also being able to account for scope-taking movement by modals and degree phrases. In tying a bow on this paper, I will discuss a couple of areas that seem to me to be worth exploring in future work.
The first and most obvious issue is empirical coverage. While I have attempted to illustrate the broad applicability of my analysis by extending its scope beyond quantificational DPs, there nonetheless remain gaps that need to be filled, such as wh-phrases, adverbs of quantification (e.g., always, usually), and operators that quantify over focus alternatives (only, even). In addition, the analysis in this paper must be integrated with an appropriate theory of pronominal binding. Note that while the interpretations of pronouns could perhaps be defined via swap states, it is also possible for swap state clusters to coexist with variable assignment clusters, with lambda-abstraction simultaneously manipulating swap states and variable assignments: (126) , X ,ℎ = . X ⟨ ,THE ⟩, ℎ⟨ , ⟩ Thus, the theory in this paper is by all appearances fully compatible with a traditional approach to pronouns as denoting (free or bound) variables. However, further elucidation of these topics is left for future exploration. Additionally, one of the primary arguments against a syntactic operation of trace conversion was that it violates the Inclusiveness Condition by inserting semantically interpreted lexical material that does not appear in the numeration. However, lambda-abstracting nodes, of which I (and others) make liberal use, also violate this condition. One path forward could be to eliminate lambda-abstracting nodes from the syntax and perform the same semantic work via a separate composition rule: 16 (127) Abstract and Apply (AA): If X is type ( ) , and Y is type , then This rule seems to generate the right results for the basic cases, but questions may arise about its possible stipulativeness. This also leaves open the question of what to do about operators like Op, since on the present approach they trigger lambda abstraction but do not semantically compose at the higher merge site (see the discussion in Section 5.3). I leave these issues for future work.
Moreover, as a reviewer notes, another potential problem comes from the semantically interpreted syntactic indices that appear on scope-taking heads, which some have argued to themselves constitute violations of Inclusiveness (see, e.g., Chomsky 2000). My analysis, like many others, makes crucial use of these indices in the lexical semantics of quantificational heads and the interpretation of lambda abstraction. Perhaps the most promising means of obviating syntactic indices is not to eliminate indices altogether, but rather to treat them as objects that are assigned through compositional semantic mechanisms, rather than in the syntax. But this, again, is a matter that must be left for later exploration.