dc.contributor.advisor: Kittredge, Richard
dc.contributor.author: Chuah, Choy-Kim
dc.date.accessioned: 2024-05-14T17:04:31Z
dc.date.available: NO_RESTRICTION [fr]
dc.date.available: 2024-05-14T17:04:31Z
dc.date.issued: 2001-10-04
dc.date.submitted: 2001-04
dc.identifier.uri: http://hdl.handle.net/1866/33221
dc.subject: Abstracting [fr]
dc.subject: Condensation [fr]
dc.subject: Substitution [fr]
dc.subject: Deletion [fr]
dc.subject: Metadiscourse [fr]
dc.subject: Text extraction [fr]
dc.subject: Automatic summarization [fr]
dc.subject: Content condensation [fr]
dc.subject: Typology of condensation [fr]
dc.subject.other: Linguistics (UMI: 0290) [fr]
dc.title: Linguistic processes for content condensation in abstracting scientific texts [fr]
dc.type: Thesis or Dissertation
etd.degree.discipline: Linguistics [fr]
etd.degree.grantor: Université de Montréal
etd.degree.level: Doctoral [fr]
etd.degree.name: Ph. D. [fr]
dcterms.abstract: While content selection has been intensively explored in the sentence extraction approach to automatic summarization, there is generally little work on the other process, content condensation. To understand this process of condensation, we propose a partial typology based on whether a linguistic unit is replaced, deleted, compressed into fewer essential units, or combined with another unit. We propose four important categories of condensation processes (generalization, deletion, compression, and aggregation), together with their inverse processes, e.g. insertion and expansion, which were occasionally observed. To guide the usage of the same term for similar operations, we borrow definitions from linguistics. The type and function of the linguistic units involved are also discussed. We carried out an empirical analysis of 57 author-written abstracts of on-line journal articles in entomology, tracing each abstract sentence back to the plausible source sentences in the corresponding full text. Unlike other studies, which focus on the resultant abstract, our study focuses on the processes leading to the production of abstract sentences from corresponding full-text sentences. We do not, however, propose an algorithm for abstracting, or account for all the conditions under which individual condensation operations may apply. While a range of substitutes were used in abstracting, about half of the lexical units in our abstracts share the same stem as their source words, or are their derived forms. Only a small proportion of substitutes were synonyms, and the rest were (quasi-)synonyms, or imprecise equivalents. Authors tend to use less technical forms in abstracts, possibly in anticipation of non-specialist abstract readers. Numerical expressions are rendered less precise although no less accurate: absolute numbers and decimals are rounded off, and percentages are replaced by ratios or fractions.
These observations are consistent with the "new" context of an abstract where only the gist of a document's content need be re-conveyed. Among the linguistic units commonly deleted are metadiscourse phrases and segments of text (e.g. parenthetical texts and apposed texts), which provide details and precision in the full text, but are out of place in an abstract. Redundancies inserted for various reasons, or units deemed to be implicit to the comprehension of targeted readers, are also often removed. While deletion is an important sub-process of condensation, we observed some instances of adding experimental and other details to compact more information into the abstract. The expansion or "unpacking" of compact linguistic units was also observed. The secondary role of the inverse processes observed calls for a review of the meaning of condensation, from "not giving as much detail or using fewer words" to include the adding of information in order to make a unit of text informatively compact. Among the linguistic units compressed are verbal complexes containing a support verb or a catenative. Like semantically empty support verbs (e.g. X caused decreases in Y = X reduced Y), some catenatives too may be deleted without significant changes in meaning to the verbal complex (e.g. X was allowed to hatch = X hatched). Redundancy in meaning between an adjective and a noun in a noun phrase, e.g. functional role, may be removed, and the phrase compressed to just the stem of the adjective, i.e. function. While not frequently occurring in the corpus studied, the compression of such units may be described by rules, and hence might be operationalized for automatic abstracting. Aggregation, the combining of units of text within or between sentences, is an important sub-process of condensation.
Two-thirds of the sentences in the abstracts studied were written using multiple source sentences, and more sentences were combined without than with the use of an explicit sign, such as a connective, a colon or a semi-colon. If research in summarization is to progress beyond sentence selection, then we must work towards: (a) a clear distinction between operations that are condensation processes and those that are not; (b) bringing operationally similar processes together under the same designation; and (c) a greater understanding of the sub-processes constituting condensation. To this end, our provisional typology for condensation, together with the range of types of linguistic units involved and their functions, is a first step towards advancing research into content condensation. We have only just begun to identify the condensation sub-processes in operation during abstracting. The factors that are critical to the interplay of these processes still need to be investigated. [fr]
dcterms.description: Thesis digitized by the Direction des bibliothèques of the Université de Montréal
dcterms.language: eng [fr]



This document disseminated on Papyrus is the exclusive property of the copyright holders and is protected by the Copyright Act (R.S.C. 1985, c. C-42). It may be used for fair dealing and non-commercial purposes, for private study or research, criticism and review as provided by law. For any other use, written authorization from the copyright holders is required.