
Corpus Linguistics




Submitted by Victor Birkner



  1. Semantics
  2. Corpus linguistics
  3. Preliminary considerations
  4. LC and its pedagogical transfer
  5. Modified texts
  6. Conclusions
  7. Bibliography

Semantics

Corpora have not often been used in semantic studies. Semantics
has seen its greatest developments in highly theoretical work
that has produced systems with high levels of abstraction. This
has often been understood as a separation from raw language,
intensified by the study of decidedly marginal phenomena and the
use of examples that may seem strange to a typical native speaker
(Kilgarriff, 2003).

Perhaps this is why the discipline feels difficult and mysterious
to non-specialists, and why it has remained far from practical
approaches (for example, corpus annotation and computational
linguistics in general, which are usually based on phonological
and morphosyntactic aspects). The leap into empirical research is
not straightforward and brings major problems with it.

Linguistic studies have approached description, and its
requirements, from different angles: from those interested in the
origins and evolution of languages to those which, thanks mainly
to the growing storage capacity of computers, aim to account for
the behavior of specific languages by observing and studying vast
collections of speech and text.

Corpus linguistics

Contrary to popular belief, corpus-based studies are considerably
older than one might think; the recent arrival of computational
tools, however, is responsible both for the statistical character
of such studies and for their rapid growth.

Without explicitly using the term corpus linguistics, Käding
(1897) gathered, at the end of the 19th century, a corpus of
German of over eleven million words in order to determine, among
other things, the sequences of letters in that language.
Likewise, some of the most prominent scholars of the
structuralist period, among them Boas (1940), also used corpora
to observe the behavior of Amerindian languages.

In the 1930s a significant number of investigations were carried
out in which corpora were analyzed to establish lexical
frequencies in real language use (Palmer, 1933; Palmer & Hornby,
1937). This work was led and carried out by professionals closely
connected with the teaching of English.

In 1960, Quirk designed and launched a study called the Survey
of English Usage, which Svartvik later digitized and complemented
with the famous Brown corpus, giving rise to what Leech (1991)
considered an unrivaled resource for those interested in studying
spoken English.

Diaries (a type of corpus), carefully kept by linguists studying
language acquisition, have so far been a rich source of material
for language study. Longitudinal studies in this line of
research, itself a long tradition, have drawn on large
collections of utterances recorded in such diaries, for example
Brown (1973) and Bloom (1970).

Chomsky, the linguist who on his own merits marked a turning
point in modern linguistics, voiced criticisms of corpus
linguistics because, for him, the core of linguistic inquiry lies
in what he calls competence, i.e. the knowledge that speakers
possess of (the rules of) their language; what he calls
performance, by contrast, is a poor reflection of the richness
contained in competence.

The following section introduces some preliminary considerations
on the use of corpora in language studies, along with the
advantages that corpus linguistics (LC) offers. It then addresses
more directly the debate on the (possible) transfer of LC studies
to the teaching of English. Finally, we analyze a position that
brings together some of the most salient aspects of that debate,
together with concrete examples applicable to English classes.

Preliminary considerations

Corpora today are expected to meet three basic general
requirements: (i) size, although this depends on the purpose for
which they are built and their specific uses; (ii) internal
balance, because a corpus must reflect the period it claims to
represent and the dialectal variety chosen, with an internal
structure whose proportions of text types validate the
conclusions drawn from it; and (iii) ease of use, which again
depends on the use to which the corpus will be put. To these must
be added the difficulties surrounding copyright, especially when
several participants are involved in an oral communicative act
chosen for the corpus, and the care required in encoding the
linguistic data.

The impact of LC is such that virtually all subfields of
linguistics have been affected by technological developments.
These include phonetics (articulatory, acoustic and auditory) and
phonology, which explores the distribution of both segments and
suprasegmentals in a particular language. Grammar is probably one
of the areas that has given rise to the most abundant literature;
however, it is the study of the lexicon that has generated a
series of dictionaries and narrowly focused studies, which
sometimes aspire to become innovative materials for educational
purposes.

As Leech (1991) claims, for many pre-Chomskyan researchers the
corpus was conceived as the sole source of evidence for
linguistic theory, from its most passionate exponents, such as
Harris (1951), to more moderate ones, such as Hockett (1948).
Chomsky raised a series of objections to this view and valued
introspection instead, i.e., the space where linguistic
competence lies; in his view, only introspection allows us to
disambiguate utterances and to establish which sentences are
grammatical and which are not.

Among the benefits usually attributed to corpus-based studies are
the following:

(i) Data drawn from a corpus are observable and objective
(Leech, 1992).

(ii) The vast majority of the sentences contained in a corpus
are grammatically correct (Labov, 1969), in clear opposition to
Chomsky's statements (1968).

(iii) The capacity to process linguistic data is growing ever
faster, with margins of error close to zero.

(iv) The examples of linguistic data correspond to real speech
(a term that many scholars find contentious), found in (small)
contexts.

(v) Corpora that can be analyzed and worked on are now so easy
to access that highly sophisticated technological tools are not
required; access can now be personal and domestic.
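
Points (iv) and (v) can be illustrated with a minimal key-word-in-context (KWIC) concordance. The sketch below is only an illustration, not a tool used in any of the studies cited: the function name kwic, the window of three words on each side and the sample sentences are all assumptions made up for this example, and the tokenization is deliberately crude.

```python
import re

def kwic(text, keyword, width=3):
    """Return key-word-in-context lines: up to `width` words on each side of every hit."""
    # Crude tokenization (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Invented sample text, for illustration only (not taken from any real corpus).
sample = (
    "Well, I mean, it was a real surprise. "
    "You know what I mean? I did not mean to be rude."
)

for line in kwic(sample, "mean"):
    print(line)
# well i [mean] it was a
# know what i [mean] i did not
# i did not [mean] to be rude
```

Nothing beyond the Python standard library is needed here, which is the sense in which access to this kind of analysis can be called personal and domestic.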

The enthusiasm with which this renewed and technologized stage of
LC has been received is such that, according to Johansson (1991),
the number of significant corpus-based studies went from ten in
1965 to more than 320 carried out in a span of just fifteen
years. This has gone hand in hand with the construction of
corpora of English. The first was the aforementioned Survey of
English Usage, in 1960; then, in the sixties, came the Brown
University Corpus of American English, the first computerized
corpus; the Lancaster-Oslo/Bergen Corpus of British English
followed in the seventies; and in the 1980s the Collins
Birmingham University International Language Database (COBUILD)
was created, from which the dictionary that bears its name
emerged. In the nineties a project called the Bank of English
emerged, along with the British National Corpus (BNC) and the
International Corpus of English, among others.

LC and its pedagogical transfer

The debate here is based mainly on the academic writings
collected in Controversies in Applied Linguistics, edited by
Barbara Seidlhofer (2003), which presents two positions in
relative opposition on the eventual pedagogical transfer of the
paradigm underlying LC to the English as a foreign language
classroom.

Carter and McCarthy, along with Gavioli and Aston, appear as
defenders of the relationship between LC and the teaching of
English, while Prodromou and Cook question this link between
linguistic description and pedagogical prescription.

A feature often presented as an advantage of corpus-based
linguistic descriptions, with their subsequent pedagogical
implications, is that the speech collected is natural, 'real',
although, as Carter (1998) asserts, the term 'real' is heavily
loaded with positive connotations. This stands in clear contrast
to the forms of speech found in textbooks.

Even more audacious is the claim, made by McCarthy & Carter
(1995), that informal British speech is 'real' English. It is
true that the concordances widely used by corpus enthusiasts
reveal certain linguistic truths that often run counter to
traditional teaching.

Nevertheless, these 'real' samples are tied to an individual's
membership of the cultural community of the language in question.
In other words, it is not possible to speak like a speaker of
informal British English if one does not belong to that community
of practice, particularly if we add to this demand the segmental
and prosodic aspects of English, and especially given that an
analysis of a significant sample of the curricula of almost 100
English teacher-training programs in Chile shows a gradual
disappearance of courses in English phonetics and phonology. For
this reason it is worth asking whether it sounds more "strange"
(McCarthy and Carter, 1995:207) to use a bookish lexicon, or to
speak with the lexicon proper to the variety and register in
question but with a clearly foreign rhythm, stress and
intonation.

On the other hand, as Prodromou (1996) asks: to what extent can
a non-native teacher, under the premise just mentioned, teach
"real" English? Likewise, the principle underlying the teaching
of "real" English, from samples belonging to a certain dialectal
variety and a particular register, includes, perhaps indirectly,
the assumption that our students of English as a foreign language
learn the language in order to communicate with native speakers
of English. It assumes, in turn, that the 'native speaker', an
increasingly elusive concept in anthropolinguistic reflection, is
invariably our model, empowering the native speaker at the
expense of the non-native teacher. This presumption proves to be
extraordinarily fallacious when one observes that the growing
number of speakers of English as a second and as a foreign
language far exceeds 400 million, so the adoption of a single
dialectal variety as representing "real" English must be
reconsidered in light of the value of English as the lingua
franca of modern times.

It is probably true that language teachers are fascinated by the
processes of natural speech, and as this is contained, partially
of course, in corpora, we tend to think that this is what should
make its way into classrooms and language textbooks.

Modified texts

This position taken by Carter is called 'moderate' or even weak
by Cook (1998), who nevertheless recognizes that one of the great
contributions of LC is to show that language in use is not merely
a matter of grammatical rules combined harmoniously with lexical
items; it is, rather, a vast collection of collocations, a
principle to which Willis (2003), Larsen-Freeman (2003) and
McCarthy & Carter (1995) subscribe. In this vein, the virtual
conviction has spread, both in academia and among the bodies
responsible for public policy, that the degree of comprehension
of a text in English is exclusively and mathematically related to
the number and type of words, by frequency, that the student
knows. However, this purely mathematical relationship ignores
fundamental aspects of teaching practice, namely the teacher's
handling of students' expectations; individual differences,
including learning strategies; the attitudes of teacher and
students; and cultural diversity, among others.

In addition, as Cook (1998) points out, concordance displays
from a corpus, in terms of lexical frequency or range, show
speech as produced but ignore an equally important aspect, namely
the perception and interpretation of speech, matters now covered
by pragmatics. Such displays, useful as they are, fail to
consider other equally real aspects of language use, such as an
item that is infrequent but potentially very useful or
pedagogically relevant, or an item that is frequent but confined
to a narrow range of contexts.
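
The distinction between frequency and range can be made concrete with a small sketch. It assumes that "range" means the number of texts in which a word occurs, which may not be exactly the sense intended in the debate; the function name frequency_and_range and the three-text mini-corpus are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Crude tokenization (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_and_range(texts):
    """Map each word to (total occurrences, number of texts containing it)."""
    frequency = Counter()
    text_range = Counter()
    for text in texts:
        tokens = tokenize(text)
        frequency.update(tokens)        # every occurrence counts toward frequency
        text_range.update(set(tokens))  # each text counts at most once toward range
    return {word: (frequency[word], text_range[word]) for word in frequency}

# Invented mini-corpus of three "texts", for illustration only.
texts = [
    "the goal the goal what a goal",
    "the bus was late again",
    "the exam is late this year",
]

stats = frequency_and_range(texts)
print(stats["goal"])  # (3, 1): frequent, but confined to a single text
print(stats["late"])  # (2, 2): less frequent, but spread over more texts
```

Under these assumptions, a ranking by frequency alone would place "goal" above "late", even though "late" turns up in more contexts. Neither figure says anything about how useful or pedagogically relevant an item is, which is the limitation Cook identifies.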

Finally, if we assume that speaking samples contained in
corpora must become models for our students, especially about the
logic of mathematical registers of frequency.

Conclusions

On the evidence of the background presented on this debate, we
can conclude the following:

(i) Corpus-based research and analysis are here to stay
(McCarthy & Carter, 1995). This is shown not only by the number
of investigations of this kind, but also by the productivity of
the debate that LC has raised in the academic community and among
relevant participants. In this increase in academic production,
the technological tools for processing and storing linguistic
data have definitely played a fundamental role.

(ii) Types of corpora are already highly diversified, and among
their uses the study of the lexicon outweighs all others.

(iii) The contributions that LC is able to make are
extraordinarily rich in terms of objectivity, because it accounts
not only for strictly mathematical aspects such as the frequency
of linguistic items, but also for the lexical nature of language,
and of English in particular.

(iv) An important part of the findings eloquently challenges,
with the empirical evidence provided by the analysis of
concordances, a significant number of (linguistic) beliefs found
in traditional textbooks based on introspection.

(v) It seems extremely dangerous to assert, as McCarthy & Carter
(1995) do, that real speech corresponds to one dialectal variety
in a given register, discrediting other varieties, other
registers and an already indisputable truth: our students of
English pedagogy, together with their own students in the school
system, will in all probability use English with other non-native
speakers. The inevitable question therefore arises: why shape our
teaching on the basis of a variety that our students will most
likely not hear in real contexts?

(vi) What is conceived as real speech, by virtue of coming from
a corpus, is invariably wrapped in a socio-cultural context that
is virtually impossible to transfer to the classroom. If we add
linguistic aspects such as the prosodic elements of speech which,
given the characteristics of our students and of related studies,
are not acquired in regular training programs, we should question
the relevance of an emphasis on "real speech" understood as real
lexical usage but encapsulated in a foreign suprasegmental
wrapper. The principle of real speech based on the native speaker
makes that kind of speech unteachable for the non-native teacher
and, moreover, undermines that teacher's social standing.

(vii) It seems that the invaluable data that LC provides lend
themselves more readily to discourse analysis or conversation
analysis than to immediate use in the classroom. The main reason
is that this material is usually full of ellipsis, interrupted
turns, discourse markers, false starts, hesitations, etc. That is
why it is sensible to consider intermediate positions such as the
one Carter offers in his suggestion to modify and remodel texts.

(viii) The data provided by corpus concordances in terms
of lexical frequencies should not be the only criterion to
determine what is taught and what is not.

Bibliography

Meyer, Charles (ed.). (2007). English Corpus Linguistics: An
Introduction. New York: Cambridge University Press.

Bloom, L. (1970). Language development: Form and function in
emerging grammars. Cambridge, MA: MIT Press.

Boas, F. (1940). Race, language and culture. New York:
Macmillan.

Brown, R. (1973). A first language: The early stages.
Cambridge, MA: Harvard University Press.

 

 

Author: Victor Birkner

 
