David Deterding

An Introduction to Corpus Linguistics. Graeme Kennedy. Harlow: Longman, 1998, 315 pp. ISBN 0-582-23514-X

Corpus linguistics is becoming an increasingly important field of study with relevance for most areas of linguistics, and this book provides a very useful overview of the field. It is divided into three main chapters: the history and development of corpora; knowledge about English that has been obtained from corpus studies; and methods of corpus analysis. In addition, there is a short final chapter on applications of corpus linguistics, including its implications for pedagogy.

This book might be regarded as both too long and too short. It is too long because it attempts to cover all areas of corpus linguistics, and at times it almost seems to degenerate into a list: of all the major corpora of English that have been collected, of all possible analyses of English verb forms, and of all the ways that language data can be arranged and counted. And it is too short because none of these areas is dealt with comprehensively, so one is constantly left wishing for more information, or for elaboration of an issue that has been introduced; furthermore, there are a few obvious omissions in the areas that are covered.

There are, of course, limitations to the amount that can be presented in an introductory book such as this, but it is perhaps a pity that almost all of the corpora considered are of English, with just very occasional, almost cursory mentions of data from other languages. Maybe this is inevitable for a book written in English, but it might prove frustrating for someone looking for some pointers to hard-to-find resources for non-English languages. Furthermore, even for the coverage of English, it is rather startling that there is almost no mention of the language of email, the medium that surely nowadays comprises the overwhelming bulk of written English. The fact that there is not even an entry for email in the index suggests either that the book is already a little out of date or that corpus compilers have been slow to keep up with the times.

In addition to these areas where the overall coverage might be regarded as lacking, there are regular instances where additional elaboration would have been valuable. For example: we are told (p.137) that, for phrasal verbs which have more than one particle, if the first particle is 'around', 'down', or 'away', it is usually necessary for the second particle to be present for the idiomaticity of the phrasal verb to be maintained, but no examples are given; we learn (p.163) that apposition tends to provide more specific information in about 59% of instances and less specific information in about 16% of instances, but this description would be greatly enhanced by some appropriate examples of each category; and we discover (p.152) that, for the occurrence of 'once' in a finite clause, 21% of instances are with the simple present form of the verb while 8% occur with the simple present passive, but there is no further interpretation of this information. In cases such as these, one feels that, in the absence of some elaboration, the data might have been omitted, to allow a more comprehensive coverage of other areas.

Throughout the book, it is suggested that the findings of corpus analysis have major implications for the design of pedagogical materials, but it is often not completely clear what these implications are. Certainly, textbook writers need to take account of word frequencies when preparing their materials, but at the same time they cannot slavishly adhere to word counts in determining what is appropriate. For example, we are told that corpus analysis reveals that the metaphorical use of prepositions ('on account of') is far more common than their literal use ('on the table') and that this must result in "implications for the content of language teaching" (p.144), but one might argue that it sometimes makes sense to teach the literal meaning of a word first and then progress to the metaphorical extensions, even if it is the latter that occur more regularly in real language usage. Furthermore, we find out (p.285) that words such as 'ticket', 'boat' and 'football' occur very regularly in second language teaching materials, while words such as 'activity', 'attempt' and 'community' are actually more common in English. However, surely it is right that learners of English should encounter a word such as 'boat' before a more abstract word such as 'attempt'? Surely 'ticket' is an exceptionally useful word for a foreign learner struggling to get by, while it is not clear that 'activity' would prove such an essential word for a beginner? Clearly, strict adherence to word counts in the design of teaching materials is not always appropriate.

In conclusion, there are certainly some areas where the presentation in this book might be questioned or where it is frustratingly inadequate. However, it does provide a highly accessible introduction to, and overview of, an increasingly important field of study. If there are occasions where readers need to consult further materials to make full sense of some of the issues, or where they must consider carefully for themselves what the exact implications of corpus analysis are, maybe the book has succeeded in stimulating interest in a wide range of issues without providing all the answers. And if, as suggested above, the book is both too long and too short, perhaps one could conclude that in reality, as an introductory text, it is about right.

From: SAAL Quarterly Vol 58 May 2002 pp. 2-4