Department of English and American Studies





4.3 Hypothesis


Klenová’s work has shown that contemporary British “recipes represent a highly specialized register, on the level of lexis in particular” (2010: 142). Considering the long history of cookbooks and the developments they have undergone in each language, this analysis aims to draw an interlingual comparison between Czech and English. It is assumed that recorded cooking instructions always entail a code in the sense of Roman Jakobson’s communication model and that this code differs from language to language. The hypothesis to be tested through quantitative research is therefore that Czech recipes also show a highly specialized register, some features of which, however, are distinct from the English ones.

A second hypothesis to be tested is that recipes whose source language was English tend to retain English register characteristics in their Czech translation.

5. Methodology


This chapter describes the procedures of data acquisition and processing and the criteria by which the literature has been chosen. It also gives a brief preview of the criteria of analysis and discusses some disadvantages and limitations of the method used. The preconditions of the analyses are comparable to those in Klenová’s work. “The key term register is employed for the purpose of the analysis of the language variety” (Klenová 2010: 22). The term is defined as “a variety of a language or a level of usage, as determined by degree of formality and choice of vocabulary, pronunciation, and syntax, according to the communicative purpose, social context, and standing of the user” (Oxford Dictionaries).

5.1 Choice of Primary Sources


The primary sources chosen are the excerpts from Feast: The Food that Celebrates Life by Nigella Lawson, Jamie’s Ministry of Food: Anyone Can Learn How to Cook in 24 Hours by Jamie Oliver and Gordon Ramsay’s Fast Food: Recipes from the F Word by Gordon Ramsay, which Klenová used as a corpus in her work. The primary sources in the Czech language are Zdeněk Pohlreich’s Prostřeno bez servítků, Jiří Babica’s Babicovy dobroty 2 and Mňam aneb Prima vařečka 3, edited by Marie Formáčková. A British book in Czech translation is also taken into account: Šéfkuchař bez čepice „Dny plné chutí“ by Jamie Oliver, the original title of which is Happy Days with the Naked Chef.

The excerpts from the first three works have been chosen because they are already digitalized, which makes them easier to work with within the scope of a quantitative analysis. Moreover, Klenová has already presented her findings and conclusions based on these works, and these can be compared with the results of the quantitative analysis of the Czech works. The cookbooks by Pohlreich and Babica and Mňam aneb Prima vařečka 3 have been chosen because they are contemporary works, which allows for the synchronic analysis that Klenová also adopted. Moreover, these are also works by “widely known chefs [and] TV presenters” (Klenová 2010: 24), so that they are closely related to the works of Lawson, Oliver and Ramsay. In theory this should provide the best possible comparability, for the communicative situation is very similar in all cases. In addition, Šéfkuchař bez čepice „Dny plné chutí“, another work by Oliver, has been chosen in its Czech translation in order to test the second hypothesis. The choice between this book and alternatives was limited by availability.

5.2 Data Processing


“Quantitative research aims to classify features, count them, and construct statistical models in an attempt to explain what is observed” (Neill 2007). This is an empirical approach in which the “researcher [commonly] uses tools, such as questionnaires or equipment to collect numerical data” (Neill 2007). The immediate aim of this approach is to create convincing frequency statistics in order to detect certain regularities. Statistics are most reliably and effectively computed with the assistance of electronic data processing (EDP). The basis of statistical computing is digitalized data, which first has to be created. This subchapter explains how this has been done and which tools have been used to create analyzable corpora.

5.2.1 Optical Character Recognition


A corpus which is to be analyzed in terms of word frequency has to be available as an editable text document. The software used, which is described later, requires the MS Word format with a “*.doc” file extension. The first three of the primary sources listed above are available in this format in the Appendix to Klenová’s work; the other four sources, however, are available in print only. To digitalize these, Optical Character Recognition (OCR) software has been chosen. OCR is “identification of printed characters using photoelectric devices and computer software” (Oxford Dictionaries). The photoelectric device used was a conventional scanner. The software had to offer high accuracy in character recognition and had to be able to recognize Czech words so that it could correct misread characters.

On the basis of accuracy tests and professional reviews on specialized internet portals such as pcadvisor.co.uk and chip.cz, and of the manufacturer’s information on recognizable languages, one of the market leaders, FineReader 10 Professional by ABBYY, has been acquired. The selected texts have then been scanned, read and converted to the required format with this software.

5.2.2 Choice of Excerpts


From these scanned works excerpts have been chosen in the same manner as in Klenová’s work, namely 15 recipes per cookbook. The quality of OCR depends on many factors such as font, font size, typeface, background and text color as well as the arrangement of the text. Recipes with an apparently high error rate have been excluded from further selection. From the remaining recipes the choice was random, with the proviso that recipes from each chapter be selected in order to prevent an imbalanced vocabulary and thus distorted results.
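The selection rule described above (a random choice that nevertheless covers every chapter) can be illustrated with a minimal sketch. It is not the procedure actually carried out for this study; the function name `pick_recipes`, the dictionary layout and the fixed seed are assumptions made purely for illustration.

```python
import random


def pick_recipes(recipes_by_chapter, total=15, seed=0):
    """Pick `total` recipes at random while covering every chapter.

    `recipes_by_chapter` is a hypothetical structure: it maps a chapter
    name to the list of recipe titles that passed the OCR quality check.
    Recipe titles are assumed to be unique across the whole book.
    """
    rng = random.Random(seed)  # fixed seed so the choice can be repeated
    chapters = [c for c, recipes in recipes_by_chapter.items() if recipes]
    if total < len(chapters):
        raise ValueError("fewer picks than chapters; cannot cover every chapter")
    # Step 1: one random recipe per chapter guarantees coverage.
    chosen = [rng.choice(recipes_by_chapter[c]) for c in chapters]
    # Step 2: fill the remaining slots at random from all recipes left over.
    leftover = [r for c in chapters for r in recipes_by_chapter[c] if r not in chosen]
    extra = min(total - len(chosen), len(leftover))
    chosen += rng.sample(leftover, extra)
    return chosen
```

Drawing one recipe per chapter first and filling the rest at random is only one way of meeting the condition; drawing from each chapter in proportion to its length would be an equally plausible reading.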

5.2.3 Revision of Digitalized Materials


OCR-scanned texts are rarely completely free of misspellings. In order to further improve the quality of the tested materials, the texts have been automatically proofread in MS Word; the potential mistakes it highlighted have been reviewed and, where necessary, corrected against the original printed text.

5.2.4 Corpora


Following the previous steps, three text files have been created. The first contains the excerpts from the British cookbooks in English taken from Klenová’s Appendix, the second the excerpts from Prostřeno bez servítků, Babicovy dobroty 2 and Mňam aneb Prima vařečka 3, and the third the excerpts from the Czech translation of Jamie Oliver’s Happy Days with the Naked Chef, namely Šéfkuchař bez čepice „Dny plné chutí“. Since these files still contained images and tables, they have been converted to plain text files with a “*.txt” extension in order to extract the text only and then converted back to MS Word documents. These files are the basis for the desired frequency statistics. In linguistic terms they will be referred to as corpora. A corpus may be defined as “a collection of written or spoken material in machine-readable form, assembled for the purpose of linguistic research” (Oxford Dictionaries).

This is an established research method in the field of corpus linguistics. Friederike Müller and Birgit Waibel of the University of Freiburg state in their Corpus Linguistics – An Introduction “that corpus linguistics is the study of language by means of naturally occurring language samples; analyses are usually carried out with specialised software programmes on a computer. Corpus linguistics is thus a method to obtain and analyse data” (2011).

The total size of the corpora used is 20,686 words. The first file contains 11,511 words, the second 5,811 and the third 3,364.

5.2.5 Word Frequency


In order to determine the average frequency of occurrence of a word, or more accurately a word form, in a language, word frequency lists are created from vast corpora containing several million words found in texts of all genres. The results are typically tabulated. Table 1 shows an excerpt from the word frequency statistics by Lexiteria, listing the ten most frequent English word forms. By Lexiteria’s own account the list is based on a corpus of 636,417,051 word forms taken from the World Wide Web (Lexiteria English 2010).


Table 1: The ten most frequent English word forms

ID   Word   PoS       Count      Per million        Length
1    the    Art       24983484   41506.1274578191   3
2    of     Pp        23334834   38767.1549016563   2
3    and    Con       19357013   32158.6312293619   3
4    in     Pp        18273139   30357.9451284075   2
5    A      art,sym   14378797   23888.1086790020   1
6    to     Pp        14045710   23334.7370405011   2
7    was    va        7767936    12905.2033615561   3
8    is     va        7431276    12345.8957457748   2
9    for    pp        5623317    9342.25635374639   3
10   as     con,pp    5596850    9298.28559788920   2

(Lexiteria 2010)


According to Lexiteria’s statistics the most frequent English word form on average is the article “the”, while in Czech, based on a corpus of 28,575,409 word forms (Lexiteria Czech 2010), it is the conjunction “a”. The most frequent noun forms are “time”, the 52nd most frequent form in English, and “roce”, the 16th most frequent in Czech. It is notable from these statistics that as few as 200 word forms account for over 44 per cent of written English texts on the internet. The same number of word forms still accounts for more than one third of written Czech texts. In terms of methodology this is valuable information because it shows that a relatively small number of word forms is representative of a whole text. A text of a specific genre that uses specialized recurring vocabulary will necessarily show different results in word frequency. By analyzing a manageable number of word forms the majority of the text will be covered, and conclusions about tendencies in the general style can be drawn.
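How a frequency list with a per-million column is derived, and how the share of text covered by the most frequent forms is measured, can be shown with a short sketch. It is an illustration only: the tokenization rule (runs of letters, lower-cased) and the function names are assumptions and do not reproduce the method of Lexiteria or of the software used in this study.

```python
import re
from collections import Counter


def word_forms(text):
    """Split text into lower-cased word forms (runs of letters only).

    Simplification: digits and punctuation are dropped, so "don't"
    becomes the two forms "don" and "t".
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(text):
    """Return rows of (rank, word form, count, per million), most frequent first."""
    counts = Counter(word_forms(text))
    total = sum(counts.values())
    rows = []
    for rank, (form, count) in enumerate(counts.most_common(), start=1):
        rows.append((rank, form, count, count / total * 1_000_000))
    return rows


def coverage(rows, top_n):
    """Share (0 to 1) of all running words covered by the top_n forms."""
    total = sum(count for _, _, count, _ in rows)
    if total == 0:
        return 0.0
    return sum(count for _, _, count, _ in rows[:top_n]) / total
```

Applied to a whole corpus, `coverage(rows, 200)` would give the kind of figure quoted above for the 200 most frequent word forms.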

5.2.5.1 Text Analysis Software


In order to generate the statistics, specialized software had to be obtained. It had to be either independent of language or designed for working with English and Czech texts. It was also desirable that it be able to display a concordance, that is, a “list of the words (especially the important ones) present in a text or texts, usually with citations of the passages concerned or with the context displayed on a computer screen” (Oxford Dictionaries). The software chosen by these criteria is TextStat 2.8g for Windows, available from the homepage of the Department of Dutch Studies at the Free University of Berlin. This very basic program sorts word forms by frequency of occurrence and reports how many times each word form has been detected. For every occurrence of the word form to be analyzed, its concordance function lists the desired number of surrounding words in the text. The lists can be exported to MS Excel.
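The idea behind a concordance can be sketched in a few lines. This is not the implementation of the program named above; the function name `concordance`, the letter-based tokenization and the default context width are assumptions chosen for illustration.

```python
import re


def concordance(text, keyword, width=3):
    """List every occurrence of `keyword` with `width` words of context per side.

    Returns (left context, matched word form, right context) tuples.
    Matching ignores case; tokens are runs of letters, which is a
    simplifying assumption.
    """
    tokens = re.findall(r"[^\W\d_]+", text)
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines
```

Reading the context lines side by side shows, for instance, whether a frequent verb form is used as an imperative or an infinitive, which a bare frequency count cannot reveal.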

5.2.5.2 Tables and Further Calculations


Based on the frequency lists created by TextStat 2.8g for Windows, the percentage of occurrences of one word form in relation to all word forms in a given corpus can easily be calculated with the aid of MS Excel. This helps to determine how many word forms represent more than 50 per cent of the text. On the one hand, this information makes it possible to comment on the lexical variety of the cooking instructions under investigation; on the other, reasonable limits can be set on the number of word forms to be analyzed.
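The two calculations described above, the percentage share of each word form and the number of forms needed to pass the 50 per cent mark, can be sketched as follows. The calculations in this study were carried out in a spreadsheet; the function names and the use of a `Counter` as input are assumptions made for the example.

```python
from collections import Counter


def shares(counts):
    """Return (word form, percentage of all running words), most frequent first."""
    total = sum(counts.values())
    return [(form, count / total * 100) for form, count in counts.most_common()]


def forms_needed(counts, threshold=0.5):
    """Number of most frequent word forms whose combined share exceeds `threshold`."""
    total = sum(counts.values())
    running = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        if running / total > threshold:
            return n
    return len(counts)  # threshold never exceeded (e.g. threshold of 1.0)
```

A lower result from `forms_needed` indicates a more repetitive vocabulary, so comparing the value across the three corpora is one simple way of commenting on their lexical variety.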
