MATHEMATICAL METHODS IN NATURAL LANGUAGE PROCESSING
1. Finite-state automata and transducers, rules in phonology and morphology
2. Counting words in corpora, Zipf's law
3. Hidden Markov Models, training and decoding algorithms
4. Speech recognition architecture, low-level processing, feature extraction
5. Discriminative training of Hidden Markov Models
6. Language modeling: n-gram and factored language models
7. Maximum entropy modeling
8. Document classification

STOCHASTIC PROCESSES AND APPLICATIONS
The course presents the classes of stochastic processes that are most important in applications and stochastic modeling, and shows several real-world applications. Emphasis is placed on learning the methods and tricks of stochastic modeling.
The main goal of the course is to learn the basic tricks of stochastic modeling via studying many applications. It is also important to understand the theoretical background of the methods.
The students will learn the most common methods in stochastic processes and their applications.
Books:
1. S. M. Ross, Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970.
2. S. Asmussen, Applied Probability and Queues, Wiley, 1987.

STATISTICS OF STOCHASTIC PROCESSES
1. Stationary processes, ARMA processes
2. Time series, trend and seasonality analysis
3. Spectrum analysis, parameter estimation of stationary processes
4. Markov decision processes, semi-Markov decision processes
5. Inventory theory, continuous time optimization models
6. Hidden Markov Models and their applications
7. Observable Operator Models

MULTIVARIATE STATISTICAL INFERENCE
The course is based on the Probability and Statistics course, and generalizes the concepts studied there to multivariate observations and multidimensional parameter spaces. Students will be introduced to basic models of multivariate analysis with applications. We also aim at developing skills to work with real-world data.
The first part of the course gives an introduction to the multivariate normal distribution and deals with spectral techniques to reveal the covariance structure of the data. In the second part, dimension reduction methods will be introduced (factor analysis and canonical correlation analysis) together with linear models, regression analysis and analysis of variance. In the third part, students will learn classification and clustering methods to establish connections between the observations. Finally, algorithmic models are introduced for large data sets. Applications are also discussed, mainly on a theoretical basis, but students will also learn to use statistical program packages.
Students will be able to identify multivariate statistical models, analyze the results and make further inferences on them. Students will gain familiarity with basic methods of dimension reduction and classification (applied to scale, ordinal or nominal data). They will become familiar with applications to real-world data sets, and will be able to choose the most convenient method for given real-life problems.
1. Multivariate normal distribution, conditional distributions, multiple and partial correlations.
2. The Wishart distribution and distribution of eigenvalues of sample covariance matrices.
3. Multidimensional Central Limit Theorem. Multinomial sampling and the chi-square test.
4. Parameter estimation and Fisher information matrix.
5. Likelihood ratio tests and testing hypotheses about the mean. Hotelling’s T-square distribution.
6. Multivariate statistical methods for reduction of dimensionality: principal components and factor analysis, canonical correlation analysis.
7. Theory of least squares. Multivariate regression, Gauss-Markov theory.
8. Fisher-Cochran Theorem. Analysis of variance.
9. Classification and clustering. Discriminant analysis, k-means and hierarchical clustering methods.
10. Factoring and classifying categorical data. Contingency tables, correspondence analysis.
11. Algorithmic models: EM algorithm for missing data, ACE algorithm for generalized regression, Kaplan-Meier algorithm for censored data.
12. Resampling methods: jackknife and bootstrap. Statistical graph theory.
Literature:
1. R.A. Johnson, G.K. Bhattacharyya, Statistics. Principles and Methods. Wiley, New York, 1992.
2. C.R. Rao, Linear statistical inference and its applications. Wiley, New York, 1973.
3. K.V. Mardia, J.T. Kent, M. Bibby, Multivariate analysis. Academic Press, New York, 1979.
Handouts: ANOVA tables and outputs of the BMDP Program Package, while processing real-world data.

SURVEY METHODOLOGY
Every empirical investigation in the social sciences requires valid and reliable data, and the application of carefully selected statistical methods. The typical form of data collection is conducting a survey, and well-designed surveys can provide the researcher with good data, even based on surprisingly small sample sizes. The course will discuss the most important concepts and techniques in survey design. A clear understanding of these methods is necessary for any scientist who is engaged in data collection, but it is also useful for the researcher who analyses or interprets data.
Topics:
• Surveys and censuses
• Probability versus nonprobability samples
• Role of the sample size, accuracy of estimates
• Sampling and nonsampling errors
• Sample-based and model-based approaches to surveys
• Questionnaire design
• Sample survey design
• Main sampling techniques: simple random sampling, stratified sampling, cluster sampling
• Handling of missing data

Facilities and Infrastructure
The Central European University (CEU) has a rich library which offers books and journal collections in many fields of interest, including mathematics and its applications. Lately, an additional departmental library has been organized to offer our students some of the most important books, including several copies of textbooks which are frequently used by our faculty. Furthermore, the Renyi Institute (which is close to our departmental offices) has one of the richest mathematical libraries in the region, with about 40,000 volumes and 350 periodicals. There are many other libraries in Budapest which are freely available. CEU has classrooms, study rooms, and computer labs. They are equipped with the usual facilities, including blackboards, computers, overhead projectors, printers, and scanners. There is an IT department which is responsible for overall software and hardware development, maintenance, and acquisitions.
A significant portion of the university budget is used every year to maintain and develop IT facilities in accordance with the standards of research-intensive universities. In addition to CEU's computer resources, our students and faculty have free use of the Renyi Institute computers. Our students can benefit from the very rich cultural and scientific life of Budapest. There are several universities in Budapest, in particular the Eötvös University (ELTE) and the Budapest University of Technology and Economics (BME). They frequently organize seminars and conferences. Our department is already integrated into this scientific environment. We also organize seminars and workshops. CEU has a Residence Center where students may live. This is essentially a modern hotel, with a restaurant as well as conference rooms, a swimming pool, a sauna, and a fitness room, all freely available.

Faculty
An important issue is attracting quality faculty to participate in our M.S. program. CEU has an agreement with the Renyi Institute of the Hungarian Academy of Sciences, so some of the courses are delivered by Renyi professors. To increase our teaching force, we invite specialists from other local Hungarian institutions (the Eötvös University (ELTE), the Budapest University of Technology and Economics (BME), the Computer and Automation Research Institute (SZTAKI) and the Research Institute for Particle and Nuclear Physics (KFKI) of the Hungarian Academy of Sciences) who are able to cover various applied fields of the program. Furthermore, we frequently invite foreign specialists, depending on the interests of our students.

MS Teaching Program for AY 2009-2010
