**Bibliography for Clementine Modeling Tools**
#### Apriori
Agrawal, R., Srikant, R. 1994. *Fast Algorithms for Mining Association Rules.* Download from __http://www.almaden.ibm.com/cs/quest/publications.html__. This is one of a number of papers on association rule induction available at this site, and a key paper for understanding Clementine’s Apriori modeling tool.
Berry, M.J.A., Linoff, G. 1997. *Data Mining Techniques for Marketing, Sales, and Customer Support.* New York: John Wiley and Sons. See Chapter 8 on Market Basket Analysis.
#### Build C5.0
Berry, M.J.A., Linoff, G. 1997. *Data Mining Techniques for Marketing, Sales, and Customer Support.* New York: John Wiley and Sons. See Chapter 12 on Decision Trees.
Mitchell, T. 1997. *Machine Learning.* Boston: McGraw-Hill. See Chapter 3 on Decision Tree Learning.
Quinlan, R. 1993. *C4.5: Programs for Machine Learning.* San Mateo: Morgan Kaufmann Publishers. Detailed description of C4.5 with source code listing.
Quinlan, R. __http://www.rulequest.com/see5-comparison.html__ and __http://www.rulequest.com/see5-win.html__. The Rulequest website has some comments on C5.0 versus C4.5.
Quinlan, R. __http://www.cse.unsw.edu.au/~quinlan/__ Ross Quinlan’s academic website has a downloadable paper “Boosting, Bagging, and C4.5.”
#### C & RT
Berry, M.J. and G. Linoff. (2000). *Mastering data mining: The art and science of customer relationship management.* Wiley, New York.
Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone. (1984). *Classification and regression trees.* Wadsworth, Belmont, Calif.
Kass, G. (1980). *An exploratory technique for investigating large quantities of categorical data.* Applied Statistics, 29:2. pp. 119–127.
Lim, T.S., W.Y. Loh, and Y.S. Shih. (2000). *A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms.* Machine Learning, 40.
Loh, W.Y., and Y.S. Shih. (1997). *Split selection methods for classification trees.* Statistica Sinica, 7. pp. 815–840.
#### Factor Analysis/Principal Components Analysis
Dziuban and Shirkey (1974); Harman (1976); Hendrickson and White (1964); Jöreskog (1977); Kaiser (1963); Rummel (1970).
Darlington, Richard B., Sharon Weinberg, and Herbert Walberg. (1973). *Canonical variate analysis and related techniques.* Review of Educational Research, 453–454.
Gorsuch, Richard L. (1983). *Factor Analysis.* Hillsdale, NJ: Erlbaum.
Morrison, Donald F. (1990). *Multivariate Statistical Methods.* New York: McGraw-Hill.
Rubenstein, Amy S. (1986). *An item-level analysis of questionnaire-type measures of intellectual curiosity.* Cornell University Ph.D. thesis.
#### GRI
Berry, M.J.A., Linoff, G. 1997. *Data Mining Techniques for Marketing, Sales, and Customer Support.* New York: John Wiley and Sons. See Chapter 8 on Market Basket Analysis.
Mallen, Bramer. 1995. “*Cupid – Utilising Domain Knowledge in Knowledge Discovery.*” Expert Systems XI. Discusses a KDD system developed to utilize domain knowledge in induction from noisy datasets.
Smyth, P., Goodman, R. 1992. *“An Information Theoretic Approach to Rule Induction from Databases.”* IEEE Transactions on Knowledge and Data Engineering, vol. 4, number 4. Proposes the J measure, an information-theoretic measure of “interestingness.”
#### Multinomial Logistic Regression
Agresti, A. 1990. *Categorical Data Analysis.* New York: John Wiley & Sons.
Agresti, A. 1996. *An Introduction to Categorical Data Analysis.* New York: John Wiley & Sons.
Collett, D. 1991. *Modelling Binary Data.* London: Chapman and Hall.
Cox, D.R. and Snell, E.J. 1989. *The Analysis of Binary Data.* 2nd ed. New York: John Wiley & Sons.
Hosmer, D.W. and Lemeshow, S. 1989. *Applied Logistic Regression.* New York: John Wiley & Sons.
McCullagh, P. and Nelder, J.A. 1989. *Generalized Linear Models.* 2nd ed. London: Chapman and Hall.
#### Regression
Jain, D. 1994. *“Regression Analysis for Marketing Decisions.”* In Principles of Marketing Research, edited by Richard P. Bagozzi. Blackwell Publishers. Discusses regression analysis from a market research perspective.
#### Train Kmeans
Arabie, P., and Hubert, L. 1994. *“Cluster Analysis in Marketing Research.”* In Advanced Methods of Marketing Research, edited by Richard P. Bagozzi. Blackwell Publishers. Useful recent review of cluster analysis.
Berry, M.J.A., Linoff, G. 1997. *Data Mining Techniques for Marketing, Sales, and Customer Support.* New York: John Wiley and Sons. See Chapter 10 on cluster analysis.
#### Train Kohonen
Berry, M.J.A., Linoff, G. 1997. *Data Mining Techniques for Marketing, Sales, and Customer Support.* New York: John Wiley and Sons. See Chapter 13 on Artificial Neural Networks, especially the section on using neural networks for undirected data mining.
Martin-del-Brio, B., Serrano-Cinca, C. 1995. “*Self-organizing Neural Networks: The Financial State of Spanish Companies.*” In Neural Networks in the Capital Markets, edited by Apostolos-Paul Refenes. New York: John Wiley and Sons. An application paper.
#### Train Net
Berry, M.J.A., Linoff, G. 1997. *Data Mining Techniques for Marketing, Sales, and Customer Support.* New York: John Wiley and Sons. See Chapter 13 on Artificial Neural Networks.
Bigus, J.P. 1996. *Data Mining with Neural Networks: Solving Business Problems—from Application Development to Decision Support.* New York: McGraw-Hill. Listed on the Neural Network FAQ as a good book for business executives.
Bishop, C.M. 1995. *Neural Networks for Pattern Recognition.* Oxford: Oxford University Press. A standard reference for the statistician.
Masters, T. 1993. *Practical Neural Network Recipes in C++.* Academic Press.
Masters, T. 1995. *Advanced Algorithms for Neural Networks: A C++ Sourcebook.* New York: John Wiley and Sons. You can read Masters’ two books in their own right even if you’re not interested in the code.
Ripley, B.D. 1996. *Pattern Recognition and Neural Networks.* Cambridge: Cambridge University Press. Another standard reference for the statistician. Discusses not only neural networks but other methods too.
#### Two Step Cluster
Banfield, J.D. and A.E. Raftery. (1993). *Model-based Gaussian and non-Gaussian clustering.* Biometrics, 49. pp. 803–821.
Fraley, C. and A.E. Raftery. (1998). *How many clusters? Which clustering method? Answers via model-based cluster analysis.* Computer Journal, 41. pp. 578–588.
Fraley, C. (1998). *Algorithms for model-based Gaussian hierarchical clustering.* SIAM Journal on Scientific Computing, 20. pp. 270–281.
Huang, Z. (1998). *Extensions to the k-means algorithm for clustering large data sets with categorical values.* Data Mining and Knowledge Discovery, 2. pp. 283–304.
Kaufman, L. and P.J. Rousseeuw. (1990). *Finding groups in data: An introduction to cluster analysis.* Wiley, New York.
Meila, M. and D. Heckerman. (1998). *An experimental comparison of several clustering and initialization methods.* Microsoft Research Technical Report MSR-TR-98-06.
Theodoridis, S. and K. Koutroumbas. (1999). *Pattern recognition.* Academic Press, New York.
Zhang, T., R. Ramakrishnan, and M. Livny. (1996). *BIRCH: An efficient data clustering method for very large databases.* Proceedings of the ACM SIGMOD Conference on Management of Data. pp. 103–114, Montreal, Canada.