Web Search and Mining Fall 2012 
Nov 20: Solution for Homework 3 posted.
Nov 02: Homework 3 posted. Due on Nov 09.
Nov 01: Solution for Homework 2 posted.
Oct 24: Homework 2 posted. Due on Oct 31.
Oct 16: Project posted.
Oct 16: Solution for Homework 1 posted.
Sep 27: Homework 1 posted. Due on Oct 10.
Sep 11: Course website launched.
WuJun Li (liwujun@cs.sjtu.edu.cn; http://www.cs.sjtu.edu.cn/~liwujun; Rm 3537, SEIEE Building; 34206661)
Office Hours: Thur 10:00am-11:00am
ZhiQin Yu (xiaoyu199175@gmail.com)
Wed 10:00-10:45 & 10:55-11:40
Fri 12:55-13:40 & 14:00-14:45
Rm 308, RuiQiu Chen Building (陈瑞球楼308)
[IIR]: Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
The English reprint edition can be bought through ChinaPub. You can also download the book from its website.
[SE]: Bruce Croft, Donald Metzler, and Trevor Strohman. Search Engines: Information Retrieval in Practice. Addison Wesley, 2009.
(The English reprint edition can be bought through ChinaPub.)
[WDM]: Bing Liu. Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, 2006.
[DM]: Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, Second Edition, 2006.
(The English reprint edition can be bought through ChinaPub.)
[ESL]: Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Second Edition, 2009.
(http://www-stat.stanford.edu/~tibs/ElemStatLearn/index.html)
[PRML]: Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
(I thank Christopher D. Manning for allowing me to use his slides and to modify them as desired. The slides for Lec 1 are mainly adapted from those provided by Bruce Croft.)
Date | Topics | Readings
Sep 12 | Introduction: Web search overview, Web crawling and indexes | IIR Ch. 19-20; SE Ch. 1-3; Web Crawling (from Bing Liu)
Sep 14 | Boolean retrieval | IIR Ch. 1
Sep 19 | The term vocabulary and postings lists | IIR Ch. 2
Sep 21 | Dictionaries and tolerant retrieval | IIR Ch. 3
Sep 26 | Index construction and compression | IIR Ch. 4-5
Sep 28 | Scoring, term weighting, and the vector space model | IIR Ch. 6
Oct 10 | Computing scores in a complete search system | IIR Ch. 7
Oct 12 | Evaluation and relevance feedback | IIR Ch. 8-9
Oct 17 | Probabilistic information retrieval | IIR Ch. 11
Oct 19 | Language models | IIR Ch. 12
Oct 21 | Form groups and select a paper (a topic), then send the group and paper information to the TA. Deadline: 23:59, Oct 24. Matrix factorization and latent semantic indexing | IIR Ch. 18
Oct 26 | Link analysis: PageRank and HITS | IIR Ch. 21
Oct 31 | Supervised learning: classification | IIR Ch. 13-15
Nov 02 | Unsupervised learning: clustering | IIR Ch. 16-17
Prerequisites: data structures, design and analysis of algorithms, linear algebra, probability theory
1. In-class quizzes (30%)
2. Homework (30%)
3. Project + presentation (40%)
Assignments turned in late will be penalized 20% per late day.
Honesty and integrity are central to academic work. All your submitted assignments must be entirely your own (or your own group's). Any student found cheating or plagiarizing will receive a final score of zero for this course.
