Randomized Algorithms. Rajeev Motwani and Prabhakar Raghavan.
Approximation Algorithms. Vijay V. Vazirani.
Mathematics for Computer Science. Eric Lehman, F Thomson Leighton, Albert R Meyer. Online at http://courses.csail.mit.edu/6.042/spring12/mcs.pdf
Reference books
Introduction to Algorithms. Thomas H. Cormen.
Computational Geometry: An Introduction. Franco P. Preparata and Michael I. Shamos.
Algorithms and Data Structures for External Memory. Jeffrey Scott Viter.
Probability and Computing. Mitzenmacher and Upfal.
Note: this book is more basic compared with Motwani’s book
Computational Complexity. Christos H. Papadimitriou.
Combinatorial Optimization: Algorithms and Complexity. Christos H. Papadimitriou and Kenneth Steiglitz.
Computers and Intractability: A Guide to the Theory of NP. M. R. Garey and D. S. Johnson.
Data Stream
Michael Greenwald, Sanjeev Khanna: Space-Efficient Online Computation of Quantile Summaries. SIGMOD Conference 2001: 58-66
Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani: Maintaining stream statistics over sliding windows (extended abstract). SODA 2002: 635-644
Xuemin Lin, Hongjun Lu, Jian Xu, Jeffrey Xu Yu: Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream. ICDE 2004: 362-373
Kyriakos Mouratidis, Spiridon Bakiras, Dimitris Papadias: Continuous monitoring of top-k queries over sliding windows. SIGMOD Conference 2006: 635-646
Gurmeet Singh Manku, Rajeev Motwani: Approximate Frequency Counts over Data Streams. VLDB 2002: 346-357
Richard M. Karp, Scott Shenker, Christos H. Papadimitriou: A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28: 51-55 (2003)
Pedro Domingos, Geoff Hulten: Mining high-speed data streams. KDD 2000: 71-80
Daniel M. Kane, Jelani Nelson, David P. Woodruff: An optimal algorithm for the distinct elements problem. PODS 2010: 41-52
Edith Cohen: Size-Estimation Framework with Applications to Transitive Closure and Reachability. J. Comput. Syst. Sci. 55(3): 441-453 (1997)
Note: need to read its TR version.
Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Frequency Moments. STOC 1996: 20-29
Graham Cormode, S. Muthukrishnan: An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. LATIN 2004: 29-38
Rui Zhang, Nick Koudas, Beng Chin Ooi, Divesh Srivastava: Multiple Aggregations Over Data Streams. SIGMOD Conference 2005: 299-310
Phillip B. Gibbons, Yossi Matias: Synopsis Data Structures for Massive Data Sets. SODA 1999: 909-910
Mini Textbook
Graham Cormode, Minos N. Garofalakis, Peter J. Haas, Chris Jermaine: Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. Foundations and Trends in Databases 4(1-3): 1-294 (2012)
S. Muthukrishnan: Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science 1(2) (2005)
Check out the 120 page version online
Graph
Graph Mining
Xifeng Yan, Philip S. Yu, Jiawei Han: Graph Indexing: A Frequent Structure-based Approach. SIGMOD Conference 2004: 335-346
James Cheng, Yiping Ke, Wilfred Ng, An Lu: Fg-index: towards verification-free query processing on graph databases. SIGMOD Conference 2007: 857-872
Chen Chen, Cindy Xide Lin, Matt Fredrikson, Mihai Christodorescu, Xifeng Yan, Jiawei Han: Mining Graph Patterns Efficiently via Randomized Summaries. PVLDB 2(1): 742-753 (2009)
Similarity Search
Zhiping Zeng, Anthony K. H. Tung, Jianyong Wang, Jianhua Feng, Lizhu Zhou: Comparing Stars: On Approximating Graph Edit Distance. PVLDB 2(1): 25-36 (2009)
Glen Jeh, Jennifer Widom: SimRank: a measure of structural-context similarity. KDD 2002: 538-543
Sanjeev Arora, Satish Rao, Umesh V. Vazirani: Geometry, flows, and graph-partitioning algorithms. Commun. ACM 51(10): 96-105, 2008
Jure Leskovec, Jon M. Kleinberg, Christos Faloutsos: Graphs over time: densification laws, shrinking diameters and possible explanations. KDD 2005: 177-187
Spatial-temporal and Multi-dimensional Databases
Spatial Databases
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331
Yannis Theodoridis, Timos K. Sellis: A Model for the Prediction of R-tree Performance. PODS 1996: 161-171
Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger, Mario A. Lopez: Indexing the Positions of Continuously Moving Objects. SIGMOD Conference 2000: 331-342
Jun Zhang, Manli Zhu, Dimitris Papadias, Yufei Tao, Dik Lun Lee: Location-based Spatial Queries. SIGMOD Conference 2003: 443-454
Zaiben Chen, Heng Tao Shen, Xiaofang Zhou, Yu Zheng, Xing Xie: Searching trajectories by locations: an efficiency study. SIGMOD Conference 2010: 255-266
Top-k
Ronald Fagin, Amnon Lotem, Moni Naor: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4): 614-656 (2003)
Gao Cong, Christian S. Jensen, Dingming Wu: Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects. PVLDB 2(1): 337-348 (2009)
High-dimensional Space
Jon M. Kleinberg: Two Algorithms for Nearest-Neighbor Search in High Dimensions. STOC 1997: 599-608
Sariel Har-Peled, Piotr Indyk, Rajeev Motwani: Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality. Theory of Computing 8(1): 321-350 (2012)
H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30(2): 364-397 (2005)
Similarity Queries for Texts
Edit Distance
W. Masek and M. Paterson, A Faster Algorithm Computing String Edit Distances, Journal of Computer and System Sciences, vol. 20, pp. 18–31, 1980.
H. Berghel and D. Roach, An extension of ukkonen’s enhanced dynamic programming asm algorithm, ACM Trans. Inf. Syst., vol. 14, pp. 94–106, Jan. 1996.
Gene Myers: A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming. J. ACM 46(3): 395-415 (1999).
Prefix/Suffix Filtering and Mismatch Filtering
R. J. Bayardo, Y. Ma, and R. Srikant, Scaling up all pairs similarity search, in WWW, 2007.
Chuan Xiao, Wei Wang, Xuemin Lin, Jeffrey Xu Yu. Efficient Similarity Joins for Near Duplicate Detection. WWW 2008.
Edit Similarity Queries
Chen Li, Bin Wang, Xiaochun Yang: VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams. VLDB 2007: 303-314
S. Chaudhuri and R. Kaushik, Extending autocompletion to tolerate errors, in Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pp. 707–718, ACM, 2009.
G. Li, D. Deng, J. Wang, and J. Feng, Pass-join: A partition-based method for similarity joins, PVLDB, vol. 5, no. 3, pp. 253–264, 2011.
G. Weikum. DB&IR: both sides now. In SIGMOD, pages 25–30, 2007.
Yi Chen, Wei Wang, Ziyang Liu, Xuemin Lin: Keyword search on structured and semi-structured data. SIGMOD Conference 2009: 1005-1010.
slides available at http://www.cse.unsw.edu.au/~weiw/project/keyword_sigmod09_tutorial.pptx
Also check out a later version: Yi Chen, Wei Wang, Ziyang Liu: Keyword-based search and exploration on databases. ICDE 2011: 1380-1383.
slides available at http://www.cse.unsw.edu.au/~weiw/project/ICDE11_KWS_tutorial_with_refs.pptx
Relational/Graph Algorithms
Yi Luo, Xuemin Lin, Wei Wang, Xiaofang Zhou. SPARK: Top-k Keyword Query in Relational Databases. SIGMOD 2007.
B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding top-k min-cost connected trees in databases. In ICDE, 2007
K. Golenberg, B. Kimelfeld, and Y. Sagiv. Keyword proximity search in complex data graphs. In SIGMOD, 2008.
P. P. Talukdar, M. Jacob, M. S. Mehmood, K. Crammer, Z. G. Ives, F. Pereira, and S. Guha. Learning to create data-integrating queries. PVLDB, 1(1):785–796, 2008.
Hao He, Haixun Wang, Jun Yang, Philip S. Yu: BLINKS: ranked keyword searches on graphs. SIGMOD Conference 2007: 305-316
XML
C. Sun, C.-Y. Chan, and A. Goenka. Multiway SLCA-based keyword search in XML data. In WWW, 2007.
Junfeng Zhou, Zhifeng Bao, Wei Wang, Jinjia Zhao, and Xiaofeng Meng. Efficient query processing for XML keyword queries based on the IDList index. VLDB Journal, 2013.
Yufei Tao, Stavros Papadopoulos, Cheng Sheng, Kostas Stefanidis: Nearest keyword search in XML documents. SIGMOD Conference 2011: 589-600.
Effectiveness
Z. Liu and Y. Chen. Reasoning and identifying relevant matches for xml keyword search. PVLDB, 1(1):921–932, 2008.
Arash Termehchy, Marianne Winslett: Using structural information in XML keyword search effectively. ACM Trans. Database Syst. 36(1): 4 (2011).
D. Xin, Y. He, and V. Ganti. Keyword++: A Framework to Improve Keyword Search Over Entity Databases. PVLDB, 3(1):711–722, 2010.
Xiaohui Yu, Huxia Shi: CI-Rank: Ranking Keyword Search Results Based on Collective Importance. ICDE 2012: 78-89.
Ken Q. Pu: Keyword query cleaning using hidden Markov models. KEYS 2009: 27-32.
Zhen Zhang, Bin He, Kevin Chen-Chuan Chang: Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax. SIGMOD Conference 2004: 107-118.
Uncertain Data Management
Anish Das Sarma, Omar Benjelloun, Alon Y. Halevy, Jennifer Widom: Working Models for Uncertain Data. ICDE 2006: 7
possible world semantics
Reynold Cheng, Dmitri V. Kalashnikov, Sunil Prabhakar: Evaluating Probabilistic Queries over Imprecise Data. SIGMOD Conference 2003: 551-562
simple query, 1D data
Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kay Ngai, Ben Kao, Sunil Prabhakar: Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions. VLDB 2005: 922-933
indexing
Nilesh N. Dalvi, Dan Suciu: The dichotomy of conjunctive queries on probabilistic structures. PODS 2007: 293-302
theoretical foundation
Pei Jian, Bin Jiang, Xuemin Lin, Yidong Yuan: Probabilistic Skylines on Uncertain Data. VLDB 2007: 15-26
skyline
Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
ranking
Pei Jian, Bin Jiang, Xuemin Lin, Yidong Yuan: Probabilistic Skylines on Uncertain Data. VLDB 2007: 15-26
streaming
Xiang Lian, Lei Chen: A Generic Framework for Handling Uncertain Data with Local Correlations. PVLDB 4(1): 12-21 (2010)
George Beskales, Mohamed A. Soliman, Ihab F. Ilyas: Efficient Search for the Top-k Probable Nearest Neighbors in Uncertain Databases. PVLDB 1(1): 326 - 339 (2008).
nearest neighbor
Jianzhong Li, Zhaonian Zou, Hong Gao: Mining frequent subgraphs over uncertain graph databases under probabilistic semantics. VLDB J. 21(6): 753-777 (2012)