Course Materials
Books
- Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman, Cambridge University Press, 2021 [link]
- Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer, Morgan & Claypool, 2010 [link]
- Learning Spark, Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia, O’Reilly Media, Inc., 2015 [link]
- Spark: The Definitive Guide: Big Data Processing Made Simple, Bill Chambers and Matei Zaharia, O’Reilly Media, Inc., 2018 [link]
Tools
- Libraries and Languages
Additional Course Material
- Additional learning material will be added as the course progresses.
| Research Paper | Paper Link | GitHub Link | Related Videos | Related Tutorials |
|---|---|---|---|---|
| MapReduce: Simplified Data Processing on Large Clusters | [Link] | | | |
| Bigtable: A Distributed Storage System for Structured Data | [Link] | | | |
| Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing | [Link] | | | |
| Discretized Streams: Fault-Tolerant Streaming Computation at Scale | [Link] | | | |




