Course Materials

Books

  1. Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman and Jeffrey D. Ullman, Cambridge University Press, 2021 [link]
  2. Data-Intensive Text Processing with MapReduce, Chris Dyer and Jimmy Lin, 2010 [link]
  3. Learning Spark, Holden Karau, Andy Konwinski, Patrick Wendell and Matei Zaharia, O’Reilly Media, Inc., 2015 [link]
  4. Spark: The Definitive Guide: Big Data Processing Made Simple, Bill Chambers and Matei Zaharia, O’Reilly Media, Inc., 2018 [link]

Tools

  1. Libraries and Languages
    • Python: The Python programming language [link]
    • Spark: Data science, data engineering and machine learning tool [link]
    • Scala: The Scala Programming Language [link]
    • Hadoop: The Hadoop library framework [link]

Additional Course Material


Research Papers | Paper Link | GitHub Link | Related Videos | Related Tutorials
MapReduce: Simplified Data Processing on Large Clusters | [Link]
Bigtable: A Distributed Storage System for Structured Data | [Link]
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing | [Link]
Discretized Streams: Fault-Tolerant Streaming Computation at Scale | [Link]