APAN 5400 - Managing Data - Assignment 8 MapReduce Application

  • Assignment title: APAN 5400 - Assignment 8 MapReduce Application
  • Course: Columbia University APAN 5400 Managing Data
  • Completion time: 2 days

This assignment reinforces your understanding of the MapReduce algorithm by having you trace how it works on a specific example.
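To make the idea of tracing concrete, here is a minimal pure-Python sketch of the three MapReduce phases (map, shuffle, reduce) applied to a word count. It does not use MongoDB; the function names and the two sample documents are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(documents, map_fn):
    """Apply map_fn to every document and collect the emitted (key, value) pairs."""
    pairs = []
    for doc in documents:
        pairs.extend(map_fn(doc))
    return pairs


def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reduce_fn):
    """Collapse each key's list of values into a single result."""
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def word_count_map(doc):
    """Emit (word, 1) for every whitespace-separated word in the text field."""
    return [(word.lower(), 1) for word in doc["text"].split()]


def sum_reduce(key, values):
    """Add up all the values emitted for one key."""
    return sum(values)


# Hypothetical sample documents, not taken from the assignment data set.
docs = [
    {"text": "Netflix adds new shows"},
    {"text": "Disney and Netflix compete"},
]

pairs = map_phase(docs, word_count_map)
groups = shuffle_phase(pairs)
counts = reduce_phase(groups, sum_reduce)

print(groups["netflix"])  # [1, 1]
print(counts["netflix"])  # 2
```

Printing `pairs`, `groups`, and `counts` in turn shows the intermediate state after each phase, which is the kind of trace the assignment asks for.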


Step 1 - Run the MongoDB Docker Container

Refer to our MongoDB Installation Guide.

Step 2 - Create a MongoDB database

Refer to Week5_MongoDB.ipynb, the MongoDB Basics Tutorial, and the PyMongo Tutorial.

Using PyMongo (see the Week10_MapReduce.ipynb notebook, which will be sent to you as an announcement), create a MongoDB database collection by importing data from the webhose_netflix.json file.
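The import step might look roughly like the sketch below. It assumes the file holds either a single JSON array or one JSON object per line; the actual layout of webhose_netflix.json is not stated here, so check it first. The helper names, the database name `apan5400`, and the collection name `netflix` are hypothetical.

```python
import json


def parse_articles(text):
    """Parse articles from text holding a JSON array or one JSON object per line."""
    text = text.strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def import_articles(collection, path):
    """Insert every article from the file into a PyMongo collection.

    Returns the number of inserted documents.
    """
    with open(path, encoding="utf-8") as handle:
        articles = parse_articles(handle.read())
    if not articles:
        return 0  # insert_many rejects an empty list
    result = collection.insert_many(articles)
    return len(result.inserted_ids)


# Example usage, assuming a MongoDB container listening on localhost:27017:
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   import_articles(client["apan5400"]["netflix"], "webhose_netflix.json")
```

Keeping the parsing separate from the insert makes it easy to inspect a few parsed documents before anything is written to the database.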

Step 3 - Build PyMongo MapReduce queries (with explicit map and reduce functions) against the collection to do the following:

Refer to the Week 10 Class Exercise, the MongoDB MapReduce Tutorial, and the PyMongo Tutorial.

  1. Use MapReduce functions in MongoDB and PyMongo to get the total count of “Netflix” mentions in the body (“text” field) of the articles with a domain rank below 20,000.
  2. Use MapReduce functions in MongoDB and PyMongo to find and print the names of distinct person entities with their counts, sorted by count in descending order.
  3. Use MapReduce functions in MongoDB and PyMongo to find and print titles that contain the terms “Netflix” and “Disney” and were published between 2020-05-20 and 2020-05-30.
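As a starting point for the first task, the sketch below shows one possible shape for an explicit map and reduce pair sent through `db.command("mapReduce", ...)`. Several things here are assumptions rather than facts from the assignment: the domain rank is assumed to live in a `thread.domain_rank` field, the match on "Netflix" is case-sensitive, and the helper names are hypothetical. Note also that recent PyMongo releases no longer provide `Collection.map_reduce` and recent MongoDB servers mark `mapReduce` as deprecated, so the notebook from class may use a different call.

```python
# JavaScript run by the MongoDB server; inside map, `this` is the current document.
MAP_NETFLIX = """
function () {
    var matches = (this.text || "").match(/Netflix/g);
    emit("Netflix", matches ? matches.length : 0);
}
"""

# Sums the per-article counts emitted under the same key.
REDUCE_SUM = """
function (key, values) {
    return Array.sum(values);
}
"""


def netflix_mention_options(max_rank=20000):
    """Build the options for a mapReduce command that counts "Netflix" mentions.

    Assumes the domain rank is stored in `thread.domain_rank`.
    """
    return {
        "map": MAP_NETFLIX,
        "reduce": REDUCE_SUM,
        "query": {"thread.domain_rank": {"$lt": max_rank}},
        "out": {"inline": 1},
    }


def run_map_reduce(db, collection_name, options):
    """Send the mapReduce command to a PyMongo database and return the inline results."""
    reply = db.command("mapReduce", collection_name, **options)
    return reply["results"]


# Example usage, assuming `db` is a PyMongo Database and the data lives in "netflix":
#
#   for row in run_map_reduce(db, "netflix", netflix_mention_options()):
#       print(row["_id"], row["value"])
```

If the server rejects plain strings for `map` and `reduce`, wrapping them in `bson.code.Code` is the usual alternative. The other two tasks can follow the same pattern with a different map function, query filter, and a sort over the returned results.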


See the attached rubric for detailed assessment criteria.


To complete your submission:

  1. Please submit a PDF file or Word Document.
  2. Click the blue Submit Assignment button at the top of this page.
  3. Click the Choose File button, and locate your submission.
  4. Feel free to include a comment with your submission.
  5. Finally, click the blue Submit Assignment button.


Author: 量子数字
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-ND 4.0. Please credit 量子数字 as the source when reposting.