- Assignment title: APAN 5400 - Assignment 8 MapReduce Application
- Course: Columbia University APAN 5400 Managing Data
- Completion period: 2 days
This assignment reinforces your understanding of the MapReduce algorithm by having you trace how it works on a specific example.
Details
Step 1 - Run the MongoDB Docker Container
Refer to our MongoDB Installation Guide
Step 2 - Create a MongoDB database
Refer to Week5_MongoDB.ipynb, the MongoDB Basics Tutorial, and the PyMongo Tutorial.
Using PyMongo (see the Week10_MapReduce.ipynb notebook, which will be sent to you as an announcement), create a MongoDB database collection by importing the data from the webhose_netflix.json file.
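The import step can be sketched as follows. This is a minimal sketch, not the course's reference solution. It assumes the file is a JSON array, a single JSON object, or JSON Lines (one article per line), that the container exposes MongoDB at `mongodb://localhost:27017`, and that the database and collection names `webhose` and `netflix` are hypothetical choices. `pymongo` is imported inside `import_articles` so the parser can be tried without a running server.

```python
import json
from typing import Any, Dict, List


def load_articles(path: str) -> List[Dict[str, Any]]:
    """Read articles from a JSON array, a single JSON object, or JSON Lines."""
    with open(path, encoding="utf-8") as fh:
        raw = fh.read()
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Not a single JSON value: fall back to JSON Lines, one document per line.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def import_articles(
    path: str,
    uri: str = "mongodb://localhost:27017",  # assumed port mapping of the container
    db_name: str = "webhose",  # hypothetical database name
    coll_name: str = "netflix",  # hypothetical collection name
) -> int:
    """Load the file into a fresh collection and return its document count."""
    from pymongo import MongoClient  # lazy import: load_articles needs only the stdlib

    docs = load_articles(path)
    client = MongoClient(uri)
    try:
        coll = client[db_name][coll_name]
        coll.drop()  # re-running the import starts from an empty collection
        if docs:  # insert_many rejects an empty list
            coll.insert_many(docs)
        return coll.count_documents({})
    finally:
        client.close()
```

With the container from Step 1 running, `import_articles("webhose_netflix.json")` should return the number of imported articles. The `drop()` call is a design choice so repeated runs do not duplicate documents; remove it if you prefer to append.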
Step 3 - Build PyMongo MapReduce queries (with explicit map and reduce functions) against the collection to do the following:
Refer to the Week 10 Class Exercise, the MongoDB MapReduce Tutorial, and the PyMongo Tutorial.
- Use MapReduce functions in MongoDB and PyMongo to get the total count of “Netflix” mentions in the body (“text” field) of articles with a domain rank below 20,000.
- Use MapReduce functions in MongoDB and PyMongo to find and print the names of distinct person entities with their counts, sorted in descending order of count.
- Use MapReduce functions in MongoDB and PyMongo to find and print the titles of articles published between 2020-05-20 and 2020-05-30 whose titles contain both “Netflix” and “Disney”.
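A sketch of the first query (total “Netflix” mentions) follows. PyMongo 4 removed the `Collection.map_reduce` helper, so this sketch sends the `mapReduce` command through `Database.command` (map-reduce has been deprecated since MongoDB 5.0 but the command is still accepted). It assumes the domain rank is stored at `thread.domain_rank`, that a “mention” is a case-sensitive occurrence of `Netflix` in `text`, and that the collection is the hypothetical `netflix`. `trace_netflix_count` is a hypothetical pure-Python helper that replays the query, map, shuffle, and reduce phases so you can trace the algorithm by hand.

```python
import re
from typing import Any, Dict, List

# Server-side JavaScript. Every matching article emits its number of
# case-sensitive "Netflix" occurrences under one shared key.
MAP_NETFLIX = """
function () {
  var hits = (this.text || "").match(/Netflix/g);
  emit("Netflix", hits ? hits.length : 0);
}
"""

# Addition is associative and commutative, so MongoDB may re-reduce
# partial sums in any order and still return the correct total.
REDUCE_SUM = """
function (key, values) {
  return Array.sum(values);
}
"""

# Assumption: the domain rank lives at thread.domain_rank.
RANK_FILTER = {"thread.domain_rank": {"$lt": 20000}}


def netflix_count_command(coll_name: str = "netflix") -> Dict[str, Any]:
    """Build the mapReduce command; the command name must be the first key."""
    return {
        "mapReduce": coll_name,
        "map": MAP_NETFLIX,
        "reduce": REDUCE_SUM,
        "query": RANK_FILTER,  # filter applied before the map phase
        "out": {"inline": 1},  # return results instead of writing a collection
    }


def trace_netflix_count(articles: List[Dict[str, Any]]) -> int:
    """Replay query, map, shuffle and reduce in plain Python."""
    emitted = []  # map phase output: (key, value) pairs
    for doc in articles:
        rank = (doc.get("thread") or {}).get("domain_rank")
        if isinstance(rank, (int, float)) and rank < 20000:
            hits = re.findall(r"Netflix", doc.get("text") or "")
            emitted.append(("Netflix", len(hits)))
    grouped: Dict[str, List[int]] = {}  # shuffle phase: group values by key
    for key, value in emitted:
        grouped.setdefault(key, []).append(value)
    return sum(grouped.get("Netflix", []))  # reduce phase


def run_netflix_count(db) -> int:
    """Run the command on a live pymongo Database and return the total."""
    rows = db.command(netflix_count_command())["results"]
    return int(rows[0]["value"]) if rows else 0
```

For example, passing `MongoClient()["webhose"]` (from `pymongo`) as `db` to `run_netflix_count` would return the total. A single shared key is used because the question asks for one overall count rather than a per-article breakdown.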
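The person-entity query can be sketched the same way, again through `Database.command` because PyMongo 4 no longer ships a `map_reduce` helper. This sketch assumes each article stores people as `entities.persons`, a list of objects with a `name` field, and counts one per listed entry; the collection name `netflix` and the helper names are hypothetical. The mapReduce `sort` option orders the input documents, not the output, so the inline results are sorted in Python (count descending, then name ascending for ties).

```python
from typing import Any, Dict, Iterable, List, Tuple

# Emit (person name, 1) for every person entity listed in an article.
MAP_PERSONS = """
function () {
  var persons = (this.entities && this.entities.persons) || [];
  persons.forEach(function (p) {
    if (p && p.name) { emit(p.name, 1); }
  });
}
"""

# Sum the 1s (and any partial sums) emitted for the same name.
REDUCE_SUM = """
function (key, values) {
  return Array.sum(values);
}
"""


def persons_command(coll_name: str = "netflix") -> Dict[str, Any]:
    """Build the mapReduce command for person-entity counts."""
    return {
        "mapReduce": coll_name,
        "map": MAP_PERSONS,
        "reduce": REDUCE_SUM,
        "out": {"inline": 1},
    }


def sort_counts(rows: Iterable[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Order mapReduce rows by count (descending), then name (ascending)."""
    pairs = [(row["_id"], int(row["value"])) for row in rows]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


def trace_person_counts(articles: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Plain-Python replay of the map and reduce phases above."""
    counts: Dict[str, int] = {}
    for doc in articles:
        for person in (doc.get("entities") or {}).get("persons") or []:
            if person and person.get("name"):
                counts[person["name"]] = counts.get(person["name"], 0) + 1
    return sort_counts({"_id": k, "value": v} for k, v in counts.items())


def print_person_counts(db) -> None:
    """Run the command on a live pymongo Database and print name: count."""
    rows = db.command(persons_command())["results"]
    for name, count in sort_counts(rows):
        print(f"{name}: {count}")
```

Values returned by the server are JavaScript numbers (doubles), which is why `sort_counts` converts them with `int` before printing.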
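A sketch of the title query follows, sent through `Database.command` since PyMongo 4 dropped its `map_reduce` helper. It assumes `published` was imported as an ISO-8601 string (for example `2020-05-21T10:00:00.000+03:00`), so a string range on the date prefix selects the window. The exclusive upper bound `2020-05-31` keeps every article dated 2020-05-30 and ignores time-zone offsets. The map function keeps titles containing both terms (case-sensitive) and uses the title itself as the key, so the reduce step collapses duplicate titles. The collection name `netflix` and the helper names are hypothetical.

```python
from typing import Any, Dict, List

# Emit a title once per article when it mentions both companies.
MAP_TITLES = """
function () {
  var t = this.title || "";
  if (t.indexOf("Netflix") !== -1 && t.indexOf("Disney") !== -1) {
    emit(t, 1);
  }
}
"""

# Using the title as the key means reduce merges duplicate titles.
REDUCE_SUM = """
function (key, values) {
  return Array.sum(values);
}
"""

# Assumption: ISO-8601 strings, so lexicographic order matches date order.
DATE_FILTER = {"published": {"$gte": "2020-05-20", "$lt": "2020-05-31"}}


def titles_command(coll_name: str = "netflix") -> Dict[str, Any]:
    """Build the mapReduce command for the Netflix/Disney title query."""
    return {
        "mapReduce": coll_name,
        "map": MAP_TITLES,
        "reduce": REDUCE_SUM,
        "query": DATE_FILTER,
        "out": {"inline": 1},
    }


def trace_titles(articles: List[Dict[str, Any]]) -> List[str]:
    """Plain-Python replay: filter by date, map titles, dedupe, sort."""
    lo = DATE_FILTER["published"]["$gte"]
    hi = DATE_FILTER["published"]["$lt"]
    keys = set()
    for doc in articles:
        published = doc.get("published")
        if not (isinstance(published, str) and lo <= published < hi):
            continue
        title = doc.get("title") or ""
        if "Netflix" in title and "Disney" in title:
            keys.add(title)
    return sorted(keys)


def print_titles(db) -> None:
    """Run the command on a live pymongo Database and print each title."""
    rows = db.command(titles_command())["results"]
    for title in sorted(row["_id"] for row in rows):
        print(title)
```

If `published` was instead stored as a BSON date, the string bounds in `DATE_FILTER` would need to become `datetime` values.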
Assessment
See the attached rubric for detailed assessment criteria.
Submission
To complete your submission:
- Submit a PDF file or Word document.
- Click the blue Submit Assignment button at the top of this page.
- Click the Choose File button, and locate your submission.
- Feel free to include a comment with your submission.
- Finally, click the blue Submit Assignment button.
...