APAN 5400 - Managing Data - Assignment 1 - Data Types & Interfaces


  • Assignment title: APAN 5400 - Assignment 1 Data Types & Interfaces
  • Course name: Columbia University APAN 5400 Managing Data
  • Completion time: 2 days

In the Week 2 module, you have learned about various data structures, types, representations, encodings, and formats such as JSON, HTML, XML, and PDF. You have also been introduced to the RESTful API technology for accessing and retrieving data from the backend data storage.

1. Overview

In this assignment, you will import data from the provided JSON and CSV data files and implement a simple REST API service to access and interact with the data programmatically. The API service may be implemented using the Python requests or Flask packages. The purpose of this assignment is to assess your understanding of the basic data structures, types, formats, and interfaces.
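As an illustration of the import step, here is a minimal sketch that reads a JSON file and a CSV file into lists of dictionaries using only the standard library, then reports the field names and record count of each. The helper names (`load_json_records`, `load_csv_records`, `schema`), the stand-in file names, and the sample field names are hypothetical, and the JSON loader assumes the file holds either a single JSON array or one object per line; the actual layout of the provided files may differ.

```python
import csv
import json
import os
import tempfile


def load_json_records(path):
    """Load JSON objects from a file holding one JSON array or one object per line."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_csv_records(path):
    """Load CSV rows as a list of dictionaries keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def schema(records):
    """Return the sorted union of top-level keys across all records."""
    return sorted({key for record in records for key in record})


# Tiny stand-in files with made-up contents so the sketch runs on its own.
workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "sample.json")
csv_path = os.path.join(workdir, "sample.csv")
with open(json_path, "w", encoding="utf-8") as f:
    f.write('{"title": "Apple event", "url": "http://example.com/a"}\n')
    f.write('{"title": "Orchard news", "url": "http://example.com/b"}\n')
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    f.write("name,city\nAcme,New York\nGlobex,Boston\n")

feeds = load_json_records(json_path)
companies = load_csv_records(csv_path)
print("JSON fields:", schema(feeds), "count:", len(feeds))
print("CSV fields:", schema(companies), "count:", len(companies))
```

A Pandas dataframe would serve equally well here; the list-of-dictionaries form is used only to keep the sketch free of third-party dependencies.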

2. Objectives

After completing this assignment, you will be able to:

  • Import data from files into data structures in Python.
  • Recognize Python packages that deal with specific document formats (Word, JSON, PDF, HTML, etc.) and encodings (UTF-8, ASCII, etc.).
  • Extract documents from web-based repositories using API services.
  • Extract metadata from different types of documents, such as emails, tweets, web pages.
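To make the difference between the encodings mentioned above concrete, the following short sketch encodes one string as UTF-8 and as ASCII. The sample string is made up for illustration; the point is only that text containing accented characters can be represented in UTF-8 but not in plain ASCII.

```python
# A sample string containing three non-ASCII characters (e with acute accent).
text = "Caf\u00e9 r\u00e9sum\u00e9"

# UTF-8 uses more than one byte for each accented character,
# so the byte count exceeds the character count.
utf8_bytes = text.encode("utf-8")
print(len(text), "characters,", len(utf8_bytes), "bytes in UTF-8")

# ASCII covers only code points 0-127, so strict encoding fails here.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("Encodable as ASCII:", ascii_ok)

# A lossy alternative: replace unencodable characters with '?'.
lossy = text.encode("ascii", errors="replace")
print(lossy)

# Decoding with the same encoding restores the original text.
roundtrip = utf8_bytes.decode("utf-8")
```

This is why passing an explicit `encoding="utf-8"` to `open()` is a common precaution when reading data files whose contents may include non-ASCII text.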

3. Details

Download and unzip the provided Webhose dataset file (webhose_apple.json, in JSON format) and the Crunchbase data file (cb_sample.csv, in CSV format).

Develop Python modules in Jupyter Lab or in Google Colab to do the following. Please use the Demo Code given to you as a reference:

  1. Read JSON feeds from the Webhose file (webhose_apple.json) into an array of dictionaries or a Pandas dataframe
    • Print the JSON object schema
    • Print the count of JSON objects in the collection
  2. Read entries from the Crunchbase CSV file (cb_sample.csv) into an array of dictionaries or a Pandas dataframe
    • Print the CSV schema
    • Print the count of entries in the collection
  3. Implement Flask API functions in Jupyter Notebook (do not use Google Colab) that
    • Return a list of Crunchbase companies (from the cb_sample.csv file) based in the city of New York
    • Take user input through an API endpoint and return a list of JSON objects whose “title” fields contain the query string
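The two API functions in step 3 could be sketched roughly as follows. This is a minimal illustration, not the reference solution: the route paths, the `q` query parameter, the `name` and `city` field names, and the in-memory sample records are all assumptions made so the example runs on its own, and the case-insensitive title match is one possible reading of "contain the query string". In the assignment, the two lists would be filled from cb_sample.csv and webhose_apple.json instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in records; the real data would be loaded from
# cb_sample.csv and webhose_apple.json. Field names here are assumed.
COMPANIES = [
    {"name": "Acme", "city": "New York"},
    {"name": "Globex", "city": "Boston"},
]
FEEDS = [
    {"title": "Apple unveils new phone"},
    {"title": "Orchard harvest report"},
]


@app.route("/companies/new-york")
def companies_in_new_york():
    """Return the companies whose city field equals 'New York'."""
    matches = [c for c in COMPANIES if (c.get("city") or "").strip() == "New York"]
    return jsonify(matches)


@app.route("/feeds/search")
def search_feeds():
    """Return feed objects whose title contains the ?q= value (case-insensitive)."""
    query = request.args.get("q", "").lower()
    if not query:
        return jsonify({"error": "missing query parameter 'q'"}), 400
    matches = [f for f in FEEDS if query in (f.get("title") or "").lower()]
    return jsonify(matches)


# Exercise both endpoints in-process with Flask's test client,
# so no server needs to be running.
client = app.test_client()
print(client.get("/companies/new-york").get_json())
print(client.get("/feeds/search?q=apple").get_json())
```

To serve the endpoints over HTTP instead, `app.run(port=5000)` would start Flask's development server (the port number is arbitrary), after which a client such as `requests.get("http://127.0.0.1:5000/feeds/search", params={"q": "apple"})` could query it from another process. Note that `app.run` blocks the notebook cell it is called from until the server is stopped.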

4. Assessment

Your submission will be assessed based on the implementation and correctness of output for the following items:

  • Reading JSON file into the data structure (25 pts)
  • Reading CSV file into the data structure (25 pts)
  • Python Flask API service (50 pts)

Please see the attached rubric for detailed assessment criteria.

。。。


Author: 量子数字
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-ND 4.0. Please credit the source, 量子数字, when reposting.