CIS 5450 Homework 1: Data Wrangling and Cleaning (Fall 2024)

License: CC 4.0 BY-SA

Hello future data scientists and welcome to CIS 5450! In this homework, you will familiarize yourself with Pandas and Polars! Both are cute animals and essential libraries for data science. This homework focuses on one of the most important tasks in data science: preparing datasets so that they can be analyzed, plotted, used to train machine learning models, and so on.

This homework is split into four sections that analyze several datasets:

1. Working with Amazon Prime Video data to understand the details behind its movies.

2. Working with merged/joined versions of the datasets (more on this later).

3. Regex.

4. Working with the Used Cars dataset and Polars to compare performance between Pandas, eager execution in Polars, and lazy execution in Polars.
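The merge/join work in section 2 can be previewed with a minimal sketch. It assumes, hypothetically, that the titles and credits tables share an `id` column identifying each title; the tiny frames below are made-up stand-ins, not the real Amazon Prime data. It contrasts an inner, a left, and an outer join using `DataFrame.merge`.

```python
import pandas as pd

# Made-up stand-ins for titles.csv and credits.csv (hypothetical columns and values)
titles = pd.DataFrame({
    "id": ["t1", "t2", "t3"],
    "title": ["Movie A", "Movie B", "Show C"],
    "type": ["MOVIE", "MOVIE", "SHOW"],
})
credits = pd.DataFrame({
    "id": ["t1", "t1", "t2", "t4"],
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "role": ["ACTOR", "DIRECTOR", "ACTOR", "ACTOR"],
})

# Inner join: keep only rows whose id appears in both frames
inner = titles.merge(credits, on="id", how="inner")

# Left join: keep every title, even one with no credits (name/role become NaN)
left = titles.merge(credits, on="id", how="left")

# Outer join with indicator=True adds a _merge column: both / left_only / right_only
outer = titles.merge(credits, on="id", how="outer", indicator=True)

print(inner)
print(left)
print(outer["_merge"].value_counts())
```

A left join is the natural choice when every title must survive even without cast data; the inner join silently drops such titles, and the outer join's `_merge` column makes unmatched rows on either side easy to spot.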

IMPORTANT NOTE: Before starting, you must click the "Copy To Drive" option in the top bar. This is the master notebook, so you will not be able to save your changes without copying it! Once you have made the copy, make sure you are working in that version of the notebook so that your work is saved.

Run the following four cells to set up the notebook.

%set_env HW_ID=cis5450_fall24_HW1

%%capture
!pip install penngrader-client

from penngrader.grader import *
import pandas as pd
import numpy as np
import seaborn as sns
from string import ascii_letters
import matplotlib.pyplot as plt
import datetime as dt
import requests
from lxml import html
import math
import re
import json
import os

!wget -nc https://storage.googleapis.com/penn-cis5450/credits.csv
!wget -nc https://storage.googleapis.com/penn-cis5450/titles.csv
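Once downloaded, each file can be read into a DataFrame with `pd.read_csv` and inspected. The sketch below uses an in-memory CSV (`io.StringIO`) as a stand-in so it runs without the files; the column names and values are hypothetical, and in the notebook you would pass `"titles.csv"` instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the first few lines of titles.csv
csv_text = """id,title,release_year,runtime
t1,Movie A,1999,120
t2,Movie B,,95
t3,Show C,2015,
"""

# In the notebook this would be pd.read_csv("titles.csv")
titles = pd.read_csv(io.StringIO(csv_text))

print(titles.shape)         # (rows, columns)
print(titles.dtypes)        # inferred column types
print(titles.isna().sum())  # missing values per column
print(titles.head())
```

Note that the empty fields make `release_year` and `runtime` load as `float64`: with pandas' default NumPy-backed dtypes, a missing value forces an integer column to float, which is one of the quirks a cleaning step usually has to handle.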

What is Pandas?

Apart from being an animal, Pandas is a Python library for data manipulation and analysis. It is built on top of NumPy, another Python library that provides efficient computation on arrays and matrices, along with many other numerical operations.
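To see the NumPy foundation in practice, a pandas column can be converted to a NumPy array with `.to_numpy()`, and arithmetic on whole columns is vectorized rather than written as a Python loop. A minimal sketch with made-up numbers (the column names are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"runtime_min": [120, 95, 150], "imdb_score": [7.5, 6.0, 8.2]})

# Each column is backed by a NumPy array
arr = df["runtime_min"].to_numpy()
print(type(arr), arr.dtype)

# Vectorized column arithmetic: no explicit Python loop
df["runtime_hr"] = df["runtime_min"] / 60

# NumPy functions accept pandas objects directly
print(np.mean(df["imdb_score"]))

# Several columns can be pulled out together as a 2-D NumPy matrix
matrix = df[["runtime_min", "imdb_score"]].to_numpy()
print(matrix.shape)  # (3, 2)
```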

Let's also get familiar with the PennGrader. It was developed specifically for CIS 545 by a previous TA, Leonardo Murri.

PennGrader gives students instant feedback on their answers: you submit an answer and immediately learn whether it is right or wrong. We then record your most recent answer in our backend database. Let's tr

最低0.47元/天解锁文章