CIS 5450 Homework 1: Data Wrangling and Cleaning (Fall 2024)

Hello, future data scientists, and welcome to CIS 5450! In this homework, you will familiarize yourself with Pandas and Polars! Both are cute animals and essential libraries for data science. This homework focuses on one of the most important tasks in data science: preparing datasets so that they can be analyzed, plotted, used to train machine learning models, and so on.

This homework is split into four sections that analyze several datasets:

1. Working with Amazon Prime Video data to understand the details behind its movies.

2. Working with merged/joined versions of the datasets (more on this later).

3. Regular expressions (regex).

4. Working with a used cars dataset and Polars to compare performance between Pandas, eager execution in Polars, and lazy execution in Polars.
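Section 2 works with merged/joined versions of the datasets. As a minimal sketch of what a join looks like in pandas, the example below assumes that `titles.csv` and `credits.csv` share an `id` column identifying each title. The column names `type`, `name`, and `role` and all values are hypothetical stand-ins, not rows from the real files.

```python
import pandas as pd

# Hypothetical stand-ins for titles.csv and credits.csv (values are made up;
# the assumption is that both tables share an "id" column per title).
titles = pd.DataFrame({
    "id": ["t1", "t2", "t3"],
    "title": ["Movie A", "Movie B", "Show C"],
    "type": ["MOVIE", "MOVIE", "SHOW"],
})
credits = pd.DataFrame({
    "id": ["t1", "t1", "t2", "t9"],
    "name": ["Actor X", "Actor Y", "Actor X", "Actor Z"],
    "role": ["ACTOR", "ACTOR", "ACTOR", "DIRECTOR"],
})

# Inner join: keep only rows whose id appears in both tables
inner = titles.merge(credits, on="id", how="inner")

# Left join: keep every title; titles without credits get NaN in credit columns.
# indicator=True adds a "_merge" column saying where each row came from.
left = titles.merge(credits, on="id", how="left", indicator=True)

print(inner)
print(left[["title", "name", "_merge"]])
```

Here the inner join drops `Show C` (no credits) and the credit for `t9` (no matching title), while the left join keeps `Show C` and marks it `left_only` in `_merge`, which is a quick way to spot unmatched keys.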

IMPORTANT NOTE: Before starting, you must click the "Copy To Drive" option in the top bar. This is the master notebook, so you will not be able to save your changes without copying it! Once you have made the copy, make sure you are working in that copy so that your work is saved.

Run the following 4 cells to set up the notebook:

%set_env HW_ID=cis5450_fall24_HW1

%%capture
!pip install penngrader-client

from penngrader.grader import *
import pandas as pd
import numpy as np
import seaborn as sns
from string import ascii_letters
import matplotlib.pyplot as plt
import datetime as dt
import requests
from lxml import html
import math
import re
import json
import os

!wget -nc https://storage.googleapis.com/penn-cis5450/credits.csv
!wget -nc https://storage.googleapis.com/penn-cis5450/titles.csv
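Once the two CSVs are downloaded, a typical first step is to load them with `pd.read_csv` and check their shape, column types, and missing values. The sketch below reads from an in-memory string instead of `titles.csv` so that it runs on its own; the column names and rows are hypothetical, and dropping rows without a score is just one illustrative cleaning choice.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for titles.csv; in the notebook you would
# call pd.read_csv("titles.csv") on the downloaded file instead.
csv_text = """id,title,release_year,runtime,imdb_score
t1,Movie A,1999,95,7.1
t2,Movie B,2005,,6.4
t3,Show C,2012,42,
"""

titles = pd.read_csv(io.StringIO(csv_text))

print(titles.shape)         # (rows, columns)
print(titles.dtypes)        # inferred type of each column
print(titles.isna().sum())  # number of missing values per column

# Illustrative cleaning step: drop rows that have no IMDb score
scored = titles.dropna(subset=["imdb_score"])
print(scored)
```

Note that an empty field makes pandas read the whole `runtime` column as `float64`, because `NaN` is a float; this is a common reason numeric columns come back with an unexpected type.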

What is Pandas?

Apart from being an animal, Pandas is a Python library for data manipulation and analysis. It is built on top of NumPy, another Python library that provides efficient computation on arrays, matrices, and other numerical data.
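To make the NumPy connection concrete: a pandas Series can hand back its values as a NumPy array, and arithmetic or comparisons on a column are applied element-wise to the whole array at once instead of through a Python loop. A minimal sketch with made-up runtimes:

```python
import numpy as np
import pandas as pd

# Made-up runtimes in minutes
runtimes = pd.Series([95, 120, 42], name="runtime")

arr = runtimes.to_numpy()   # the values as a NumPy ndarray
hours = runtimes / 60       # vectorized: divides every element at once
long_mask = runtimes > 90   # element-wise comparison -> boolean Series

print(type(arr))                     # <class 'numpy.ndarray'>
print(hours.round(2).tolist())       # [1.58, 2.0, 0.7]
print(runtimes[long_mask].tolist())  # [95, 120]
```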

Let's also get familiar with the PennGrader. It was developed specifically for CIS 545 by a previous TA, Leonardo Murri.

PennGrader was developed to give students instant feedback on their answers: you can submit an answer and find out right away whether it is right or wrong. We then record your most recent answer in our backend database. Let's tr
