Group Assignment - Case Study
BACKGROUND
BeachBoys BikeShare is a bike share service provider where users can take and return bikes at
any of the 70 stations on their network. The company wants to leverage their data to better
understand and, hopefully, optimize their operations. BikeShare has decided to start by
harnessing analytics to enhance operations in the logistics department, by improving the
redistribution of bikes between stations to meet demand, and ensuring that there are bikes and
return docks available when and where users need them.
As a key step towards tackling this challenge, management has tasked you to develop a model
capable of predicting the net rate of bike renting for a given station, which is defined as the
number of bikes returned to, minus the number of bikes taken from, the given station in a given
hour. In other words, your model should enable BikeShare's logistics team to make the
statement - "In the next hour, the quantity of bikes at station A will change by X".
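The definition above can be sketched in pandas as follows. This is a minimal illustration, not a prescribed method: the toy trips are invented, and the column names ("Start Station", "Start Date", "End Station", "End Date") are assumed to match the field list in THE DATA and may differ in the actual files.

```python
import pandas as pd

def net_rate_per_hour(trips: pd.DataFrame) -> pd.Series:
    """Bikes returned minus bikes taken, per (station, hour)."""
    # Bikes taken: trips counted by start station and the hour they started in.
    taken = (
        trips.groupby(["Start Station", trips["Start Date"].dt.floor("h")])
        .size()
        .rename_axis(["station", "hour"])
    )
    # Bikes returned: trips counted by end station and the hour they ended in.
    returned = (
        trips.groupby(["End Station", trips["End Date"].dt.floor("h")])
        .size()
        .rename_axis(["station", "hour"])
    )
    # A (station, hour) pair missing on one side counts as zero on that side.
    return returned.sub(taken, fill_value=0).astype(int).rename("net_rate")

# Invented toy data, for illustration only.
trips = pd.DataFrame({
    "Start Station": [1, 1, 2],
    "Start Date": pd.to_datetime(
        ["2015-01-05 08:05", "2015-01-05 08:40", "2015-01-05 09:10"]),
    "End Station": [2, 2, 1],
    "End Date": pd.to_datetime(
        ["2015-01-05 08:20", "2015-01-05 09:02", "2015-01-05 09:30"]),
})
net = net_rate_per_hour(trips)
```

Note that station-hours with no trips at all do not appear in this result; they would need to be added as explicit zeros (for example by reindexing over every station and hour) before training a model.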
In addition, management would also like you to:
• Help them understand the factors that affect bike rental, which could inform future
decisions on where to locate BikeShare's stations.
• Help them conceptualize how your prediction may be used to improve the redistribution
of bikes within the network.
• Highlight any assumptions or drawbacks of the analysis, if any, and suggest how they
may be verified or addressed in the future.
ASSIGNMENT
Explore, transform, and visualize the given data as appropriate, before using it to train and
evaluate an appropriate ML model for the problem. Address the issues highlighted by
management as described above, albeit in a less in-depth manner. Finally, articulate your
findings and recommendations in a concise, coherent, and professional manner, making
reference to any earlier results or diagrams as appropriate to support your conclusions.
Please use Python to complete this task, using any libraries you might deem necessary for your
analysis, e.g., pandas, sklearn, etc. Detail your code, analysis findings, and recommendations
clearly in a reproducible Jupyter notebook with appropriate comments and documentation, so
that an individual viewing the notebook will be able to follow through your steps and
understand the reasoning involved and inferences made.
DELIVERABLES
You should upload the following deliverables in a .zip file:
• A Jupyter notebook detailing your analysis and findings for this project,
• A PDF-ed copy of the Jupyter notebook above, which should not exceed 30 pages,
• The datasets used in your analysis, which should be loaded into your notebook,
• Additional files relevant to your analysis, which should be described in your notebook.
Finally, please prepare presentation slides for the group presentation (10-15 mins). All team
members need to present in English.
THE DATA
The company has collected information on the stations, trips taken, and on weather conditions
in each of the cities from September 2014 to August 2015. You can find the data here -
bikes_data.zip (3.1 MB). Below, you will also find detailed information on all the fields available
in the dataset. The way you include this information in your model is up to you and should be
clearly justified and documented in your report. You are free to use any other data sources
provided you specify a link to this information in your report.
Station Data
• Id: station ID number
• Name: name of station
• Lat: latitude
• Long: longitude
• Dock Count: number of total docks at station
• City: one of San Francisco, Redwood City, Palo Alto, Mountain View, or San Jose
Please note that during the period covered by the dataset, several stations were moved. Stations
23, 25, 49, 69, and 72 became respectively stations 85, 86, 87, 88, 89 (which in turn became 90
after a second move).
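One possible way to handle the relocations noted above is to map every station ID to its most recent ID before aggregating trips. The sketch below assumes that only station 89 was moved a second time (to 90), which is how the note reads, and that a moved station should be treated as the same station. Both are modelling assumptions to justify in the report, since a move also changes the location. The names RELOCATIONS and latest_station_id are hypothetical.

```python
import pandas as pd

# Old ID -> new ID, taken from the relocation note; 89 -> 90 is the second move.
RELOCATIONS = {23: 85, 25: 86, 49: 87, 69: 88, 72: 89, 89: 90}

def latest_station_id(station_id: int) -> int:
    """Follow the relocation chain until reaching an ID that was never moved."""
    while station_id in RELOCATIONS:
        station_id = RELOCATIONS[station_id]
    return station_id

# Illustrative IDs only: three relocated stations, one final ID, one untouched ID.
ids = pd.Series([23, 72, 89, 90, 50])
latest = ids.map(latest_station_id)
```

An alternative is to keep the old and new IDs separate and model each location on its own, at the cost of shorter histories per station.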
Trip Data
• Trip Id: numeric ID of bike trip
• Duration: time of trip in seconds
• Start Date: start date of trip with date and time, in Pacific Standard Time
• Start Station: station id of start station
• Start Terminal: numeric reference for start station
• End Date: end date of trip with date and time, in Pacific Standard Time
• End Station: station id for end station
• Subscription Type: Subscriber (annual or 30-day member) or Customer (24-hour or 3-day
member)
Weather Data
• Date: day for which the weather is being reported
• Temperature (day min, mean and max): in F
• Dew point (day min, mean and max): Temperature in F below which dew can form
• Humidity (day min, mean and max): in %
• Pressure (day min, mean and max): Atmospheric pressure at sea level in inches of
mercury
• Visibility (day min, mean and max): distance in miles
• Wind Speed (day max and mean): in mph
• Max Gust Speed: in mph
• Precipitation: total amount of precipitation in inches
• Cloud Cover: scale of 0 (clear) to 8 (totally covered)
• Events: Special meteorological events
• Wind Direction: in degrees
• Zip: ZIP code identifying the city: San Francisco (94107), Redwood City (94063), Palo Alto (94301),
Mountain View (94041), and San Jose (95113)
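Because the weather is reported per day and per Zip, while the prediction target is per station and per hour, one way to combine them is to translate Zip to City and join on city and calendar date. The sketch below is only an illustration under assumptions: the toy frames are invented, and the column names "hour", "net_rate", and "mean_temp_f" are hypothetical stand-ins for whatever the notebook actually produces.

```python
import pandas as pd

# Zip -> City, taken from the field description above.
ZIP_TO_CITY = {
    94107: "San Francisco",
    94063: "Redwood City",
    94301: "Palo Alto",
    94041: "Mountain View",
    95113: "San Jose",
}

def attach_daily_weather(hourly: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Left-join daily weather onto hourly station rows by (City, Date)."""
    weather = weather.assign(
        City=weather["Zip"].map(ZIP_TO_CITY),
        Date=pd.to_datetime(weather["Date"]).dt.normalize(),
    ).drop(columns="Zip")
    # Reduce each hourly timestamp to midnight of its day so it matches Date.
    hourly = hourly.assign(Date=hourly["hour"].dt.normalize())
    return hourly.merge(weather, on=["City", "Date"], how="left")

# Invented toy data, for illustration only.
hourly = pd.DataFrame({
    "station": [50, 2],
    "City": ["San Francisco", "San Jose"],
    "hour": pd.to_datetime(["2015-01-05 08:00", "2015-01-05 09:00"]),
    "net_rate": [-2, 1],
})
weather = pd.DataFrame({
    "Date": ["2015-01-05", "2015-01-05"],
    "Zip": [94107, 95113],
    "mean_temp_f": [52, 55],
})
merged = attach_daily_weather(hourly, weather)
```

A left join keeps every station-hour even when a weather record is missing, so gaps show up as missing values rather than silently dropped rows.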