The scenario for this case study was as follows:
You are a junior data analyst working on the marketing analyst team at Cyclistic, a bike-share
company in Chicago. The director of marketing believes the company’s future success
depends on maximizing the number of annual memberships. Therefore, your team wants to
understand how casual riders and annual members use Cyclistic bikes differently.
From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Basic summary statistics were obtained from a pivot table.
The planned application for this analysis is to convert casual riders into members, yet casual riders make up only 9% of total rides;
having unique user IDs in the data or at least a user count of members and casual riders would allow for a better picture of the target demographic.
Additionally, one might generally suspect that no one is actually riding for negative 9 minutes or 2,953 hours.
In a real life scenario, Cyclistic could ideally advise on the minimum and maximum time lengths to consider as real rides.
In the absence of both that guidance and any clear breakpoint in the distribution of ride times, 1 minute 1 second was used as the minimum time
since the 2019 data appeared to use that as the minimum rental time.
Using a column of custom set times and a column of COUNTIFS formulas, ride time lengths were sorted into buckets to figure out where to cut off
the longest rides. A skewed axis is used here since the vast majority of rides are clustered around a length of 5 minutes.
Other spreadsheet cleaning included standardizing the column names and value formats of the 2019 data to those of the 2020 data.
The relevant columns of the remaining 782,127 records were then uploaded to BigQuery Sandbox to speed up calculations.
The first area of analysis was looking at what time of day members and casual riders typically start their rides. Rather than trying to plot hundreds of thousands of individual data points, times were rounded to the nearest quarter hour and tallied up with SQL before exporting back to Google Sheets for graphing.
An aggregated area chart of the data gives a clear picture that members have a bimodal distribution with peaks just before and after the 9 to 5 work day, whereas casual riders have a more normal distribution clustered in the afternoon.
Next up was analyzing the data by day of the week.
Oddly, INTERVALs in BigQuery don't appear to have their own FORMAT function and don't export to spreadsheets very well.
As a workaround, a TIME object was created from the EXTRACTed values for average ride length.
The earlier insight on time of day habits inspired Friday to volunteer to act as a weekday for members and as part of the weekend for casual riders.
While casual rides vary more in length than members, they are consistently about 3 times longer than member rides.
Although members take fewer weekend rides, they are still about same length as their short weekday rides; this is likely due to weekend workers and/or errands.
The overall take away from the analysis is that members use the service primarily as a means to commute to work, whereas casual members tend to take longer joy rides.
Marketing strategies might include prompting casual riders to consider bike-sharing as a commute option, or price comparisons on how many longer rides would make membership cheaper than paying per ride.
Rider IDs or data on the number of members & casual riders would allow for comparison of ride frequency.
Geolocation data could show areas needing more stations or bikes to convert casual riders to members.
Annualized data analysis may reveal seasonal differences between rider types.
Cyclistic advised 8% of users use assistive options; data on bike type could be another area of analysis.
About Me
A math and science nerd looking for work in the data analysis field. I've previously worked as an analyst in medical management consulting and worldwide freight-forwarding logistics and am returning to data analysis after working as a math teacher and tutor.
I sometimes spend more time making spreadsheets to analyze video games than actually playing them
and have extensive professional experience with Google Sheets and Microsoft Excel
(including VBA Macros and both Office Suites.)
Other video game time-thieves include Python and AutoHotkey,
and I also have practice with SQL, R, Java, C,
and familiarity with Tableau, HTML, CSS, and PHP.
