My previous project, “MTA Subway Ridership,” presented average weekday subway ridership in bar charts, showing total ridership for the five boroughs, changes in ridership, and station rankings. In the current project, I matched the average weekday ridership data with geographic information and visualized the dataset on an interactive map.
The MTA releases the ridership dataset every year on its website, and the dataset has been visualized before. A recent example is “Measuring a growing subway ridership,” published by The New York World in March 2015. It is an interactive visualization built on Google Maps. Stations are represented by circles of the same size, colored red or green to show an increase or decrease in ridership from the previous year. Since the main purpose of this visualization is to emphasize change in ridership, individual stations are not differentiated, and readers need to click on a circle to find out the ridership of that station. One good feature of this map is that readers can choose to view the stations on an individual subway line through the drop-down menu at the top.
(Credit: The New York World)
The New York Times posted a visualization in its 2010 article “New York City Subways: Mostly Fewer Riders.” This visualization is basically what I had in mind: it shows the ridership breakdown with a sized bubble at each station, along with colored subway lines. Two additional graphs, in different colors, show increases and decreases in ridership. Both the New York Times map and the previous example use a simple gray base map, which reduces distractions. Also, when the reader mouses over a station, an information window gives more specific information.
On the left: ridership is down; on the right: ridership is up
Another visualization, by Visual News, used simplified shapes instead of Google Maps as the base map. Both size and color were used to represent ridership. To some degree this is more intuitive and helps readers pick out stations among overlapping bubbles, but it could also be confusing. This map is not interactive, so station names were added to the map. One thing in this visualization that shaped my design is the attached list of “Top 5/Medium 5 stations,” which is information that many people would look for.
(Credit: Visual News)
Inspired by previous visualizations, I developed my interactive ridership map.
To use the interactive functions, click HERE
I chose the “Positron” base map, a reduced gray map with only neighborhood names. From the second and third examples, it seems that size is more intuitive at a glance. Therefore, instead of using colors, I chose sized bubbles to represent ridership. I used “Heads/Tails” for quantification, since the distribution of ranked station ridership has a big head and a very long tail. Station name, ridership, and rank are shown in an information window when the reader mouses over or clicks on a station. Then I added a layer of subway lines and colored them according to the MTA subway map. To introduce the map to readers, a title, a description, and the top 10 ranked stations are shown on the left side of the map.
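The “Heads/Tails” idea can be illustrated with a head/tail breaks classification: split the values at their mean, keep the part above the mean (the head), and repeat while the head remains a minority. The Python sketch below is only an illustration under stated assumptions. The function names, the 40% threshold, and the sample numbers are hypothetical, and it is not CartoDB's actual implementation.

```python
from bisect import bisect_left
from statistics import mean


def head_tail_breaks(values, max_head_share=0.4):
    """Return break points found by repeatedly splitting at the mean.

    max_head_share is an assumed stopping threshold: splitting stops once
    the values above the mean are no longer a small minority.
    """
    breaks = []
    remaining = list(values)
    while len(remaining) > 1:
        m = mean(remaining)
        head = [v for v in remaining if v > m]
        # Stop when nothing lies above the mean or the head is too large.
        if not head or len(head) / len(remaining) > max_head_share:
            break
        breaks.append(m)
        remaining = head
    return breaks


def classify(value, breaks):
    """Return the class index of a value (0 = tail, higher = further into the head)."""
    return bisect_left(breaks, value)


# Made-up numbers with a long tail, not real ridership figures.
sample = [1, 1, 1, 1, 1, 1, 2, 4, 8, 40]
breaks = head_tail_breaks(sample)
classes = [classify(v, breaks) for v in sample]
```

Each class index could then be mapped to a bubble size, so that the few very busy stations stand out from the many quiet ones.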
The software I used includes Microsoft Excel and CartoDB. The ridership dataset comes from the MTA website, and the shapefiles of station locations and subway lines were found on a personal blog, which built them from public MTA data.
This was convenient because the shapefile is well organized. The trouble is that station names were not spelled the same way as in the ridership dataset, so they could not be matched automatically. I therefore exported the shapefile from CartoDB to a CSV file and manually matched station IDs from the shapefile to the ridership dataset, which took some time. Another problem is that in the ridership dataset, transfer stations and stations with multiple lines are merged. This was done on purpose for counting ridership, but it would result in duplicated ridership data on the map. So I decided to skip shapefile stations that were merged into another station, in order to avoid confusing overlaps. However, the shapefile stations without ridership data still showed up on the map, and I needed to remove them manually.
Since the number of subway lines exceeds the number of colors in the automatic category styling, I colored the subway lines by editing the CSS. I tried to add another layer with the ridership change at each station represented by colored circles, following the first example (The New York World). However, it seems that negative numbers cannot be processed or mapped by the current Choropleth feature. I also hoped to add a filter that lets readers view individual lines, which does not work in CartoDB either.
Nevertheless, this visualization still reveals interesting ridership patterns. We can see from the map that Manhattan is the busiest borough, especially near the Financial District and Midtown. In Brooklyn, the downtown area is relatively busy. The map also suggests that the 7 train and the 4/5/6 trains are more packed on weekdays.
The next step for this project is to try other visualization tools that would allow me to add filters and another layer showing changes in ridership. I also realized that ridership on Staten Island is missing. Perhaps the numbers are small enough that they can be ignored.
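Part of the name matching described above could in principle be automated. The sketch below is a hypothetical illustration, not the process actually used for this map: it normalizes common spelling differences and then looks for the closest name with Python's standard difflib module. The function names, the abbreviation rules, the 0.8 cutoff, and the sample station names are all assumptions, and any automatic match would still need a manual check.

```python
import difflib
import re


def normalize(name):
    """Lowercase a station name and smooth over common spelling differences."""
    name = name.lower()
    name = re.sub(r"\b(street|st)\b\.?", "st", name)
    name = re.sub(r"\b(avenue|ave|av)\b\.?", "av", name)
    name = re.sub(r"[^a-z0-9 ]+", " ", name)  # drop hyphens and other punctuation
    return " ".join(name.split())  # collapse repeated spaces


def match_stations(shapefile_names, ridership_names, cutoff=0.8):
    """Pair each shapefile name with its closest ridership name, if any."""
    lookup = {normalize(n): n for n in ridership_names}
    matched, unmatched = {}, []
    for name in shapefile_names:
        close = difflib.get_close_matches(
            normalize(name), list(lookup), n=1, cutoff=cutoff
        )
        if close:
            matched[name] = lookup[close[0]]
        else:
            unmatched.append(name)  # left for manual review or removal
    return matched, unmatched


# Illustrative names only, not taken from the actual files.
shapefile_names = ["Canal Street", "Bedford Avenue", "Times Sq - 42 St", "Coney Island"]
ridership_names = ["Canal St", "Bedford Av", "Times Sq-42 St"]
matched, unmatched = match_stations(shapefile_names, ridership_names)
```

The unmatched list plays the role of the shapefile stations without ridership data: those are the candidates to inspect by hand or leave off the map.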
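A possible workaround for the negative-number problem, untested here and sketched only under stated assumptions, is to compute a text label for the direction of change before uploading the table, so that the layer can be styled by category rather than by a signed number. The column names (ridership_prev, ridership_curr, change_dir), the function name, and the sample rows below are hypothetical.

```python
def add_change_category(rows, prev_key="ridership_prev", curr_key="ridership_curr"):
    """Return copies of the rows with a signed change and a direction label."""
    labeled = []
    for row in rows:
        change = int(row[curr_key]) - int(row[prev_key])
        if change > 0:
            direction = "increase"
        elif change < 0:
            direction = "decrease"
        else:
            direction = "no change"
        labeled.append({**row, "change": change, "change_dir": direction})
    return labeled


# Made-up rows standing in for an exported CSV, not real ridership figures.
rows = [
    {"station": "A", "ridership_prev": "100", "ridership_curr": "120"},
    {"station": "B", "ridership_prev": "200", "ridership_curr": "150"},
    {"station": "C", "ridership_prev": "80", "ridership_curr": "80"},
]
labeled = add_change_category(rows)
```

With a column like change_dir, each label could be given its own color, similar to the red and green circles in the first example.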