Part 1: Data Gathering

Part 1.1: Downloading data

Part 2: Data Analysis

Distribution of hourly entries based on rain condition

Observation: Both distributions are highly right-skewed. Hourly entries are noticeably higher when it is not raining, which suggests that more people use the subway in dry weather.
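One way to check this observation numerically is to compare the mean and median of hourly entries for each rain condition. The following is a minimal sketch, not the analysis actually used here: the column names `ENTRIESn_hourly` and `rain`, the helper `summarize_by_rain`, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from statistics import mean, median


def summarize_by_rain(rows, entries_col="ENTRIESn_hourly", rain_col="rain"):
    """Group hourly entries by rain flag and return count, mean and median.

    Column names are assumed, not taken from the original data set.
    """
    groups = {}
    for row in rows:
        flag = int(float(row[rain_col]))  # 0 = no rain, 1 = rain (assumed encoding)
        groups.setdefault(flag, []).append(float(row[entries_col]))
    return {
        flag: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for flag, vals in groups.items()
    }


# Made-up rows for illustration only; these are not real turnstile data.
sample = io.StringIO(
    "ENTRIESn_hourly,rain\n"
    "0,0\n10,0\n20,0\n30,0\n940,0\n"
    "5,1\n15,1\n400,1\n"
)
summary = summarize_by_rain(csv.DictReader(sample))

for flag, stats in sorted(summary.items()):
    print(flag, stats)
```

A mean well above the median in a group is consistent with a right-skewed distribution. If the two groups contain different numbers of rows, comparing per-group means or medians is a more direct check than comparing histogram bar heights.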

Part 3: MapReduce

Preview of weather data CSV file
Preview of mapper results
Preview of reducer results
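
The mapper and reducer steps previewed above follow the usual MapReduce pattern: the mapper turns each CSV row into a key/value pair, and the reducer combines all values that share a key. The sketch below illustrates that pattern only. It is not the mapper or reducer used in this project; the `rain` column, the row-counting logic, and the sample rows are hypothetical.

```python
import csv
from itertools import groupby


def mapper(lines, key_col):
    """Emit one 'key<TAB>1' line per CSV data row (key column is assumed)."""
    for row in csv.DictReader(lines):
        yield f"{row[key_col]}\t1"


def reducer(mapped_lines):
    """Sum the counts per key.

    Expects input sorted by key, which is what the shuffle step
    between mapper and reducer normally provides.
    """
    pairs = (line.split("\t") for line in mapped_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


# Made-up weather rows for illustration only.
csv_lines = [
    "date,rain",
    "2011-05-01,0",
    "2011-05-02,1",
    "2011-05-03,0",
]

mapped = sorted(mapper(csv_lines, "rain"))  # sorting stands in for the shuffle step
reduced = dict(reducer(mapped))
print(reduced)
```

Here the sort between the two steps plays the role of the framework's shuffle phase, so the reducer can process each key's values as one contiguous group.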


Future Improvements

