Photo by Asael Peña on Unsplash


Part 1: Data Gathering

Part 1.1: Downloading data

Part 2: Data Analysis

Distribution of hourly entries based on rain condition

Observation: The distribution is highly right-skewed in both cases. Hourly entries are higher when it doesn’t rain, which suggests that more people use the subway when it is not raining.
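One way to check this observation numerically is to compare the mean and median of hourly entries for each rain condition. The sketch below is only illustrative: the column names `rain` and `ENTRIESn_hourly` are assumptions (the article does not list the dataset's columns), and the sample rows are made up to show the shape of the output.

```python
from statistics import mean, median


def summarize_by_rain(rows):
    """Group hourly entries by rain flag and return count, mean and median per group."""
    groups = {}
    for row in rows:
        # 'rain' and 'ENTRIESn_hourly' are assumed column names, not taken from the article.
        groups.setdefault(int(row["rain"]), []).append(float(row["ENTRIESn_hourly"]))
    return {
        flag: {"count": len(values), "mean": mean(values), "median": median(values)}
        for flag, values in groups.items()
    }


# Tiny made-up sample, only to demonstrate the function.
sample = [
    {"rain": 0, "ENTRIESn_hourly": 10},
    {"rain": 0, "ENTRIESn_hourly": 20},
    {"rain": 0, "ENTRIESn_hourly": 300},
    {"rain": 1, "ENTRIESn_hourly": 5},
    {"rain": 1, "ENTRIESn_hourly": 15},
    {"rain": 1, "ENTRIESn_hourly": 100},
]
summary = summarize_by_rain(sample)
```

A mean well above the median in a group is consistent with a right-skewed distribution. The per-group count is included because histogram bar heights also depend on how many hours fall into each condition.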

Part 3: MapReduce

Preview of weather data CSV file
Preview of mapper results
Preview of reducer results
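
The mapper and reducer code is not shown here, so the following is a generic sketch of the pattern rather than the exact logic used: a mapper that emits tab-separated key/value pairs from CSV rows, and a reducer that sums values per key over key-sorted input. The key column `rain` and value column `ENTRIESn_hourly` are hypothetical names chosen for illustration.

```python
import csv
import io
from itertools import groupby


def mapper(lines):
    """Yield 'key<TAB>value' strings from CSV text lines that include a header row."""
    for row in csv.DictReader(lines):
        try:
            # 'rain' and 'ENTRIESn_hourly' are assumed column names.
            key = row["rain"]
            value = float(row["ENTRIESn_hourly"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with missing or non-numeric fields
        yield f"{key}\t{value}"


def reducer(lines):
    """Sum values per key. Input must be sorted by key, as it is after the shuffle step."""
    pairs = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(float(value) for _, value in group)
        yield f"{key}\t{total}"


# Local simulation of map -> sort -> reduce on a made-up CSV snippet.
sample_csv = "rain,ENTRIESn_hourly\n0,10\n1,5\n0,20\n1,15\n"
mapped = list(mapper(io.StringIO(sample_csv)))
reduced = list(reducer(sorted(mapped)))
```

Sorting the mapper output before calling the reducer stands in for the shuffle phase, which is what guarantees that all values for one key arrive together.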


Future Improvements

Not even a single mention of HDFS?
