Factor Analysis of Train Accidents in India

Sanjeev Yadav
Dec 7, 2018
Utkal-Express Derailment: One of the fatal train accidents in 2017

In terms of safety, 2017 was a disappointing year for Indian Railways. Trains are the most common mode of travel for long-distance journeys in India.

They are used more often than flights because trains also cover shorter inter-city trips. Accidents occur, but there are measures that authorities can take to curtail them.

In 2017, India saw many railway accidents caused by several factors, including natural causes and human error.

I did a factor analysis for the years 2002-2017. The results are based on significant accidents up to 2017, that is, accidents that received news coverage and made train travel unsafe.

Data source

I gathered data from various online journals, news archives, Wikipedia articles, etc.

Data were available online for 1890-2017, but I restricted the analysis to 2002-2017 because news reports on many older incidents were inaccurate and some lacked detail.

The 2002-2017 range provided enough observations (114) to cover the significant features affecting India's accident rate, compare them closely, and draw some conclusions.

I’ve hosted the data on my GitHub.

This analysis answers the following questions:

  1. Which region has experienced more accidents?
  2. Which season shows more accidents and why?
  3. What are the major causes?

#1. Which region has experienced more accidents?

Figure 1. Accidents based on region

The distribution by railway division peaks at the more general regions (those covering a bigger area). The codes n, nc, ne, nef and nw (abbreviated division names) collectively represent accidents in North India. Some accidents are hard to categorise at such a granular level.

To tackle this problem, we will group accidents into broader parts of India, i.e., North, East, West, South, and Central, to get a more general picture.
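The regrouping described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the five northern codes come from the text, while the other codes (`s`, `e`, `w`, `c`), the function name `count_by_region` and the sample list are hypothetical.

```python
from collections import Counter

# Codes n, nc, ne, nef and nw are the ones named in the article (North India).
# The remaining codes are hypothetical placeholders for the other regions.
DIVISION_TO_REGION = {
    "n": "North", "nc": "North", "ne": "North", "nef": "North", "nw": "North",
    "s": "South", "e": "East", "w": "West", "c": "Central",
}

def count_by_region(divisions):
    """Count accidents per broad region; unknown codes are grouped as 'Unknown'."""
    return Counter(
        DIVISION_TO_REGION.get(code.strip().lower(), "Unknown")
        for code in divisions
    )

# Made-up sample of division codes, not the real dataset.
sample = ["n", "NC", "s", "nef", "e", "xx"]
region_counts = count_by_region(sample)
print(region_counts.most_common())
```

Keeping an explicit "Unknown" bucket makes the hard-to-categorise accidents visible instead of silently dropping them.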

Figure 2. Accidents categorised by region

The South saw almost half as many accidents as the North. In the North, the state that contributed most to the accident count is Uttar Pradesh.

Accidents in U.P. are so frequent that there is an entire Wikipedia article on them.

I did not visualise district-level data because that would be too specific and time-consuming.

#2. Which season shows more accidents and why?

Figure 3. Accidents categorised by season

These peaks cannot be compared directly across seasons because each season lasts for a different length of time.

Autumn is the shortest season, so its count is also the lowest. We can include more features in this plot to firm up our conclusions.
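One way to make the seasons comparable is to divide each count by the length of the season. The sketch below assumes season lengths in months and uses made-up counts; neither comes from the dataset, and `SEASON_MONTHS` and `accidents_per_month` are names introduced here for illustration.

```python
# Assumed season lengths in months; the article does not state them.
SEASON_MONTHS = {"winter": 3, "summer": 4, "monsoon": 3, "autumn": 2}

def accidents_per_month(counts, months=SEASON_MONTHS):
    """Convert raw seasonal counts into accidents per month of that season."""
    return {season: counts[season] / months[season] for season in counts}

# Made-up counts for illustration only, not the real results.
sample_counts = {"winter": 30, "summer": 32, "monsoon": 27, "autumn": 10}
rates = accidents_per_month(sample_counts)

for season, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{season:<8}{rate:.1f} accidents per month")
```

In this made-up sample, summer has the highest raw count but winter has the highest monthly rate, which is exactly the kind of difference that raw peaks can hide.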

Figure 4. Effect of season and visibility

The blue peaks show that visibility plays a significant role here. Most of the accidents occur at night or in the early hours. There is an unexpected red peak in winter because dense morning fog reduces visibility.
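A breakdown like the one in Figure 4 amounts to counting accidents for each combination of season and visibility. Here is a minimal sketch, assuming each record carries a `season` and a `visibility` field; the field names, the `low`/`high` labels and the sample records are hypothetical and may not match the columns in the actual data.

```python
from collections import Counter

def crosstab(records):
    """Count accidents for each (season, visibility) pair."""
    return Counter((r["season"], r["visibility"]) for r in records)

# Made-up records for illustration only.
sample = [
    {"season": "winter", "visibility": "low"},
    {"season": "winter", "visibility": "low"},
    {"season": "winter", "visibility": "high"},
    {"season": "summer", "visibility": "low"},
]

table = crosstab(sample)
for (season, visibility), n in sorted(table.items()):
    print(f"{season:<8}{visibility:<6}{n}")
```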

#3. What are the major causes?

Figure 5. Accidents categorised by cause

Here is a brief explanation of causes:

  • Attack: External forces such as terrorist bombings, attacks by local mobs, sabotage and hijacking.
  • Human error: Improper signalling, speeding by drivers, unmonitored level crossings and negligence about train timings at level crossings. In one incident, people were watching fireworks while standing on the railway track.
  • Natural: Causes beyond our control, such as heavy rain, flash floods and dense fog.
  • Technical: Malfunctions of the railway system, such as no prior alert for the driver to stop, poor track maintenance, track still under construction, brake failure and no warning about poor track ahead.
  • Unclear track: Boulders, another train or a road vehicle on the track.

Human error is responsible for most train accidents, and these errors are directly related to shortcomings in the system.

Improper signalling and poor track maintenance are weak points for railway safety. In some cases, the train driver overshot a red signal. The railway administration shouldn’t tolerate these mistakes on such a large scale.

Proper scheduling can easily prevent over-speeding at crossings and loop-line entries.


The 16 years of data make it clear that some accidents are preventable.

Natural causes lead to accidents only sporadically. Train accidents occur every year, and in many cases the exact cause is unknown.

If accidents were investigated more rigorously, disaster management could improve.

Internal accidents like compartment fires are rare nowadays.

Human negligence is a significant cause and will remain one as long as train drivers are inattentive, road users stall on tracks, people walk on tracks, and vehicles stay on tracks.


There are some limitations to this data:

  • It doesn’t address a solution for external attacks.
  • The condition of the trains (old or newly launched) is not known.

Feel free to extend this analysis or share new ideas with me. You can find the detailed analysis in the project’s GitHub repo.

In upcoming posts, I will be sharing the projects that I did as a part of Udacity’s Data Analyst Nanodegree Program.


This project would not have been possible without the supervision of Prof. J. K. Nayak, Department of Management Studies, IIT Roorkee.


