Logistics Data — There’s more to it than meets the eye!

Eddie Toth
May 31, 2021

I wanted to get a better understanding of logistics data. Logistics usually involves getting a package from A to B. Sounds easy, right? But where A and B are can make all the difference. What’s the mode of transport? Is there a lot of traffic in between? Are A and B in different countries? And think about the other factors that affect a package’s delivery: the weather, holidays in a country, and even COVID restrictions and regulations. There is a lot to consider, and that’s what makes it an interesting problem to tackle.

Wait, I said ‘problem.’ What I really mean is: what interesting insights could we gather from logistics data?

  1. Understand what factors affect package delivery
  2. Estimate the time for a package to be delivered
  3. Predict whether a package will be late or on-time
  4. Optimize the route of package delivery
  5. Cluster package delivery times

So what does a logistics dataset look like? Let’s look at a Kaggle dataset for truck deliveries (download data). Here are some important variables/features.

  • A: Latitude-longitude or name of the origin.
  • B: Latitude-longitude or name of the destination. Should we look at coordinates or treat places as categorical labels?
  • Transportation distance: Total distance of travel.
  • Vehicle Type: The type of truck affects the speed and the capacity to hold packages.
  • Minimum distance to be covered in a day: Drivers may need to cover a certain distance each day. Different companies have different requirements.
  • Type of trip: Regular vendors are on contract; market vendors are not. Are the delivery times different?
  • Driver Name: Details of the driver. If a model could be personalized to the driver, would it give more accurate predictions?
  • Customer ID: Customer details. Does the driver have to wait longer for certain customers?
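
On the coordinates-versus-labels question for A and B: one thing coordinates give you that place names don't is a distance you can compute yourself. Here is a minimal sketch, assuming the origin and destination are available as latitude-longitude pairs in degrees. The function name `haversine_km` and the sample coordinates are my own illustration, not columns from the Kaggle dataset, and the result is a straight-line distance, not the road distance a truck actually drives.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Made-up trip for illustration: approximate city-centre coordinates.
origin = (28.6139, 77.2090)       # Delhi (approximate)
destination = (19.0760, 72.8777)  # Mumbai (approximate)

straight_line_km = haversine_km(*origin, *destination)
print(f"Straight-line distance: {straight_line_km:.0f} km")
```

Comparing this straight-line figure with the reported transportation distance could hint at how indirect a route is, while the place names can still be kept as categorical labels alongside it.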

So there are lots of interesting questions to ask. But how can you turn this into a prediction problem? You also have to consider possible target variables:

  • On-time: Was the package delivered on time? This leads to a binary classification problem, evaluated with metrics such as accuracy, precision, recall, and/or F1.
  • Date-of-delivery: When was the package delivered? This is a numerical prediction (regression) problem, evaluated with mean squared error or another error metric.
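
To make the two targets concrete, here is a rough sketch in plain Python. It assumes each trip has a planned arrival time and an actual delivery time; the names `planned_eta` and `actual_delivery` are hypothetical, and all the numbers are made up. For the numerical target I express delivery as trip duration in hours, which is my own simplification so that the error has a unit.

```python
from datetime import datetime


def is_on_time(planned_eta, actual_delivery):
    """Binary target: 1 if delivered at or before the planned ETA, else 0."""
    return int(actual_delivery <= planned_eta)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = on time)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up example: one trip turned into a binary label.
label = is_on_time(datetime(2021, 5, 30, 18, 0), datetime(2021, 5, 30, 17, 15))

# Made-up labels and predictions for four trips.
print(classification_metrics([1, 0, 1, 1], [1, 1, 0, 1]))

# Made-up trip durations in hours: actual vs. predicted.
print(mean_squared_error([10.0, 20.0], [12.0, 17.0]))
```

Which metric matters most depends on the cost of each mistake. If late packages are rare, accuracy alone can look good while the model misses most of them, which is why precision and recall are worth reporting too.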

Dealing with logistics data seems like a simple problem, but there’s more to it than meets the eye!

If you learned something new, please like, share and comment below.
