Logistics Data — There’s more to it than meets the eye!
I wanted to get a better understanding of logistics data. Logistics usually involves getting a package from A to B. Sounds easy, right? But where A and B are can make all the difference. What’s the mode of transport? Is there a lot of traffic in between? Are A and B in different countries? And think about the other factors that affect a package’s delivery: the weather, public holidays in a country, and even COVID restrictions and regulations. There is a lot to consider, and that’s what makes it an interesting problem to tackle.
Wait, I said ‘problem.’ What I really mean is: what interesting insights could we gather from logistics data?
- Understand what factors affect package delivery
- Estimated time for a package to be delivered
- Predict whether a package will be late or on-time
- Optimize the route of package delivery
- Clustering of package delivery times
So what does a logistics dataset look like? Let’s look at a Kaggle dataset for truck deliveries (download data). Here are some important variables/features.
- A — Latitude-longitude or name of the origin.
- B — Latitude-longitude or name of the destination. Should we look at coordinates or treat places as categorical labels?
- Transportation distance — Total distance of travel.
- Vehicle Type — The type of truck affects the speed and capacity to hold packages.
- Minimum distance to be covered in a day — Drivers may need to cover a certain amount of distance. Different companies have different requirements.
- Type of trip — Regular vendors are on contract; market vendors are not. Are the delivery times different?
- Driver Name — Details of the driver. If a model could be personalized to the driver, would it give more accurate predictions?
- Customer ID — Customer details. Does the driver have to wait longer for certain customers?
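One way to explore the "coordinates or labels?" question is to turn the two coordinate pairs into a straight-line distance and compare it with the reported transportation distance. Here is a minimal sketch using the haversine formula. The field names (`origin_lat`, `transport_distance_km`, and so on) and the example trip are made up for illustration and are not the actual column names of the Kaggle dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; good enough for a rough feature


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def trip_features(trip):
    """Turn one trip record (hypothetical field names) into numeric features."""
    straight = haversine_km(trip["origin_lat"], trip["origin_lon"],
                            trip["dest_lat"], trip["dest_lon"])
    road = trip["transport_distance_km"]
    return {
        "straight_line_km": straight,
        # How much longer the road route is than the direct line
        "detour_ratio": road / straight if straight > 0 else None,
    }


# Made-up example trip, for illustration only
example = {
    "origin_lat": 0.0, "origin_lon": 0.0,
    "dest_lat": 0.0, "dest_lon": 1.0,
    "transport_distance_km": 150.0,
}
features = trip_features(example)
print(features)
```

A high detour ratio could hint at winding roads or indirect routes. Place names treated as categorical labels might still capture things that coordinates miss, such as a destination with a slow unloading bay, so it may be worth trying both.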
So there are lots of interesting questions to ask. But how can you turn this into a prediction problem? You also have to consider possible target variables:
- On-time — Was the package delivered on time? This leads to a binary classification problem, evaluated with metrics such as accuracy, precision, recall, and/or F1.
- Date-of-delivery — When was the package delivered? This is a regression (numerical prediction) problem, evaluated with mean squared error or another error metric such as mean absolute error.
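To make the two targets concrete, here is a small sketch of how they could be built from timestamps and how the metrics mentioned above are computed. The function names, the timestamps, and the prediction lists are all hypothetical. In practice a library such as scikit-learn would likely be used for the metrics; they are written out by hand here only to show what they measure.

```python
from datetime import datetime


def on_time_label(promised, delivered):
    """Classification target: 1 if delivered at or before the promised time."""
    return int(delivered <= promised)


def delivery_hours(shipped, delivered):
    """Regression target: hours from shipment to delivery."""
    return (delivered - shipped).total_seconds() / 3600


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = on time)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up timestamps and predictions, for illustration only
label = on_time_label(datetime(2020, 8, 28), datetime(2020, 8, 27))
hours = delivery_hours(datetime(2020, 8, 1, 8), datetime(2020, 8, 2, 20))
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
mse = mean_squared_error([36.0, 48.0], [36.0, 44.0])
print(label, hours, metrics, mse)
```

Predicting hours until delivery rather than a raw calendar date is one way to turn date-of-delivery into a number a regression model can work with.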
Dealing with logistics data seems like a simple problem but there’s more to it than meets the eye!