Data-Informed Thinking + Doing

Hypothesis Testing

Testing statistical inferences on delivery data—using R, Python, and Julia.

Hypothesis testing is valuable for making inferences about on-time delivery because it offers a systematic way to assess whether factors that seem to influence delivery performance have a statistically significant effect. It lets us formulate hypotheses, collect data, and analyze that data to decide whether the evidence is strong enough to reject a null hypothesis (for example, "shipment mode has no effect on the late-delivery rate") in favor of an alternative.
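As a minimal sketch of that workflow, the Python below tests a null hypothesis about the late-shipment proportion with a one-sample z-test for a proportion (normal approximation, right-tailed). The counts (70 late out of 1,000), the 6% benchmark `p0`, and the helper name `proportion_z_test` are hypothetical placeholders, not values or code from this analysis.

```python
import math


def proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """One-sample z-test for a proportion (normal approximation).

    H0: p == p0; HA: p > p0 (right-tailed).
    Returns (z statistic, one-sided p-value).
    """
    p_hat = successes / n
    # Standard error under the null hypothesis uses p0, not p_hat
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Right-tail probability of a standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical inputs: 70 late shipments out of 1,000, tested against a 6% benchmark
alpha = 0.05
z, p_value = proportion_z_test(successes=70, n=1000, p0=0.06)
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}: {decision}")
```

The standard error uses `p0` rather than the observed proportion because the test statistic is computed under the assumption that the null hypothesis is true.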

For on-time delivery, hypothesis testing helps us evaluate whether factors such as transportation mode, order volume, or geographic location have a measurable impact on delivery performance. By testing hypotheses and drawing statistical conclusions, we can make informed decisions, optimize operations, and identify strategies to improve on-time delivery rates in supply chain management and logistics.
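To make "evaluating the impact of a factor" concrete, here is a sketch of a two-sided permutation test for whether the late-delivery rate differs between two shipment modes. The 0/1 outcome lists `air` and `ocean` are made-up illustrative data, not drawn from the dataset, and `permutation_test` is a hypothetical helper; a permutation test is only one option here (a two-sample proportion z-test or a chi-square test would be common alternatives).

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=42):
    """Two-sided permutation test for a difference in mean of 0/1 outcomes.

    H0: the late-delivery rate is the same in both groups, so the group
    labels are exchangeable. Returns (observed difference, p-value).
    """
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Illustrative (made-up) outcomes: 1 = late, 0 = on time
air = [0] * 45 + [1] * 5       # 10% late
ocean = [0] * 30 + [1] * 10    # 25% late
diff, p = permutation_test(air, ocean)
print(f"difference in late rate = {diff:.3f}, p-value = {p:.4f}")
```

Shuffling the pooled outcomes simulates a world where shipment mode has no effect on lateness; the p-value is the share of shuffles that produce a difference at least as extreme as the observed one.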

Let’s look at how we can apply this technique to the delivery dataset.

Getting Started

If you are interested in reproducing this work, here are the versions of R, Python, and Julia used, along with the respective packages for each. Additionally, this work adopts Leland Wilkinson’s approach to data visualization, the Grammar of Graphics (implemented here by ggplot2, plotnine, and Gadfly). Finally, my coding style here is deliberately verbose, so that it is easy to trace where functions/methods and variables originate, and to make this a learning experience for everyone, including me.
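If you are reproducing the Python side, one quick way to confirm that your environment matches the pinned versions is to query `importlib.metadata` from the standard library (Python 3.8+). The `PINNED` mapping simply copies the Python package versions listed in this post; `check_versions` is a hypothetical helper name.

```python
import sys
from importlib import metadata

# Python package versions pinned in this post
PINNED = {"pandas": "2.0.3", "plotnine": "0.12.1"}


def check_versions(pinned):
    """Return {package: (pinned version, installed version or None, matches)}."""
    report = {}
    for package, wanted in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[package] = (wanted, installed, installed == wanted)
    return report


print(sys.version)
for package, (wanted, installed, ok) in check_versions(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{package}: pinned {wanted}, installed {installed} -> {status}")
```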

# R
cat(R.version$version.string, R.version$nickname)
R version 4.2.3 (2023-03-15) Shortstop Beagle
# Install the exact package versions used in this post
require(devtools)
devtools::install_version("fst", version = "0.9.8", repos = "http://cran.us.r-project.org")
devtools::install_version("dplyr", version = "1.1.2", repos = "http://cran.us.r-project.org")
devtools::install_version("tibble", version = "3.2.1", repos = "http://cran.us.r-project.org")
devtools::install_version("ggplot2", version = "3.4.2", repos = "http://cran.us.r-project.org")
library(fst)
library(dplyr)
library(tibble)
library(ggplot2)
# Python
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun  6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
# Jupyter/IPython shell escape; drop the leading ! when installing from a terminal
!pip install pandas==2.0.3
!pip install plotnine==0.12.1
import random
import datetime
import pandas
import plotnine
# Julia
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
using Pkg
Pkg.add(name="RData", version="1.0.0")
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.5.0")
Pkg.add(name="CategoricalArrays", version="0.10.8")
Pkg.add(name="Colors", version="0.12.8")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.4.0")
# MLJ and GLM are loaded below; their versions are not pinned here
Pkg.add("MLJ")
Pkg.add("GLM")
using Dates  # Standard library; ships with Julia, no Pkg.add needed
using CSV
using DataFrames
using CategoricalArrays
using Colors
using Cairo
using Gadfly
using MLJ
using GLM
# R
late_shipments <- read.fst("../../dataset/late-shipments.fst")
str(late_shipments)
'data.frame':	1000 obs. of  26 variables:
 $ id                      : num  73003 41222 52354 28471 16901 ...
 $ country                 : chr  "Vietnam" "Kenya" "Zambia" "Nigeria" ...
 $ managed_by              : chr  "PMO - US" "PMO - US" "PMO - US" "PMO - US" ...
 $ fulfill_via             : chr  "Direct Drop" "Direct Drop" "Direct Drop" "Direct Drop" ...
 $ vendor_inco_term        : chr  "EXW" "EXW" "EXW" "EXW" ...
 $ shipment_mode           : chr  "Air" "Air" "Air" "Air" ...
 $ late_delivery           : num  0 0 0 1 0 0 0 0 0 0 ...
 $ late                    : chr  "No" "No" "No" "Yes" ...
 $ product_group           : chr  "ARV" "HRDT" "HRDT" "HRDT" ...
 $ sub_classification      : chr  "Adult" "HIV test" "HIV test" "HIV test" ...
 $ vendor                  : chr  "HETERO LABS LIMITED" "Orgenics, Ltd" "Orgenics, Ltd" "Orgenics, Ltd" ...
 $ item_description        : chr  "Efavirenz/Lamivudine/Tenofovir Disoproxil Fumarate 600/300/300mg, tablets, 30 Tabs" "HIV 1/2, Determine Complete HIV Kit, 100 Tests" "HIV 1/2, Determine Complete HIV Kit, 100 Tests" "HIV 1/2, Determine Complete HIV Kit, 100 Tests" ...
 $ molecule_test_type      : chr  "Efavirenz/Lamivudine/Tenofovir Disoproxil Fumarate" "HIV 1/2, Determine Complete HIV Kit" "HIV 1/2, Determine Complete HIV Kit" "HIV 1/2, Determine Complete HIV Kit" ...
 $ brand                   : chr  "Generic" "Determine" "Determine" "Determine" ...
 $ dosage                  : chr  "600/300/300mg" "N/A" "N/A" "N/A" ...
 $ dosage_form             : chr  "Tablet - FDC" "Test kit" "Test kit" "Test kit" ...
 $ unit_of_measure_per_pack: num  30 100 100 100 60 20 100 30 30 25 ...
 $ line_item_quantity      : num  19200 6100 1364 2835 112 ...
 $ line_item_value         : num  201600 542900 109120 252315 1618 ...
 $ pack_price              : num  10.5 89 80 89 14.4 ...
 $ unit_price              : num  0.35 0.89 0.8 0.89 0.24 1.6 0.8 0.55 0.12 0.45 ...
 $ manufacturing_site      : chr  "Hetero Unit III Hyderabad IN" "Alere Medical Co., Ltd." "Alere Medical Co., Ltd." "Alere Medical Co., Ltd." ...
 $ first_line_designation  : chr  "Yes" "Yes" "Yes" "Yes" ...
 $ weight_kilograms        : num  2719 3497 553 1352 1701 ...
 $ freight_cost_usd        : num  4085 40917 7845 31284 4289 ...
 $ line_item_insurance_usd : num  207.24 895.78 112.18 353.75 2.67 ...
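In the str() output, late ("Yes"/"No") and late_delivery (0/1) appear to encode the same outcome: the first four rows read No, No, No, Yes and 0, 0, 0, 1. The sketch below makes that assumed mapping explicit in Python and checks that the two encodings agree, using only the four values printed above; on the full data the same check would run over every row. `ENCODING` and `encode_late` are hypothetical names.

```python
# First four rows of `late` and `late_delivery` as printed by str() above
late = ["No", "No", "No", "Yes"]
late_delivery = [0, 0, 0, 1]

# Assumed encoding: "Yes" -> 1 (late), "No" -> 0 (on time)
ENCODING = {"No": 0, "Yes": 1}


def encode_late(labels):
    """Map Yes/No labels to 1/0; raises KeyError on any unexpected label."""
    return [ENCODING[label] for label in labels]


encoded = encode_late(late)
mismatches = [i for i, (a, b) in enumerate(zip(encoded, late_delivery)) if a != b]
print("encoded:", encoded)
print("rows that disagree:", mismatches)
```

Keeping the numeric 0/1 version is convenient because its mean is directly the proportion of late shipments.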
summary(late_shipments)
column                         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
id                               92     19492     39631     40308     63148     82005
late_delivery                  0.00      0.00      0.00      0.07      0.00      1.00
unit_of_measure_per_pack          1        30        60        82       100      1000
line_item_quantity                1       450      2744     14291     10000    333334
line_item_value                   0     10067     62318    151129    219520   2801262
pack_price                        0         7        24        40        70      1243
unit_price                      0.0       0.1       0.5       1.4       0.9      24.9
weight_kilograms                  1       136       844      2102      2403    154780
freight_cost_usd                 14      1900      5887     11342     15533    289653
line_item_insurance_usd           0        15        95       233       329      4939

Character columns (country, managed_by, fulfill_via, vendor_inco_term, shipment_mode, late, product_group, sub_classification, vendor, item_description, molecule_test_type, brand, dosage, dosage_form, manufacturing_site, first_line_designation): Length 1000, Class character, Mode character.
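Because late_delivery is coded 0/1, its mean in the summary (printed as 0.07) is the sample proportion of late shipments, a natural statistic for hypotheses about on-time delivery. The sketch below estimates that proportion's standard error two ways: the analytic formula sqrt(p_hat * (1 - p_hat) / n) and a bootstrap (resampling with replacement via the `random` module). Since the printed mean is rounded, the stand-in vector of 70 ones and 930 zeros only approximates the real column; with the actual data you would pass the late_delivery values instead. `bootstrap_se` is a hypothetical helper.

```python
import math
import random
import statistics

# Stand-in for late_delivery: 1,000 rows with a mean of 0.07, matching the
# rounded value printed by summary(); this is NOT the real column.
late_delivery = [1] * 70 + [0] * 930
n = len(late_delivery)
p_hat = sum(late_delivery) / n  # mean of a 0/1 variable = proportion of 1s

# Analytic standard error of a sample proportion
se_analytic = math.sqrt(p_hat * (1 - p_hat) / n)


def bootstrap_se(values, n_resamples=2_000, seed=2023):
    """Standard deviation of resampled means (resampling with replacement)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(values, k=len(values))
        means.append(sum(sample) / len(sample))
    return statistics.stdev(means)


se_boot = bootstrap_se(late_delivery)
print(f"p_hat = {p_hat:.3f}, analytic SE = {se_analytic:.4f}, bootstrap SE = {se_boot:.4f}")
```

The two estimates should land close together; the bootstrap version has the advantage of carrying over unchanged to statistics that lack a simple closed-form standard error.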