Sampling: First and Last Records
Fast, efficient, and easy verification for initial data exploration—using Julia, Python, and R.
Imagine you’ve just received a CSV file containing student mathematics performance data. The school district claims they’ve been consistent with their data collection, but you’ve been around long enough to know that “consistent” is relative. Before you invest hours building models or generating reports, you need answers to critical questions:
- Did the data load properly?
- Are the column names intact?
- Did the data collection methods change over time?
This is where a quick look at the first and last n rows shines. By examining both extremes of your dataset, you can quickly spot issues that would otherwise derail your analysis hours later.
Why Both Ends Matter
Looking at just the first few rows tells you how the data collection began. Looking at the last few rows shows you how it ended. In time-ordered data, this is especially valuable. Perhaps the school changed their testing format midway through. Maybe they added new demographic fields in year three. Or perhaps there’s a data quality issue where recent entries are incomplete. Checking both ends takes seconds but can save you from embarrassing mistakes, like training a model on inconsistent data formats or missing a critical data migration that happened between row 5,000 and row 5,001.
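One way to look at both ends in a single step is to stack the first and last rows into one table. The sketch below uses pandas; the helper name peek_both_ends and the toy data are illustrative assumptions, not part of the student dataset.

```python
import pandas


def peek_both_ends(frame: pandas.DataFrame, n: int = 5) -> pandas.DataFrame:
    """Return the first n and last n rows stacked, labelled by the end they came from."""
    # keys= adds an outer index level ("head" / "tail") so the two ends stay distinguishable
    return pandas.concat([frame.head(n), frame.tail(n)], keys=["head", "tail"])


# Toy data (hypothetical): a time-ordered file whose score format changed partway through
toy = pandas.DataFrame({
    "year": [2019, 2019, 2020, 2021, 2022, 2022],
    "score": ["A", "B", "C", "71", "88", "93"],
})

both_ends = peek_both_ends(toy, n=2)
print(both_ends)
```

If the frame has fewer than 2n rows, the two slices overlap and some rows appear twice, which is harmless for a visual check.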
Ingesting the Data
Let’s load our student mathematics performance data: a semicolon-delimited CSV file named student-math.csv with columns for student information and performance.
using DataFrames
using CSV
# Load the data
student_math_jl = CSV.File("../../dataset/uci-ml-repo/student-performance/student-math.csv"; delim=";") |> DataFrames.DataFrame;
println("Dataset loaded: $(nrow(student_math_jl)) rows, $(ncol(student_math_jl)) columns")
Dataset loaded: 395 rows, 33 columns
import pandas
# Load the data
student_math_py = pandas.read_csv("../../dataset/uci-ml-repo/student-performance/student-math.csv", sep=";")
print(f"Dataset loaded: {student_math_py.shape[0]} rows, {student_math_py.shape[1]} columns")
Dataset loaded: 395 rows, 33 columns
# Load the data
student_math_r <- read.csv("../../dataset/uci-ml-repo/student-performance/student-math.csv", sep=";", stringsAsFactors=TRUE)
cat(sprintf("Dataset loaded: %d rows, %d columns\n", nrow(student_math_r), ncol(student_math_r)))
Dataset loaded: 395 rows, 33 columns
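The row and column counts above are already a first answer to "Did the data load properly?". As a rough sketch of why they matter, the example below reads a tiny inline sample (made up for illustration, in the same semicolon-delimited layout) once with the wrong delimiter and once with the right one. The expected_columns list is an assumption standing in for whatever columns you expect.

```python
import io

import pandas

# Inline sample in the same semicolon-delimited layout as student-math.csv (values are illustrative)
sample_csv = "school;sex;age\nGP;F;18\nMS;M;19\n"

# Wrong delimiter: pandas cannot split the fields, so every line lands in a single column
wrong = pandas.read_csv(io.StringIO(sample_csv))  # default sep=","
right = pandas.read_csv(io.StringIO(sample_csv), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)

# Assumption: the column names you expect to find after loading
expected_columns = ["school", "sex", "age"]
missing = [name for name in expected_columns if name not in right.columns]
print("Missing columns:", missing)
```

A column count of 1 on a file that should have many columns is usually the quickest sign of a delimiter mismatch.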
Exploring the First n Rows
When exploring a new dataset, a good first step is to look at the first few records. This gives you a sense of the data structure, the variable types, and any issues that may need cleaning or transformation. For example, to view the first 12 records of the dataset, you can use the following code snippets in each language:
# View first 12 rows
first(student_math_jl, 12)
12×33 DataFrame
Row │ school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian traveltime studytime failures schoolsup famsup paid activities nursery higher internet romantic famrel freetime goout Dalc Walc health absences G1 G2 G3
│ String3 String1 Int64 String1 String3 String1 Int64 Int64 String15 String15 String15 String7 Int64 Int64 Int64 String3 String3 String3 String3 String3 String3 String3 String3 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ GP F 18 U GT3 A 4 4 at_home teacher course mother 2 2 0 yes no no no yes yes no no 4 3 4 1 1 3 6 5 6 6
2 │ GP F 17 U GT3 T 1 1 at_home other course father 1 2 0 no yes no no no yes yes no 5 3 3 1 1 3 4 5 5 6
3 │ GP F 15 U LE3 T 1 1 at_home other other mother 1 2 3 yes no yes no yes yes yes no 4 3 2 2 3 3 10 7 8 10
4 │ GP F 15 U GT3 T 4 2 health services home mother 1 3 0 no yes yes yes yes yes yes yes 3 2 2 1 1 5 2 15 14 15
5 │ GP F 16 U GT3 T 3 3 other other home father 1 2 0 no yes yes no yes yes no no 4 3 2 1 2 5 4 6 10 10
6 │ GP M 16 U LE3 T 4 3 services other reputation mother 1 2 0 no yes yes yes yes yes yes no 5 4 2 1 2 5 10 15 15 15
7 │ GP M 16 U LE3 T 2 2 other other home mother 1 2 0 no no no no yes yes yes no 4 4 4 1 1 3 0 12 12 11
8 │ GP F 17 U GT3 A 4 4 other teacher home mother 2 2 0 yes yes no no yes yes no no 4 1 4 1 1 1 6 6 5 6
9 │ GP M 15 U LE3 A 3 2 services other home mother 1 2 0 no yes yes no yes yes yes no 4 2 2 1 1 1 0 16 18 19
10 │ GP M 15 U GT3 T 3 4 other other home mother 1 2 0 no yes yes yes yes yes yes no 5 5 1 1 1 5 0 14 15 15
11 │ GP F 15 U GT3 T 4 4 teacher health reputation mother 1 2 0 no yes yes no yes yes yes no 3 3 3 1 2 2 0 10 8 9
12 │ GP F 15 U GT3 T 2 1 services other reputation father 3 3 0 no yes no yes yes yes yes no 5 2 2 1 1 4 4 10 12 12
# Check column types
describe(student_math_jl)
33×7 DataFrame
Row │ variable mean min median max nmissing eltype
│ Symbol Union… Any Union… Any Int64 DataType
─────┼────────────────────────────────────────────────────────────────────
1 │ school GP MS 0 String3
2 │ sex F M 0 String1
3 │ age 16.6962 15 17.0 22 0 Int64
4 │ address R U 0 String1
5 │ famsize GT3 LE3 0 String3
6 │ Pstatus A T 0 String1
7 │ Medu 2.74937 0 3.0 4 0 Int64
8 │ Fedu 2.52152 0 2.0 4 0 Int64
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
27 │ Dalc 1.48101 1 1.0 5 0 Int64
28 │ Walc 2.29114 1 2.0 5 0 Int64
29 │ health 3.55443 1 4.0 5 0 Int64
30 │ absences 5.70886 0 4.0 75 0 Int64
31 │ G1 10.9089 3 11.0 19 0 Int64
32 │ G2 10.7139 0 11.0 19 0 Int64
33 │ G3 10.4152 0 11.0 20 0 Int64
18 rows omitted
# View first 12 rows
student_math_py.head(n=12)
school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian traveltime studytime failures schoolsup famsup paid activities nursery higher internet romantic famrel freetime goout Dalc Walc health absences G1 G2 G3
0 GP F 18 U GT3 A 4 4 at_home teacher course mother 2 2 0 yes no no no yes yes no no 4 3 4 1 1 3 6 5 6 6
1 GP F 17 U GT3 T 1 1 at_home other course father 1 2 0 no yes no no no yes yes no 5 3 3 1 1 3 4 5 5 6
2 GP F 15 U LE3 T 1 1 at_home other other mother 1 2 3 yes no yes no yes yes yes no 4 3 2 2 3 3 10 7 8 10
3 GP F 15 U GT3 T 4 2 health services home mother 1 3 0 no yes yes yes yes yes yes yes 3 2 2 1 1 5 2 15 14 15
4 GP F 16 U GT3 T 3 3 other other home father 1 2 0 no yes yes no yes yes no no 4 3 2 1 2 5 4 6 10 10
5 GP M 16 U LE3 T 4 3 services other reputation mother 1 2 0 no yes yes yes yes yes yes no 5 4 2 1 2 5 10 15 15 15
6 GP M 16 U LE3 T 2 2 other other home mother 1 2 0 no no no no yes yes yes no 4 4 4 1 1 3 0 12 12 11
7 GP F 17 U GT3 A 4 4 other teacher home mother 2 2 0 yes yes no no yes yes no no 4 1 4 1 1 1 6 6 5 6
8 GP M 15 U LE3 A 3 2 services other home mother 1 2 0 no yes yes no yes yes yes no 4 2 2 1 1 1 0 16 18 19
9 GP M 15 U GT3 T 3 4 other other home mother 1 2 0 no yes yes yes yes yes yes no 5 5 1 1 1 5 0 14 15 15
10 GP F 15 U GT3 T 4 4 teacher health reputation mother 1 2 0 no yes yes no yes yes yes no 3 3 3 1 2 2 0 10 8 9
11 GP F 15 U GT3 T 2 1 services other reputation father 3 3 0 no yes no yes yes yes yes no 5 2 2 1 1 4 4 10 12 12
# Check data types
student_math_py.dtypes
school object
sex object
age int64
address object
famsize object
Pstatus object
Medu int64
Fedu int64
Mjob object
Fjob object
reason object
guardian object
traveltime int64
studytime int64
failures int64
schoolsup object
famsup object
paid object
activities object
nursery object
higher object
internet object
romantic object
famrel int64
freetime int64
goout int64
Dalc int64
Walc int64
health int64
absences int64
G1 int64
G2 int64
G3 int64
dtype: object
# View first 12 rows
head(x=student_math_r, n=12)
school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian traveltime studytime failures schoolsup famsup paid activities nursery higher internet romantic famrel freetime goout Dalc Walc health absences G1 G2 G3
1 GP F 18 U GT3 A 4 4 at_home teacher course mother 2 2 0 yes no no no yes yes no no 4 3 4 1 1 3 6 5 6 6
2 GP F 17 U GT3 T 1 1 at_home other course father 1 2 0 no yes no no no yes yes no 5 3 3 1 1 3 4 5 5 6
3 GP F 15 U LE3 T 1 1 at_home other other mother 1 2 3 yes no yes no yes yes yes no 4 3 2 2 3 3 10 7 8 10
4 GP F 15 U GT3 T 4 2 health services home mother 1 3 0 no yes yes yes yes yes yes yes 3 2 2 1 1 5 2 15 14 15
5 GP F 16 U GT3 T 3 3 other other home father 1 2 0 no yes yes no yes yes no no 4 3 2 1 2 5 4 6 10 10
6 GP M 16 U LE3 T 4 3 services other reputation mother 1 2 0 no yes yes yes yes yes yes no 5 4 2 1 2 5 10 15 15 15
7 GP M 16 U LE3 T 2 2 other other home mother 1 2 0 no no no no yes yes yes no 4 4 4 1 1 3 0 12 12 11
8 GP F 17 U GT3 A 4 4 other teacher home mother 2 2 0 yes yes no no yes yes no no 4 1 4 1 1 1 6 6 5 6
9 GP M 15 U LE3 A 3 2 services other home mother 1 2 0 no yes yes no yes yes yes no 4 2 2 1 1 1 0 16 18 19
10 GP M 15 U GT3 T 3 4 other other home mother 1 2 0 no yes yes yes yes yes yes no 5 5 1 1 1 5 0 14 15 15
11 GP F 15 U GT3 T 4 4 teacher health reputation mother 1 2 0 no yes yes no yes yes yes no 3 3 3 1 2 2 0 10 8 9
12 GP F 15 U GT3 T 2 1 services other reputation father 3 3 0 no yes no yes yes yes yes no 5 2 2 1 1 4 4 10 12 12
# Check structure
str(student_math_r)
'data.frame': 395 obs. of 33 variables:
$ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
$ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
$ age : int 18 17 15 15 16 16 16 17 15 15 ...
$ address : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
$ famsize : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
$ Pstatus : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
$ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
$ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
$ Mjob : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
$ Fjob : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
$ reason : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
$ guardian : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
$ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
$ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
$ failures : int 0 0 3 0 0 0 0 0 0 0 ...
$ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
$ famsup : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
$ paid : Factor w/ 2 levels "no","yes": 1 1 2 2 2 2 1 1 2 2 ...
$ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
$ nursery : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
$ higher : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
$ internet : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
$ romantic : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
$ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
$ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
$ goout : int 4 3 2 2 2 2 4 4 2 1 ...
$ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
$ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
$ health : int 3 3 3 5 5 5 3 1 1 5 ...
$ absences : int 6 4 10 2 4 10 0 6 0 0 ...
$ G1 : int 5 5 7 15 6 15 12 6 16 14 ...
$ G2 : int 6 5 8 14 10 15 12 5 18 15 ...
$ G3 : int 6 6 10 15 10 15 11 6 19 15 ...
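The pandas dtypes output above lists column types only, whereas the Julia describe output also reports nmissing. A minimal sketch of the equivalent missing-value count in pandas follows; the toy frame is hypothetical and only borrows a few column names from the dataset.

```python
import pandas

# Toy data (hypothetical) with a few deliberately missing cells
toy = pandas.DataFrame({
    "age": [18, 17, None, 15],
    "school": ["GP", "GP", "MS", None],
    "G3": [6, 6, 10, 15],
})

# isna() marks missing cells; sum() counts them per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# The same check restricted to the first n rows
missing_in_head = toy.head(2).isna().sum()
print(missing_in_head)
```

On the real data this would be student_math_py.isna().sum(), which should agree with the nmissing column in the Julia output.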
Exploring the Last n Rows
Similarly, looking at the last few records shows how the data ends, which is particularly useful for time-series data or datasets that have been appended to over time. To view the last 12 records of the dataset, you can use the following code snippets in each language. Note that the row labels in the outputs differ by language: Julia numbers the displayed rows 1 to 12, pandas shows zero-based labels 383 to 394, and R shows one-based labels 384 to 395.
# View last 12 rows
last(student_math_jl, 12)
12×33 DataFrame
Row │ school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian traveltime studytime failures schoolsup famsup paid activities nursery higher internet romantic famrel freetime goout Dalc Walc health absences G1 G2 G3
│ String3 String1 Int64 String1 String3 String1 Int64 Int64 String15 String15 String15 String7 Int64 Int64 Int64 String3 String3 String3 String3 String3 String3 String3 String3 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ MS M 19 R GT3 T 1 1 other services other mother 2 1 1 no no no no yes yes no no 4 3 2 1 3 5 0 6 5 0
2 │ MS M 18 R GT3 T 4 2 other other home father 2 1 1 no no yes no yes yes no no 5 4 3 4 3 3 14 6 5 5
3 │ MS F 18 R GT3 T 2 2 at_home other other mother 2 3 0 no no yes no yes yes no no 5 3 3 1 3 4 2 10 9 10
4 │ MS F 18 R GT3 T 4 4 teacher at_home reputation mother 3 1 0 no yes yes yes yes yes yes yes 4 4 3 2 2 5 7 6 5 6
5 │ MS F 19 R GT3 T 2 3 services other course mother 1 3 1 no no no yes no yes yes no 5 4 2 1 2 5 0 7 5 0
6 │ MS F 18 U LE3 T 3 1 teacher services course mother 1 2 0 no yes yes no yes yes yes no 4 3 4 1 1 1 0 7 9 8
7 │ MS F 18 U GT3 T 1 1 other other course mother 2 2 1 no no no yes yes yes no no 1 1 1 1 1 5 0 6 5 0
8 │ MS M 20 U LE3 A 2 2 services services course other 1 2 2 no yes yes no yes yes no no 5 5 4 4 5 4 11 9 9 9
9 │ MS M 17 U LE3 T 3 1 services services course mother 2 1 0 no no no no no yes yes no 2 4 5 3 4 2 3 14 16 16
10 │ MS M 21 R GT3 T 1 1 other other course other 1 1 3 no no no no no yes no no 5 5 3 3 3 3 3 10 8 7
11 │ MS M 18 R LE3 T 3 2 services other course mother 3 1 0 no no no no no yes yes no 4 4 1 3 4 5 0 11 12 10
12 │ MS M 19 U LE3 T 1 1 other at_home course father 1 1 0 no no no no yes yes yes no 3 2 3 3 3 5 5 8 9 9
# View last 12 rows
student_math_py.tail(n=12)
school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian traveltime studytime failures schoolsup famsup paid activities nursery higher internet romantic famrel freetime goout Dalc Walc health absences G1 G2 G3
383 MS M 19 R GT3 T 1 1 other services other mother 2 1 1 no no no no yes yes no no 4 3 2 1 3 5 0 6 5 0
384 MS M 18 R GT3 T 4 2 other other home father 2 1 1 no no yes no yes yes no no 5 4 3 4 3 3 14 6 5 5
385 MS F 18 R GT3 T 2 2 at_home other other mother 2 3 0 no no yes no yes yes no no 5 3 3 1 3 4 2 10 9 10
386 MS F 18 R GT3 T 4 4 teacher at_home reputation mother 3 1 0 no yes yes yes yes yes yes yes 4 4 3 2 2 5 7 6 5 6
387 MS F 19 R GT3 T 2 3 services other course mother 1 3 1 no no no yes no yes yes no 5 4 2 1 2 5 0 7 5 0
388 MS F 18 U LE3 T 3 1 teacher services course mother 1 2 0 no yes yes no yes yes yes no 4 3 4 1 1 1 0 7 9 8
389 MS F 18 U GT3 T 1 1 other other course mother 2 2 1 no no no yes yes yes no no 1 1 1 1 1 5 0 6 5 0
390 MS M 20 U LE3 A 2 2 services services course other 1 2 2 no yes yes no yes yes no no 5 5 4 4 5 4 11 9 9 9
391 MS M 17 U LE3 T 3 1 services services course mother 2 1 0 no no no no no yes yes no 2 4 5 3 4 2 3 14 16 16
392 MS M 21 R GT3 T 1 1 other other course other 1 1 3 no no no no no yes no no 5 5 3 3 3 3 3 10 8 7
393 MS M 18 R LE3 T 3 2 services other course mother 3 1 0 no no no no no yes yes no 4 4 1 3 4 5 0 11 12 10
394 MS M 19 U LE3 T 1 1 other at_home course father 1 1 0 no no no no yes yes yes no 3 2 3 3 3 5 5 8 9 9
# View last 12 rows
tail(x=student_math_r, n=12)
school sex age address famsize Pstatus Medu Fedu Mjob Fjob reason guardian traveltime studytime failures schoolsup famsup paid activities nursery higher internet romantic famrel freetime goout Dalc Walc health absences G1 G2 G3
384 MS M 19 R GT3 T 1 1 other services other mother 2 1 1 no no no no yes yes no no 4 3 2 1 3 5 0 6 5 0
385 MS M 18 R GT3 T 4 2 other other home father 2 1 1 no no yes no yes yes no no 5 4 3 4 3 3 14 6 5 5
386 MS F 18 R GT3 T 2 2 at_home other other mother 2 3 0 no no yes no yes yes no no 5 3 3 1 3 4 2 10 9 10
387 MS F 18 R GT3 T 4 4 teacher at_home reputation mother 3 1 0 no yes yes yes yes yes yes yes 4 4 3 2 2 5 7 6 5 6
388 MS F 19 R GT3 T 2 3 services other course mother 1 3 1 no no no yes no yes yes no 5 4 2 1 2 5 0 7 5 0
389 MS F 18 U LE3 T 3 1 teacher services course mother 1 2 0 no yes yes no yes yes yes no 4 3 4 1 1 1 0 7 9 8
390 MS F 18 U GT3 T 1 1 other other course mother 2 2 1 no no no yes yes yes no no 1 1 1 1 1 5 0 6 5 0
391 MS M 20 U LE3 A 2 2 services services course other 1 2 2 no yes yes no yes yes no no 5 5 4 4 5 4 11 9 9 9
392 MS M 17 U LE3 T 3 1 services services course mother 2 1 0 no no no no no yes yes no 2 4 5 3 4 2 3 14 16 16
393 MS M 21 R GT3 T 1 1 other other course other 1 1 3 no no no no no yes no no 5 5 3 3 3 3 3 10 8 7
394 MS M 18 R LE3 T 3 2 services other course mother 3 1 0 no no no no no yes yes no 4 4 1 3 4 5 0 11 12 10
395 MS M 19 U LE3 T 1 1 other at_home course father 1 1 0 no no no no yes yes yes no 3 2 3 3 3 5 5 8 9 9
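Once both ends are on screen, a simple numeric comparison can back up the visual check. The sketch below compares the fraction of missing values in the first and last n rows, which is one way to catch the "recent entries are incomplete" case. The toy frame and the choice of n are assumptions for illustration.

```python
import pandas

# Toy data (hypothetical): the guardian field stops being filled in for the most recent rows
toy = pandas.DataFrame({
    "student_id": range(1, 9),
    "guardian": ["mother", "father", "mother", "mother", "father", None, None, None],
})

n = 3
head_missing = toy.head(n).isna().mean()  # fraction of missing cells per column, first n rows
tail_missing = toy.tail(n).isna().mean()  # same for the last n rows

comparison = pandas.DataFrame({"head": head_missing, "tail": tail_missing})
print(comparison)
```

A large gap between the two columns for the same field suggests that collection changed somewhere in between and deserves a closer look.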
The Five-Second Insight
In those few moments examining the head and tail, you might discover:
- The first year used letter grades (A, B, C) while recent years use numeric scores (0-100)
- Early entries are missing demographic data that later became mandatory
- The testing date format changed from MM/DD/YYYY to YYYY-MM-DD
- Recent rows have a new “intervention_program” column that doesn’t exist in older data
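As an illustration of the date-format case in the list above, the following sketch checks whether the first and last n values of a column follow different patterns. The test_date column, its values, and the two regular expressions are hypothetical; the student dataset itself has no date column.

```python
import pandas

# Toy data (hypothetical): early rows use MM/DD/YYYY, recent rows use YYYY-MM-DD
toy = pandas.DataFrame({
    "test_date": [
        "03/15/2019", "04/02/2019", "05/20/2019",
        "2022-03-14", "2022-04-01", "2022-05-19",
    ],
})

# Assumed patterns for the two formats
us_style = r"^\d{2}/\d{2}/\d{4}$"
iso_style = r"^\d{4}-\d{2}-\d{2}$"

n = 3
head_is_us = toy["test_date"].head(n).str.match(us_style).all()
tail_is_iso = toy["test_date"].tail(n).str.match(iso_style).all()

format_changed = bool(head_is_us and tail_is_iso)
print("Date format changed between head and tail:", format_changed)
```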
Each of these discoveries prevents hours of debugging later. That’s the beauty of this sampling method: maximum insight with minimum effort.
The Bottom Line
Sampling the first and last n rows isn’t about statistical rigor; it’s about practical wisdom. Before you stratify, cluster, or systematically sample your way to sophisticated strategies, spend 30 seconds checking both ends of your dataset. Your future self will thank you when you’re not explaining to your team why your “comprehensive analysis” crashed because row 47,293 had a text value in a numeric column.
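Head and tail checks will not catch a stray text value buried in the middle of a file, so a follow-up scan can help. The sketch below is one possible approach using pandas.to_numeric; the toy column and its values are hypothetical.

```python
import pandas

# Toy data (hypothetical): a grade column that should be numeric but contains text
toy = pandas.DataFrame({"G3": ["10", "12", "n/a", "15", "absent"]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN
as_numbers = pandas.to_numeric(toy["G3"], errors="coerce")

# Rows that held a value in the raw data but failed to parse as a number
bad_rows = toy[as_numbers.isna() & toy["G3"].notna()]
print(bad_rows)
```

The index of bad_rows points at the offending rows, which is far cheaper to find up front than after a pipeline has failed.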
In data science, sometimes the simplest check is the most powerful one.
Appendix A: Environment, Language & Package Versions, and Coding Style
If you are interested in reproducing this work, here are the versions of Julia, Python, and R that I used (as well as the respective packages for each). Additionally, my coding style here is verbose, in order to trace back where functions/methods and variables are originating from, and make this a learning experience for everyone—including me.
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin22.4.0)
CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Environment:
DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk-21.jdk/Contents/Home/lib/server
using Pkg
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.6.1")
using DataFrames
using CSV
import sys
import platform
import os
import cpuinfo
print(
"Python", sys.version,
"\nOS:", platform.system(), platform.platform(),
"\nCPU:", os.cpu_count(), "x", cpuinfo.get_cpu_info()["brand_raw"]
)
Python 3.11.4 (v3.11.4:d2340ef257, Jun 6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
OS: Darwin macOS-10.16-x86_64-i386-64bit
CPU: 8 x Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
!pip install pandas==2.0.3
import pandas
cat(
R.version$version.string, "-", R.version$nickname,
"\nOS:", Sys.info()["sysname"], R.version$platform,
"\nCPU:", benchmarkme::get_cpu()$no_of_cores, "x", benchmarkme::get_cpu()$model_name
)
R version 4.2.3 (2023-03-15) - Shortstop Beagle
OS: Darwin x86_64-apple-darwin17.0
CPU: 8 x Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
Further Readings
- Cortez, P. (2008). Student Performance [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5TG7T
- Oehlert, G. W. (2000). A first course in design and analysis of experiments. W. H. Freeman.
- Schindler, P. S. (2018). Business research methods (13th ed.). McGraw Hill.
- Shmueli, G., Patel, N. R., & Bruce, P. C. (2007). Data mining for business intelligence: concepts, techniques, and applications in Microsoft Office Excel with XLMiner. Wiley.