Data-Informed Thinking + Doing
Principal Component Analysis for Dimensionality Reduction
Reducing complexity by finding uncorrelated linear combinations of the original variables that capture maximum variance.
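To make the subtitle concrete, here is a minimal sketch of the idea for just two variables, using only the Python standard library. The function name `pca_2d` and the toy numbers are hypothetical and exist only for illustration; they are not the dataset or the code used in this post. The sketch centers the data, builds the 2×2 sample covariance matrix, and uses its closed-form eigen-decomposition to find two perpendicular directions whose scores are uncorrelated, with the first direction carrying the most variance.

```python
import math


def pca_2d(xs, ys):
    """Two-variable PCA via the closed-form eigen-decomposition of the
    2x2 sample covariance matrix. Illustrative helper, not a library API."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Center each variable on its mean.
    cx = [x - mean_x for x in xs]
    cy = [y - mean_y for y in ys]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(a * a for a in cx) / (n - 1)
    syy = sum(b * b for b in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix: half the trace plus or minus a radius.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigenvalues = (half_trace + radius, half_trace - radius)

    # Angle of the direction with the largest variance, then a perpendicular one.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis_1 = (math.cos(theta), math.sin(theta))
    axis_2 = (-math.sin(theta), math.cos(theta))

    # Scores: each centered observation projected onto each axis.
    scores_1 = [a * axis_1[0] + b * axis_1[1] for a, b in zip(cx, cy)]
    scores_2 = [a * axis_2[0] + b * axis_2[1] for a, b in zip(cx, cy)]
    return eigenvalues, (axis_1, axis_2), (scores_1, scores_2)


# Made-up toy data, for illustration only.
xs = [2.0, 4.0, 6.0, 8.0, 10.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]

eigenvalues, axes, scores = pca_2d(xs, ys)
share = eigenvalues[0] / sum(eigenvalues)
print(f"Share of total variance on the first component: {share:.1%}")
```

The two eigenvalues sum to the total variance of the original variables, so their ratio shows how much information is retained if only the first component is kept. With more than two variables the same steps apply, but the eigen-decomposition is normally delegated to a numerical library.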
Getting Started
If you would like to reproduce this work, the versions of R, Python, and Julia used, along with the packages for each, are listed below. The plots follow Leland Wilkinson’s Grammar of Graphics approach to data visualization. My coding style here is deliberately verbose so that you can trace where functions, methods, and variables originate, which makes this a learning experience for everyone, including me.
cat(R.version$version.string, R.version$nickname)
R version 4.2.3 (2023-03-15) Shortstop Beagle
require(devtools)
devtools::install_version("tibble", version="3.2.1", repos="http://cran.us.r-project.org")
devtools::install_version("dplyr", version="1.1.2", repos="http://cran.us.r-project.org")
devtools::install_version("ggplot2", version="3.4.2", repos="http://cran.us.r-project.org")
devtools::install_version("cowplot", version="1.1.1", repos="http://cran.us.r-project.org")
devtools::install_version("ggcorrplot", version="0.1.4", repos="http://cran.us.r-project.org")
library(tibble)
library(dplyr)
library(ggplot2)
library(cowplot)
library(ggcorrplot)
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun 6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
!pip install pandas==2.0.3
!pip install plotnine==0.12.1
import pandas
import plotnine
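If you want to confirm that your Python environment matches the pinned versions above, a small check like the following may help. This is a sketch using the standard library's `importlib.metadata`; the helper name `check_versions` and the `PINNED` dictionary are my own illustrative names, not part of pandas or plotnine.

```python
from importlib import metadata

# Versions pinned in the pip commands above.
PINNED = {"pandas": "2.0.3", "plotnine": "0.12.1"}


def check_versions(pinned):
    """Compare installed package versions against the pinned ones.

    Returns a dict mapping each package name to "ok", "missing",
    or "mismatch (found <version>)". Illustrative helper only.
    """
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"mismatch (found {found})"
    return report


for name, status in check_versions(PINNED).items():
    print(f"{name}: {status}")
```

A mismatch does not necessarily mean the code will fail, but matching versions removes one source of differences when comparing results.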
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin22.4.0)
CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Environment:
DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
using Pkg
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.5.0")
Pkg.add(name="Colors", version="0.12.10")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.4.0")
using DataFrames
using CSV
using Colors
using Cairo
using Gadfly
Importing and Examining Dataset