Data-Informed Thinking + Doing

Univariate, Bivariate, and Multivariate Analyses on Numerical Data

Exploratory data analysis of the PGA golf dataset—using R, Python, and Julia.

Univariate, bivariate, and multivariate analyses are complementary ways to explore and understand numerical data. Univariate analysis examines one variable at a time, describing its distribution, central tendency, and variability. Bivariate analysis assesses the relationship between two variables, uncovering correlations, associations, or dependencies. Multivariate analysis considers several variables simultaneously, revealing interactions and patterns that one- or two-variable views can miss.
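As a small illustration of the univariate measures named above, here is a sketch that computes central tendency (mean, median) and variability (sample standard deviation, range) for a single variable using only Python's standard library. The ten values are the `Average Score` figures for the first rows of the dataset shown later. The function name `describe` is my own, hypothetical choice.

```python
import statistics

def describe(values):
    """Return basic univariate summaries for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }

# Average Score for the first ten players listed for 2018
avg_score = [69.617, 70.758, 70.432, 70.015, 71.038,
             70.28, 70.404, 70.152, 70.489, 70.342]

summary = describe(avg_score)
print({name: round(value, 3) for name, value in summary.items()})
```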

Together, these analyses make up much of what is commonly called exploratory data analysis (EDA). They lay the groundwork for data-driven decision-making, hypothesis testing, and predictive modeling in domains such as finance, healthcare, and the social sciences. With these techniques, researchers and analysts can uncover hidden patterns and make informed decisions grounded in a thorough understanding of the data, whether it is numerical, categorical, or of another type.

Let’s have some fun and look at this golf dataset.

Getting Started

If you are interested in reproducing this work, here are the versions of R, Python, and Julia used, along with the packages for each. This work also adopts Leland Wilkinson’s approach to data visualization, the Grammar of Graphics. Finally, my coding style here is deliberately verbose so that it is easy to trace where functions/methods and variables originate, making this a learning experience for everyone, including me.

cat(R.version$version.string, R.version$nickname)
R version 4.2.3 (2023-03-15) Shortstop Beagle
require(devtools)
devtools::install_version("tibble", version="3.2.1", repos="http://cran.us.r-project.org")
devtools::install_version("dplyr", version="1.1.2", repos="http://cran.us.r-project.org")
devtools::install_version("ggplot2", version="3.4.2", repos="http://cran.us.r-project.org")
devtools::install_version("cowplot", version="1.1.1", repos="http://cran.us.r-project.org")
devtools::install_version("ggcorrplot", version="0.1.4", repos="http://cran.us.r-project.org")
library(tibble)
library(dplyr)
library(ggplot2)
library(cowplot)
library(ggcorrplot)
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun  6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
!pip install pandas==2.0.3
!pip install plotnine==0.12.1
import pandas
import plotnine
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
using Pkg
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.5.0")
Pkg.add(name="Colors", version="0.12.10")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.4.0")
using DataFrames
using CSV
using Colors
using Cairo
using Gadfly
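Before rerunning the analysis, you can confirm that the Python environment matches the versions pinned in the `pip install` lines above. The standard library's `importlib.metadata` provides one way to do this. What follows is a minimal sketch: the `check_pins` helper is hypothetical, and the pins are copied from those install lines.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the pip install commands above
PINS = {"pandas": "2.0.3", "plotnine": "0.12.1"}

def check_pins(pins):
    """Return {package: installed_version_or_None} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = installed
    return mismatches

print(check_pins(PINS) or "All pinned packages match.")
```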

Importing and Examining the Dataset

# https://www.kaggle.com/jmpark746/pga-tour-data-2010-2018
pga_r <- read.csv("../../dataset/pga-tour.csv")
str(pga_r)
'data.frame':	2312 obs. of  18 variables:
 $ Player.Name       : chr  "Henrik Stenson" "Ryan Armour" "Chez Reavie" "Ryan Moore" ...
 $ Rounds            : num  60 109 93 78 103 103 93 94 77 50 ...
 $ Fairway.Percentage: num  75.2 73.6 72.2 71.9 71.4 ...
 $ Year              : int  2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
 $ Avg.Distance      : num  292 284 286 289 279 ...
 $ gir               : num  73.5 68.2 68.7 68.8 67.1 ...
 $ Average.Putts     : num  29.9 29.3 29.1 29.2 29.1 ...
 $ Average.Scrambling: num  60.7 60.1 62.3 64.2 59.2 ...
 $ Average.Score     : num  69.6 70.8 70.4 70 71 ...
 $ Points            : chr  "868" "1,006" "1,020" "795" ...
 $ Wins              : num  NA 1 NA NA NA NA NA NA NA NA ...
 $ Top.10            : num  5 3 3 5 3 6 5 5 3 2 ...
 $ Average.SG.Putts  : num  -0.207 -0.058 0.192 -0.271 0.164 0.442 0.037 0.546 0.167 0.389 ...
 $ Average.SG.Total  : num  1.153 0.337 0.674 0.941 0.062 ...
 $ SG.OTT            : num  0.427 -0.012 0.183 0.406 -0.227 -0.166 0.378 0.364 0.093 -0.392 ...
 $ SG.APR            : num  0.96 0.213 0.437 0.532 0.099 0.036 0.298 0.345 0.467 0.179 ...
 $ SG.ARG            : num  -0.027 0.194 -0.137 0.273 0.026 0.253 -0.027 -0.122 -0.186 0.235 ...
 $ Money             : chr  "$2,680,487" "$2,485,203" "$2,700,018" "$1,986,608" ...
# https://www.kaggle.com/jmpark746/pga-tour-data-2010-2018
pga_python = pandas.read_csv("../../dataset/pga-tour.csv")
# https://www.kaggle.com/jmpark746/pga-tour-data-2010-2018
pga_julia = CSV.read("../../dataset/pga-tour.csv", DataFrame);
show(pga_julia, allcols = true)
2312×18 DataFrame
  Row │ Player Name           Rounds     Fairway Percentage  Year   Avg Distance  gir         Average Putts  Average Scrambling  Average Score  Points    Wins       Top 10     Average SG Putts  Average SG Total  SG:OTT       SG:APR       SG:ARG       Money
      │ String31              Float64?   Float64?            Int64  Float64?      Float64?    Float64?       Float64?            Float64?       String7?  Float64?   Float64?   Float64?          Float64?          Float64?     Float64?     Float64?     String15?
──────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    1 │ Henrik Stenson             60.0               75.19   2018         291.5       73.51          29.93               60.67         69.617  868       missing          5.0            -0.207             1.153        0.427        0.96        -0.027  $2,680,487
    2 │ Ryan Armour               109.0               73.58   2018         283.5       68.22          29.31               60.13         70.758  1,006           1.0        3.0            -0.058             0.337       -0.012        0.213        0.194  $2,485,203
    3 │ Chez Reavie                93.0               72.24   2018         286.5       68.67          29.12               62.27         70.432  1,020     missing          3.0             0.192             0.674        0.183        0.437       -0.137  $2,700,018
    4 │ Ryan Moore                 78.0               71.94   2018         289.2       68.8           29.17               64.16         70.015  795       missing          5.0            -0.271             0.941        0.406        0.532        0.273  $1,986,608
    5 │ Brian Stuard              103.0               71.44   2018         278.9       67.12          29.11               59.23         71.038  421       missing          3.0             0.164             0.062       -0.227        0.099        0.026  $1,089,763
    6 │ Brian Gay                 103.0               71.37   2018         282.9       64.52          28.25               63.26         70.28   880       missing          6.0             0.442             0.565       -0.166        0.036        0.253  $2,152,501
    7 │ Kyle Stanley               93.0               71.29   2018         295.7       71.09          29.89               54.8          70.404  1,198     missing          5.0             0.037             0.686        0.378        0.298       -0.027  $3,916,001
    8 │ Emiliano Grillo            94.0               70.16   2018         295.2       68.84          29.04               61.05         70.152  901       missing          5.0             0.546             1.133        0.364        0.345       -0.122  $2,493,163
    9 │ Russell Henley             77.0               70.03   2018         293.0       68.77          29.8                54.33         70.489  569       missing          3.0             0.167             0.541        0.093        0.467       -0.186  $1,516,438
   10 │ Jim Furyk                  50.0               69.91   2018         280.5       63.19          28.73               62.58         70.342  291       missing          2.0             0.389             0.412       -0.392        0.179        0.235  $660,010
   11 │ Steve Wheatcroft           60.0               69.79   2018         288.9       66.57          29.29               61.03         71.631  138       missing          1.0            -0.128            -0.339        0.112       -0.065       -0.258  $309,656
   12 │ Kevin Streelman            94.0               69.11   2018         295.1       71.56          29.67               60.93         70.436  673       missing          5.0            -0.25              0.619        0.439        0.415        0.014  $1,523,642
   13 │ C.T. Pan                  104.0               68.98   2018         292.7       71.2           29.66               56.89         70.457  693       missing          1.0            -0.067             0.478        0.215        0.267        0.063  $1,881,787
   14 │ David Lingmerth            82.0               68.93   2018         285.4       63.03          28.5                58.57         71.043  274       missing    missing               0.229            -0.007        0.006       -0.16        -0.081  $616,758
   15 │ Keegan Bradley             98.0               67.9    2018         299.6       69.18          29.68               56.78         70.303  872       missing          4.0            -0.358             0.793        0.237        0.888        0.026  $4,069,464
   16 │ Rafa Cabrera Bello         75.0               67.85   2018         295.1       70.16          29.47               57.98         69.887  784       missing          4.0             0.273             1.112        0.256        0.487        0.096  $2,449,869
   17 │ Billy Horschel             86.0               67.8    2018         295.4       71.75          29.46               58.03         70.154  960             1.0        3.0             0.392             1.112        0.538        0.352       -0.169  $4,315,200
   18 │ Russell Knox               94.0               67.7    2018         291.7       69.57          29.7                59.43         70.568  585       missing          3.0            -0.088             0.383        0.059        0.263        0.149  $1,424,030
   19 │ Ben Crane                  65.0               67.52   2018         281.1       64.88          28.69               63.0          71.097  267       missing          1.0             0.332             0.176       -0.302       -0.038        0.184  $620,646
   20 │ Vaughn Taylor              83.0               67.51   2018         286.1       67.02          29.15               59.91         70.692  445       missing          3.0            -0.08              0.219       -0.005        0.305       -0.002  $965,691
   21 │ Brian Harman               94.0               67.14   2018         291.9       67.59          29.29               56.95         70.536  1,056     missing          8.0             0.273             0.29         0.137       -0.024       -0.096  $2,733,463
   22 │ Sam Ryder                  82.0               66.91   2018         297.3       72.08          29.88               56.47         70.914  442       missing          3.0            -0.349             0.154        0.203        0.399       -0.099  $1,046,166
   23 │ Ted Potter, Jr.            87.0               66.83   2018         286.0       63.03          28.45               57.51         71.024  744             1.0        1.0             0.074            -0.094       -0.074       -0.2          0.105  $1,976,198
   24 │ Austin Cook               107.0               66.76   2018         292.3       66.51          28.72               62.02         70.469  1,060           1.0        3.0             0.315             0.569        0.12        -0.045        0.179  $2,448,920
   25 │ Tyler Duncan               97.0               66.74   2018         294.4       69.65          30.19               52.76         71.04   457       missing          2.0            -0.566             0.017        0.273        0.476       -0.166  $944,021
   26 │ David Hearn                66.0               66.63   2018         285.1       68.89          29.58               55.65         71.325  315       missing          2.0            -0.127            -0.031       -0.17         0.379       -0.113  $622,383
   27 │ Alex Cejka                 77.0               66.49   2018         286.7       63.77          28.52               64.0          70.675  502       missing          2.0             0.009             0.312       -0.024       -0.169        0.495  $1,198,541
   28 │ Ian Poulter                73.0               66.41   2018         293.6       67.01          28.97               57.11         70.593  1,030           1.0        4.0             0.223             0.85         0.141        0.435        0.051  $2,714,450
   29 │ Joel Dahmen                93.0               66.36   2018         295.8       68.82          29.26               63.31         70.578  676       missing          3.0            -0.277             0.4          0.3          0.381       -0.004  $1,476,838
   30 │ Kevin Kisner               89.0               66.33   2018         290.8       65.38          28.91               60.29         70.729  971       missing          4.0             0.513             0.037       -0.043       -0.316       -0.118  $2,972,285
   31 │ J.J. Henry                 81.0               66.22   2018         291.8       70.52          30.37               54.77         71.372  239       missing          1.0            -0.569            -0.193        0.174        0.34        -0.138  $482,052
   32 │ J.J. Spaun                 85.0               66.18   2018         298.2       69.48          29.63               55.76         70.525  849       missing          4.0            -0.149             0.292        0.286        0.41        -0.255  $1,978,906
   33 │ Justin Rose                70.0               66.02   2018         303.5       69.95          28.67               63.03         68.993  1,991           2.0        8.0             0.424             1.952        0.551        0.526        0.45   $8,130,678
   34 │ Kelly Kraft                94.0               65.84   2018         288.6       63.83          28.92               58.93         71.333  627       missing          3.0             0.075            -0.32        -0.286        0.017       -0.126  $1,496,253
   35 │ Conrad Shindler            60.0               65.77   2018         291.3       66.27          29.14               60.0          71.465  92        missing    missing              -0.166            -0.313        0.114       -0.008       -0.252  $187,399
   36 │ Scott Piercy               84.0               65.68   2018         296.6       69.72          29.86               55.96         70.736  802             1.0        2.0            -0.569             0.29         0.209        0.553        0.097  $1,882,337
   37 │ Adam Hadwin                92.0               65.6    2018         289.4       68.04          29.26               58.59         70.75   638       missing          3.0             0.106             0.596        0.057        0.166        0.267  $1,932,488
   38 │ Ben Silverman              90.0               65.56   2018         290.2       64.95          28.74               57.36         71.281  323       missing          2.0             0.238            -0.206        0.069       -0.385       -0.128  $793,140
   39 │ Satoshi Kodaira            51.0               65.53   2018         293.6       60.3           29.88               51.02         72.182  600             1.0        1.0            -0.645            -1.832        0.027       -0.284       -0.93   $1,471,462
   40 │ Rickie Fowler              74.0               65.33   2018         299.8       69.52          28.99               63.05         69.435  1,302     missing          4.0             0.296             1.275        0.244        0.494        0.242  $4,235,237
   41 │ Xinjun Zhang               81.0               65.33   2018         298.7       67.04          29.67               58.2          71.486  195       missing          1.0            -0.437            -0.734        0.139       -0.254       -0.182  $420,377
   42 │ Kiradech Aphibarnrat       51.0               65.12   2018         294.7       62.2           28.93               55.59         70.629  missing   missing    missing               0.138             0.511        0.485       -0.247        0.135  missing
   43 │ Pat Perez                  82.0               65.09   2018         290.9       67.78          29.29               55.63         70.594  1,116           1.0        4.0             0.115            -0.048       -0.063       -0.138        0.037  $2,962,641
   44 │ Lucas Glover               65.0               65.06   2018         297.7       67.12          29.64               59.83         71.066  324       missing          1.0            -0.186             0.276        0.514       -0.043       -0.008  $789,382
   45 │ Richy Werenski             98.0               65.06   2018         291.8       66.97          29.36               56.49         71.185  498       missing          2.0            -0.109            -0.124        0.077       -0.169        0.077  $1,081,283
   46 │ Hunter Mahan               67.0               65.04   2018         296.9       68.96          29.02               58.81         71.135  234       missing          1.0             0.346             0.209        0.465       -0.149       -0.453  $457,337
  ⋮   │          ⋮                ⋮              ⋮             ⋮         ⋮            ⋮             ⋮                ⋮                 ⋮           ⋮          ⋮          ⋮             ⋮                 ⋮               ⋮            ⋮            ⋮           ⋮
 2268 │ Carlos Franco         missing            missing      2010     missing    missing        missing             missing       missing      92        missing    missing         missing           missing      missing      missing      missing      123,232
 2269 │ Tom Watson            missing            missing      2010     missing    missing        missing             missing       missing      92        missing    missing         missing           missing      missing      missing      missing      149,371
 2270 │ Graeme McDowell       missing            missing      2010     missing    missing        missing             missing       missing      91              1.0        2.0       missing           missing      missing      missing      missing      1,589,337
 2271 │ Matt Weibring         missing            missing      2010     missing    missing        missing             missing       missing      83        missing    missing         missing           missing      missing      missing      missing      128,328
 2272 │ Marco Dawson          missing            missing      2010     missing    missing        missing             missing       missing      82        missing    missing         missing           missing      missing      missing      missing      108,160
 2273 │ Jason Gore            missing            missing      2010     missing    missing        missing             missing       missing      81        missing    missing         missing           missing      missing      missing      missing      77,213
 2274 │ Rich Beem             missing            missing      2010     missing    missing        missing             missing       missing      80        missing    missing         missing           missing      missing      missing      missing      128,877
 2275 │ Frank Lickliter II    missing            missing      2010     missing    missing        missing             missing       missing      67        missing    missing         missing           missing      missing      missing      missing      74,721
 2276 │ Todd Hamilton         missing            missing      2010     missing    missing        missing             missing       missing      64        missing    missing         missing           missing      missing      missing      missing      77,608
 2277 │ Shane Bertsch         missing            missing      2010     missing    missing        missing             missing       missing      53        missing    missing         missing           missing      missing      missing      missing      57,108
 2278 │ Guy Boros             missing            missing      2010     missing    missing        missing             missing       missing      48        missing    missing         missing           missing      missing      missing      missing      54,833
 2279 │ Robert Gamez          missing            missing      2010     missing    missing        missing             missing       missing      47        missing          1.0       missing           missing      missing      missing      missing      101,700
 2280 │ Fred Funk             missing            missing      2010     missing    missing        missing             missing       missing      44        missing    missing         missing           missing      missing      missing      missing      77,803
 2281 │ John Huston           missing            missing      2010     missing    missing        missing             missing       missing      40        missing    missing         missing           missing      missing      missing      missing      69,249
 2282 │ Dicky Pride           missing            missing      2010     missing    missing        missing             missing       missing      39        missing    missing         missing           missing      missing      missing      missing      40,120
 2283 │ Jonathan Kaye         missing            missing      2010     missing    missing        missing             missing       missing      34        missing    missing         missing           missing      missing      missing      missing      38,989
 2284 │ Parker McLachlin      missing            missing      2010     missing    missing        missing             missing       missing      32        missing    missing         missing           missing      missing      missing      missing      53,291
 2285 │ Mark Brooks           missing            missing      2010     missing    missing        missing             missing       missing      29        missing    missing         missing           missing      missing      missing      missing      46,360
 2286 │ Chris Smith           missing            missing      2010     missing    missing        missing             missing       missing      27        missing    missing         missing           missing      missing      missing      missing      23,400
 2287 │ Fran Quinn            missing            missing      2010     missing    missing        missing             missing       missing      26        missing    missing         missing           missing      missing      missing      missing      45,096
 2288 │ Robert Damron         missing            missing      2010     missing    missing        missing             missing       missing      24        missing    missing         missing           missing      missing      missing      missing      17,446
 2289 │ Len Mattiace          missing            missing      2010     missing    missing        missing             missing       missing      22        missing    missing         missing           missing      missing      missing      missing      22,200
 2290 │ Shigeki Maruyama      missing            missing      2010     missing    missing        missing             missing       missing      20        missing    missing         missing           missing      missing      missing      missing      22,440
 2291 │ Michael Clark II      missing            missing      2010     missing    missing        missing             missing       missing      19        missing    missing         missing           missing      missing      missing      missing      21,045
 2292 │ John Morse            missing            missing      2010     missing    missing        missing             missing       missing      19        missing    missing         missing           missing      missing      missing      missing      28,998
 2293 │ Jim Carter            missing            missing      2010     missing    missing        missing             missing       missing      18        missing    missing         missing           missing      missing      missing      missing      29,285
 2294 │ J.L. Lewis            missing            missing      2010     missing    missing        missing             missing       missing      14        missing    missing         missing           missing      missing      missing      missing      12,960
 2295 │ Phil Tataurangi       missing            missing      2010     missing    missing        missing             missing       missing      12        missing    missing         missing           missing      missing      missing      missing      19,449
 2296 │ Tom Byrum             missing            missing      2010     missing    missing        missing             missing       missing      11        missing    missing         missing           missing      missing      missing      missing      13,420
 2297 │ Chris Baryla          missing            missing      2010     missing    missing        missing             missing       missing      9         missing    missing         missing           missing      missing      missing      missing      24,254
 2298 │ Tommy Armour III      missing            missing      2010     missing    missing        missing             missing       missing      6         missing    missing         missing           missing      missing      missing      missing      11,130
 2299 │ Eric Axley            missing            missing      2010     missing    missing        missing             missing       missing      6         missing    missing         missing           missing      missing      missing      missing      24,124
 2300 │ Willie Wood           missing            missing      2010     missing    missing        missing             missing       missing      5         missing    missing         missing           missing      missing      missing      missing      6,540
 2301 │ Robin Freeman         missing            missing      2010     missing    missing        missing             missing       missing      2         missing    missing         missing           missing      missing      missing      missing      13,062
 2302 │ Brad Adamonis         missing            missing      2010     missing    missing        missing             missing       missing      2         missing    missing         missing           missing      missing      missing      missing      missing
 2303 │ Paul Azinger          missing            missing      2010     missing    missing        missing             missing       missing      1         missing    missing         missing           missing      missing      missing      missing      9,486
 2304 │ Spike McRoy           missing            missing      2010     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      6,840
 2305 │ Jon Rahm              missing            missing      2016     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      $1,004,035
 2306 │ Byeong Hun An         missing            missing      2016     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      $926,797
 2307 │ Joey Snyder III       missing            missing      2012     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      $112,800
 2308 │ Carl Paulson          missing            missing      2012     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      $16,943
 2309 │ Peter Tomasulo        missing            missing      2012     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      $12,827
 2310 │ Marc Turnesa          missing            missing      2010     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      10,159
 2311 │ Jesper Parnevik       missing            missing      2010     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      9,165
 2312 │ Jim Gallagher, Jr.    missing            missing      2010     missing    missing        missing             missing       missing      missing   missing    missing         missing           missing      missing      missing      missing      6,552
                                                                                                                                                                                                                                                     2221 rows omitted
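The printout shows that many 2010 rows carry only a name, year, points, and money, with every performance statistic `missing`. Counting missing cells per column makes that pattern explicit before any analysis. Below is a minimal standard-library sketch. It assumes missing values are stored in the CSV as empty fields (or the literal `NA`), and the three-row sample is abbreviated from the printout above. `count_missing` is a hypothetical helper name.

```python
import csv
import io

def count_missing(csv_text, missing_tokens=("", "NA")):
    """Count cells per column whose stripped value is one of missing_tokens."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            # Short rows give None for absent fields; treat those as empty
            if (row[name] or "").strip() in missing_tokens:
                counts[name] += 1
    return counts

# Three rows abbreviated from the printout (subset of columns)
sample = """Player Name,Rounds,Year,Points,Wins,Money
Henrik Stenson,60,2018,868,,"$2,680,487"
Carlos Franco,,2010,92,,"123,232"
Spike McRoy,,2010,,,"6,840"
"""

print(count_missing(sample))
```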

Wrangling Data

str(pga_clean_r)
'data.frame':	2312 obs. of  18 variables:
 $ player_name   : Factor w/ 526 levels "Aaron Baddeley",..: 196 413 108 416 75 73 293 159 411 234 ...
 $ rounds        : int  60 109 93 78 103 103 93 94 77 50 ...
 $ fairway_pct   : num  75.2 73.6 72.2 71.9 71.4 ...
 $ year          : int  2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
 $ avg_distance  : int  291 283 286 289 278 282 295 295 293 280 ...
 $ gir           : num  73.5 68.2 68.7 68.8 67.1 ...
 $ avg_putts     : num  29.9 29.3 29.1 29.2 29.1 ...
 $ avg_scrambling: num  60.7 60.1 62.3 64.2 59.2 ...
 $ avg_score     : num  69.6 70.8 70.4 70 71 ...
 $ points        : num  868 1006 1020 795 421 ...
 $ wins          : num  0 1 0 0 0 0 0 0 0 0 ...
 $ top_10        : num  5 3 3 5 3 6 5 5 3 2 ...
 $ avg_sg_putts  : num  -0.207 -0.058 0.192 -0.271 0.164 0.442 0.037 0.546 0.167 0.389 ...
 $ avg_sg_total  : num  1.153 0.337 0.674 0.941 0.062 ...
 $ sg_ott        : num  0.427 -0.012 0.183 0.406 -0.227 -0.166 0.378 0.364 0.093 -0.392 ...
 $ sg_apr        : num  0.96 0.213 0.437 0.532 0.099 0.036 0.298 0.345 0.467 0.179 ...
 $ sg_arg        : num  -0.027 0.194 -0.137 0.273 0.026 0.253 -0.027 -0.122 -0.186 0.235 ...
 $ money         : num  2680487 2485203 2700018 1986608 1089763 ...
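The code that produces `pga_clean_r` is not shown. Comparing the two `str()` outputs, though, suggests what the wrangling does: column names become snake_case, `Points` and `Money` lose their thousands separators and dollar signs to become numeric, and a missing `Wins` value becomes 0. (It also appears that `player_name` becomes a factor and that `rounds` and `avg_distance` become integers; this sketch does not reproduce those changes.) The Python sketch below mirrors the three conversions for a single CSV row. `COLUMN_MAP`, `parse_number`, and `clean_row` are hypothetical names, not the author's code.

```python
import re

# Hypothetical mapping from raw CSV headers to the snake_case names seen in str(pga_clean_r)
COLUMN_MAP = {
    "Player Name": "player_name", "Rounds": "rounds",
    "Fairway Percentage": "fairway_pct", "Year": "year",
    "Avg Distance": "avg_distance", "gir": "gir",
    "Average Putts": "avg_putts", "Average Scrambling": "avg_scrambling",
    "Average Score": "avg_score", "Points": "points", "Wins": "wins",
    "Top 10": "top_10", "Average SG Putts": "avg_sg_putts",
    "Average SG Total": "avg_sg_total", "SG:OTT": "sg_ott",
    "SG:APR": "sg_apr", "SG:ARG": "sg_arg", "Money": "money",
}

def parse_number(text):
    """Convert text such as '1,006' or '$2,680,487' to float; blanks and 'NA' become None."""
    if text is None:
        return None
    cleaned = re.sub(r"[$,]", "", text.strip())
    if cleaned in ("", "NA"):
        return None
    return float(cleaned)

def clean_row(raw):
    """Rename columns and coerce every field except player_name to a number."""
    row = {}
    for name, value in raw.items():
        new_name = COLUMN_MAP.get(name, name)
        row[new_name] = value if new_name == "player_name" else parse_number(value)
    # Assumption: a missing Wins value means the player had no wins that season
    if "wins" in row and row["wins"] is None:
        row["wins"] = 0.0
    return row

print(clean_row({"Player Name": "Ryan Armour", "Points": "1,006",
                 "Wins": "1", "Money": "$2,485,203"}))
```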

Univariate Analysis

univariate_box_and_whisker_plot_rounds_r <- ggplot2::ggplot(pga_clean_r, aes(x=rounds)) +
    geom_boxplot() +
    xlim(40, 120) +
    theme_michaelmallari_r() +
    theme(
        panel.grid.major.y=element_blank(),
        axis.line.x.bottom=element_blank(),
        axis.ticks.x=element_blank(),
        axis.text.x=element_blank(),
        axis.title.x=element_blank(),
        axis.line.y=element_blank(),
        axis.ticks.y=element_blank(),
        axis.text.y=element_blank(),
        axis.title.y=element_blank()
    )

univariate_histogram_rounds_r <- ggplot2::ggplot(pga_clean_r, aes(x=rounds)) +
    geom_histogram() +
    xlim(40, 120) +
    scale_y_continuous(expand=c(0, 0), position="right") +  # Scale
    labs(
        x="Rounds",
        y=NULL,
        caption="Data Source: https://www.kaggle.com/jmpark746/pga-tour-data-2010-2018"
    ) +
    theme_michaelmallari_r()

# Stack the box plot (20% of the height) over the histogram, aligning their left/right axes
cowplot::plot_grid(
    univariate_box_and_whisker_plot_rounds_r,
    univariate_histogram_rounds_r, 
    ncol=1,
    rel_heights=c(0.2, 1),
    align="v",
    axis="lr"
)
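Called without bins or binwidth, geom_histogram() falls back to 30 bins and prints a message suggesting you pick a better binwidth. Choosing explicit bin edges makes the histogram easier to read and to reproduce. The sketch below bins the first ten `rounds` values over the same 40 to 120 window as xlim(40, 120). The 5-round width is an illustrative assumption, not the original plot's setting. Note that numpy's bins are left-closed (the last one includes 120), whereas ggplot2's stat_bin() defaults to closed="right", so a value sitting exactly on an edge can land in a neighboring bin.

```python
import numpy as np

# Same window as xlim(40, 120); the 5-round bin width is an illustrative choice.
edges = np.arange(40, 125, 5)  # 40, 45, ..., 120
# First ten `rounds` values from the str(pga_clean_r) printout.
rounds_sample = np.array([60, 109, 93, 78, 103, 103, 93, 94, 77, 50])

# numpy bins are [left, right), except the last bin, which also includes 120.
counts, bin_edges = np.histogram(rounds_sample, bins=edges)
for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    if n:
        print(f"[{left}, {right}): {n}")
```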

Bivariate Analysis

bivariate_scatterplot_points_money_r <- ggplot2::ggplot(pga_clean_r, aes(x=points, y=money)) +  # Data, aesthetics
    geom_point(color=palette_michaelmallari_r[19], alpha=0.3) +  # Geometric object
    geom_smooth(method=lm, colour=palette_michaelmallari_r[2]) +  # Geometric object
    scale_y_continuous(expand=c(0, 0), position="right") +  # Scale
    labs(
        title="Prize Money Closely Tracks Points Scored",
        alt="Prize Money Closely Tracks Points Scored",
        subtitle="Prize money ($) based on points scored, n = 2,312",
        x="Points Scored",
        y=NULL,
        caption="Data Source: https://www.kaggle.com/jmpark746/pga-tour-data-2010-2018"
    ) +
    theme_michaelmallari_r()
    
bivariate_scatterplot_points_money_r
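The trend line comes from geom_smooth(method=lm), which fits an ordinary least-squares line with the default formula y ~ x (and, by default, shades a 95% confidence band around it). A minimal sketch of the underlying slope and intercept, using the closed-form OLS solution, is below. The helper name ols_line is hypothetical, and the five (points, money) pairs are only the first values from the str() printout above, so the fitted coefficients will differ from the full-data line in the chart.

```python
import numpy as np

def ols_line(x, y):
    """Hypothetical helper: least-squares fit of y = intercept + slope * x,
    the same model geom_smooth(method = lm) draws with formula y ~ x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # slope = cov(x, y) / var(x); the line passes through (mean x, mean y).
    slope = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# First five (points, money) pairs from the str(pga_clean_r) printout.
points = [868, 1006, 1020, 795, 421]
money = [2680487, 2485203, 2700018, 1986608, 1089763]
b0, b1 = ols_line(points, money)
print(f"money ~ {b0:,.0f} + {b1:,.1f} * points")
```

np.polyfit(points, money, 1) returns the same coefficients (slope first); the closed form is spelled out here only to make the model explicit.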

Multivariate Analysis

Correlation Matrix

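# Pearson correlations among the numeric columns. With cor()'s default use="everything",
# any pair involving a column that contains NA comes back as NA.
correlation_pearson_r <- cor(
    subset(pga_clean_r, select=-c(player_name, year)),
    method="pearson"
)
correlation_pearson_r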
               rounds fairway_pct avg_distance gir avg_putts avg_scrambling avg_score points wins top_10 avg_sg_putts avg_sg_total sg_ott sg_apr sg_arg money
rounds              1          NA           NA  NA        NA             NA        NA     NA   NA     NA           NA           NA     NA     NA     NA    NA
fairway_pct        NA           1           NA  NA        NA             NA        NA     NA   NA     NA           NA           NA     NA     NA     NA    NA
avg_distance       NA          NA            1  NA        NA             NA        NA     NA   NA     NA           NA           NA     NA     NA     NA    NA
gir                NA          NA           NA   1        NA             NA        NA     NA   NA     NA           NA           NA     NA     NA     NA    NA
avg_putts          NA          NA           NA  NA         1             NA        NA     NA   NA     NA           NA           NA     NA     NA     NA    NA
avg_scrambling     NA          NA           NA  NA        NA              1        NA     NA   NA     NA           NA           NA     NA     NA     NA    NA
avg_score          NA          NA           NA  NA        NA             NA         1     NA   NA     NA           NA           NA     NA     NA     NA    NA
points             NA          NA           NA  NA        NA             NA        NA   1.00 0.72     NA           NA           NA     NA     NA     NA  0.96
wins               NA          NA           NA  NA        NA             NA        NA   0.72 1.00     NA           NA           NA     NA     NA     NA  0.72
top_10             NA          NA           NA  NA        NA             NA        NA     NA   NA      1           NA           NA     NA     NA     NA    NA
avg_sg_putts       NA          NA           NA  NA        NA             NA        NA     NA   NA     NA            1           NA     NA     NA     NA    NA
avg_sg_total       NA          NA           NA  NA        NA             NA        NA     NA   NA     NA           NA            1     NA     NA     NA    NA
sg_ott             NA          NA           NA  NA        NA             NA        NA     NA   NA     NA           NA           NA      1     NA     NA    NA
sg_apr             NA          NA           NA  NA        NA             NA        NA     NA   NA     NA           NA           NA     NA      1     NA    NA
sg_arg             NA          NA           NA  NA        NA             NA        NA     NA   NA     NA           NA           NA     NA     NA      1    NA
money              NA          NA           NA  NA        NA             NA        NA   0.96 0.72     NA           NA           NA     NA     NA     NA  1.00
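Most of this matrix is NA because cor() defaults to use="everything", so a single missing value in a column blanks out every correlation involving that column; R's use="pairwise.complete.obs" instead computes each pair on the rows where both values are present. In Python, pandas' DataFrame.corr() already excludes missing values pair by pair, while numpy's corrcoef() does no missing-value handling at all. The sketch below contrasts the two on a hypothetical toy frame: values are borrowed from the str() printout above, with one top_10 entry deliberately replaced by NaN for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: values from the str() printout, with one top_10
# entry blanked out to mimic the missing values in pga_clean_r.
df = pd.DataFrame({
    "points": [868.0, 1006.0, 1020.0, 795.0, 421.0],
    "money": [2680487.0, 2485203.0, 2700018.0, 1986608.0, 1089763.0],
    "top_10": [5.0, 3.0, np.nan, 5.0, 3.0],
})

# pandas drops missing values pair by pair, like use = "pairwise.complete.obs".
pairwise = df.corr(method="pearson")

# numpy's corrcoef has no NA handling: the NaN propagates through top_10's row and column.
naive = np.corrcoef(df.to_numpy().T)

print(pairwise.round(2))
print(np.round(naive, 2))
```

Pairwise deletion keeps every cell filled, but each cell may rest on a different subset of rows, which is worth keeping in mind when comparing correlations across the matrix.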

multivariate_correlation_matrix_r <- ggcorrplot::ggcorrplot(
    corr=correlation_pearson_r,
    method="square",
    type="full",
    show.legend=TRUE,
    legend.title="Pearson Correlation (r)",
    colors=c(palette_michaelmallari_r[3], palette_michaelmallari_r[1], palette_michaelmallari_r[2]),
    lab=TRUE,
    lab_size=2,
    digits=2
)
multivariate_correlation_matrix_r

Bubble Plot

relationship_bubble_plot_win_money_points_r <- ggplot2::ggplot(pga_clean_r, aes(x=points, y=money)) +  # Data, aesthetics
    geom_point(aes(size=wins), color=palette_michaelmallari_r[19], alpha=0.3) +  # Geometric object, aesthetic
    geom_smooth(method=lm, colour=palette_michaelmallari_r[2]) +  # Geometric object
    scale_y_continuous(expand=c(0, 0), position="right") +  # Scale
    labs(
        title="Separation From the Pack With 1750+ Points",
        alt="Separation From the Pack With 1750+ Points",
        subtitle="PGA prize money ($) based on points scored and wins, n = 2,312",
        x="Points Scored",
        y=NULL,
        size="Wins",
        caption="Data Source: https://www.kaggle.com/jmpark746/pga-tour-data-2010-2018"
    ) +
    theme_michaelmallari_r()
    
relationship_bubble_plot_win_money_points_r
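In the bubble plot, aes(size=wins) goes through ggplot2's default scale_size(), which scales a bubble's area rather than its radius (scale_radius() is the radius-based alternative). If you rebuild the chart in Python with matplotlib, the `s` argument of scatter() is also given in points squared, so mapping wins linearly onto `s` keeps the same area-proportional reading. The helper bubble_areas and its min_area/max_area defaults are hypothetical choices for this sketch, not settings from the original plot.

```python
import numpy as np

def bubble_areas(values, min_area=10.0, max_area=300.0):
    """Hypothetical helper: map values linearly onto marker areas
    (e.g. matplotlib's `s`, in points squared), so a bubble's area,
    not its radius, grows with the value."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        # All values equal: give every bubble the midpoint area.
        return np.full_like(v, (min_area + max_area) / 2)
    return min_area + (v - v.min()) / span * (max_area - min_area)

# First five `wins` values from the str(pga_clean_r) printout.
wins = [0, 1, 0, 0, 0]
areas = bubble_areas(wins)
print(areas)
# Marker width grows only with the square root of the area.
print(np.sqrt(areas))
```

With matplotlib imported as plt, a call along the lines of plt.scatter(points, money, s=bubble_areas(wins), alpha=0.3) would then draw the bubbles.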

