Data-Informed Thinking + Doing

Exploratory Data Analysis on Categorical Data

Univariate, bivariate, and multivariate analyses on the Marvel Comics dataset—using R, Python, and Julia.

When I was 13, my younger brother and I collected the 1990 Marvel Universe (series 1) cards. Ten years later, as an interface engineer at KPE (a top interactive agency in Silicon Alley in the late 1990s and early 2000s), I had the privilege of working on a Wolverine interactive Flash/ActionScript game, a Marvel Comics advergaming project tied to the release of the first X-Men movie (2000).

Today, with access to FiveThirtyEight’s Marvel Universe dataset, we’ll perform exploratory data analysis (EDA) on categorical data, as a continuation of my previous post (on Univariate, Bivariate, and Multivariate Analyses on Numerical Data).

Before we do (and to recap my previous post), why is EDA important? Univariate analysis allows us to explore the distribution and frequencies of individual categorical variables, revealing the prevalence of different categories. Bivariate analysis helps us examine relationships and associations between two categorical variables, enabling comparisons and identifying patterns. Multivariate analysis extends this further by considering interactions and dependencies among multiple categorical variables simultaneously.

EDA provides valuable insights into the composition, relationships, and dependencies within categorical data, allowing us to make informed decisions, identify trends, and gain a deeper understanding of various phenomena across domains like marketing, social sciences, and customer segmentation.

Getting Started

If you are interested in reproducing this work, here are the versions of R, Python, and Julia used (as well as the respective packages for each). Additionally, Leland Wilkinson’s approach to data visualization (the Grammar of Graphics) has been adopted for this work. Finally, my coding style here is verbose, so that it is easy to trace where functions/methods and variables originate, and to make this a learning experience for everyone, including me.

cat(R.version$version.string, R.version$nickname)
R version 4.2.3 (2023-03-15) Shortstop Beagle
require(devtools)
devtools::install_version("dplyr", version="1.1.2", repos="http://cran.us.r-project.org")
devtools::install_version("ggplot2", version="3.4.2", repos="http://cran.us.r-project.org")
devtools::install_github("davidsjoberg/ggstream", dependencies=FALSE)
library(dplyr)
library(ggplot2)
library(ggstream)
import sys
print(sys.version)
3.11.4 (v3.11.4:d2340ef257, Jun  6 2023, 19:15:51) [Clang 13.0.0 (clang-1300.0.29.30)]
!pip install numpy==1.25.1
!pip install pandas==2.0.3
!pip install plotnine==0.12.2
import numpy
import pandas
import plotnine
using InteractiveUtils
InteractiveUtils.versioninfo()
Julia Version 1.9.2
Commit e4ee485e909 (2023-07-05 09:39 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 1 on 8 virtual cores
Environment:
  DYLD_FALLBACK_LIBRARY_PATH = /Library/Frameworks/R.framework/Resources/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home/jre/lib/server
using Pkg
Pkg.add(name="CSV", version="0.10.11")
Pkg.add(name="DataFrames", version="1.6.1")
Pkg.add(name="CategoricalArrays", version="0.10.8")
Pkg.add(name="Colors", version="0.12.10")
Pkg.add(name="Cairo", version="1.0.5")
Pkg.add(name="Gadfly", version="1.3.4")
using DataFrames
using CSV
using CategoricalArrays
using Colors
using Cairo
using Gadfly

Importing, Wrangling, and Examining Data

As we can see from the data types below, most of the variables are categorical—some ordinal, and others non-ordinal (or nominal).

For the analyses to follow, I won’t need page_id and urlslug, so those can be removed. The character’s name should be converted to a string data type, and FIRST.APPEARANCE to a timestamp (date) object. Any ordinal categorical variable should have its levels defined in the correct order. Finally, the variable names should follow a consistent naming convention (lowercase, with words separated by an underscore).
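Two of these steps have a catch worth showing. The FIRST.APPEARANCE values look like Aug-62, with only a two-digit year, and ordinal levels must follow their meaning rather than the alphabet. Below is a minimal Python sketch (standard library only), assuming the four-digit year column supplies the century and the unknown day is pinned to the 1st; both are my assumptions, not necessarily how the cleaning above was done, and parse_first_appearance and sort_by_alignment are hypothetical helper names.

```python
from datetime import date, datetime

# Ordinal ranking of alignment, matching levels(marvel_clean_r$align) above
ALIGN_ORDER = ["Bad", "Reformed Criminal", "Unknown", "Neutral", "Good"]


def parse_first_appearance(month_year: str, year: int) -> date:
    """Convert a 'Mon-YY' string such as 'Aug-62' into a date.

    The century comes from the four-digit year column (an assumption),
    and the day is unknown, so it is pinned to the 1st of the month.
    """
    month_abbrev = month_year.split("-")[0]              # 'Aug-62' -> 'Aug'
    month = datetime.strptime(month_abbrev, "%b").month  # 'Aug' -> 8
    return date(year, month, 1)


def sort_by_alignment(values: list[str]) -> list[str]:
    """Sort alignment labels by ordinal rank instead of alphabetically."""
    return sorted(values, key=ALIGN_ORDER.index)


# Parsing the two-digit year alone puts Captain America's debut in 2041
naive = datetime.strptime("Mar-41", "%b-%y").date()
fixed = parse_first_appearance("Mar-41", 1941)
print(naive, fixed)
print(sort_by_alignment(["Good", "Bad", "Neutral"]))
```

The naive parse goes wrong because the %y directive maps 00 through 68 to 2000 through 2068, so any Golden Age debut lands in the future unless the century comes from elsewhere.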

After data wrangling, here is what the clean data looks like:

str(object=marvel_clean_r)
'data.frame':	16376 obs. of  11 variables:
 $ name                  : chr  "Spider-Man (Peter Parker)" "Captain America (Steven Rogers)" "Wolverine (James \\\"Logan\\\" Howlett)" "Iron Man (Anthony \\\"Tony\\\" Stark)" ...
 $ identity              : Factor w/ 5 levels "Unknown","Known to Authorities",..: 5 4 4 4 3 4 4 4 4 4 ...
 $ align                 : Ord.factor w/ 5 levels "Bad"<"Reformed Criminal"<..: 5 5 4 5 5 5 5 5 4 5 ...
 $ eye                   : Factor w/ 25 levels "Unknown","Amber",..: 11 5 5 5 5 5 6 6 6 5 ...
 $ hair                  : Factor w/ 26 levels "Unknown","Auburn Hair",..: 8 25 4 4 5 15 8 8 8 5 ...
 $ gender                : Factor w/ 5 levels "Unknown","Agender",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ gender_sexual_minority: Factor w/ 7 levels "Non-GSM","Bisexual",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ living_status         : Factor w/ 3 levels "Unknown","Deceased",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ appearances           : int  4043 3360 3061 2961 2258 2255 2072 2017 1955 1934 ...
 $ first_appearance_month: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
 $ first_appearance_year : int  1962 1941 1974 1963 1950 1961 1961 1962 1963 1961 ...
head(x=marvel_clean_r, n=8)
                                   name identity   align   eye    hair gender gender_sexual_minority living_status appearances first_appearance_month first_appearance_year
1             Spider-Man (Peter Parker)   Secret    Good Hazel   Brown   Male                Non-GSM         Alive        4043                                         1962
2       Captain America (Steven Rogers)   Public    Good  Blue   White   Male                Non-GSM         Alive        3360                                         1941
3 Wolverine (James \\"Logan\\" Howlett)   Public Neutral  Blue   Black   Male                Non-GSM         Alive        3061                                         1974
4   Iron Man (Anthony \\"Tony\\" Stark)   Public    Good  Blue   Black   Male                Non-GSM         Alive        2961                                         1963
5                   Thor (Thor Odinson)  No Dual    Good  Blue   Blond   Male                Non-GSM         Alive        2258                                         1950
6            Benjamin Grimm (Earth-616)   Public    Good  Blue No Hair   Male                Non-GSM         Alive        2255                                         1961
7             Reed Richards (Earth-616)   Public    Good Brown   Brown   Male                Non-GSM         Alive        2072                                         1961
8            Hulk (Robert Bruce Banner)   Public    Good Brown   Brown   Male                Non-GSM         Alive        2017                                         1962
tail(x=marvel_clean_r, n=8)
                                      name identity   align     eye    hair  gender gender_sexual_minority living_status appearances first_appearance_month first_appearance_year
16369 Marcy (Offer's employee) (Earth-616)   Public Neutral Unknown   Brown  Female                Non-GSM         Alive          NA                                           NA
16370           Melanie Kapoor (Earth-616)   Public    Good    Blue   Black  Female                Non-GSM         Alive          NA                                           NA
16371         Phoenix's Shadow (Earth-616)  Unknown Neutral Unknown Unknown Unknown                Non-GSM         Alive          NA                                           NA
16372                   Ru'ach (Earth-616)  No Dual     Bad   Green No Hair    Male                Non-GSM         Alive          NA                                           NA
16373      Thane (Thanos' son) (Earth-616)  No Dual    Good    Blue    Bald    Male                Non-GSM         Alive          NA                                           NA
16374        Tinkerer (Skrull) (Earth-616)   Secret     Bad   Black    Bald    Male                Non-GSM         Alive          NA                                           NA
16375       TK421 (Spiderling) (Earth-616)   Secret Neutral Unknown Unknown    Male                Non-GSM         Alive          NA                                           NA
16376                Yologarch (Earth-616)  Unknown     Bad Unknown Unknown Unknown                Non-GSM         Alive          NA                                           NA
levels(marvel_clean_r$identity)
[1] "Unknown"              "Known to Authorities" "No Dual"              "Public"               "Secret"              
levels(marvel_clean_r$align)
[1] "Bad"               "Reformed Criminal" "Unknown"           "Neutral"           "Good"             
levels(marvel_clean_r$eye)
 [1] "Unknown"         "Amber"           "Black Eyeballs"  "Black"           "Blue"            "Brown"           "Compound Eyes"   "Gold"            "Green"           "Grey"            "Hazel"           "Magenta Eyes"    "Multiple Eyes"   "No Eyes"         "One Eye"         "Orange"          "Pink"            "Purple"          "Red"             "Silver Eyes"     "Variable Eyes"   "Violet"          "White"           "Yellow Eyeballs" "Yellow"         
levels(marvel_clean_r$hair)
 [1] "Unknown"            "Auburn Hair"        "Bald"               "Black"              "Blond"              "Blue"               "Bronze Hair"        "Brown"              "Dyed Hair"          "Gold"               "Green"              "Grey"               "Light Brown Hair"   "Magenta Hair"       "No Hair"            "Orange"             "Orange-brown Hair"  "Pink"               "Purple"             "Red"                "Reddish Blond Hair" "Silver"             "Strawberry Blond"   "Variable Hair"      "White"              "Yellow Hair"       
levels(marvel_clean_r$gender)
[1] "Unknown"      "Agender"      "Female"       "Gender Fluid" "Male"        
levels(marvel_clean_r$gender_sexual_minority)
[1] "Non-GSM"       "Bisexual"      "Gender Fluid"  "Homosexual"    "Pansexual"     "Transgender"   "Transvestites"
levels(marvel_clean_r$living_status)
[1] "Unknown"  "Deceased" "Alive"   
marvel_clean_py.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16376 entries, 0 to 16375
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   name                     16376 non-null  object 
 1   url_slug                 16376 non-null  object 
 2   id                       12606 non-null  object 
 3   align                    13564 non-null  object 
 4   eye                      6609 non-null   object 
 5   hair                     12112 non-null  object 
 6   gender                   15522 non-null  object 
 7   gender_sexual_minority   90 non-null     object 
 8   living_status            16373 non-null  object 
 9   appearances              15280 non-null  float64
 10  first_appearance_month   15561 non-null  object 
 11  first_appearance_year    15561 non-null  float64
dtypes: float64(2), object(10)
memory usage: 1.5+ MB
marvel_clean_py.head(n=8)
                                  name                                 url_slug                id               align         eye        hair           gender gender_sexual_minority       living_status  appearances first_appearance_month  first_appearance_year
0            Spider-Man (Peter Parker)              \/Spider-Man_(Peter_Parker)   Secret Identity     Good Characters  Hazel Eyes  Brown Hair  Male Characters                     NaN  Living Characters       4043.0                 Aug-62                 1962.0
1      Captain America (Steven Rogers)        \/Captain_America_(Steven_Rogers)   Public Identity     Good Characters   Blue Eyes  White Hair  Male Characters                     NaN  Living Characters       3360.0                 Mar-41                 1941.0
2  Wolverine (James \"Logan\" Howlett)  \/Wolverine_(James_%22Logan%22_Howlett)   Public Identity  Neutral Characters   Blue Eyes  Black Hair  Male Characters                     NaN  Living Characters       3061.0                 Oct-74                 1974.0
3    Iron Man (Anthony \"Tony\" Stark)    \/Iron_Man_(Anthony_%22Tony%22_Stark)   Public Identity     Good Characters   Blue Eyes  Black Hair  Male Characters                     NaN  Living Characters       2961.0                 Mar-63                 1963.0
4                  Thor (Thor Odinson)                    \/Thor_(Thor_Odinson)  No Dual Identity     Good Characters   Blue Eyes  Blond Hair  Male Characters                     NaN  Living Characters       2258.0                 Nov-50                 1950.0
5           Benjamin Grimm (Earth-616)             \/Benjamin_Grimm_(Earth-616)   Public Identity     Good Characters   Blue Eyes     No Hair  Male Characters                     NaN  Living Characters       2255.0                 Nov-61                 1961.0
6            Reed Richards (Earth-616)              \/Reed_Richards_(Earth-616)   Public Identity     Good Characters  Brown Eyes  Brown Hair  Male Characters                     NaN  Living Characters       2072.0                 Nov-61                 1961.0
7           Hulk (Robert Bruce Banner)             \/Hulk_(Robert_Bruce_Banner)   Public Identity     Good Characters  Brown Eyes  Brown Hair  Male Characters                     NaN  Living Characters       2017.0                 May-62                 1962.0
marvel_clean_py.tail(n=8)
                                       name                                  url_slug                id               align         eye        hair             gender gender_sexual_minority       living_status  appearances first_appearance_month  first_appearance_year
16368  Marcy (Offer's employee) (Earth-616)  \/Marcy_(Offer%27s_employee)_(Earth-616)   Public Identity  Neutral Characters         NaN  Brown Hair  Female Characters                     NaN  Living Characters          NaN                    NaN                    NaN
16369            Melanie Kapoor (Earth-616)              \/Melanie_Kapoor_(Earth-616)   Public Identity     Good Characters   Blue Eyes  Black Hair  Female Characters                     NaN  Living Characters          NaN                    NaN                    NaN
16370          Phoenix's Shadow (Earth-616)          \/Phoenix%27s_Shadow_(Earth-616)               NaN  Neutral Characters         NaN         NaN                NaN                     NaN  Living Characters          NaN                    NaN                    NaN
16371                    Ru'ach (Earth-616)                    \/Ru%27ach_(Earth-616)  No Dual Identity      Bad Characters  Green Eyes     No Hair    Male Characters                     NaN  Living Characters          NaN                    NaN                    NaN
16372       Thane (Thanos' son) (Earth-616)       \/Thane_(Thanos%27_son)_(Earth-616)  No Dual Identity     Good Characters   Blue Eyes        Bald    Male Characters                     NaN  Living Characters          NaN                    NaN                    NaN
16373         Tinkerer (Skrull) (Earth-616)           \/Tinkerer_(Skrull)_(Earth-616)   Secret Identity      Bad Characters  Black Eyes        Bald    Male Characters                     NaN  Living Characters          NaN                    NaN                    NaN
16374        TK421 (Spiderling) (Earth-616)          \/TK421_(Spiderling)_(Earth-616)   Secret Identity  Neutral Characters         NaN         NaN    Male Characters                     NaN  Living Characters          NaN                    NaN                    NaN
16375                 Yologarch (Earth-616)                   \/Yologarch_(Earth-616)               NaN      Bad Characters         NaN         NaN                NaN                     NaN  Living Characters          NaN                    NaN                    NaN
marvel_clean_jl
16376×11 DataFrame
   Row │ name                               identity  align    eye         hair        gender   gender_sexual_minority  living_status  appearances  first_appearance_month  first_appearance_year
       │ String                             Cat…?     Cat…?    Cat…?       Cat…?       Cat…?    Cat…?                   Cat…?          Int64?       String7?                Int64?
───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     1 │ Spider-Man (Peter Parker)          Secret    Good     Hazel Eyes  Brown Hair  Male     Non-GSM                 Living                4043  Aug-62                                   1962
     2 │ Captain America (Steven Rogers)    Public    Good     Blue Eyes   White Hair  Male     Non-GSM                 Living                3360  Mar-41                                   1941
     3 │ Wolverine (James \\"Logan\\" How…  Public    Neutral  Blue Eyes   Black Hair  Male     Non-GSM                 Living                3061  Oct-74                                   1974
     4 │ Iron Man (Anthony \\"Tony\\" Sta…  Public    Good     Blue Eyes   Black Hair  Male     Non-GSM                 Living                2961  Mar-63                                   1963
     5 │ Thor (Thor Odinson)                No Dual   Good     Blue Eyes   Blond Hair  Male     Non-GSM                 Living                2258  Nov-50                                   1950
     6 │ Benjamin Grimm (Earth-616)         Public    Good     Blue Eyes   No Hair     Male     Non-GSM                 Living                2255  Nov-61                                   1961
     7 │ Reed Richards (Earth-616)          Public    Good     Brown Eyes  Brown Hair  Male     Non-GSM                 Living                2072  Nov-61                                   1961
     8 │ Hulk (Robert Bruce Banner)         Public    Good     Brown Eyes  Brown Hair  Male     Non-GSM                 Living                2017  May-62                                   1962
   ⋮   │                 ⋮                     ⋮         ⋮         ⋮           ⋮          ⋮               ⋮                   ⋮             ⋮                 ⋮                       ⋮
 16370 │ Melanie Kapoor (Earth-616)         Public    Good     Blue Eyes   Black Hair  Female   Non-GSM                 Living             missing  missing                               missing
 16371 │ Phoenix's Shadow (Earth-616)       Unknown   Neutral  Unknown     Unknown     Unknown  Non-GSM                 Living             missing  missing                               missing
 16372 │ Ru'ach (Earth-616)                 No Dual   Bad      Green Eyes  No Hair     Male     Non-GSM                 Living             missing  missing                               missing
 16373 │ Thane (Thanos' son) (Earth-616)    No Dual   Good     Blue Eyes   Bald        Male     Non-GSM                 Living             missing  missing                               missing
 16374 │ Tinkerer (Skrull) (Earth-616)      Secret    Bad      Black Eyes  Bald        Male     Non-GSM                 Living             missing  missing                               missing
 16375 │ TK421 (Spiderling) (Earth-616)     Secret    Neutral  Unknown     Unknown     Male     Non-GSM                 Living             missing  missing                               missing
 16376 │ Yologarch (Earth-616)              Unknown   Bad      Unknown     Unknown     Unknown  Non-GSM                 Living             missing  missing                               missing
                                                                                                                                                                                16361 rows omitted

Summary Statistics

summary(marvel_clean_r)
     name                           identity                  align           eye            hair               gender        gender_sexual_minority  living_status    appearances   first_appearance_month first_appearance_year
 Length:16376       Unknown             :3770   Bad              :6720   Unknown:9767   Unknown:4264   Unknown     :  854   Non-GSM      :16286      Unknown :    3   Min.   :   1   :16376                 Min.   :1939         
 Class :character   Known to Authorities:  15   Reformed Criminal:   0   Blue   :1962   Black  :3755   Agender     :   45   Bisexual     :   19      Deceased: 3765   1st Qu.:   1                          1st Qu.:1974         
 Mode  :character   No Dual             :1788   Unknown          :2812   Brown  :1924   Brown  :2339   Female      : 3837   Gender Fluid :    1      Alive   :12608   Median :   3                          Median :1990         
                    Public              :4528   Neutral          :2208   Green  : 613   Blond  :1582   Gender Fluid:    2   Homosexual   :   66                       Mean   :  17                          Mean   :1985         
                    Secret              :6275   Good             :4636   Black  : 555   No Hair:1176   Male        :11638   Pansexual    :    1                       3rd Qu.:   8                          3rd Qu.:2000         
                                                                         Red    : 508   Bald   : 838                        Transgender  :    2                       Max.   :4043                          Max.   :2013         
                                                                         (Other):1047   (Other):2422                        Transvestites:    1                       NA's   :1096                          NA's   :815          
marvel_clean_py.describe()
        appearances  first_appearance_year
count  15280.000000           15561.000000
mean      17.033377            1984.951803
std       96.372959              19.663571
min        1.000000            1939.000000
25%        1.000000            1974.000000
50%        3.000000            1990.000000
75%        8.000000            2000.000000
max     4043.000000            2013.000000
describe(marvel_clean_jl)
11×7 DataFrame
 Row │ variable                mean     min                   median  max                            nmissing  eltype
     │ Symbol                  Union…   Any                   Union…  Any                            Int64     Type
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ name                             'Spinner (Earth-616)          \\u00c4kr\\u00e4s (Earth-616)         0  String
   2 │ identity                         Unknown                       Secret                                0  Union{Missing, CategoricalValue{…
   3 │ align                            Bad                           Good                                  0  Union{Missing, CategoricalValue{…
   4 │ eye                              Amber Eyes                    Unknown                               0  Union{Missing, CategoricalValue{…
   5 │ hair                             Auburn Hair                   Unknown                               0  Union{Missing, CategoricalValue{…
   6 │ gender                           Agender                       Unknown                               0  Union{Missing, CategoricalValue{…
   7 │ gender_sexual_minority           Bisexual                      Non-GSM                               0  Union{Missing, CategoricalValue{…
   8 │ living_status                    Deceased                      Unknown                               0  Union{Missing, CategoricalValue{…
   9 │ appearances             17.0334  1                     3.0     4043                               1096  Union{Missing, Int64}
  10 │ first_appearance_month           Apr-00                        Sep-99                              815  Union{Missing, String7}
  11 │ first_appearance_year   1984.95  1939                  1990.0  2013                                815  Union{Missing, Int64}

Univariate Analysis

# Categorical distribution via bar chart
# palette_michaelmallari_r and theme_michaelmallari_r() are custom palette and theme helpers defined elsewhere (not shown in this post)
univariate_bar_gender_r <- ggplot2::ggplot(data=marvel_clean_r, aes(x=gender, y=after_stat(count))) +
    ggplot2::geom_bar(
        stat="count",
        colour=palette_michaelmallari_r[19],
        fill=palette_michaelmallari_r[19]
    ) +
    ggplot2::geom_text(
        aes(label=after_stat(count)),
        stat="count",
        vjust=1.5,
        colour=palette_michaelmallari_r[1]
    ) +
    ggplot2::scale_x_discrete(limits=c("Male", "Female", "Unknown", "Agender", "Gender Fluid")) +
    ggplot2::scale_y_continuous(expand=c(0, 0), position="right") +  # Scale
    ggplot2::guides(fill=guide_legend(reverse=TRUE)) +
    ggplot2::labs(
        title="Male-Dominated Characters",
        alt="Male-Dominated Characters",
        subtitle="Distribution by gender, Marvel Universe characters = 16,376",
        x=NULL,
        y=NULL,
        fill=NULL,
        caption="Source: FiveThirtyEight"
    ) +
    theme_michaelmallari_r()

univariate_bar_gender_r

Summary Statistics

summary(object=marvel_clean_r$gender)
     Unknown      Agender       Female Gender Fluid         Male 
         854           45         3837            2        11638 
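Bar heights are easier to compare as shares of the whole. This small Python sketch (standard library only) turns the counts printed by summary() above into relative frequencies; sorting by frequency also reproduces the category order passed to scale_x_discrete(limits=...) in the R chart.

```python
from collections import Counter

# Counts copied from summary(object=marvel_clean_r$gender) above
gender_counts = Counter({
    "Unknown": 854, "Agender": 45, "Female": 3837,
    "Gender Fluid": 2, "Male": 11638,
})

total = sum(gender_counts.values())

# most_common() yields categories from highest to lowest count
for gender, count in gender_counts.most_common():
    print(f"{gender:<13}{count:>6}{count / total:>8.1%}")
```

Roughly seven in ten characters are male, which is the pattern the chart title summarizes.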
univariate_bar_gender_py = (
    plotnine.ggplot(marvel_clean_py, plotnine.aes(x="gender"))
        + plotnine.geom_bar()
)

univariate_bar_gender_py
<Figure Size: (1280 x 960)>

univariate_bar_gender_jl = Gadfly.plot(
    marvel_clean_jl,
    x=:gender,
    Gadfly.Geom.bar(),
    Gadfly.Scale.x_discrete,
    Gadfly.Scale.y_continuous(format=:plain),
    Gadfly.Guide.title("Male-Dominated Characters")
);

Bivariate Analysis

# Comparing categories via stacked bar
bivariate_stacked_bar_gender_align_r <- ggplot2::ggplot(data=marvel_clean_r, aes(x=gender, y=after_stat(count), fill=align)) +
    ggplot2::geom_bar(position="stack", stat="count") +
    ggplot2::scale_x_discrete(limits=c("Male", "Female", "Unknown", "Agender", "Gender Fluid")) +
    ggplot2::scale_y_continuous(expand=c(0, 0), position="right") +  # Scale
    ggplot2::scale_fill_manual(values=c("Good"=palette_michaelmallari_r[19], "Neutral"=palette_michaelmallari_r[20], "Unknown"=palette_michaelmallari_r[21], "Bad"=palette_michaelmallari_r[2])) +
    ggplot2::guides(fill=guide_legend(reverse=TRUE)) +
    ggplot2::labs(
        title="Male-Dominated Characters (With Roughly Half as Villains)",
        alt="Male-Dominated Characters (With Roughly Half as Villains)",
        subtitle="Count of alignment by gender, Marvel Universe characters = 16,376",
        x=NULL,
        y=NULL,
        fill=NULL,
        caption="Source: FiveThirtyEight"
    ) +
    theme_michaelmallari_r()

bivariate_stacked_bar_gender_align_r

2-Way Contingency Table

# Comparing categories via 2-way contingency table
table(marvel_clean_r$gender, marvel_clean_r$align)
              
                Bad Reformed Criminal Unknown Neutral Good
  Unknown       386                 0     232     114  122
  Agender        20                 0       2      13   10
  Female        976                 0     684     640 1537
  Gender Fluid    0                 0       0       1    1
  Male         5338                 0    1894    1440 2966
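To check the "roughly half as villains" headline, divide each row of the table by its row total, which is what prop.table(x, margin=1) does in R. Here is a standard-library Python sketch using the counts above; row_proportions is a hypothetical helper name.

```python
# Counts copied from table(marvel_clean_r$gender, marvel_clean_r$align) above
ALIGNS = ["Bad", "Reformed Criminal", "Unknown", "Neutral", "Good"]
crosstab = {
    "Unknown":      [386, 0, 232, 114, 122],
    "Agender":      [20, 0, 2, 13, 10],
    "Female":       [976, 0, 684, 640, 1537],
    "Gender Fluid": [0, 0, 0, 1, 1],
    "Male":         [5338, 0, 1894, 1440, 2966],
}


def row_proportions(table):
    """Divide every cell by its row total (row-wise conditional shares)."""
    return {
        row: [cell / sum(cells) for cell in cells]
        for row, cells in table.items()
    }


shares = row_proportions(crosstab)
bad = ALIGNS.index("Bad")
for gender in ("Male", "Female"):
    print(f"{gender}: {shares[gender][bad]:.1%} of characters are Bad")
```

Conditioning on gender shows that just under half of male characters are villains, compared with about a quarter of female characters. The raw stacked bars make this harder to see because the male bar is roughly three times as tall.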

Multivariate Analysis

multivariate_line_chart_time_align_count_r <- ggplot2::ggplot(data=marvel_clean_r, aes(x=first_appearance_year, y=after_stat(count), color=align)) +
    ggplot2::geom_line(stat="count") +
    ggplot2::scale_y_continuous(expand=c(0, 0), position="right") +  # Scale
    ggplot2::scale_color_manual(values=c("Good"=palette_michaelmallari_r[19], "Neutral"=palette_michaelmallari_r[20], "Unknown"=palette_michaelmallari_r[21], "Bad"=palette_michaelmallari_r[2])) +
    ggplot2::guides(color=guide_legend(reverse=TRUE)) +
    ggplot2::labs(
        title="Disproportionate Rise of Villains in the 1990s",
        alt="Disproportionate Rise of Villains in the 1990s",
        subtitle="First appearances between 1939 and 2013, Marvel Universe characters = 16,376",
        x=NULL,
        y=NULL,
        color=NULL,
        caption="Source: FiveThirtyEight"
    ) +
    theme_michaelmallari_r()

multivariate_line_chart_time_align_count_r
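Before anything is drawn, geom_line(stat="count") tallies characters per year within each alignment. The standard-library Python sketch below roughly mirrors that tally. It uses a handful of (year, alignment) pairs taken from the head() and tail() output above, with None marking a missing year; ggplot2 typically drops such rows with a warning, and the sketch skips them.

```python
from collections import Counter

# (first_appearance_year, align) pairs from head()/tail() of marvel_clean_r;
# None marks a missing first-appearance year
records = [
    (1962, "Good"), (1941, "Good"), (1974, "Neutral"), (1963, "Good"),
    (1950, "Good"), (1961, "Good"), (1961, "Good"), (1962, "Good"),
    (None, "Neutral"), (None, "Bad"),
]


def count_by_year_and_align(pairs):
    """Count characters per (year, alignment), skipping rows with no year."""
    return Counter((year, align) for year, align in pairs if year is not None)


counts = count_by_year_and_align(records)
for (year, align), n in sorted(counts.items()):
    print(year, align, n)
```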

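marvel_mean_appearances_align_first_year_r <- marvel_clean_r %>%
    dplyr::filter(!is.na(first_appearance_year)) %>%  # a year is needed for the x-axis
    group_by(align, first_appearance_year) %>%
    summarize(
        appearances_mean=mean(appearances, na.rm=TRUE),  # skip characters with missing appearance counts
        .groups="drop"
    )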

marvel_mean_appearances_align_first_year_r$appearances_mean[is.na(marvel_mean_appearances_align_first_year_r$appearances_mean)] <- 0

multivariate_streamgraph_time_appearances_align_r <- ggplot2::ggplot(
    data=marvel_mean_appearances_align_first_year_r,
    aes(x=first_appearance_year, y=appearances_mean, fill=align)) +
        ggstream::geom_stream() +
        ggstream::geom_stream_label(aes(label=align)) +
        ggplot2::scale_y_continuous(expand=c(0, 0), position="right") +  # Scale
        ggplot2::scale_fill_manual(values=c("Good"=palette_michaelmallari_r[19], "Neutral"=palette_michaelmallari_r[20], "Unknown"=palette_michaelmallari_r[21], "Bad"=palette_michaelmallari_r[2])) +
        ggplot2::guides(fill=guide_legend(reverse=TRUE)) +
        ggplot2::labs(
            title="Getting Mileage From the OGs",
            alt="Getting Mileage From the OGs",
            subtitle="Average appearances based on alignment, Marvel Universe characters = 16,376",
            x="Year of First Appearance",
            y=NULL,
            fill=NULL,
            caption="Source: FiveThirtyEight"
        ) +
        theme_michaelmallari_r() +
        ggplot2::theme(legend.position="none")

multivariate_streamgraph_time_appearances_align_r
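The aggregation that feeds the streamgraph is sensitive to missing values. In R, mean() returns NA for any group containing a missing appearances value unless na.rm=TRUE is set, and the NA-to-0 step then zeroes that whole group. The standard-library Python sketch below computes the same grouped mean but skips missing values, falling back to 0 only for groups with no observed values. The counts for Spider-Man, Hulk, Benjamin Grimm, and Reed Richards come from the output above; the None rows are hypothetical and only illustrate the missing-value handling.

```python
from collections import defaultdict
from statistics import mean

# (align, first_appearance_year, appearances); None marks a missing count
rows = [
    ("Good", 1962, 4043),     # Spider-Man (Peter Parker)
    ("Good", 1962, 2017),     # Hulk (Robert Bruce Banner)
    ("Good", 1962, None),     # hypothetical character with no count
    ("Good", 1961, 2255),     # Benjamin Grimm (Earth-616)
    ("Good", 1961, 2072),     # Reed Richards (Earth-616)
    ("Neutral", 2013, None),  # hypothetical group with no observed counts
]


def mean_appearances(records):
    """Mean appearances per (align, year), skipping missing values;
    groups with nothing observed get 0, as in the streamgraph prep."""
    groups = defaultdict(list)
    for align, year, appearances in records:
        groups[(align, year)].append(appearances)
    result = {}
    for key, values in groups.items():
        observed = [v for v in values if v is not None]
        result[key] = mean(observed) if observed else 0
    return result


result = mean_appearances(rows)
for (align, year), value in sorted(result.items()):
    print(align, year, value)
```

Without skipping, the single missing count in the hypothetical 1962 row would turn the Good/1962 mean into a missing value, and the replacement step would then report 0 instead of the average of the two observed counts.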

