Pokemon Prediction

December 25, 2020 - 4 minutes
R Classification Tidymodels Tidyverse

Overview

I grew up playing the original version of Pokemon on my gray brick of a Game Boy, so what better way to relive my childhood than digging through the data? Let’s try predicting whether or not a Pokemon is legendary based on its attributes. Historically, I have used the caret package as my modeling framework, but here we will use the more modern Tidymodels suite of packages, made by Max Kuhn and the RStudio team.

What we are going to cover:

  • Set Up
  • Data Wrangling
  • Exploratory Data Analysis (EDA)
  • Modeling
  • Conclusion



Set Up

We are going to load two meta-packages: Tidyverse for data manipulation, cleaning, and visualization, and Tidymodels for model training and evaluation.

# If not installed, uncomment the lines below and run them
# install.packages("tidyverse", dependencies = TRUE)
# install.packages("tidymodels", dependencies = TRUE)

# if installed load
library(tidyverse)  # plotting and data manipulation
library(tidymodels) # tidy model framework



Data Wrangling

First we need to load the data. I have stored a CSV covering seven generations of Pokemon on GitHub. The block below imports the data and prints the first few rows to confirm that it worked.

# import data from github
pokemon_df <- read_csv("https://raw.githubusercontent.com/Jordan-Krogmann/pokemon/master/data/pokemon.csv")

# check top rows
head(pokemon_df, 3)
## # A tibble: 3 x 41
##   abilities against_bug against_dark against_dragon against_electric
##   <chr>           <dbl>        <dbl>          <dbl>            <dbl>
## 1 ['Overgr~           1            1              1              0.5
## 2 ['Overgr~           1            1              1              0.5
## 3 ['Overgr~           1            1              1              0.5
## # ... with 36 more variables: against_fairy <dbl>, against_fight <dbl>,
## #   against_fire <dbl>, against_flying <dbl>, against_ghost <dbl>,
## #   against_grass <dbl>, against_ground <dbl>, against_ice <dbl>,
## #   against_normal <dbl>, against_poison <dbl>, against_psychic <dbl>,
## #   against_rock <dbl>, against_steel <dbl>, against_water <dbl>, attack <dbl>,
## #   base_egg_steps <dbl>, base_happiness <dbl>, base_total <dbl>,
## #   capture_rate <chr>, classfication <chr>, defense <dbl>,
## #   experience_growth <dbl>, height_m <dbl>, hp <dbl>, japanese_name <chr>,
## #   name <chr>, percentage_male <dbl>, pokedex_number <dbl>, sp_attack <dbl>,
## #   sp_defense <dbl>, speed <dbl>, type1 <chr>, type2 <chr>, weight_kg <dbl>,
## #   generation <dbl>, is_legendary <dbl>


Next I am going to add and clean a few columns that I will need later, so don’t focus on the why for now. There is probably a smarter way to construct the two_types_flag, but I got lazy (I will clean that up later).

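pokemon_df <- pokemon_df %>%
  # label single-type Pokemon explicitly instead of leaving type2 as NA
  mutate(type2 = case_when(is.na(type2) ~ "none", TRUE ~ type2)) %>%
  # 1 if the Pokemon has a second type, 0 otherwise
  mutate(two_types_flag = case_when(type2 == "none" ~ 0, TRUE ~ 1)) %>%
  # one 0/1 indicator per type, set when either type slot matches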
  mutate(bug_type = case_when(type1  == "bug" | type2 == "bug" ~ 1, TRUE ~ 0),
         dark_type = case_when(type1  == "dark" | type2 == "dark" ~ 1, TRUE ~ 0),
         dragon_type = case_when(type1  == "dragon" | type2 == "dragon" ~ 1, TRUE ~ 0),
         electric_type = case_when(type1  == "electric" | type2 == "electric" ~ 1, TRUE ~ 0),
         fairy_type = case_when(type1  == "fairy" | type2 == "fairy" ~ 1, TRUE ~ 0),
         fighting_type = case_when(type1  == "fighting" | type2 == "fighting" ~ 1, TRUE ~ 0),
         fire_type = case_when(type1  == "fire" | type2 == "fire" ~ 1, TRUE ~ 0),
         flying_type = case_when(type1  == "flying" | type2 == "flying" ~ 1, TRUE ~ 0),
         ghost_type = case_when(type1  == "ghost" | type2 == "ghost" ~ 1, TRUE ~ 0),
         grass_type = case_when(type1  == "grass" | type2 == "grass" ~ 1, TRUE ~ 0),
         ground_type = case_when(type1  == "ground" | type2 == "ground" ~ 1, TRUE ~ 0),
         ice_type = case_when(type1  == "ice" | type2 == "ice" ~ 1, TRUE ~ 0),
         normal_type = case_when(type1  == "normal" | type2 == "normal" ~ 1, TRUE ~ 0),
         poison_type = case_when(type1  == "poison" | type2 == "poison" ~ 1, TRUE ~ 0),
         psychic_type = case_when(type1  == "psychic" | type2 == "psychic" ~ 1, TRUE ~ 0),
         rock_type = case_when(type1  == "rock" | type2 == "rock" ~ 1, TRUE ~ 0),
         steel_type = case_when(type1  == "steel" | type2 == "steel" ~ 1, TRUE ~ 0),
         water_type = case_when(type1  == "water" | type2 == "water" ~ 1, TRUE ~ 0))
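
For anyone who wants a shorter version of the flag construction above, here is a minimal sketch of one possible cleanup. It is not the code used in this post: toy_df is a hypothetical three-row stand-in for pokemon_df, and type_names is an assumed vector that would need to list all eighteen types for the real data. The idea is to loop over the type names and build each indicator column from the name, rather than writing one case_when() per type.

```r
library(dplyr)

# Toy stand-in for pokemon_df (hypothetical subset, for illustration only)
toy_df <- tibble(
  name  = c("Bulbasaur", "Charmander", "Charizard"),
  type1 = c("grass", "fire", "fire"),
  type2 = c("poison", NA, "flying")
)

# Assumed list of types; the full data would need all eighteen
type_names <- c("grass", "fire", "poison", "flying")

# Replace missing second types and flag dual-type Pokemon
toy_flags <- toy_df %>%
  mutate(
    type2 = coalesce(type2, "none"),
    two_types_flag = as.numeric(type2 != "none")
  )

# Add one 0/1 column per type, named "<type>_type"
for (t in type_names) {
  toy_flags[[paste0(t, "_type")]] <-
    as.numeric(toy_flags$type1 == t | toy_flags$type2 == t)
}

print(toy_flags)
```

The loop trades a little tidyverse style for brevity; adding or removing a type only means editing type_names.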


We will need training and testing sets for modeling. I am going to use the first six generations of Pokemon to train our models and hold out the seventh generation to test them.

train_df <- pokemon_df %>% filter(generation != 7)
test_df <- pokemon_df %>% filter(generation == 7)
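
Because the split is by generation rather than random, it can be worth checking that no rows were lost and how the share of legendary Pokemon compares between the two sets. The sketch below shows one way to do that on a hypothetical six-row toy_df; the values are made up for illustration, and the same summary could be run on train_df and test_df.

```r
library(dplyr)

# Toy stand-in for pokemon_df (made-up values, for illustration only)
toy_df <- tibble(
  name         = c("A", "B", "C", "D", "E", "F"),
  generation   = c(1, 3, 6, 6, 7, 7),
  is_legendary = c(0, 0, 1, 0, 1, 0)
)

# Same split rule as above: generation 7 is held out
toy_train <- toy_df %>% filter(generation != 7)
toy_test  <- toy_df %>% filter(generation == 7)

# Every row should land in exactly one of the two sets
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy_df))

# Row count and share of legendary Pokemon in each set
split_balance <- bind_rows(train = toy_train, test = toy_test, .id = "split") %>%
  group_by(split) %>%
  summarise(n = n(), legendary_share = mean(is_legendary))

print(split_balance)
```

A large gap in legendary_share between the two sets would be something to keep in mind when reading the test metrics later.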



EDA

Plot 1

Plot 2

Modeling

Conclusion