PokeAPI Data Analysis

This project is meant to serve as a requests and pandas warmup for similar projects under the HackForLA repository. This notebook showcases what we can do with the Pokemon REST API! In this project, we go over:

- Accessing data via an API GET request
- Transforming the output data from JSON into a pandas DataFrame
- Querying and aggregating the data with pandas to answer some sample questions

We are going to be using the pokemon endpoint for the PokeAPI found here. Documentation on this and the other endpoints is found here.

JSON -> DataFrame Setup

# starter code for the REST API request
# source: https://github.com/hackforla/data-science/wiki/Beginner-Project-(Pokemon)
import requests
import json
import pandas as pd

# list endpoint: names and detail URLs for the first 151 Pokemon
url = "https://pokeapi.co/api/v2/pokemon?limit=151&offset=0"
res = requests.get(url)
data = res.json()

# follow each detail URL and store the full JSON, keyed by Pokemon name
output = {}
for pokemon in data["results"]:
    res = requests.get(pokemon["url"])
    output[pokemon["name"]] = res.json()
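
The cell above sends 152 GET requests with no timeout and no HTTP status check. Here is a minimal sketch of the same two-step fetch with both checks added. It assumes an injected `get_json` callable design. `fetch_pokemon` and `requests_get_json` are hypothetical helper names, not part of the starter code or of requests:

```python
from typing import Any, Callable, Dict

JSON = Dict[str, Any]


def fetch_pokemon(list_url: str, get_json: Callable[[str], JSON]) -> Dict[str, JSON]:
    """Fetch the list endpoint, then each Pokemon's detail JSON, keyed by name."""
    listing = get_json(list_url)
    output: Dict[str, JSON] = {}
    for entry in listing["results"]:
        # each entry in "results" carries a "name" and a detail "url"
        output[entry["name"]] = get_json(entry["url"])
    return output


def requests_get_json(url: str, timeout: float = 10.0) -> JSON:
    """One possible get_json: plain requests with a timeout and a status check."""
    import requests  # imported here so fetch_pokemon can be exercised without requests installed

    res = requests.get(url, timeout=timeout)
    res.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return res.json()


# Usage (performs 152 HTTP requests):
# output = fetch_pokemon("https://pokeapi.co/api/v2/pokemon?limit=151&offset=0", requests_get_json)
```

Passing the fetcher in as a parameter has two benefits. The loop can be tested with a plain dict standing in for the API, and a requests.Session can be swapped in later.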

What does the JSON actually look like?

The JSON responses are too large to print usefully in a notebook. You can open the links below in a browser to inspect their nested structure (tick 'Pretty-print' at the top left for a more organized view).

Outer JSON (the list endpoint; each entry in 'results' has a 'name' and a 'url'): https://pokeapi.co/api/v2/pokemon?limit=151&offset=0

Example of an inner JSON (one Pokemon's own data, fetched from its 'url'): https://pokeapi.co/api/v2/pokemon/1/

# check types and take a look at the structure of "output"
print(f"Types:\ntype(res): {type(res)}")
print(f"type(data): {type(data)}")
print(f"type(output): {type(output)}\n")
# first_item = list(output.items())[0]
# print(first_item)

Normalize JSON into a pandas df

pd.json_normalize(data=output)

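This fully normalized df isn't particularly useful. Because output is a single dict, json_normalize treats it as one record. The result is a single row with a column for every field of every Pokemon (e.g. bulbasaur.abilities), most of which we don't need yet. Therefore, we'll control how we normalize the JSON based on the attributes needed to answer each specific question.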

Source: https://www.geeksforgeeks.org/converting-nested-json-structures-to-pandas-dataframes/
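
The toy sketch below shows why. It uses a hand-trimmed stand-in for output: two Pokemon, each with two real top-level fields. A dict passed to json_normalize is treated as a single record, while a list of dicts gives one row per dict:

```python
import pandas as pd

# Hand-trimmed stand-in for `output` (real field names, everything else dropped)
toy_output = {
    "bulbasaur": {"id": 1, "height": 7},
    "ivysaur": {"id": 2, "height": 10},
}

# A dict is wrapped as ONE record: every Pokemon's fields become columns of a single row
wide = pd.json_normalize(toy_output)
print(wide.shape)  # (1, 4)
print(list(wide.columns))

# A list of dicts: one row per Pokemon, one column per field
long = pd.json_normalize(list(toy_output.values()))
print(long.shape)  # (2, 2)
```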

1. Which grass-type pokemon has the largest hp stat?

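Here, the relevant (sub)fields are 'name' (the Pokemon), 'types.type.name' (its type), and 'stats.stat.name' plus 'stats.base_stat'. The last two identify which stat a row describes and give its value, so we can pick out hp.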

First, output needs to be converted to a list of dictionaries so that each Pokemon becomes one record when we normalize.

# Convert the nested dictionary into a list of dictionaries
nestedList = [{'pokemon': key, **value} for key, value in output.items()]
# nestedList[0]  # check structure of bulbasaur's entry using first item

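Since the JSON is nested, we'll use json_normalize's record_path and meta parameters to control the flattening. record_path names the list to expand into one row per element. meta names the parent fields (here id and name) to repeat on each of those rows.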
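
Here is a toy sketch of what record_path and meta do. It assumes a hand-trimmed version of two nestedList entries: real field names and base stats, but only two stats each, with every other field dropped:

```python
import pandas as pd

# Hand-trimmed stand-in for two nestedList entries
toy_nested = [
    {"id": 1, "name": "bulbasaur",
     "stats": [{"base_stat": 45, "effort": 0, "stat": {"name": "hp"}},
               {"base_stat": 49, "effort": 0, "stat": {"name": "attack"}}]},
    {"id": 103, "name": "exeggutor",
     "stats": [{"base_stat": 95, "effort": 2, "stat": {"name": "hp"}},
               {"base_stat": 95, "effort": 0, "stat": {"name": "attack"}}]},
]

df = pd.json_normalize(
    toy_nested,
    record_path=["stats"],   # one output row per element of each "stats" list
    meta=["id", "name"],     # copy these parent fields onto every row
    meta_prefix="pokemon_",  # ...as pokemon_id and pokemon_name
)
print(df)  # columns: base_stat, effort, stat.name, pokemon_id, pokemon_name
```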

df1 = pd.json_normalize(nestedList, record_path=['stats'], meta=['id', 'name'], meta_prefix='pokemon_')
# filter rows where stat.name == 'hp'
df1 = df1[df1['stat.name'] == 'hp']
# sort by hp value, highest first
df1 = df1.sort_values(by='base_stat', ascending=False)
df1

df2 = pd.json_normalize(nestedList, record_path=['types'], meta=['id', 'name'], meta_prefix='pokemon_')
# filter rows where type.name == 'grass'
df2 = df2[df2['type.name'] == 'grass']
df2

We now have two DataFrames, normalized and filtered to the desired attributes (type and hp). Merging them on pokemon_id produces a table that answers the question. Since both inputs also have a pokemon_name column, the merge keeps both copies as pokemon_name_x and pokemon_name_y.

df3 = pd.merge(df1, df2, on='pokemon_id')
df3

The answer to "which Grass type Pokemon has the highest hp?" is given by the first record. An inner merge keeps the row order of the left table, and df1 is already sorted by hp:

df3['pokemon_name_x'].head(1)

Answer: Exeggutor
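
Here is one possible variant of the merge above, sketched on toy stand-ins for df1 and df2. The hp values mirror real Gen 1 entries; everything else is trimmed. Merging on both pokemon_id and pokemon_name avoids the duplicated pokemon_name_x / pokemon_name_y columns. idxmax then picks the highest-hp row without depending on the earlier sort:

```python
import pandas as pd

# Toy stand-ins for df1 (hp rows) and df2 (grass rows)
hp = pd.DataFrame({
    "pokemon_id": [103, 1, 6],
    "pokemon_name": ["exeggutor", "bulbasaur", "charizard"],
    "base_stat": [95, 45, 78],
})
grass = pd.DataFrame({
    "pokemon_id": [1, 103],
    "pokemon_name": ["bulbasaur", "exeggutor"],
    "type.name": ["grass", "grass"],
})

# Joining on both shared key columns keeps a single pokemon_name column
grass_hp = pd.merge(hp, grass, on=["pokemon_id", "pokemon_name"])

# idxmax returns the index label of the largest hp, independent of row order
best = grass_hp.loc[grass_hp["base_stat"].idxmax(), "pokemon_name"]
print(best)  # exeggutor
```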

2. How many pokemon have poison as one of their types?

Since we already found a way to normalize a df of all Pokemon types, we can simply filter to only those with type 'poison'.

df2 = pd.json_normalize(nestedList, record_path=['types'], meta=['id', 'name'], meta_prefix='pokemon_')
# filter to rows where type.name == 'poison'
df2 = df2[df2['type.name'] == 'poison']
df2

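# number of rows = number of Pokemon with a poison type (one row per Pokemon-type pair)
# (df2.count() would instead return a separate non-null count for every column)
len(df2)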

Answer: 33
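
df2.count() returns one non-null count per column rather than a single number. Here is a quick toy comparison of the alternatives; the three rows are illustrative, not the full result:

```python
import pandas as pd

# Toy stand-in for the filtered poison rows (one row per Pokemon-type pair)
poison = pd.DataFrame({
    "pokemon_id": [1, 2, 3],
    "pokemon_name": ["bulbasaur", "ivysaur", "venusaur"],
    "type.name": ["poison"] * 3,
})

print(poison.count())                  # a Series: one count per column
print(len(poison))                     # number of rows: 3
print(poison["pokemon_id"].nunique())  # distinct Pokemon, robust to duplicate rows: 3
```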

3. Which pokemon has the fewest available moves?

Let's create another df by normalizing the 'moves' list, which gives one row per Pokemon-move pair.

df3 = pd.json_normalize(nestedList, record_path=['moves'], meta=['id', 'name'], meta_prefix='pokemon_')
df3

# count how many times each pokemon_name appears (one row per move), sorted from fewest to most
df3['pokemon_name'].value_counts(ascending=True)

Answer: Ditto has the fewest moves (1).
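
value_counts(ascending=True) puts the minimum first but does not flag ties at the bottom. Here is a small sketch with made-up Pokemon names and counts (not real move data). It uses Series.nsmallest with keep='all' to return every Pokemon tied for the fewest moves:

```python
import pandas as pd

# Toy stand-in for df3 (one row per Pokemon-move pair); names and counts are made up
moves = pd.DataFrame({
    "pokemon_name": ["ditto", "mon_a", "mon_a", "mon_b", "mon_c", "mon_c", "mon_c"],
})

counts = moves["pokemon_name"].value_counts()
# keep="all" returns every entry tied with the smallest count, not just one of them
fewest = counts.nsmallest(1, keep="all")
print(fewest)
```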

4. Which pokemon type has the fewest members?

Again, we can reuse the normalized type df.

df2 = pd.json_normalize(nestedList, record_path=['types'], meta=['id', 'name'], meta_prefix='pokemon_')
# count occurrences of each type.name value, fewest first
df4 = df2['type.name'].value_counts(ascending=True)
df4

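Answer: Steel type. Note that value_counts only lists types that occur at least once. A type with no Gen 1 members would not appear in this output at all.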
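
To make empty types visible, one option is to reindex the counts against a full list of type names. This sketch assumes the 18 standard type names hard-coded below and uses toy type rows. In practice the list could instead be pulled from the API's type endpoint (https://pokeapi.co/api/v2/type), which may also include special entries that would need filtering:

```python
import pandas as pd

# Assumption: the 18 standard type names, hard-coded for illustration
ALL_TYPES = [
    "normal", "fighting", "flying", "poison", "ground", "rock", "bug", "ghost", "steel",
    "fire", "water", "grass", "electric", "psychic", "ice", "dragon", "dark", "fairy",
]

# Toy stand-in for df2['type.name']
types = pd.Series(["grass", "poison", "fire", "steel", "electric", "steel", "electric"],
                  name="type.name")

# Types that never occur get an explicit 0 instead of silently disappearing
counts = types.value_counts().reindex(ALL_TYPES, fill_value=0).sort_values()
print(counts.head())
```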

5. How many pokemon are in all 8 generations (there are 9 generations, but only 8 appear in this API's sprite data)?

We can use the sprites.versions sub-dictionary to check whether a given Pokemon has entries for generation-i through generation-viii. However, unlike the previous questions, json_normalize's record_path doesn't help here. 'versions' holds nested dicts rather than a list of records, so instead we loop over the dicts directly and count the generation keys.

from collections import Counter

generation_counter = Counter()
# Iterate through each Pokémon's data
for pokemon_data in output.values():
    # Navigate to the 'versions' dictionary within 'sprites'
    versions = pokemon_data.get('sprites', {}).get('versions', {})
    # Update the counter with the generation keys
    generation_counter.update(versions.keys())

# Convert the Counter to a DataFrame for better readability
generation_df = pd.DataFrame.from_dict(generation_counter, orient='index', columns=['count']).reset_index()
generation_df = generation_df.rename(columns={'index': 'generation'})
print(generation_df)

Answer: Each Pokemon in the Gen 1 dataset has an entry for every generation key, which suggests that all 151 are in all 8 generations. However, this check only looks at which keys exist, not whether the sprite URLs under them are filled in, so some generation entries may be empty. It's also possible that the question was meant differently.
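
A stricter check could require at least one non-empty sprite URL under each generation, rather than just the presence of the key. This sketch assumes a missing sprite shows up as null (None after .json()) somewhere under sprites.versions. The helper names are hypothetical and the toy input is hand-made:

```python
from typing import Any, Dict, Set


def has_sprite(node: Any) -> bool:
    """True if node, or anything nested inside it, is a non-empty string (a sprite URL)."""
    if isinstance(node, str):
        return node != ""
    if isinstance(node, dict):
        return any(has_sprite(value) for value in node.values())
    if isinstance(node, list):
        return any(has_sprite(item) for item in node)
    return False  # None and other non-string leaves count as "no sprite"


def generations_with_sprites(pokemon_data: Dict[str, Any]) -> Set[str]:
    """Generation keys under sprites.versions that hold at least one sprite URL."""
    versions = pokemon_data.get("sprites", {}).get("versions", {})
    return {gen for gen, games in versions.items() if has_sprite(games)}


def count_in_all_generations(output: Dict[str, Dict[str, Any]]) -> int:
    """Number of Pokemon with a sprite in every generation key seen anywhere in the data."""
    all_gens: Set[str] = set()
    for data in output.values():
        all_gens.update(data.get("sprites", {}).get("versions", {}).keys())
    return sum(1 for data in output.values() if generations_with_sprites(data) == all_gens)
```

With the real data this would be called as count_in_all_generations(output).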

Bonus Question: What's the distribution of types across pokemon with the 50 highest HPs?

To answer this, we can reuse the hp-sorted df1 and normalize the types list again.

# 50 highest-HP pokemon (df1 is already sorted by hp, descending)
df1.head(50)
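
Taking the top N with head(50) cuts arbitrarily among Pokemon tied at the cutoff hp. Here is a toy sketch (made-up names and values) of DataFrame.nlargest with keep='all', which keeps every tied row instead:

```python
import pandas as pd

# Toy hp table; names and values are made up
hp = pd.DataFrame({
    "pokemon_name": ["mon_a", "mon_b", "mon_c", "mon_d"],
    "base_stat": [250, 160, 130, 130],
})

print(hp.head(3))                              # drops mon_d even though it ties mon_c
top = hp.nlargest(3, "base_stat", keep="all")  # keeps every row tied at the cutoff
print(len(top))  # 4
```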

# normalized pokemon types
df2 = pd.json_normalize(nestedList, record_path=['types'], meta=['id', 'name'], meta_prefix='pokemon_')
df2.head()

df6 = pd.merge(df1.head(50), df2, on='pokemon_id', how='left')
df6

Even with a left join, df6 has more than 50 rows, because each dual-type Pokemon matches two rows in df2. Still, we can check the value counts to see which type has the most members in the top-50 HP club:
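
Because dual-type Pokemon contribute two rows, the value counts count type slots rather than shares of Pokemon. One way to express the distribution per Pokemon is sketched below on a toy stand-in for df6. It uses real ids and types for Chansey, Snorlax and Exeggutor; whether they belong to the real top 50 is not checked here:

```python
import pandas as pd

# Toy stand-in for df6: Chansey and Snorlax (normal) plus Exeggutor (grass/psychic)
merged = pd.DataFrame({
    "pokemon_id": [113, 143, 103, 103],
    "type.name": ["normal", "normal", "grass", "psychic"],
})

n_pokemon = merged["pokemon_id"].nunique()  # 3 Pokemon spread over 4 rows
share = (
    merged.groupby("type.name")["pokemon_id"].nunique()  # distinct Pokemon per type
    .div(n_pokemon)                                      # fraction of the selected Pokemon
    .sort_values(ascending=False)
)
print(share)
```

The shares can sum to more than 1, since a dual-type Pokemon counts toward both of its types.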

# distribution of type.name values across df6
# turn the Series into a df to be able to visualize it
df7 = df6['type.name'].value_counts().to_frame()
df7['type'] = df7.index
df7

Answer:
