# PoseNet Tutorial

The TensorFlow.js model `PoseNet` lets you detect 2-dimensional human poses in real time, right in the browser. Check out its GitHub repo for details.

If you use it in your project and "record" a session of the detected keypoints, you will end up with an array of "frames" where each frame is a pose. This is what we will analyze in this tutorial. To make this self-contained, we are going to use a public dataset as described below.

data_url = (
"https://raw.githubusercontent.com/maddyonline/posenet-frames/main/frames.json"
)

import requests
import json
data = json.loads(requests.get(data_url).text)

# data is essentially of the following shape
# {"frames": [{},{}, ..., {}]}
data.keys(), len(data['frames'])

# Each array element represents a frame and contains the keypoints of the detected pose,
# together with a score that represents the confidence of the prediction
data['frames'][10]['score'], data['frames'][10]['keypoints'][:5]
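
Based on the fields accessed later in this tutorial, each frame appears to have roughly the following shape (a schematic with made-up numbers, not actual data from the file):

```python
frame = {
    "score": 0.92,  # overall confidence of the detected pose
    "keypoints": [
        {
            "part": "nose",                        # one of 17 body parts
            "score": 0.99,                         # per-keypoint confidence
            "position": {"x": 301.4, "y": 142.7},  # pixel coordinates
        },
        # ... 16 more keypoints ...
    ],
}
```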

# there are a total of 17 keypoints
len(data['frames'][10]['keypoints'])

[kp['part'] for kp in data['frames'][10]['keypoints']]

# let us focus on a specific frame, namely frame 10
# let us assemble the (confidence) score for each of the 17 detected parts
# We use the variable 'kps' to refer to keypoint scores
kps = [kp['score'] for kp in data['frames'][10]['keypoints']]
len(kps), kps

import numpy as np
# Let us create a list of these keypoints and which part connects to which
# Taken from github.com//...
PART_NAMES = [
"nose", "leftEye", "rightEye", "leftEar", "rightEar", "leftShoulder",
"rightShoulder", "leftElbow", "rightElbow", "leftWrist", "rightWrist",
"leftHip", "rightHip", "leftKnee", "rightKnee", "leftAnkle", "rightAnkle"
]
NUM_KEYPOINTS = len(PART_NAMES)
PART_IDS = {pn: pid for pid, pn in enumerate(PART_NAMES)}
CONNECTED_PART_NAMES = [
("leftHip", "leftShoulder"), ("leftElbow", "leftShoulder"),
("leftElbow", "leftWrist"), ("leftHip", "leftKnee"),
("leftKnee", "leftAnkle"), ("rightHip", "rightShoulder"),
("rightElbow", "rightShoulder"), ("rightElbow", "rightWrist"),
("rightHip", "rightKnee"), ("rightKnee", "rightAnkle"),
("leftShoulder", "rightShoulder"), ("leftHip", "rightHip")
]
CONNECTED_PART_INDICES = np.array([(PART_IDS[a], PART_IDS[b]) for a, b in CONNECTED_PART_NAMES])
# CONNECTED_PART_INDICES gives the indices of which keypoint connects to which keypoint
# So, for example, (11, 5) means that keypoint 11 is adjacent to keypoint 5.
# We see that there are 12 adjacent keypoint pairs, or "edges", in our representation
CONNECTED_PART_INDICES, CONNECTED_PART_INDICES.shape
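
With the part names and their scores in hand, it is also easy to spot weak detections by pairing each score with its part name. A small sketch (the function name and the three-part example are my own; in the notebook you would pass `kps` and `PART_NAMES`):

```python
def low_confidence_parts(scores, part_names, threshold=0.5):
    """Return the names of parts whose detection score falls below threshold."""
    return [name for name, s in zip(part_names, scores) if s < threshold]

# Illustrative scores for three parts (made-up values):
names = ["nose", "leftEye", "leftAnkle"]
scores = [0.98, 0.95, 0.12]
low_confidence_parts(scores, names)  # ['leftAnkle']
```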

# Out of these 12, we only keep those edges for which we have high confidence.
# For every edge such as ("leftHip", "leftShoulder") we want to connect them
# if and only if both score("leftHip") and score("leftShoulder") are high.

scores_on_adjacent_points = np.vstack([
    np.take(kps, CONNECTED_PART_INDICES[:, 0]),
    np.take(kps, CONNECTED_PART_INDICES[:, 1])
]).T

# this variable replaces CONNECTED_PART_INDICES pairs with the scores
# So for example, (11, 5) => [score(11-th keypoint), score(5-th keypoint)]
scores_on_adjacent_points, scores_on_adjacent_points.shape
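
As an aside, this `vstack`/`take` construction is equivalent to NumPy fancy indexing: indexing a 1-D score array with a 2-D index array maps every index to its score in one step. A tiny sketch with made-up numbers:

```python
import numpy as np

kps_demo = np.array([0.9, 0.8, 0.7, 0.95])  # illustrative per-keypoint scores
edges_demo = np.array([[0, 1], [2, 3]])     # illustrative index pairs
# Each (i, j) pair becomes (score_i, score_j):
scores_demo = kps_demo[edges_demo]
scores_demo  # array([[0.9 , 0.8 ], [0.7 , 0.95]])
```

In the notebook, `np.asarray(kps)[CONNECTED_PART_INDICES]` yields the same (12, 2) array as `scores_on_adjacent_points`.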

# Recall that we only want to keep rows in `scores_on_adjacent_points` where both entries are high
# That is, the minimum of the two entries is above a threshold (say 0.5)
np.min(scores_on_adjacent_points, axis=1)

np.min(scores_on_adjacent_points, axis=1) > 0.5

# selecting relevant indices (we see only 10 out of 12 rows remain)
relevant = np.min(scores_on_adjacent_points, axis=1) > 0.5
high_confidence_visible_adjacent_parts = CONNECTED_PART_INDICES[relevant]
high_confidence_visible_adjacent_parts, high_confidence_visible_adjacent_parts.shape

# Now our goal is to replace each of these keypoint indices with their actual x, y coordinates
# That is, we want to replace [11, 5] with [[x1, y1], [x2, y2]]
# where (x1, y1) are the coordinates for joint 11 and (x2, y2) are the coordinates for joint 5
# Let us call [x1, y1] "start" and [x2, y2] "end", representing the start and end of a line segment

# These are the joint/keypoint indices for "start" points
high_confidence_visible_adjacent_parts[:, 0]

# These are the joint/keypoint indices for "end" points
high_confidence_visible_adjacent_parts[:, 1]

# But first, let us gather the keypoint coordinates of all frames into a single array
def get_landmark_coordinates(keypoints):
    return np.array([[landmark['position']['x'], landmark['position']['y']] for landmark in keypoints])

frames_arr = np.array([get_landmark_coordinates(frame['keypoints']) for frame in data['frames']])

# this represents the x-y coordinates of the 17 keypoints for all 357 frames
frames_arr.shape
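
With all frames stacked into one (num_frames, 17, 2) array, per-keypoint trajectories are just slices along the first axis. A minimal sketch on a made-up 3-frame, 2-keypoint array (in the notebook, `frames_arr[:, 0, :]` would give the nose trajectory, since the nose is keypoint 0):

```python
import numpy as np

# Stand-in for frames_arr: 3 frames, 2 keypoints, (x, y) each (made-up values)
frames_demo = np.array([
    [[100., 50.], [120., 80.]],
    [[102., 51.], [121., 82.]],
    [[104., 52.], [122., 84.]],
])
track = frames_demo[:, 0, :]   # keypoint 0 across all frames, shape (3, 2)
step = np.diff(track, axis=0)  # frame-to-frame displacement, shape (2, 2)
```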

# Coming back to the start/end arrays described above
start = np.take(frames_arr[10], high_confidence_visible_adjacent_parts[:, 0], axis=0)
start, start.shape

end = np.take(frames_arr[10], high_confidence_visible_adjacent_parts[:, 1], axis=0)
end, end.shape

# Line segments that we need to join (reshape with -1 so this works for any number of kept edges)
segments = np.hstack([start, end]).reshape((-1, 2, 2))
segments, segments.shape
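
An equivalent, arguably more direct way to build the (N, 2, 2) segments array is `np.stack` along a new middle axis, which skips the intermediate horizontal stack. A sketch with made-up endpoints:

```python
import numpy as np

start_demo = np.array([[0., 0.], [1., 1.]])  # illustrative start points
end_demo = np.array([[2., 2.], [3., 3.]])    # illustrative end points
segments_demo = np.stack([start_demo, end_demo], axis=1)  # shape (2, 2, 2)
```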

from matplotlib.collections import LineCollection
import matplotlib.pyplot as plt
collection = LineCollection(segments)
fig, ax = plt.subplots()
ax.add_collection(collection)
ax.plot(frames_arr[10][:, 0], frames_arr[10][:, 1], 'ro')
ax.set_xlim(0, 800)
ax.set_ylim(-100, 800)
ax.invert_yaxis()  # image coordinates: y grows downward
plt.show()
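
Everything above was worked out for frame 10, but the same pipeline applies to any frame. A sketch that packages the filtering and segment-building steps into one function (the function name is my own; it assumes the same shapes as `frames_arr[i]`, `kps`, and `CONNECTED_PART_INDICES` above):

```python
import numpy as np

def pose_segments(coords, scores, edges, threshold=0.5):
    """Return an (N, 2, 2) array of line segments for the edges whose
    endpoints both have a confidence score above threshold.

    coords: (17, 2) x-y coordinates for one frame
    scores: (17,) per-keypoint confidence scores
    edges:  (12, 2) pairs of keypoint indices to connect
    """
    kept = edges[np.min(np.asarray(scores)[edges], axis=1) > threshold]
    return np.stack([coords[kept[:, 0]], coords[kept[:, 1]]], axis=1)

# Tiny illustrative example: 3 keypoints, 2 candidate edges, one weak point
coords = np.array([[0., 0.], [1., 0.], [1., 1.]])
scores = np.array([0.9, 0.8, 0.2])
edges = np.array([[0, 1], [1, 2]])
pose_segments(coords, scores, edges)  # only the 0-1 edge survives -> shape (1, 2, 2)
```

Passing the result straight to `LineCollection` reproduces the plot above for any frame index.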