Neurons, Pixels, and Decisions: The Art of Neural Networks

Chapter 1. Neurons 101: From Decisions to Outputs

Imagine you're Mr. Sampy... you've got hundreds of students, everyone turning in assignments at different times and contributing to class in different ways, all while you're trying to run your beloved theatre class. Now, of course, to track each student's progress, you've got your trusty rubric, a system for making decisions objectively. But what if, instead of grading your students manually, you designed a "neural instructor" to help decide who passes and who doesn’t?

The Input

Let's say that to pass English class, we want to analyze three parts of a student's performance:

1. Class Participation: A score that reflects how often a student raises their hand or adds value to class discussions.

2. Assignments: The backbone of the class. Students' ability to demonstrate their understanding through projects and homework.

3. Projects: How students apply their knowledge creatively and practically, showcasing deeper understanding.

ENGAGEMENT: To test how well Han is doing in his class, use the sliders below!

[Slider: Class Participation, 0 to 100]

[Slider: Assignments, 0 to 100]

[Slider: Projects, 0 to 100]

But you know, not all categories are created equal. Assignments weigh the most heavily because they prove hard work and comprehension. Participation matters too, but not as much. And projects, while still important, are more about application and creativity.

So... how do we do this?

Well, out of 10, let's weigh the importance of each category accordingly!

[Slider: Importance of Class Participation, 0 to 10]

[Slider: Importance of Assignments, 0 to 10]

[Slider: Importance of Projects, 0 to 10]

The Bias

But wait... what's a "bias slider"? Well, that kind of depends on how Sampy is feeling today. Is he feeling a bit more lenient? For us students, that's great! Has he been having a bad day? Uh-oh. Being cooked isn't ideal, is it?

Positive Bias: A lenient instructor who gives everyone a little boost.

Negative Bias: A strict instructor who sets higher expectations, making it harder to pass.

Remember, the more positive the bias, the better the curve in the class. The more negative the bias, the worse the curve. I certainly hope there won't be a negative bias, but I guess if a teacher thought all the grades in their class were too high, they could "push them back down," right?
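The weighting and bias described so far can be sketched in a few lines. This is a minimal illustration, not the notebook's own cell: the function name `weighted_score` and Han's scores and importances are made-up values, and the bias is assumed to be added directly in percentage points.

```python
def weighted_score(scores, importances, bias=0.0):
    """Combine scores (0-100) using importances (0-10), then add a bias."""
    total = sum(importances)
    if total == 0:
        raise ValueError("At least one importance must be greater than 0.")
    # Normalize the importances so the weights sum to 1
    weights = [imp / total for imp in importances]
    # Weighted sum of the scores, shifted up or down by the bias
    return sum(w * s for w, s in zip(weights, scores)) + bias


# Hypothetical slider values for Han (illustrative only)
han_scores = [80, 90, 70]   # participation, assignments, projects
importances = [2, 5, 3]     # each out of 10

neutral = weighted_score(han_scores, importances)           # no bias
lenient = weighted_score(han_scores, importances, bias=5)   # Sampy had a great sandwich
strict = weighted_score(han_scores, importances, bias=-5)   # Sampy is having a bad day

print(f"Neutral: {neutral:.1f}, Lenient: {lenient:.1f}, Strict: {strict:.1f}")
```

Notice that the bias moves every student's score by the same amount, no matter how they performed in each category.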

[Slider: How difficult is the grading? (bias, out of 10)]

Play with this yourself. If Sampy just had a great sandwich, there might be a positive bias. Suppose, Han's total score from class participation, turning in assignments, and

The Activation Functions

What are activation functions? In simple terms, they determine how the final weighted sum (percentage score) is turned into an output that we can use to make a decision (pass or fail).

There are many activation functions. We'll look at three:

1. ReLU:

What It Does: Outputs the final score directly if it’s positive, but outputs 0 if the score is negative.

In the Classroom Context: ReLU does not make the pass/fail decision by itself. It only stops a score from dropping below zero:

If the score is above 0%, it stays as is. If the score is below 0% (for example, after a large negative bias), it becomes 0. If the final score is 73%, ReLU outputs 73%. If the final score is 50%, ReLU outputs 50%. If the final score is -5%, ReLU outputs 0%. Whether Han passes still depends on comparing the output to the passing threshold.

Why Use It: It’s simple, and it guarantees that a grade never goes negative.

2. Sigmoid:

What It Does: Maps the final score to a probability between 0 and 1 (or 0% to 100%). The higher the score, the closer it is to 100%.

In the Classroom Context: Sigmoid gives you a confidence level for passing. Instead of a hard pass/fail, it estimates the likelihood:

A final score of 85% might map to 97%, meaning the model is 97% confident that Han passes. A final score of 69% might map to 75%, meaning Han is borderline. These numbers are illustrative: the exact mapping depends on how the score is centered and scaled before the sigmoid is applied.

Why Use It: Sigmoid introduces nuance, useful for decisions requiring probabilities instead of absolutes. It’s like saying, "Han has a high chance of passing but isn’t guaranteed."

3. Linear Activation:

What It Does: Outputs the final score as-is, with no transformation or thresholds.

In the Classroom Context: Linear simply returns Han’s raw percentage (e.g., 73%), leaving it up to the instructor to decide if the score meets the passing threshold.

If Han scores 73%, Linear outputs 73%. If Han scores 50%, Linear outputs 50%.

Why Use It: It’s straightforward and keeps the score in a format you can compare directly against the passing threshold.
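To compare the three activations side by side, here is a small sketch that applies each one to a few sample scores. The passing mark of 65 and the scale of 10 used to center the sigmoid are assumptions made for this illustration only; they are not values from the sliders.

```python
import math


def relu(z):
    """Keep positive scores unchanged; replace negative scores with 0."""
    return max(0.0, z)


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(z):
    """Return the score unchanged."""
    return z


# Assumed values for illustration: center the sigmoid on a passing mark of 65
PASS_MARK = 65
SCALE = 10

for score in [-5, 50, 73]:
    confidence = sigmoid((score - PASS_MARK) / SCALE)
    print(
        f"score={score:>3}  ReLU={relu(score):>5.1f}  "
        f"Sigmoid={confidence:.2f}  Linear={linear(score):>5.1f}"
    )
```

Only the negative score is changed by ReLU, Linear changes nothing, and Sigmoid turns each score into a value between 0 and 1 that grows as the score moves above the assumed passing mark.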

Choosing the Right Activation

Each activation function serves a different purpose, depending on how you want to interpret the scores:

Check out this cell below!

import numpy as np
import matplotlib.pyplot as plt

# Combine inputs, weights, and bias dynamically (scaled properly)
inputs = np.array([input_1, input_2, input_3])  # Inputs from sliders (0-100)
weights = np.array([weight_1 / 10, weight_2 / 10, weight_3 / 10])  # Weights scaled (0-1)

# Normalize weights so they sum to 1
weights_normalized = weights / np.sum(weights)

# Scale bias dynamically (-1 to +1)
bias = bias_slider / 10

# Calculate the weighted sum using normalized weights
weighted_sum = np.dot(inputs, weights_normalized) + bias

# Define activation functions
x = np.linspace(-10, 100, 100)  # Expanded range to reflect percentages
relu = np.maximum(0, x)  # ReLU function
sigmoid = 1 / (1 + np.exp(-x / 20))  # Sigmoid function scaled to match percentage context
linear = x  # Linear function

# Plot activation functions with the weighted sum highlighted
plt.figure(figsize=(8, 6))
plt.plot(x, relu, label="ReLU")
plt.plot(x, sigmoid * 100, label="Sigmoid (as %)", linestyle="--")  # Convert Sigmoid to percentage
plt.plot(x, linear, label="Linear")

# Add the weighted sum as a vertical line
plt.axvline(weighted_sum, color="red", linestyle="--", label=f"Han's Score = {weighted_sum:.2f}")

# Add labels and title
plt.title("Activation Functions Based on Han's Weighted Sum")
plt.xlabel("Input (Weighted Sum)")
plt.ylabel("Output (After Activation)")
plt.legend()
plt.grid(True)
plt.show()


How do you think Sampy will decide if Han passes the class if he gets a different weighted sum (grade) in the class? Play around below!

activation_function

Calculations:

Will Han pass this class?

import numpy as np

# Function for a single neuron
def neuron_output(inputs, weights, bias, activation):
    z = np.dot(weights, inputs) + bias  # Linear transformation
    if activation == "ReLU":
        return max(0, z)  # ReLU
    elif activation == "Sigmoid":
        # Note: this returns a value between 0 and 1, not a percentage
        return 1 / (1 + np.exp(-z))  # Sigmoid
    elif activation == "Linear":
        return z  # Linear activation
    else:
        # Raise instead of returning a string so later numeric formatting cannot fail silently
        raise ValueError(f"Invalid activation: {activation!r}")

# Fetch variables from Deepnote's UI
input1 = input_1
input2 = input_2
input3 = input_3
weight1 = weight_1 / 10
weight2 = weight_2 / 10
weight3 = weight_3 / 10
bias = bias_slider
activation = activation_function  # Dropdown for activation function

# Inputs and weights
inputs = np.array([input1, input2, input3])
weights = np.array([weight1, weight2, weight3])

# Normalize weights so they sum to 1
weights_normalized = weights / np.sum(weights)

# Calculate neuron output using normalized weights
output = neuron_output(inputs, weights_normalized, bias, activation)

# Display results
print(f"Inputs: {inputs}")
print(f"Weights (Normalized): {weights_normalized}")
print(f"Bias: {bias}")
print(f"Activation Function: {activation}")
print(f"Neuron Output: {output}")


[Slider: What is the threshold for passing the class? (0 to 100)]

print(f"The weighted sum is {output:.2f}, which corresponds to a final grade of {output:.2f}%. Based on the passing threshold of {passing_threshold}, {'Han passes!' if output >= passing_threshold else 'Han fails.'}")


Chapter 2. Neural Networks: Connecting the Dots

Once again, let's imagine you're Sampy, grading a student on their performance in three categories: class participation, assignments, and projects. You might weigh each category differently—maybe participation is less important, while assignments and projects carry more weight. Then you add a bit of leniency (a bias) based on how generous you’re feeling today, and finally, you decide whether the student passes or fails. This is what we've discussed in chapter 1. But more importantly, this is, at its heart, what a neuron in a neural network does: it combines inputs (scores) with weights (importance), adjusts the result with a bias, and produces an output.

But what if you’re grading not just one student but a whole class? What if each student has scores in multiple subjects, and you want to make connections between their performances in these subjects? That’s where neural networks come in—they’re layers of interconnected neurons working together to process information.

Section 1. The Neuron as a Building Block

A neuron in a neural network is a mathematical function that combines inputs (𝑥) with weights (𝑤) and biases (𝑏), applies an activation function, and produces an output (𝑧).

For example:

Inputs: Class Participation = 90, Assignments = 85, Projects = 80.

Weights: Importance of Participation = 0.4, Assignments = 0.3, Projects = 0.3.

Bias: +5 (leniency).

The neuron thus calculates:

𝑧 = (0.4 × 90) + (0.3 × 85) + (0.3 × 80) + 5 = 36 + 25.5 + 24 + 5 = 90.5

Assuming the ReLU activation function (see Chapter 1), the final output is max(0, 𝑧), so this student would score 90.5 (thanks to those 5 extra leniency points!).
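The arithmetic for this example can be checked with a short sketch. It uses only the inputs, weights, and bias listed above; the variable names are chosen for this illustration.

```python
import numpy as np

inputs = np.array([90.0, 85.0, 80.0])   # participation, assignments, projects
weights = np.array([0.4, 0.3, 0.3])     # importance of each category
bias = 5.0                              # leniency

# Weighted sum plus bias
z = float(np.dot(weights, inputs) + bias)

# ReLU activation: negative values become 0, everything else passes through
output = max(0.0, z)

print(f"z = {z:.1f}, output after ReLU = {output:.1f}")
```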

Section 2. Connecting Neurons to Build a Network

How do neurons work together?

In a neural network, a single neuron is powerful but limited—it processes one set of inputs and produces one output. To solve more complex problems, we connect multiple neurons into layers to build a network. These layers are the backbone of neural networks, allowing them to learn patterns and relationships in the data.

Note - Each dot in the diagram is a neuron, the mathematical function which, in our example from chapter 1, calculated the final grade of one student.

What Are Layers in a Neural Network?

A layer is a group of neurons that work together to process inputs and produce outputs:

1. Input Layer: This is the starting point, where raw data (like test scores or pixel intensities from an image) enters the network. Each input corresponds to one "node" in the input layer.

2. Hidden Layers: These layers sit between the input and output layers. Each hidden layer transforms the data further, enabling the network to detect patterns or features in the inputs. For example:

The first hidden layer might detect edges in an image.

The second hidden layer might combine those edges into shapes.

3. Output Layer: This layer produces the final result of the network's computations, such as:

A prediction of whether a student passes or fails.

The probability that an image contains a cat.

Each layer takes the output of the previous layer as its input, processes it using its neurons, and passes the result to the next layer.

How Layers Interact: A Simple Example

Let’s extend our “grading a class” example:

Inputs: A student has scores for participation, assignments, and projects. Class Participation = 90 Assignments = 85 Projects = 80

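Hidden Layer: Imagine we have three neurons in this layer, each evaluating a different aspect of the student's performance.

For a single neuron 𝑗 in the layer, the output is:

𝑧ⱼ = 𝑓(𝑤ⱼ₁𝑥₁ + 𝑤ⱼ₂𝑥₂ + 𝑤ⱼ₃𝑥₃ + 𝑏ⱼ)

where 𝑓 is the activation function, 𝑥₁, 𝑥₂, 𝑥₃ are the three scores, and 𝑤ⱼ₁, 𝑤ⱼ₂, 𝑤ⱼ₃, 𝑏ⱼ are that neuron's own weights and bias.

For multiple neurons, the outputs form a vector:

𝑍 = [𝑧₁, 𝑧₂, 𝑧₃] = 𝑓(𝑊𝑥 + 𝑏)

where each row of 𝑊 holds the weights of one neuron.

Now imagine multiple layers! What you see above is a single layer. The next layer works the same way, except that its input is the output, 𝑍, of the previous layer!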
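Here is a minimal sketch of one hidden layer feeding a second layer, written as matrix multiplication. All of the weights and biases below are hypothetical values picked for illustration, and ReLU is assumed as the activation in both layers.

```python
import numpy as np


def dense_layer(x, W, b):
    """One layer of neurons: each row of W holds the weights of one neuron."""
    return np.maximum(0.0, W @ x + b)  # ReLU applied to every neuron's sum


x = np.array([90.0, 85.0, 80.0])  # participation, assignments, projects

# Hypothetical hidden layer with three neurons
W1 = np.array([
    [1.0, 0.0, 0.0],   # neuron 1 looks only at participation
    [0.0, 1.0, 0.0],   # neuron 2 looks only at assignments
    [0.4, 0.3, 0.3],   # neuron 3 blends all three categories
])
b1 = np.array([0.0, 0.0, 5.0])
hidden = dense_layer(x, W1, b1)

# Hypothetical next layer: one neuron that takes the hidden outputs as its input
W2 = np.array([[0.2, 0.3, 0.5]])
b2 = np.array([0.0])
final = dense_layer(hidden, W2, b2)

print("Hidden layer output Z:", hidden)
print("Next layer output:", final)
```

The second call to `dense_layer` never sees the original scores. It only sees what the hidden layer produced, which is exactly how layers are chained.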

Real-World Application: Grading All Students

Going back to our example, imagine you're grading an entire class, not just one student. Each student's scores are inputs to the network. Here's how a neural network would process this data:

1. Input Layer: Each student's scores are represented as a vector, e.g., [90,85,80].

2. Hidden Layer: Each neuron in the hidden layer focuses on detecting patterns or features in the students' scores.

Neuron 1 might identify students excelling in participation.

Neuron 2 might focus on those who perform well in assignments.

3. Output Layer: Produces a final score or decision (e.g., pass/fail) for each student.

So what's the point?

With a neural network, essentially a combination of layers, the scores of many students can be evaluated simultaneously, finding complex relationships between their performance in different categories. The neural network essentially creates a set of mathematical patterns that help us determine the combination of weights that will produce a certain output.
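Evaluating many students at once is just the same calculation applied to every row of a table. The sketch below assumes a single neuron with the example weights 0.4, 0.3, 0.3 and a bias of 5; the three students and the passing mark of 78 are invented for illustration.

```python
import numpy as np

# Each row is one student: participation, assignments, projects (made-up values)
scores = np.array([
    [90.0, 85.0, 80.0],
    [60.0, 95.0, 70.0],
    [75.0, 50.0, 88.0],
])

weights = np.array([0.4, 0.3, 0.3])  # shared by every student
bias = 5.0
passing_mark = 78.0                  # assumed threshold for this sketch

# One matrix-vector product grades the whole class at once
grades = scores @ weights + bias
passed = grades >= passing_mark

for i, (grade, ok) in enumerate(zip(grades, passed), start=1):
    print(f"Student {i}: {grade:.1f} -> {'pass' if ok else 'fail'}")
```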

Chapter 3. Building the Dataset: Life of the Neural Network

In the previous chapters, we explored how neurons process inputs to produce outputs and how layers of neurons work together to create a neural network. But neural networks are nothing without data—it's the fuel that makes them work! In this chapter, we’ll create a simple, made-up dataset and use it to show how a neural network can process multiple entries (like student grades) and dynamically adjust outputs based on the data.

Creating the Dataset

Let’s say we’re grading students in a class. Again, each student has three performance metrics:

1. Class Participation

2. Assignments

3. Projects

We’ll generate a dataset of 30 students with random scores (from 50 to 99) for each metric. The same set of weights, based on the importance of these metrics, will then be applied to every student.

import numpy as np
import pandas as pd

# Generate random dataset
np.random.seed(42)  # For consistent random results
students = [f"Student {i+1}" for i in range(30)]
participation_scores = np.random.randint(50, 100, size=30)
assignment_scores = np.random.randint(50, 100, size=30)
project_scores = np.random.randint(50, 100, size=30)

# Combine into a DataFrame
data = pd.DataFrame({
    "Student": students,
    "Participation": participation_scores,
    "Assignments": assignment_scores,
    "Projects": project_scores
})

# Display the dataset
print("Initial Dataset:")
print(data)


Using this dataset, let's simulate a neural network that calculates each student’s final grade based on their scores. The weights and biases are still the same (as you've set above in chapter 1). Let's see how these parameters affect the final grades.

# Neural network function
def calculate_grades(data, w1, w2, w3, b):
    # Normalize weights so they sum to 1
    total_weight = w1 + w2 + w3
    w1, w2, w3 = w1 / total_weight, w2 / total_weight, w3 / total_weight

    # Calculate grades for each student
    data["Final Grade"] = (
        data["Participation"] * w1 +
        data["Assignments"] * w2 +
        data["Projects"] * w3 +
        b
    ).clip(0, 100)  # Clamp grades to range [0, 100]
    return data

# Calculate grades with interactive inputs
final_data = calculate_grades(data, weight_1, weight_2, weight_3, bias_slider)

# Display updated dataset with final grades
print(final_data)


But how much does each feature (e.g. participation, assignments, projects) contribute to the final grade? Using scatter plots with lines of best fit, we can identify the relationship between these features and the resulting grades (assuming a Linear activation).

import matplotlib.pyplot as plt
import numpy as np

# Function to plot scatter plots with a line of best fit
def plot_input_vs_grades_with_fit(data, metric):
    plt.figure(figsize=(8, 6))
    x = data[metric]
    y = data["Final Grade"]

    # Scatter plot
    plt.scatter(x, y, color="darkorange", alpha=0.8, label="Data Points")

    # Line of best fit
    m, b = np.polyfit(x, y, 1)  # Linear regression: slope (m) and intercept (b)
    plt.plot(x, m*x + b, color="blue", label=f"Best Fit Line: y = {m:.2f}x + {b:.2f}")

    # Chart details
    plt.title(f"Final Grade vs {metric}")
    plt.xlabel(metric)
    plt.ylabel("Final Grade (%)")
    plt.grid(True, linestyle="--", alpha=0.7)
    plt.ylim(0, 100)  # Clamp grades to range [0, 100]
    plt.legend()
    plt.show()

# Plot for each input metric with best fit lines
for metric in ["Participation", "Assignments", "Projects"]:
    plot_input_vs_grades_with_fit(final_data, metric)


Checkpoint 1

Normalization and Contribution

Sometimes one category "carries" a student's grade; for another student, it might be a different category. Some might even do moderately well across all three categories. The heatmap below visualizes the normalized contributions of three grading categories (Participation, Assignments, and Projects) to the final grades of 30 students. The colors range from red to blue, with darker red indicating higher contributions and darker blue reflecting lower contributions. Although the grading weights for these categories are the same for all students, the normalized contributions differ because each student's scores are distributed differently across the categories. This is hard to picture at first, so run the cell below!

import seaborn as sns

def plot_contribution_heatmap(data, weights):
    contributions = {
        "Participation Contribution": data["Participation"] * weights[0],
        "Assignments Contribution": data["Assignments"] * weights[1],
        "Projects Contribution": data["Projects"] * weights[2],
    }
    contribution_df = pd.DataFrame(contributions)
    contribution_df["Student"] = data["Student"]
    contribution_df.set_index("Student", inplace=True)

    # Normalize contributions for visual clarity
    normalized_contributions = contribution_df.div(contribution_df.sum(axis=1), axis=0)

    plt.figure(figsize=(10, 6))
    sns.heatmap(normalized_contributions, annot=True, fmt=".2f", cmap="coolwarm", cbar=True)
    plt.title("Normalized Contribution of Inputs to Final Grades")
    plt.xlabel("Metrics")
    plt.ylabel("Students")
    plt.show()

# Calculate weights for contributions
weights_array = np.array([
    weight_1 / (weight_1 + weight_2 + weight_3),
    weight_2 / (weight_1 + weight_2 + weight_3),
    weight_3 / (weight_1 + weight_2 + weight_3)
])
plot_contribution_heatmap(final_data, weights_array)


One key idea in this chart is normalization. Here, normalization divides each category's weighted contribution (score times weight) by the sum of that student's three weighted contributions, giving the share of the final grade that came from each category. For instance, if one student excelled in participation but underperformed in other areas, their normalized "Participation Contribution" would appear noticeably higher. Conversely, a student with balanced scores across all categories would have contributions that simply mirror the weights. This does not reflect a change in the grading weights but rather highlights the impact of performance variability within the framework of consistent weights.

This phenomenon illustrates an important principle often encountered in neural network models: while the underlying weights remain consistent, the outputs (in this case, the normalized contributions) can vary depending on the input data. This chart demonstrates how, in both neural networks and grading systems, proportional relationships emerge naturally from the interactions between input features and their distribution. Such variability reinforces the significance of careful preprocessing and the contextual interpretation of normalized data.
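A two-student example makes the effect concrete. The weights and scores below are hypothetical; the sketch only illustrates how a share of the grade is computed from weighted contributions.

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.3])  # participation, assignments, projects (assumed)

scores = np.array([
    [95.0, 55.0, 60.0],   # strong in participation, weaker elsewhere
    [80.0, 80.0, 80.0],   # balanced across all three categories
])

# Weighted contribution of each category to each student's grade
contributions = scores * weights

# Share of each student's total that came from each category (rows sum to 1)
shares = contributions / contributions.sum(axis=1, keepdims=True)

print(np.round(shares, 3))
```

For the balanced student the shares equal the weights exactly. For the first student the participation share rises above its weight of 0.2, even though the weights never changed.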

Neural Networks: How Inputs are Related to Outputs

From the scatter plots above, we've seen the correlations of individual inputs to the output: some features are weaker predictors, others are stronger. Neural networks, which you've so far learned are just layers of mathematical functions containing inputs manipulated by weights and biases, are used to model the relationship between various inputs and an output. In Chapter 1, we learned that every input to a neuron can have a different weight. In this chapter, we see that no single feature (participation, assignments, or projects) can predict our final outcome, the final grade, by itself. The job of a neural network is to find the combination of the features that best explains their relationship to the final output.

Let's see a model that shows the relationships of all of our features (participation, assignments, projects), and our output (final grade).

def plot_3d_grades_with_direction(data):
    # Extract the data
    x = data["Participation"]
    y = data["Assignments"]
    z = data["Projects"]
    c = data["Final Grade"]  # Use the final grade as the color key

    # Create the 3D plot
    fig = plt.figure(figsize=(10, 8))
    ax = fig.add_subplot(111, projection='3d')

    # Scatter plot
    sc = ax.scatter(x, y, z, c=c, cmap='viridis', s=100, alpha=0.8)

    # Add color bar to indicate final grade
    colorbar = plt.colorbar(sc, ax=ax, pad=0.1)
    colorbar.set_label("Final Grade (%)", fontsize=12)

    # Add vector arrow to show direction of increasing grade
    mean_x = np.mean(x)
    mean_y = np.mean(y)
    mean_z = np.mean(z)
    max_x, max_y, max_z = np.max(x), np.max(y), np.max(z)
    ax.quiver(mean_x, mean_y, mean_z,
              max_x - mean_x, max_y - mean_y, max_z - mean_z,
              color='red', linewidth=2, label="Direction of Higher Grades",
              arrow_length_ratio=0.2)

    # Set labels and title
    ax.set_title("3D Scatter Plot of Inputs vs Final Grade with Direction", fontsize=15)
    ax.set_xlabel("Participation", fontsize=12)
    ax.set_ylabel("Assignments", fontsize=12)
    ax.set_zlabel("Projects", fontsize=12)

    # Add legend and grid
    ax.legend(loc="upper left")
    ax.grid(True, linestyle="--", alpha=0.7)
    plt.show()

# Call the function to plot with direction
plot_3d_grades_with_direction(final_data)


The 3D scatter plot helps us visualize how the combination of Participation, Assignments, and Projects contributes to the final grade. The red vector arrow indicates the direction where grades are highest, pointing toward the maximum contributions of all three features.

This visualization answers a key question: How do these features interact?

Concentration of points: Students who perform consistently well in all three metrics (Participation, Assignments, Projects) will cluster near the higher grades.

Spread of data: Variability in one metric (e.g., low Participation but high Assignments and Projects) may result in students achieving diverse final grades.

Grade trajectory: The red arrow suggests that excelling in all three features leads to the highest possible grades, but a lower grade in any one feature could pull the trajectory downward.

What've We Learned?

In Chapter 3, we explored the foundational elements of neural networks using the classroom example. We built a dataset of students, applied one shared set of weights and a bias to every student, and used scatter plots, a heatmap, and a 3D plot to see how each input relates to the final grade. We have not yet looked at how a network learns its weights: adjusting weights and biases to minimize error, through backpropagation and gradient descent, is the job of training, which Chapter 4 introduces. Along the way, keep in mind that activation functions introduce non-linearity to model complex relationships in data, and that the architecture of a neural network, such as the number of layers and neurons, influences its ability to generalize patterns.

One of the most significant takeaways from this chapter was the role of normalization. In our example, normalization meant rescaling the weights so they sum to 1 and expressing each category's contribution as a share of a student's total. More generally, scaling input features consistently before training improves the efficiency of training and helps prevent issues like exploding or vanishing gradients. This step not only speeds up convergence but also keeps one feature from dominating simply because of its scale, laying the groundwork for a well-functioning neural network.

We also examined practical examples, such as the heatmap visualization, to connect these concepts to real-world scenarios. This showed how neural networks can model the relationships between inputs and outputs, even when those relationships are influenced by variability in data distribution. The important point is: a neural network is used to connect inputs and outputs together through a set of layers that contains the weights. Using these established weights, the neural network can calculate and compute additional outputs based on additional inputs.

Chapter 4. The Human Implications of Neural Networks

In Chapters 1-3, we explored how a neural network computes outputs from a set of weights that we chose ourselves. However, when we step into real-world applications of neural networks, the situation changes: we don’t explicitly know what the weights are. Instead, we start with raw data and rely on the network to uncover the set of weights that best connects inputs to outputs. Using our previous example,

The Training Process

What does it mean when someone says they are "training" a neural network? Let's say that we have a series of inputs and a series of outputs. In Chapters 1-3, we gave our neural network the weights so that it could use the inputs and weights to compute an output. However, in most neural network applications, we don't actually know what the weights are; we only know the inputs and outputs. Here, instead of giving the neural network inputs and weights and asking it for outputs, we give it inputs and outputs and ask it to compute the weights.

Let's say that we have one dimension of inputs (one characteristic) which produces the outputs. We want to find the "weights" of the neural network, which is represented mathematically.

An epoch is one complete pass through the training data. Essentially, just know that as the number of epochs increases, the network sees the same inputs and outputs again and again, and each pass lets it fine-tune these "weights" to better model the relationship between inputs and outputs.

This is a simple example, where we have one characteristic of inputs and one characteristic of outputs (e.g. modeling the relationship between participation and final grades), so we use a 2D model. From the chapter 3 example, if we want to measure 3 characteristics and their relationship with one final output (final grade), we will need a 4D model.
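To make "give it inputs and outputs and ask for the weights" concrete, here is a minimal sketch of training a single weight and bias with gradient descent on one input. The data is synthetic, the hidden rule (grade = 0.6 × participation + 30), the learning rate, and the number of epochs are all assumptions chosen for illustration, and real networks use libraries that automate these steps.

```python
import numpy as np

# Synthetic data: the "true" rule is hidden from the training loop
rng = np.random.default_rng(0)
x = rng.uniform(50, 100, size=30)   # participation scores
true_w, true_b = 0.6, 30.0          # assumed rule used only to create outputs
y = true_w * x + true_b             # final grades

# Scale the input so gradient descent behaves well
x_scaled = (x - x.mean()) / x.std()

w, b = 0.0, 0.0          # start with no knowledge of the weights
learning_rate = 0.1

for epoch in range(200):             # each epoch is one pass over all the data
    pred = w * x_scaled + b
    error = pred - y
    # Gradients of the mean squared error with respect to w and b
    w -= learning_rate * 2 * np.mean(error * x_scaled)
    b -= learning_rate * 2 * np.mean(error)

loss = np.mean((w * x_scaled + b - y) ** 2)

# Convert the learned weight back to the original (unscaled) units
recovered_w = w / x.std()
recovered_b = b - w * x.mean() / x.std()
print(f"Recovered weight: {recovered_w:.3f}, bias: {recovered_b:.3f}, loss: {loss:.6f}")
```

The loop never sees `true_w` or `true_b`. It only compares its predictions with the known outputs and nudges the weight and bias a little on every epoch.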

Example 1: Predicting Crime

Let's generate a random sample of crime data, which will include inputs such as minority demographics, income level, neighborhood, past history, and education level. The output is whether or not there is low, medium, or high crime risk. How does this neural network impact society?

import pandas as pd
import numpy as np

# Generating synthetic data for 100 neighborhoods
np.random.seed(42)

# Possible values for each feature
income_levels = ["High", "Medium", "Low"]
education_levels = ["College Graduate", "Some College", "High School", "Middle School"]
minority_status = ["Yes", "No"]

# Generate data
data = {
    "Neighborhood": [f"N{i}" for i in range(1, 101)],
    "Income Level": np.random.choice(income_levels, 100, p=[0.3, 0.4, 0.3]),
    "Education Level": np.random.choice(education_levels, 100, p=[0.3, 0.3, 0.3, 0.1]),
    "Minority": np.random.choice(minority_status, 100, p=[0.6, 0.4]),
    "Historical Arrests": np.random.randint(1, 101, 100),
    "Predicted Crime Risk": np.random.choice(["Low", "Medium", "High"], 100, p=[0.4, 0.3, 0.3]),
}

# Convert to DataFrame
crime_data = pd.DataFrame(data)

# Adjusting the dataset to ensure strong correlations
# Define a function to determine crime risk with 80% correlation based on features
def determine_crime_risk(row):
    if row["Income Level"] == "High" and row["Education Level"] == "College Graduate":
        return "Low"
    elif row["Income Level"] == "Low" or row["Minority"] == "Yes":
        return "High"
    else:
        return "Medium"

# Apply the function to create a highly correlated "Predicted Crime Risk"
crime_data["Predicted Crime Risk"] = crime_data.apply(determine_crime_risk, axis=1)

# Display the entire DataFrame
print(crime_data)


# Generating more samples for the "Low" category while maintaining overall correlation above 80%
# Define a function to generate additional "Low" samples
def generate_low_samples(n):
    low_samples = pd.DataFrame({
        "Neighborhood": [f"New_Low_{i}" for i in range(1, n + 1)],
        "Income Level": ["High"] * n,
        "Education Level": ["College Graduate"] * n,
        "Minority": ["No"] * n,
        "Historical Arrests": np.random.randint(0, 6, size=n),  # Low arrests
        "Predicted Crime Risk": ["Low"] * n
    })
    return low_samples

# Generate 30 additional "Low" samples
new_low_samples = generate_low_samples(30)

# Append the new "Low" samples to the original dataset
balanced_crime_data = pd.concat([crime_data, new_low_samples], ignore_index=True)

# Check the distribution of "Predicted Crime Risk"
distribution = balanced_crime_data["Predicted Crime Risk"].value_counts()

# Display the updated dataset and distribution
print("Updated Distribution of Predicted Crime Risk:")
print(distribution)

# Display the first few rows of the updated dataset
balanced_crime_data.head()


To find the weights that connect inputs to outputs, we train a neural network on the data above. Once the set of weights is determined, the neural network can take additional inputs, apply its weights, and produce a new predicted output.
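Before a network can use inputs such as income level or education level, those categories have to be turned into numbers. A common approach is one-hot encoding for categories and min-max scaling for numeric values. The sketch below is a hand-written illustration: the helper names are hypothetical, the column order follows the lists as written (a library encoder may order categories differently), and the arrest range of 1 to 100 is an assumption.

```python
INCOME = ["High", "Medium", "Low"]
EDUCATION = ["College Graduate", "Some College", "High School", "Middle School"]
MINORITY = ["Yes", "No"]


def one_hot(value, categories):
    """Return a list with 1.0 in the position of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"Unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(value, low, high):
    """Rescale a number from the range [low, high] to [0, 1]."""
    return (value - low) / (high - low)


def encode_person(income, education, minority, arrests):
    """Build the numeric feature vector for one person (illustrative helper)."""
    return (
        one_hot(income, INCOME)
        + one_hot(education, EDUCATION)
        + one_hot(minority, MINORITY)
        + [min_max(arrests, 1, 100)]   # assumed arrest range: 1 to 100
    )


features = encode_person("Low", "High School", "No", 40)
print(features)
```

Three categorical inputs and one number become a vector of ten values, which is the kind of input a network's first layer works with.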

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Generating synthetic data with strong correlation
np.random.seed(42)
income_levels = ["High", "Medium", "Low"]
education_levels = ["College Graduate", "Some College", "High School", "Middle School"]
minority_status = ["Yes", "No"]

# Generate data
data = {
    "Income Level": np.random.choice(income_levels, 100, p=[0.3, 0.4, 0.3]),
    "Education Level": np.random.choice(education_levels, 100, p=[0.3, 0.3, 0.3, 0.1]),
    "Minority": np.random.choice(minority_status, 100, p=[0.6, 0.4]),
    "Historical Arrests": np.random.randint(1, 101, 100),
}
crime_data = pd.DataFrame(data)

# Define target variable with high correlation
def determine_crime_risk(row):
    if row["Income Level"] == "High" and row["Education Level"] == "College Graduate":
        return "Low"
    elif row["Income Level"] == "Low" or row["Minority"] == "Yes":
        return "High"
    else:
        return "Medium"

crime_data["Predicted Crime Risk"] = crime_data.apply(determine_crime_risk, axis=1)

# Preprocessing
# One-hot encode categorical features
# (scikit-learn 1.2+ uses sparse_output; older versions use sparse=False instead)
encoder = OneHotEncoder(sparse_output=False)
categorical_data = encoder.fit_transform(crime_data[["Income Level", "Education Level", "Minority"]])

# Normalize numerical features
scaler = MinMaxScaler()
numerical_data = scaler.fit_transform(crime_data[["Historical Arrests"]])

# Combine processed features
X = np.hstack([categorical_data, numerical_data])

# Encode target variable
target_encoder = OneHotEncoder(sparse_output=False)
y = target_encoder.fit_transform(crime_data[["Predicted Crime Risk"]])

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the neural network
model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),  # Increased neurons for more learning capacity
    Dense(16, activation='relu'),
    Dense(3, activation='softmax')  # 3 output classes: Low, Medium, High
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model for 20 epochs
model.fit(X_train, y_train, epochs=20, batch_size=8, validation_split=0.2, verbose=1)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {accuracy:.2f}")

# Make predictions
predictions = model.predict(X_test)

# Ensure predictions and y_test are properly formatted
true_classes = target_encoder.inverse_transform(y_test)  # Convert true labels back to their original class
predicted_classes = target_encoder.inverse_transform(predictions)  # Convert predictions back to original class

# Create DataFrame for comparison
sample_results = pd.DataFrame({
    "True": true_classes.flatten(),
    "Predicted": predicted_classes.flatten()
})

# Display sample predictions
print(sample_results.head())


Now, we've trained the model to optimize all the weights, or mathematical relationships, between our inputs and outputs. We can now use these weights, take an input, and predict an output.

Let's play around with this model by putting in random inputs (characteristics) for a person and testing whether they are predicted to be low, medium, or high crime risk.

[Inputs: income level, education level, minority status, and historical charges or arrests (0 to 100)]

# Convert Deepnote inputs into a DataFrame
input_data = pd.DataFrame({
    "Income Level": [income.strip().capitalize()],  # Capitalize the first letter and remove extra spaces
    "Education Level": [education.strip().title()],  # Title case for multi-word categories
    "Minority": [minority.strip().capitalize()],  # Capitalize the first letter and remove extra spaces
    "Historical Arrests": [arrests]
})

# Encode the categorical features using the same encoder as during training
try:
    categorical_data = encoder.transform(input_data[["Income Level", "Education Level", "Minority"]])
except ValueError as e:
    print(f"Encoding error: {e}")
    print("Please ensure the input values match the categories used during training.")
    raise

# Normalize the numerical features
numerical_data = scaler.transform(input_data[["Historical Arrests"]])

# Combine all features into a single array
processed_input = np.hstack([categorical_data, numerical_data])

# Use the trained model to make a prediction
prediction = model.predict(processed_input)
predicted_class = target_encoder.inverse_transform(prediction)

# Output the predicted crime risk
print(f"Predicted Crime Risk: {predicted_class[0][0]}")

Checkpoint 2

# Extract weights from the first layer of the neural network
weights = model.layers[0].get_weights()[0]  # Get the weights of the input layer

# Sum the absolute weights for each input feature
feature_importance = np.sum(np.abs(weights), axis=1)

# Map feature importance to feature names
feature_names = list(encoder.get_feature_names_out()) + ["Historical Arrests"]
importance_df = pd.DataFrame({
    "Feature": feature_names,
    "Importance": feature_importance
}).sort_values(by="Importance", ascending=False)

# Display feature importance
print("Feature Importance (Model Weights):")
print(importance_df)

Here, the distribution of importance suggests that the model leans heavily on socio-economic and demographic factors, such as income, education level, and minority status, rather than on the one numerical input, historical arrests. Let's think: is this ethical?
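
One caveat when reading such a table: each categorical feature is spread across several one-hot columns, so comparing it with a single numeric column takes some bookkeeping. The sketch below adds the per-column scores back up by original feature. The helper `group_importance` and the scores are hypothetical, and it assumes column names follow the `<feature>_<category>` pattern that scikit-learn's `OneHotEncoder.get_feature_names_out()` produces, with no underscore inside the category itself.

```python
from collections import defaultdict

def group_importance(feature_names, importances, numeric_features):
    """Add up per-column importance scores by original feature."""
    totals = defaultdict(float)
    for name, score in zip(feature_names, importances):
        if name in numeric_features:
            original = name  # numeric columns keep their own name
        else:
            # One-hot columns look like "<feature>_<category>"
            original = name.rsplit("_", 1)[0]
        totals[original] += score
    return dict(totals)

# Made-up scores for illustration only (not output from the model above)
names = ["Income Level_High", "Income Level_Low",
         "Minority_No", "Minority_Yes", "Historical Arrests"]
scores = [1.0, 2.0, 0.5, 1.5, 2.5]

grouped = group_importance(names, scores, numeric_features={"Historical Arrests"})
print(grouped)
```

Keep in mind that summed input-layer weights are only a rough proxy for importance, and that a feature with more categories collects more columns, which can inflate its total.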

Example 2: Predicting Heart Disease

Let's look at another application of neural networks: healthcare. Neural networks are becoming more popular and widespread in the medical field every year, yet the data behind them often lacks demographic diversity. Although the dataset used below is synthetically generated, we are trying to mimic a real-world scenario in which far more medical data is available for majority populations (especially men) than for underrepresented or female populations. Using the model below, we can explore how a lack of demographic diversity influences the way neural networks interact with society.

# Import necessary libraries
import pandas as pd
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Define possible feature values
genders = ["Male", "Female"]
socioeconomic_status = ["Low", "Medium", "High"]

# Generate synthetic data (np.random.randint excludes the upper bound)
data = {
    "Age": np.random.randint(25, 85, 500),  # Ages from 25 to 84
    "Gender": np.random.choice(genders, 500, p=[0.7, 0.3]),  # 70% Male, 30% Female
    "Cholesterol Levels": np.random.randint(100, 300, 500),  # Cholesterol levels in mg/dL
    "Blood Pressure": np.random.randint(80, 180, 500),  # Blood pressure in mmHg
    "Socioeconomic Status": np.random.choice(socioeconomic_status, 500, p=[0.33, 0.33, 0.34])
}

# Create DataFrame
healthcare_data = pd.DataFrame(data)

# Define binary target variable: High or Low
def determine_heart_attack_risk_binary(row):
    if row["Cholesterol Levels"] > 240 and row["Blood Pressure"] > 140:
        return "High"
    else:
        return "Low"

# Apply the function to assign risk levels
healthcare_data["Heart Attack Risk"] = healthcare_data.apply(determine_heart_attack_risk_binary, axis=1)

# Balance the dataset by oversampling the minority class
high_class = healthcare_data[healthcare_data["Heart Attack Risk"] == "High"]
low_class = healthcare_data[healthcare_data["Heart Attack Risk"] == "Low"]

# Oversample the minority class ("High") to match the majority class ("Low")
high_oversampled = high_class.sample(n=len(low_class), replace=True, random_state=42)

# Combine the oversampled "High" class with the "Low" class
balanced_healthcare_data = pd.concat([high_oversampled, low_class]).sample(frac=1, random_state=42)

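Notice that the dataset holds far fewer records for women than for men: gender is sampled at roughly 70% male to 30% female, while socioeconomic status is split almost evenly. The next cell widens the gap by keeping only 30 female records.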

# Filter out female data points
female_data = balanced_healthcare_data[balanced_healthcare_data["Gender"] == "Female"]

# Reduce female data points to 30
female_data_reduced = female_data.sample(n=30, random_state=42)

# Keep all male data points
male_data = balanced_healthcare_data[balanced_healthcare_data["Gender"] == "Male"]

# Combine the reduced female data with all male data
balanced_healthcare_data_limited_females = pd.concat([female_data_reduced, male_data]).sample(frac=1, random_state=42)

# Display the updated dataset and new class distribution
gender_distribution_updated = balanced_healthcare_data_limited_females["Gender"].value_counts()
class_distribution_updated = balanced_healthcare_data_limited_females["Heart Attack Risk"].value_counts()
balanced_healthcare_data_limited_females.head(), gender_distribution_updated, class_distribution_updated

Once again, using the following sets of inputs and outputs, we can train a model whose weights drive the output predictions:

# Step 1: Adjust the dataset to limit female data points
# Filter out female data points
female_data = balanced_healthcare_data[balanced_healthcare_data["Gender"] == "Female"]

# Reduce female data points to 30
female_data_reduced = female_data.sample(n=30, random_state=42)

# Keep all male data points
male_data = balanced_healthcare_data[balanced_healthcare_data["Gender"] == "Male"]

# Combine the reduced female data with all male data
limited_female_data = pd.concat([female_data_reduced, male_data]).sample(frac=1, random_state=42)

# Step 2: Preprocessing the limited dataset
# Encode categorical features (Gender and Socioeconomic Status)
# Note: sparse_output replaces the older `sparse` argument (renamed in scikit-learn 1.2)
categorical_features = limited_female_data[['Gender', 'Socioeconomic Status']]
categorical_encoder = OneHotEncoder(sparse_output=False)
categorical_data = categorical_encoder.fit_transform(categorical_features)

# Normalize numerical features (Age, Cholesterol Levels, Blood Pressure)
numerical_features = limited_female_data[['Age', 'Cholesterol Levels', 'Blood Pressure']]
numerical_scaler = MinMaxScaler()
numerical_data = numerical_scaler.fit_transform(numerical_features)

# Combine processed categorical and numerical data
X = np.hstack([numerical_data, categorical_data])

# Encode target variable (Heart Attack Risk)
label_encoder = OneHotEncoder(sparse_output=False)
y = label_encoder.fit_transform(limited_female_data[["Heart Attack Risk"]])

# Step 3: Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Build and compile the neural network model
model = Sequential([
    Dense(16, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(8, activation='relu'),
    Dense(2, activation='softmax')  # Output layer for binary classification (High, Low)
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Step 5: Train the model
model.fit(X_train, y_train, epochs=30, batch_size=8, validation_split=0.2, verbose=1)

# Step 6: Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Accuracy: {accuracy:.2f}")

# Step 7: Prediction function with "Medium" logic
def predict_heart_attack_risk(age, gender, cholesterol, blood_pressure, ses, threshold=0.1):
    # Create input DataFrame
    input_data = pd.DataFrame({
        "Age": [age],
        "Gender": [gender.capitalize()],
        "Cholesterol Levels": [cholesterol],
        "Blood Pressure": [blood_pressure],
        "Socioeconomic Status": [ses.capitalize()]
    })

    # Encode categorical input and scale numerical input
    categorical_input = categorical_encoder.transform(input_data[['Gender', 'Socioeconomic Status']])
    numerical_input = numerical_scaler.transform(input_data[['Age', 'Cholesterol Levels', 'Blood Pressure']])
    processed_input = np.hstack([numerical_input, categorical_input])

    # Make prediction (the encoder sorts labels alphabetically: "High", then "Low")
    prediction = model.predict(processed_input)[0]
    prob_high, prob_low = prediction

    # Adjust classification with "Medium" logic
    if abs(prob_high - prob_low) <= threshold:
        return "Medium"
    elif prob_high > prob_low:
        return "High"
    else:
        return "Low"

# Step 8: Test the prediction function
print(predict_heart_attack_risk(80, "Male", 300, 180, "Low"))
print(predict_heart_attack_risk(55, "Female", 200, 120, "High"))
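
The "Medium" rule in the prediction function is easier to see in isolation. Here is a small sketch of just that thresholding step; `label_from_probs` is a hypothetical stand-in name, and the probabilities are hand-picked rather than taken from the trained model.

```python
def label_from_probs(prob_high, prob_low, threshold=0.1):
    """Return "Medium" when the two class probabilities are nearly tied."""
    if abs(prob_high - prob_low) <= threshold:
        return "Medium"
    return "High" if prob_high > prob_low else "Low"

print(label_from_probs(0.90, 0.10))  # clear gap in favor of "High"
print(label_from_probs(0.54, 0.46))  # gap of 0.08 is within the threshold: "Medium"
print(label_from_probs(0.20, 0.80))  # clear gap in favor of "Low"
```

The model itself only ever learns two classes; "Medium" is a label we add afterward for cases where the network is unsure.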

Let's play around! Use the sliders below to choose the demographic of a patient and their characteristics. How does this affect the prediction?

print(predict_heart_attack_risk(age, gender, cholesterol, blood, socioeconomic))

from sklearn.metrics import accuracy_score

# Split the data into male and female subsets
male_data = balanced_healthcare_data[balanced_healthcare_data["Gender"] == "Male"]
female_data = balanced_healthcare_data[balanced_healthcare_data["Gender"] == "Female"]

# Prepare inputs and targets for males
X_male_categorical = categorical_encoder.transform(male_data[["Gender", "Socioeconomic Status"]])
X_male_numerical = numerical_scaler.transform(male_data[["Age", "Cholesterol Levels", "Blood Pressure"]])
X_male = np.hstack([X_male_numerical, X_male_categorical])
y_male = label_encoder.transform(male_data[["Heart Attack Risk"]])

# Prepare inputs and targets for females
X_female_categorical = categorical_encoder.transform(female_data[["Gender", "Socioeconomic Status"]])
X_female_numerical = numerical_scaler.transform(female_data[["Age", "Cholesterol Levels", "Blood Pressure"]])
X_female = np.hstack([X_female_numerical, X_female_categorical])
y_female = label_encoder.transform(female_data[["Heart Attack Risk"]])

# Make predictions for males and females
y_male_pred = model.predict(X_male)
y_female_pred = model.predict(X_female)

# Convert predictions to class indices
y_male_pred_classes = np.argmax(y_male_pred, axis=1)
y_female_pred_classes = np.argmax(y_female_pred, axis=1)

# Calculate accuracy for males and females
male_accuracy = accuracy_score(np.argmax(y_male, axis=1), y_male_pred_classes)
female_accuracy = accuracy_score(np.argmax(y_female, axis=1), y_female_pred_classes)

# Print the results
print(f"Accuracy for Male Predictions: {male_accuracy:.2f}")
print(f"Accuracy for Female Predictions: {female_accuracy:.2f}")
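
Accuracy alone can hide the mistakes that matter most in healthcare: high-risk patients the model labels as low risk. As a complementary check, the sketch below computes recall for the "High" class separately for each group. The helper names and the toy labels are made up for illustration; they are not results from the model above.

```python
def recall_for_class(y_true, y_pred, positive="High"):
    """Share of truly positive cases that the model also labels positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual = sum(1 for t in y_true if t == positive)
    return hits / actual if actual else float("nan")

def recall_by_group(groups, y_true, y_pred, positive="High"):
    """Compute recall for the positive class within each group."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        result[g] = recall_for_class(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], positive
        )
    return result

# Toy labels for illustration only
groups = ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"]
y_true = ["High", "High", "Low", "Low", "High", "High", "Low", "Low"]
y_pred = ["High", "High", "Low", "Low", "High", "Low", "Low", "Low"]

by_group = recall_by_group(groups, y_true, y_pred)
print(by_group)
```

In this toy data the model gets 3 of 4 female cases right, yet it misses half of the high-risk women, which the accuracy figure on its own would understate.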

Conclusion

What are the implications of neural networks for society?

The intersection of the mathematical and scientific principles behind neural networks with ethical, social, and humanistic concerns presents a profound challenge. While neural networks excel at identifying patterns and making predictions based on large datasets, their reliance on training data inherently reflects the biases and limitations of those datasets. These biases can have far-reaching implications, particularly when applied to socially sensitive contexts such as crime prediction and healthcare.

In our crime prediction example, the neural network was trained on a dataset that encoded racial bias. By disproportionately associating certain racial groups with higher crime risk, the model perpetuated systemic inequities. The math driving the neural network—its optimization of weights to reduce error—does not inherently recognize the ethical implications of these associations. While the network might achieve high accuracy by mirroring patterns in the training data, the resulting predictions reinforce discriminatory practices, potentially leading to unjust surveillance or policing of marginalized communities. This highlights a critical clash: the objective function of minimizing error in neural networks versus the societal imperative to ensure fairness and equity.
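
One way to put a number on this kind of disparity is to compare how often each group receives the "High" label, a check often called demographic parity. The sketch below is a minimal version with made-up predictions and a hypothetical helper `high_risk_rate`; it does not use output from the crime model above.

```python
def high_risk_rate(predictions, groups):
    """Fraction of each group that is labeled "High"."""
    counts, highs = {}, {}
    for pred, g in zip(predictions, groups):
        counts[g] = counts.get(g, 0) + 1
        highs[g] = highs.get(g, 0) + (pred == "High")
    return {g: highs[g] / counts[g] for g in counts}

# Made-up predictions, grouped by the "Minority" feature (illustration only)
preds = ["High", "High", "High", "Low", "Low", "Low", "High", "Low"]
minority = ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]

rates = high_risk_rate(preds, minority)
gap = max(rates.values()) - min(rates.values())
print(rates, "gap:", gap)
```

A gap near zero means the groups are flagged at similar rates. A large gap does not prove unfairness by itself, but it is a signal that the model's behavior deserves a closer look.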

Conversely, in the heart attack prediction example, the training dataset under-represented women and instead reflected a population that was disproportionately male. This created a different kind of bias: one of exclusion rather than misrepresentation. The network's predictions were highly accurate for the majority demographic (males) but faltered when applied to the minority group (females). This failure underscores the ethical dilemma of deploying models trained on incomplete or unrepresentative data. It raises concerns about whether models can serve diverse populations equitably if their training is limited to one dominant group.

The mathematical underpinnings of neural networks—such as the optimization of loss functions and the tuning of weights—are agnostic to the social contexts in which they operate. However, their outputs have real-world consequences, particularly when the data reflects societal inequities or lacks inclusivity. In the case of crime prediction, the lack of ethical safeguards allowed the model to perpetuate racial biases. In healthcare, the absence of representative data created a system that prioritized accuracy for one group at the expense of another. These examples illustrate how the pursuit of mathematical accuracy often clashes with the need for ethical responsibility.

The implications extend beyond the immediate predictions of neural networks. They raise questions about accountability: who is responsible for ensuring that models are trained on fair and inclusive data? They also challenge us to consider the societal impact of these technologies—whether they are amplifying systemic inequities or working toward greater equity. By critically examining these issues, we can develop frameworks that integrate ethical considerations into the design and deployment of neural networks, ensuring that the powerful tools of math and science are used in ways that align with societal values and human dignity.
