Monte Carlo Methods in Python

In this notebook, I use a Monte Carlo method to estimate Pi, and then to create a simple model of disease spread based on random simulation.

import math
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px

Estimating Pi

How can we estimate Pi with random numbers?

A circle of radius 1 has area π, and the square of side 2 that encloses it has area 4, so area of circle / area of square = π/4.

So in the diagram below, the probability that a randomly selected point (or 'dart') inside the blue square also lands inside the red circle is: P(hit) = π/4

and as P(hit) ≈ n(hit)/n(darts) for a large number of darts,

π ≈ 4*n(hit)/n(darts)

rectangle = plt.Rectangle((-1, -1), 2, 2, fc="None", ec="blue")
circle = plt.Circle((0, 0), 1, fc="None", ec="red")
plt.gca().add_patch(rectangle)
plt.gca().add_patch(circle)
plt.axis('scaled')
plt.axis('off')
plt.show()

Let's calculate this with an easy-to-follow brute-force method:

n = 10000000

def isInTheCircle(x, y):
    # returns True if the coordinates (x, y) lie inside the unit circle
    return x*x + y*y < 1

nhit = 0
for i in range(n):
    if isInTheCircle(random.uniform(-1, 1), random.uniform(-1, 1)):
        nhit += 1

print(4*nhit/n)

Now more efficiently with numpy arrays:

# generate a 2D array of x & y coordinates
darts = np.random.uniform(-1, 1, (n, 2))
# sum the squares of the x & y coordinates, then count the sums that are less than 1
hits = (np.sum(np.square(darts), axis=1) < 1).sum()
print(4*hits/n)
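The vectorised estimate above can also be wrapped in a small function with its own seeded random generator, so that repeated runs give the same answer. This is only a sketch: the name estimate_pi and its seed parameter are not part of the original notebook, and it uses NumPy's np.random.default_rng generator rather than the global np.random functions used elsewhere here.

```python
import numpy as np

def estimate_pi(n_darts, seed=None):
    # hypothetical helper (not in the original notebook)
    # a local generator keeps the global np.random state untouched
    rng = np.random.default_rng(seed)
    # one row per dart, columns are the x and y coordinates in [-1, 1)
    darts = rng.uniform(-1, 1, size=(n_darts, 2))
    # a dart is a hit when its squared distance from the origin is below 1
    hits = np.count_nonzero(np.sum(darts**2, axis=1) < 1)
    return 4 * hits / n_darts

print(estimate_pi(100_000, seed=0))
```

Passing the same seed reproduces the same estimate, which makes it easier to compare runs with different numbers of darts.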

Let's make a function to initialise a dataframe for n darts:

def createFrame(n):
    np.random.seed(314159)
    darts = np.random.uniform(-1, 1, (n, 2))  # x & y coordinate guesses
    inCircle = np.sum(np.square(darts), axis=1) < 1  # True where the guess is in the circle
    rollingEst = 4*np.cumsum(inCircle[1:])/np.arange(1, n)  # calculates a rolling estimate of pi
    rollingEst = np.insert(rollingEst, 0, 0)  # adding 0 as first value to avoid dividing by 0
    data = {'x': darts[:, 0], 'y': darts[:, 1], 'In Circle?': inCircle, 'Rolling Est': rollingEst}
    df = pd.DataFrame(data)
    return df

And plot the darts for a visual representation:

df = createFrame(1000)
fig = go.Figure(data=go.Scatter(
    x=df['x'],
    y=df['y'],
    mode='markers',
    marker=dict(
        color=np.where(df['In Circle?'], 'red', 'blue'),
        size=3
    )
))
fig.add_shape(type="circle",
    xref="x", yref="y",
    x0=-1, y0=-1, x1=1, y1=1,
    line_color="red",
)
fig.add_shape(type="rect",
    xref="x", yref="y",
    x0=-1, y0=-1, x1=1, y1=1,
    line_color="blue"
)
fig.update_yaxes(
    scaleanchor="x",
    scaleratio=1,
)
fig.show()

Finally, let's show how the accuracy improves with the number of guesses:

fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index, y=df['Rolling Est'], mode='lines', name='estimate'))
fig.add_hline(y=math.pi, line_width=1, line_dash="dash", line_color="green", name='pi')
fig.update_layout(
    xaxis_title="Number of darts",
    yaxis_title="Value",
    hovermode="x unified"
)
fig.show()
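How quickly should the estimate settle down? Each dart is a hit with probability π/4, so the hit count is binomial and the standard error of the estimate 4*n(hit)/n(darts) is 4*sqrt(p(1-p)/n) with p = π/4. The sketch below just evaluates that formula. The function name is made up for illustration, and it uses the true value of π, so it describes the expected spread rather than anything measured from the simulation.

```python
import math

def pi_estimate_standard_error(n_darts):
    # hypothetical helper (not in the original notebook)
    # probability that a single dart lands inside the circle
    p_hit = math.pi / 4
    # standard error of 4 * (fraction of hits) for n_darts independent darts
    return 4 * math.sqrt(p_hit * (1 - p_hit) / n_darts)

for n_darts in (100, 10_000, 1_000_000):
    print(n_darts, pi_estimate_standard_error(n_darts))
```

The error shrinks like 1/sqrt(n), so each extra decimal digit of accuracy costs roughly 100 times as many darts.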

SIR model of Disease Spread

The Susceptible-Infected-Recovered (SIR) model for the spread of disease simulates a population by placing each individual into exactly one of three classes at any one time: susceptible, infected, or recovered.

The model parameters are the infection rate and the recovery rate of the disease (sometimes called beta and gamma respectively). beta/gamma gives the basic reproductive number, R0, which is the average number of secondary infections caused by an infected host.

In the simple approach below, we assume that reinfection is not possible, and recovered individuals are classified as immune. We start with 1% of the population infected.

def SIRmodel(repeats=30, numDays=50, sample=500, initialImmune=0.1, recoveryRate=0.1, infectionRate=0.4):
    np.random.seed(44)
    # initialise the arrays to store results
    S = np.zeros((repeats, numDays))
    I = np.zeros((repeats, numDays))
    R = np.zeros((repeats, numDays))
    for repeat in range(repeats):
        # create array that simulates population
        # assigning: susceptible = 1, infected = 2, immune = 0
        population = np.ones(sample)
        # initialise the immune population
        population[np.random.choice(sample, int(initialImmune*sample), replace=False)] = 0
        # infect 1% of population randomly
        n = 0
        while n < sample*0.01:
            r = np.random.randint(sample)
            if population[r] == 1:
                population[r] = 2
                n += 1
        for day in range(numDays):
            # update the result arrays with daily values
            S[repeat, day] = (population == 1).sum()
            I[repeat, day] = (population == 2).sum()
            R[repeat, day] = (population == 0).sum()
            # simulate the infections: each infected individual may contact one random person
            for i in range(int(I[repeat, day])):
                if np.random.rand() < infectionRate:
                    r = np.random.randint(sample)
                    if population[r] == 1:
                        population[r] = 2
            # simulate recovery: for each recovery event, pick a random infected individual
            for i in range(int(I[repeat, day])):
                if np.random.rand() < recoveryRate:
                    m = 0
                    while m < 1:
                        r = np.random.randint(sample)
                        if population[r] == 2:
                            population[r] = 0
                            m += 1
    # output the data in a dataframe that contains the daily information as percentages
    data = {'Day': np.arange(numDays),
            'Immune': np.mean(R/sample*100, axis=0),
            'Infected': np.mean(I/sample*100, axis=0),
            'Susceptible': np.mean(S/sample*100, axis=0)}
    df = pd.DataFrame(data)
    return df

df = SIRmodel()
fig = px.line(df, x='Day', y=['Immune', 'Infected', 'Susceptible'],
              labels={'value': '% of population'}, markers=True,
              title='SIR model for 10% initially immune and R0 = 4'  # R0 = infectionRate/recoveryRate = 0.4/0.1
              )
fig.show()

Let's briefly compare the various inputs.

As one might expect, increasing R0 (by increasing infection rate or decreasing recovery rate) makes the infection peak faster and higher:

df1 = SIRmodel(recoveryRate=0.1, infectionRate=0.25, initialImmune=0)
df1['R0'] = '2.5'
df2 = SIRmodel(recoveryRate=0.1, infectionRate=0.9, initialImmune=0)
df2['R0'] = '9'
df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
fig = px.line(df, facet_col="R0", x='Day', y=['Immune', 'Infected', 'Susceptible'],
              labels={'value': '% of population'}
              )
fig.show()

Increasing the size of the initially immune population slows the spread of disease, because more transmission attempts land on individuals who cannot be infected (decreasing the effective reproduction number):

df1 = SIRmodel(initialImmune=0)
df1['Immune Percentage'] = '0%'
df2 = SIRmodel(initialImmune=0.3)
df2['Immune Percentage'] = '30%'
df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
fig = px.line(df, facet_col="Immune Percentage", x='Day', y=['Immune', 'Infected', 'Susceptible'],
              labels={'value': '% of population'})
fig.show()
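To put rough numbers on this effect, the sketch below computes R0 as infection rate over recovery rate, as defined earlier, and scales it by the susceptible fraction of the population. That scaling is the usual well-mixed approximation and is an assumption here, not something the simulation computes. Both function names are made up for illustration.

```python
def basic_reproduction_number(infection_rate, recovery_rate):
    # hypothetical helper: R0 = beta / gamma
    return infection_rate / recovery_rate

def effective_reproduction_number(infection_rate, recovery_rate, susceptible_fraction):
    # assumption: contacts are uniformly random, so only the susceptible
    # fraction of transmission attempts can succeed
    return basic_reproduction_number(infection_rate, recovery_rate) * susceptible_fraction

# default rates from SIRmodel, with 0% and 30% of the population initially immune
print(effective_reproduction_number(0.4, 0.1, 1.0))
print(effective_reproduction_number(0.4, 0.1, 0.7))
```

In the simulation the 1% of initially infected individuals reduces the susceptible fraction slightly further, so these values are upper bounds for the first day.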

Finally, as is typical with Monte Carlo approaches, the behaviour tends towards the 'expected' values as the number of repeats increases:

df1 = SIRmodel(repeats=1)
df1['Repeats'] = '1'
df2 = SIRmodel(repeats=60)
df2['Repeats'] = '60'
df = pd.concat([df1, df2])  # DataFrame.append was removed in pandas 2.0
fig = px.line(df, facet_col='Repeats', x='Day', y=['Immune', 'Infected', 'Susceptible'],
              labels={'value': '% of population'}
              )
fig.show()

Limitations

This is a simple model with a number of limitations. For example, it doesn't take into account the proximity or movement of individuals. It also doesn't consider that some individuals may transmit the disease to more people than others, instead using a single 'average' value for everyone. Whereas real diseases have windows in which they can be passed on, in this model anyone infected can pass the disease to anyone susceptible, at any time. We have also assumed that reinfection is not possible, which is generally not the case. If we allow for reinfection through a subtle change in the model, the variables will tend towards equilibrium values (the disease becomes endemic).
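As a rough illustration of the reinfection point, here is a deterministic, expected-value sketch in which immune individuals drift back to susceptible at a small daily rate (a variant often called SIRS). It is not the stochastic simulation used in this notebook: it tracks population fractions only, and the waning_rate parameter, its value and the function name are assumptions made for the example.

```python
import numpy as np

def sirs_expected(num_days=1000, infection_rate=0.4, recovery_rate=0.1,
                  waning_rate=0.02, initial_infected=0.01, initial_immune=0.1):
    # hypothetical helper (not in the original notebook)
    # fractions of the population in each class, one entry per day
    s = np.empty(num_days)
    i = np.empty(num_days)
    r = np.empty(num_days)
    i[0] = initial_infected
    r[0] = initial_immune
    s[0] = 1 - i[0] - r[0]
    for day in range(1, num_days):
        # expected daily flows between the classes
        new_infections = infection_rate * i[day - 1] * s[day - 1]
        new_recoveries = recovery_rate * i[day - 1]
        new_waned = waning_rate * r[day - 1]  # immunity lost, back to susceptible
        s[day] = s[day - 1] - new_infections + new_waned
        i[day] = i[day - 1] + new_infections - new_recoveries
        r[day] = r[day - 1] + new_recoveries - new_waned
    return s, i, r

s, i, r = sirs_expected()
print(s[-1], i[-1], r[-1])
```

In this recurrence the infected fraction can only stay constant at a non-zero level when infection_rate * s equals recovery_rate, so the susceptible fraction settles near recovery_rate/infection_rate. Setting waning_rate=0 recovers the no-reinfection behaviour, where the infected fraction decays to zero.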
