An Interactive Monty Hall Simulation

nbinteract was designed to make interactive explanations easy to create. In this tutorial, we will show the process of writing a simulation from scratch and visualizing the results interactively.

In this section, you will create an interactive simulation of the Monty Hall Problem. You may continue writing code in the notebook from the previous section or create a new one for this section.

The Monty Hall Problem

The Monty Hall Problem (Wikipedia) is a famous probability problem that has stumped many, mathematicians included. The problem goes something like this:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to switch your choice to door No. 2?" Is it to your advantage to switch your choice?

Perhaps unintuitively, you will win the prize about twice as often if you switch doors. We can show this through simulation.

Simulating a Game

One way to write an interactive explanation is to write functions and create interactions for each one as applicable. Composing the functions allows you to create more complicated processes. nbinteract also provides tools for interactive visualizations as we will soon see.

Let's start with defining a function to simulate one round of the Monty Hall Problem.

from ipywidgets import interact
import numpy as np
import random

PRIZES = ['Car', 'Goat 1', 'Goat 2']

def monty_hall(example_num=0):
    '''
    Simulates one round of the Monty Hall Problem. Outputs a tuple of
    (result if stay, result if switch, result behind opened door) where
    each results is one of PRIZES.
    '''
    pick = random.choice(PRIZES)
    opened = random.choice(
        [p for p in PRIZES if p != pick and p != 'Car']
    )
    remainder = next(p for p in PRIZES if p != pick and p != opened)
    return (pick, remainder, opened)

Note that the example_num argument is passed in but not used in the monty_hall function. Although it's unneeded for the function, it is easier to use interact to call functions when they have arguments to manipulate:

interact(monty_hall, example_num=(0, 100));

By interacting with the function above, we are able to informally verify that the function never allows the host to open a door with a car behind it. Even though the function is random we are able to use interaction to examine its long-term behavior!

We'll continue by defining a function to simulate a game of Monty Hall and output the winning strategy for that game:

def winner(example_num=0):
    '''
    Plays a game of Monty Hall. If staying with the original door wins
    a car, return 'stay'. Otherwise, the remaining door contains the car
    so 'switch' would have won.
    '''
    picked, _, _ = monty_hall()
    return 'stay' if picked == 'Car' else 'switch'

interact(winner, example_num=(0, 100));

Again, a bit of interaction lets us quickly examine the behavior of winner. We can see that switch appears more often than stay.

Brief Introduction to Plotting with nbinteract

Let's create an interactive bar chart of the number of times each strategy wins. We'll use nbinteract's plotting functionality.

nbi.bar creates a bar chart:

import nbinteract as nbi

nbi.bar(['a', 'b'], [4, 6])

To make an interactive chart, pass a response function in place of one or both of bar's arguments.

# This function generates the x-values
def categories(n):
    return list('abcdefg')[:n]

# This function generates the y-values (heights of bars)
# The y response function always takes in the x-values as its
# first argument
def offset_y(xs, offset):
    num_categories = len(xs)
    return np.arange(num_categories) + offset

# Each argument of the response functions is passed in as a keyword
# argument to `nbi.bar` in the same format as `interact`
nbi.bar(categories, offset_y, n=(1, 7), offset=(0, 10))

Visualizing the Winners

Now, let's turn back to our original goal: plotting the winners as games are played.

We can call winner many times and use nbi.bar to show the bar chart as it's built over the trials.

Note that we compute the results before defining our function won. This has two benefits over running the simulation directly in won:

  1. It gives us consistency in our interaction. If we run a random simulation in won, moving the slider from 500 to a different number back to 500 will give us a slightly different bar chart.
  2. It makes the interaction smoother since less work is being done each time the slider is moved.
categories = ['stay', 'switch']

winners = [winner() for _ in range(1000)]

# Note that the the first argument to the y response function
# will be the x-values which we don't need
def won(_, num_games):
    '''
    Outputs a 2-tuple of the number of times each strategy won
    after num_games games.
    '''
    return (winners[:num_games].count('stay'),
            winners[:num_games].count('switch'))

nbi.bar(categories, won, num_games=(1, 1000))

Note that by default the plot will adjust its y-axis to match the limits of the data. We can manually set the y-axis limits to better visualize this plot being built up. We will also add labels to our plot:

options = {
    'title': 'Number of times each strategy wins',
    'xlabel': 'Strategy',
    'ylabel': 'Number of wins',
    'ylim': (0, 700),
}

nbi.bar(categories, won, options=options, num_games=(1, 1000))

We can get even fancy and use the Play widget from ipywidgets to animate the plot.

from ipywidgets import Play

nbi.bar(categories, won, options=options,
        num_games=Play(min=0, max=1000, step=10, value=0, interval=17))

Now we have an interactive, animated bar plot showing the distribution of wins over time for both Monty Hall strategies. This is a convincing argument that switching is better than staying. In fact, the bar plot above suggests that switching is about twice as likely to win as staying!

Simulating Sets of Games

Is switching actually twice as likely to win? We can again use simulation to answer this question by simulating sets of 50 games at a time. recording the proportion of times switch wins.

def prop_wins(sample_size):
    '''Returns proportion of times switching wins after sample_size games.'''
    return sum(winner() == 'switch' for _ in range(sample_size)) / sample_size

interact(prop_wins, sample_size=(10, 100));

We can then define a function to play sets of games and generate a list of win proportions for each set:

def generate_proportions(sample_size, repetitions):
    '''
    Returns an array of length reptitions. Each element in the list is the
    proportion of times switching won in sample_size games.
    '''
    return np.array([prop_wins(sample_size) for _ in range(repetitions)])

interact(generate_proportions, sample_size=(10, 100), repetitions=(10, 100));

Interacting with generate_proportions shows the relationship between its arguments sample_size and repetitions more quickly than reading the function itself!

Visualizing Proportions

We can then use nbi.hist to show these proportions being computed over runs.

Again, we pre-compute the simulations and interact with a function that takes a slice of the simulations to make the interaction faster.

# Play the game 10 times, recording the proportion of times switching wins.
# Repeat 100 times to record 100 proportions
proportions = generate_proportions(sample_size=10, repetitions=100)

def props_up_to(num_sets):
    return proportions[:num_sets]

nbi.hist(props_up_to, num_sets=Play(min=0, max=100, value=0, interval=50))

As with last time, it's illustrative to specify the limits of the axes:

options = {
    'title': 'Distribution of win proportion over 100 sets of 10 games when switching',
    'xlabel': 'Proportions',
    'ylabel': 'Percent per area',
    'xlim': (0.3, 1),
    'ylim': (0, 3),
    'bins': 7,
}

nbi.hist(props_up_to, options=options, num_sets=Play(min=0, max=100, value=0, interval=50))

We can see that the distribution of wins is centered at roughly 0.66 but the distribution almost spans the entire x-axis. Will increasing the sample size make our distribution more narrow? Will increasing repetitions do the trick? Or both? We can find out through simulation and interaction.

We'll start with increasing the sample size:

varying_sample_size = [generate_proportions(sample_size, repetitions=100)
                       for sample_size in range(10, 101)]

def props_for_sample_size(sample_size):
    return varying_sample_size[sample_size - 10]

changed_options = {
    'title': 'Distribution of win proportions as sample size increases',
    'ylim': (0, 6),
    'bins': 20,
}

nbi.hist(props_for_sample_size,
         options={**options, **changed_options},
         sample_size=Play(min=10, max=100, value=10, interval=50))

So increasing the sample size makes the distribution narrower. We can now see more clearly that the distribution is centered at 0.66.

We can repeat the process for the number of repetitions:

varying_reps = [generate_proportions(sample_size=10, repetitions=reps) for reps in range(10, 101)]

def props_for_reps(reps):
    return varying_reps[reps - 10]

changed_options = {
    'title': 'Distribution of win proportions as repetitions increase',
    'ylim': (0, 5),
}

nbi.hist(props_for_reps,
         options={**options, **changed_options},
         reps=Play(min=10, max=100, value=10, interval=50))

Our distribution gets smoother as the number of repetitions increases since we have more proportions to plot. However, the spread of the distribution remains the same.

Results

Through simulation, we've shown that increasing the sample size causes our distribution of proportions to become more narrowly distributed. Increasing the number of repetitions makes our distribution look smoother when plotted but doesn't affect the distribution spread.

We've also seen through increasing the sample size shows that our distribution is centered at 0.66. This implies that the probability of winning the Monty Hall game when switching is around 0.66, and thus the probability of winning the Monty Hall game when staying is around 0.33.

Conclusion

Through this extended example, we've shown how interaction and visualization can help explore ideas and convincingly explain concepts. Congrats on making it through the tutorial! Let's now publish your results to the web.

Like the previous section of the tutorial, you can paste this code in a cell and fill in your Binder spec in order to export your notebook as a HTML file (also replacing tutorial.ipynb with the name of your notebook if needed):

nbi.publish('<replace with binder spec>', 'tutorial.ipynb')

The code above will output a link to download an HTML file.

You may now open your GitHub repo and upload your HTML file by dragging onto the page. Commit your changes, and you now have an interactive explanation of the Monty Hall problem to show the world!

results matching ""

    No results matching ""