Integrals in Probability

Notes on the definite integral and probability modeling, with a mix of formal notation, plotly visuals, and sympy computation.
R
Calculus
Probability
Author

Peter Amerkhanian

Published

April 14, 2024

Code
library(plotly)
library(dplyr)

x_vec <- seq(0, 25)
y_vec <- seq(0, 25)

vline <- function(x = 0, color = "black") {
  list(
    type = "line",
    y0 = 0,
    y1 = 1,
    yref = "paper",
    x0 = x,
    x1 = x,
    line = list(color = color, dash="dot")
  )
}

The following are notes on using integration for finding cumulative probability distribution functions in the univariate and joint settings. The exercises and citations are all coming out of a textbook for a course in vector calculus (Strang and Herman 2016).

Learning Goals

  1. Get comfortable with leveraging \(u\) subsitition to compute single and double integrals over general regions, by hand.

  2. Connect the by-hand approach to a more practically useful approach that leverages a computer algebra system. We will cover examples in sympy.

  3. Connect integration and probability models to policy analysis. These methods are of particular use in logistics problems that involve wait times.

Modeling a single waiting time

Consider the following question:

At a drive-thru restaurant, customers spend, on average, 3 minutes placing their orders […] find the probability that customers spend 6 minutes or less placing their order. […] Waiting times are mathematically modeled by exponential density functions.

(Strang and Herman 2016, chap. 5.2)

To model this and calculate our target quantity, we’ll start by referring to the process of placing one’s order as a random variable, \(X\). We will model that random variable using an exponential probability density function (pdf): \[ f(x) = \begin{cases} \frac{1}{\beta} e^{-\frac{x}{\beta}} \quad &x \geq 0 \\ 0 \quad &x < 0 \end{cases} \tag{1}\]

Where \(\beta\) is the average waiting time (Strang and Herman 2016, chap. 5.2).

We set the equation up in R and plot it so that we can understand what it represents.

P_x <- function(x, beta=3) {
  case_when(x < 0 ~ 0,
            x >= 0 ~ (1/beta) * exp(-x/beta)
            )
}
Code
data <- data.frame(x = x_vec, y = P_x(x_vec))
ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "royalblue", size=3.1, alpha=.8) +
  geom_line(color = "royalblue", size=1.2, alpha=.7) +
  labs(x = "x (Minutes)", y = "f(x)", title = "An exponential probability density function (pdf)") +
  theme_minimal()

At each \(x\), this function outputs the probability that it takes \(x\) minutes placing an order in the drive-thru. For example, the probability that it takes exactly six minutes to place an order is as follows:

P_x(6)
[1] 0.04511176

That’s what a pdf gives us and that’s cool. However, our question asks for the probability that it takes less than or equal to six minutes to order. That involves adding up all of the probabilities for times \([0,6]\) over this continuous function – sounds like an integral. Indeed, we’ll need to integrate over the following region:

Code
plot_ly(
  x = x_vec,
  y = P_x(x_vec),
  type = "scatter",
  mode="line",
  name = "P(x)") %>% add_trace(
    x = 0:6,
    y = P_x(0:6, 3),
    type = "scatter",
    mode="none",
    fill = "tozeroy",
    name = "P(x) <= 6"
) %>% 
  layout(plot_bgcolor = "#e5ecf6", shapes = list(vline(6))
         )

We can formally define that integral as follows:

\[ \begin{align*} P(X \leq 6) &= \int_0 ^6 f(x)dx \\ &= \int_0 ^6 \frac{1}{\beta} e^{-\frac{x}{\beta}}dx \end{align*} \] Now that we’ve defined the problem, we’ll solve it by-hand, then using symbolic computation software.

Solving by hand

After setting the problem up, we can solve it step-by-step using \(u\) substitution to compute the integral: \[ \begin{align*} P(X \leq 6) &= \int_0 ^6 \frac{1}{\beta} e^{-\frac{x}{\beta}}dx \\ \text{Substitute}\, u &= -\frac{x}{\beta} \\ \text{And substitute}\, dx \rightarrow \, \frac{du}{dx} &= -\frac{1}{\beta} \rightarrow dx = -\beta du \\ P(X \leq 6) &= \int_0 ^6 \frac{1}{\beta} e^u (-\beta du) \\ &= -1\int_0 ^6 e^u du \\ &= - e^u \Big|_0^6 \\ P(X \leq 6) &= - e^{-\frac{x}{\beta}} \Big|_0^6 \\ \end{align*} \]

The last line gives us the following algebra: \(- e^{-\frac{x}{\beta}} \Big|_0^6 = - e^{-\frac{6}{3}} - (- e^{-\frac{0}{3}}) = 1 - e^{-\frac{6}{3}}\) Leading us to our answer:

(-exp(-6/3)) + 1
[1] 0.8646647

Which is the probability that it takes six minutes or less to order in this drive-thru.

Solving with sympy

Now we’ll overview a more practically useful way of solving the integral using python’s sympy library. To switch to python, I’ll first load the reticulate library in R and specify that I want to use my conda install of python, which comes pre-loaded with sympy.

library(reticulate)
use_condaenv('base')

Now I’ll switch to python code and set up some preliminaries – I define a custom render function so that I can output the sympy equations I create in LaTeX:

python
import sympy as sp
from IPython.display import Markdown
render = lambda x: Markdown(sp.latex(x, mode='equation'))

Now I’ll define the variables, the constant \(e\),1 and Equation 1 in sympy. sympy will automatically apply some simplification to the expression, so don’t worry if it doesn’t look exactly like Equation 1.

python
# Define the variables
x, beta = sp.symbols('x beta')
# Define the constant, e
e = sp.exp(1)
# Define the equation
pdf = (1/beta) * e**(-x/beta)
# Render the equation
render(pdf)

\[\begin{equation}\frac{e^{- \frac{x}{\beta}}}{\beta}\end{equation}\]

In our specific problem, we know that the average time it takes to place an order is 3, so we’ll substitute in \(\beta=3\).

python
f_x = pdf.subs({beta: 3})
render(f_x)

\[\begin{equation}\frac{e^{- \frac{x}{3}}}{3}\end{equation}\]

Integration using sympy is trivially easy – to get the indefinite integral, we can just call .integrate() and supply the variable of integration.

python
render(f_x.integrate(x)) 

\[\begin{equation}- e^{- \frac{x}{3}}\end{equation}\]

The problem requires the definite integrate, which is just as straightforward – we supply the variable to integrate over, and the upper and lower limits of integration.

python
render(f_x.integrate( (x, 0, 6) )) 

\[\begin{equation}1 - e^{-2}\end{equation}\]

We’ll retrieve the decimal output by running .evalf() on the integral.

python
f_x.integrate( (x, 0, 6) ).evalf()
0.864664716763387

Much less painful than \(u\)-substitution!

Cumulative Probability

When we derived the indefinite integral, \(\int \frac{1}{\beta} e^{-\frac{x}{\beta}}dx = - e^{-\frac{x}{\beta}}\), in the process of solving our definite integral problem, we actually derived another useful function – the cumulative distribution function (cdf). The cumulative distribution function describes the probability that \(x\) is some number, say \(a\), or less, and it is generally the integral of the probability density function (“Cumulative Distribution Function” 2024). In the case of the exponential function, it is as follows:

\[ F(a) = P(x \leq a) = \int_0^a P(x)dx =- e^{-\frac{x}{\beta}} \Big|_0^a \]

Or, in code:

cdf_x <- function(x, beta=3){
  -1 * exp(-x/beta)
}

We’ll confirm that \(F(6) = e^{-\frac{x}{\beta}} \Big|_0^6\)

1 + cdf_x(6)
[1] 0.8646647

Here are the pdf and cdf functions plotted together for the exponential distribution with an average of 3 (\(\beta=3\)). At each point on the orange line (the pdf), you can hover over and find out the probability that the drive-thru takes exactly \(x\) minutes, whereas on the blue line (the cdf), you’ll find the probability that the drive-thru takes less than or equal to \(x\) minutes.

Code
plot_ly(
  x = x_vec,
  y = 1 + cdf_x(x_vec),
  type = "scatter",
  mode="lines+markers",
  name = "CDF: F(a)") %>% 
  add_trace(
    x = x_vec,
    y = P_x(x_vec),
    type = "scatter",
    mode="lines+markers",
    name = "PDF: P(x)"
)

Modeling multiple waiting times

At Sydney’s Restaurant, customers must wait an average of 15 minutes for a table. From the time they are seated until they have finished their meal requires an additional 40 minutes, on average. What is the probability that a customer spends less than an hour and a half at the diner, assuming that waiting for a table and completing the meal are independent events?

(Strang and Herman 2016, chap. 5.2)

Since we are given two random variables and asked to find the probability of an event where something specific happens to each one, we can conceive of this problem using a joint density function. We are given that these two waiting times are independent (not correlated), so we can model joint density as follows:

The variables \(X\) and \(Y\) are said to be independent random variables if their joint density function is the product of their individual density functions: \[ f(x, y) = f_1(x)f_2(y) \]

(Strang and Herman 2016, chap. 5.2)

The two functions given in the problem are:

  1. an exponential density function with mean 15 (waiting for the table)

\[ f_1(x) = \begin{cases} \frac{1}{15} e^{-\frac{x}{15}} \quad &x \geq 0 \\ 0 \quad &x < 0 \end{cases} \]

  1. an exponential density function with mean 40 (eating at the table)

\[ f_2(y) = \begin{cases} \frac{1}{40} e^{-\frac{y}{40}} \quad &y \geq 0 \\ 0 \quad &y < 0 \end{cases} \]

This gives us the joint density function:

\[ \begin{align*} f(x, y) &= f_1(x)f_2(y) \\ &= \left[\begin{cases} \frac{1}{40} e^{-\frac{x}{40}} \quad &x \geq 0 \\ 0 \quad &x < 0 \end{cases} \right] \times \left[ \begin{cases} \frac{1}{15} e^{-\frac{y}{15}} \quad &y \geq 0 \\ 0 \quad &y < 0 \end{cases} \right] \\ &= \begin{cases} \frac{1}{600} e^{-\frac{x}{40}}e^{-\frac{y}{15}} \quad &x,y \geq 0 \\ 0 \quad &x < 0 \,\text{or}\, y < 0 \end{cases} \end{align*} \tag{2}\]

We can set that up in code as well

pdf_function <- function(x, y) {
  table_wait <- P_x(x, 15)
  eating_time <- P_x(y, 40)
  total_time <- table_wait * eating_time
  return(total_time)
}

This function allows us to find the probability of our table wait + eating time being a specific amount of minutes. For example, what’s the probability that it takes us 15 minutes to secure a table and 40 minutes to eat?

pdf_function(15, 40)
[1] 0.0002255588

This is a really small number, which makes some level of intuitive sense – it seems remote that the whole restaurant experience will take exactly 55 minutes, no more no less.

Now, returning to the question – we want to find the probability that the whole restaurant experience (waiting for a table and eating) takes an hour and a half (90 minutes) or less. Again, we’ll need to add up all of the probabilities from 0 to 90 minutes, implying that we will solve this with an integral. However, we are in a multivariate setting, so this will require multiple integration.

To set up the problem, we’ll plot the joint density function, the \(xy\) plane, and this 90 minute constraint.

Code
scene <- list(
  camera = list(eye = list(
    x = 2, y = 1.1, z = 1.2
  )),
  xaxis = list(title = "X"),
  yaxis = list(title = "Y"),
  zaxis = list(title = "P(X,Y)")
)
# Define range for x and y
x_range <- seq(0, 50, length.out = 100)
y_range <- seq(0, 50, length.out = 100)
xy_grid <- expand.grid(x = x_range, y = y_range)

z_values <- pdf_function(xy_grid$x, xy_grid$y)
z_plane <- rep(0, nrow(xy_grid))

# x y constraint
x_range_constraint <- seq(40, 50, length.out = 100)
y_constraint <- 90 - x_range_constraint
xy_constraint_grid <- expand.grid(x = x_range_constraint, y = y_constraint)
z_constraint <- rep(0, nrow(xy_grid))


# Reshape z values to create a matrix for plotting
z_matrix <- matrix(z_values, nrow = length(x_range), ncol = length(y_range))
z_constraint_matrix <- matrix(z_constraint, nrow = length(x_range), ncol = length(y_constraint))
z_plane_matrix <- matrix(z_plane, nrow = length(x_range), ncol = length(y_range))


# Create 3D plot using plot_ly
plot_ly(x = x_range, y = y_range, z = z_plane_matrix, type = "surface") %>% hide_colorbar() %>%
  add_trace(
    x = x_range,
    y = y_range,
    z = z_matrix,
    type = "surface",
    colorscale = "Cividis",
    colorbar = list(title = "P(X,Y)")
  ) %>%
  add_trace(
    x = x_range_constraint,
    y = y_constraint,
    z = 0,
    type = "scatter3d",
    name = "X + Y <= 90",
    mode="lines+markers"
  ) %>% layout(scene = scene)

Let’s first define a region of integration, \(D\). We are given \(x+y \leq 90\), and given that we are talking about minutes, which are bounded \([0, \infty)\), we can make that \(0 \leq x+y \leq 90\). With some algebraic manipulation, we can define the region as:
\[ D = \{(x,y) \,|\, 0 \leq x \leq90, 0 \leq y \leq 90-x \} \]

Setting that up as a definite integral, we get the following:
\[ P(X+Y \leq 90) = P( (X,Y) \in D) = \iint_D f(x,y) dA \]

Given that there is no density for this function when \(x\) or \(y\) is less than zero, we can ignore the piece wise structure of the joint pdf and set up the integral using only the portion that defines the function on the domain \(x \in [0, \infty)\). This gives us:

\[ \begin{align*} &= \int_0^{90} \int_0^{90-x} \frac{1}{600} e^{-\frac{x}{40}}e^{-\frac{y}{15}} dydx \\ &= \frac{1}{600} \int_0^{90} \int_0^{90-x} e^{-\frac{x}{40}-\frac{y}{15}} dydx \end{align*} \]

Solving by hand

Computing this integral is difficult, but we’ll walk through it. We will again use \(u\) substitution.

\[ \begin{align*} u &= -\frac{x}{40}-\frac{y}{15} \\ \frac{du}{dy} &= -\frac{1}{15} \rightarrow dy=-15du\\ \end{align*} \] We can plug that in to finish the first integral:
\[ \begin{align*} &= \frac{1}{600}\int_0^{90} \int_0^{90-x} e^{u} (-15)dudx \\ &= \frac{-15}{600}\int_0^{90} \int_0^{90-x} e^{u} dudx \\ &= \frac{-15}{600}\int_0^{90} e^{u} \Big|_0^{90-x} dx \\ &= \frac{-15}{600}\int_0^{90} e^{-\frac{x}{40}-\frac{y}{15}} \Big|_0^{90-x} dx \\ &= \frac{-15}{600}\int_0^{90} e^{-\frac{x}{40}-\frac{90-x}{15}} - e^{-\frac{x}{40}-\frac{0}{15}} dx \\ &= \frac{-15}{600}\int_0^{90} e^{-\frac{3x}{120}-\frac{8(90-x)}{120}} - e^{-\frac{x}{40}}dx \\ &= \frac{-15}{600}\int_0^{90} e^{\frac{-3x-720+8x}{120}} - e^{-\frac{x}{40}}dx \\ &= \frac{-15}{600}\int_0^{90} e^{\frac{5x-720}{120}} - e^{-\frac{x}{40}}dx \\ &= \frac{-15}{600}\int_0^{90} e^{\frac{x}{24}-6} - e^{-\frac{x}{40}}dx \end{align*} \]

That was pretty difficult! Now we can continue on to the second integral and split it into two:

\[ \begin{align*} &= \frac{-15}{600} \left[ \int_0^{90} e^{\frac{x}{24}-6}dx - \int_0^{90} e^{-\frac{x}{40}}dx \right] \end{align*} \]

We’ll solve these integrals with \(u\) and \(v\) substitution, with our terms defined as follows:

\[ \begin{align*} u &= \frac{x}{24}-6 \\ \frac{du}{dx} &= \frac{1}{24} \rightarrow dx=24du\\ v &= -\frac{x}{40} \\ \frac{dv}{dx} &= -\frac{1}{40} \rightarrow dx=-40dv\\ \end{align*} \]
When we plug these terms back in, we can finally compute the definite integral.

\[ \begin{align*} &= \frac{-15}{600} \left[ 24 \int_0^{90} e^{u}du + 40\int_0^{90} e^v dv \right] \\ &= \frac{-15}{600} \left[ 24 e^{u}\Big|_0^{90} + 40 e^v \Big|_0^{90} \right] \\ &= \frac{-15}{600} \left[ 24 e^{\frac{x}{24}-6}\Big|_0^{90} + 40 e^{-\frac{x}{40}}\Big|_0^{90} \right] \\ &= \frac{-15}{600} \left[24(e^{\frac{90}{24}-6} - e^{\frac{0}{24}-6}) + 40(e^{-\frac{90}{40}}-e^{-\frac{0}{40}}) \right] \\ &= \frac{-15}{600} \left[24(e^{-2.25} - e^{-6}) + 40(e^{-2.25}-1) \right] \\ &= \frac{-15}{600} \left[ -33.3139396802807 \right] \\ &= 0.832848492007017 \end{align*} \]

Thus, the probability that the total experience is 90 minutes or less is about 83%.

Solving with sympy

Computing that double integral by hand provides us with a lot of opportunities to make a small error and break the whole computation. In practice, we can solve problems like this in a much faster and less error-prone manner using symbolic computation software. We’ll again switch to python and use sympy.

Returning to the problem,

\[ P(X+Y \leq 90) = \frac{1}{600} \int_0^{90} \int_0^{90-x} e^{-\frac{x}{40}-\frac{y}{15}} dydx \]
We’ll first define the function that we are integrating over.

python
# Define the variables
x, y, = sp.symbols("x y")
# Define the function
f_x_y = e ** (-x/40 - y/15)
render(f_x_y)

\[\begin{equation}e^{- \frac{x}{40} - \frac{y}{15}}\end{equation}\]

Now, we’ll compute the entire double integral in one line. Note that we chain the .integrate() method, each time over a different variable, to perform the multiple integration.

python
(1/600) * (
  f_x_y
  .integrate( (y, 0, 90-x) )
  .integrate( (x, 0, 90) )
  .evalf()
  )
0.832848492007017

We confirm that this is the same answer we arrived at by hand.

Deriving a generalizable approach

Let’s look at another problem, but now, instead of solving just one specific problem, we’ll derive an equation that can easily solve any question structured like this so that don’t have to do so many integrals…

[…] At a drive-thru restaurant, customers spend, on average, 3 minutes placing their orders and an additional 5 minutes paying for and picking up their meals. Assume that placing the order and paying for/picking up the meal are two independent events \(X\) and \(Y\). If the waiting times are modeled by the exponential probability densities. Find \(P[X+Y \leq 6]\) and interpret the result.

(Strang and Herman 2016, chap. 5.2) - Question 119

We’ll again set up the joint density function: \[ \begin{align*} f(x, y) &= f_1(x)f_2(y) \\ &= \left[\begin{cases} \frac{1}{3} e^{-\frac{x}{3}} \quad &x \geq 0 \\ 0 \quad &x < 0 \end{cases} \right] \times \left[ \begin{cases} \frac{1}{5} e^{-\frac{y}{5}} \quad &y \geq 0 \\ 0 \quad &y < 0 \end{cases} \right] \\ &= \begin{cases} \frac{1}{15} e^{-\frac{x}{3}}e^{-\frac{y}{5}} \quad &x,y \geq 0 \\ 0 \quad &x < 0 \,\text{or}\, y < 0 \end{cases} \end{align*} \] and the region of integration appropriate for answering the question:
\[ D = \{(x,y) \,|\, 0 \leq x \leq6, 0 \leq y \leq 6-x \} \] This leaves us with the integral: \[ \begin{align*} \frac{1}{15} \int_0^{6} \int_0^{6-x} e^{-\frac{x}{3}-\frac{y}{5}} dydx \end{align*} \]

However, rather than solve this specifically, lets take the opportunity to derive a more general solution that can give us the target probability for any question phrased like this.

We’ll replace all the specific values:

  • 3, which is the average amount of minutes in the drive-thru, with \(\beta_1\),
  • 5, which is the average amount of minutes spent ordering, with \(\beta_2\), and
  • 6, which is the maximum number of minutes for the whole experience, with \(c\).

That gives us: \[ \begin{align*} &= \frac{1}{\beta_1\beta_2} \int_0^{c} \int_0^{c-x} e^{-\frac{x}{\beta_1}-\frac{y}{\beta_2}} dydx \end{align*} \]

Solving by hand

Let’s walk through one more painful double integral by-hand. We’ll start with \(u\) substitution:
\[ \begin{align*} u &= -\frac{x}{\beta_1}-\frac{y}{\beta_2} \\ \frac{du}{dy} &= -\frac{1}{\beta_2} \rightarrow dy=-\beta_2du\\ \end{align*} \]

And plug those in to solve the first definite integral:

\[ \begin{align*} &= \frac{-1}{\beta_1} \int_0^{c} \int_0^{c-x} e^{u} dudx \\ &= \frac{-1}{\beta_1}\int_0^{c} e^{-\frac{x}{\beta_1}-\frac{y}{\beta_2}} \Big|_0^{c-x} dx \\ &= \frac{-1}{\beta_1}\int_0^{c} e^{-\frac{x}{\beta_1}-\frac{c-x}{\beta_2}} - e^{-\frac{x}{\beta_1}-\frac{0}{\beta_2}} dx \\ &= \frac{-1}{\beta_1}\int_0^{c} e^{-\frac{\beta_2x}{\beta_1\beta_2}-\frac{\beta_1(c-x)}{\beta_1\beta_2}} - e^{-\frac{x}{\beta_1}}dx \\ &= \frac{-1}{\beta_1}\int_0^{c} e^{\frac{-\beta_2x - \beta_1c+\beta_1x}{\beta_1\beta_2}} - e^{-\frac{x}{\beta_1}}dx \\ &= \frac{-1}{\beta_1} \left[ \int_0^{c} e^{\frac{(\beta_1-\beta_2)x - \beta_1c}{\beta_1\beta_2}}dx - \int_0^{c} e^{-\frac{x}{\beta_1}}dx \right] \\ \end{align*} \] Now we’ll use \(u\) and \(v\) substitution to solve the last definite integral:

\[ \begin{align*} u &= \frac{(\beta_1-\beta_2)x - \beta_1c}{\beta_1\beta_2} \\ \frac{du}{dx} &= \frac{(\beta_1-\beta_2)}{\beta_1\beta_2} \rightarrow dx=\frac{\beta_1\beta_2}{(\beta_1-\beta_2)}du\\ v &= -\frac{x}{\beta_1} \\ \frac{dv}{dx} &= -\frac{1}{\beta_1} \rightarrow dx=-\beta_1dv\\ \end{align*} \]

Plugging those in, we arrive at a general function of \(\beta_1\), \(\beta_2\), and \(c\).

\[ \begin{align*} &= \frac{-1}{\beta_1} \left[ \int_0^{c} e^{u}\frac{\beta_1\beta_2}{(\beta_1-\beta_2)}du - \int_0^{c} e^{v}(-\beta_1dv) \right] \\ &= \frac{-1}{\beta_1} \left[ \frac{\beta_1\beta_2}{(\beta_1-\beta_2)}\int_0^{c} e^{u}du - (-\beta_1)\int_0^{c} e^{v}dv \right] \\ &= \frac{-1}{\beta_1} \left[ \frac{\beta_1\beta_2}{(\beta_1-\beta_2)} (e^{\frac{(\beta_1-\beta_2)x - \beta_1c}{\beta_1\beta_2}}\Big|_0^{c}) + \beta_1 (e^{-\frac{x}{\beta_1}}\Big|_0^{c}) \right] \\ &=\left[ \frac{-\beta_2}{(\beta_1-\beta_2)} (e^{\frac{(\beta_1-\beta_2)c - \beta_1c}{\beta_1\beta_2}} - e^{\frac{-\beta_1c}{\beta_1\beta_2}}) - (e^{-\frac{c}{\beta_1}} - e^{-\frac{0}{\beta_1}}) \right] \\ &= \frac{-\beta_2}{(\beta_1-\beta_2)} \left[e^{\frac{(\beta_1-\beta_2)c - \beta_1c}{\beta_1\beta_2}} - e^{\frac{-\beta_1c}{\beta_1\beta_2}}\right] - e^{-\frac{c}{\beta_1}} + 1 \\ f(\beta_1, \beta_2, c)&= \frac{-\beta_2}{(\beta_1-\beta_2)} \left[e^{\frac{-c}{\beta_1}} - e^{\frac{-c}{\beta_2}}\right] - e^{-\frac{c}{\beta_1}} + 1 \\ \end{align*} \]

Now we’ll code that equation up in R or the calculator of your choice.

cdf_wait <- function(beta1, beta2, c) {
  (-beta2/(beta1-beta2)) * 
  (exp((-c)/beta1) - exp((-c)/beta2)) -
  exp((-c)/beta1) + 1 
}

Going back to our question – let’s see what’s the probability that the drive-thru experience takes 6 minutes or less if waiting to order takes 3 minutes on average and ordering takes 5 minutes on average. We can very easily calculate it with the flexible function \(f(\beta_1, \beta_2, c)\) that we just derived:

cdf_wait(3, 5, 6)
[1] 0.4500174

There’s a ~45% chance that a customer will spend 6 minutes or less in the drive-thru. We can also use our function, \(f(\beta_1, \beta_2, c)\), to easily compute and visualize the probability for any total drive-thru time given these two average times.

Code
plot_ly(
  x = x_vec,
  y = cdf_wait(3, 5, x_vec),
  type = "scatter",
  mode="lines+markers",
  name = "CDF: F(a)")

And, going back to the first multiple integration question about Sydney’s restaurant Equation 2, we can now answer that with the same function.

cdf_wait(15, 40, 90)
[1] 0.8328485

We again can find the cumulative probability for any waiting time given the average time to get a table (15) and eat (40).

Code
plot_ly(
  x = seq(0, 150),
  y = cdf_wait(15, 40, seq(0, 150)),
  type = "scatter",
  mode="lines+markers",
  name = "CDF: F(a)")

Solving with sympy

For the sake of completeness, we’ll derive that same generizable equation using sympy. Returning to the problem,

\[ \begin{align*} P(X + Y \leq x) &= \frac{1}{\beta_1\beta_2} \int_0^{c} \int_0^{c-x} e^{-\frac{x}{\beta_1}-\frac{y}{\beta_2}} dydx \end{align*} \]

we’ll define the constants and function in sympy.

python
x, y = sp.symbols('x y')
beta1, beta2, c = sp.symbols('beta1 beta2 c')
f_x_y_general = e**(-x/beta1 - y/beta2)
render(f_x_y_general)

\[\begin{equation}e^{- \frac{y}{\beta_{2}} - \frac{x}{\beta_{1}}}\end{equation}\]

We compute the double integral, and define our generalizable function as f_b1_b2_c.

python
f_b1_b2_c = ( 1/(beta1 * beta2) ) * (
  f_x_y_general
  .integrate( (y, 0, c - x) )
  .integrate( (x, 0, c) )
)
render(f_b1_b2_c.simplify())

\[\begin{equation}\begin{cases} \frac{\beta_{1} - \beta_{1} e^{- \frac{c}{\beta_{1}}} - \beta_{2} + \beta_{2} e^{- \frac{c}{\beta_{2}}}}{\beta_{1} - \beta_{2}} & \text{for}\: \beta_{1} \neq \beta_{2} \\1 - e^{- \frac{c}{\beta_{1}}} - \frac{c}{\beta_{1}} & \text{otherwise} \end{cases}\end{equation}\]

Note that for more complex equations like this, sympy’s simplified output might not look like what you’d reach by hand. We can be more confident that this is indeed the same \(f(\beta_1, \beta_2, c)\) as the one we computed by hand above by testing it with a few problems and confirming the expected output.

Returning to the drive-thru experience, where waiting to order takes 3 minutes on average and ordering takes 5, we can compute the probability that the whole experience takes 6 minutes or less as follows:

python
f_b1_b2_c.subs({beta1:  3, beta2: 5, c:6}).evalf()
0.450017395074414

This is the same as what we get using the by-hand function.

cdf_wait(3, 5, 6)
[1] 0.4500174

In the case of the first problem, with a restaurant where ordering takes 40 minutes on average and eating takes 15, we can compute the probability that the whole experience takes 90 minutes or less just as easily:

python
f_b1_b2_c.subs({beta1:  40, beta2: 15, c:90}).evalf()
0.832848492007017

Again, let’s check this with the by-hand version.

cdf_wait(15, 40, 90)
[1] 0.8328485

Great!

References

“Cumulative Distribution Function.” 2024. Wikipedia. https://en.wikipedia.org/w/index.php?title=Cumulative_distribution_function&oldid=1215839028.
Strang, Gilbert, and Edwin Herman. 2016. Calculus Volume 3. OpenStax.

Footnotes

  1. This is not strictly necessary – I do it purely so that I can reference e in my code and match the problem notation↩︎

Citation

BibTeX citation:
@online{amerkhanian2024,
  author = {Amerkhanian, Peter},
  title = {Integrals in {Probability}},
  date = {2024-04-14},
  url = {https://peter-amerkhanian.com/posts/multiple-integration/},
  langid = {en}
}
For attribution, please cite this work as:
Amerkhanian, Peter. 2024. “Integrals in Probability.” April 14, 2024. https://peter-amerkhanian.com/posts/multiple-integration/.