Simulated estimates:

| \(t\) | \(\mathbb{P}{\left[W_{\mathcal{T}} \in [t]\right]}\) | \(\mathbb{E}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]}\) | \(\mathrm{Var}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]}\) |
|---|---|---|

\(\mathbb{E}{\left[L_{\mathcal{T}}\right]} \approx\) -

\(\mathrm{Var}{\left[L_{\mathcal{T}}\right]} \approx\) -
Exact values:

| \(t\) | \(\mathbb{P}{\left[W_{\mathcal{T}} \in [t]\right]}\) | \(\mathbb{E}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]}\) | \(\mathrm{Var}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]}\) |
|---|---|---|

\(\mathbb{E}{\left[L_{\mathcal{T}}\right]} = \) -

\(\mathrm{Var}{\left[L_{\mathcal{T}}\right]} = \) -
We first generate \(n\) samples of \(W_{\mathcal{T}}\), which we collect in a multiset \(W\): \[W = \left\{W_1, W_2, \dots, W_n\right\}.\] We then group these samples by the terminator \(t \in \mathcal{T}\) that they end with. This gives us several multi-subsets, which we call \(S_t\): \[S_t = \left\{ W_i \mid W_i \in [t] \right\}.\]
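To make the sampling step concrete, here is a minimal Python sketch of the procedure just described; the names (`sample_word`, `sample_grouped`, `S`) and the fair-coin example at the end are our own illustrative choices rather than anything fixed by the workshop notes.

```python
# A minimal sketch of the sampling step: draw letters uniformly from the alphabet
# until the word ends with some terminator, then group the samples by terminator.
import random
from collections import defaultdict

def sample_word(alphabet, terminators, rng=random):
    """Return one sample of W_T together with the terminator it ends with."""
    word = ""
    while True:
        word += rng.choice(alphabet)
        for t in terminators:
            if word.endswith(t):
                return word, t

def sample_grouped(alphabet, terminators, n, rng=random):
    """Generate n samples of W_T and collect them into the multisets S_t."""
    S = defaultdict(list)
    for _ in range(n):
        word, t = sample_word(alphabet, terminators, rng)
        S[t].append(word)
    return S

# Example: fair-coin alphabet {H, T} with terminators "HH" and "TH".
S = sample_grouped("HT", ["HH", "TH"], n=10_000)
```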
The unbiased point estimator \(\hat p\) for the population proportion \(p\) is given by \(\hat p = x/n\), where \(x\) is the number of successes in the sample and \(n\) is the sample size. In our case, \(x\) is simply \(\lvert S_t \rvert\), the number of words that terminate with \(t\), so \[\mathbb{P}{\left[W_{\mathcal{T}} \in [t]\right]} \approx \frac{\lvert S_t \rvert}{n}.\]
The unbiased point estimate for the population mean is simply the sample mean. We hence have \[\mathbb{E}{\left[L_{\mathcal{T}}\right]} \approx \frac{1}{n} \sum_{w \in W} \lvert w \rvert.\] The formula for conditional expectations follows similarly: \[\mathbb{E}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]} \approx \frac{1}{\lvert S_t \rvert} \sum_{w \in S_t} \lvert w \rvert.\]
Using the unbiased point estimate formula for the population variance, we have \[\mathrm{Var}{\left[L_{\mathcal{T}}\right]} \approx \frac{1}{n-1} \sum_{w \in W} \left(\lvert w \rvert - \mathbb{E}{\left[L_{\mathcal{T}}\right]}\right)^2.\] Similarly, restricting to the sub-multiset \(S_t\), \[\mathrm{Var}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]} \approx \frac{1}{\lvert S_t \rvert - 1} \sum_{w \in S_t} \left(\lvert w \rvert - \mathbb{E}{\left[L_{\mathcal{T}} \mid W_{\mathcal T} \in [t]\right]}\right)^2.\]
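The sketch below shows these point estimates in code, continuing with the hypothetical `S` from the sampling sketch above; Python's `statistics` module already uses the unbiased divide-by-(size \(-\) 1) sample variance, matching the formulas.

```python
# Point estimates from the grouped samples S produced by the sampling sketch above.
from statistics import mean, variance

def estimates(S):
    """Per terminator t: (estimate of P[W_T in [t]], E[L_T | .], Var[L_T | .])."""
    n = sum(len(words) for words in S.values())
    out = {}
    for t, words in S.items():
        lengths = [len(w) for w in words]
        out[t] = (len(words) / n, mean(lengths), variance(lengths))
    return out

all_lengths = [len(w) for words in S.values() for w in words]
print("E[L_T] ~", mean(all_lengths), "  Var[L_T] ~", variance(all_lengths))
print(estimates(S))
```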
Let \(\mathcal{T} = \left\{t_1, t_2, \dots, t_n\right\}\) be our terminators and \(\mathcal{A}\) be our alphabet.
As defined in the workshop notes (Definition 21), the correlation matrix \(\mathbf M_{\mathcal T} = (m)_{i,j}\) has entries given by \[(m)_{i,j} = (t_j : t_i)_{\mathcal A} = \sum_{k = 1}^{\min \left\{\lvert t_i \rvert, \lvert t_j \rvert \right\}} \lvert \mathcal{A}\rvert^k \, \mathbf{1}{\left\{R_k(t_j) = L_k(t_i)\right\}},\] which we can easily compute.
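As a quick sketch (the helper name `correlation_matrix` and the fair-coin example are ours, not from the notes), the entries can be computed directly from the definition:

```python
# Correlation matrix M_T from Definition 21:
# m_{i,j} = (t_j : t_i)_A = sum over k of |A|^k * 1{R_k(t_j) = L_k(t_i)},
# where R_k takes the last k letters and L_k the first k letters.
import numpy as np

def correlation_matrix(terminators, alphabet_size):
    n = len(terminators)
    M = np.zeros((n, n))
    for i, ti in enumerate(terminators):
        for j, tj in enumerate(terminators):
            M[i, j] = sum(
                alphabet_size ** k
                for k in range(1, min(len(ti), len(tj)) + 1)
                if tj[-k:] == ti[:k]
            )
    return M

# Example: fair coin (|A| = 2) with terminators "HH" and "TH":
# correlation_matrix(["HH", "TH"], 2) -> [[6., 2.], [0., 4.]]
```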
As derived in the workshop notes (Theorem 26), the probability vector \(\mathbf p_{\mathcal T}\) has formula \[\mathbf{p}_{\mathcal T} = \frac{1}{\mathbf{1}^{\mathsf{T}} \mathbf{M}_{\mathcal T}^{-1} \mathbf{1}} \mathbf{M}_{\mathcal T}^{-1} \mathbf{1},\] where \(\mathbf{1}\) is the \(n \times 1\) vector that is all ones.
As derived in the workshop notes, the expected length is given by \[\mathbb{E}{\left[L_{\mathcal{T}}\right]} = \frac{1}{\mathbf{1}^{\mathsf{T}} \mathbf{M}_{\mathcal T}^{-1} \mathbf{1}}.\]
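Both formulas are easy to evaluate numerically; the sketch below (reusing the hypothetical `correlation_matrix` helper from above) computes \(\mathbf p_{\mathcal T}\) and \(\mathbb{E}{\left[L_{\mathcal{T}}\right]}\) with numpy.

```python
# p_T = M^{-1} 1 / (1^T M^{-1} 1)  and  E[L_T] = 1 / (1^T M^{-1} 1),
# using numpy.linalg.solve instead of an explicit inverse.
import numpy as np

def exact_probabilities_and_mean(terminators, alphabet_size):
    M = correlation_matrix(terminators, alphabet_size)   # helper from the sketch above
    ones = np.ones(len(terminators))
    Minv_ones = np.linalg.solve(M, ones)                 # M^{-1} 1
    expected_length = 1.0 / (ones @ Minv_ones)           # E[L_T]
    return expected_length * Minv_ones, expected_length  # (p_T, E[L_T])

# Fair-coin example with terminators "HH" and "TH": p_T = [0.25, 0.75], E[L_T] = 3.0.
p, EL = exact_probabilities_and_mean(["HH", "TH"], 2)
```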
Unfortunately, we were unable to find a closed form for the conditional expectation \(\mathbb{E}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]}\) using a martingale argument. However, not all hope is lost. Theorem 2.1 of Guibas and Odlyzko's 1981 paper gives us a viable (albeit clumsy) approach towards computing conditional expectations.
First, define \(f(m)\) to be the number of words of length \(m\) that do not contain any terminators. Define also \(f_i(m)\) to be the number of words of length \(m\) that terminate with \(t_i\), that is, whose first terminator occurrence is \(t_i\), ending exactly at position \(m\). Now, define \(F(z)\) and \(F_i(z)\) to be the generating functions of \(f(m)\) and \(f_i(m)\) respectively: \[F(z) = \sum_{m = 0}^\infty f(m) z^{-m}, \quad F_i(z) = \sum_{m = 0}^\infty f_i(m) z^{-m}.\] Then Theorem 2.1 of the aforementioned paper states that \(F(z)\) and \(F_i(z)\) obey the following system of equations: \[\begin{aligned}(z-\lvert \mathcal{A} \rvert) \, F(z) + z F_1(z) + zF_2(z) + \dots + zF_n(z) &= z \\ -F(z) + (t_1:t_1)_z F_1(z) + (t_2:t_1)_z F_2(z) + \dots + (t_n:t_1)_z F_n(z) &= 0 \\ -F(z) + (t_1:t_2)_z F_1(z) + (t_2:t_2)_z F_2(z) + \dots + (t_n:t_2)_z F_n(z) &= 0 \\ \vdots \\ -F(z) + (t_1:t_n)_z F_1(z) + (t_2:t_n)_z F_2(z) + \dots + (t_n:t_n)_z F_n(z) &= 0 \end{aligned}\]

This can be rewritten as the matrix equation \[ \begin{pmatrix}z - \lvert \mathcal{A} \rvert & z & z & \cdots & z \\ -1 & (t_1:t_1)_z & (t_2:t_1)_z & \cdots & (t_n:t_1)_z \\ -1 & (t_1:t_2)_z & (t_2:t_2)_z & \cdots & (t_n:t_2)_z \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & (t_1:t_n)_z & (t_2:t_n)_z & \cdots & (t_n:t_n)_z \end{pmatrix} \begin{pmatrix}F(z) \\ F_1(z) \\ F_2(z) \\ \vdots \\ F_n(z) \end{pmatrix} = \begin{pmatrix} z \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \] Let the matrix on the left be \(\mathbf A \). Rearranging, we get \[\begin{pmatrix}F(z) \\ F_1(z) \\ F_2(z) \\ \vdots \\ F_n(z) \end{pmatrix} = \mathbf A^{-1} \begin{pmatrix} z \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \] Let \(\mathbf C = (c)_{i,j}\) be the cofactor matrix associated with \(\mathbf A\). From the above equation, one obtains \[\begin{pmatrix}F(z) \\ F_1(z) \\ F_2(z) \\ \vdots \\ F_n(z) \end{pmatrix} = \frac{1}{\text{det}{(\mathbf{A})}} \mathbf C^{\mathsf T} \begin{pmatrix} z \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \frac{1}{\text{det}{(\mathbf{A})}} \begin{pmatrix}z c_{1,1} \\ z c_{1,2} \\ z c_{1, 3} \\ \vdots \\ z c_{1, n+1} \end{pmatrix},\] which gives \[F_i(z) = \frac{z c_{1, i+1}}{\text{det}{(\mathbf{A})}}.\]

We now link \(F_i(z)\) to the conditional expectation. Using the definition of expectation and invoking the formula for conditional probability, one has \[\mathbb{E}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t_i]\right]} = \sum_{m = 0}^\infty m \frac{\mathbb{P}{\left[L_{\mathcal T} = m \land W_{\mathcal T} \in [t_i] \right]}}{\mathbb{P}{\left[W_{\mathcal T} \in [t_i] \right]}}.\] Observe that the numerator is the proportion of words of length \(m\) that end in \(t_i\) (with no earlier terminator occurrence). Since there are \(f_i(m)\) such words, along with \(\lvert \mathcal{A} \rvert^m\) total words of length \(m\), the numerator is simply \(f_i(m) \, \lvert \mathcal{A} \rvert^{-m}\). Thus, \[\mathbb{E}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t_i]\right]} = \frac1{\mathbb{P}{\left[W_{\mathcal T} \in [t_i] \right]}} \sum_{m = 0}^\infty m f_i(m) \lvert \mathcal{A} \rvert^{-m}.\] Now notice that the RHS looks like the derivative of \(F_i(z)\) evaluated at \(\lvert \mathcal{A} \rvert\).
Indeed, differentiating term by term gives \(F_i'(z) = -\sum_{m = 0}^\infty m f_i(m) z^{-m-1}\), so \[\sum_{m = 0}^\infty m f_i(m) \lvert \mathcal{A} \rvert^{-m} = -\lvert \mathcal{A} \rvert F_i'{\left(\lvert \mathcal{A} \rvert\right)}.\] Putting everything together, once we compute the cofactor \(c_{1, i+1}\), we can easily compute the conditional expectation with the following formula: \[\mathbb{E}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t_i]\right]} = \frac{-\lvert \mathcal{A} \rvert}{\mathbb{P}{\left[W_{\mathcal T} \in [t_i] \right]}} \left[\frac{\mathrm{d}}{\mathrm{d} z} \frac{z c_{1, i+1}}{\text{det}{(\mathbf{A})}} \right]_{z = \lvert \mathcal{A} \rvert}.\] Not as elegant as our martingale approach, but it works.
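In practice it is easier to let a computer algebra system do the bookkeeping. The sketch below (using sympy; the helper names and the fair-coin example are ours) builds \(\mathbf A\), solves the linear system for the \(F_i(z)\) directly rather than through cofactors, and then evaluates the formula above; the probabilities \(\mathbb{P}{\left[W_{\mathcal T} \in [t_i]\right]}\) are taken from the probability-vector computation earlier.

```python
# Symbolic version of the computation above: build A(z), solve for the F_i(z),
# then evaluate E[L_T | W_T in [t_i]] = -|A| * F_i'(|A|) / P[W_T in [t_i]].
import sympy as sp

z = sp.symbols("z")

def corr(u, v):
    """(u : v)_z = sum over k of z^k * 1{R_k(u) = L_k(v)} (last k of u vs first k of v)."""
    return sum(z**k for k in range(1, min(len(u), len(v)) + 1) if u[-k:] == v[:k])

def generating_functions(terminators, q):
    """Solve the linear system for (F(z), F_1(z), ..., F_n(z)); return the F_i(z)."""
    n = len(terminators)
    A = sp.Matrix([[z - q] + [z] * n] +
                  [[-1] + [corr(tj, ti) for tj in terminators] for ti in terminators])
    sol = A.solve(sp.Matrix([z] + [0] * n))
    return [sp.simplify(sol[i + 1]) for i in range(n)]

def cond_expectation(Fi, q, p_i):
    """E[L_T | W_T in [t_i]] = -|A| F_i'(|A|) / P[W_T in [t_i]]."""
    return sp.simplify(-q * sp.diff(Fi, z).subs(z, q) / p_i)

# Fair-coin example with terminators "HH" and "TH", using p_T = (1/4, 3/4)
# from the probability-vector sketch earlier.
F1, F2 = generating_functions(["HH", "TH"], 2)
print(cond_expectation(F1, 2, sp.Rational(1, 4)),   # 2
      cond_expectation(F2, 2, sp.Rational(3, 4)))   # 10/3
```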
We begin by discussing our calculation of the conditional variance \(\mathrm{Var}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t]\right]}\). Recall that \(\mathrm{Var}\left[X\right] = \mathbb{E}\left[X^2\right] - \mathbb{E}\left[X\right]^2\). Using an argument almost identical to the one above (this time matching \(\sum_{m = 0}^\infty m^2 f_i(m) \lvert \mathcal{A} \rvert^{-m}\) with \(z^2 F_i''(z) + z F_i'(z)\) at \(z = \lvert \mathcal{A} \rvert\)), one can show that \[\mathbb{E}{\left[L_{\mathcal{T}}^2\mid W_{\mathcal{T}} \in [t_i]\right]} = \frac{\lvert \mathcal{A} \rvert^2 F_i''{\left(\lvert \mathcal{A} \rvert\right)} + \lvert \mathcal{A} \rvert F_i'{\left(\lvert \mathcal{A} \rvert\right)}}{\mathbb{P}{\left[W_{\mathcal T} \in [t_i] \right]}}.\] Hence, subtracting the square of the conditional expectation found earlier, \[\mathrm{Var}{\left[L_{\mathcal{T}}\mid W_{\mathcal{T}} \in [t_i]\right]} = \frac{\lvert \mathcal{A} \rvert^2 F_i''{\left(\lvert \mathcal{A} \rvert\right)} + \lvert \mathcal{A} \rvert F_i'{\left(\lvert \mathcal{A} \rvert\right)}}{\mathbb{P}{\left[W_{\mathcal T} \in [t_i] \right]}} - \frac{\lvert \mathcal{A} \rvert^2 \left[F_i'{\left(\lvert \mathcal{A} \rvert\right)}\right]^2}{\mathbb{P}{\left[W_{\mathcal T} \in [t_i] \right]}^2}.\]
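Continuing the sympy sketch above (again with names of our own choosing), the conditional variance needs only the second derivative of the same \(F_i(z)\):

```python
# Var[L_T | W_T in [t_i]] from F_i(z), reusing z and the F_i computed above.
def cond_variance(Fi, q, p_i):
    d1 = sp.diff(Fi, z).subs(z, q)        # F_i'(|A|)
    d2 = sp.diff(Fi, z, 2).subs(z, q)     # F_i''(|A|)
    second_moment = (q**2 * d2 + q * d1) / p_i   # E[L_T^2 | W_T in [t_i]]
    first_moment = -q * d1 / p_i                 # E[L_T   | W_T in [t_i]]
    return sp.simplify(second_moment - first_moment**2)

# Same fair-coin example: Var[L_T | ends with HH] = 0, Var[L_T | ends with TH] = 20/9.
print(cond_variance(F1, 2, sp.Rational(1, 4)), cond_variance(F2, 2, sp.Rational(3, 4)))
```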
We now discuss the total variance \(\mathrm{Var}{\left[L_{\mathcal{T}}\right]}\). For simplicity, let \(W_i\) be the event \(W_{\mathcal T} \in [t_i]\). By the law of total variance, one has \[\mathrm{Var}{\left[L_{\mathcal T}\right]} = \mathbb{E}{\left[\mathrm{Var}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}\right]} + \mathrm{Var}{\left[\mathbb{E}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}\right]}.\] By the law of total expectation, the first term becomes \[\mathbb{E}{\left[\mathrm{Var}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}\right]} = \sum_{i = 1}^n \mathbb{P}{\left[W_i\right]} \mathrm{Var}{\left[L_{\mathcal T} \mid W_i\right]},\] which we can calculate from previous parts. Meanwhile, the second term becomes \[\mathrm{Var}{\left[\mathbb{E}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}\right]} = \mathbb{E}{\left[\mathbb{E}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}^2\right]} - \mathbb{E}{\left[\mathbb{E}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}\right]}^2.\] Applying the law of total expectation to each term on the right (and noting that \(\mathbb{E}{\left[\mathbb{E}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}\right]} = \mathbb{E}{\left[L_{\mathcal T}\right]}\)), this becomes \[\mathrm{Var}{\left[\mathbb{E}{\left(L_{\mathcal T} \mid W_{\mathcal{T}}\right)}\right]} = \sum_{i = 1}^n \mathbb{P}{\left[W_i\right]} \mathbb{E}{\left(L_{\mathcal T} \mid W_i\right)}^2 - \mathbb{E}{\left[L_{\mathcal T}\right]}^2,\] which we also know how to calculate from previous parts. By computing the two terms and adding them, we can find \(\mathrm{Var}{\left[L_{\mathcal{T}}\right]}\)!
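As a final sketch (plain Python; the fair-coin numbers from the earlier examples are used purely as an illustration), the assembly is only a few lines:

```python
# Law of total variance: Var[L_T] = sum_i P[W_i] Var[L_T | W_i]
#                                   + sum_i P[W_i] E[L_T | W_i]^2 - E[L_T]^2.
from fractions import Fraction as Fr

def total_variance(probs, cond_means, cond_vars):
    mean = sum(p * m for p, m in zip(probs, cond_means))                  # E[L_T]
    within = sum(p * v for p, v in zip(probs, cond_vars))                 # E[Var(L_T | W_T)]
    between = sum(p * m**2 for p, m in zip(probs, cond_means)) - mean**2  # Var(E(L_T | W_T))
    return within + between

# Fair-coin example with terminators "HH" and "TH" (values from the sketches above):
print(total_variance([Fr(1, 4), Fr(3, 4)],        # P[W_i]
                     [Fr(2), Fr(10, 3)],          # E[L_T | W_i]
                     [Fr(0), Fr(20, 9)]))         # Var[L_T | W_i]  -> prints 2
```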