What's the smallest of n Rice-distributed variables?

2023-12-26 (feast of Stephen)

The Rice Distribution

Take a bunch of points distributed as in the figure* (*1000 points; the mean is at the big red dot). They're drawn from a 2D Gaussian with mean \((2,2)\) and variance 1; in general, we could have some mean \(\mathbf μ\) and variance \(σ^2\) in each component. The magnitudes of the points are drawn from the Rice distribution \[\begin{aligned} p(r) &= \int_{-\infty}^\infty dx\,dy\; \frac 1 {2πσ^2} \exp\left[- \frac{(x - μ_1)^2 + (y - μ_2)^2}{2σ^2}\right]\ δ\left(\sqrt{x^2 + y^2} - r\right) \\ &= \frac r {σ^2} \exp\left[ - \frac {r^2 + μ^2}{2σ^2}\right] I_0\left(\frac{rμ}{σ^2}\right) \end{aligned} \] where \(μ = |\mathbf μ|\) and \(I_0\) is a modified Bessel function of the first kind. (True confessions: I haven't actually done the integral; I'm just copying Wikipedia, so I'm setting myself up for embarrassment.)
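For anyone who wants to check the Wikipedia result, here is a sketch of the integral in polar coordinates \((x, y) = (ρ\cos θ, ρ\sin θ)\). Axes are chosen so that \(\mathbf μ = (μ, 0)\) with \(μ = |\mathbf μ|\); by rotational symmetry the answer can only depend on \(|\mathbf μ|\).

\[ \begin{aligned} p(r) &= \int_0^\infty ρ\,dρ \int_0^{2π} dθ\; \frac 1 {2πσ^2} \exp\left[-\frac{ρ^2 - 2ρμ\cos θ + μ^2}{2σ^2}\right] δ(ρ - r) \\ &= \frac r {2πσ^2} \exp\left[-\frac{r^2 + μ^2}{2σ^2}\right] \int_0^{2π} dθ\; \exp\left[\frac{rμ}{σ^2}\cos θ\right] \\ &= \frac r {σ^2} \exp\left[-\frac{r^2 + μ^2}{2σ^2}\right] I_0\left(\frac{rμ}{σ^2}\right)\;. \end{aligned} \]

The last step uses the integral representation \(\int_0^{2π} e^{z\cos θ}\,dθ = 2π I_0(z)\). The factor of \(r\) out front is the integration measure \(ρ\,dρ\) evaluated at \(ρ = r\).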

The small-\(r\) behavior \[ p(r) \propto r \] comes from the integration measure: for small enough \(r\), the Gaussian is roughly constant, so all we see is the circumference of the circle at radius \(r\), which is \(\propto r\). To get the behavior more precisely, use \[ \exp\left[ - \frac {r^2 + μ^2}{2σ^2}\right] = \exp\left[-\frac{μ^2}{2σ^2} \right]\left(1 - \frac{r^2}{2σ^2} + O(r^4/σ^4)\right)\;,\] which is familiar enough, and \[ \begin{aligned} I_0\left(\frac {rμ}{σ^2} \right) &= \sum_{m = 0}^\infty \frac 1 {(m!)^2} \left(\frac {rμ} {2σ^2} \right)^{2m} \\ &= 1 + \frac {r^2μ^2}{4 σ^4} + O(r^4 μ^4/σ^8) \end{aligned} \] to get \[ p(r) = r \cdot \frac 1 {σ^2}\exp\left[-\frac{μ^2}{2σ^2} \right] + O\left(r^3 \cdot \text{[various combinations of $σ,μ$]}\right) \;.\]
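Here is a quick numerical check of this expansion, as a sketch in plain Python. It uses the standard library only, so \(I_0\) is summed from the series above rather than taken from a special-function library. The parameters \(σ = 1\) and \(|\mathbf μ| = 2\sqrt 2\) match the figure. The helper names (`bessel_i0`, `rice_pdf`, `small_r_pdf`) are made up for this note. The `next_order` option multiplies in the correction factor \(1 + r^2\left(\frac{μ^2}{4σ^4} - \frac{1}{2σ^2}\right)\), which comes from multiplying the two expansions together.

```python
import math


def bessel_i0(z, terms=100):
    """Modified Bessel function I_0(z), summed from its power series."""
    q = (z / 2.0) ** 2
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= q / (m + 1) ** 2  # ratio of consecutive terms (z/2)^(2m) / (m!)^2
    return total


def rice_pdf(r, mu, sigma):
    """Rice density p(r); mu = |mean vector|, sigma = per-component std deviation."""
    s2 = sigma ** 2
    return (r / s2) * math.exp(-(r ** 2 + mu ** 2) / (2 * s2)) * bessel_i0(r * mu / s2)


def small_r_pdf(r, mu, sigma, next_order=False):
    """Small-r form C*r, optionally times the O(r^2) correction factor."""
    s2 = sigma ** 2
    C = math.exp(-mu ** 2 / (2 * s2)) / s2
    factor = 1.0
    if next_order:
        factor += r ** 2 * (mu ** 2 / (4 * s2 ** 2) - 1 / (2 * s2))
    return C * r * factor


mu, sigma = math.hypot(2.0, 2.0), 1.0  # |(2, 2)| = 2*sqrt(2), as in the figure
for r in (0.05, 0.2, 0.5, 1.0):
    exact = rice_pdf(r, mu, sigma)
    lead = exact / small_r_pdf(r, mu, sigma)
    nxt = exact / small_r_pdf(r, mu, sigma, next_order=True)
    print(f"r={r}: exact/leading={lead:.4f}  exact/next-order={nxt:.4f}")
```

At small \(r\) both ratios sit near 1. By \(r \sim σ\), though, the leading-order form is off by more than a factor of 2 for these parameters.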

In principle we need the CDF, which is given by something called the Marcum Q-function: \(F(r;μ,σ) = 1 - Q_1(μ/σ, r/σ)\). I don't know how to evaluate that, though, so I'm going to use a combination of estimate-by-sampling and the expansion \[ F(r;μ,σ) = r^2 \cdot \frac 1 {2σ^2}\exp\left[-\frac{μ^2}{2σ^2} \right] + O\left(r^4 \cdot \text{[various combinations of $σ,μ$]}\right) \] which comes from integrating up the small-\(r\) behavior of \(p(r)\). Of course this is really \[ F(r;μ,σ) \approx \frac 1 2 r^2 \cdot C\;, \] with a constant of proportionality \[ C = \frac 1 {σ^2}\exp\left[-\frac{μ^2}{2σ^2} \right] \;:\] \(F\) is quadratic in \(r\), but with a prefactor that's exponentially small in \(μ^2/σ^2\). So if \(σ \lesssim μ\) we're going to see very little weight near the origin.
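Here is a sketch of the estimate-by-sampling side, plus a brute-force stand-in for the Marcum Q-function: integrate \(p(r)\) numerically with Simpson's rule. It is standard-library Python only. The helper names (`rice_cdf`, `sampled_cdf`) are hypothetical, and the parameters match the figure.

```python
import math
import random


def bessel_i0(z, terms=100):
    """Modified Bessel function I_0(z), summed from its power series."""
    q = (z / 2.0) ** 2
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= q / (m + 1) ** 2
    return total


def rice_pdf(r, mu, sigma):
    """Rice density p(r); mu = |mean vector|, sigma = per-component std deviation."""
    s2 = sigma ** 2
    return (r / s2) * math.exp(-(r ** 2 + mu ** 2) / (2 * s2)) * bessel_i0(r * mu / s2)


def rice_cdf(x, mu, sigma, steps=2000):
    """F(x) = integral of p(r) from 0 to x, by composite Simpson's rule (steps even)."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = rice_pdf(0.0, mu, sigma) + rice_pdf(x, mu, sigma)
    for k in range(1, steps):
        weight = 4 if k % 2 == 1 else 2
        total += weight * rice_pdf(k * h, mu, sigma)
    return total * h / 3


def sampled_cdf(x, mu_vec, sigma, samples=100_000, seed=0):
    """Monte Carlo estimate of F(x): fraction of 2D Gaussian draws with magnitude <= x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        r = math.hypot(rng.gauss(mu_vec[0], sigma), rng.gauss(mu_vec[1], sigma))
        hits += r <= x
    return hits / samples


mu_vec, sigma = (2.0, 2.0), 1.0
mu = math.hypot(*mu_vec)
C = math.exp(-mu ** 2 / (2 * sigma ** 2)) / sigma ** 2
for x in (0.25, 0.5, 1.0):
    print(f"x={x}: quadrature={rice_cdf(x, mu, sigma):.5f}  "
          f"sampled={sampled_cdf(x, mu_vec, sigma):.5f}  small-r={0.5 * C * x ** 2:.5f}")
```

If SciPy is available, `scipy.stats.rice(mu / sigma, scale=sigma).cdf(x)` should reproduce the quadrature column.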

Why care about extreme values of the Rice distribution? Valley splitting in silicon spin qubits

The Rice distribution comes up in certain behind-the-scenes properties of silicon spin qubits. (I've spent most of my time thinking about so-called Loss-DiVincenzo qubits in Si/SiGe, and that's what I'll be talking about here, but I assume valley space will also be important for other kinds of qubit. The recent-ish review by Burkard, Ladd, Nichol, Pan, and Petta offers a pretty good introduction; the somewhat older review of Zwanenburg et al. has a bunch of details about material properties etc., but I found it rather harder to read.) To build a silicon spin qubit, you first confine an electron in a thin (really thin: think 5-10 nm) layer of pure silicon, sandwiched between other semiconductors, with a bunch of supporting infrastructure on top. The thin silicon layer provides a big 2D arena for the electron to run around in; the supporting infrastructure confines it to a little area, say 100 nm \(\times\) 100 nm, of that arena. The electron's spin degree of freedom holds the qubit; you manipulate it with magnetic fields, microwaves, etc.

But the electron has another degree of freedom, the valley degree of freedom. The details aren't important here; what matters is that it's a two-level degree of freedom with a Hamiltonian like \[ H_{\text{valley}} = Δ τ^+ + \text{h.c.} \;,\] where \(τ^{x,y,z}\) are the Pauli matrices in valley space, \(τ^\pm = (τ^x \pm iτ^y)/2\), and \(Δ \in \mathbb C\). For symmetry reasons (long story) there's no \(τ^z\) term, so this lives in the xy plane; a quick calculation gives that the gap is \[ V = 2|Δ| \;.\] As long as the gap is big compared to the temperature, life is good: you're pretty much automatically in the valley ground state, so you don't have to worry about it. But if the gap is small compared to the temperature, your dot will go bad, because your electron has an appreciable probability of being in the valley excited state. (I'm intentionally not addressing why it's bad to be in a mixture of valley ground and valley excited states; that's pretty far afield of the topic of this note.) And in fact the valley gap displays a good deal of variability.
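Here is the quick calculation for the gap written out, as a sketch under the convention \(τ^\pm = (τ^x \pm iτ^y)/2\). Other normalizations of \(τ^\pm\) only rescale the gap by a constant, and a constant multiple of a Rice variable is still Rice-distributed.

\[ H_{\text{valley}} = Δ τ^+ + Δ^* τ^- = \begin{pmatrix} 0 & Δ \\ Δ^* & 0 \end{pmatrix}\;, \qquad H_{\text{valley}}^2 = |Δ|^2\, \mathbb 1\;. \]

So the two valley eigenstates sit at \(\pm|Δ|\), and the splitting between them is \(2|Δ|\). In any case the gap is proportional to \(|Δ|\), which is all the argument below needs.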

A very cool paper by Merritt Losert and collaborators offers a physical picture of where the variability in the valley gap comes from, and how to model it. They argue that the matrix element \(Δ\) is the sum of a deterministic contribution and a random contribution. The random contribution is complex and central-limiting (approximately Gaussian, by the central limit theorem). So the whole \(Δ\) follows a complex normal distribution centered on the deterministic contribution, and the valley gap, which is proportional to \(|Δ|\), follows the Rice distribution.
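Here is a small numerical illustration of that model, as a sketch. The deterministic part `delta_det` and the noise scale `sigma` are hypothetical numbers chosen for illustration, not values from Losert et al. The random part is modeled directly as a complex Gaussian rather than as a sum of many microscopic contributions. The check is one standard Rice-distribution fact: \(\langle |Δ|^2 \rangle = |Δ_{\text{det}}|^2 + 2σ^2\).

```python
import math
import random


def sample_delta(delta_det, sigma, rng):
    """Illustrative model: Delta = deterministic part + complex Gaussian (std sigma per component)."""
    return delta_det + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(1)
delta_det, sigma = 3 + 4j, 1.0  # hypothetical numbers, arbitrary energy units
samples = [abs(sample_delta(delta_det, sigma, rng)) for _ in range(100_000)]

# |Delta| is Rice-distributed with mu = |delta_det|; its second moment is mu^2 + 2 sigma^2.
mean_sq = sum(s * s for s in samples) / len(samples)
print(mean_sq, abs(delta_det) ** 2 + 2 * sigma ** 2)
```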

Now suppose you've got some largeish number of dots (say \(n = 100\)) on your chip. Presumably you want them all to work, so you care about the probability that none of them has a small valley gap. That's the probability that the smallest of \(n = 100\) Rice-distributed variables (the valley gaps of the dots) is above some threshold.

Smallest of \(n\) Rice-distributed variables*

*I'm running a pretty standard extreme value theory playbook here; I learned this stuff from de Haan and Ferreira, Extreme Value Theory: An Introduction. That's the thing people tend to cite, but I'm not delighted with the exposition. I just wish I knew something better.

Our probability distribution has a CDF \(F(r)\). The probability that one variable is greater than some \(x\) is of course \[ 1 - F(x) \;,\] so the probability that all \(n\) are above \(x\) is** \[ p_{\text{good}}(x) = (1 - F(x))^n\;.\] (**Figure: probability that all of \(n = 100\) Rice-distributed variables are above \(x\), for \(σ = 1, \mathbf μ = (2,2)\). Blue: many (\(N = 10^6\)) sample sets. Black: small-\(F\) approximation \(e^{-\frac 1 2 n C x^2}\).) It's easier to work in log space: \[ \begin{aligned} \ln p_{\text{good}}(x) &= n \ln (1 - F(x))\\ &\approx -nF(x)\;. \end{aligned} \] (The Taylor series is only good for \(F(x) \ll 1\), but that's the only regime we actually care about: failure is already almost certain once \(F(x) \gg 1/n\), and \(1/n \ll 1\).) In our case \(F(x) \approx \frac 1 2 Cx^2\), so we have \[ p_{\text{good}}(x) \approx e^{-\frac 1 2 nCx^2}\] with \( C = \frac 1 {σ^2}\exp\left[-\frac{μ^2}{2σ^2} \right]\). This is really a CDF, at least morally: it's one minus the CDF of the smallest of the \(n\) variables, i.e. the probability that all the \(n\) variables are above \(x\).
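Here is a sketch of the kind of Monte Carlo behind the blue curve. It uses far fewer trials than the \(N = 10^6\) in the figure, to keep plain Python fast. Each trial draws \(n = 100\) magnitudes, keeps the smallest, and the code then counts how often that minimum clears \(x\). The helper names are illustrative.

```python
import math
import random


def sample_minima(n, mu_vec, sigma, trials, seed=0):
    """For each trial, draw n Rice-distributed magnitudes and keep the smallest."""
    rng = random.Random(seed)
    return [
        min(math.hypot(rng.gauss(mu_vec[0], sigma), rng.gauss(mu_vec[1], sigma))
            for _ in range(n))
        for _ in range(trials)
    ]


def p_good_sampled(minima, x):
    """Fraction of trials in which all n variables exceed x."""
    return sum(m > x for m in minima) / len(minima)


n, mu_vec, sigma = 100, (2.0, 2.0), 1.0
mu = math.hypot(*mu_vec)
C = math.exp(-mu ** 2 / (2 * sigma ** 2)) / sigma ** 2
minima = sample_minima(n, mu_vec, sigma, trials=5_000)
for x in (0.25, 0.5, 1.0):
    print(f"x={x}: sampled={p_good_sampled(minima, x):.3f}  "
          f"exp approx={math.exp(-0.5 * n * C * x ** 2):.3f}")
```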

The figure** shows this small-\(F\) approximation, together with data from a bunch of trials. The small-\(F\) approximation is actually worse than I expected. We can do better with a first-order Taylor expansion of the exponential, clipped at zero as a sanity check (a probability can't be negative): \[ p_{\text{good}} \approx \max\left[0, 1 - \frac 1 2 n C x^2\right]\;.\]
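To compare the two approximations against something exact rather than sampled, here is a sketch that computes \((1 - F(x))^n\), with \(F\) from Simpson-rule quadrature of \(p(r)\). It uses standard-library Python, the same figure parameters, and made-up helper names.

```python
import math


def bessel_i0(z, terms=100):
    """Modified Bessel function I_0(z), summed from its power series."""
    q = (z / 2.0) ** 2
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= q / (m + 1) ** 2
    return total


def rice_pdf(r, mu, sigma):
    s2 = sigma ** 2
    return (r / s2) * math.exp(-(r ** 2 + mu ** 2) / (2 * s2)) * bessel_i0(r * mu / s2)


def rice_cdf(x, mu, sigma, steps=2000):
    """F(x) by composite Simpson's rule on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = rice_pdf(0.0, mu, sigma) + rice_pdf(x, mu, sigma)
    for k in range(1, steps):
        total += (4 if k % 2 == 1 else 2) * rice_pdf(k * h, mu, sigma)
    return total * h / 3


n, sigma = 100, 1.0
mu = math.hypot(2.0, 2.0)
C = math.exp(-mu ** 2 / (2 * sigma ** 2)) / sigma ** 2


def p_good_exact(x):
    """(1 - F(x))^n with F from quadrature."""
    return (1.0 - rice_cdf(x, mu, sigma)) ** n


def p_good_exp(x):
    """Small-F approximation exp(-n C x^2 / 2)."""
    return math.exp(-0.5 * n * C * x ** 2)


def p_good_linear(x):
    """First-order expansion of the exponential, clipped at zero."""
    return max(0.0, 1.0 - 0.5 * n * C * x ** 2)


for x in (0.25, 0.5, 1.0, 1.5):
    print(f"x={x}: exact={p_good_exact(x):.4f}  exp={p_good_exp(x):.4f}  "
          f"linear={p_good_linear(x):.4f}")
```

At \(x = 1\) the exact value falls between the two approximations and is closer to the clipped linear one. That is consistent with the observation above. The exponential form overshoots there because the true \(F(1)\) is larger than \(\frac 1 2 C\).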

The probability density for the smallest of the variables to be at \(x\) is \[ p_{\text{smallest}}(x) = - \partial_x p_{\text{good}}(x) \approx nCx e^{-\frac 1 2 nCx^2} \;.\] This is a Weibull distribution (with shape parameter 2, i.e. a Rayleigh distribution), so in principle we could ring the changes: look up the mean, variance, etc. But comparing this to data,*** you can (again) see that the Weibull PDF doesn't look much like the data. (***Figure: probability density that the smallest of \(n = 100\) Rice-distributed variables is at \(x\), for \(σ = 1, \mathbf μ = (2,2)\). Blue: many (\(N = 10^6\)) sample sets. Black: small-\(F\) approximation \(e^{-\frac 1 2 n C x^2}\).) It agrees for small \(x\), but we're actually better off with the leading-order Taylor series approximation \[p_{\text{smallest}}(x) \approx nCx \;, \] which is the derivative of the clipped expansion above, valid for \(x < \sqrt{2/(nC)}\) (with zero density beyond). I wish I knew what is going on here.
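One possibility worth checking (a guess, not a diagnosis): the Weibull form keeps only the leading small-\(r\) term of \(F\), but the \(x\) that matters is where \(nF(x) \sim 1\). That is roughly \(x \sim λ = \sqrt{2/(nC)}\), which is about 1 for \(n = 100\), \(σ = 1\), \(\mathbf μ = (2,2)\). At that scale the correction factor \(1 + r^2\left(\frac{μ^2}{4σ^4} - \frac 1 {2σ^2}\right) = 1 + \frac 3 2 r^2\) multiplying \(p(r) \approx Cr\) is not small. The sketch below does two things. It computes the Weibull mean and variance (shape 2, scale \(λ\)). It also computes the ratio of the quadrature \(F(λ)\) to the small-\(r\) form \(\frac 1 2 Cλ^2\). It uses standard-library Python, and the helper names are made up.

```python
import math


def bessel_i0(z, terms=100):
    """Modified Bessel function I_0(z), summed from its power series."""
    q = (z / 2.0) ** 2
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= q / (m + 1) ** 2
    return total


def rice_pdf(r, mu, sigma):
    s2 = sigma ** 2
    return (r / s2) * math.exp(-(r ** 2 + mu ** 2) / (2 * s2)) * bessel_i0(r * mu / s2)


def rice_cdf(x, mu, sigma, steps=2000):
    """F(x) by composite Simpson's rule on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = rice_pdf(0.0, mu, sigma) + rice_pdf(x, mu, sigma)
    for k in range(1, steps):
        total += (4 if k % 2 == 1 else 2) * rice_pdf(k * h, mu, sigma)
    return total * h / 3


n, sigma = 100, 1.0
mu = math.hypot(2.0, 2.0)
C = math.exp(-mu ** 2 / (2 * sigma ** 2)) / sigma ** 2

# Weibull with shape k = 2 and scale lam has density (2x/lam^2) exp(-x^2/lam^2),
# which equals n C x exp(-n C x^2 / 2) when lam^2 = 2 / (n C).
k = 2.0
lam = math.sqrt(2.0 / (n * C))
weibull_mean = lam * math.gamma(1.0 + 1.0 / k)
weibull_var = lam ** 2 * (math.gamma(1.0 + 2.0 / k) - math.gamma(1.0 + 1.0 / k) ** 2)

# At x = lam the small-r approximation gives n F(x) = 1, the scale that matters;
# compare the quadrature F there with the small-r form C x^2 / 2.
ratio = rice_cdf(lam, mu, sigma) / (0.5 * C * lam ** 2)
print(f"lam={lam:.3f}  mean={weibull_mean:.3f}  var={weibull_var:.3f}  "
      f"F_exact/F_small_r={ratio:.2f}")
```

For these parameters the ratio comes out well above 1, so the true \(F\) at the relevant scale is noticeably bigger than \(\frac 1 2 Cx^2\). That is consistent with the Weibull curve misplacing the smallest value, though it doesn't by itself explain why the linear approximation does better.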