Stephane Dartois

Stephane Dartois' website on Mathematics and Physics


On Kac-Rice formula

In a coming paper I will use the Kac-Rice formula to study a question motivated by quantum information. So I decided it would be a nice time to give a short account of this well-known formula.

First, a disclaimer: I am certainly not an expert on the specifics of the formula or on the random-geometry aspects tied to it. However, I have learnt a good amount of beautiful and enjoyable mathematics while working with it, so I would like to share a few very simple remarks about it.

What is the Kac-Rice formula?

Let’s say \phi: \mathbb{R}^n\rightarrow \mathbb{R}^{n'} with n\ge n' is a (nice, smooth enough) random process. The Kac-Rice formula gives access to the average number of level crossings of this process. That is, for u \in \mathbb{R}^{n'}, denote N_u(\phi)=\#\{t \in \mathbb{R}^{n}: \phi(t)=u\}. For the counting version below take n=n', so that the level set is generically a finite set of points (for n>n' it is generically a manifold, and one computes its average volume instead). The Kac-Rice formula then states, when everything is well defined,

\mathbb{E}(N_u(\phi))=\int_{\mathbb{R}^{n}}\text{d}x\,\mathbb{E}[\lvert\det\partial_{x_i}\phi^j(x)\rvert:\phi(x)=u]\, \rho_{\phi(x)}(u).

Here \lvert\det \partial_{x_i}\phi^j(x)\rvert is the absolute value of the Jacobian determinant, and \rho_{\phi(x)} is the density of the random variable \phi(x), evaluated at u (it needs to exist and be continuous around u).
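To make this concrete, here is a quick numerical sanity check (a sketch of mine, not from any reference): for the stationary process \phi(t)=a\cos t+b\sin t with a,b independent standard Gaussians, \phi(t) and \phi'(t) are independent standard Gaussians at each fixed t, and the formula above gives \mathbb{E}(N_u)=2e^{-u^2/2} over one period [0,2\pi).

```python
import numpy as np

rng = np.random.default_rng(0)

def crossings(u, trials=2000, grid=2000):
    """Monte Carlo average of the number of u-level crossings of
    phi(t) = a cos t + b sin t over one period [0, 2*pi)."""
    t = np.linspace(0.0, 2 * np.pi, grid, endpoint=False)
    a = rng.standard_normal((trials, 1))
    b = rng.standard_normal((trials, 1))
    phi = a * np.cos(t) + b * np.sin(t)  # shape (trials, grid)
    s = np.sign(phi - u)
    # count sign changes along each path, wrapping around the circle
    return np.mean(np.sum(s != np.roll(s, -1, axis=1), axis=1))

for u in (0.0, 1.0):
    print(f"u={u}: Monte Carlo {crossings(u):.3f} vs Kac-Rice {2*np.exp(-u**2/2):.3f}")
```

For u=0 the count is 2 almost surely (the path is R\cos(t-\theta)), which the simulation reproduces exactly; for u=1 it averages 2e^{-1/2}.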

In fact, the formula is much more general (see the book of Adler and Taylor, "Random Fields and Geometry", for the technical aspects and a wonderful amount of beautiful geometry and probability). Let \mathcal{M} be an n-dimensional Riemannian manifold with metric g, and let f:\mathcal{M}\rightarrow \mathbb{R}^{n'}, h:\mathcal{M}\rightarrow \mathbb{R}^k be two (nice and gentle enough) processes over \mathcal{M}. Set, for a Borel set B\subseteq \mathbb{R}^{k}, N_u(f,h,B)=\#\{t\in \mathcal{M}: f(t)=u,\, h(t)\in B\}; in words, N_u(f,h,B) is the number of level crossings of f at which h lands in B. The Kac-Rice formula on manifolds then gives

\mathbb{E}\left(N_u(f,h,B)\right)=\int_{\mathcal{M}} \text{d}\text{Vol}_g(t)\, \mathbb{E}[\lvert\det \nabla f(t) \rvert\, \mathbf{1}_{h(t)\in B}:f(t)=u]\,\rho_{f(t)}(u).

What is it used for?

Of course this question has many answers, and a few generalities can't hurt. The story started in the 1940s with the work of M. Kac and S. O. Rice. Kac worked out the formula to study the number of real roots of random polynomials and published his work in 1943. Rice, on the other hand, was interested in the "singing of transmission lines".

A big part of the modern use of the Kac-Rice formula is to estimate the number of critical points of a random function, with the underlying idea that random functions are good representatives of complicated functions. For instance, the Hamiltonian of a glassy system, being disordered, is typically complicated to describe and has a lot of metastable states. Similarly, the cost functions used in machine learning (ML) are complicated and have many local maxima, local minima and low/high-index critical points. Understanding the number of critical points at different values of the energy (for glassy systems) or of the cost (ML) can tell us whether there is a chance of optimizing the function in a reasonable amount of time (where optimizing means reaching equilibrium in the context of physical systems and finding the "best" model in ML).
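As a small illustration of this kind of counting (a sketch of mine, not taken from the post): for the random trigonometric sum f(t)=\sum_{k=1}^{3}(a_k\cos kt+b_k\sin kt) with i.i.d. standard Gaussian coefficients, applying Kac-Rice to f' gives an expected number of critical points on [0,2\pi) equal to 2\sqrt{\lambda_4/\lambda_2}, where \lambda_2=\sum_k k^2=14 and \lambda_4=\sum_k k^4=98, i.e. 2\sqrt{7}\approx 5.29.

```python
import numpy as np

rng = np.random.default_rng(1)
k = np.arange(1, 4)  # frequencies 1, 2, 3

def mean_critical_points(trials=3000, grid=2048):
    """Average number of critical points on [0, 2*pi) of
    f(t) = sum_k (a_k cos kt + b_k sin kt), counted as sign changes of f'."""
    t = np.linspace(0.0, 2 * np.pi, grid, endpoint=False)
    a = rng.standard_normal((trials, 3))
    b = rng.standard_normal((trials, 3))
    # f'(t) = sum_k k * (-a_k sin kt + b_k cos kt)
    df = -(a * k) @ np.sin(np.outer(k, t)) + (b * k) @ np.cos(np.outer(k, t))
    s = np.sign(df)
    return np.mean(np.sum(s != np.roll(s, -1, axis=1), axis=1))

lam2, lam4 = np.sum(k**2), np.sum(k**4)           # 14 and 98
print("Monte Carlo :", mean_critical_points())
print("Kac-Rice    :", 2 * np.sqrt(lam4 / lam2))  # 2*sqrt(7) ~ 5.29
```

The prediction follows from the stationary (Rice) form of the formula applied to the derivative f', whose variance is \lambda_2 while its own derivative has variance \lambda_4.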

Easy examples

Let’s have a look at how to use the Kac-Rice formula on simple examples. Some of those examples were discussed with Guillaume Dubach and Camille Male during an informal and improvised presentation of Kac-Rice formula.1

Law of the critical point of random degree-2 polynomials: Consider the following simple but still illustrative example. Set P(x):=ax^2+bx+c, a random polynomial whose coefficients a,b,c are independent standard normal random variables (Gaussian, centered, with variance one). I ask for the average number of critical points of P such that P(x) \in B\subseteq \mathbb{R}. There is an obvious way of doing that, which I postpone for now: I want to use the Kac-Rice formula. So what I am really interested in is the number of 0-level crossings of the derivative P'(x)=2ax+b over some interval I, that is, the number of points x \in I\subseteq \mathbb{R} such that P'(x)=0 and P(x)\in B. Of course, if I=B=\mathbb{R}, the number of critical points is almost surely 1.

According to the Kac-Rice formula we have \mathbb{E}\left(N_0(P',P,B)\right)=\int_{I}\text{d}x\, \mathbb{E}[\lvert 2a\rvert\mathbf{1}_{P(x)\in B}:2ax+b=0]\,\rho_{P'(x)}(0), where 2a=P''(x) plays the role of the Jacobian. To compute this average we need the density of the derivative at zero. This is easily established by noticing that P'(x) is a centered Gaussian random variable with variance 4x^2+1.
We also need the law of (P(x),2a) conditioned on P'(x)=2ax+b=0. Since a,b,c are independent Gaussian random variables, this is easily obtained by computing the covariance matrix of (P(x), 2a, P'(x)):
\Sigma=\begin{pmatrix}x^4+x^2+1& 2x^2 & 2x^3+x\\ 2x^2 & 4 & 4x\\2x^3+x&4x&4x^2+1\end{pmatrix}
and from there applying the usual conditional covariance (Schur complement) formula to (P(x),2a), which leads to
\Sigma_{(P(x),2a)|P'(x)}=\begin{pmatrix} \frac{x^4+4x^2+1}{4x^2+1} &-\frac{2x^2}{4x^2+1}\\-\frac{2x^2}{4x^2+1}&\frac{4}{4x^2+1} \end{pmatrix}.
Indeed, this is sufficient to determine the conditional law: the conditioning only imposes linear relations between Gaussian random variables, so the resulting conditional random variables are Gaussian and centered. So \mathbb{E}[\lvert P''(x)\rvert\mathbf{1}_{P(x)\in B}:P'(x)=0]=\int_{u\in B,\, v \in \mathbb{R}}\text{d}u\,\text{d}v\,\sqrt{\frac{1+4x^2}{4(2\pi)^2}}\,\lvert v\rvert\, e^{-\frac12(u,v)\Sigma_{(P(x),2a)|P'(x)}^{-1}(u,v)^t}.
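As a sanity check (mine, not part of the original derivation), one can verify numerically that the Schur complement of \Sigma reproduces the conditional covariance displayed above:

```python
import numpy as np

def cond_cov(x):
    """Schur complement of the covariance of (P(x), 2a, P'(x)),
    i.e. the covariance of (P(x), 2a) conditioned on P'(x)."""
    S = np.array([
        [x**4 + x**2 + 1, 2 * x**2, 2 * x**3 + x],
        [2 * x**2,        4.0,      4 * x],
        [2 * x**3 + x,    4 * x,    4 * x**2 + 1],
    ])
    # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21, conditioning on the last entry
    return S[:2, :2] - np.outer(S[:2, 2], S[2, :2]) / S[2, 2]

x = 0.7
closed_form = np.array([
    [(x**4 + 4 * x**2 + 1) / (4 * x**2 + 1), -2 * x**2 / (4 * x**2 + 1)],
    [-2 * x**2 / (4 * x**2 + 1),              4 / (4 * x**2 + 1)],
])
print(np.allclose(cond_cov(x), closed_form))  # True
```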
Simplifying the problem a bit, we now let B=\mathbb{R}. Doing so leaves us with
\mathbb{E}[\lvert P''(x)\rvert:P'(x)=0]=\frac12\sqrt{\frac{1+4x^2}{2\pi}}\int_{v \in \mathbb{R}}\text{d}v\,\lvert v\rvert\, e^{-\frac18(1+4x^2)v^2}.
Realizing that the density of the derivative at 0 is just \frac1{\sqrt{2\pi (1+4x^2)}}, and putting everything together, we obtain from Kac-Rice formula that
\mathbb{E}(N_0(P',P,\mathbb{R}))=\int_I\text{d}x\frac2{\pi}\frac1{1+4x^2}.
In fact, this says that the critical point of a random degree-two polynomial is distributed according to a Cauchy law with location parameter 0 and scale parameter \frac12.

This is easily confirmed by noting that the critical point of P(x) is almost surely -b/2a, a ratio of two independent centered Gaussian random variables, which is known to produce a Cauchy random variable.
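The same check can be done by simulation (a quick sketch): sample -b/2a and compare the empirical distribution function with that of a centered Cauchy law of scale 1/2.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(200_000)
b = rng.standard_normal(200_000)
crit = -b / (2 * a)  # critical point of P(x) = a x^2 + b x + c

def cauchy_cdf(x, scale=0.5):
    """Distribution function of a centered Cauchy law with the given scale."""
    return 0.5 + np.arctan(x / scale) / np.pi

for x in (-1.0, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}: empirical {np.mean(crit <= x):.4f} vs Cauchy {cauchy_cdf(x):.4f}")
```

For instance at x=1/2 the Cauchy(0,1/2) distribution function equals exactly 3/4, which the empirical fraction matches to Monte Carlo accuracy.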

Eigenvalues of the Gaussian Orthogonal Ensemble: We now study another simple example. We want to recover the eigenvalue distribution of the GOE from the Kac-Rice integral. This is a good example as it is a baby version of an important use case for Kac-Rice: the spherical p-spin model. Our example is the (almost) trivial p=2 case.

First a reminder. Let A be an N\times N real symmetric matrix. The Courant-Fischer theorem tells us that the eigenvalues of A are the values of the function f(x)=\langle x,Ax\rangle at its critical points on the unit sphere, i.e. for x\in \mathbb{R}^N with \langle x, x\rangle=1. In fact, since if x_c is a critical point then so is -x_c, with f(x_c)=f(-x_c), counting critical points of f leads us to double-count the eigenvalues (one way to avoid that would be to work on the projective space instead of the unit sphere).
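This reminder can be checked directly (a small sketch of mine): at a unit eigenvector v of A, the gradient of f along the sphere, 2(Av-\langle v,Av\rangle v), vanishes, and f(v) is the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
G = rng.standard_normal((N, N))
A = (G + G.T) / 2  # an arbitrary real symmetric matrix

eigvals, eigvecs = np.linalg.eigh(A)
for lam, v in zip(eigvals, eigvecs.T):
    value = v @ A @ v                      # f at the unit-norm eigenvector
    sphere_grad = 2 * (A @ v - value * v)  # gradient of f along the sphere
    assert np.isclose(value, lam)
    assert np.allclose(sphere_grad, 0.0)
print("eigenvalues are exactly the critical values of f on the sphere")
```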

Keeping the above remark in mind, we now take A to be an N\times N GOE matrix. That is, the entries A_{i\le j} are all independent, with A_{i<j}\sim N(0,\frac1{4N}), \, A_{i,i}\sim N(0,\frac1{2N}) (the normalization that matches the covariances computed below).
According to Kac-Rice, up to a factor of 2, the average number of eigenvalues in the interval I_u:=(-\infty, u] is given by \int d\text{Vol}(x)\, \mathbb{E}\left( \lvert \det \text{Hess} f(x)\rvert\, \mathbf{1}_{f(x)\in I_u}:\nabla f(x)=0\right)\rho_{\nabla f(x)}(0).
A first remark is that \mathbb{E}\left( \lvert \det \text{Hess} f(x)\rvert\, \mathbf{1}_{f(x)\in I_u}:\nabla f(x)=0\right) does not depend on x, since A \underset{\text{law}}{=}OAO^t for any orthogonal transformation O. We can therefore compute this average at x=\mathbf{n}:=(1,0,\ldots,0), and \int d\text{Vol}(x) just contributes the volume of the (N-1)-dimensional sphere.
As earlier, we need the joint law of (f(\mathbf{n}), \nabla f(\mathbf{n}), \text{Hess} f(\mathbf{n})). We compute it in the chart around \mathbf{n} given by \phi(x=(x_1,\ldots, x_N))=(x_2,\ldots,x_N), using to our advantage that the covariant derivatives at \mathbf{n} coincide with the \partial_{x_i}. This allows us to compute the correlations via the identity \frac{\partial^k}{\partial x_{i_1}\ldots \partial x_{i_k}} \frac{\partial^l}{\partial y_{j_1}\ldots\partial y_{j_l}}\mathbb{E}(f(x)f(y))=\mathbb{E}(\frac{\partial^k}{\partial x_{i_1}\ldots \partial x_{i_k}}f(x)\, \frac{\partial^l}{\partial y_{j_1}\ldots\partial y_{j_l}}f(y)). We find:

  • \mathbb{E}(f(\mathbf{n})f(\mathbf{n}))=\frac1{2N}
  • \mathbb{E}(\partial_{x_i}f(\mathbf{n})f(\mathbf{n}))=0
  • \mathbb{E}(\partial_{x_i}f(\mathbf{n})\partial_{y_m}f(\mathbf{n}))=\frac{1}{N}\delta_{im}
  • \mathbb{E}(\partial_{x_i}\partial_{x_j}f(\mathbf{n})f(\mathbf{n}))=-\frac{1}{N}\delta_{ij}
  • \mathbb{E}(\partial_{x_i}\partial_{x_j}f(\mathbf{n})\partial_{y_m}f(\mathbf{n}))=0
  • \mathbb{E}(\partial_{x_i}\partial_{x_j}f(\mathbf{n})\partial_{y_m}\partial_{y_n}f(\mathbf{n}))=\frac{2}{N}\delta_{ij}\delta_{mn}+\frac1{N}(\delta_{im}\delta_{jn}+\delta_{in}\delta_{jm})

This allows us to identify the joint law of (f(\mathbf{n}), \nabla f(\mathbf{n}), \text{Hess} f(\mathbf{n})): the gradient is independent of (f(\mathbf{n}), \text{Hess} f(\mathbf{n})), and conditioned on f(\mathbf{n})=u/\sqrt{2} one has \text{Hess} f(\mathbf{n})=\sqrt{\frac{N-1}{N}}M_{N-1}-\sqrt{2}u\, \text{Id}, where M_{N-1} is an (N-1)\times (N-1) GOE matrix and u:=\sqrt{2}f(\mathbf{n}) \sim \mathcal{N}(0,1/N), while \nabla f(\mathbf{n})\sim\mathcal{N}(0,\frac1{N}\text{Id}_{N-1}).
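In this chart everything can be written explicitly: f(\mathbf{n})=A_{11}, \partial_{x_i}f(\mathbf{n})=2A_{1i} and \partial_{x_i}\partial_{x_j}f(\mathbf{n})=2A_{ij}-2\delta_{ij}A_{11} for i,j\ge 2, which makes the covariance list easy to check by simulation. A quick Monte Carlo sketch of mine, assuming the normalization A_{i<j}\sim N(0,1/4N), A_{ii}\sim N(0,1/2N):

```python
import numpy as np

rng = np.random.default_rng(4)
N, samples = 8, 100_000

# Symmetrize i.i.d. Gaussians so off-diagonal entries have variance 1/(4N)
# and diagonal entries have variance 1/(2N).
G = rng.standard_normal((samples, N, N)) / np.sqrt(8 * N)
A = G + np.transpose(G, (0, 2, 1))

f = A[:, 0, 0]                # f(n) = <n, A n> at n = (1, 0, ..., 0)
g1 = 2 * A[:, 0, 1]           # a gradient component in the chart
H11 = 2 * A[:, 1, 1] - 2 * f  # Hessian entries in the chart
H22 = 2 * A[:, 2, 2] - 2 * f
H12 = 2 * A[:, 1, 2]

print(np.var(f), 1 / (2 * N))     # E[f^2]              = 1/(2N)
print(np.var(g1), 1 / N)          # E[(d_i f)^2]        = 1/N
print(np.mean(H11 * f), -1 / N)   # E[(d_i d_i f) f]    = -1/N
print(np.var(H12), 1 / N)         # E[H_ij^2] (i != j)  = 1/N
print(np.mean(H11 * H22), 2 / N)  # E[H_ii H_jj]        = 2/N
```

Each printed pair (empirical value, predicted value) matches the corresponding covariance in the bulleted list above.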

Therefore, according to the Kac-Rice formula, the average number of critical points of f with value below t is given (up to constants and a rescaling of u that I do not track) by

\mathbb{E}(N_0(\nabla f,f,(-\infty, t]))=C_{N-1}\int_{-\infty}^t du\, e^{-\frac{N}{2}u^2}\, \mathbb{E}_{GOE}(\lvert \det(M_{N-1}-u)\rvert)

Using a classical trick, we can recognize, up to a multiplicative constant, that \int_{-\infty}^tdu\, e^{-\frac{N}{2}u^2}\,\mathbb{E}_{GOE}(\lvert \det(M_{N-1}-u)\rvert)=\int_{-\infty}^tdu\int_{-\infty}^{\infty}\prod_{i=1}^{N-1}d\lambda_i \,\prod_{i=1}^{N-1}\lvert u-\lambda_i\rvert\prod_{i<j}\lvert \lambda_i-\lambda_j\rvert\, e^{-\frac{N}{2}\sum_{i=1}^{N-1}\lambda_i^2-\frac{N}{2}u^2} is the distribution function of the eigenvalues of a GOE matrix: one can interpret u as the N^{\text{th}} eigenvalue and the \lambda_i's as the remaining N-1 eigenvalues. This gives a good check of our Kac-Rice computation: counting the critical points of the random quadratic form \langle x, Ax\rangle below level t recovers the distribution of the eigenvalues of A, as expected from Courant-Fischer.
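The normalization can also be checked numerically (my sketch, with the convention A_{i<j}\sim N(0,1/4N), A_{ii}\sim N(0,1/2N) used above): the eigenvalues of A then follow a semicircle law on [-1,1], with second moment 1/4 and, by symmetry, half of them below 0 on average.

```python
import numpy as np

rng = np.random.default_rng(5)
N, samples = 40, 2000

# GOE with off-diagonal variance 1/(4N) and diagonal variance 1/(2N)
G = rng.standard_normal((samples, N, N)) / np.sqrt(8 * N)
A = G + np.transpose(G, (0, 2, 1))
lam = np.linalg.eigvalsh(A)  # batched symmetric eigensolver

print("fraction of eigenvalues <= 0 :", np.mean(lam <= 0.0))  # ~ 1/2
print("mean eigenvalue squared      :", np.mean(lam**2))      # ~ 1/4
```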

  1. I thank them for their patience, their nice remarks and their general curiosity ↩︎