Cassandra E. Granade
Centre for Engineered Quantum Systems
joint work with Joshua Combes and D. G. Cory • 1509.03770 • https://www.cgranade.com/research/talks/quics/09-2015/
\newcommand{\ee}{\mathrm{e}} \newcommand{\ii}{\mathrm{i}} \newcommand{\dd}{\mathrm{d}} \newcommand{\id}{𝟙} \newcommand{\TT}{\mathrm{T}} \newcommand{\defeq}{\mathrel{:=}} \newcommand{\Tr}{\operatorname{Tr}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Cov}{\operatorname{Cov}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\expect}{\mathbb{E}} \newcommand{\sket}[1]{|#1\rangle\negthinspace\rangle} \newcommand{\sbraket}[1]{\langle\negthinspace\langle#1\rangle\negthinspace\rangle} \newcommand{\Gini}{\operatorname{Ginibre}} \newcommand{\supp}{\operatorname{supp}}
Citations are given as shortDOIs: 10/abc \mapsto https://dx.doi.org/10/abc
(Make your own at shortdoi.org!)
This talk can be summarized in one slide.
\begin{equation} \Pr(\text{model} | \text{data}) = \frac{\Pr(\text{data} | \text{model})}{\Pr(\text{data})} \Pr(\text{model}) \end{equation}
Bayesian methods for experimental data processing work.
Quantum tomography: Characterizing quantum states and processes from experimental data.
From a statistical perspective, this is a parameter estimation problem.
We get the first two for free from taking a Bayesian perspective.
In this talk, we will address all three challenges.
See also:
Suppose we want to learn x from data D. Then \Pr(x | D) = \frac{\Pr(D | x)}{\Pr(D)} \Pr(x) is the posterior distribution over x.
\Pr(x | D) describes what we know about x.
We estimate x by the expectation value \hat{x} = \expect[x | D] = \int x\,\Pr(x|D)\, \dd x.
This estimator is optimal with respect to the mean squared error.
We estimate our error in the same way: \begin{align*} \expect[(x - \hat{x}(D))^2 | D] & = \expect[(x - \expect[x | D])^2 | D] \\ & = \Var(x | D). \end{align*} For multiple parameters, \expect[(\vec{x} - \hat{\vec{x}})^\TT (\vec{x} - \hat{\vec{x}}) | D] = \Tr(\Cov(\vec{x} | D)).
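A minimal numerical sketch of these formulas, assuming a coin-flip experiment that is not part of the talk: we put the posterior for an unknown bias x on a grid, then read off \hat{x} = \expect[x | D] and \Var(x | D). With a uniform prior and k successes in n trials the exact posterior is Beta(k + 1, n - k + 1), which gives a check on the grid values.

```python
import numpy as np

# Hypothetical data D: k successes out of n Bernoulli trials with unknown bias x.
n, k = 20, 14

# Discretize x on a fine uniform grid and use a uniform prior Pr(x).
xs = np.linspace(0.0, 1.0, 10001)
prior = np.ones_like(xs)

# Likelihood Pr(D | x); the binomial coefficient cancels when normalizing by Pr(D).
likelihood = xs**k * (1.0 - xs) ** (n - k)

# Bayes' rule on the grid: posterior weights proportional to likelihood times prior.
weights = likelihood * prior
weights /= weights.sum()

x_hat = np.sum(weights * xs)                     # E[x | D], the Bayesian mean estimate
post_var = np.sum(weights * (xs - x_hat) ** 2)  # Var(x | D), the expected squared error of x_hat

# Exact moments of the Beta(k + 1, n - k + 1) posterior, for comparison.
a, b = k + 1, n - k + 1
exact_mean = a / (a + b)
exact_var = a * b / ((a + b) ** 2 * (a + b + 1))
print(x_hat, exact_mean)
print(post_var, exact_var)
```

The grid is only practical for one or two parameters; the particle methods later in the talk replace it for state-sized problems.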
More generally, \hat{f} = \expect[f(x) | D].
\Pr(D | x) is a likelihood function that specifies an experimental model.
For state tomography, our likelihood is Born's rule, \Pr(E | \rho) = \Tr[E \rho], where \rho is a state and E is a measurement effect.
Thus, Bayes' rule allows us to estimate \rho.
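To make the Born-rule likelihood concrete, here is a small numpy sketch (the two candidate states and the single Z measurement are hypothetical choices for illustration): each likelihood is \Tr[E \rho], and Bayes' rule reweights the candidates after one observed outcome.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def born_likelihood(E, rho):
    """Pr(E | rho) = Tr[E rho] for a measurement effect E and a state rho."""
    return float(np.real(np.trace(E @ rho)))

# Effects for a Z measurement: projectors onto the +1 and -1 eigenspaces.
E_plus, E_minus = (I + Z) / 2, (I - Z) / 2

# Two hypothetical candidate states, equally likely a priori.
candidates = [(I + 0.8 * Z) / 2, (I + 0.6 * X) / 2]
prior = np.array([0.5, 0.5])

# Bayes' rule after observing the "+" outcome once.
like = np.array([born_likelihood(E_plus, rho) for rho in candidates])
posterior = like * prior / np.sum(like * prior)
print(posterior)
```

The first candidate assigns probability 0.9 to "+" and the second 0.5, so the posterior shifts toward the first.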
We want conjugate priors f(x; y) to perform Bayesian inference: \begin{align*} \frac{\Pr(d | x)}{\Pr(d)} f(x; y) = f(x; y'(d, y)). \end{align*} Inference then consists of finding a “nice” form for the updated hyperparameters y'.
Frustrating to find conjugate priors for states that respect Hilbert space structure.
The beta distribution is supported on [0, 1] for all \alpha, \beta > 0 and is a conjugate prior to the binomial distribution.
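For the beta/binomial pair, the hyperparameter update y'(d, y) has a closed form: after k successes in n trials, Beta(\alpha, \beta) becomes Beta(\alpha + k, \beta + n - k). The numpy sketch below (prior hyperparameters and data are hypothetical) checks this against Bayes' rule applied numerically on a grid.

```python
import math
import numpy as np

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x))

def beta_binomial_update(a, b, n, k):
    """Conjugate update y' = (a + k, b + n - k) after k successes in n trials."""
    return a + k, b + n - k

a, b = 2.0, 3.0   # hypothetical prior hyperparameters y = (alpha, beta)
n, k = 10, 7      # hypothetical data d

a_new, b_new = beta_binomial_update(a, b, n, k)

# Bayes' rule on a grid: likelihood times prior, normalized numerically.
xs = np.linspace(1e-6, 1 - 1e-6, 20001)
dx = xs[1] - xs[0]
unnormalized = xs**k * (1 - xs) ** (n - k) * beta_pdf(xs, a, b)
numerical_posterior = unnormalized / (unnormalized.sum() * dx)

max_err = np.max(np.abs(numerical_posterior - beta_pdf(xs, a_new, b_new)))
print(a_new, b_new, max_err)
```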
We will use later that the beta distribution always has support at 0: for all \alpha, \beta > 0, every interval (0, \delta) with \delta > 0 has positive probability.
The same techniques apply to learning snapshots of dynamics: the Choi–Jamiołkowski isomorphism lets us rewrite process tomography as (ancilla-assisted) state tomography.
\Tr[E \Lambda(\rho)] = \Tr[(\id \otimes E) J(\Lambda) (\rho^\TT \otimes \id)] = \sbraket{\rho^\TT \otimes E | J(\Lambda)}
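A quick numerical check of this identity, as a sketch under stated assumptions: we take the convention J(\Lambda) = \sum_{ij} |i\rangle\langle j| \otimes \Lambda(|i\rangle\langle j|) (reference system first), use a qubit amplitude damping channel as the example \Lambda, and read the double-ket inner product as the Hilbert-Schmidt product with \rho^\TT \otimes E. The state and effect are arbitrary valid choices.

```python
import numpy as np

def amplitude_damping_kraus(gamma):
    """Kraus operators of a qubit amplitude damping channel (example channel)."""
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

def apply_channel(kraus, rho):
    """Lambda(rho) = sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def choi(kraus, dim):
    """J(Lambda) = sum_ij |i><j| (x) Lambda(|i><j|), reference system first."""
    J = np.zeros((dim * dim, dim * dim), dtype=complex)
    for i in range(dim):
        for j in range(dim):
            Eij = np.zeros((dim, dim), dtype=complex)
            Eij[i, j] = 1
            J += np.kron(Eij, apply_channel(kraus, Eij))
    return J

kraus = amplitude_damping_kraus(0.3)
J = choi(kraus, 2)

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])  # a valid qubit state
E = np.array([[0.7, 0.1], [0.1, 0.3]], dtype=complex)   # an effect with 0 <= E <= 1

lhs = np.trace(E @ apply_channel(kraus, rho))
rhs = np.trace(np.kron(np.eye(2), E) @ J @ np.kron(rho.T, np.eye(2)))
rhs2 = np.trace(np.kron(rho.T, E).conj().T @ J)  # <<rho^T (x) E | J(Lambda)>>
print(lhs.real, rhs.real, rhs2.real)
```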
We parameterize a state \rho as a real vector \vec{x}, x_i = \sbraket{B_i | \rho} = \Tr(B_i^\dagger \rho), where \{B_i\} is an orthonormal basis of Hermitian operators, \Tr(B_i^\dagger B_j) = \delta_{ij}.
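By convention, B_0 = \id / \sqrt{N}, so that \Tr(B_i) = \sqrt{N}\,\delta_{i0} and x_0 = 1/\sqrt{N} for every state, where N is the Hilbert space dimension. E.g.: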
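As an illustration, here is a numpy sketch using the normalized Pauli basis B_i = \sigma_i / \sqrt{2} for one qubit (our choice of example). Here B_0 = \id/\sqrt{2}, so every state has x_0 = \Tr(\rho)/\sqrt{2} = 1/\sqrt{2}, and orthonormality lets us invert the map as \rho = \sum_i x_i B_i.

```python
import numpy as np

# Normalized Pauli basis for one qubit: B_i = sigma_i / sqrt(2), so Tr(B_i B_j) = delta_ij.
paulis = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
basis = [P / np.sqrt(2) for P in paulis]

def to_vector(rho):
    """x_i = Tr(B_i^dagger rho); real because each B_i and rho are Hermitian."""
    return np.array([np.real(np.trace(B.conj().T @ rho)) for B in basis])

def from_vector(x):
    """Invert the parameterization: rho = sum_i x_i B_i."""
    return sum(xi * B for xi, B in zip(x, basis))

rho = (paulis[0] + 0.5 * paulis[1] + 0.3 * paulis[3]) / 2   # Bloch vector (0.5, 0, 0.3)
x = to_vector(rho)
print(x)
```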
For state tomography, the Bayesian mean estimator (BME) is approximately optimal for the fidelity (Ferrie and Kueng 1503.00677).
The error in an observable X is given by the covariance superoperator \Sigma\rho = \Cov(\sket{\rho}): \begin{align*} \Var(X) & = \Var_{\rho}[\langle X\rangle_{\rho}] + \langle X^2\rangle_{\expect[\rho]} - \langle X\rangle_{\expect[\rho]}^2 \\ & = \sbraket{X | \Sigma\rho | X} + \sbraket{X^2 | \hat\rho} - \sbraket{X | \hat\rho}^2, \end{align*} where \hat\rho = \expect[\rho]. (Blume-Kohout 10/cn772j)
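In coordinates on an orthonormal Hermitian basis, \Sigma\rho is just the (weighted) covariance matrix of the vectors \vec{x}, and the first term is \sbraket{X | \Sigma\rho | X} = \vec{a}^\TT \Sigma \vec{a} with a_i = \Tr(B_i X). The numpy sketch below assumes the normalized Pauli basis and randomly generated weighted particles (both hypothetical) and checks this against the variance of \langle X\rangle_\rho computed directly from the particles.

```python
import numpy as np

rng = np.random.default_rng(1234)

paulis = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
basis = [P / np.sqrt(2) for P in paulis]

def to_vector(op):
    """Coordinates x_i = Tr(B_i^dagger op) of a Hermitian operator."""
    return np.array([np.real(np.trace(B.conj().T @ op)) for B in basis])

# Hypothetical weighted particles: random qubit states inside the Bloch ball.
n_particles = 500
dirs = rng.normal(size=(n_particles, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
bloch = dirs * rng.uniform(size=(n_particles, 1)) ** (1 / 3)
states = [(paulis[0] + sum(r * P for r, P in zip(v, paulis[1:]))) / 2 for v in bloch]
weights = rng.uniform(size=n_particles)
weights /= weights.sum()

xs = np.array([to_vector(rho) for rho in states])    # shape (n_particles, 4)
x_hat = weights @ xs                                  # coordinates of rho_hat = E[rho]
centered = xs - x_hat
Sigma = (weights[:, None] * centered).T @ centered   # covariance superoperator in this basis

X = paulis[1]       # observable of interest (Pauli X)
a = to_vector(X)    # |X>> in the same basis

# Var_rho[<X>_rho], computed two ways.
var_via_sigma = a @ Sigma @ a
expectations = np.array([np.real(np.trace(X @ rho)) for rho in states])
mean_expect = weights @ expectations
var_direct = weights @ (expectations - mean_expect) ** 2
print(var_via_sigma, var_direct)
```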
In practice, exact Bayesian mean estimation is intractable, so we approximate it with particle filtering.
Implemented by the QInfer library for Python.
\begin{align} \Pr(\rho) & \approx \sum_{p\in\text{particles}} w_p \delta(\rho - \rho_p) \\ w_p & \mapsto w_p \times \Pr(E | \rho_p) / \mathcal{N} \end{align}
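A minimal numpy sketch of this weight update, under assumptions that are ours and not the talk's: particles are qubit states stored as Bloch vectors, the prior is uniform over the Bloch ball (QInfer would use the priors discussed later), and data are simulated single-shot Pauli measurements from a hypothetical true state. Each step multiplies w_p by Born's rule and renormalizes; the posterior mean is the weighted average of the particles.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_bloch_ball(n, rng):
    """Uniform samples from the Bloch ball, used here as a simple prior over qubit states."""
    dirs = rng.normal(size=(n, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return dirs * rng.uniform(size=(n, 1)) ** (1 / 3)

def likelihood(outcome, axis, bloch):
    """Born's rule for a Pauli measurement along `axis` (0 = X, 1 = Y, 2 = Z).

    The effect for outcome +1 is (I + sigma_axis) / 2, so Pr(+1 | rho) = (1 + r_axis) / 2.
    """
    p_plus = (1 + bloch[:, axis]) / 2
    return p_plus if outcome == +1 else 1 - p_plus

n_particles = 2000
particles = sample_bloch_ball(n_particles, rng)   # rho_p, stored as Bloch vectors
weights = np.full(n_particles, 1 / n_particles)   # w_p, uniform for samples from the prior

true_bloch = np.array([0.3, 0.0, 0.8])            # hypothetical state used to simulate data

for _ in range(30):
    axis = rng.integers(3)
    outcome = +1 if rng.uniform() < (1 + true_bloch[axis]) / 2 else -1
    # w_p -> w_p * Pr(E | rho_p) / N, with N fixed by renormalization.
    weights = weights * likelihood(outcome, axis, particles)
    weights /= weights.sum()

estimate = weights @ particles   # particle approximation to the Bayesian mean, as a Bloch vector
print(estimate)
```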
Mixtures of \delta-functions are conjugate priors for practically any likelihood.
Big advantage: we only need samples from the prior!
As we collect data, the weights concentrate on a few particles and the approximation becomes unstable, so we must resample.
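One common way to detect and repair this is sketched below: monitor the effective sample size n_ess = 1/\sum_p w_p^2 and, when it falls below a threshold, resample with the Liu–West algorithm. The choice of Liu–West, the threshold of half the particles, and a = 0.98 are assumptions for illustration, not details taken from the talk; the sketch also does not keep resampled states physical.

```python
import numpy as np

def effective_sample_size(weights):
    """n_ess = 1 / sum_p w_p^2: equals n for uniform weights, drops toward 1 as they degenerate."""
    return 1.0 / np.sum(weights**2)

def liu_west_resample(particles, weights, rng, a=0.98):
    """Liu-West resampling sketch for real-valued particles of shape (n, d).

    New particles come from a Gaussian mixture that preserves the weighted mean
    and (in expectation) covariance; weights are reset to uniform. Physicality
    (e.g. positivity of a state) is NOT enforced here.
    """
    n, d = particles.shape
    mean = weights @ particles
    centered = particles - mean
    cov = (weights[:, None] * centered).T @ centered
    h = np.sqrt(1 - a**2)
    # Choose parents in proportion to their weights ...
    parents = rng.choice(n, size=n, p=weights)
    # ... shrink them toward the mean and add noise with covariance h^2 * cov.
    locs = a * particles[parents] + (1 - a) * mean
    noise = rng.multivariate_normal(np.zeros(d), h**2 * cov, size=n)
    return locs + noise, np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
particles = rng.normal(size=(1000, 2))
weights = rng.exponential(size=1000) ** 4   # deliberately uneven weights
weights /= weights.sum()

ess_before = effective_sample_size(weights)
if ess_before < 0.5 * len(weights):          # resample once n_ess drops below half the particles
    particles, weights = liu_west_resample(particles, weights, rng)
ess_after = effective_sample_size(weights)
print(ess_before, ess_after)
```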
Particle filtering is used in a range of quantum information applications.
Expressing tomography as a Bayesian parameter estimation (particle filtering) problem affords us a few other advantages.
A credible region R_\alpha for \rho satisfies \Pr(\rho \in R_{\alpha} | D) \ge \alpha.
We can construct credible regions from the particle approximation to the posterior: covariance ellipsoids (Granade et al. 10/s87), or the convex hull or minimum-volume enclosing ellipsoid of the particles (Ferrie 10/tb4).
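A sketch of one covariance-ellipsoid construction (our own simple variant, not necessarily the one in 10/s87): take the ellipsoid centred on the posterior mean and shaped by the posterior covariance, and grow its Mahalanobis radius until the enclosed particle weight reaches \alpha, so that \Pr(\rho \in R_\alpha | D) \ge \alpha holds for the particle approximation. The particles here are hypothetical 2D samples.

```python
import numpy as np

def mahalanobis_sq(points, mu, cov_inv):
    """Squared Mahalanobis distance of each row of `points` from `mu`."""
    centered = np.atleast_2d(points) - mu
    return np.einsum("pi,ij,pj->p", centered, cov_inv, centered)

def covariance_credible_region(particles, weights, level=0.95):
    """Ellipsoid {x : (x - mu)^T Sigma^{-1} (x - mu) <= r2} holding >= `level` particle weight."""
    mu = weights @ particles
    centered = particles - mu
    cov = (weights[:, None] * centered).T @ centered
    cov_inv = np.linalg.inv(cov)
    d2 = mahalanobis_sq(particles, mu, cov_inv)
    order = np.argsort(d2)
    cumulative = np.cumsum(weights[order])
    # Smallest radius whose enclosed weight reaches `level`.
    idx = np.searchsorted(cumulative, level)
    r2 = d2[order][min(idx, len(d2) - 1)]
    return mu, cov_inv, r2

def in_region(points, mu, cov_inv, r2):
    return mahalanobis_sq(points, mu, cov_inv) <= r2

rng = np.random.default_rng(7)
particles = rng.normal(size=(5000, 2)) @ np.array([[1.0, 0.0], [0.6, 0.3]])
weights = np.full(5000, 1 / 5000)

mu, cov_inv, r2 = covariance_credible_region(particles, weights, level=0.9)
inside = in_region(particles, mu, cov_inv, r2)
print(weights[inside].sum())   # at least 0.9 by construction
```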
Built into QInfer.
Bayes factor–based model selection is also built into QInfer.
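The normalization \mathcal{N} in each particle-filter update equals \Pr(d_k | d_1, \dots, d_{k-1}), so summing \log\mathcal{N} over the data gives the model evidence \log\Pr(D | \text{model}), and differences of evidences give log Bayes factors. The numpy sketch below is hypothetical and unrelated to QInfer's API: it compares "unknown coin bias, uniform prior" (particles on a grid) against "fair coin" on twenty heads in a row. The exact answer is 20 \ln 2 - \ln 21 \approx 10.82.

```python
import numpy as np

def log_evidence(particles, outcomes, likelihood):
    """Accumulate log Pr(D | model) = sum_k log N_k from particle-filter normalizations."""
    weights = np.full(len(particles), 1.0 / len(particles))
    log_z = 0.0
    for d in outcomes:
        weights = weights * likelihood(d, particles)
        norm = weights.sum()   # N_k = Pr(d_k | earlier data, model)
        log_z += np.log(norm)
        weights /= norm
    return log_z

def coin_likelihood(d, p):
    """Pr(d | p) for a single coin flip with d in {0, 1}."""
    return p if d == 1 else 1 - p

outcomes = [1] * 20   # hypothetical data: twenty heads in a row

# Model 1: unknown bias with a uniform prior, represented by particles on a grid.
grid = (np.arange(1000) + 0.5) / 1000
log_z1 = log_evidence(grid, outcomes, coin_likelihood)

# Model 0: a fair coin, i.e. a single particle at p = 1/2.
log_z0 = log_evidence(np.array([0.5]), outcomes, coin_likelihood)

log_bayes_factor = log_z1 - log_z0   # positive values favour model 1
print(log_bayes_factor)
```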
\begin{align*} \text{Suppose } \vec{x} & \sim \Pr(\vec{x} | \vec{y}) \\ \text{Then, } \Pr(D | \vec{y}) & = \expect_{\vec{x} | \vec{y}} [\Pr(D | \vec{x}, \vec{y})] \\ & = \int \Pr(D | \vec{x}, \vec{y}) \Pr(\vec{x} | \vec{y})\,\dd\vec{x}. \end{align*} Marginalizing gives us a likelihood for the hyperparameters \vec{y}!
Allows us to include effects outside of Born's rule. For instance, non-Cauchy decoherence (Granade et al. 10/s87).
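A Monte Carlo sketch of the marginalization above, using a toy precession likelihood \Pr(d = 1 | x; t) = \cos^2(x t / 2) that we choose for illustration (it is not taken from the talk). With hyperparameters \vec{y} = (\mu, \sigma) and x \sim \mathcal{N}(\mu, \sigma^2), the marginal likelihood has the closed form (1 + \cos(\mu t)\, \ee^{-\sigma^2 t^2 / 2})/2: a Gaussian decay envelope, whereas a Cauchy-distributed x would give an exponential one.

```python
import numpy as np

rng = np.random.default_rng(2015)

def likelihood(x, t):
    """Toy precession likelihood Pr(d = 1 | x; t) = cos^2(x t / 2)."""
    return np.cos(x * t / 2) ** 2

def marginal_likelihood(mu, sigma, t, n_samples=200_000):
    """Monte Carlo estimate of Pr(d = 1 | y) = E_{x ~ N(mu, sigma^2)}[Pr(d = 1 | x)]."""
    xs = rng.normal(mu, sigma, size=n_samples)
    return np.mean(likelihood(xs, t))

def exact_marginal(mu, sigma, t):
    """Closed form for Gaussian x: a Gaussian (not exponential) decay envelope."""
    return (1 + np.cos(mu * t) * np.exp(-((sigma * t) ** 2) / 2)) / 2

mu, sigma = 1.0, 0.3   # hypothetical hyperparameters y
for t in [0.0, 2.0, 5.0]:
    print(t, marginal_likelihood(mu, sigma, t), exact_marginal(mu, sigma, t))
```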
But how do we choose our prior? Let's get to the meat of things.
In order to be useful, a prior over states should:
It helps to consider an uninformative prior first.
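\Gini(N, K): draw an N \times K matrix X with i.i.d. complex Gaussian entries and set \rho = X X^\dagger / \Tr(X X^\dagger). Then \rank(\rho) = K (almost surely, for K \le N). If K = 1, \rho is a Haar-random pure state. If K = N, we recover the Hilbert-Schmidt prior.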
NB: Choosing X to be real gives Ginibre over redits.
import qinfer as qi
basis = qi.tomography.pauli_basis(1)
prior = qi.tomography.GinibreReditDistribution(basis)
qi.tomography.plot_rebit_prior(prior, rebit_axes=[1, 3])
What makes \Gini(N, K) uninformative? \expect[\rho] = \id / N. The mean is the maximally-mixed state.
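A numpy sketch of Ginibre sampling, assuming the standard construction \rho = X X^\dagger / \Tr(X X^\dagger) with X an N \times K Gaussian matrix (real entries for the redit variant). Averaging many samples illustrates that the mean is the maximally mixed state \id/N, and a single sample has rank K.

```python
import numpy as np

def ginibre_state(N, K, rng, real=False):
    """Draw rho ~ Ginibre(N, K): rho = X X^dagger / Tr(X X^dagger), X an N x K Gaussian matrix.

    With real=True, X has real entries (the "redit" variant).
    """
    X = rng.normal(size=(N, K))
    if not real:
        X = X + 1j * rng.normal(size=(N, K))
    rho = X @ X.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
N, K = 3, 2
samples = [ginibre_state(N, K, rng) for _ in range(20000)]
mean_state = np.mean(samples, axis=0)
print(np.round(mean_state, 2))                        # close to I / 3
print(np.linalg.matrix_rank(samples[0], tol=1e-8))   # K, with probability 1
```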
How do we add information to the prior, specifically the prior estimate state \rho_\mu?
Big idea: Use an ensemble of amplitude damping channels to transform a uniform prior.
Let \phi be a fiducial prior. Then, for scalars \alpha,\beta and a state \rho_*, draw \rho_{\text{sample}} by:
NB: \supp \pi \supseteq \supp \phi
Choose \rho_* such that \expect_{\rho\sim\pi}[\rho] = \rho_\mu:
\rho_* = \left(\frac{\alpha + \beta}{\alpha}\right) \left( \rho_\mu - \frac{\beta}{\alpha+\beta} \frac{\id}{N} \right)
Choose \alpha,\beta such that \expect[\epsilon] = \alpha / (\alpha + \beta) is minimized while \rho_* remains a valid state (\rho_* \ge 0): \alpha = 1 \qquad \beta = \frac{N \lambda_\min}{1 - N \lambda_\min}, where \lambda_\min is the smallest eigenvalue of \rho_\mu.
That is, we contract the fiducial prior as little as possible to obtain the desired mean.
This construction preserves the support of our “flat” prior, takes \rho_\mu as an input and can be easily sampled.
The prior inherits other assumptions by convexity (e.g., rebit, CPTP, positivity).
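A numpy sketch of this construction, not QInfer's implementation. It assumes the sampling step \rho_{\text{sample}} = (1 - \epsilon)\rho + \epsilon\rho_* with \rho \sim \phi (a Ginibre fiducial prior) and \epsilon \sim \text{Beta}(\alpha, \beta), which is the form consistent with the \rho_* formula above, since \expect[\epsilon] = \alpha/(\alpha+\beta). Here \beta is taken as N\lambda_\min/(1 - N\lambda_\min), the largest value keeping \rho_* positive semidefinite. The prior mean matches the QInfer example.

```python
import numpy as np

def ginibre_state(N, K, rng):
    """Fiducial prior phi: Ginibre(N, K) state X X^dagger / Tr(X X^dagger)."""
    X = rng.normal(size=(N, K)) + 1j * rng.normal(size=(N, K))
    rho = X @ X.conj().T
    return rho / np.trace(rho).real

def gadfli_parameters(rho_mu):
    """alpha = 1 and the largest beta keeping rho_* positive semidefinite.

    Assumes rho_mu is full rank and not maximally mixed (0 < lambda_min < 1/N).
    """
    N = rho_mu.shape[0]
    lam_min = np.linalg.eigvalsh(rho_mu)[0]
    alpha = 1.0
    beta = N * lam_min / (1 - N * lam_min)
    rho_star = (alpha + beta) / alpha * (rho_mu - beta / (alpha + beta) * np.eye(N) / N)
    return alpha, beta, rho_star

def gadfli_sample(rho_mu, rng, K=None):
    """Draw rho ~ phi, eps ~ Beta(alpha, beta), and return (1 - eps) rho + eps rho_*."""
    N = rho_mu.shape[0]
    alpha, beta, rho_star = gadfli_parameters(rho_mu)
    rho = ginibre_state(N, N if K is None else K, rng)
    eps = rng.beta(alpha, beta)
    return (1 - eps) * rho + eps * rho_star

rng = np.random.default_rng(1)
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
rho_mu = (I2 + (2 / 3) * Z + (1 / 3) * X) / 2   # same prior mean as the QInfer example

alpha, beta, rho_star = gadfli_parameters(rho_mu)
samples = [gadfli_sample(rho_mu, rng) for _ in range(20000)]
print(beta, np.linalg.eigvalsh(rho_star))
print(np.round(np.mean(samples, axis=0), 3))    # close to rho_mu
```

For a qubit, this choice of \beta makes \rho_* pure, i.e. it sits on the boundary of the state space.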
import qinfer as qi
import qutip as qt
I, X, Y, Z = qt.qeye(2), qt.sigmax(), qt.sigmay(), qt.sigmaz()
prior_mean = (I + (2/3) * Z + (1/3) * X) / 2  # Bloch vector (1/3, 0, 2/3)
basis = qi.tomography.pauli_basis(1)
fiducial_prior = qi.tomography.GinibreReditDistribution(basis)
prior = qi.tomography.GADFLIDistribution(fiducial_prior, prior_mean)
QInfer's tomography support is backed by QuTiP.
Posterior covariance characterizes uncertainty.
Principal channels tell us which components dominate our uncertainty.
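A numpy sketch of how dominant components can be read off a posterior covariance. We interpret "principal channels" as the eigenvectors of the posterior covariance (an assumption on our part), use hypothetical particles whose spread is deliberately largest along X, and map the top eigenvector back to a Hermitian operator in the traceless normalized Pauli basis.

```python
import numpy as np

rng = np.random.default_rng(3)

# Traceless part of the normalized Pauli basis: (X, Y, Z) / sqrt(2).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
traceless_basis = [P / np.sqrt(2) for P in (X, Y, Z)]

# Hypothetical posterior particles, as coordinates on that basis.
n = 4000
coords = rng.normal(size=(n, 3)) * np.array([0.10, 0.02, 0.04])
weights = np.full(n, 1.0 / n)

mean = weights @ coords
centered = coords - mean
Sigma = (weights[:, None] * centered).T @ centered   # posterior covariance

# Eigen-decomposition, sorted from largest to smallest variance.
variances, vectors = np.linalg.eigh(Sigma)
order = np.argsort(variances)[::-1]
variances, vectors = variances[order], vectors[:, order]

# The dominant direction, mapped back to a Hermitian operator.
top_direction = sum(c * B for c, B in zip(vectors[:, 0], traceless_basis))
print(variances)
print(np.round(top_direction, 3))
```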
(a quick tutorial)
All models are wrong, some are useful. —Chris Ferrie
We've seen how to create bases and priors; to finish, we need a model, an updater, and a heuristic.
model = qi.BinomialModel(qi.tomography.TomographyModel(basis))
The sequential Monte Carlo updater performs Bayes updates using particle filtering.
updater = qi.smc.SMCUpdater(model, 2000, prior)
heuristic = qi.tomography.RandomPauliHeuristic(
    updater, other_fields={'n_meas': 40}
)
This heuristic measures random Paulis 40 times each.
# Simulated "true" state for this demo, drawn from the prior.
true_state = prior.sample()
for idx_exp in range(50):
    experiment = heuristic()
    # This is where your data goes! 💗
    # For now, we'll simulate. 💔
    datum = model.simulate_experiment(true_state, experiment)
    updater.update(datum, experiment)
Interlace Bayesian updates with a diffusion step, \rho(t_{k+1}) = \rho(t_k) + \Delta \eta, and truncation back to a valid state.
Draw each traceless parameter of the diffusion step \Delta \eta from a Gaussian.
(Isard and Blake 10/cc76f6)
We don't need to assume a particular diffusion rate. Include as model parameter \eta, such that \begin{equation} \sigma = \sqrt{t_{k + 1} - t_k} \eta. \end{equation}
Bayesian updates on the state then “co-evolve” \eta to learn diffusion rate.
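A numpy sketch of one diffusion step under assumptions we state explicitly: qubit states in the normalized Pauli basis, Gaussian kicks with standard deviation \sigma = \sqrt{t_{k+1} - t_k}\,\eta on each traceless parameter, and "truncation" read as clipping negative eigenvalues and renormalizing. Each hypothetical particle carries its own \eta, so reweighting particles on data (not shown) also reweights \eta.

```python
import numpy as np

rng = np.random.default_rng(5)

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
traceless_basis = [P / np.sqrt(2) for P in (X, Y, Z)]

def truncate(rho):
    """Map a Hermitian, trace-one matrix to a valid state by clipping negative eigenvalues."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, 0.0, None)
    rho = (vecs * vals) @ vecs.conj().T
    return rho / np.trace(rho).real

def diffuse(rho, eta, dt, rng):
    """One diffusion step: Gaussian kick of std sigma = sqrt(dt) * eta on each traceless parameter."""
    sigma = np.sqrt(dt) * eta
    kicks = rng.normal(0.0, sigma, size=len(traceless_basis))
    step = sum(k * B for k, B in zip(kicks, traceless_basis))
    return truncate(rho + step)

# Each particle is a (state, eta) pair; eta starts from a hypothetical uniform prior.
etas = rng.uniform(0.01, 0.5, size=100)
particles = [((I2 + 0.5 * Z) / 2, eta) for eta in etas]

# Between Bayes updates, every particle takes one diffusion step.
particles = [(diffuse(rho, eta, dt=0.1, rng=rng), eta) for rho, eta in particles]
```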
We don't need to assume that the "true" state follows a random walk.