Rejection and Particle Filtering for Hamiltonian Learning
Cassandra E. Granade
Centre for Engineered Quantum Systems
$
\newcommand{\ee}{\mathrm{e}}
\newcommand{\ii}{\mathrm{i}}
\newcommand{\dd}{\mathrm{d}}
\newcommand{\id}{𝟙}
\newcommand{\TT}{\mathrm{T}}
\newcommand{\defeq}{\mathrel{:=}}
\newcommand{\Tr}{\operatorname{Tr}}
\newcommand{\Var}{\operatorname{Var}}
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\rank}{\operatorname{rank}}
\newcommand{\expect}{\mathbb{E}}
\newcommand{\sket}[1]{|#1\rangle\negthinspace\rangle}
\newcommand{\sbraket}[1]{\langle\negthinspace\langle#1\rangle\negthinspace\rangle}
\newcommand{\Gini}{\operatorname{Ginibre}}
\newcommand{\supp}{\operatorname{supp}}
\newcommand{\ket}[1]{\left|#1\right\rangle}
\newcommand{\bra}[1]{\left\langle#1\right|}
\newcommand{\braket}[1]{\left\langle#1\right\rangle}
$
Learning Hamiltonians is critical to a range of QIP tasks:
- Metrology
  - Learning magnetic fields, etc.
- Calibration
  - Static field / pulse power / crosstalk, etc.
- Debugging/Diagnosis
  - $T_2$ estimation, other noise finding
- Verification/Validation
  - Analog and digital quantum simulation
### **Example**: Ramsey Estimation ###
Suppose $H = \omega \sigma_z / 2$ for some unknown $\omega$.
Traditional approach:
- Prepare $\ket{+} \propto \ket{0} + \ket{1}$, measure “click” w/ pr.:
$
\left|\bra{+} \ee^{\ii \omega t \sigma_z / 2} \ket{+}\right|^2 = \cos^2(\omega t / 2)
$.
- Repeat for many “shots” to estimate click pr.
- Repeat for many times to estimate signal.
You'll get something that looks a bit like this:
What's $\omega$? Fourier transform and look at the peak.
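As a rough illustration of this traditional pipeline, the sketch below simulates binomially distributed Ramsey “clicks” on a grid of times and reads $\omega$ off the FFT peak (all parameter values are illustrative):
```python
import numpy as np

# Illustrative values only: a hypothetical qubit with unknown frequency.
true_omega = 2 * np.pi * 0.7          # rad per unit time
ts = np.arange(1, 201) * 0.05         # evenly spaced evolution times
n_shots = 100                         # repetitions ("shots") per time

rng = np.random.default_rng(0)
pr_click = np.cos(true_omega * ts / 2) ** 2
signal = rng.binomial(n_shots, pr_click) / n_shots   # estimated click pr. at each t

# Fourier transform the mean-subtracted signal and read off the peak.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(len(ts), d=ts[1] - ts[0])    # cycles per unit time
omega_est = 2 * np.pi * freqs[np.argmax(spectrum)]
print(omega_est)   # close to true_omega; resolution limited by the total evolution time
```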
We can do better.
# $H = H(\vec{x})$. #
Hamiltonian learning is a special case of *parameter estimation*:
given data $D$, what is $\vec{x}$?
---
We want an approach that can work for small and large quantum devices
alike.
To get there, we consider several different approaches to
parameter estimation.
- Analytic (Sergeevich et al. c4vv95; Ferrie, Granade, and Cory tfx)
- Rejection filter (Wiebe and Granade bk9d)
- Particle filter (Doucet et al. bmch)
- Likelihood-free particle filter (Ferrie and Granade tdj)
### The Likelihood Function ###
$\left|\bra{+} \ee^{\ii \omega t \sigma_z / 2} \ket{+}\right|^2$
defines probability $\Pr(d | \omega; t)$ for every
outcome $d$, model $\omega$ and experiment $t$.
Basis for both maximum-likelihood and Bayesian methods.
### Bayesian Parameter Estimation ###
The likelihood tells us what we learn from data:
$$
\Pr(\vec{x} | d; e) = \frac{\Pr(d | \vec{x}; e)}{\Pr(d | e)} \Pr(\vec{x})
$$
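Numerically, this update is just multiply-and-renormalize over a discretized model space; a minimal sketch on a grid of candidate $\omega$ values for the Ramsey likelihood (grid size and prior are illustrative):
```python
import numpy as np

omegas = np.linspace(0, 2 * np.pi, 2000)      # candidate values of omega
prior = np.ones_like(omegas) / len(omegas)    # flat prior over the grid

def bayes_update(prior, d, t):
    """Pr(omega | d; t) ∝ Pr(d | omega; t) Pr(omega) for the Ramsey likelihood."""
    pr_click = np.cos(omegas * t / 2) ** 2
    likelihood = pr_click if d == 1 else 1 - pr_click
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()  # the division implements Pr(d | e)

posterior = bayes_update(prior, d=1, t=1.0)
omega_hat = np.sum(omegas * posterior)        # posterior-mean estimate
```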
---
Estimate $\hat{x} = \expect[\vec{x} | d; e] = \int \vec{x} \Pr(\vec{x} | d; e)\dd \vec{x}$.
- **Optimal** for mean-squared error.
Each posterior $\Pr(\vec{x} | d; e)$ encodes our uncertainty about
$\vec{x}$. What should we expect the squared error to be?
**Scalar case**:
$
\expect[(x - \hat{x})^2 | d; e] =
\Var[x \mid d; e].
$
**Vector case**: $\expect[(\vec{x} - \hat{x}) (\vec{x} - \hat{x})^\TT | d; e]
= \Cov(\vec{x} | d; e)$.
We can use our posteriors to make *adaptive* decisions:
$$
e_* = \operatorname{arg min}_e \expect_d[\Var(x | d; e)]
$$
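One way to realize this numerically: scan candidate experiments on the same sort of grid and pick the one minimizing the expected posterior variance (a sketch; candidate times and grid are illustrative):
```python
import numpy as np

omegas = np.linspace(0, 2 * np.pi, 2000)
prior = np.ones_like(omegas) / len(omegas)

def expected_variance(prior, t):
    """E_d[Var(omega | d; t)] under the Ramsey likelihood, on a grid."""
    pr_click = np.cos(omegas * t / 2) ** 2
    risk = 0.0
    for lik in (pr_click, 1 - pr_click):      # d = 1 ("click") and d = 0
        pr_d = np.sum(lik * prior)            # Pr(d | t)
        post = lik * prior / pr_d             # Pr(omega | d; t)
        mean = np.sum(omegas * post)
        risk += pr_d * np.sum((omegas - mean) ** 2 * post)
    return risk

candidate_ts = np.linspace(0.1, 10.0, 100)
t_star = min(candidate_ts, key=lambda t: expected_variance(prior, t))
```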
### **Example**: $x = (\omega)$ ###
Can analytically find posterior for Gaussian priors,
use to adaptively choose $t_k$.
![](./figures/how-t2.png)
**Problem**: may be intractable to analytically compute $$
\hat{x} \defeq
\int \vec{x} \Pr(\vec{x} | d; e) \dd\vec{x} =
\int \vec{x} \frac{
\Pr(d | \vec{x}; e)
}{
\int \Pr(d | \vec{x}'; e) \Pr(\vec{x}') \dd\vec{x}'
} \Pr(\vec{x}) \dd\vec{x}.
$$
---
**Answer**: numerically approximate $\int f(\vec{x}) \Pr(\vec{x} | d)\dd\vec{x}$.
*Efficient* to use Monte Carlo integration if we can sample from $\Pr(\vec{x} | d)$.
## Rejection Sampling ##
Given samples from $\Pr(\vec{x})$ and likelihood
function $\Pr(d | \vec{x}; e)$, how do we sample
from posterior for datum $d$?
- Draw $\vec{x} \sim \Pr(\vec{x})$,
accept $\vec{x}$ w/ pr. $\Pr(d | \vec{x}; e)$.
Accepted samples are distributed according to posterior.
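A minimal sketch of this accept/reject step, using the Ramsey likelihood and a uniform prior as a stand-in model:
```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_samples(sample_prior, likelihood, d, n_samples):
    """Accept prior samples with probability Pr(d | x); keep the accepted ones."""
    accepted = []
    while len(accepted) < n_samples:
        x = sample_prior()
        if rng.random() < likelihood(d, x):
            accepted.append(x)
    return np.array(accepted)

# Stand-in model: Ramsey likelihood at t = 1, omega ~ Uniform(0, 2 pi).
likelihood = lambda d, omega: np.cos(omega / 2) ** 2 if d == 1 else np.sin(omega / 2) ** 2
samples = posterior_samples(lambda: rng.uniform(0, 2 * np.pi), likelihood,
                            d=1, n_samples=1000)
```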
### Rejection Sampling Isn't Enough ###
Let $D = \{d_1, \dots, d_k\}$ be a set of data.
$$
\Pr(\text{accept} | \vec{x}) = \Pr(D | \vec{x}) = \prod_{d \in D} \Pr(d | \vec{x})
\overset{k \to \infty}{\longrightarrow} 0.
$$
### **Example**: Biased Coin, $x = (p)$ ###
$\Pr(H | p) = p$, $d \in \{H, T\}$.
$p \approx 0.5 \Longrightarrow \Pr(d_1, \dots, d_k | p) \approx 1 / 2^k$.
We will accept exponentially few samples!
### Gaussian Resampling ###
For each datum $d$, use rejection sampling to approximate posterior
moments:
- $\bar{x} \gets \expect[\vec{x} | d]$.
- $\Sigma \gets \operatorname{Cov}[\vec{x} | d] = \expect[\vec{x} \vec{x}^\TT | d] - \bar{x} \bar{x}^\TT$.
---
At the next iteration, draw prior samples from Gaussian with these moments:
$$
\vec{x} \mid d \sim \mathcal{N}(\bar{x}, \Sigma)
$$
Keeps $\Pr(\text{accept}) \approx \text{constant}$.
Can compute $\bar{x}$, $\Sigma$ from
one sample at a time by accumulating
$$
x_{\Sigma} = \sum x
\text{ and }
x^2_{\Sigma} = \sum x^2.
$$
\begin{align}
\bar{x} & = x_{\Sigma} / n_{\text{accept}} \\
\Sigma & = x^2_{\Sigma} / n_{\text{accept}} - \bar{x}^2.
\end{align}
Welford's algorithm: numerically-stable modification.
### Rejection Filtering (RejF) ###
**Input**: prior mean $\bar{x}$, prior covariance $\Sigma$,
number of attempts $m$.
- For each datum $d$ and experiment $e$:
  - $n, \bar{x}', M_2 \gets 0$
  - While $n < m$:
    - Draw $\vec{x} \sim \mathcal{N}(\bar{x}, \Sigma)$.
    - Accept $\vec{x}$ w/ pr. $\Pr(d | \vec{x}; e)$.
    - If accepted, update $n$, $\bar{x}'$, $M_2$.
  - $\bar{x} \gets \bar{x}'$, $\Sigma \gets M_2 / (n - 1)$.
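A minimal sketch of one RejF update, following the listing above and accumulating accepted samples with Welford-style updates (the example model and settings are illustrative):
```python
import numpy as np

def rejf_update(mean, cov, d, e, likelihood, m, rng):
    """One RejF update: Gaussian prior (mean, cov) -> Gaussian posterior approximation."""
    dim = len(mean)
    n, new_mean, m2 = 0, np.zeros(dim), np.zeros((dim, dim))
    while n < m:                                    # n counts accepted samples, per the listing
        x = rng.multivariate_normal(mean, cov)
        if rng.random() < likelihood(d, x, e):      # accept w/ pr. Pr(d | x; e)
            n += 1
            delta = x - new_mean                    # Welford-style accumulation
            new_mean = new_mean + delta / n
            m2 = m2 + np.outer(delta, x - new_mean)
    return new_mean, m2 / (n - 1)

# Stand-in model: learn omega from Ramsey data (illustrative settings).
rng = np.random.default_rng(0)
lik = lambda d, x, t: np.cos(x[0] * t / 2) ** 2 if d == 1 else np.sin(x[0] * t / 2) ** 2
mean, cov = np.array([np.pi]), np.array([[1.0]])
mean, cov = rejf_update(mean, cov, d=1, e=1.0, likelihood=lik, m=200, rng=rng)
```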
## Advantages of RejF ##
- Acceptance pr. is based on a single datum
⇒ likelihood $\not\to 0$.
- Easy to implement
- Never needed to remember each accepted $x$!
- Very low-memory (constant # of
registers), ideal for FPGA use.
- Easily parallelizable
- “Reset rule” can abort from approximation failures.
### **Example**: Phase Estimation, $x = (\phi)$ ###
Prepare state $\ket{\phi}$ s.t. $U\ket{\phi} = \ee^{\ii \phi}\ket{\phi}$,
measure to learn $\phi$.
Setting $\theta = 0$ recovers the freq. est. likelihood, w/ $\phi = \omega$, $M = t$.
**Applications**:
- Interferometry / metrology (Higgins et al. crwd6w)
- Gate calibration / robust PE (Kimmel et al. bmrg)
- Quantum simulation and chemistry (Reiher et al. 1605.03590)
### **Example**: Phase Estimation, $x = (\phi)$ ###
**Drawback**: RejF requires posterior after each datum
to be $\approx$ Gaussian.
We can solve this by using a more general approach:
- Weaken Gaussian assumption.
- Generalize the rejection sampling step.
### Liu-West Resampler ###
If we remember each sample $\vec{x}$, we can use them to
relax RejF assumptions.
**Input**: $a, h \in [0, 1]$ s.t. $a^2 + h^2 = 1$,
distribution $p(\vec{x})$.
- Approximate $\bar{x} \gets \expect[\vec{x}]$, $\Sigma \gets \operatorname{Cov}(\vec{x})$.
- Draw a parent $\vec{x}$ from $p(\vec{x})$.
- Draw $\vec{\epsilon} \sim \mathcal{N}(0, \Sigma)$.
- Return the new sample $\vec{x}' \gets a \vec{x} + (1 - a) \bar{x} + h \vec{\epsilon}$.
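A minimal sketch for equally weighted samples (a weighted version would draw parents according to the weights and use the weighted mean and covariance):
```python
import numpy as np

def liu_west_resample(samples, a=0.98, rng=None):
    """Resample while preserving the mean and covariance (h^2 = 1 - a^2)."""
    rng = np.random.default_rng() if rng is None else rng
    h = np.sqrt(1 - a ** 2)
    mean = samples.mean(axis=0)
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    parents = samples[rng.integers(len(samples), size=len(samples))]
    eps = rng.multivariate_normal(np.zeros(samples.shape[1]), cov, size=len(samples))
    return a * parents + (1 - a) * mean + h * eps

# Example: resample draws from a skewed distribution (illustrative).
rng = np.random.default_rng(0)
samples = rng.gamma(2.0, size=(1000, 1))
new_samples = liu_west_resample(samples, a=0.98, rng=rng)
```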
### Why Does Liu-West Work? ###
\begin{align}
\vec{x}' & \gets a \vec{x} + (1 - a) \bar{x} + h \vec{\epsilon} \\\\
\expect[\vec{x}'] & = [a + (1 - a)] \bar{x} \\\\
\Cov(\vec{x}') & = (a^2 + h^2) \Cov(\vec{x}). \\\\
\Longrightarrow a^2 + h^2 & = 1 \text{ preserves } \expect[\vec{x}], \Cov(\vec{x}).
\end{align}
---
Mixes the original approximation with a fraction $1 - a$ of a
moment-matched Gaussian.
- $a \to 0$: RejF (assumed density) approx
- $a \to 1$: Bootstrap
- $a = 0.98$: typical case (2% Gaussian).
### From Samples to Particles ###
Define a particle $(w_i, \vec{x}_i)$ as
a sample $\vec{x}_i$ and a weight $w_i \in [0, 1]$.
- $\expect[\vec{x}] = \sum_i w_i \vec{x}_i$
- $\Cov(\vec{x}) =
\sum_i w_i \vec{x}_i \vec{x}_i^\TT - \expect[\vec{x}]\expect[\vec{x}]^\TT$
- $\expect[f(\vec{x})] = \sum_i w_i f(\vec{x}_i)$
---
Corresponds to
$
p(\vec{x}) \approx \sum_i w_i \delta(\vec{x} - \vec{x}_i).
$
Particles can represent distributions using either weights or positions.
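In code, the moment formulas above reduce to weighted sums over the particle arrays; a minimal sketch:
```python
import numpy as np

def particle_mean(weights, locations):
    """E[x] = sum_i w_i x_i."""
    return weights @ locations

def particle_cov(weights, locations):
    """Cov(x) = sum_i w_i x_i x_i^T - E[x] E[x]^T."""
    mean = particle_mean(weights, locations)
    return (weights[:, None] * locations).T @ locations - np.outer(mean, mean)

# Example: 1000 particles in two dimensions with uniform weights.
rng = np.random.default_rng(0)
locations = rng.normal(size=(1000, 2))
weights = np.full(1000, 1 / 1000)
print(particle_mean(weights, locations))
print(particle_cov(weights, locations))
```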
### Particle Filter ###
- Draw $N$ initial samples $\vec{x}_i$ from the prior $\Pr(\vec{x})$
  w/ uniform weights.
- Instead of rej. sampling, update weights by
  \begin{align}
    \tilde{w}_i & = w_i \times \Pr(d | \vec{x}_i; e)
  \end{align}
- Renormalize:
  \begin{align}
    w_i & \mapsto \tilde{w}_i / \sum_i \tilde{w}_i
  \end{align}
- Periodically use Liu-West to draw new $\{\vec{x}_i\}$
  with uniform weights.
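A minimal sketch of one such update; the effective-sample-size trigger for the Liu-West resampling step is a common convention, not something specified above:
```python
import numpy as np

def pf_update(weights, locations, d, e, likelihood, rng, a=0.98):
    """One particle-filter step: reweight, renormalize, Liu-West resample if needed."""
    weights = weights * likelihood(d, locations, e)          # w_i <- w_i * Pr(d | x_i; e)
    weights = weights / weights.sum()                        # renormalize
    ess = 1 / np.sum(weights ** 2)                           # effective sample size
    if ess < len(weights) / 2:                               # common resampling trigger
        mean = weights @ locations
        cov = np.atleast_2d(np.cov(locations, rowvar=False, aweights=weights))
        parents = locations[rng.choice(len(weights), size=len(weights), p=weights)]
        eps = rng.multivariate_normal(np.zeros(locations.shape[1]), cov, size=len(weights))
        locations = a * parents + (1 - a) * mean + np.sqrt(1 - a ** 2) * eps
        weights = np.full(len(weights), 1 / len(weights))
    return weights, locations

# Stand-in model: Ramsey likelihood, vectorized over particles.
rng = np.random.default_rng(0)
lik = lambda d, xs, t: np.cos(xs[:, 0] * t / 2) ** 2 if d == 1 else np.sin(xs[:, 0] * t / 2) ** 2
locations = rng.uniform(0, 2 * np.pi, size=(2000, 1))
weights = np.full(2000, 1 / 2000)
weights, locations = pf_update(weights, locations, d=1, e=1.0, likelihood=lik, rng=rng)
```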
Useful for Hamiltonian models...
- Rabi/Ramsey/Phase est. (Granade et al. s87)
- Swap spectroscopy (Stenberg et al. 7nw)

...as well as other QIP tasks.
- Tomography (Huszár and Houlsby s86; Struchalin et al. bmg5; Ferrie 7nt; Granade et al. bhdw, 1605.05039)
- Randomized benchmarking (Granade et al. zmz)
- Continuous measurement (Chase and Geremia chk4q7)
- Interferometry/metrology (Granade 10012/9217)
### **Example**: $\vec{x} = (\omega, T_2)$ ###
![](figures/unknown-t2.png)
### **Example**: $\vec{x} =$ COSY / Crosstalk ###
$$
H(\omega_1, \omega_2, J) = \frac12 \left(
\omega_1 \sigma_z \otimes \id +
\omega_2 \id \otimes \sigma_z +
J \sigma_z \otimes \sigma_z
\right)
$$
### **Example**: $\vec{x} =$ COSY / Crosstalk ###
### Error Bars ###
Particle filters also yield credible regions $X_\alpha$ s.t.
$$
\Pr(\vec{x} \in X_\alpha | d; e) \ge \alpha.
$$
E.g.: posterior covariance ellipse, convex hull, minimum-volume enclosing ellipsoid (MVEE).
#### **QInfer**: Particle Filter Implementation for Quantum Info ####
We provide an open-source implementation for use in experiment and theory alike.
Written in Python, works with MATLAB and Julia.
### Assessing Performance ###
- Risk: average error for a given (fixed) $\vec{x}$.
- Bayes risk: risk averaged over the prior on $\vec{x}$.
---
- Exact $\hat{x}$ is optimal for Bayes risk / squared error.
- Difficult to *prove* bounds on particle filter risk; find it in special
cases or by running many trials.
### Simulation Costs ###
\begin{align}
\tilde{w}_i & = w_i \times \color{red}{\Pr(d | \vec{x}_i; e)} \\
w_i & \mapsto \tilde{w}_i / \sum_i \tilde{w}_i
\end{align}
- $\Pr(d | \vec{x}_i; e)$ is a *simulation*.
- Need to
simulate for each particle (~1000s calls/datum).
- Infeasible for large quantum models.
Let's do better: use *quantum* simulation instead.
### Two Kinds of Simulation ###
![](figures/simulators.png)
Weak simulators produce *plausible datasets*.
### Likelihood-Free RejF ###
Replace the rejection sampling step by drawing a
datum from the likelihood instead of computing its
exact value:
- Draw datum $d'$ from $\Pr(d | \vec{x}; e)$.
- Accept $\vec{x}$ if $d = d'$.
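A minimal sketch, assuming a weak simulator `simulate(x, e)` that returns a single plausible datum (a hypothetical name, stubbed here with the Ramsey model):
```python
import numpy as np

rng = np.random.default_rng(0)

def lf_reject_samples(sample_prior, simulate, d, e, n_accept):
    """Accept x whenever a weakly simulated datum d' matches the observed d."""
    accepted = []
    while len(accepted) < n_accept:
        x = sample_prior()
        if simulate(x, e) == d:        # accept iff d' = d
            accepted.append(x)
    return np.array(accepted)

# Stand-in weak simulator: one Ramsey "shot" at evolution time e.
simulate = lambda omega, t: int(rng.random() < np.cos(omega * t / 2) ** 2)
samples = lf_reject_samples(lambda: rng.uniform(0, 2 * np.pi), simulate,
                            d=1, e=1.0, n_accept=500)
```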
### Likelihood-Free Particle Filtering ###
Can also use frequencies $f$ from weak simulation to
approximate likelihoods
in particle filtering.
$$
\widehat{\Pr}(d | \vec{x}; e) =
\frac{f(d | \vec{x}; e)}{k}
$$
$k$: number of weak simulation calls used.
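In code, the estimator is just a frequency over $k$ weak-simulation calls; a minimal sketch reusing the same hypothetical `simulate(x, e)`:
```python
import numpy as np

rng = np.random.default_rng(0)
simulate = lambda omega, t: int(rng.random() < np.cos(omega * t / 2) ** 2)  # stand-in weak simulator

def lf_likelihood(simulate, d, x, e, k):
    """Estimate Pr(d | x; e) as the fraction of k weak-simulation calls returning d."""
    return sum(simulate(x, e) == d for _ in range(k)) / k

print(lf_likelihood(simulate, d=1, x=0.3, e=1.0, k=400))   # approx cos^2(0.15)
```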
#### **Example**: Noisy Coin ####
How well can we learn the bias $x = (p)$ of a noisy coin?
$$
\Pr(\text{click} | p) = 0.95 p + 0.1 (1 - p)
$$
### Quantum Hamiltonian Learning ###
![](figures/qle.png)
### **Example**: Nearest-Neighbor 1D Ising ###
![](figures/qle-line-ising.png)
We can do more with access to a *trusted*
simulator.
### Quantum Interactivity ###
![](figures/iqle.png)
Perform weak simulations on trusted device only.
We design experiments using the PGH (particle guess heuristic):
- Draw $\vec{x}_-, \vec{x}_-'$ from the current
  posterior.
- Let $t = 1 / |\vec{x}_- - \vec{x}_-'|$.
- Return $e = (\vec{x}_-, t)$.
Adaptively chooses experiments such that
$t |\vec{x}_- - \vec{x}_-'| \approx\,$ constant.
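A minimal sketch of the PGH over a particle approximation of the posterior (two distinct particles are drawn so the separation is nonzero):
```python
import numpy as np

def particle_guess_heuristic(weights, locations, rng):
    """Return an experiment (x_-, t) with t = 1 / |x_- - x_-'|."""
    i, j = rng.choice(len(weights), size=2, p=weights, replace=False)  # two distinct particles
    x_minus, x_prime = locations[i], locations[j]
    t = 1 / np.linalg.norm(x_minus - x_prime)
    return x_minus, t

# Example: particles approximating a posterior over (omega,) (illustrative).
rng = np.random.default_rng(0)
locations = rng.normal(1.0, 0.1, size=(1000, 1))
weights = np.full(1000, 1 / 1000)
x_minus, t = particle_guess_heuristic(weights, locations, rng)
```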
### **Example**: Ising on Complete Graph ###
![](figures/iqle-complete-ising.png)
Robust even to wrong model. ($0.5$ NN + $10^{-4}$ Complete)
![](figures/qhl-bad-model.png)
One important approximation: physical locality.
Approximation quality can be bounded if
Lieb-Robinson velocity is finite.
Scan trusted device across untrusted.
Run particle filter only on supported
parameters.
50-qubit Ising chain, 8-qubit simulator, 4-qubit observable.
### Going Further ###
- Hyperparameterization (Granade et al. s87)
  - $\Pr(d | y) = \expect_x[\Pr(d | x) \Pr(x | y)]$.
    Allows composing w/ noise, inhomogeneity, etc.
- Time-dependence (Isard and Blake cc76f6)
  - Adding a timestep update allows learning stochastic processes.
- Model selection (Ferrie 7nt)
  - Using the acceptance ratio or normalizations enables comparing models.
- Quantum filtering (Wiebe and Granade 1512.03145)
  - Rejection filtering can be quantized using the approach of Harrow et al. (bcz3hc).
### Welford's Algorithm ###
Can compute $\bar{x}$, $\Sigma$ from
one sample at a time. Numerically stable.
- $n, \bar{x}, M_2 \gets 0$.
- For each sample $x$:
  - $n \gets n + 1$
  - $\Delta \gets x - \bar{x}$
  - $\bar{x} \gets \bar{x} + \Delta / n$
  - $M_2 \gets M_2 + \Delta (x - \bar{x})$
- Return mean $\bar{x}$, variance $M_2 / (n - 1)$.
Vector case is similar.
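A minimal sketch of the scalar updates above, together with the analogous vector/covariance accumulation:
```python
import numpy as np

def welford(samples):
    """Streaming mean and unbiased variance of scalar samples."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return mean, m2 / (n - 1)

def welford_vector(samples):
    """Streaming mean vector and covariance matrix (the vector case)."""
    samples = np.asarray(samples, dtype=float)
    dim = samples.shape[1]
    n, mean, m2 = 0, np.zeros(dim), np.zeros((dim, dim))
    for x in samples:
        n += 1
        delta = x - mean
        mean = mean + delta / n
        m2 = m2 + np.outer(delta, x - mean)
    return mean, m2 / (n - 1)

# Example: agrees with the batch formulas up to floating-point error.
rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 2))
mean, cov = welford_vector(data)     # compare with data.mean(0), np.cov(data.T)
```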