What is a Hamiltonian?
Lots of meanings and applications in condensed matter, quantum info, etc.
In this talk, we consider precisely one:
$$
\ket{\psi(t)} = \ee^{\ii H t} \ket{\psi(0)}
$$
A Hamiltonian generates dynamics.
Learning Hamiltonians is critical to a range of tasks:
- Metrology
  - Learning magnetic fields, etc.
- Calibration
  - Static field / pulse power / crosstalk, etc.
- Debugging/Diagnosis
  - $T_2$ estimation, other noise finding
- Verification/Validation
  - Analog and digital quantum simulation
### **Example**: Ramsey Estimation ###
Suppose $H = \omega \sigma_z / 2$ for some unknown $\omega$.
Traditional approach:
- Prepare $\ket{+} \propto \ket{0} + \ket{1}$, measure a “click” with probability
$
\left|\bra{+} \ee^{\ii \omega t \sigma_z / 2} \ket{+}\right|^2 = \cos^2(\omega t / 2)
$.
- Repeat for many “shots” to estimate the click probability.
- Repeat at many times $t$ to estimate the signal.
You'll get something that looks a bit like this:
What's $\omega$? Fourier transform and look at the peak.
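As a purely illustrative sketch (not from the talk), the traditional pipeline can be mimicked numerically: sample binomial “clicks” from the $\cos^2(\omega t / 2)$ likelihood on a grid of times, then read $\omega$ off the peak of the Fourier transform. All numbers below are arbitrary choices.

```python
import numpy as np

# Hypothetical settings for illustration only.
omega_true = 2 * np.pi * 0.37     # the "unknown" frequency
ts = np.linspace(0.0, 50.0, 512)  # evenly spaced evolution times
shots = 200                       # repetitions per time

rng = np.random.default_rng(0)
p_click = np.cos(omega_true * ts / 2) ** 2
signal = rng.binomial(shots, p_click) / shots  # estimated click probability

# Fourier transform the mean-subtracted signal and locate the peak.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(len(ts), d=ts[1] - ts[0])
omega_est = 2 * np.pi * freqs[np.argmax(spectrum)]

print(f"true omega = {omega_true:.3f}, FFT estimate = {omega_est:.3f}")
```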
We can do better.
# $H = H(\vec{x})$. #
Hamiltonian learning is a special case of *parameter estimation*:
given data $D$, what is $\vec{x}$?
We want an approach that can work for small and large quantum devices
alike.
Punchline: our algorithm can
characterize 50-qubit devices.
\begin{align}
H & = \sum_{i = 1}^{50} \sum_{j = 1}^{50} \omega_{ij}
\sigma_z^{(i)} \sigma_z^{(j)}, \\
\omega_{ij} & \sim \operatorname{Uniform}\left(0, 10^{-2 (|i - j| - 1)}\right).
\end{align}
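For concreteness, here is one way such a coupling matrix could be drawn with NumPy; the variable names and the dense $50 \times 50$ representation are illustrative choices only.

```python
import numpy as np

n = 50
rng = np.random.default_rng(0)
i, j = np.indices((n, n)) + 1  # 1-based site labels i, j
# omega_ij ~ Uniform(0, 10^(-2(|i - j| - 1))): couplings fall off rapidly with distance.
upper = 10.0 ** (-2 * (np.abs(i - j) - 1))
omega = rng.uniform(0.0, upper)
```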
To get there, we consider several different approaches to
parameter estimation.
- Analytic (Sergeevich et al. c4vv95; Ferrie, Granade, and Cory tfx)
- Rejection filter (Wiebe and Granade bk9d)
- Particle filter (Doucet et al. bmch)
- Likelihood-free particle filter (Ferrie and Granade tdj)
- Quantum bootstrapping (Wiebe, Granade, and Cory 7nx)
### The Likelihood Function ###
$\left|\bra{+} \ee^{\ii \omega t \sigma_z / 2} \ket{+}\right|^2$
defines a probability $\Pr(d | \omega; t)$ for every
outcome $d$, model $\omega$, and experiment $t$.
Basis for both maximum-likelihood and Bayesian methods.
Aside: Why Bayesian?
Frequentist methods work. Bayesian methods also work.
Methodology should follow the question of interest.
Example
What should I believe about the properties of my system,
given my experience and a set of observations?
Bayesian question, hence Bayesian answer.
### Bayesian Parameter Estimation ###
The likelihood tells us what we learn from data:
$$
\Pr(\vec{x} | d; e) = \frac{\Pr(d | \vec{x}; e)}{\Pr(d | e)} \Pr(\vec{x})
$$
---
Estimate $\hat{x} = \expect[\vec{x} | d; e] = \int \vec{x} \Pr(\vec{x} | d; e)\dd \vec{x}$.
- **Optimal** for mean squared error.
Inference as an Iterative Algorithm
Input:
Prior $\Pr(\vec{x})$,
data set $D$,
likelihood $\Pr(d | \vec{x}; e)$
- $p(\vec{x}) \gets \Pr(\vec{x})$
- For each datum $d \in D$ and experiment $e$:
  - Update based on $d$:
    $p(\vec{x}) \gets \Pr(d | \vec{x}; e)\, p(\vec{x}) / \Pr(d | e)$.
- Return $\hat{x} = \int \vec{x}\,p(\vec{x})\,\dd\vec{x}$.
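As a concrete (illustrative, not from the talk) rendering of this loop, here is the update on a one-dimensional grid using the Ramsey likelihood $\Pr(1 | \omega; t) = \cos^2(\omega t / 2)$; the grid, schedule of times, and true frequency are arbitrary.

```python
import numpy as np

omegas = np.linspace(0.0, 1.0, 2000)        # discretized parameter space
p = np.ones_like(omegas) / len(omegas)      # uniform prior Pr(x)

def likelihood(d, omega, t):
    # Ramsey "click" likelihood Pr(d | omega; t).
    p1 = np.cos(omega * t / 2) ** 2
    return p1 if d == 1 else 1 - p1

rng = np.random.default_rng(1)
omega_true = 0.43
for t in [5.0, 11.0, 23.0, 47.0, 95.0]:     # a fixed experiment schedule
    d = int(rng.random() < np.cos(omega_true * t / 2) ** 2)
    p = likelihood(d, omegas, t) * p        # Bayes update (unnormalized)
    p /= p.sum()                            # divide by Pr(d | e)

omega_hat = np.sum(omegas * p)              # posterior-mean estimate
print(omega_hat)
```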
---
At each step, $p(\vec{x})$ also encodes our uncertainty.
$
\expect[(x - \hat{x})^2 | d; e] =
\mathbb{V}[x \mid d; e].
$
We can use our posteriors to make *adaptive* decisions:
$$
e_* = \operatorname{arg min}_e \expect_d[\mathbb{V}(x | d; e)]
$$
### **Example**: $x = (\omega)$ ###
Can analytically find posterior for Gaussian priors,
use to adaptively choose $t_k$.
![](./figures/how-t2.png)
**Problem**: may be intractable to analytically compute $$
\hat{x} \defeq
\int \vec{x} \Pr(\vec{x} | d; e) \dd\vec{x} =
\int \vec{x} \frac{
\Pr(d | \vec{x}; e)
}{
\int \Pr(d | \vec{x}'; e) \Pr(\vec{x}') \dd\vec{x}'
} \Pr(\vec{x}) \dd\vec{x}.
$$
---
**Answer**: numerically approximate $\int f(\vec{x}) \Pr(\vec{x} | d)\dd\vec{x}$.
*Efficient* to use Monte Carlo integration if we can sample from
the posterior $\Pr(\vec{x} | d; e)$.
## Rejection Sampling ##
Given samples from $\Pr(\vec{x})$ and likelihood
function $\Pr(d | \vec{x}; e)$, how do we sample
from posterior for datum $d$?
- Draw $\vec{x} \sim \Pr(\vec{x})$,
accept $\vec{x}$ with probability $\Pr(d | \vec{x}; e)$.
Accepted samples are distributed according to the posterior.
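A minimal sketch of this accept/reject step, again using the Ramsey likelihood; the function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_posterior(n_samples, d, t, prior_sampler, likelihood):
    """Rejection-sample the posterior Pr(x | d; t) from prior samples."""
    accepted = []
    while len(accepted) < n_samples:
        x = prior_sampler()
        if rng.random() < likelihood(d, x, t):   # accept with Pr(d | x; t)
            accepted.append(x)
    return np.array(accepted)

# Example: uniform prior on omega, Ramsey likelihood, one observed click.
prior = lambda: rng.uniform(0.0, 1.0)
lik = lambda d, omega, t: np.cos(omega * t / 2) ** 2 if d == 1 else np.sin(omega * t / 2) ** 2
samples = sample_posterior(1000, d=1, t=10.0, prior_sampler=prior, likelihood=lik)
print(samples.mean(), samples.std())
```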
Rejection Sampling Isn't Enough
Let $D = \{d_1, \dots, d_k\}$ be a set of data.
$$
\Pr(\text{accept} | \vec{x}) = \Pr(D | \vec{x}) = \prod_{d \in D} \Pr(d | \vec{x})
\overset{k \to \infty}{\longrightarrow} 0.
$$
Example: Biased Coin $x = (p)$
$\Pr(H | p) = p$, $d \in \{H, T\}$.
$p \approx 0.5 \Longrightarrow \Pr(d_1, \dots, d_k | p) \approx 1 / 2^k$.
We will accept exponentially few samples!
### Gaussian Resampling ###
For each datum $d$, use rejection sampling to approximate posterior
moments:
- $\bar{x} \gets \expect[\vec{x} | d]$.
- $\Sigma \gets \operatorname{Cov}[\vec{x} | d] = \expect[\vec{x} \vec{x}^\TT | d] - \bar{x} \bar{x}^\TT$.
---
At the next iteration, draw prior samples from Gaussian with these moments:
$$
\vec{x} \mid d \sim \mathcal{N}(\bar{x}, \Sigma)
$$
Keeps $\Pr(\text{accept}) \approx \text{constant}$.
Can compute $\bar{x}$, $\Sigma$ from
one sample at a time by accumulating
$$
x_{\Sigma} = \sum x
\text{ and }
x^2_{\Sigma} = \sum x^2.
$$
\begin{align}
\bar{x} & = x_{\Sigma} / n_{\text{accept}} \\
\Sigma & = x^2_{\Sigma} / n_{\text{accept}} - \bar{x}^2.
\end{align}
Welford's algorithm: a numerically stable modification.
Rejection Filtering (RejF)
Input:
Prior mean $\bar{x}$, prior covariance $\Sigma$,
number of samples $m$ to accept.
- For each datum $d$ and experiment $e$:
  - $n, \bar{x}', M_2 \gets 0$
  - While $n < m$:
    - Draw $\vec{x} \sim \mathcal{N}(\bar{x}, \Sigma)$.
    - Accept $\vec{x}$ w/ $\Pr(d | \vec{x}; e)$.
    - If accepted, update $n$, $\bar{x}'$, $M_2$.
  - $\bar{x} \gets \bar{x}'$, $\Sigma \gets M_2 / (n - 1)$.
Easy to implement and embed in control hardware.
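A compact sketch of the loop above for a scalar parameter, with the Gaussian refit done by accumulating Welford-style statistics over accepted samples; all names and numbers are illustrative.

```python
import numpy as np

def rejection_filter(data, experiments, likelihood, mean, var, m=400, rng=None):
    """One pass of rejection filtering for a scalar parameter.

    mean, var: moments of the Gaussian prior, refit after every datum.
    """
    rng = rng or np.random.default_rng()
    for d, e in zip(data, experiments):
        n, mean_acc, M2 = 0, 0.0, 0.0
        while n < m:
            x = rng.normal(mean, np.sqrt(var))      # draw from N(mean, var)
            if rng.random() < likelihood(d, x, e):  # accept with Pr(d | x; e)
                n += 1
                delta = x - mean_acc                # Welford update
                mean_acc += delta / n
                M2 += delta * (x - mean_acc)
        mean, var = mean_acc, M2 / (n - 1)
    return mean, var

# Example with the Ramsey likelihood:
lik = lambda d, omega, t: np.cos(omega * t / 2) ** 2 if d == 1 else np.sin(omega * t / 2) ** 2
print(rejection_filter([1, 0, 1], [3.0, 7.0, 15.0], lik, mean=0.5, var=0.1))
```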
Example: Phase Estimation, $x = (\phi)$
Prepare state $\ket{\phi}$ s. t. $U\ket{\phi} = \ee^{\ii \phi}\ket{\phi}$,
measure to learn $\phi$.
$\Pr(1 | \phi; M, \theta) = \cos^2(M [\phi - \theta])$
Applications
- Interferometry / metrology (Higgins et al. crwd6w)
- Gate calibration / robust PE (Kimmel et al. bmrg)
- Quantum simulation and chemistry (Reiher et al. 1605.03590)
**Drawback**: RejF requires posterior after each datum
to be $\approx$ Gaussian.
We can solve this by using a more general approach:
- Weaken Gaussian assumption.
- Generalize the rejection sampling step.
Liu-West Resampler
If we remember each sample $\vec{x}$, we can use them to
relax RejF assumptions.
Input:
$a, h \in [0, 1]$ s.t. $a^2 + h^2 = 1$,
distribution $p(\vec{x})$.
- Approximate $\bar{x} \gets \expect[\vec{x}]$, $\Sigma \gets \operatorname{Cov}(\vec{x})$.
- Draw parent $\vec{x}$ from $p(\vec{x})$.
- Draw $\vec{\epsilon} \sim \mathcal{N}(0, \Sigma)$.
- Return new sample $\vec{x}' \gets a \vec{x} + (1 - a) \bar{x} + h \vec{\epsilon}$.
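A sketch of this resampler for weighted samples in the vector case, written with NumPy; this is my paraphrase of the step listed above, not QInfer's implementation.

```python
import numpy as np

def liu_west_resample(xs, ws, a=0.98, rng=None):
    """Draw len(xs) new, equally weighted samples from a Liu-West mixture.

    xs: (N, dim) array of samples; ws: (N,) normalized weights.
    """
    rng = rng or np.random.default_rng()
    h = np.sqrt(1 - a ** 2)                                  # a^2 + h^2 = 1
    mean = ws @ xs                                           # E[x]
    cov = (ws[:, None] * xs).T @ xs - np.outer(mean, mean)   # Cov(x)
    parents = xs[rng.choice(len(xs), size=len(xs), p=ws)]    # draw parents from p(x)
    eps = rng.multivariate_normal(np.zeros(xs.shape[1]), cov, size=len(xs))
    return a * parents + (1 - a) * mean + h * eps
```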
### Why Does Liu-West Work? ###
\begin{align}
\vec{x}' & \gets a \vec{x} + (1 - a) \bar{x} + h \vec{\epsilon} \\\\
\expect[\vec{x}'] & = [a + (1 - a)] \bar{x} \\\\
\Cov(\vec{x}') & = (a^2 + h^2) \Cov(\vec{x}). \\\\
\Longrightarrow a^2 + h^2 & = 1 \text{ preserves } \expect[\vec{x}], \Cov(\vec{x}).
\end{align}
---
Mixes the original approximation with a fraction $1 - a$ of a Gaussian
matching its moments.
- $a \to 0$: RejF (assumed density) approx
- $a \to 1$: Bootstrap
- $a = 0.98$: typical case (2% Gaussian).
### From Samples to Particles ###
Define a particle $(w_i, \vec{x}_i)$ as
a sample $\vec{x}_i$ and a weight $w_i \in [0, 1]$.
- $\expect[\vec{x}] = \sum_i w_i \vec{x}_i$
- $\Cov(\vec{x}) =
\sum_i w_i \vec{x}_i \vec{x}_i^\TT - \expect[\vec{x}]\expect^\TT[\vec{x}]$
- $\expect[f(\vec{x})] = \sum_i w_i f(\vec{x}_i)$
---
Corresponds to
$
p(\vec{x}) \approx \sum_i w_i \delta(\vec{x} - \vec{x}_i).
$
Particles can represent distributions using either
weights or
positions.
Particle Filter
- Draw $N$ initial samples $\vec{x}_i$ from the prior $\Pr(\vec{x})$
w/ uniform weights.
- Instead of rejection sampling, update weights by
\begin{align}
\tilde{w}_i & = w_i \times \Pr(d | \vec{x}_i; e)
\end{align}
- Renormalize:
\begin{align}
w_i & \mapsto \tilde{w}_i / \sum_i \tilde{w}_i
\end{align}
- Periodically use Liu-West to draw new $\{\vec{x}_i\}$
with uniform weights.
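An illustrative single-update sketch of this filter; the `resampler` argument could be the Liu-West sketch above, and the resample-when-weights-degenerate trigger (effective sample size) is a common convention, not a prescription from the talk.

```python
import numpy as np

def smc_update(xs, ws, d, e, likelihood, resampler, resample_threshold=0.5):
    """One particle-filter (sequential Monte Carlo) update.

    xs: (N, dim) particle locations; ws: (N,) weights summing to one.
    likelihood(d, xs, e) must return the vector of Pr(d | x_i; e).
    """
    ws = ws * likelihood(d, xs, e)               # reweight by the likelihood
    ws = ws / ws.sum()                           # renormalize
    n_eff = 1.0 / np.sum(ws ** 2)                # effective sample size
    if n_eff < resample_threshold * len(ws):     # periodically resample
        xs = resampler(xs, ws)                   # e.g. Liu-West, as sketched above
        ws = np.full(len(ws), 1.0 / len(ws))
    return xs, ws
```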
Useful for Hamiltonian models...
- Rabi/Ramsey/phase estimation (Granade et al. s87)
- Swap spectroscopy (Stenberg et al. 7nw)
...as well as other QIP tasks.
- Tomography (Huszár and Houlsby s86; Struchalin et al. bmg5; Ferrie 7nt; Granade et al. bhdw, 1605.05039)
- Randomized benchmarking (Granade et al. zmz)
- Continuous measurement (Chase and Geremia chk4q7)
- Interferometry/metrology (Granade 10012/9217)
Estimation in Practice
We need a bit more to make particle filtering a
practical solution.
- Error bars: how well do we know $\vec{x}$?
- Time-dependence: $\vec{x} = \vec{x}(t)$
- Software implementation: off-the-shelf.
Dealing with each in turn...
Error Bars
Particle filters report credible regions $X_\alpha$ s.t.
$$
\Pr(\vec{x} \in X_\alpha | d; e) \ge \alpha.
$$
E.g.: posterior covariance ellipse, convex hull, minimum-volume enclosing ellipsoid (MVEE).
Time-Dependence
In addition to updating particle weights, move each particle
stochastically:
$$
\vec{x}(t_{k+1}) = \vec{x}(t_k) + \vec{\eta},\qquad
\vec{\eta} \sim \mathcal{N}(0, (t_{k+1} - t_k) \Sigma)
$$
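A one-function sketch of this random-walk model update (NumPy; names are illustrative):

```python
import numpy as np

def diffuse(xs, dt, Sigma, rng=None):
    """Move each particle by eta ~ N(0, dt * Sigma) between updates."""
    rng = rng or np.random.default_rng()
    eta = rng.multivariate_normal(np.zeros(xs.shape[1]), dt * Sigma, size=len(xs))
    return xs + eta
```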
### Software Implementation ###
We provide **QInfer**, an open-source implementation for use in quantum information.
Written in Python, works with MATLAB and Julia.
---
Puts focus on algorithms and experiments, not implementations.
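For instance, a frequency-estimation experiment along the lines of the Ramsey example can be set up in a few lines. The snippet below follows my recollection of QInfer's documented precession-model example, so treat the class names and signatures as approximate rather than authoritative.

```python
import qinfer as qi

# Single-parameter precession (Ramsey-like) model and a flat prior on omega.
model = qi.SimplePrecessionModel()
prior = qi.UniformDistribution([0, 1])
updater = qi.SMCUpdater(model, 2000, prior)     # 2000-particle filter
heuristic = qi.ExpSparseHeuristic(updater)      # exponentially growing times

true_omega = prior.sample()
for _ in range(100):
    expparams = heuristic()                      # choose the next experiment
    datum = model.simulate_experiment(true_omega, expparams)
    updater.update(datum, expparams)             # Bayes update of the particles

print(updater.est_mean(), updater.est_covariance_mtx())
```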
Bigger and Better
We've seen that filtering is useful for estimating
small quantum models. Now let's push on to bigger systems.
What challenges do we face for large systems?
Simulation Costs
\begin{align}
\tilde{w}_i & = w_i \times \color{red}{\Pr(d | \vec{x}_i; e)} \\
w_i & \mapsto \tilde{w}_i / \sum_i \tilde{w}_i
\end{align}
- $\Pr(d | \vec{x}_i; e)$ is a *simulation*.
- Need to simulate once per particle (thousands of likelihood calls per datum).
- Infeasible for large quantum models.
Let's do better: use *quantum* simulation instead.
### Two Kinds of Simulation ###
![](figures/simulators.png)
Weak simulators produce *plausible datasets*.
### Likelihood-Free RejF ###
Replace the rejection sampling step by drawing a
datum from the likelihood instead of computing its
exact value:
- Draw datum $d'$ from $\Pr(d | \vec{x}; e)$.
- Accept $\vec{x}$ if $d = d'$.
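In code, this acceptance test is a single comparison; `weak_simulator` below is a hypothetical callable returning one sampled datum from $\Pr(d | \vec{x}; e)$.

```python
import numpy as np

def lf_reject_step(x, d, e, weak_simulator, rng=None):
    """Likelihood-free acceptance: keep x only if a simulated datum matches d."""
    rng = rng or np.random.default_rng()
    d_sim = weak_simulator(x, e, rng)   # one sample from Pr(d | x; e)
    return d_sim == d

# Example weak simulator for the Ramsey model:
weak_sim = lambda omega, t, rng: int(rng.random() < np.cos(omega * t / 2) ** 2)
```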
### Likelihood-Free Particle Filtering ###
Can also use frequencies $f$ from weak simulation to
approximate likelihoods
in particle filtering.
$$
\widehat{\Pr}(d | \vec{x}; e) =
\frac{f(d | \vec{x}; e)}{k}
$$
$k$: number of weak simulation calls used.
### **Example**: Noisy Coin ###
How well can we learn the bias $x = (p)$ of a noisy coin?
$$
\Pr(\text{click} | p) = 0.95 p + 0.1 (1 - p)
$$
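A rough sketch of likelihood-free particle filtering on this noisy coin: each particle's likelihood is replaced by the frequency $f / k$ from $k$ weak-simulation calls. Particle count, $k$, and the true bias are arbitrary, and a real implementation would also resample (e.g. Liu-West) as weights degenerate.

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = 0.7
click_prob = lambda p: 0.95 * p + 0.1 * (1 - p)    # noisy-coin likelihood

xs = rng.uniform(0.0, 1.0, size=2000)              # particles for p
ws = np.full(len(xs), 1.0 / len(xs))
k = 50                                              # weak-simulation calls per particle

for _ in range(100):                                # 100 observed coin flips
    d = int(rng.random() < click_prob(p_true))
    freqs = rng.binomial(k, click_prob(xs)) / k     # \hat{Pr}(click | p) = f / k
    lik_hat = freqs if d == 1 else 1 - freqs
    ws = ws * lik_hat
    ws /= ws.sum()

print("estimated p:", np.sum(ws * xs))
```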
### Quantum Hamiltonian Learning ###
We can learn large Hamiltonians by
using likelihood-free filtering w/ interactivity.
![](figures/iqle.png)
Perform weak simulations on trusted device only.
Example: Ising on Complete Graph
Robust even to wrong model. ($0.5$ NN + $10^{-4}$ Complete)
![](figures/qhl-bad-model.png)
Quantum Bootstrapping
One important approximation from physical insight:
information locality.
Allows using small trusted device to
learn large Hamiltonians.
Approximation quality can be bounded if
Lieb-Robinson velocity is finite.
Scan trusted device across untrusted.
Run particle filter only on supported
parameters.
50-qubit Ising chain, 8-qubit simulator, 4-qubit observable
Filtering
- Practical solution for current experimental tasks.
- Enables learning large Hamiltonians using quantum resources.
- Physical insight gives a new statistical algorithm for even larger systems.
Going Further
- Hyperparameterization (Granade et al. s87)
  - $\Pr(d | y) = \expect_x[\Pr(d | x) \Pr(x | y)]$.
    Allows composing w/ noise, inhomogeneity, etc.
- Model selection (Ferrie 7nt)
  - Using acceptance ratios or normalization records enables comparing models.
- Quantum filtering (Wiebe and Granade 1512.03145)
  - Rejection filtering is a dequantization of quantum filtering using
    Harrow et al. bcz3hc.
Welford's Algorithm
Can compute $\bar{x}$, $\Sigma$ from
one sample at a time. Numerically stable.
- $n, \bar{x}, M_2 \gets 0$.
- For each sample $x$:
  - $n \gets n + 1$
  - $\Delta \gets x - \bar{x}$
  - $\bar{x} \gets \bar{x} + \Delta / n$
  - $M_2 \gets M_2 + \Delta (x - \bar{x})$
- Return mean $\bar{x}$, variance $M_2 / (n - 1)$.
Vector case is similar.
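A direct Python transcription of the scalar update above (illustrative; assumes at least two samples):

```python
def welford(samples):
    """Single-pass, numerically stable mean and variance."""
    n, mean, M2 = 0, 0.0, 0.0
    for x in samples:
        n += 1
        delta = x - mean
        mean += delta / n
        M2 += delta * (x - mean)
    return mean, M2 / (n - 1)
```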
We design experiments using the
PGH: Particle Guess Heuristic
- Draw $\vec{x}_-, \vec{x}_-'$ from the current posterior.
- Let $t = 1 / |\vec{x}_- - \vec{x}_-'|$.
- Return $e = (\vec{x}_-, t)$.
Adaptively chooses experiments such that
$t |\vec{x}_- - \vec{x}_-'| \approx\,$ constant.
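A sketch of the heuristic for a weighted particle cloud; drawing two distinct particles avoids a zero denominator, and the names are illustrative.

```python
import numpy as np

def particle_guess_heuristic(xs, ws, rng=None):
    """Choose the next evolution time from two posterior samples (PGH)."""
    rng = rng or np.random.default_rng()
    i, j = rng.choice(len(xs), size=2, p=ws, replace=False)
    x_minus, x_prime = xs[i], xs[j]
    t = 1.0 / np.linalg.norm(x_minus - x_prime)   # longer times as the posterior narrows
    return x_minus, t
```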