Bijectors.jl

Transforming probability distributions in Julia

https://github.com/TuringLang/Bijectors.jl

A "bijector" you say?

A bijector or diffeomorphism is a differentiable bijection (one-to-one and onto) \(b\) with a differentiable inverse \(b^{-1}\).

For example \(b(x) = \exp(x)\) for \(x \in (-\infty, \infty)\)

  • \(\exp\) is differentiable
  • \(\exp\) has inverse \(\log\)
  • \(\log\) is differentiable (on \((0, \infty)\))

So \(\exp\) (and \(\log\)) is a bijector!

Ok, but why do I care?

Consider a generative process for \(y\) defined by

\begin{equation*} \begin{split} x & \sim p \\ y & := b(x) \end{split} \end{equation*}

where \(b\) is some transformation.

If \(b\) is a bijector, this induces a density \(\tilde{p}(y)\) defined by

\begin{equation*} \tilde{p}(y) := p \big( b^{-1}(y) \big) \left| \det \mathcal{J}_{b^{-1}}(y) \right| \end{equation*}

This is useful for changing the support of distributions, reparameterization, density estimation, variational inference, etc.
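
As a concrete check of the formula (a minimal sketch using Distributions.jl): pushing a standard normal through \(\exp\) yields a log-normal, and the change-of-variables density matches:

using Distributions

p = Normal()   # x ~ N(0, 1)
y = 2.0        # with b = exp: b⁻¹ = log and |det J_{b⁻¹}(y)| = 1/y

pdf(p, log(y)) * abs(1 / y) ≈ pdf(LogNormal(), y)
true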

Bijectors.jl

using Bijectors; using Bijectors: Exp, Log

b = Exp()
b⁻¹ = inv(b)

b⁻¹ isa Log
true

We can evaluate a Bijector

x = 0.0
b(x) == 1.0  # since e⁰ = 1
true
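
Inverses are evaluated the same way:

b⁻¹(1.0) == 0.0  # since log(1) = 0
true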

We can compose bijectors to get a new Bijector

(b ∘ b) isa Bijector
true

And evaluate compositions of bijectors

(b⁻¹ ∘ b)(x) == x
true

What about more complex/deeper compositions?

cb = b ∘ b ∘ b
cb⁻¹ = inv(cb)        # <= inversion of a "large" composition

(cb⁻¹ ∘ cb)(x) == x
true

As mentioned before, the following terms are of particular interest:

\begin{equation*} \log \left| \det \mathcal{J}_{b^{-1}}(y) \right| \quad \text{or} \quad \log \left| \det \mathcal{J}_{b}(x) \right| \end{equation*}

This works seamlessly even for compositions:

logabsdetjac(cb, x)
3.718281828459045
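
This value is just the chain rule at work. For \(b = \exp\) we have \(\log \lvert b'(x) \rvert = x\), so the terms accumulate at the intermediate values \(x\), \(b(x)\), \(b(b(x))\). A quick check, using the x = 0.0 from above:

logabsdetjac(cb, x) ≈ x + exp(x) + exp(exp(x))  # 0 + 1 + e
true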

Under the hood, logabsdetjac(cb, x) expands to

(y, logjac) = forward(cb.ts[1], x)
var"##res#256" = forward(cb.ts[2], y)
y = (var"##res#256").rv
logjac += (var"##res#256").logabsdetjac
logjac += logabsdetjac(cb.ts[3], y)
return logjac

while if we let cb = Composed([b, b, b]), we get

y, logjac = forward(cb.ts[1], x)
for i = 2:length(cb.ts)
    res = forward(cb.ts[i], y)
    y = res.rv
    logjac += res.logabsdetjac
end

return logjac

And similarly for other methods.
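
For instance, forward(b, x) returns both the transformed value and the accumulated log-abs-det-Jacobian in a single pass, using the same field names seen in the expansion above:

res = forward(cb, x)
res.rv ≈ cb(x)
true
res.logabsdetjac ≈ logabsdetjac(cb, x)
true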

using Distributions

d = Exponential(1.)
Exponential{Float64}(θ=1.0)
support(d)
RealInterval(0.0, Inf)
b = bijector(d)
Log{0}()
td = transformed(d, b) # OR `transformed(d)` in this case
Bijectors.TransformedDistribution{Exponential{Float64},Log{0},Univariate}(
dist: Exponential{Float64}(θ=1.0)
transform: Log{0}()
)

support(td)
RealInterval(-Inf, Inf)

Note: Distributions.support only works for UnivariateDistribution.
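
A transformed distribution behaves like any other Distribution: rand samples in the transformed space and logpdf includes the Jacobian correction. For the td above (where \(b = \log\), so \(b^{-1} = \exp\)):

y = rand(td)   # a sample log(x) with x ~ Exponential(1.0)

# q(y) = p(exp(y)) ⋅ exp(y), so the log-density picks up a "+ y" term:
logpdf(td, y) ≈ logpdf(d, exp(y)) + y
true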

| Operation | Method | Freebie |
|-----------|--------|---------|
| \(b \mapsto b^{-1}\) | inv(b) | \(\checkmark\) |
| \((b_1, b_2) \mapsto (b_1 \circ b_2)\) | b1 ∘ b2 | \(\checkmark\) |
| \((b_1, b_2) \mapsto [b_1, b_2]\) | stack(b1, b2) | \(\checkmark\) |

| Operation | Method | Freebie |
|-----------|--------|---------|
| \(x \mapsto b(x)\) | b(x) | \(\times\) |
| \(y \mapsto b^{-1}(y)\) | inv(b)(y) | \(\times\) |
| \(x \mapsto \log \lvert\det \mathcal{J}_b(x)\rvert\) | logabsdetjac(b, x) | AD |
| \(x \mapsto \big( b(x), \log \lvert \det \mathcal{J}_b(x)\rvert \big)\) | forward(b, x) | \(\checkmark\) |

| Operation | Method | Freebie |
|-----------|--------|---------|
| \(p \mapsto q := b_* p\) | q = transformed(p, b) | \(\checkmark\) |
| \(y \sim q\) | y = rand(q) | \(\checkmark\) |
| \(\log q(y)\) | logpdf(q, y) | \(\checkmark\) |
| \(p \mapsto b\) s.t. \(\mathrm{support}(b_* p) = \mathbb{R}^d\) | bijector(p) | \(\times\) |
| \(\big(x \sim p, b(x), \log \lvert\det \mathcal{J}_b(x)\rvert, \log q(y) \big)\) | forward(q) | \(\checkmark\) |
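
The last row bundles a sample with every quantity needed for, e.g., a stochastic variational objective in a single pass. A sketch of its use on the td from above (the field names are assumed to match the forward output shown earlier):

res = forward(td)
res.x              # x ~ Exponential(1.0), the sample in the original space
res.rv             # b(x) = log(x), the sample in the transformed space
res.logabsdetjac   # log|det J_b(x)|
res.logpdf         # log q(res.rv)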

Example: density estimation

Consider an Affine transformation, i.e.

\begin{equation*} \mathrm{aff}(x) = W x + b \end{equation*}

for a matrix \(W\) (with the assumption \(\det W \ne 0\)) and a vector \(b\),

and a non-linear (but invertible) activation function, e.g. LeakyReLU:

\begin{equation*} a(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha x & \text{if } x < 0 \end{cases} \end{equation*}

for some \(\alpha > 0\) (usually chosen to be very small; \(\alpha \le 0\) would break invertibility).
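
Both building blocks have cheap inverses and closed-form Jacobian terms. A minimal sketch as plain functions (illustrative helpers matching the definitions above, not the actual Bijectors.jl types):

using LinearAlgebra

aff(x, W, b) = W * x + b                # invertible iff det(W) ≠ 0
aff⁻¹(y, W, b) = W \ (y - b)
aff_logabsdetjac(W) = logabsdet(W)[1]   # log|det W|, constant in x

leakyrelu(x, α) = x ≥ 0 ? x : α * x     # scalar case; invertible for α > 0
leakyrelu⁻¹(y, α) = y ≥ 0 ? y : y / α
leakyrelu_logabsdetjac(x, α) = x ≥ 0 ? zero(x) : log(α)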

Look familiar?

Yup; it's basically an invertible neural network!

# W[i], b[i], and α[i] are assumed to hold the (learned) parameters of
# the i-th layer:
layers = [LeakyReLU(α[i]) ∘ Affine(W[i], b[i]) for i = 1:num_layers]

b = foldl(∘, layers)
td = transformed(base_dist, b)  # <= a "deep" normalising flow!
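
Fitting the flow for density estimation then amounts to maximizing the log-likelihood of the observations under td with respect to the layer parameters (a sketch; data is assumed to be a collection of samples):

# Negative log-likelihood of the data under the flow; minimise this
# w.r.t. the parameters W, b, α (e.g. with an AD-based optimiser):
nll() = -sum(logpdf(td, x) for x in data)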

[Animation: nf-banana-density-estimation.gif]

Figure 1: Empirical density estimate (blue) compared with a single batch of samples (red). Code can be found in scripts/nf_banana.jl.

Thank you!