API reference

Bayesian Inference of Loop Dynamics (BILD)

This module provides the implementation of BILD, proposed by Gabriele, Brandão, Grosse-Holz, et al. Since the ideas behind this scheme are explained in the paper (and its supplementary information), the code provides only technical documentation and assumes familiarity with the reference text.

For the main entry point see bild.sample.

bild.amis

Implementation of AMIS for posterior sampling at fixed k

This module provides the implementation of the AMIS (Cornuet et al. 2012) sampling scheme for BILD, at a fixed number of switches k in the profile; the interface is the FixedkSampler.

We parametrize the binary profiles as (s, theta) where
  • s is a vector of interval lengths, as fractions of the whole trajectory. It thus has dimension k+1 and satisfies sum(s) = 1, as well as s_i > 0 for all i.

  • theta is the vector of states associated with each interval, i.e. an integer-valued vector of length k+1. It satisfies the constraints given by the transitions matrix of the MultiStateModel you are using; specifically, consecutive states can never be the same.

The conversion of switch intervals s to discretely sampled profiles is according to the following scheme:

trajectory:       -----x-----x-----x-----x-----x-----x
position:             0.0   0.2   0.4   0.6   0.8   1.0

example (s):           ==0.25==|=====0.5=====|==0.25==
switch positions:              |(0.25)       |(0.75)
conversion:            |<<<<<|<<<<<|<<<<<|<<<<<|<<<<<
θ=(0, 1, 2):        0     0  '  1     1  '  2     2

Note that the conversion step amounts to a simple floor() operation on the switch positions. We chose this conversion over the perhaps more “natural” one of rounding to the nearest integer because it guarantees

  • that a uniformly distributed continuous switch position will still be uniformly distributed over its possible discrete positions.

  • conservation of switches: a switch at .99 would disappear from the profile with the rounding scheme.

Importantly, note that in either case an interval shorter than the framerate will likely not be represented in the profile, so switches are not always conserved. However, if such profiles are the main contribution to the evidence for this number of switches, we should (and do!) simply choose the appropriate lower number of switches instead.
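
For concreteness, the conversion can be sketched as follows (illustrative code only; in the package this is FixedkSampler.st2profile, whose signature differs):

import numpy as np

def st2profile(s, theta, T):
    # Convert switch intervals s and states theta into a discrete profile
    # of T frames, using the floor() conversion described above.
    profile = np.full(T, theta[0], dtype=int)
    switches = np.cumsum(s)[:-1]                 # continuous switch positions in (0, 1)
    for pos, state in zip(switches, theta[1:]):
        profile[int(pos * (T-1)) + 1:] = state   # floor of the scaled position
    return profile

st2profile([0.25, 0.5, 0.25], [0, 1, 2], 6)      # array([0, 0, 1, 1, 2, 2])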

class bild.amis.Dirichlet

Bases: object

Dirichlet distribution

The Dirichlet distribution is implemented in scipy.stats.dirichlet; we wrap it here for consistency with the CFC and add a method of moments estimator.

sample(a, N=1)

Sample from the Dirichlet distribution

Parameters:
  • a ((k+1,) np.ndarray) – concentration parameters

  • N (int, optional) – how many samples to draw

Returns:

(N, k+1) np.ndarray

logpdf(a, ss)

Evaluate Dirichlet distribution

Parameters:
  • a ((k+1,) np.ndarray) – concentration parameters

  • ss ((N, k+1) np.ndarray) – the samples for which to evaluate

Returns:

(N,) np.ndarray

estimate(ss, log_weights)

Method of moments estimator for the Dirichlet distribution

Parameters:
  • ss ((N, k+1) np.ndarray, dtype=float) – the switch positions s for a sample of size n

  • log_weights ((N,) np.ndarray, dtype=float) – the associated weights for all samples (unnormalized)

Returns:

(k+1,) np.ndarray – the concentration parameters for the fitted Dirichlet distribution

Notes

We use the method of moments because it is straightforward to apply to a weighted sample. Given an ensemble of switch positions s, we have

  • the mean positions m = <s>

  • the variances v = <(s-m)²>

  • from this we can find the total concentration as A = mean(m*(1-m)/v) - 1

  • thus yielding the final estimate of α = A*m.
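
In code, this weighted estimator might look as follows (a minimal sketch; the function name is illustrative):

import numpy as np

def dirichlet_mom(ss, log_weights):
    # Weighted method of moments for the Dirichlet concentration parameters.
    w = np.exp(log_weights - np.max(log_weights))   # stabilize before exponentiating
    w /= np.sum(w)                                  # normalized weights
    m = w @ ss                                      # weighted means      <s>
    v = w @ (ss - m)**2                             # weighted variances  <(s-m)^2>
    A = np.mean(m*(1-m)/v) - 1                      # total concentration
    return A * m                                    # concentration parameters alpha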

class bild.amis.CFC(transitions)

Bases: object

Conflict Free Categorical (CFC)

The CFC is our proposal distribution for state traces θ. These are vectors of length k+1, containing integers between 0 and n-1, where k is the number of switches and n the number of states we consider. The fundamental twist is that neighboring entries cannot take the same value, which makes it difficult to define a reasonable proposal distribution over this space of state traces.

Generalizing the above constraint, CFC accepts the transitions argument, which specifies which state transitions are allowed or forbidden.

The CFC is parametrized by weights p for each entry being in any of the available states (p.shape = (n, k+1)). The constraints are then enforced by a causal sampling scheme, sketched in code after this list:

  • sample θ[0] ~ Categorical(p[:, 0])

  • select the weights associated with all allowed states (given θ[0]) from p[:, 1] and renormalize

  • sample θ[1] ~ Categorical(p[:, 1]) (with the adapted p[:, 1])

  • continue iteratively
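
A minimal sketch of this causal scheme, in linear space for readability (the package works with log(p) throughout; names here are illustrative):

import numpy as np

def cfc_sample(p, transitions, rng=np.random.default_rng()):
    # Draw one state trace theta from a CFC with weights p of shape (n, k+1);
    # transitions[i, j] indicates whether the transition i --> j is allowed.
    n, K = p.shape
    theta = np.empty(K, dtype=int)
    theta[0] = rng.choice(n, p=p[:, 0] / p[:, 0].sum())
    for i in range(1, K):
        w = np.where(transitions[theta[i-1]], p[:, i], 0.0)  # keep allowed states only
        theta[i] = rng.choice(n, p=w / w.sum())              # renormalize and sample
    return theta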

This scheme makes sampling and likelihood evaluation relatively straightforward. For estimation, we have to convert the marginals actually observed in a sample back to the weights p. Letting f_n and g_n denote the marginals at steps i and i-1 respectively, we have that p_n := p[:, i] is given by the solution of

p_n = f_n / sum_{m in D_n}( g_m / sum_{k in C_m} p_k ) ,

where the index sets are C_n = {k | n --> k is allowed} and D_n = {k | k --> n is allowed}. Solving this equation by fixed point iteration, we find approximate parameters p for given marginals over the states.
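
A sketch of this fixed-point iteration for a single slot, again in linear space (the implementation iterates log(p); see solve_marginals_single below):

import numpy as np

def solve_marginals(f, g, transitions, maxiter=10000, precision=1e-10):
    # f, g: (n,) marginals at the current and previous slot, respectively.
    p = f.copy()
    for _ in range(maxiter):
        norm = transitions @ p               # sum_{k in C_m} p_k, for each m
        denom = transitions.T @ (g / norm)   # sum_{m in D_n} g_m / (...), for each n
        p_new = f / denom
        p_new /= p_new.sum()                 # enforce the normalization sum(p) == 1
        if np.max(np.abs(p_new - p)) < precision:
            return p_new
        p = p_new
    raise RuntimeError("ran out of iterations")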

Parameters:

transitions ((n, n) np.ndarray, dtype=bool) – transitions[i, j] indicates whether the transition from state i to state j is allowed or not.

transitions

see Parameters

Type:

(n, n) np.ndarray, dtype=bool

MOM_maxiter, MOM_precision

stopping criteria when solving for weight parameters from marginals. See solve_marginals_single.

Type:

int, float

Notes

For numerical stability we always work with log(p) instead of p.

To uniquely identify the weight parameters p for a given distribution, we apply the normalization condition sum(p, axis=0) == 1 (or, equivalently: logsumexp(logp, axis=0) == 0).

property n
sample(logp, N=1)

Sample from the Conflict Free Categorical

Parameters:
  • logp ((n, k+1) np.ndarray) – the weight parameters defining the distribution

  • N (int, optional) – how many traces to sample

Returns:

thetas ((N, k+1) np.ndarray) – the sampled state traces

logpmf(logp, thetas)

Evaluate the Conflict Free Categorical

Parameters:
  • logp ((n, k+1) np.ndarray, dtype=float) – the weights defining the CFC.

  • thetas ((N, k+1) np.ndarray, dtype=int) – the N state traces for which to evaluate the distribution

Returns:

(N,) np.ndarray, dtype=float – the logarithm of the CFC evaluated at the given state traces

estimate(thetas, log_weights)

“Method of Marginals” estimation for the CFC

Parameters:
  • thetas ((N, k+1) np.ndarray, dtype=int) – the sample of state traces

  • log_weights ((N,) np.ndarray, dtype=float) – the associated weights for all samples (unnormalized)

Returns:

logp ((n, k+1) np.ndarray) – the estimated weight parameters; normalized to satisfy logsumexp(logp, axis=0) == 0.

logp_from_marginals(log_marginals)

Calculate weight parameters from marginals

Parameters:

log_marginals ((n, k+1) np.ndarray, dtype=float) – the marginals for each of the k+1 slots; should be normalized: logsumexp(log_marginals, axis=0) == 0

Returns:

logp ((n, k+1) np.ndarray, dtype=float) – the weight parameters corresponding to the given marginals

solve_marginals_single(logf, logg)

Convert marginals to weight parameters

This is a helper function that runs the iteration needed in estimate.

The iteration stops when the absolute difference between successive logp iterates is less than self.MOM_precision; if this does not happen in self.MOM_maxiter steps or less, a RuntimeError is raised.

Parameters:
  • logf ((n,) np.ndarray) – marginals for the current step

  • logg ((n,) np.ndarray) – marginals for the previous step

Raises:

RuntimeError – ran out of iterations

Returns:

logp ((n,) np.ndarray) – the weight parameters for the step associated with the marginal f.

uniform_marginals(k)

Compute the state marginals for the uniform distribution

Parameters:

k (int) – the total number of switches

Returns:

log_marginals ((n, k+1) np.ndarray, dtype=float) – the marginals under the uniform distribution (normalized such that logsumexp(log_marginals, axis=0) == 0)

Notes

If self.transitions blocks anything beyond self-transitions, the uniform CFC has neither uniform weight parameters nor uniform marginals.

This function estimates the marginals under the uniform CFC by counting trajectories passing through a given state slot, which can be done by taking powers of the transition matrix. For long trajectories, this might result in an overflow of the np.integer types; thus this function uses Python’s built-in int, which has arbitrary precision. The return value is normalized and cast to regular floats.
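
The counting idea can be sketched like this (illustrative; the actual implementation differs in details):

import numpy as np

def uniform_marginals(transitions, k):
    # Count valid state traces through each (slot, state) pair via powers of
    # the transition matrix; object dtype keeps arbitrary-precision Python ints.
    T = np.asarray(transitions).astype(object)
    n = T.shape[0]
    fwd = [np.ones(n, dtype=object)]   # fwd[i][s]: partial traces theta[0..i] ending in s
    bwd = [np.ones(n, dtype=object)]   # bwd[j][s]: continuations of length j starting in s
    for _ in range(k):
        fwd.append(fwd[-1] @ T)
        bwd.append(T @ bwd[-1])
    counts = np.array([f * b for f, b in zip(fwd, bwd[::-1])]).T   # (n, k+1)
    return np.log((counts / counts.sum(axis=0)).astype(float))    # normalized, as floats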

logp_uniform(k)

Weight parameters for uniform distribution with k switches

Parameters:

k (int) –

Returns:

logp ((n, k+1) np.ndarray, dtype=float)

Notes

This is just a convenience function, returning logp_from_marginals(uniform_marginals(k)).

N_total(k, log=False)

Give the total number of state traces with k switches

Parameters:
  • k (int) –

  • log (bool) – if True, return math.log(N_total). This is safer than np.log(N_total), since N_total might be large.

Returns:

int

full_sample(k, Nmax=1000)

Give the full sample of trajectories with k switches

Parameters:
  • k (int) –

  • Nmax (int) – assemble the sample only if it contains less than Nmax traces.

Returns:

(N, k+1) np.ndarray, dtype=int

Raises:

ValueError – if self.N_total(k) > Nmax

class bild.amis.FixedkSampler(traj, model, k, N=100, concentration_brake=0.01, polarization_brake=0.001, max_fev=20000, max_fcomplete=1000)

Bases: object

Running the AMIS scheme for a fixed number of switches k

The most important method is step, which performs one AMIS iteration, i.e. it samples N profiles from the current proposal and updates the weights and proposal parameters accordingly.

Parameters:
  • traj (Trajectory) – the trajectory we are sampling for

  • model (MultiStateModel) – the model to use (defines the likelihood and allowed state transitions)

  • k (int) – the number of switches for this run

  • N (int, optional) – the sample size for each AMIS step

  • concentration_brake (float, optional) – limits changes in the total concentration of the Dirichlet proposal by constraining |log(new_concentration/old_concentration)| <= N*concentration_brake, where concentration = sum(α).

  • polarization_brake (float, optional) – limits changes in the CFC proposal by constraining |p_new - p_old| <= N*polarization_brake.

  • max_fev (int, optional) – upper limit on likelihood evaluations. If this limit is reached, the sampler will go to an “exhausted” state, where it does not allow further evaluations.

  • max_fcomplete (int, optional) – threshold up to which exhaustive sampling is considered worthwhile

traj, model

setup for the sampling. See Parameters

Type:

Trajectory, MultiStateModel

k, N

see Parameters

Type:

int

brakes

see Parameters

Type:

(concentration, polarization)

max_fev

see Parameters

Type:

int

max_fcomplete

see Parameters

Type:

int

exhausted

the state of the sampler. See step.

Type:

bool

samples

list of samples. Each sample is a dict; the entries ['ss', 'thetas', 'logLs'] are guaranteed, others might exist.

Type:

list

dirichlet

the proposal distribution over switch intervals

Type:

Dirichlet

cfc

the proposal distribution over state traces

Type:

CFC

parameters

list of proposal parameters for the samples in samples, as tuples (a, logp).

Type:

[((k+1,) np.ndarray, (n, k+1) np.ndarray)]

evidences

estimated evidence at each step. Similar to samples and parameters, this is a list with an entry for each step. The entries are tuples of log-evidence, standard error on the log-evidence, and Kullback-Leibler divergence D_KL( posterior || proposal ) of the posterior on the proposal. The latter can come in handy when assessing convergence.

Type:

[(logE, dlogE, KL)]

logprior

value of the uniform prior over profiles.

Type:

float
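
Example

A hypothetical minimal usage, assuming a Trajectory traj and a MultiStateModel model are at hand:

>>> sampler = FixedkSampler(traj, model, k=2)
... while sampler.step():                       # one AMIS iteration per call
...     logE, dlogE, KL = sampler.evidences[-1]
...     if dlogE < 0.01:                        # example stopping criterion
...         break
... profile = sampler.MAP_profile()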

exception ExhaustionImpractical

Bases: ValueError

st2profile(s, theta)

Convert parameters (s, θ) to Loopingprofile

Parameters:
  • s ((k+1,) np.ndarray, dtype=float) – switch intervals. Should satisfy np.sum(s) == 1.

  • theta ((k+1,) np.ndarray, dtype=int) – states associated with each interval in s.

Returns:

Loopingprofile

log_proposal(parameters, ss, thetas)

Evaluate the proposal distribution

Parameters:
  • parameters ((a, logp)) – tuple of parameters for Dirichlet and CFC, respectively

  • ss ((N, k+1) np.ndarray, dtype=float) – switch intervals

  • thetas ((N, k+1) np.ndarray, dtype=int) – state traces

Returns:

(N,) np.ndarray, dtype=float

logL(ss, thetas)

Evaluate model likelihood

Parameters:
  • ss ((N, k+1) np.ndarray, dtype=float) – switch intervals

  • thetas ((N, k+1) np.ndarray, dtype=int) – state traces

Returns:

(N,) np.ndarray, dtype=float

fix_exhaustive()

Evaluate by exhaustive sampling of the parameter space

Raises:

FixedkSampler.ExhaustionImpractical – if parameter space is too big to warrant exhaustive sampling; i.e. there would be more than max_fcomplete profiles to evaluate.

Notes

Since in this case the evidence is exact, its standard error should be zero. To avoid numerical issues, we set dlogev = 1e-10.

step()

Run a single step of the AMIS sampling scheme

Returns:

bool – whether the sampling step was performed. False if the sampler is exhausted, True otherwise.

See also

FixedkSampler

tstat(other)

Calculate separation (by evidence) from another sampler

Inspired by the t-statistic from frequentist inference, we use the separation score (logev - other_logev) / sqrt( dlogev² + other_dlogev² ).

Returns:

float

MAP_profile()

Give the current MAP estimate

Returns:

Loopingprofile

log_marginal_posterior()

Calculate posterior marginals

Returns:

(n, T) np.ndarray, dtype=float – log marginal posterior over the n states for each time point t in the trajectory; normalized.

bild.choicesampler

class bild.choicesampler.ChoiceSampler(muhat, shat, N, dE, samplesize=10000)

Bases: object

Sample selection for iterative AMIS

When running BILD, we are faced with the sample selection issue: which k is most important to sample next, given our current estimate of the evidence curve? This turns out to be a non-trivial question if we take the evidence margin ΔE into account. This class implements an information-based sample selection scheme, described in the Notes.

Parameters:
  • muhat ((k,) np.ndarray, dtype=float) – the current point estimates for the evidence

  • shat ((k,) np.ndarray, dtype=float) – the variance of the estimated means muhat, i.e. the squared standard error of the mean

  • N ((k,) np.ndarray, dtype=int (or float, to allow for np.inf)) – the number of samples used to estimate the evidence at each k. “One sample” here is one of the steps we are deciding where to take next; since each AMIS iteration actually samples (usually) 100 profiles, “one sample” here corresponds to 100 profiles.

  • dE (float) – the evidence margin ΔE to apply in the sampling scheme

  • samplesize (int) – the sample size to use for KLD estimation. Technical hyperparameter.

muhat

see Parameters

Type:

(k,) np.ndarray, dtype=float

shat

see Parameters

Type:

(k,) np.ndarray, dtype=float

N

see Parameters

Type:

(k,) np.ndarray, dtype=int (or float, for np.inf)

dE

see Parameters

Type:

float

samplesize

see Parameters

Type:

int

kmax

number of k positions under consideration. Shorthand for len(self.muhat)

Type:

int

EDmu2

expected squared update in evidence upon one more sample at the corresponding k. Calculated analytically.

Type:

(k,) np.ndarray, dtype=float

Dmu

root mean squared update in evidence at k; i.e. sqrt(EDmu2)

Type:

(k,) np.ndarray, dtype=float

_scaled_rvs

the standard normal sample underlying the subsequent estimations

Type:

(samplesize, k) np.ndarray, dtype=float

bestk

for each sample, the k selected under the ΔE scheme.

Type:

(samplesize,) np.ndarray, dtype=int

best_is_k

truth table corresponding to self.bestk

Type:

(samplesize, k) np.ndarray, dtype=bool

n0

histogram of bestk.

Type:

(k,) np.ndarray, dtype=int

Notes

The basic idea of this sample selection scheme is to ask: “which additional sample do we expect to yield the most useful information?”. This is done by investigating the evolution of the “choice distribution” p(k): the categorical distribution describing our belief about which k is best under ΔE, given the evidence curve with its uncertainty. It is calculated simply by sampling from the error bars of the evidence curve samplesize times; accordingly, samplesize should be chosen such that samplesize / kmax >> 1. We can then predict roughly how much the evidence is likely to change with an additional sample at each k; this allows us to estimate an expected information gain, as the Kullback-Leibler divergence between the new (predicted) distribution and the current one.

When working through this scheme analytically, note the following subtle point: the expected change in the choice distribution is zero; but the expected KLD is non-zero and meaningful, since it is quadratic in this change. Intuitively: regardless of whether the probability for a given k goes up or down, we learn something.
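
As an illustration, the choice distribution itself could be estimated as follows (a sketch, assuming the ΔE rule selects the smallest k whose sampled evidence lies within ΔE of the curve's maximum):

import numpy as np

def choice_distribution(muhat, shat, dE, samplesize=10000, seed=None):
    # Sample evidence curves from the error bars, apply the Delta-E rule to
    # each, and histogram the selected k.
    rng = np.random.default_rng(seed)
    curves = muhat + np.sqrt(shat) * rng.standard_normal((samplesize, len(muhat)))
    within = curves >= curves.max(axis=1, keepdims=True) - dE
    bestk = np.argmax(within, axis=1)               # first (i.e. smallest) k within margin
    n0 = np.bincount(bestk, minlength=len(muhat))   # histogram over k
    return n0 / samplesize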

See also

amis, core.sample

init_sample()

Initialize the internal random sample

This is called in the constructor; it is kept as a separate function mainly for code cleanliness, and so that individual instances can be reinitialized.

evaluate(k_change=None, n_step=0, omit_k=None)

Generate sample from choice distribution, possibly with some updates

Parameters:
  • k_change (int, index array, or None) – which k to change

  • n_step (float) – how far to move, in units of the natural step size Dmu for this k (so usually: +1, -1, +-0.5, etc.).

  • omit_k (int, index array, or None) – pretend the evidence curve has not been evaluated at these k, i.e. just ignore them. This is used to estimate the importance of any given k in the “lookahead” scheme.

Returns:

k ((samplesize,) np.ndarray, dtype=int) – sample from the choice distribution

Notes

All samples rely on the same underlying standard normal sample. This makes them highly correlated, which reduces the variance in estimating differences.

Dn()

Calculate the expected change in choice distribution

Returns:

(k, k) np.ndarray, dtype=float – entry [k1, k2] gives the expected change in the histogram counts for k=k2 upon adding a sample at k=k1.

KLD_moreSamples()

Calculate expected KLD upon additional sampling

Returns:

(k,) np.ndarray, dtype=float – estimated KLD for one additional sample at each k

KLD_omitK(omit_k=None)

“Importance” of any given k (or combinations)

Parameters:

omit_k (int, index array, or None) – which k to omit

Returns:

float – the information gain from including these positions in the selection scheme. This can be used to gauge the information gain of an additional sample.

bild.core

Core implementation of the bild module

bild.core.sample(traj, model, dE=0, init_runs=20, certainty_in_k=0.99, k_lookahead=2, k_max=20, sampler_kw={}, choice_kw={}, show_progress=False)

Entry point for BILD

This function executes the whole BILD scheme, i.e. it takes a trajectory and returns a best-fit looping profile (plus some of the internally stored variables that might be of interest for downstream analysis).

Parameters:
  • traj (noctiluca.Trajectory, numpy.ndarray, pandas.DataFrame) –

    the trajectory to sample for.

    This input is handled by the userinput.make_Trajectory() function of the noctiluca package and thus accepts a range of formats. For a trajectory with N loci, T frames, and d spatial dimensions, numpy arrays can be of shape (N, T, d), (T, d), (T,), while pandas dataframes should have columns (x1, y1, z1, x2, y2, z2, ..., frame (optional)). For precise specs see noctiluca.util.userinput.

  • model (models.MultiStateModel) – the model defining the likelihood function

  • dE (float, optional) – the evidence margin ΔE to apply in finding the actual point estimate. Note that this can also be set post hoc when evaluating the results (cf. SamplingResults.best_profile()), though that might lead to less accurate results.

  • init_runs (int) – minimum number of AMIS runs for a new value of k.

  • certainty_in_k (float) – the level of certainty (in the number of switches k) that we need to reach before we stop sampling.

  • k_lookahead (int) – how far we look ahead for global maxima. If any of the positions at k >= k_max - k_lookahead have a noticeable influence on the final choice of k, we explore higher k before sampling the existing ones more. The specific condition for “noticeable” is that the information gain from all the existing samples in this region is more than the expected information gain from one additional sample anywhere. The default value of 2 comes from the fact that for binary profiles the evidence follows an odd-even pattern, so it is necessary to look ahead by 2 additional switches.

  • k_max (int) – the maximum number of switches to sample to

  • sampler_kw (dict) – keyword arguments for the amis.FixedkSampler.

  • choice_kw (dict) – keyword arguments for choicesampler.ChoiceSampler.

  • show_progress (bool) – whether to show a progress bar

Returns:

SamplingResults

Notes

There are two possibilities for applying the evidence margin ΔE: while sampling, or post hoc through SamplingResults.best_profile(dE=...). The benefit of specifying the evidence margin during the sampling is that we can take it into account for the sample selection, i.e. the positions relevant to this setting of ΔE will be sampled sufficiently. When setting ΔE post hoc, one always runs the risk of pushing the actual result of the inference into a regime where the sampling machinery decided that we do not need to know a lot of detail. This effect should of course be small for small ΔE.

Post-processing functionality is provided in the postproc module, but usually the profile found by sampling alone is already pretty good.
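
Example

A hypothetical minimal call; the model parameters here are purely illustrative:

>>> import numpy as np
... import bild
... model = bild.models.MultiStateRouse(20, 1., 1.)   # N, D, k
... traj = np.random.normal(size=(100, 3))            # placeholder (T, d) data
... res = bild.sample(traj, model, dE=0)
... profile = res.best_profile()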

See also

SamplingResults, amis.FixedkSampler, postproc, models

class bild.core.SamplingResults(traj, model, dE, samplers, log=None)

Bases: object

Standard format for sample output

traj

the trajectory these results pertain to

Type:

Trajectory

model

the model used in obtaining them

Type:

MultiStateModel

dE

evidence margin applied during the sampling

Type:

float

samplers

the samplers run for this inference run

Type:

list of FixedkSampler

log

records from the sampling. Mostly useful for inspection / debugging.

Type:

dict

k

list of evaluated k values

Type:

np.ndarray

evidence

the corresponding evidences for each k

Type:

np.ndarray

evidence_se

the standard error on the evidence

Type:

np.ndarray

property k
property evidence
property evidence_se
best_k(dE=None)

Find the best k at given ΔE

Parameters:

dE (float >= 0 or None) – the evidence margin to apply; defaults to self.dE

Returns:

int

See also

best_profile

best_profile(dE=None)

Find the best profile at given ΔE

Parameters:

dE (float >= 0 or None) – the evidence margin to apply; defaults to self.dE

Returns:

Loopingprofile

See also

best_k

log_marginal_posterior(dE=None)

Calculate posterior marginals

Parameters:

dE (float >= 0, None, or 'average') – the evidence margin to apply. If set to 'average', instead of picking the best k, average over k, weighted by evidence. Defaults to None, which means “use the internal value self.dE”.

Returns:

(n, T) np.ndarray, dtype=float

bild.models

Inference models and the interface they should conform to

The MultiStateModel class defines the interface used for inference by the rest of the BILD module. We provide two specific implementations: the MultiStateRouse model, and the FactorizedModel, which essentially constitutes an HMM.

class bild.models.MultiStateModel

Bases: object

Abstract base class for inference models

The most important capability of any model is the likelihood function logL for a combination of Loopingprofile and Trajectory. Furthermore, a model should be able to tell how many states it has and what number of spatial dimensions it expects through the nStates and d properties respectively.

Finally, there are some convenience functions that are recommended, but not required, to implement.

When implementing a MultiStateModel, remember to call init_transitions at the end of your __init__().

transitions

transitions[i, j] indicates whether the transition from state i to state j is allowed or not.

Type:

(n, n) np.ndarray, dtype=bool

init_transitions(n)
property nStates

How many internal states does this model have?

property d

Spatial dimension

initial_loopingprofile(traj)

Give a quick guess for a good Loopingprofile for a Trajectory.

The default implementation gives a random Loopingprofile.

Parameters:

traj (Trajectory) –

Returns:

Loopingprofile

abstract logL(loopingprofile, traj)

Calculate log-likelihood for (Loopingprofile, Trajectory) pair.

Parameters:
  • loopingprofile (Loopingprofile) – the profile to evaluate

  • traj (Trajectory) – the trajectory to evaluate

Returns:

float – log-likelihood associated with the inputs

trajectory_from_loopingprofile(profile, localization_error=None, missing_frames=None, preproc=None)

Generate a Trajectory for the given Loopingprofile

Parameters:
  • profile (Loopingprofile) –

  • localization_error (float or (d,) np.ndarray, dtype=float) – how much Gaussian noise to add to the trajectory. Specify either as a single float, in which case a Gaussian random variable of that standard deviation will be added to each dimension, or as an array with one value of the standard deviation for each dimension individually.

  • missing_frames (None, float in [0, 1), int, or np.ndarray) –

    frames to remove from the generated trajectory. Can be
    • None or 0 : remove no frames

    • float in (0, 1) : remove frames at random, with this probability

    • int : remove this many frames at random

    • np.ndarray, dtype=int : indices of the frames to remove

  • preproc (string) – specific to this base implementation. Set to 'localization_error' or 'missing_frames' to resolve the corresponding parameter according to the rules outlined above and return it.

Returns:

Trajectory

Notes

Even though this method provides default preprocessing for localization_error and missing_frames, implementations are not forced to use them, so the meaning of these arguments might differ from what is outlined here.

class bild.models.MultiStateRouse(N, D, k, d=3, looppositions=(None, (0, -1)), measurement='end2end', localization_error=None)

Bases: MultiStateModel

A multi-state Rouse model

This inference model uses a given number of rouse.Model instances to choose from for each propagation interval. In the default use case this switches between a looped and an unlooped model, but it can be far more general than that, e.g. incorporating different looped states, loop positions, numbers of loops, etc.

Parameters:
  • N (int) – number of monomers

  • D (float) – Rouse parameter: 1d diffusion constant of free monomers

  • k (float) – Rouse parameter: backbone spring constant

  • d (int, optional) – spatial dimension

  • looppositions (list of tuples) – specification of the extra bonds to add for the different states. Each entry corresponds to one possible state of the model and should be a tuple (left_mon, right_mon, rel_strength=1) or a list of such; specification of rel_strength is optional. This will introduce an additional bond between the monomer indexed left_mon and the one indexed right_mon, with strength rel_strength*k. Note that you can remove the i-th backbone bond by giving (i, i+1, -1). Remember that (if relevant) you also have to include the unlooped state, which you can do by giving None instead of a tuple; alternatively, give a vacuous extra bond like (0, 0).

  • measurement ("end2end" or (N,) np.ndarray) – which distance to measure. The default setting “end2end” is equivalent to specifying a vector np.array([-1, 0, ..., 0, 1]), i.e. measuring the distance from the first to the last monomer.

  • localization_error (float, (d,) np.array, or None, optional) – localization error assumed by the model, e.g. in calculating the likelihood or generating trajectories. If None, try to use traj.localization_error where possible. If scalar, will be broadcast to all spatial dimensions.

models

the Rouse models for the individual states

Type:

list of rouse.Model

measurement

the measurement vector

Type:

(N,) np.ndarray

localization_error

if None, use traj.localization_error where possible

Type:

array or None

Notes

This class also provides a function to convert to a FactorizedModel, using the exact steady state distributions of the Rouse models. This is used for example to quickly guess an initial looping profile.

See also

MultiStateModel, rouse.Model

property d

Spatial dimension

logL(profile, traj)

Rouse likelihood, evaluated by Kalman filter

Parameters:
  • profile (Loopingprofile) – the profile to evaluate

  • traj (Trajectory) – the trajectory to evaluate

Returns:

float

initial_loopingprofile(traj)

Give an initial guess for a looping profile

Parameters:

traj (Trajectory) – the trajectory under investigation

Returns:

Loopingprofile

trajectory_from_loopingprofile(profile, localization_error=None, missing_frames=None)

Generative model

Parameters:

see MultiStateModel.trajectory_from_loopingprofile

Returns:

Trajectory

toFactorized()

Give the corresponding FactorizedModel

This is the model that simply calculates likelihoods from the steady state probabilities of each of the individual states.

Returns:

FactorizedModel

class bild.models.FactorizedModel(distributions, d=3)

Bases: MultiStateModel

A simplified model, assuming time scale separation

This model assumes that each point is sampled from one of a given list of distributions, where there is no correlation between the choice of distribution for each point.

Parameters:

distributions (list of distribution objects) – these will usually be scipy.stats.rv_continuous objects (e.g. Maxwell), but can be pretty arbitrary. The only function they have to provide is logpdf(), which should take a scalar or vector of distance values and return a corresponding number of outputs. If you plan on using trajectory_from_loopingprofile, the distributions should also have an rvs() method for sampling.

distributions
Type:

list of distribution objects

Notes

This being a heuristic model, we assume that the localization error is already incorporated in the distributions, as would be the case if they came from experimental data. Therefore, this class ignores the localization_error attribute of Trajectory.

The d attribute mandated by the MultiStateModel interface is used only for the generation of trajectories.

Instances of this class memoize trajectories they have seen before. To reset the memoization, you can either reinstantiate, or clear the cache manually:

>>> model = FactorizedModel(model.distributions)  # reinstantiate
>>> model.clear_memo()                            # or: clear the cache manually

If using scipy.stats.maxwell, make sure to use it correctly, i.e. you have to specify scale=.... Writing scipy.stats.maxwell(5) instead of scipy.stats.maxwell(scale=5) shifts the distribution instead of scaling it and leads to -inf values in the likelihood.

Examples

Experimentally measured distributions can be used straightforwardly using scipy.stats.gaussian_kde: assuming we have measured ensembles of distances dists_i for reference states i, we can use

>>> model = FactorizedModel([scipy.stats.gaussian_kde(dists_0),
...                          scipy.stats.gaussian_kde(dists_1),
...                          scipy.stats.gaussian_kde(dists_2)])
property d

Spatial dimension

clear_memo()

Clear the memoization cache

initial_loopingprofile(traj)

Gives the MLE profile for the given trajectory

Parameters:

traj (Trajectory) –

Returns:

Loopingprofile

logL(profile, traj)

Calculate log-likelihood for (Loopingprofile, Trajectory) pair.

Parameters:
  • profile (Loopingprofile) – the profile to evaluate

  • traj (Trajectory) – the trajectory to evaluate

Returns:

float – log-likelihood associated with the inputs

trajectory_from_loopingprofile(profile, localization_error=0.0, missing_frames=None)

Generative model

Parameters:

see MultiStateModel.trajectory_from_loopingprofile

Returns:

Trajectory

Notes

The FactorizedModel only contains distributions for the scalar distance between the points, amounting to the assumption that the full distribution of distance vectors is isotropic. Thus, in generating the trajectory we sample a magnitude from the given distributions and a direction from the unit sphere.

bild.postproc

Some post-processing of inferred profiles

This module provides an iterative optimization scheme for inferred looping profiles. Briefly, at each step we evaluate how much the likelihood would increase if we moved each boundary of the profile in either direction, and then make the move that gives the largest increase.
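
The scheme can be sketched on a plain integer state array as follows (illustrative; the actual implementation works on Loopingprofile objects and raises BoundaryEliminationError when a move would eliminate a boundary):

import numpy as np

def greedy_boundary_moves(states, logL, max_iteration=10000):
    # states: (T,) integer state array standing in for a Loopingprofile;
    # logL: callable giving the model log-likelihood of such an array.
    cur = logL(states)
    for _ in range(max_iteration):
        best_gain, best_trial = 0.0, None
        boundaries = np.nonzero(np.diff(states))[0] + 1   # first frame after each switch
        for b in boundaries:
            for direction in (-1, +1):                    # move boundary left or right
                trial = states.copy()
                if direction < 0:
                    trial[b-1] = trial[b]
                else:
                    trial[b] = trial[b-1]
                gain = logL(trial) - cur
                if gain > best_gain:
                    best_gain, best_trial = gain, trial
        if best_trial is None:                            # no improving move: local optimum
            break
        states, cur = best_trial, cur + best_gain
    return states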

bild.postproc.logLR_boundaries(profile, traj, model)

Give log-likelihood ratios for moving each boundary either direction

Parameters:
  • profile (Loopingprofile) – the profile whose boundaries we want to optimize

  • traj (Trajectory) – the trajectory to which the profile applies

  • model (MultiStateModel) – the model providing the likelihood function

Returns:

(k, 2) np.ndarray, dtype=float – the log(likelihood ratio) for moving any of the k boundaries to the left ([i, 0]) / right ([i, 1]).

exception bild.postproc.BoundaryEliminationError

Bases: Exception

bild.postproc.optimize_boundary(profile, traj, model, max_iteration=10000)

Locally optimize the boundaries of a profile

Parameters:
  • profile (Loopingprofile) – the profile whose boundaries we want to optimize

  • traj (Trajectory) – the trajectory to which the profile applies

  • model (MultiStateModel) – the model providing the likelihood function

  • max_iteration (int, optional) – limit on the number of iterations through the (find boundaries) --> (make best move) cycle

Raises:

BoundaryEliminationError – if the local optimization tries to shrink an interval to zero, such that the total number of boundaries would change. This usually suggests that the original sampling might not have been extensive enough.

Returns:

Loopingprofile – the optimized profile

bild.stats

Statistical tools

bild.stats.KM_survival(data, censored, conf=0.95, Tmax=inf, S1at=0)

Kaplan-Meier survival estimator on censored data

This is the standard survival estimator for right-censored data, i.e. data points that are marked as “censored” enter the estimate as t_true > t.

Parameters:
  • data ((N,) array-like) – individual survival times

  • censored ((N,) array-like, boolean) – indicate for each data point whether it is right-censored or not.

  • conf (float in (0, 1), optional) – the confidence bounds on the survival curve to calculate

  • Tmax (float) – can be used to compute survival only up to some time Tmax.

  • S1at (float, optional) – give a natural lower limit for survival times where S = 1.

Returns:

(T, 4) array – the columns of this array are t, S(t), l(t), u(t), where l and u are the lower and upper confidence bounds respectively.
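
For orientation, the point estimate alone can be sketched as follows (confidence bounds omitted; the helper name is hypothetical):

import numpy as np

def km_point_estimate(data, censored):
    # Kaplan-Meier: walk through event times in order and multiply survival
    # factors (1 - 1/at_risk); censored points only shrink the risk set.
    order = np.argsort(data)
    t = np.asarray(data, dtype=float)[order]
    cens = np.asarray(censored, dtype=bool)[order]
    at_risk, S, out = len(t), 1.0, []
    for ti, ci in zip(t, cens):
        if not ci:                   # an observed event
            S *= 1 - 1/at_risk
            out.append((ti, S))
        at_risk -= 1
    return np.array(out)             # columns: t, S(t)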

bild.stats.MLE_censored_exponential(data, censored, conf=0.95)

MLE estimate for exponential distribution, given right-censored data

Parameters:
  • data (array-like, dtype=float) – the (scalar) values

  • censored (array-like, dtype=bool, same shape as data) – whether the corresponding value in data is censored (True) or completely observed (False).

  • conf (float in [0, 1], optional) – the confidence level to use in calculating bounds

Returns:

m, low, high (float) – point estimate for the mean of the exponential distribution, and confidence bounds.
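
A sketch of the point estimate with approximate chi-squared confidence bounds; the exact conventions of the implementation may differ:

import numpy as np
from scipy import stats

def mle_censored_exponential(data, censored, conf=0.95):
    # MLE for the exponential mean with right censoring: total observed time
    # divided by the number of complete observations; bounds from the
    # (approximate) chi2(2d) distribution of 2*T/mean.
    data = np.asarray(data, dtype=float)
    d = np.count_nonzero(~np.asarray(censored, dtype=bool))  # complete observations
    T = data.sum()                                           # total observed time
    a = 1 - conf
    low = 2*T / stats.chi2.ppf(1 - a/2, 2*d)
    high = 2*T / stats.chi2.ppf(a/2, 2*d)
    return T/d, low, high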

bild.util

Utilities for BILD

class bild.util.Loopingprofile(states=None)

Bases: object

This class represents a single looping profile

This is a thin wrapper around a state array, whose entries indicate which state of the model should be used for propagation from frame to frame. Specifically, the following scheme applies to the relation between a Loopingprofile profile and the associated Trajectory traj:

profile[0]              profile[1]                                  profile[-1]
-------------> traj[0] ------------> traj[1] --- ... ---> traj[-2] -------------> traj[-1]

Note that profile[i] can be interpreted as “the state of the model used to propagate to traj[i]”. Furthermore, profile[0] indicates the model state used to calculate the steady state ensemble each trajectory starts from.

Operators defined for a Loopingprofile profile:
  • len(profile) gives the length in frames

  • profile[i] for element-access (get/set)

  • profile0 == profile1 if the associated states are equal

states

the internal state array. If not given upon initialization, this is initialized to an empty array.

Type:

np.ndarray, dtype=int

copy()

Copy to a new object

Because we need to copy only self.states, this is faster than deepcopy(profile).

Returns:

Loopingprofile

count_switches()

Give number of switches in the profile

Returns:

int

intervals()

Find intervals of constant state in the profile

Returns:

list of tuples – each entry signifies an interval, in the format (start, end, state); start is None for the first interval in the profile, and end is None for the last.

plottable()

For plotting profiles

Returns:

t, y (np.ndarray) – coordinates for plotting the profile as a proper step function. See below for usage.

Example

Given a profile profile, you might visualize it as

>>> from matplotlib import pyplot as plt
... plt.plot(*profile.plottable(), color='tab:green')
... plt.show()

Note that another way to achieve the same thing is to just use matplotlib’s stairs function:

>>> plt.stairs(profile, edges=np.arange(-1, len(profile)))
bild.util.state_probabilities(profiles, nStates=None)

Calculate marginal probabilities for an ensemble of profiles

Parameters:
  • profiles (list of Loopingprofile) – the profiles to use

  • nStates (int, optional) – the number of states in the model associated with the profiles. Since a profile might be (e.g.) flat zero, it is not in general possible to determine this from the profiles. Does not affect the calculated probabilities, but (“just”) the shape of the output. If None, will be identified from the profiles (i.e. nStates = (largest state index found) + 1)

Returns:

(nStates, T) np.ndarray – the marginal probabilities for each state at each time point
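
Functionally, this amounts to the following sketch (on plain integer state arrays standing in for Loopingprofiles):

import numpy as np

def state_probabilities(profiles, nStates=None):
    S = np.stack([np.asarray(p) for p in profiles])   # (N, T) ensemble of state arrays
    if nStates is None:
        nStates = S.max() + 1                         # largest state index + 1
    # fraction of profiles in state s at each time point
    return np.stack([(S == s).mean(axis=0) for s in range(nStates)])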