API reference
Bayesian Inference of Loop Dynamics (BILD)
This module provides the implementation of BILD, proposed by Gabriele, Brandão, Grosse-Holz, et al.. Since the ideas behind this scheme are explained in the paper (and its supplementary information), in the code we only provide technical documentation, assuming knowledge of the reference text.
For the main entry point see bild.sample
.
bild.amis
Implementation of AMIS for posterior sampling at fixed k
This module provides the implementation of the AMIS (Cornuet et al. 2012) sampling scheme for BILD,
at a fixed number of switches k
in the profile; the interface is the
FixedkSampler
.
- We parametrize the binary profiles as
(s, theta)
where s
is a vector of interval lengths, as fraction of the whole trajectory. It thus has dimensionk+1
and satisfiessum(s) = 1
, as well ass_i > 0 forall i
.theta
is the vector of states associated with each interval, i.e. an integer valued vector of lengthk+1
. It satisfies the constraints given by thetransitions
matrix in theMultiStateModel
you are using; specifically, subsequent states can never be the same.
The conversion of switch intervals s
to discretely sampled profiles is
according to the following scheme:
trajectory: -----x-----x-----x-----x-----x-----x
position: 0.0 0.2 0.4 0.6 0.8 1.0
example (s): ==0.25==|=====0.5=====|==0.25==
switch positions: |(0.25) |(0.75)
conversion: |<<<<<|<<<<<|<<<<<|<<<<<|<<<<<
θ=(0, 1, 2): 0 0 ' 1 1 ' 2 2
Note that the conversion step amounts to a simple floor()
operation on the
switch positions. We chose this conversion over the maybe more “natural” one of
rounding to the nearest integer because it guarantees
that a uniformly distributed continuous switch position will still be uniformly distributed over its possible discrete positions.
conservation of switches: a switch at .99 would disappear from the profile with the rounding scheme.
Importantly, note that in either case an interval shorter than the framerate will likely not be represented in the profile, and thus switches are not always conserved. However, if such profiles are the main contribution to the evidence for this number of switches, we should (and do!) just choose the appropriate lower number of switches instead.
- class bild.amis.Dirichlet
Bases:
object
Dirichlet distribution
The Dirichlet is implemented in
scipy.stats.dirichlet
; we wrap it here for consistency with theCFC
, and add the method of moments estimator- sample(a, N=1)
Sample from the Dirichlet distribution
- Parameters:
a ((k+1,) np.ndarray) – concentration parameters
N (int, optional) – how many samples to draw
- Returns:
(N, k+1) np.ndarray
- logpdf(a, ss)
Evaluate Dirichlet distribution
- Parameters:
a ((k+1,) np.ndarray) – concentration parameters
ss ((N, k+1) np.ndarray) – the samples for which to evaluate
- Returns:
(N,) np.ndarray
- estimate(ss, log_weights)
Method of moments estimator for the Dirichlet distribution
- Parameters:
ss ((N, k+1) np.ndarray, dtype=float) – the switch positions
s
for a sample of sizen
log_weights ((N,) np.ndarray, dtype=float) – the associated weights for all samples (unnormalized)
- Returns:
(k+1,) np.ndarray – the concentration parameters for the fitted Dirichlet distribution
Notes
We use the method of moments, because it is straight-forward to apply to a weighted sample. Given an ensemble of switch positions
s
, we havethe mean positions
m = <s>
the variances
v = <(s-m)²>
from this we can find the total concentration as
A = mean(m*(1-m)/v) - 1
thus yielding the final estimate of
α = A*m
.
- class bild.amis.CFC(transitions)
Bases:
object
Conflict Free Categorical (CFC)
The CFC is our proposal distribution for state traces θ. These are vectors of length
k+1
, containing integers between 0 andn-1
—k
being the number of switches andn
the number of states we consider. The fundamental twist is that neighboring entries cannot take the same value, which makes it difficult to define a reasonable proposal distribution over this space of state traces.Generalizing the above constraint,
CFC
accepts thetransitions
argument, which specifies which state transitions are allowed or forbidden.The CFC is parametrized by weights
p
for each entry being in any of the available states (p.shape = (n, k+1)
). The constraints are then enforced by a causal sampling scheme:sample
θ[0] ~ Categorical(p[:, 0])
select the weights associated with all allowed states (given
θ[0]
) fromp[:, 1]
and renormalizesample
θ[1] ~ Categorical(p[:, 1])
(with the adaptedp[:, 1]
)continue iteratively
This scheme makes sampling and likelihood evaluation relatively straight-forward. For estimation, we have to convert the actually observed marginals in a sample back to the weights
p
. Lettingf_n
andg_n
denote the marginals at stepi
andi-1
respectively, we have thatp_n := p[:, i]
is given by the solution ofp_n = f_n / sum_{m in D_n}( g_m / ( sum_{k in C_m} 1-p_m ) ) ,
where the index sets are
C_n = {k | n --> k is allowed}
andD_n = {k | k --> n is allowed}
. Solving this equation by fixed point iteration, we find approximate parametersp
for given marginals over the states.- Parameters:
transitions ((n, n) np.ndarray, dtype=bool) –
transitions[i, j]
indicates whether the transition from statei
to statej
is allowed or not.
- transitions
see Parameters
- Type:
(n, n) np.ndarray, dtype=bool
- MOM_maxiter, MOM_precision
stopping criteria when solving for weight parameters from marginals. See
solve_marginals_single
.- Type:
int, float
Notes
For numerical stability we always work with
log(p)
instead ofp
.To uniquely identify the weight parameters
p
for a given distribution, we apply the normalization conditionsum(p, axis=0) == 1
(or, equivalently:logsumexp(logp, axis=0) == 0
).- property n
- sample(logp, N=1)
Sample from the Conflict Free Categorical
- Parameters:
logp ((n, k+1) np.ndarray) – the weight parameters defining the distribution
N (int, optional) – how many traces to sample
- Returns:
thetas ((N, k+1) np.ndarray) – the sampled state traces
- logpmf(logp, thetas)
Evaluate the Conflict Free Categorical
- Parameters:
logp ((n, k+1) np.ndarray, dtype=float) – the weights defining the CFC.
thetas ((N, k+1) np.ndarray, dtype=int) – the
N
state traces for which to evaluate the distribution
- Returns:
(N,) np.ndarray, dtype=float – the logarithm of the CFC evaluated at the given state traces
- estimate(thetas, log_weights)
“Method of Marginals” estimation for the CFC
- Parameters:
thetas ((N, k+1) np.ndarray, dtype=int) – the sample of state traces
log_weights ((N,) np.ndarray, dtype=float) – the associated weights for all samples (unnormalized)
- Returns:
logp ((n, k+1) np.ndarray) – the estimated weight parameters; normalized to satisfy
logsumexp(logp, axis=0) == 0
.
- logp_from_marginals(log_marginals)
Calculate weight parameters from marginals
- Parameters:
log_marginals ((n, k+1) np.ndarray, dtype=float) – the marginals for each of the
k+1
slots; should be normalized:logsumexp(log_marginals, axis=0) == 0
- Returns:
logp ((n, k+1) np.ndarray, dtype=float) – the weight parameters corresponding to the given marginals
See also
- solve_marginals_single(logf, logg)
Convert marginals to weight parameters
This is a helper function that runs the iteration needed in
estimate
.The iteration stops when the absolute difference between successive
logp
iterates is less thanself.MOM_precision
; if this does not happen inself.MOM_maxiter
steps or less, aRuntimeError
is raised.- Parameters:
logf ((n,) np.ndarray) – marginals for current and previous step, respectively
logg ((n,) np.ndarray) – marginals for current and previous step, respectively
- Raises:
RuntimeError – ran out of iterations
- Returns:
logp ((n,) np.ndarray) – the weight parameters for the step associated with the marginal
f
.
- uniform_marginals(k)
Compute the state marginals for the uniform distribution
- Parameters:
k (int) – the total number of switches
- Returns:
log_marginals ((n, k+1) np.ndarray, dtype=float) – the marginals under the uniform distribution (normalized such that
logsumexp(log_marginals, axis=0) == 0
)
See also
Notes
If
self.transitions
blocks anything beyond self-transitions, the uniform CFC has neither uniform weight parameters, nor uniform marginals.This function estimates the marginals under the uniform CFC by counting trajectories passing through a given state slot, which can be done by taking powers of the transition matrix. For long trajectories, this might result in an overflow of the
np.integer
types; thus this function uses python’s built-inint
, which has arbitrary precision. The return value is normalized and cast to regular floats.
- logp_uniform(k)
Weight parameters for uniform distribution with
k
switches- Parameters:
k (int) –
- Returns:
logp ((n, k+1) np.ndarray, dtype=float)
See also
Notes
This is just a convenience function, returning
logp_from_marginals(uniform_marginals(k))
.
- N_total(k, log=False)
Give the total number of state traces with k switches
- Parameters:
k (int) –
log (bool) – if
True
, returnmath.log(N_total)
. This is safer thannp.log(N_total)
, sinceN_total
might be large.
- Returns:
int
- full_sample(k, Nmax=1000)
Give the full sample of trajectories with
k
switches- Parameters:
k (int) –
Nmax (int) – assemble the sample only if it contains less than
Nmax
traces.
- Returns:
(N, k+1) np.ndarray, dtype=int
- Raises:
ValueError – if
self.N_total(k) > Nmax
- class bild.amis.FixedkSampler(traj, model, k, N=100, concentration_brake=0.01, polarization_brake=0.001, max_fev=20000, max_fcomplete=1000)
Bases:
object
Running the AMIS scheme for a fixed number of switches k
The most important method is
step
, which performs one AMIS iteration, i.e. it samplesN
profiles from the current proposal and updates the weights and proposal parameters accordingly.- Parameters:
traj (Trajectory) – the trajectory we are sampling for
model (MultiStateModel) – the model to use (defines the likelihood and allowed state transitions)
k (int) – the number of switches for this run
N (int, optional) – the sample size for each AMIS step
concentration_brake (float, optional) – limits changes in the total concentration of the Dirichlet proposal by constraining
|log(new_concentration/old_concentration)| <= N*concentration_brake
, whereconcentration = sum(α)
.polarization_brake (float, optional) – limits changes in the CFC proposal by constraining
|p_new - p_old| <= N*polarization_brake
.max_fev (int, optional) – upper limit on likelihood evaluations. If this limit is reached, the sampler will go to an “exhausted” state, where it does not allow further evaluations.
max_fcomplete (int, optional) – threshold up to which exhaustive sampling is considered worthwhile
- traj, model
setup for the sampling. See Parameters
- Type:
Trajectory, MultiStateModel
- k, N
see Parameters
- Type:
int
- brakes
see Parameters
- Type:
(concentration, polarization)
- max_fev
see Parameters
- Type:
int
- max_fcomplete
see Parameters
- Type:
int
- samples
list of samples. Each sample is a dict; the entries
['ss', 'thetas', 'logLs']
are guaranteed, others might exist.- Type:
list
- parameters
list of proposal parameters for the samples in
samples
, as tuples(a, logp)
.- Type:
[((k+1,) np.ndarray, (n, k+1) np.ndarray)]
- evidences
estimated evidence at each step. Similar to
samples
andparameters
, this is a list with an entry for eachstep
. The entries are tuples of log-evidence, standard error on the log-evidence, and Kullback-Leibler divergenceD_KL( posterior || proposal )
of posterior on proposal. The latter can come in handy in interpreting convergence.- Type:
[(logE, dlogE, KL)]
- logprior
value of the uniform prior over profiles.
- Type:
float
- exception ExhaustionImpractical
Bases:
ValueError
- st2profile(s, theta)
Convert parameters (s, θ) to Loopingprofile
- Parameters:
s ((k+1,) np.ndarray, dtype=float) – switch intervals. Should satisfy
np.sum(s) == 1
.theta ((k+1,) np.ndarray, dtype=int) – states associated with each interval in
s
.
- Returns:
Loopingprofile
- log_proposal(parameters, ss, thetas)
Evaluate the proposal distribution
- Parameters:
parameters ((a, logp)) – tuple of parameters for Dirichlet and CFC, respectively
ss ((N, k+1) np.ndarray, dtype=float) – switch intervals
thetas ((N, k+1) np.ndarray, dtype=int) – state traces
- Returns:
(N,) np.ndarray, dtype=float
- logL(ss, thetas)
Evaluate model likelihood
- Parameters:
ss ((N, k+1) np.ndarray, dtype=float) – switch intervals
thetas ((N, k+1) np.ndarray, dtype=int) – state traces
- Returns:
(N,) np.ndarray, dtype=float
- fix_exhaustive()
Evaluate by exhaustive sampling of the parameter space
- Raises:
FixedkSampler.ExhaustionImpractical – if parameter space is too big to warrant exhaustive sampling; i.e. there would be more than
max_fcomplete
profiles to evaluate.
Notes
Since in this case the evidence is exact, its standard error should be zero. To avoid numerical issues, we set
dlogev = 1e-10
.
- step()
Run a single step of the AMIS sampling scheme
- Returns:
bool – whether the sample was performed.
False
if sampler is exhausted,True
otherwise.
See also
- tstat(other)
Calculate separation (by evidence) from another sampler
Inspired by the t-statistic from frequentist inference, we use the separation score
(logev - other_logev) / sqrt( dlogev² + other_dlogev² )
.- Returns:
float
- MAP_profile()
Give the current MAP estimate
- Returns:
Loopingprofile
- log_marginal_posterior()
Calculate posterior marginals
- Returns:
(n, T) np.ndarray, dtype=float – log marginal posterior over
n
states for each time pointt
in the trajectory; normed.
bild.choicesampler
- class bild.choicesampler.ChoiceSampler(muhat, shat, N, dE, samplesize=10000)
Bases:
object
Sample selection for iterative AMIS
When running BILD, we are faced with the sample selection issue: which k is most important to sample next, given our current estimate of the evidence curve? This turns out to be a non-trivial question, if we take into account the evidence margin ΔE. This class implements an information-based sample selection scheme, described in the Notes.
- Parameters:
muhat ((k,) np.ndarray, dtype=float) – the current point estimates for the evidence
shat ((k,) np.ndarray, dtype=float) – the variance of the estimated means
muhat
, i.e. the squared standard error of the meanN ((k,) np.ndarray, dtype=int (or float, to allow for np.inf)) – the number of samples used to estimate evidence at each k. “One sample” here is one of the steps we are asking about where to take next; i.e. since for each AMIS iteration we actually sample (usually) 100 profiles, “one sample” here corresponds to 100 profiles.
dE (float) – the evidence margin ΔE to apply in the sampling scheme
samplesize (int) – the sample size to use for KLD estimation. Technical hyperparameter.
- muhat
see Parameters
- Type:
(k,) np.ndarray, dtype=float
- shat
see Parameters
- Type:
(k,) np.ndarray, dtype=float
- N
see Parameters
- Type:
(k,) np.ndarray, dtype=int (or float, for np.inf)
- dE
see Parameters
- Type:
float
- samplesize
see Parameters
- Type:
int
- kmax
number of k positions under consideration. Shorthand for
len(self.muhat)
- Type:
int
- EDmu2
expected squared update in evidence upon one more sample at the corresponding k. Calculated analytically.
- Type:
(k,) np.ndarray, dtype=float
- Dmu
root mean squared update in evidence at k; i.e.
sqrt(EDmu2)
- Type:
(k,) np.ndarray, dtype=float
- _scaled_rvs
the standard normal sample underlying the subsequent estimations
- Type:
(samplesize, k) np.ndarray, dtype=float
- bestk
for each sample, the k selected under ΔE scheme.
- Type:
(samplesize,) np.ndarray, dtype=int
- best_is_k
truth table corresponding to
self.bestk
- Type:
(samplesize, k) np.ndarray, dtype=bool
Notes
The basic idea of this sample selection scheme is to ask “which additional sample do we expect to yield the most useful information?”. This is done by investigating the evolution of the “choice distribution” p(k): the categorical distribution that describes our belief about which k is best under ΔE, given the evidence curve with its uncertainty. The way it is calculated is simply by sampling from the error bars of the evidence curve
samplesize
times. Accordingly,samplesize
should be chosen such thatsamplesize / kmax >> 1
. We can then predict by roughly how much the evidence is likely to change with an additional sample, for each k; this allows us to estimate an expected information gain, as the Kullback-Leibler divergence between the new (predicted) distribution and the current one.When working through this scheme analytically, note the following subtle point: the expected change in the choice distribution is zero; but the expected KLD is non-zero and meaningful, since it is quadratic in this change. Intuitively: regardless of whether the probability for a given k goes up or down, we learn something.
See also
amis
,core.sample
- init_sample()
Initialize the internal random sample
This is called in the constructor; it is its own function mainly for code cleanliness; it allows individual instances to be reinitialized.
- evaluate(k_change=None, n_step=0, omit_k=None)
Generate sample from choice distribution, possibly with some updates
- Parameters:
k_change (int, index array, or None) – which k to change
n_step (float) – how far to move, in units of the natural step size
Dmu
for this k (so usually: +1, -1, +-0.5, etc.).omit_k (int, index array, or None) – pretend the evidence curve has not been evaluated at these k, i.e. just ignore them. This is used to estimate the importance of any given k in the “lookahead” scheme.
- Returns:
k ((samplesize,) np.ndarray, dtype=int) – sample from the choice distribution
Notes
All samples rely on the same underlying standard normal sample. This makes them highly correlated, which reduces the variance in estimating differences.
- Dn()
Calculate the expected change in choice distribution
- Returns:
(k, k) np.ndarray, dtype=float –
[k1, k2]
gives the expected change in the histogram counts for k=k2 upon adding a sample at k=k1.
- KLD_moreSamples()
Calculate expected KLD upon additional sampling
- Returns:
(k,) np.ndarray, dtype=float – estimated KLD for one additional sample at each k
- KLD_omitK(omit_k=None)
“Importance” of any given k (or combinations)
- Parameters:
omit_k (int, index array, or None) – which k to omit
- Returns:
float – the information gain from including these positions in the selection scheme. This can be used to gauge the information gain of an additional sample.
bild.core
Core implementation of the bild
module
- bild.core.sample(traj, model, dE=0, init_runs=20, certainty_in_k=0.99, k_lookahead=2, k_max=20, sampler_kw={}, choice_kw={}, show_progress=False)
Entry point for BILD
This function executes the whole BILD scheme, i.e. it takes a trajectory and returns a best-fit looping profile (plus some of the internally stored variables that might be of interest for downstream analysis).
- Parameters:
traj (noctiluca.Trajectory, numpy.ndarray, pandas.DataFrame) –
the trajectory to sample for.
This input is handled by the userinput.make_Trajectoy() function of the
noctiluca
package and thus accepts a range of formats. For a trajectory withN
loci,T
frames, andd
spatial dimensions, numpy arrays can be of shape(N, T, d)
,(T, d)
,(T,)
, while pandas dataframes should have columns(x1, y1, z1, x2, y2, z2, ..., frame (optional))
. For precise specs see noctiluca.util.userinput.model (models.MultiStateModel) – the model defining the likelihood function
dE (float, optional) – the evidence margin ΔE to apply in finding the actual point estimate. Note that this can also be set post-hoc when evaluating the results; c.f.
SamplingResults.best_profile()
, which however might lead to less accurate results.init_runs (int) – minimum number of AMIS runs for a new value of
k
.certainty_in_k (float) – the level of certainty (in the number of switches k) that we need to reach before we stop sampling.
k_lookahead (int) – how far we look ahead for global maxima. If any of the positions at
k >= k_max - k_lookahead
have a noticeable influence on the final choice ofk
, we explore higher k before sampling the existing ones more. The specific condition for “noticable” is that the information gain from all the existing samples in this region is more than the expected information gain from one additional sample anywhere. The default value of 2 comes from the fact that for binary profiles the evidence follows an odd-even pattern, so it is necessary to look ahead by 2 additional switches.k_max (int) – the maximum number of switches to sample to
sampler_kw (dict) – keyword arguments for the
amis.FixedkSampler
.choice_kw (dict) – keyword arguments for
choicesampler.ChoiceSampler
.show_progress (bool) – whether to show a progress bar
- Returns:
SamplingResults
Notes
There are two possibilities for applying the evidence margin ΔE: while sampling, or post hoc through
SamplingResults.best_profile(dE=...)
. The benefit of specifying the evidence margin during the sampling is that we can take it into account for the sample selection, i.e. the positions relevant to this setting of ΔE will be sampled sufficiently. When setting ΔE post hoc, one always runs the risk of pushing the actual result of the inference into a regime where the sampling machinery decided that we do not need to know a lot of detail. This effect should of course small for small ΔE.Post-processing functionality is provided in the
postproc
module, but usually the profile found by sampling alone is already pretty good.See also
SamplingResults
,amis.FixedkSampler
,postproc
,models
- class bild.core.SamplingResults(traj, model, dE, samplers, log=None)
Bases:
object
Standard format for
sample
output- traj
the trajectory these results pertain to
- Type:
Trajectory
- model
the model used in obtaining them
- Type:
- dE
evidence margin applied during the sampling
- Type:
float
- samplers
the samplers run for this inference run
- Type:
list of FixedkSampler
- log
records from the sampling. Mostly useful for inspection / debugging.
- Type:
dict
- k
list of evaluated
k
values- Type:
np.ndarray
- evidence
the corresponding evidences for each
k
- Type:
np.ndarray
- evidence_se
the standard error on the evidence
- Type:
np.ndarray
- property k
- property evidence
- property evidence_se
- best_k(dE=None)
Find the best k at given ΔE
- Parameters:
dE (float >= 0 or None) – the evidence margin to apply; defaults to
self.dE
- Returns:
int
See also
- best_profile(dE=None)
Find the best profile at given ΔE
- Parameters:
dE (float >= 0 or None) – the evidence margin to apply; defaults to
self.dE
- Returns:
Loopingprofile
See also
- log_marginal_posterior(dE=None)
Calculate posterior marginals
- Parameters:
dE (float >= 0, None, or 'average') – the evidence margin to apply. If set
'average'
(default): instead of picking the bestk
, average overk
, weighted by evidence. Defaults toNone
, which means “use the internal valueself.dE
.- Returns:
(n, T) np.ndarray, dtype=float
bild.models
Inference models and the interface they should conform to
The MultiStateModel
defines the interface used for inference by the rest of
the BILD module. We provide specific implementations, the MultiStateRouse
model, as well as the FactorizedModel
, which essentially constitutes an HMM.
- class bild.models.MultiStateModel
Bases:
object
Abstract base class for inference models
The most important capability of any model is the likelihood function
logL
for a combination ofLoopingprofile
andTrajectory
. Furthermore, a model should be able to tell how many states it has and what number of spatial dimensions it expects through thenStates
andd
properties respectively.Finally, there are some convenience functions that are recommended, but not required to implement:
provide an initial guess for a good profile by
initial_loopingprofile
sample from the likelihood by
trajectory_from_loopingprofile
When implementing a
MultiStateModel
, remember to callinit_transitions
at the end of your__init__()
.- transitions
transitions[i, j]
indicates whether the transition from statei
to statej
is allowed or not.- Type:
(n, n) np.ndarray, dtype=bool
- init_transitions(n)
- property nStates
How many internal states does this model have?
- property d
Spatial dimension
- initial_loopingprofile(traj)
Give a quick guess for a good
Loopingprofile
for aTrajectory
.The default implementation gives a random
Loopingprofile
.- Parameters:
traj (Trajectory) –
- Returns:
Loopingprofile
- abstract logL(loopingprofile, traj)
Calculate log-likelihood for (
Loopingprofile
,Trajectory
) pair.- Parameters:
loopingprofile (Loopingprofile) –
traj (Trajectory) –
- Returns:
float – log-likelihood associated with the inputs
- trajectory_from_loopingprofile(profile, localization_error=None, missing_frames=None, preproc=None)
Generate a
Trajectory
for the givenLoopingprofile
- Parameters:
profile (Loopingprofile) –
localization_error (float or (d,) np.ndarray, dtype=float) – how much Gaussian noise to add to the trajectory. Specify either as a single float, in which case a Gaussian random variable of that standard deviation will be added to each dimension, or as an array with one value of the standard deviation for each dimension individually.
missing_frames (None, float in [0, 1), int, or np.ndarray) –
- frames to remove from the generated trajectory. Can be
None
or0
: remove no framesfloat in (0, 1)
: remove frames at random, with this probabilityint
: remove this many frames at randomnp.ndarray, dtype=int
: indices of the frames to remove
preproc (string) – specific to this base implementation. Set to
'localization_error'
or'missing_frames'
to resolve the corresponding parameter according to the rules outlined above and return it.
- Returns:
Trajectory
Notes
Even though this method provides default preprocessing for
localization_error
andmissing_frames
, implementations are not forced to use them. So the meaning of these arguments might be different from what’s outlined here.
- class bild.models.MultiStateRouse(N, D, k, d=3, looppositions=(None, (0, -1)), measurement='end2end', localization_error=None)
Bases:
MultiStateModel
A multi-state Rouse model
This inference model uses a given number of
rouse.Model
instances to choose from for each propagation interval. In the default use case this switches between a looped and unlooped model, but it could be way more general than that, e.g. incorporating different looped states, loop positions, numbers of loops, etc.- Parameters:
N (int) – number of monomers
D (float) – Rouse parameters: 1d diffusion constant of free monomers and backbone spring constant
k (float) – Rouse parameters: 1d diffusion constant of free monomers and backbone spring constant
d (int, optional) – spatial dimension
looppositions (list of tuples) – specification of the extra bonds to add for the different states. Each entry corresponds to one possible state of the model and should be a tuple
(left_mon, right_mon, rel_strength=1)
or a list of such; specification ofrel_strength
is optional. This will introduce an additional bond between the monomer indexedleft_mon
and the one indexedright_mon
, with strengthrel_strength*k
. Note that you can remove the i-th backbone bond by giving(i, i+1, -1)
. Remember that (if relevant) you also have to include the unlooped state, which you can do by givingNone
instead of a tuple; alternatively, give a vacuous extra bond like(0, 0)
.measurement ("end2end" or (N,) np.ndarray) – which distance to measure. The default setting “end2end” is equivalent to specifying a vector
np.array([-1, 0, ..., 0, 1])
, i.e. measuring the distance from the first to the last monomer.localization_error (float, (d,) np.array, or None, optional) – localization error assumed by the model, e.g. in calculating the likelihood or generating trajectories. If
None
, try to usetraj.localization_error
where possible. If scalar, will be broadcast to all spatial dimensions.
- models
the Rouse models for the individual states
- Type:
list of
rouse.Model
- measurement
the measurement vector
- Type:
(N,) np.ndarray
- localization_error
if
None
, usetraj.localization_error
where possible- Type:
array or None
Notes
This class also provides a function to convert to a
FactorizedModel
, using the exact steady state distributions of the Rouse models. This is used for example to quickly guess an initial looping profile.See also
MultiStateModel
,rouse.Model
- property d
Spatial dimension
- logL(profile, traj)
Rouse likelihood, evaluated by Kalman filter
- Parameters:
profile (Loopingprofile) –
traj (noctiluca.Trajectory) –
- Returns:
float
- initial_loopingprofile(traj)
Give an initial guess for a looping profile
- Parameters:
traj (Trajectory) – the trajectory under investigation
- Returns:
Loopingprofile
- trajectory_from_loopingprofile(profile, localization_error=None, missing_frames=None)
Generative model
- Parameters:
profile (Loopingprofile) – the profile from whose associated ensemble to sample
localization_error (float or (d,) np.ndarray, dtype=float) – see
MultiStateModel.trajectory_from_loopingprofile
missing_frames (None, float in [0, 1), int, or np.ndarray) – see
MultiStateModel.trajectory_from_loopingprofile
- Returns:
Trajectory
- toFactorized()
Give the corresponding
FactorizedModel
This is the model that simply calculates likelihoods from the steady state probabilities of each of the individual states.
- Returns:
FactorizedModel
- class bild.models.FactorizedModel(distributions, d=3)
Bases:
MultiStateModel
A simplified model, assuming time scale separation
This model assumes that each point is sampled from one of a given list of distributions, where there is no correlation between the choice of distribution for each point.
- Parameters:
distributions (list of distribution objects) – these will usually be
scipy.stats.rv_continuous
objects (e.g. Maxwell), but can be pretty arbitrary. The only function they have to provide islogpdf()
, which should take a scalar or vector of distance values and return a corresponding number of outputs. If you plan on usingtrajectory_from_loopingprofile
, the distributions should also have anrvs()
method for sampling.
- distributions
- Type:
list of distribution objects
Notes
This being a heuristical model, we assume that the localization error is already incorporated in the
distributions
, as would be the case if they came from experimental data. Therefore, this class ignores thelocalization_error
attribute ofTrajectory
.The
d
attribute mandated by theMultiStateRouse
interface is used only for generation of trajectories.Instances of this class memoize trajectories they have seen before. To reset the memoization, you can either reinstantiate, or clear the cache manually:
>>> model = FactorizedModel(model.distributions) ... model.clear_memo()
If using
scipy.stats.maxwell
, make sure to use it correctly, i.e. you have to specifyscale=...
. Writingscipy.stats.maxwell(5)
instead ofscipy.stats.maxwell(scale=5)
shifts the distribution instead of scaling it and leads to-inf
values in the likelihood.Examples
Experimentally measured distributions can be used straightforwardly using
scipy.stats.gaussian_kde
: assuming we have measured ensembles of distancesdists_i
for reference statesi
, we can use>>> model = FactorizedModel([scipy.stats.gaussian_kde(dists_0), ... scipy.stats.gaussian_kde(dists_1), ... scipy.stats.gaussian_kde(dists_2)])
- property d
Spatial dimension
- clear_memo()
Clear the memoization cache
- initial_loopingprofile(traj)
Gives the MLE profile for the given trajectory
- Parameters:
traj (Trajectory) –
- Returns:
Loopingprofile
- logL(profile, traj)
Calculate log-likelihood for (
Loopingprofile
,Trajectory
) pair.- Parameters:
loopingprofile (Loopingprofile) –
traj (Trajectory) –
- Returns:
float – log-likelihood associated with the inputs
- trajectory_from_loopingprofile(profile, localization_error=0.0, missing_frames=None)
Generative model
- Parameters:
profile (Loopingprofile) – the profile from whose associated ensemble to sample
localization_error (float or (d,) np.ndarray, dtype=float) – see
MultiStateModel.trajectory_from_loopingprofile
; note that since the localization error should already be accounted for in thedistributions
of the model, it is not added to the trajectory here. Instead, it is just written totraj.localization_error
.missing_frames (None, float in [0, 1), int, or np.ndarray) – see
MultiStateModel.trajectory_from_loopingprofile
- Returns:
Trajectory
Notes
The
FactorizedModel
only contains distributions for the scalar distance between the points, amounting to the assumption that the full distribution of distance vectors is isotropic. Thus, in generating the trajectory we sample a magnitude from the given distributions and a direction from the unit sphere.
bild.postproc
Some post-processing of inferred profiles
This module provides an iterative optimization scheme for inferred looping profiles. Briefly, at each step we evaluate how much the likelihood increases if we were to move each boundary of the profile and then do the move that gives the largest increase.
- bild.postproc.logLR_boundaries(profile, traj, model)
Give log-likelihood ratios for moving each boundary either direction
- Parameters:
profile (Loopingprofile) – the profile whose boundaries we want to optimize
traj (Trajectory) – the trajectory to which the profile applies
model (MultiStateModel) – the model providing the likelihood function
- Returns:
(k, 2) np.ndarray, dtype=float – the log(likelihood ratio) for moving any of the
k
boundaries to the left ([i, 0]
) / right ([i, 1]
).
See also
- exception bild.postproc.BoundaryEliminationError
Bases:
Exception
- bild.postproc.optimize_boundary(profile, traj, model, max_iteration=10000)
Locally optimize the boundaries of a profile
- Parameters:
profile (Loopingprofile) – the profile whose boundaries we want to optimize
traj (Trajectory) – the trajectory to which the profile applies
model (MultiStateModel) – the model providing the likelihood function
max_iteration (int, optional) – limit on the number of iterations through the (find boundaries) –> (make best move) cycle
- Raises:
BoundaryEliminationError – if the local optimization tries to shrink an interval to zero, such that the total number of boundaries would change. This usually suggests that the original sampling might not have been extensive enough.
- Returns:
Loopingprofile – the optimized profile
bild.stats
Statistical tools
- bild.stats.KM_survival(data, censored, conf=0.95, Tmax=inf, S1at=0)
Kaplan-Meier survival estimator on censored data
This is the standard survival estimator for right-censored data, i.e. data points that are marked as “censored” enter the estimate as
t_true > t
.- Parameters:
data ((N,) array-like) – individual survival times
censored ((N,) array-like, boolean) – indicate for each data point whether it is right-censored or not.
conf (float in (0, 1), optional) – the confidence bounds on the survival curve to calculate
Tmax (float) – can be used to compute survival only up to some time
Tmax
.S1at (float, optional) – give a natural lower limit for survival times where S = 1.
- Returns:
(T, 4) array – the columns of this array are: t, S(t), l(t), u(t), where l and u are lower and upper confidence levels respectively.
- bild.stats.MLE_censored_exponential(data, censored, conf=0.95)
MLE estimate for exponential distribution, given right-censored data
- Parameters:
data (array-like, dtype=float) – the (scalar) values
censored (array-like, dtype=bool, same shape as data) – whether the corresponding value in
data
is censored (True
) or completely observed (False
).conf (float in [0, 1], optional) – the confidence level to use in calculating bounds
- Returns:
m, low, high (float) – point estimate for the mean of the exponential distribution, and confidence bounds.
bild.util
Utilities for BILD
- class bild.util.Loopingprofile(states=None)
Bases:
object
This class represents a single looping profile
This is a thin wrapper around a
state
array, whose entries indicate which state of the model should be used for propagation from frame to frame. Specifically, the following scheme applies to the relation between aLoopingprofile
profile
and the associatedTrajectory
traj
:profile[0] profile[1] profile[-1] -------------> traj[0] ------------> traj[1] --- ... ---> traj[-2] -------------> traj[-1]
Note that
profile[i]
can be interpreted as “the state of the model used to propagate totraj[i]
”. Furthermore,profile[0]
indicates the model state used to calculate the steady state ensemble each trajectory starts from.- Operators defined for a
Loopingprofile
profile
: len(profile)
gives the length in framesprofile[i]
for element-access (get/set)profile0 == profile1
if the associated states are equal
- states
the internal state array. If this is not given upon initialization, initialized to an empty array.
- Type:
np.ndarray, dtype=int
- copy()
Copy to a new object
Because we can copy just
self.state
, this is faster than doingdeepcopy(profile)
.- Returns:
Loopingprofile
- count_switches()
Give number of switches in the profile
- Returns:
int
- intervals()
Find intervals of constant state in the profile
- Returns:
list of tuples – each entry signifies an interval, in the format
(start, end, state)
, wherestart
orend
areNone
for the first/last interval in the profile, respectively.
- plottable()
For plotting profiles
- Returns:
t, y (np.ndarray) – coordinates for plotting the profile as proper step function. See below for usage
Example
Given a profile
profile
, you might visualize it as>>> from matplotlib import pyplot as plt ... plt.plot(*profile.plottable(), color='tab:green') ... plt.show()
Note that another way to achieve the same thing is to just use
matplotlib
’sstairs
function:>>> plt.stairs(profile, edges=np.arange(-1, len(profile)))
- Operators defined for a
- bild.util.state_probabilities(profiles, nStates=None)
Calculate marginal probabilities for an ensemble of profiles
- Parameters:
profiles (list of Loopingprofile) – the profiles to use
nStates (int, optional) – the number of states in the model associated with the profiles. Since a profile might be (e.g.) flat zero, it is not in general possible to determine this from the profiles. Does not affect the calculated probabilities, but (“just”) the shape of the output. If
None
, will be identified from the profiles (i.e.nStates = (largest state index found) + 1
)
- Returns:
(nStates, T) np.ndarray – the marginal probabilities for each state at each time point