PyGT.tools

Optimal Markovian coarse-graining for a given community structure

Various functions to analyze Markov chains, including estimating the optimal coarse-grained CTMC for a given community structure.

PyGT.tools.choose_nodes_to_remove(rm_region, pi, tau, style='free_energy', percent_retained=50)[source]

Given a branching probability matrix, stationary distribution and waiting times of a CTMC, return a Boolean array selecting nodes from a given subset to remove by graph transformation according to some simple criteria. [Kannan20b]

Parameters:
  • rm_region ((N,) array) – boolean array that specifies region in which nodes can be removed i.e. for A<->B dynamics, all nodes in A,B should be retained.
  • pi ((N,) array) – stationary distribution
  • tau ((N,) array) – vector of waiting times
  • B (sparse matrix, optional) – sparse matrix of branching (CTMC) or transition (DTMC) probabilities, used for style=’node_degree’
  • style (str, optional) –

    ranking used to remove nodes, from high to low. Highest percentile is removed ‘escape_time’ : (=tau)ascending,

    ’free_energy’ : (=-log|pi|), descending,

    ’hybrid’ : remove nodes in highest percentile in escape_time and free_energy

    ’combined’ : tau * pi, ascending,

    ’node_degree’: descending (requires B),

    Default = free_energy

  • percent_retained (float, optional) – percent of nodes to keep in reduced network. Default = 50.0
Returns:

rm_vec – boolean array that specifies nodes to remove, which will always be a subset of the nodes selected by rm_region

Return type:

(N,) array

PyGT.tools.check_detailed_balance(pi, K)[source]

Check if Markov chain satisfies detailed balance condition, \(k_{ij} \pi_j = k_{ji} \pi_i\) for all \(i,j\).

Parameters:
  • pi (array-like (N,)) – stationary probabilities of full or reduced system
  • K (sparse or dense matrix (N, N)) – transition rate matrix of full or reduced system
Returns:

Self-explanatory

Return type:

success, bool

PyGT.tools.make_fastest_path(G, i, f, depth=1, limit=None)[source]

Wrapper for scipy.sparse.csgraph.shortest_path which returns node indicies on as-determined shortest i->f path and those depth connections away. Used to determine which nodes to remove by graph transformation during sensitivity analysis applied to kinetic transition networks [Swinburne20a].

Parameters:
  • B ((N,N) sparse matrix) – Matrix that will be interpreted as weighted graph for path calculation
  • i (int) – Initial node index
  • f (int) – Initial node index
  • depth (int, (default=1)) – Size of near-path region
Returns:

  • path (array_like) – indices of path nodes
  • path_region (array_like) – indices of near-path nodes

PyGT.tools.check_kemeny(pi, tauM)[source]

Check that Markov chain satisfies the Kemeny constant relation, \(\sum_i\pi_i, \mathcal{T}_{ij} = \xi\) for all \(j\), where \(\xi\) is a constant and \(\mathcal{T}_{ij}\) is the j->i mean first passage time.

Parameters:
  • pi (array-like (N,)) – stationary probabilities of full or reduced system
  • tauM (sparse or dense matrix (N, N)) – mean first passage time matrix of full or reduced system
Returns:

  • kemeny_constant, float – Average kemeny constant \(mean(xi)\) across nodes
  • success, bool – Self-explanatory. False if \(std(xi)/mean(xi)>1e-9\)

PyGT.tools.load_CTMC(K)[source]

Setup a GT calculation for a transition rate matrix representing a continuous-time Markov chain.

Parameters:K (array-like (nnodes, nnodes)) – Rate matrix with elements \(K_{ij}\) corresponding to the \(i \leftarrow j\) transition rate and diagonal elements \(K_{ii} = \sum_\gamma K_{\gamma i}\) such that the columns of \(\textbf{K}\) sum to zero.
Returns:
  • B (np.ndarray[float64] (nnodes, nnodes)) – Branching probability matrix in dense format, used as input to GT
  • tau (np.ndarray[float64] (nnodes,)) – Vector of waiting times of nodes, used as input to GT
PyGT.tools.load_DTMC(T, tau_lag)[source]

Setup a GT calculation for a transition probability matrix representing a discrete-time Markov chain.

Parameters:
  • T (array-like (nnodes, nnodes)) – Discrete-time, column-stochastic transition probability matrix.
  • tau_lag (float) – Lag time at which T was estimated.
Returns:

  • B (np.ndarray[float64] (nnodes, nnodes)) – Branching probability matrix in dense format, used as input to GT
  • tau (np.ndarray[float64] (nnodes,)) – Vector of waiting times of nodes, used as input to GT

PyGT.tools.eig_wrapper(M)[source]

Wrapper of scipy.linalg.eig that returns real eigenvalues and orthonormal left and right eigenvector pairs

Parameters:M ((N,N) dense matrix) –
Returns:
  • nu ((N,) array-like) – Real component of eigenvalues
  • v ((N,N) array-like) – Matrix of left eigenvectors
  • w ((N,N) array-like) – Matrix of right eigenvectors
class PyGT.tools.Analyze_KTN(path, K=None, pi=None, commpi=None, communities=None, commdata=None)[source]

Estimate a coarse-grained continuous-time Markov chain given a partiioning \(\mathcal{C} = \{I, J, ...\}\) of the \(V\) nodes into \(N<V\) communities. Various formulations for the inter-community rates are implemented, including the local equilibrium approximation, Hummer-Szabo relation, and other routes to obtain the optimal coarse-grained Markov chain for a given community structure. [Kannan20a]

path

path to directory with all relevant files

Type:str or Path object
K

Rate matrix with elements \(K_{ij}\) corresponding to the i leftarrow j transition rate, and diagonal elements \(K_{ii} = -\sum_\gamma K_{\gamma i}\) such that the columns sum to zero.

Type:array-like (nnodes, nnodes)
pi

Stationary distribution of nodes, \(\pi_i\), i.e. vector of equilibrium occupation probabilities.

Type:array-like (nnodes,)
commpi

Stationary distribution of communities, \(\Pi_J = \sum_{j \in J} \pi_j\).

Type:array-like (ncomms,)
communities

dictionary mapping community IDs (1-indexed) to node IDs (1-indexed).

Type:dict
commdata

Filename, located in directory specified by path, of a single-column file where each line contains the community ID (0-indexed) of the node specified by the line number in the file.

Type:str

Note

Either communities or commdata must be specified.

construct_coarse_rate_matrix_LEA()[source]

Calculate the coarse-grained rate matrix obtained using the local equilibrium approximation (LEA).

construct_coarse_rate_matrix_Hummer_Szabo()[source]

Calculate the optimal coarse-grained rate matrix using the Hummer-Szabo relation, aka Eqn. (12) in Hummer & Szabo J. Phys. Chem. B. (2015).

construct_coarse_rate_matrix_KKRA(mfpt=None, GT=False, **kwargs)[source]

Calculate optimal coarse-grained rate matrix using Eqn. (79) of Kells et al. J. Chem. Phys. (2020), aka the KKRA expression in Eqn. (10) of [Kannan20a].

Parameters:
  • mfpt ((nnodes, nnodes)) – Matrix of inter-microstate MFPTs between all pairs of nodes. Defaults to None.
  • GT (bool) – If True, matrix of inter-microstate MFPTs is computed with GT. Kwargs can then be specified for GT (such as the pool_size for parallelization). Defaults to False.
get_intermicrostate_mfpts_linear_solve()[source]

Calculate the matrix of inter-microstate MFPTs between all pairs of nodes by solving a system of linear equations given by Eq.(8) of [Kannan20a].

get_intermicrostate_mfpts_fundamental_matrix()[source]

Calculate the matrix of inter-microstate MFPTs between all pairs of nodes using Eq. (6) of [Kannan20a].

get_intercommunity_MFPTs_linear_solve()[source]

Calculate the true MFPTs between communities by inverting the non-absorbing rate matrix. Equivalent to Eqn. (14) in Swinbourne & Wales JCTC (2020).

get_intercommunity_weighted_MFPTs(mfpt, diagzero=True)[source]

Comppute the matrix \(\widetilde{\mathcal{T}}_{\rm C}\) of appropriately weighted inter-community MFPTs, as defined in Eq. (18) in [Kannan20a].

Parameters:
  • mfpt (array-like (N,N)) – matrix of intermicrostate MFPTs.
  • diagzero (bool) – Whether to define the inter-community weighted MFPTs such as the diagonal elements are zero. Defaults to True.
get_timescale_error(m, K, R)[source]

Calculate the ith timescale error for i in {1,2,…m} of a coarse-grained rate matrix R compared to the full matrix K.

Parameters:
  • m (int) – Number of dominant eigenvalues (m < N)
  • K (np.ndarray[float] (V, V)) – Rate matrix for full network
  • R (np.ndarray[float] (N, N)) – Coarse-grained rate matrix
Returns:

timescale_errors – Errors for m-1 slowest timescales

Return type:

np.ndarray[float] (m-1,)

get_eigenfunction_error(m, K, R)[source]

Calculate the \(i^{\rm th}\) eigenvector approximation error for \(i \in {1, 2, ... m}\) of a coarse-grained rate matrix R by comparing its eigenvector to the correspcorresponding eigenvector of the full matrix.

get_comm_stat_probs(pi, log=False)[source]

Calculate the community stationary probabilities by summing over the stationary probabilities of the nodes in each community.

Parameters:pi (list (nnodes,)) – stationary probabilities of node in original Markov chain
Returns:commpi – stationary probabilities of communities in coarse coarse_network
Return type:list (ncomms,)
read_communities(commdat)[source]

Read in a single column file called communities.dat where each line is the community ID (zero-indexed) of the minima given by the line number.

Parameters:commdat (dat file) – single-column file containing community IDs of each minimum
Returns:communities – mapping from community ID (1-indexed) to minima ID (1-indexed)
Return type:dict