PyGT.tools¶

Optimal Markovian coarse-graining for a given community structure¶

Various functions to analyze Markov chains, including estimating the optimal coarse-grained CTMC for a given community structure.

PyGT.tools.choose_nodes_to_remove(rm_region, pi, tau, style='free_energy', percent_retained=50)[source]¶

Given a branching probability matrix, stationary distribution and waiting times of a CTMC, return a Boolean array selecting nodes from a given subset to remove by graph transformation according to some simple criteria. [Kannan20b]

Parameters:	rm_region ((N,) array) – boolean array that specifies region in which nodes can be removed i.e. for A<->B dynamics, all nodes in A,B should be retained. pi ((N,) array) – stationary distribution tau ((N,) array) – vector of waiting times B (sparse matrix, optional) – sparse matrix of branching (CTMC) or transition (DTMC) probabilities, used for style=’node_degree’ style (str, optional) – ranking used to remove nodes, from high to low. Highest percentile is removed ‘escape_time’ : (=tau)ascending, ’free_energy’ : (=-log\|pi\|), descending, ’hybrid’ : remove nodes in highest percentile in escape_time and free_energy ’combined’ : tau * pi, ascending, ’node_degree’: descending (requires B), Default = free_energy percent_retained (float, optional) – percent of nodes to keep in reduced network. Default = 50.0
Returns:	rm_vec – boolean array that specifies nodes to remove, which will always be a subset of the nodes selected by rm_region
Return type:	(N,) array

PyGT.tools.check_detailed_balance(pi, K)[source]¶

Check if Markov chain satisfies detailed balance condition, \(k_{ij} \pi_j = k_{ji} \pi_i\) for all \(i,j\).

Parameters:	pi (array-like (N,)) – stationary probabilities of full or reduced system K (sparse or dense matrix (N, N)) – transition rate matrix of full or reduced system
Returns:	Self-explanatory
Return type:	success, bool

PyGT.tools.make_fastest_path(G, i, f, depth=1, limit=None)[source]¶

Wrapper for scipy.sparse.csgraph.shortest_path which returns node indicies on as-determined shortest i->f path and those depth connections away. Used to determine which nodes to remove by graph transformation during sensitivity analysis applied to kinetic transition networks [Swinburne20a].

Parameters:

B ((N,N) sparse matrix) – Matrix that will be interpreted as weighted graph for path calculation
i (int) – Initial node index
f (int) – Initial node index
depth (int, (default=1)) – Size of near-path region

Returns:

path (array_like) – indices of path nodes
path_region (array_like) – indices of near-path nodes

PyGT.tools.check_kemeny(pi, tauM)[source]¶

Check that Markov chain satisfies the Kemeny constant relation, \(\sum_i\pi_i, \mathcal{T}_{ij} = \xi\) for all \(j\), where \(\xi\) is a constant and \(\mathcal{T}_{ij}\) is the j->i mean first passage time.

Parameters:

pi (array-like (N,)) – stationary probabilities of full or reduced system
tauM (sparse or dense matrix (N, N)) – mean first passage time matrix of full or reduced system

Returns:

kemeny_constant, float – Average kemeny constant \(mean(xi)\) across nodes
success, bool – Self-explanatory. False if \(std(xi)/mean(xi)>1e-9\)

PyGT.tools.load_CTMC(K)[source]¶

Setup a GT calculation for a transition rate matrix representing a continuous-time Markov chain.

Parameters:	K (array-like (nnodes, nnodes)) – Rate matrix with elements \(K_{ij}\) corresponding to the \(i \leftarrow j\) transition rate and diagonal elements \(K_{ii} = \sum_\gamma K_{\gamma i}\) such that the columns of \(\textbf{K}\) sum to zero.
Returns:	B (np.ndarray[float64] (nnodes, nnodes)) – Branching probability matrix in dense format, used as input to GT tau (np.ndarray[float64] (nnodes,)) – Vector of waiting times of nodes, used as input to GT

PyGT.tools.load_DTMC(T, tau_lag)[source]¶

Setup a GT calculation for a transition probability matrix representing a discrete-time Markov chain.

Parameters:

T (array-like (nnodes, nnodes)) – Discrete-time, column-stochastic transition probability matrix.
tau_lag (float) – Lag time at which T was estimated.

Returns:

B (np.ndarray[float64] (nnodes, nnodes)) – Branching probability matrix in dense format, used as input to GT
tau (np.ndarray[float64] (nnodes,)) – Vector of waiting times of nodes, used as input to GT

PyGT.tools.eig_wrapper(M)[source]¶

Wrapper of scipy.linalg.eig that returns real eigenvalues and orthonormal left and right eigenvector pairs

Parameters:	M ((N,N) dense matrix) –
Returns:	nu ((N,) array-like) – Real component of eigenvalues v ((N,N) array-like) – Matrix of left eigenvectors w ((N,N) array-like) – Matrix of right eigenvectors

class PyGT.tools.Analyze_KTN(path, K=None, pi=None, commpi=None, communities=None, commdata=None)[source]¶

Estimate a coarse-grained continuous-time Markov chain given a partiioning \(\mathcal{C} = \{I, J, ...\}\) of the \(V\) nodes into \(N<V\) communities. Various formulations for the inter-community rates are implemented, including the local equilibrium approximation, Hummer-Szabo relation, and other routes to obtain the optimal coarse-grained Markov chain for a given community structure. [Kannan20a]

path¶

path to directory with all relevant files

Type:	str or Path object

K¶

Rate matrix with elements \(K_{ij}\) corresponding to the i leftarrow j transition rate, and diagonal elements \(K_{ii} = -\sum_\gamma K_{\gamma i}\) such that the columns sum to zero.

Type:	array-like (nnodes, nnodes)

pi¶

Stationary distribution of nodes, \(\pi_i\), i.e. vector of equilibrium occupation probabilities.

Type:	array-like (nnodes,)

commpi¶

Stationary distribution of communities, \(\Pi_J = \sum_{j \in J} \pi_j\).

Type:	array-like (ncomms,)

communities¶

dictionary mapping community IDs (1-indexed) to node IDs (1-indexed).

Type:	dict

commdata¶

Filename, located in directory specified by path, of a single-column file where each line contains the community ID (0-indexed) of the node specified by the line number in the file.

Type:	str

Note

Either communities or commdata must be specified.

construct_coarse_rate_matrix_LEA()[source]¶: Calculate the coarse-grained rate matrix obtained using the local equilibrium approximation (LEA).

construct_coarse_rate_matrix_Hummer_Szabo()[source]¶: Calculate the optimal coarse-grained rate matrix using the Hummer-Szabo relation, aka Eqn. (12) in Hummer & Szabo J. Phys. Chem. B. (2015).

construct_coarse_rate_matrix_KKRA(mfpt=None, GT=False, **kwargs)[source]¶

Calculate optimal coarse-grained rate matrix using Eqn. (79) of Kells et al. J. Chem. Phys. (2020), aka the KKRA expression in Eqn. (10) of [Kannan20a].

Parameters:	mfpt ((nnodes, nnodes)) – Matrix of inter-microstate MFPTs between all pairs of nodes. Defaults to None. GT (bool) – If True, matrix of inter-microstate MFPTs is computed with GT. Kwargs can then be specified for GT (such as the pool_size for parallelization). Defaults to False.

get_intermicrostate_mfpts_linear_solve()[source]¶: Calculate the matrix of inter-microstate MFPTs between all pairs of nodes by solving a system of linear equations given by Eq.(8) of [Kannan20a].

get_intermicrostate_mfpts_fundamental_matrix()[source]¶: Calculate the matrix of inter-microstate MFPTs between all pairs of nodes using Eq. (6) of [Kannan20a].

get_intercommunity_MFPTs_linear_solve()[source]¶: Calculate the true MFPTs between communities by inverting the non-absorbing rate matrix. Equivalent to Eqn. (14) in Swinbourne & Wales JCTC (2020).

get_intercommunity_weighted_MFPTs(mfpt, diagzero=True)[source]¶

Comppute the matrix \(\widetilde{\mathcal{T}}_{\rm C}\) of appropriately weighted inter-community MFPTs, as defined in Eq. (18) in [Kannan20a].

Parameters:	mfpt (array-like (N,N)) – matrix of intermicrostate MFPTs. diagzero (bool) – Whether to define the inter-community weighted MFPTs such as the diagonal elements are zero. Defaults to True.

get_timescale_error(m, K, R)[source]¶

Calculate the ith timescale error for i in {1,2,…m} of a coarse-grained rate matrix R compared to the full matrix K.

Parameters:	m (int) – Number of dominant eigenvalues (m < N) K (np.ndarray[float] (V, V)) – Rate matrix for full network R (np.ndarray[float] (N, N)) – Coarse-grained rate matrix
Returns:	timescale_errors – Errors for m-1 slowest timescales
Return type:	np.ndarray[float] (m-1,)

get_eigenfunction_error(m, K, R)[source]¶: Calculate the \(i^{\rm th}\) eigenvector approximation error for \(i \in {1, 2, ... m}\) of a coarse-grained rate matrix R by comparing its eigenvector to the correspcorresponding eigenvector of the full matrix.

get_comm_stat_probs(pi, log=False)[source]¶

Calculate the community stationary probabilities by summing over the stationary probabilities of the nodes in each community.

Parameters:	pi (list (nnodes,)) – stationary probabilities of node in original Markov chain
Returns:	commpi – stationary probabilities of communities in coarse coarse_network
Return type:	list (ncomms,)

read_communities(commdat)[source]¶

Read in a single column file called communities.dat where each line is the community ID (zero-indexed) of the minima given by the line number.

Parameters:	commdat (dat file) – single-column file containing community IDs of each minimum
Returns:	communities – mapping from community ID (1-indexed) to minima ID (1-indexed)
Return type:	dict