PyGT.tools¶
Optimal Markovian coarse-graining for a given community structure¶
Various functions to analyze Markov chains, including estimating the optimal coarse-grained CTMC for a given community structure.
-
PyGT.tools.
choose_nodes_to_remove
(rm_region, pi, tau, style='free_energy', percent_retained=50)[source]¶ Given a branching probability matrix, stationary distribution and waiting times of a CTMC, return a Boolean array selecting nodes from a given subset to remove by graph transformation according to some simple criteria. [Kannan20b]
Parameters: - rm_region ((N,) array) – boolean array that specifies region in which nodes can be removed i.e. for A<->B dynamics, all nodes in A,B should be retained.
- pi ((N,) array) – stationary distribution
- tau ((N,) array) – vector of waiting times
- B (sparse matrix, optional) – sparse matrix of branching (CTMC) or transition (DTMC) probabilities, used for style=’node_degree’
- style (str, optional) –
ranking used to remove nodes, from high to low. Highest percentile is removed ‘escape_time’ : (=tau)ascending,
’free_energy’ : (=-log|pi|), descending,
’hybrid’ : remove nodes in highest percentile in escape_time and free_energy
’combined’ : tau * pi, ascending,
’node_degree’: descending (requires B),
Default = free_energy
- percent_retained (float, optional) – percent of nodes to keep in reduced network. Default = 50.0
Returns: rm_vec – boolean array that specifies nodes to remove, which will always be a subset of the nodes selected by rm_region
Return type: (N,) array
-
PyGT.tools.
check_detailed_balance
(pi, K)[source]¶ Check if Markov chain satisfies detailed balance condition, \(k_{ij} \pi_j = k_{ji} \pi_i\) for all \(i,j\).
Parameters: - pi (array-like (N,)) – stationary probabilities of full or reduced system
- K (sparse or dense matrix (N, N)) – transition rate matrix of full or reduced system
Returns: Self-explanatory
Return type: success, bool
-
PyGT.tools.
make_fastest_path
(G, i, f, depth=1, limit=None)[source]¶ Wrapper for scipy.sparse.csgraph.shortest_path which returns node indicies on as-determined shortest i->f path and those depth connections away. Used to determine which nodes to remove by graph transformation during sensitivity analysis applied to kinetic transition networks [Swinburne20a].
Parameters: - B ((N,N) sparse matrix) – Matrix that will be interpreted as weighted graph for path calculation
- i (int) – Initial node index
- f (int) – Initial node index
- depth (int, (default=1)) – Size of near-path region
Returns: - path (array_like) – indices of path nodes
- path_region (array_like) – indices of near-path nodes
-
PyGT.tools.
check_kemeny
(pi, tauM)[source]¶ Check that Markov chain satisfies the Kemeny constant relation, \(\sum_i\pi_i, \mathcal{T}_{ij} = \xi\) for all \(j\), where \(\xi\) is a constant and \(\mathcal{T}_{ij}\) is the j->i mean first passage time.
Parameters: - pi (array-like (N,)) – stationary probabilities of full or reduced system
- tauM (sparse or dense matrix (N, N)) – mean first passage time matrix of full or reduced system
Returns: - kemeny_constant, float – Average kemeny constant \(mean(xi)\) across nodes
- success, bool – Self-explanatory. False if \(std(xi)/mean(xi)>1e-9\)
-
PyGT.tools.
load_CTMC
(K)[source]¶ Setup a GT calculation for a transition rate matrix representing a continuous-time Markov chain.
Parameters: K (array-like (nnodes, nnodes)) – Rate matrix with elements \(K_{ij}\) corresponding to the \(i \leftarrow j\) transition rate and diagonal elements \(K_{ii} = \sum_\gamma K_{\gamma i}\) such that the columns of \(\textbf{K}\) sum to zero. Returns: - B (np.ndarray[float64] (nnodes, nnodes)) – Branching probability matrix in dense format, used as input to GT
- tau (np.ndarray[float64] (nnodes,)) – Vector of waiting times of nodes, used as input to GT
-
PyGT.tools.
load_DTMC
(T, tau_lag)[source]¶ Setup a GT calculation for a transition probability matrix representing a discrete-time Markov chain.
Parameters: - T (array-like (nnodes, nnodes)) – Discrete-time, column-stochastic transition probability matrix.
- tau_lag (float) – Lag time at which T was estimated.
Returns: - B (np.ndarray[float64] (nnodes, nnodes)) – Branching probability matrix in dense format, used as input to GT
- tau (np.ndarray[float64] (nnodes,)) – Vector of waiting times of nodes, used as input to GT
-
PyGT.tools.
eig_wrapper
(M)[source]¶ Wrapper of
scipy.linalg.eig
that returns real eigenvalues and orthonormal left and right eigenvector pairsParameters: M ((N,N) dense matrix) – Returns: - nu ((N,) array-like) – Real component of eigenvalues
- v ((N,N) array-like) – Matrix of left eigenvectors
- w ((N,N) array-like) – Matrix of right eigenvectors
-
class
PyGT.tools.
Analyze_KTN
(path, K=None, pi=None, commpi=None, communities=None, commdata=None)[source]¶ Estimate a coarse-grained continuous-time Markov chain given a partiioning \(\mathcal{C} = \{I, J, ...\}\) of the \(V\) nodes into \(N<V\) communities. Various formulations for the inter-community rates are implemented, including the local equilibrium approximation, Hummer-Szabo relation, and other routes to obtain the optimal coarse-grained Markov chain for a given community structure. [Kannan20a]
-
path
¶ path to directory with all relevant files
Type: str or Path object
-
K
¶ Rate matrix with elements \(K_{ij}\) corresponding to the i leftarrow j transition rate, and diagonal elements \(K_{ii} = -\sum_\gamma K_{\gamma i}\) such that the columns sum to zero.
Type: array-like (nnodes, nnodes)
-
pi
¶ Stationary distribution of nodes, \(\pi_i\), i.e. vector of equilibrium occupation probabilities.
Type: array-like (nnodes,)
-
commpi
¶ Stationary distribution of communities, \(\Pi_J = \sum_{j \in J} \pi_j\).
Type: array-like (ncomms,)
-
communities
¶ dictionary mapping community IDs (1-indexed) to node IDs (1-indexed).
Type: dict
-
commdata
¶ Filename, located in directory specified by path, of a single-column file where each line contains the community ID (0-indexed) of the node specified by the line number in the file.
Type: str
Note
Either communities or commdata must be specified.
-
construct_coarse_rate_matrix_LEA
()[source]¶ Calculate the coarse-grained rate matrix obtained using the local equilibrium approximation (LEA).
-
construct_coarse_rate_matrix_Hummer_Szabo
()[source]¶ Calculate the optimal coarse-grained rate matrix using the Hummer-Szabo relation, aka Eqn. (12) in Hummer & Szabo J. Phys. Chem. B. (2015).
-
construct_coarse_rate_matrix_KKRA
(mfpt=None, GT=False, **kwargs)[source]¶ Calculate optimal coarse-grained rate matrix using Eqn. (79) of Kells et al. J. Chem. Phys. (2020), aka the KKRA expression in Eqn. (10) of [Kannan20a].
Parameters: - mfpt ((nnodes, nnodes)) – Matrix of inter-microstate MFPTs between all pairs of nodes. Defaults to None.
- GT (bool) – If True, matrix of inter-microstate MFPTs is computed with GT. Kwargs can then be specified for GT (such as the pool_size for parallelization). Defaults to False.
-
get_intermicrostate_mfpts_linear_solve
()[source]¶ Calculate the matrix of inter-microstate MFPTs between all pairs of nodes by solving a system of linear equations given by Eq.(8) of [Kannan20a].
-
get_intermicrostate_mfpts_fundamental_matrix
()[source]¶ Calculate the matrix of inter-microstate MFPTs between all pairs of nodes using Eq. (6) of [Kannan20a].
-
get_intercommunity_MFPTs_linear_solve
()[source]¶ Calculate the true MFPTs between communities by inverting the non-absorbing rate matrix. Equivalent to Eqn. (14) in Swinbourne & Wales JCTC (2020).
-
get_intercommunity_weighted_MFPTs
(mfpt, diagzero=True)[source]¶ Comppute the matrix \(\widetilde{\mathcal{T}}_{\rm C}\) of appropriately weighted inter-community MFPTs, as defined in Eq. (18) in [Kannan20a].
Parameters: - mfpt (array-like (N,N)) – matrix of intermicrostate MFPTs.
- diagzero (bool) – Whether to define the inter-community weighted MFPTs such as the diagonal elements are zero. Defaults to True.
-
get_timescale_error
(m, K, R)[source]¶ Calculate the ith timescale error for i in {1,2,…m} of a coarse-grained rate matrix R compared to the full matrix K.
Parameters: - m (int) – Number of dominant eigenvalues (m < N)
- K (np.ndarray[float] (V, V)) – Rate matrix for full network
- R (np.ndarray[float] (N, N)) – Coarse-grained rate matrix
Returns: timescale_errors – Errors for m-1 slowest timescales
Return type: np.ndarray[float] (m-1,)
-
get_eigenfunction_error
(m, K, R)[source]¶ Calculate the \(i^{\rm th}\) eigenvector approximation error for \(i \in {1, 2, ... m}\) of a coarse-grained rate matrix R by comparing its eigenvector to the correspcorresponding eigenvector of the full matrix.
-
get_comm_stat_probs
(pi, log=False)[source]¶ Calculate the community stationary probabilities by summing over the stationary probabilities of the nodes in each community.
Parameters: pi (list (nnodes,)) – stationary probabilities of node in original Markov chain Returns: commpi – stationary probabilities of communities in coarse coarse_network Return type: list (ncomms,)
-
read_communities
(commdat)[source]¶ Read in a single column file called communities.dat where each line is the community ID (zero-indexed) of the minima given by the line number.
Parameters: commdat (dat file) – single-column file containing community IDs of each minimum Returns: communities – mapping from community ID (1-indexed) to minima ID (1-indexed) Return type: dict
-