PyGT.io

This module reads in the following input files:

Files defining sets of nodes

min.A: single-column[int], (N_A + 1, )

The first line contains the number of nodes in community A. Subsequent lines contain the node IDs (1-indexed) of the nodes belonging to A. Example:

10 # number of nodes in A
1
2
...

min.B: single-column[int], (N_B + 1, )

The first line contains the number of nodes in community B. Subsequent lines contain the node IDs (1-indexed) of the nodes belonging to B.
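The min.A / min.B layout can be parsed in a few lines. The following is a minimal sketch, not the PyGT.io implementation: the helper name `parse_node_set` is hypothetical, and it assumes that `#` starts an inline comment, as in the example above.

```python
import io

def parse_node_set(handle):
    """Parse a min.A / min.B style file into a list of 1-indexed node IDs."""
    values = []
    for line in handle:
        # Drop inline comments (assumed to start with '#') and skip blank lines.
        text = line.split("#", 1)[0].strip()
        if text:
            values.append(int(text))
    count, node_ids = values[0], values[1:]
    if count != len(node_ids):
        raise ValueError(f"header says {count} nodes, found {len(node_ids)}")
    return node_ids

# Made-up file contents in the documented layout.
example = io.StringIO("3 # number of nodes in A\n1\n2\n7\n")
a_ids = parse_node_set(example)
a_indices = [i - 1 for i in a_ids]  # 0-indexed positions for array lookups
```

Checking the header count against the number of IDs read catches truncated files early.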

communities.dat : single-column[int], (nnodes,)

Each line contains the community ID (0-indexed) of the node whose ID equals that line's number in the file.

Example:

0
2
1
0
...

Files describing minima and saddle points of the energy landscape

The following files describe a Markov chain in which nodes correspond to potential or free energy minima, and edges correspond to the transition states that connect them. The same files are used as input to the Fortran program PATHSAMPLE:

min.data: multi-column, (nnodes, 6)

Line numbers indicate node IDs. Each line contains the energy of the local minimum [float], the log product of positive Hessian eigenvalues \(\sum_i\log|m\omega_i^2|\) [float], the isometry [int], and the sorted eigenvalues of the inertia tensor itx [float], ity [float], itz [float].

Example (LJ13 dataset):

-44.3268014195  158.2464487383  120     9.3629926605    9.3629926606    9.3629926607 # node 1
-41.4719798478  153.8092000860  2       8.7543536583    10.6788841871   11.4404999401 # node 2
...

ts.data: multi-column, (nts, 8)

Each line contains the energy of the transition state [float], the log product of positive Hessian eigenvalues \(\sum_i\log|m\omega_i^2|\) [float], the isometry [int], the ID of the first minimum it connects [int], the ID of the second minimum it connects [int], and the sorted eigenvalues of the inertia tensor itx [float], ity [float], itz [float].

Example (LJ13 dataset):

-40.4326640775  148.9095497699  1       2       1       9.0473340846    10.4342879996   10.9389332953
-40.9062828304  150.4182291647  2       3       1       8.6169912461    10.3990395875   11.4674850889
...

The PyGT.io module then calculates transition rates using unimolecular rate theory.
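To make the rate calculation concrete, here is a minimal sketch of a harmonic transition state theory rate built from the min.data and ts.data columns. It rests on several assumptions not stated above: that the integer column is a point-group order, that the log products refer to angular frequencies (hence the factor of 2π), and that the rate has the standard harmonic form. The function name is hypothetical, and the prefactor conventions used inside PyGT.io may differ.

```python
import math

def harmonic_tst_log_rate(e_min, lnprod_min, order_min,
                          e_ts, lnprod_ts, order_ts, beta=1.0):
    """ln k for leaving a minimum through a transition state (harmonic TST form)."""
    # Symmetry factor; 2*pi converts angular frequencies to frequencies (assumed).
    symmetry = math.log(order_min / (2.0 * math.pi * order_ts))
    # Ratio of vibrational partition functions from the log products.
    vibrational = 0.5 * (lnprod_min - lnprod_ts)
    # Boltzmann factor for the energy barrier.
    barrier = beta * (e_ts - e_min)
    return symmetry + vibrational - barrier

# First LJ13 transition state from the ts.data example; it connects minima 2 and 1.
ts = dict(e_ts=-40.4326640775, lnprod_ts=148.9095497699, order_ts=1)
min1 = dict(e_min=-44.3268014195, lnprod_min=158.2464487383, order_min=120)
min2 = dict(e_min=-41.4719798478, lnprod_min=153.8092000860, order_min=2)

ln_k_from_1 = harmonic_tst_log_rate(**min1, **ts)  # rate for 2 <- 1
ln_k_from_2 = harmonic_tst_log_rate(**min2, **ts)  # rate for 1 <- 2
```

Working with ln k rather than k avoids overflow and underflow when barriers are large compared with \(1/\beta\).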

Files defining a continuous-time Markov chain

The following files describe the transition rates between nodes and their stationary probabilities in an arbitrary continuous-time Markov chain.

ts_conns.dat : double-column[int], (nedges, 2)

Edge table where each row contains the IDs (1-indexed) of the nodes connected by each edge.

Example:

1   238 #edge connecting node 1 and node 238
2   307
...

ts_weights.dat : single-column[float], (2*nedges,)

Each consecutive pair of lines gives the edge weights \(\ln(k_{i\leftarrow j})\) and \(\ln(k_{j\leftarrow i})\) for the bidirectional edge \(i \leftrightarrow j\), in the same order as the rows of ts_conns.dat.

Example:

-5.6600770665 #ln[k(1<-238)]
-9.3074770665 #ln[k(238<-1)]
-5.8668770665 #ln[k(1<-307)]
-7.3402770665 #ln[k(307<-1)]
...

stat_prob.dat : single-column[float], (nnodes,)

Log stationary probabilities of nodes.

PyGT.io.load_ktn(path, beta=1.0, Nmax=None, Emax=None, screen=False, discon=False)

Load in min.data and ts.data files, calculate rates, and find connected components.

Parameters:
  • path (str) – path to min.data and ts.data files
  • beta (float, optional) – value for \(1 / (k_B T)\). Default = 1.0
  • Nmax (int, optional) – maximum number of minima to include in KTN. Default = None (i.e. infinity)
  • Emax (float, optional) – maximum potential energy of minima/TS to include in KTN. Default = None (i.e. infinity)
  • screen (bool, optional) – whether to print progress. Default = False
  • discon (bool, optional) – whether to output data for disconnectivity graph construction (undocumented). Default = False
Returns:

  • B ((N,N) matrix) – sparse matrix of branching probabilities
  • K ((N,N) csr matrix) – sparse matrix where off-diagonal elements \(K_{ij}\) contain \(i \leftarrow j\) transition rates. Diagonal elements are 0.
  • tau ((N,) array_like) – vector of waiting times such that the total escape rate from state \(i\) is \(1/\tau_i\). The full rate matrix is then given by \(K_{ij}-\delta_{ij}/\tau_i\)
  • N (int) – number of nodes in the largest connected component of the Markov chain
  • u ((N,) array_like) – energies of the N nodes in the Markov chain
  • s ((N,) array_like) – entropies of the N nodes in the Markov chain
  • Emin (float) – energy of global minimum (energies in u are rescaled so that Emin=0)
  • retained (np.ndarray[bool] (nnodes,)) – Boolean array selecting out largest connected component (retained.sum() = N).
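The relationship between the returned B, K, and tau can be illustrated on a toy system. This is a sketch with made-up rates in plain Python lists rather than the sparse matrices the function returns, and it assumes that the branching probabilities are defined as \(B_{ij} = K_{ij}\tau_j\).

```python
# Toy 3-state off-diagonal rate matrix; K[i][j] is the i <- j rate (made-up values).
K = [
    [0.0, 2.0, 0.5],
    [1.0, 0.0, 1.5],
    [3.0, 2.0, 0.0],
]
n = len(K)

# The total escape rate from state j is the sum of column j; tau_j is its inverse.
escape = [sum(K[i][j] for i in range(n)) for j in range(n)]
tau = [1.0 / rate for rate in escape]

# Full rate matrix: K_ij - delta_ij / tau_i.
Q = [[K[i][j] - (1.0 / tau[i] if i == j else 0.0) for j in range(n)]
     for i in range(n)]

# Branching probabilities, assuming B_ij = K_ij * tau_j
# (probability that a jump out of j lands on i).
B = [[K[i][j] * tau[j] for j in range(n)] for i in range(n)]
```

Under these conventions each column of Q sums to zero and each column of B sums to one.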

PyGT.io.load_ktn_AB(data_path, retained=None)

Read in A_states and B_states from min.A and min.B files, only keeping the states that are part of the largest connected set, as specified by retained.

Parameters:
  • data_path (str) – path to location of min.A, min.B files
  • retained (array-like[bool] (nnodes, )) – selects out indices of the maximum connected set
Returns:

  • A_states (array-like[bool] (retained.size, )) – boolean array that selects out the A states
  • B_states (array-like[bool] (retained.size, )) – boolean array that selects out the B states
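As an illustration of how 1-indexed node IDs and the retained mask might combine into the A_states and B_states arrays, consider the sketch below. The helper `node_mask` is hypothetical, and keeping the masks at full length retained.size follows the shapes documented above rather than the actual implementation.

```python
def node_mask(node_ids, retained):
    """Boolean mask over all nodes: True where a listed node is also retained."""
    mask = [False] * len(retained)
    for node_id in node_ids:
        mask[node_id - 1] = True  # node IDs in min.A / min.B are 1-indexed
    # Discard listed nodes that fall outside the largest connected set.
    return [listed and keep for listed, keep in zip(mask, retained)]

retained = [True, True, False, True, True]  # made-up connectivity for 5 nodes
A_states = node_mask([1, 3], retained)      # node 3 is listed but not retained
B_states = node_mask([5], retained)
```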

PyGT.io.read_communities(file, retained, screen=False)

Read in a single-column file called communities.dat, where each line is the community ID (zero-indexed) of the node given by the line number. Produces boolean arrays, one per community, selecting out the nodes that belong to each community.

Parameters:
  • file (str) – single-column file containing community IDs of each minimum
  • retained ((N,) boolean array) – selects out the largest connected component of the network
Returns:

communities – mapping from community ID (0-indexed) to a boolean array which selects out the states in that community.

Return type:

dictionary
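The mapping described here can be sketched as follows. The helper `group_by_community` is hypothetical, and the sketch assumes the masks span all nodes with non-retained nodes set to False; the actual function may instead return masks restricted to the retained nodes.

```python
import io

def group_by_community(handle, retained):
    """Map community ID -> boolean list over nodes, excluding non-retained nodes."""
    labels = [int(line) for line in handle if line.strip()]
    if len(labels) != len(retained):
        raise ValueError("communities file and retained mask differ in length")
    communities = {}
    for node, (label, keep) in enumerate(zip(labels, retained)):
        # Create the mask for a community the first time its ID is seen.
        mask = communities.setdefault(label, [False] * len(labels))
        mask[node] = keep
    return communities

# Made-up example: four nodes, the third is not in the largest connected component.
example = io.StringIO("0\n2\n1\n0\n")
communities = group_by_community(example, [True, True, False, True])
```

In this sketch a community whose nodes are all outside the retained set still appears in the dictionary, with an all-False mask.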

PyGT.io.read_ktn_info(path, suffix='')

Read input files stat_prob.dat, ts_weights.dat, and ts_conns.dat and return a rate matrix and vector of stationary probabilities.

Parameters:
  • path (str) – path to directory containing stat_prob.dat, ts_weights.dat, and ts_conns.dat files.
  • suffix (str) – suffix for file names, e.g. 'ts_weights{suffix}.dat'. Defaults to ''.
Returns:

  • pi (array-like (nnodes,)) – vector of stationary probabilities.
  • K (array-like (nnodes,nnodes)) – CTMC rate matrix in dense format.
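A minimal sketch of how the contents of the three files could be assembled into pi and K is given below. It uses plain lists instead of arrays, the helper name `build_rate_matrix` is hypothetical, and the diagonal convention (each column of K sums to zero) is an assumption rather than something stated above. The example values are made up.

```python
import math

def build_rate_matrix(conns, log_weights, log_pi):
    """Assemble (pi, K) from edge pairs, log rates, and log stationary probabilities."""
    n = len(log_pi)
    K = [[0.0] * n for _ in range(n)]
    for edge, (i, j) in enumerate(conns):
        # Two consecutive weights per edge: ln k(i <- j), then ln k(j <- i).
        K[i - 1][j - 1] = math.exp(log_weights[2 * edge])
        K[j - 1][i - 1] = math.exp(log_weights[2 * edge + 1])
    # Assumed convention: diagonal = minus the total escape rate of that column.
    for j in range(n):
        K[j][j] = -sum(K[i][j] for i in range(n) if i != j)
    pi = [math.exp(value) for value in log_pi]
    return pi, K

conns = [(1, 2), (2, 3)]                                   # rows of ts_conns.dat
log_weights = [math.log(w) for w in (2.0, 1.0, 4.0, 0.5)]  # lines of ts_weights.dat
log_pi = [math.log(p) for p in (0.64, 0.32, 0.04)]         # lines of stat_prob.dat
pi, K = build_rate_matrix(conns, log_weights, log_pi)
```

With the column-sum-zero convention, the stationary vector satisfies \(\sum_j K_{ij}\pi_j = 0\) for every \(i\), which gives a quick consistency check on the input files.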