A. Implementation details - Structure Formation of Biomolecules studied with Advanced Molecula

In this section we briefly outline the important data structures and the hierarchy among them to provide insight into the design of the hdWE program. (The full source code of hdWE was released under the license terms of the GPLv and can be obtained from https://github.com/enzyx/hdWE.) At the lowest hierarchical level, the WE algorithm organizes a number of trajectories which are evenly distributed along the binning coordinate. Trajectories are represented in hdWE as instances of the class Segment(). For a better understanding, we look at a pseudo-code representation of this class, reduced to its principal features.

class Segment(object):
    def __init__(self, probability, parent_iteration_id, parent_bin_id,
                 parent_segment_id, iteration_id, bin_id, segment_id):
        self.probability = probability                  # float
        self.parent_iteration_id = parent_iteration_id  # int
        self.parent_bin_id = parent_bin_id              # int
        self.parent_segment_id = parent_segment_id      # int
        self.bin_id = bin_id                            # int
        self.segment_id = segment_id                    # int
        self.iteration_id = iteration_id                # int
        self.coordinates = None                         # list of floats

A Segment() object is located and identifiable in time and bin space by three ids defining the iteration (iteration_id), the bin (bin_id) and the position in the bin (segment_id). In addition, the segment stores information about its recent history in the variables prefixed with parent_. The ids define the position of a segment in the bin data structure above it in the hierarchy and are directly linked to a trajectory coordinate file. The statistical weight w_i of the trajectory assigned by the WE algorithm is stored in probability, and the coordinate(s) of the trajectory on the binning coordinate(s) are stored as an array in coordinates; e.g. in the case of a single binning coordinate, the array contains only one element. The segments are organized in the Bin() class, which contains an array segments of segment instances.

class Bin(object):
    def __init__(self, iteration_id, bin_id, target_number_of_segments,
                 coordinate_ids):
        self.iteration_id = iteration_id                            # int
        self.bin_id = bin_id                                        # int
        self.target_number_of_segments = target_number_of_segments  # int
        self.coordinate_ids = coordinate_ids
        self.segments = []
        self.initial_segments = []
        self.resampling_history = []

The Bin() is again identifiable via iteration_id and bin_id, which are redundantly stored in the segment instances for simple access to the identifiers. The bin class carries information about its position along the binning coordinate(s) in the coordinate_ids array and stores the target number of segments. In order to preserve the full history of segments being merged and split during the resampling procedure, a copy of the segments array is stored in initial_segments before resampling. Every step of the resampling process is stored in the resampling_history array. The elements of this array are instances of the classes Split() or Merge(), containing mainly the segment_ids of the merged and split segments.
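The relation between a bin and its segments, and the backup taken before resampling, can be illustrated with a minimal sketch. The reduced Segment and Bin classes below are stand-ins written for this example. The helper addSegment() is hypothetical, and the body of backupInitialSegments() (a deep copy) is an assumption about how such a backup could work, not a statement about the hdWE implementation.

```python
import copy


class Segment(object):
    """Reduced stand-in for the Segment class (parent ids omitted)."""

    def __init__(self, probability, iteration_id, bin_id, segment_id):
        self.probability = probability
        self.iteration_id = iteration_id
        self.bin_id = bin_id
        self.segment_id = segment_id


class Bin(object):
    """Reduced stand-in for the Bin class (coordinate_ids omitted)."""

    def __init__(self, iteration_id, bin_id, target_number_of_segments):
        self.iteration_id = iteration_id
        self.bin_id = bin_id
        self.target_number_of_segments = target_number_of_segments
        self.segments = []
        self.initial_segments = []

    def addSegment(self, probability):
        # Hypothetical helper: the segment_id is the position in the bin,
        # and the bin's own ids are stored redundantly in the segment.
        segment = Segment(probability, self.iteration_id, self.bin_id,
                          len(self.segments))
        self.segments.append(segment)
        return segment

    def backupInitialSegments(self):
        # Assumed behaviour: a deep copy, so that later changes to
        # self.segments leave the backup untouched.
        self.initial_segments = copy.deepcopy(self.segments)


this_bin = Bin(iteration_id=0, bin_id=2, target_number_of_segments=4)
for weight in (0.1, 0.3):
    this_bin.addSegment(weight)

this_bin.backupInitialSegments()
this_bin.segments[0].probability = 0.05  # mimic a change made by resampling
```

After the last line, this_bin.initial_segments still holds the weights as they were before the modification, which is the property the backup is meant to guarantee.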



class Merge(object):
    """
    surviving_segment: The index of the surviving segment
    deleted_segments: List of segment_ids which are
                      deleted probability
    """
    def __init__(self, surviving_segment_id, deleted_segments_ids):
        self.surviving_segment_id = surviving_segment_id
        self.deleted_segments_ids = deleted_segments_ids

    def getType(self):
        return type(self).__name__

class Split(object):
    """
    parent_segment: segment_id of the split segment
    m: Number of segments which result from splitting
    """
    def __init__(self, parent_segment_id, m):
        self.parent_segment_id = parent_segment_id
        self.m = m

    def getType(self):
        return type(self).__name__
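How the initial weights together with a list of Split() and Merge() events describe the full resampling record can be sketched by replaying such a list on plain weights. The function replay_history() is hypothetical, and its conventions are assumptions made for illustration: a split divides the weight evenly over m segments and appends the new ones at the end, and a merge adds the deleted weights to the survivor and leaves the deleted entries at zero. The actual hdWE resampler may follow different conventions.

```python
class Merge(object):
    def __init__(self, surviving_segment_id, deleted_segments_ids):
        self.surviving_segment_id = surviving_segment_id
        self.deleted_segments_ids = deleted_segments_ids

    def getType(self):
        return type(self).__name__


class Split(object):
    def __init__(self, parent_segment_id, m):
        self.parent_segment_id = parent_segment_id
        self.m = m

    def getType(self):
        return type(self).__name__


def replay_history(initial_weights, resampling_history):
    """Apply the recorded events in order and return the final weights."""
    weights = list(initial_weights)
    for event in resampling_history:
        if event.getType() == 'Split':
            # Assumed convention: equal shares, new segments appended.
            share = weights[event.parent_segment_id] / event.m
            weights[event.parent_segment_id] = share
            weights.extend([share] * (event.m - 1))
        elif event.getType() == 'Merge':
            # Assumed convention: survivor collects the deleted weights.
            for deleted_id in event.deleted_segments_ids:
                weights[event.surviving_segment_id] += weights[deleted_id]
                weights[deleted_id] = 0.0
    return weights


history = [Split(parent_segment_id=0, m=2),
           Merge(surviving_segment_id=1, deleted_segments_ids=[2])]
final_weights = replay_history([0.5, 0.25, 0.25], history)
# final_weights is [0.25, 0.5, 0.0, 0.25]; the total weight is unchanged
```

Both event types only move weight between segments, so the sum of the weights in a bin is the same before and after the replay.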

The WE algorithm is round-based, and the underlying data structure representing one WE round is the class Iteration(). The iteration class stores information about the binning coordinates (boundaries), which are referenced by the bin instances in bin.coordinate_ids. The iteration counter is stored in iteration_id and the list of bin instances in the bins array.

class Iteration(object):
    def __init__(self, iteration_id, boundaries, n_starting_structures):
        self.iteration_id = iteration_id  # int
        self.boundaries = boundaries      # array
        self.bins = []
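One way to picture how boundaries relate to bin.coordinate_ids is to map the coordinates of a segment onto interval indices, one per binning coordinate. The helper coordinate_ids_for() and the layout of boundaries used here (one sorted list of inner bin edges per binning coordinate) are assumptions made for this sketch; the layout used in hdWE may differ, and the numbers are made up.

```python
from bisect import bisect_right


def coordinate_ids_for(coordinates, boundaries):
    """Return, for each binning coordinate, the index of the interval
    (delimited by the sorted edges in boundaries) containing the value."""
    return [bisect_right(edges, value)
            for value, edges in zip(coordinates, boundaries)]


# Two binning coordinates with different bin edges
boundaries = [[2.0, 4.0, 6.0], [0.5]]

ids_a = coordinate_ids_for([3.1, 0.2], boundaries)  # [1, 0]
ids_b = coordinate_ids_for([7.5, 0.9], boundaries)  # [3, 1]
```

With n edges along a coordinate there are n + 1 possible interval indices, so a bin is identified by one index per binning coordinate.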

Having introduced the data structure hierarchy, we can now analyze the main program loop in hdWE. For didactic reasons, we skip the setup routines and jump directly into a pseudo-code representation. Compared to the actual hdWE implementation, the code has been slightly simplified in terms of call parameters and additional function calls, to keep the level of detail at a minimum and focus on the essentials of the WE routine.

for iteration_counter in range(MAX_ITERATIONS):
    # 1. Initialize iteration and sort segments into bins
    iterations.append(Iteration(iteration_counter,
                                iterations[-1]))
    resorting.copyBinStructureToLastIteration(iterations)
    resorting.resort(iterations)

    # 2. Backup the segments lists of all bins
    for this_bin in iterations[-1]:
        this_bin.backupInitialSegments()

    # 3. Resampling (split/merge trajectories)
    resampler.resample(iterations[-1])

    # 4. Reweighting
    reweighter.doProbabilityReweighting(iterations)

    # 5. Run MDs
    md_module.runMDs(iterations[-1])

    # 6. Calculate Segment Coordinates
    md_module.calcCoordinates(iterations[-1])

The loop runs over a predefined number of WE iterations, MAX_ITERATIONS, and the instances of the Iteration() class are stored in the iterations array. First, a new instance of Iteration() is created and appended to the iterations list. The resorting instance provides functions to copy the bins and segments of the previous iteration to the new iteration object. Because segments have been propagated with MD at the end of the previous iteration, the trajectories (segments) have new coordinates in the bin space and need to be assigned to the bins with the resorting.resort() routine. If a trajectory accessed a previously empty bin region during the last iteration, a new bin instance is generated on the fly. In a second step, the segments list in each bin is backed up to the initial_segments list before the resampling routine merges and splits segments. The reweighting routine can (optionally) be applied, readjusting the probability weights of the segments. Internally, the reweighting routine calculates the rate matrix averaged over a given number of iterations. Then the algorithm enters the propagation step, which is handled by the md_module, the interface to the external Molecular Dynamics program. The MD_module() class functions as a wrapper to different MD programs on a plugin basis. The specific call syntax of the individual MD programs is hidden behind general function calls (e.g. runMDs(), calcCoordinates()), which have to be implemented when aiming to support an alternative MD software package in hdWE. The final step in the main loop is the calculation of the new bin coordinates (md_module.calcCoordinates()) for the propagated trajectories, to allow resorting them into bins in the subsequent iteration.
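The plugin idea can be sketched with an abstract base class. Only the method names runMDs() and calcCoordinates() are taken from the text above. The base class MDWrapper, the toy backend RandomWalkMD, and the minimal Segment stand-in are hypothetical, and a plain list of segments stands in for an Iteration() instance. The sketch only shows how a backend-specific call syntax could be hidden behind the two general calls.

```python
import random
from abc import ABC, abstractmethod


class Segment(object):
    """Minimal stand-in holding only what this sketch needs."""

    def __init__(self, position):
        self.position = position  # stands in for a trajectory file
        self.coordinates = None


class MDWrapper(ABC):
    """Hypothetical common interface every MD backend has to implement."""

    @abstractmethod
    def runMDs(self, iteration):
        """Propagate all segments of the iteration."""

    @abstractmethod
    def calcCoordinates(self, iteration):
        """Store the binning coordinate(s) of every segment."""


class RandomWalkMD(MDWrapper):
    """Toy backend: a 1D random walk instead of an external MD program."""

    def __init__(self, step=0.1, seed=0):
        self.step = step
        self.rng = random.Random(seed)

    def runMDs(self, iteration):
        for segment in iteration:
            segment.position += self.rng.uniform(-self.step, self.step)

    def calcCoordinates(self, iteration):
        # A single binning coordinate, so the array has one element.
        for segment in iteration:
            segment.coordinates = [segment.position]


iteration = [Segment(0.0), Segment(1.0)]  # plain list instead of Iteration()
md_module = RandomWalkMD(step=0.1, seed=42)
md_module.runMDs(iteration)
md_module.calcCoordinates(iteration)
```

Because the main loop only calls the two general methods, swapping RandomWalkMD for a wrapper around another MD package would leave the loop itself unchanged.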
