4 lasertrapr: a computational tool for automating the analysis of laser trap data

4.1 Introduction

The laser trap (or optical tweezers) has been revolutionary to the field of single molecule biophysics. Originally developed by Arthur Ashkin of Bell Laboratories (Ashkin 1986) the laser trap was eventually adopted by biologists to study the interactions of single molecular motors (e.g. myosin, kinesin, dynein) with their molecular tracks (e.g. actin, microtubules) by use of a three-bead assay (Finer, Simmons, and Spudich 1994; Kojima et al. 1997). These experiments permit researchers the ability to observe the interaction of two proteins within a millisecond-time and nanometer-spatial resolution providing unprecedented insight into the molecular machinery underlying a wide variety of biological functions including muscle contraction, intracellular cargo transport, and cell-division. Such experiments need to be performed with a low trap stiffness (0.02-0.04 pN/nm) as to not hinder the function or harm the integrity of the experimental proteins or setup. Since the position of a trapped bead is largely dominated by Brownian forces, a bead stuck in a trap with low stiffness has a large variance in its displacement signal as trap stiffness (\(\alpha\)trap) is inversely proportional to the variance (\(\sigma\)2) of the displacement signal (via the Equipartition Theorem) Svoboda and Block (1994).

\[ \alpha_{trap} = \frac{ k_B T_k } { \sigma^2 } \]

where kB is the Boltzmann constant and Tk the temperature in Kelvin. The high variance of the baseline displacement signal combined with the dampening effects of viscous drag forces masks the underlying mechanics of the two proteins interacting and cycling through a mechanochemical scheme that is common amongst biological motors used in these assays which makes the analytical task of identifying these events of interests quite challenging.

The variance of the displacement signal is a crucially important feature of single molecule laser trap data as the variance can be exploited to determine when protein interactions do occur. Since the biological motors used in these experiments are stiffer than the trapping laser, the interaction of the proteins can be characterized by a decrease in signal variance of the time series (position over time) signal, via Eq 1, which is also often accompanied by a displacement from the mean baseline position as in the case of a biological motor, like myosin, attaching to an actin filament and performing a powerstroke. In some cases, the signal-to-noise ratio of the baseline and event populations variance can exceed 2:1 which makes these interaction events readily discernible “by-eye”. However, while simple and easy, the analysis of data “by-eye” has been criticized in the past as this method was suggested to introduce subjectivity via user bias as evidenced by early inaccurate estimations of myosin’s displacement size (Finer, Simmons, and Spudich 1994; J. E. Molloy et al. 1995b). This exemplifies the fact that while the laser trap is a powerful and advanced scientific instrument, the reliability and accuracy of the information that can be extracted from the resulting data is limited by the validity of the techniques and programs used to analyze the data. While there are numerous techniques that can be used to identify binding events, a common theme between them is that most require advanced computer programming knowledge to implement. This is then compounded with a need to then automate those scripts by a preferential creation of user-friendly graphical user interfaces (GUIs). For most, the advanced computer skills required to build sophisticated analysis programs with GUIs are taught in classes that are not degree requirements for graduate students or researchers seeking degrees in many biology-related fields. This presents a technological barrier that hinders progress in understanding and interpretation of data for new students and researchers, and an additional monetary cost barrier is added to a laboratory if the creation of custom program must be outsourced. And, even in these situations then a research group is left with an custom and un-supported “black-box” program.

Unfortunately, there are currently no completely open-source projects whose primary aim is to automate the workflow of analyzing laser trap data (calibrations, processing, event identification, ensemble average, and summarizing statistics) written with an open-sourced programming language. Although it should be noted there have been recent publications of programs aimed at single molecule event identification, most notably the MATLAB based SPASM (Software for Precise Analysis of Single Molecules, Blackwell et al. (2021)). However, while SPASM itself is an open-source program, the underlying MATLAB language is proprietary/closed source language and has a steep financial barrier (currently a standard MATLAB license has an annual fee of $860 per their website at the time of writing). Here, we present {lasertrapr}, and open-source program for automating the analysis of laser trap data written in R, a free and open-source programming language (hence lasertrapr = laser trap + R; also note that it is common in the R-community to denote R-packages with {}). The tool has an easy-to-use GUI provided by the R-Shiny web-framework package. One of the main benefits of having a tool built with R/Shiny (R Core Team 2022; Chang et al. 2021) is that there is high portability of the app across different operating systems as it can be installed on Windows, MacOS, and Linux systems. Additionally, we do not view our application as a replacement or competitor to a program such as SPASM, but as an additional tool made available to the biophysics community that has some similar features (single molecule event identification and ensemble averaging) but for distinct data types (SPASM’s main event identification is built for 2 QPD systems using co-variance of bead position whereas our is for a 1 QPD system). Furthermore, {lasertrapr} fully embodies the notion of a free and open-source project whose main goal is automation and reproducibility of the entire workflow of analyzing laser trap data which includes folder/file creation and organization, signal calibrations, data cleaning and preparation, event identification of both single molecule and mini-ensemble data, ensemble averaging, generation of complete project summary statistics, and creation of publication quality figures. Lastly, the co-existence of multiple programs will only benefit the biophysics community by enabling researchers the ability to contribute to and use an analysis program best suited for their interests and experimental setups.

4.2 Results & Discussion

The following Results & Discussion serves as both a validation of the app and provides example use cases of what can be accomplished within the app. The paper presented in Chapter 5 was analyzed completely with this app so the present chapter’s aim is to provide evidence that the app provides a reproducible, precise, and accurate analysis tool. While reading this section if you decide that you really like the app and would like to try it yourself, there is a user-guide and complete documentation on how to install and use the app available on the app’s website https://lasertrapr.app/ including example videos. The documentation is also included in section 4.3. Additionally, the project can be found on Github at https://github.com/brentscott93/lasertrapr.

4.2.1 Single Molecule Analysis Validation: Simple

One of the main features of the app pertains to the analysis of single molecule laser trap data. Here, we will validate the single molecule analyzer that is based on a combination Hidden-Markov Model + Changepoint analysis. The full details of the analysis are provided in section 4.2.7 which walks through the actual single molecule analysis line-by-line. These tests will verify that the app can identify actomyosin binding events using simulated data which provide ideal and known conditions. In thie first simulation, every event has a 5nm displacement and 100ms attachment time. The single baseline-event-baseline data set is shown in Figure 4.1.

A simulated laser trap event. The event has 200ms of baseline preceeding a 5nm displacement that lasts for 100ms. Another 200ms of baseline data is appended after the displacement. This exact sequence was replicated and concatenated together 200 times which yields a data trace with 200 simulated binding events that are spaced 400ms a part. The exact simulated measurements are: 5nm displacement, 100ms attachment times, and 400ms time between events.

The 200 event simulation was analyzed to test the number of events the analysis could detect out of the original 200 created. One of the features of the app is that after the analysis it takes the original raw/simulated trapping data and “overlays” the analysis results on-top of the original data so the user can compare/check how the analyzer is performing. A screenshot of the results of the analysis provided by the app is shown in Figure 4.2. If you were using the app the figure displayed in Figure 4.2 would be an interactive graph that would allow you as a user interactive abilities to pan across the data and the analysis. This feature allows the user to become more familiar with their data and the app’s output.

Screenshot of the output from the single molecule analysis produced by lasertrapr. Black lines is the original trap trace, the green line signifies the results of the hidden markov model performed on the running window transormation, and the yellow highlights is the final results after changepoint is applied.

The output of app’s estimation of the displacement and attachment time for the 200 event simulation is shown in Table 4.1. The results show that the analysis correctly identifies and measures single molecule binding events. The app estimated a 5.1nm displacement and 99.8ms attachment time which accurately represents the known simulated values of 5nm and 100ms. Indeed, the 400ms time between events is also the same as the simulation input.

Table 4.1: App correctly identified the dispalcement size, attachment times. and time betweeen events
displacement_avg time_on_avg time_off_avg
5.109922 99.8 400.0251

4.2.2 Single Molecule Analysis Validation: Adding Complexity Simulating displacement distributions

Adding complexity to the simulations creates more realistic datasets that allows for more robust testing and assurance in the precision and accuracy of the program. In the previous data example, every event had the same exact displacement and attachment time. Here we simulate displacement distributions which more accurately represents how real single molecule data would be collected. Since each displacement in the trap is the summation of the displacement caused by myosin’s powerstroke and that of random brownian noise, displacement data in the single molecule laser trap have been shown to be normally distributed with a mean displacement equivalent to myosin’s powerstroke size and standard deviation that is dependent on the trap stiffness (J. E. Molloy et al. (1995b)).

Four datasets were simulated whose event population had displacements generated from four distinct Gaussian distributions whose mean values were 0, 5, 10, and 15nm, respectively. Being able to detect 0nm displacement events was an important test to conduct considering the possibility for the “slow mouse” S217A mutation to actually have a small or even no displacement as described in Chapter 1 within Aim 2. The data was analyzed with the single molecule analyzer within the app and summarized in Figure 4.3. The app’s estimated average displacement was modeled against the actual average displacement of the Gaussian distributions that generated the data and fit with a linear regression. The coefficient of determination (R2) is 0.99 indicating that the analysis is able to accurately predict average displacements across a wide range of distances. Additionally, out of a total of 4000 simulated events the app identified 3997 which is a 99.925% detection rate for these simulations.

Four simulations were performed with average dispalcements of 0, 5, 10, and 15nm. a) Simulated data traces. Top trace has 0nm average displacement. Bottom has a 15nm average displacement. b) Linear regression of the actual displacement (x-axis) versus the analysis estimated displacement size (y-axis) demonstrating a near perfect correlation. c) Full gaussian distributions of the analyzer's estimated displacement for each of the 4 simulations.

Figure 4.1: Four simulations were performed with average dispalcements of 0, 5, 10, and 15nm. a) Simulated data traces. Top trace has 0nm average displacement. Bottom has a 15nm average displacement. b) Linear regression of the actual displacement (x-axis) versus the analysis estimated displacement size (y-axis) demonstrating a near perfect correlation. c) Full gaussian distributions of the analyzer’s estimated displacement for each of the 4 simulations. Simulating attachment time distributions

In similar fashion to the displacement distributions, the attachment lifetimes for each myosin binding event is not the exact same for each interaction. The total event population for myosin’s attachment time is exponentially distributed, so we simulated additional data where we defined exponential distributions to generate total attachment times from for each event. Truncated exponential distributions were used in order to more easily generate data that would be confined to the needs of the simulations. For example, in the case of an exponential distribution and as the the PDF suggests, the smallest numbers are the most probable to generate upon random sampling. However, when simulating data we are confined to defining time in terms of datapoints per the sampling frequency. To avoid randomly generating infinitely small ADP-release/ATP-binding rates, which define the total attachment lifetimes, the exponentials are truncated at a minimum of 1 or more milliseconds. Additionally, a truncation of minimum attachment-times also increases the assurance that if we generate 100 events the simulator will actually simulate 100 observable and detectable events. For these sets of simulations, the rates supplied to the exponential distributions represent the average attachment lifetime of the population since for a given exponential distribution the arithmetic mean (expected value) is equivalent to the reciprocal of the decay rate \[ E[x] = 1/\lambda \]. Drawing a random value from each distribution yields the length of time that the attachment time should be to for a given event. Each distribution shown below reflect 10,0000 random draws from a respective distribution whose rate are 20, 10, 5, 1 with lower bounds of 0.02, 0.02, 0.02, 0.02 and upper bounds of 0.3, 1.856, 1.856, 3. As shown in Figure 4.4, decreasing the rate increases increases the time values that can be generated which results in longer attachment times. Furthermore, panel C represents exponential distribution with the same rates used except panel C is not-truncated. The distributions in panel C further display the use-case for truncating the data to make more economical (i.e. smaller file sizes) as it is an attempt to to avoid excruciatingly long attachment/detachment times when the rate constant is 1. For these simulations, the ATP binding rate was set to take 1ms (5 data points) on average, so its attachment time contributions can largely be ignored which made the comparisons easier. This would be analogous to conducting experiments at a very high ATP concentration (>1mM). Essentially as soon as ADP is released a new ATP is readily nearby and instantaneously can bind to myosin’s active site causing a detachment.

Simulated truncated (a & b) exponential distributions that are representative to the ones used to generate the simualted data

Figure 4.2: Simulated truncated (a & b) exponential distributions that are representative to the ones used to generate the simualted data

The data was analyzed with the single molecule analyzer where the attachment lifetimes were estimated by the app and then compared to the true values set in the simulations. Since truncated exponential distributions were used, the average attachment time is not as simple as taking the reciprocal of the decay rate. To get an estimate of the possible true rate/attachment times generated from the truncated distributions, 10 rounds of 10,000 random draws from the truncated distributions were generated. For each 1 set that contained 10,000 random draws the average was recorded (0.068759, 0.0693254, 0.0696117, 0.068123, 0.0689889, 0.0690398, 0.0698608, 0.0693016, 0.0689494, 0.0684859) and then those averages were averaged together to be compared against the mean attachment time as estimated from the app. The estimated attachment lifetimes were modeled against the known/true values, and fit with a linear regression. The coefficient of determination (R2) is 1 indicating that the analysis can accurately estimate attachment lifetimes for a wide variety of time periods indicating that the app/analysis should be able to be applied to a wide variety of myosin/molecular motors with differing ADP release rates and even under experiments at higher ATP conditions which decreases the attachment times.

Four simulations were performed with average attachment lifetimes of 50

Figure 4.3: Four simulations were performed with average attachment lifetimes of 50 Short Events

To further test the single molecule analyzer and test the limitations of reliable event detection a dataset was simulated with very short attachment times (~20ms average). The window width of the HM-Model was decreased to 100 datapoints making the event more readily detectable. Figure 4.6 shows the identified event in this short event simulation The data was simulated, analyzed and even Figure 4.5 was generated from within the app with the built-in “screenshot” tool which makes saving snippets of analyzed traces easy and provides color coding of identified events free of charge.