Title: First Year of Biophysica
“I can’t believe another year has passed already” is what most of us think when another birthday is upon us or when we see our children grow [...]
Award ID(s):
2112675 2112710
NSF-PAR ID:
10342300
Author(s) / Creator(s):
Date Published:
Journal Name:
Biophysica
Volume:
2
Issue:
2
ISSN:
2673-4125
Page Range / eLocation ID:
89 to 90
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This study investigates whether a legal natural language inference (NLI) model trained on data from one US state can be transferred to another state. We fine-tuned a pre-trained model on the task of evaluating the validity of legal will statements, once with a dataset containing Tennessee wills and once with a dataset containing Idaho wills. Each model’s performance in the in-domain setting and the out-of-domain setting is compared to see whether the models can transfer across states. We found that a model trained on one US state can mostly be transferred to another state. However, the model’s performance clearly drops in the out-of-domain setting. The F1 scores of the Tennessee model and the Idaho model are 96.41 and 92.03 when predicting on data from the same state, but they drop to 66.32 and 81.60 when predicting on data from the other state. Subsequent error analysis revealed two major sources of errors. First, the model fails to recognize equivalent laws across states when there are stylistic differences between them. Second, differences in the statutory section numbering systems between the states make it difficult for the model to locate the laws relevant to the cases being predicted. This analysis provides insights into how future NLI systems can be improved. Our findings also offer empirical support to legal experts who advocate the standardization of legal documents.
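The comparison above boils down to fine-tuning a sequence-pair classifier on one state's wills and scoring it on both states' test sets. Below is a minimal sketch of the evaluation step only, assuming a HuggingFace-style fine-tuned model; the model directory, data files, and JSON field names are hypothetical placeholders rather than the paper's actual artifacts.

import json
import torch
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def load_pairs(path):
    # Each JSON line holds a will statement, its law-and-facts context, and a label id.
    with open(path) as f:
        return [json.loads(line) for line in f]

def evaluate(model_dir, examples):
    tok = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir).eval()
    preds = []
    for ex in examples:
        enc = tok(ex["statement"], ex["context"], truncation=True, return_tensors="pt")
        with torch.no_grad():
            preds.append(int(model(**enc).logits.argmax(dim=-1)))
    return f1_score([ex["label"] for ex in examples], preds, average="weighted")

# Mirror the Tennessee/Idaho comparison: score the same model on both test sets.
tn_model = "tn-wills-nli"   # hypothetical fine-tuned checkpoint
print("in-domain F1   :", evaluate(tn_model, load_pairs("tn_test.jsonl")))
print("cross-state F1 :", evaluate(tn_model, load_pairs("id_test.jsonl")))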
  2. Light microscopy provides a window into another world that is not visible to the unaided eye. Because of this and its importance in biological discoveries, the light microscope is an essential tool for scientific studies. It can also be used with a variety of easily obtained specimens to provide dramatic demonstrations of previously unknown features of common plants and animals. Thus, one way to interest young people in science is to start with an introduction to light microscopy. This is an especially effective strategy for individuals who attend less advantaged or under-resourced schools, as they may not have been previously exposed to scientific concepts in their classes. However, introducing light microscopy lessons in the classroom can be challenging because of the high cost of light microscopes, even those that are relatively basic, in addition to their usually large size. Our laboratory, in collaboration with the Biophysical Society (BPS), is working to introduce young people to light microscopy using small, easy-to-assemble wooden microscopes developed by Echo Laboratories. The microscopes are available online as low-cost kits ($10 each with shipping), each consisting of 19 parts printed onto an 8½ x 11 inch sheet of lightweight wood (Fig. 1). After the pieces are punched out, they can be assembled into a microscope with a movable stage and a low-power lens, also provided in the kit (Fig. 2). Photos taken with a cell phone through the microscope lens can give magnifications of ~16-18x, or higher. At these magnifications, features of specimens that are not visible to the unaided eye can be easily observed, e.g., small hairs on the margins of leaves or lichens [1]. As a member of the BPS Education Committee, one of us (SAE) wrote a Lesson Plan on Light Microscopy specifically for use with the wooden microscopes. SAE was also able to obtain a gift of 500 wooden microscope kits for the BPS from Echo Laboratories and Chroma Technology Corp in 2016. The wooden microscope kits, together with the lesson plan, have provided the materials for our present outreach efforts. Rather than giving out the wooden microscope kits to individuals, the BPS asked the Education Committee to maximize the impact of the gift by distributing the microscopes with the Lesson Plan on Light Microscopy to teachers, e.g., through teachers’ workshops or outreach sessions. This strategy was devised to enable the Society to reach a larger number of young people than by giving the microscopes to individuals. The Education Committee first evaluated the microscopes as a tool to introduce students to scientific concepts by providing microscopes to a BPS member at the National University of Colombia, who conducted a workshop on September 19-24, 2016 in Tumaco, Colombia. During the workshop, which involved 120 high school girls and 80 minority students, including Afro-Colombian and older students, the participants built the wooden microscopes, examined specimens, and compared the microscopes to a conventional light microscope. Assembling the wooden microscopes was found to be a useful procedure that was similar to a scientific protocol, and it encouraged young girls and older students to participate in science. This was especially promising in Colombia, where there are few women in science and little effort to increase the number of women in STEM fields.
Another area of outreach emerged recently when one of us (USP), an undergraduate student at Duke University who was taught by SAE how to assemble the wooden microscopes and use the lesson plan, took three wooden microscopes on a visit to her family in Bangalore, India, in summer 2018 [2]. There she organized and led three sessions in state-run, under-resourced government schools, involving classes of ~25-40 students each. This was very successful – the students enjoyed learning about the microscopes and building them, and the science teachers were interested in expanding the sessions to other government schools. USP taught the teachers how to assemble and use the microscopes and gave them the microscopes and the lesson plan, which is also available to the public on the BPS website. She also met with a founder of Whitefield Rising, an organization working to improve teaching in government schools, and taught her and several volunteers how to assemble the microscopes and conduct the sessions. The Whitefield Rising members have been able to conduct nine further sessions in Bangalore over the past ~18 months (Fig. 3), using microscope kits provided to them by the BPS. USP has continued to work with members of the Whitefield Rising group during her summer and winter breaks on visits to Bangalore. Recently she has been working with another volunteer group that has expanded the outreach efforts to New Delhi. The light microscopy outreach that our laboratory is conducting in India in collaboration with the BPS is having a positive impact because we have been able to develop a partnership with volunteers in Bangalore and New Delhi. The overall goal is to enhance science education globally, especially in less advantaged schools, by providing a low-cost microscope that can be used to introduce students to scientific concepts.
  3. Peering is an interconnection arrangement between two networks for the purpose of exchanging traffic between these networks and their customers. Two networks will agree to settlement-free peering if this arrangement is superior for both parties compared to alternative arrangements including paid peering or transit. The conventional wisdom is that two networks agree to settlement-free peering if they receive an approximately equal value from the arrangement. Historically, settlement-free peering was only common amongst tier-1 networks, and these networks commonly require peering at a minimum specified number of interconnection points and only when the traffic ratio is within specified bounds. However, the academic literature does not explain how these requirements relate to the value to each network. More recently, settlement-free peering and paid peering have become common between ISPs and CDNs. In this paper, we construct a network cost model to understand the rationality of common requirements on the number of interconnection points and traffic ratio. We also wish to understand if it is rational to apply these requirements to interconnection between an ISP and a CDN. We construct a model of ISP traffic-sensitive network costs. We consider an ISP that offers service across the US. We parameterize the model using statistics about the population and locations of people in the contiguous US. We consider peering at the locations of the largest interconnection points in the US. We model traffic-sensitive network costs in the ISP’s backbone network, middle-mile networks, and access networks. These costs are thus functions of routing policies, distances, and traffic volumes. To qualify for settlement-free peering, large ISPs commonly require peering at a minimum of 4 to 8 mutually agreeable interconnection points. The academic literature provides little insight into this requirement or how it is related to cost. We show that the traffic-sensitive network cost decreases as the number of interconnection points increases, but with decreasing returns. The requirement to peer at 4 to 8 interconnection points is thus rational, and requiring interconnection at more than 8 points is of little value. Another common requirement is that the ratio of downstream to upstream traffic not exceed 2:1. This is commonly understood to relate to approximately equal value, but the academic literature does not explain why. We show that when downstream traffic exceeds upstream traffic, an ISP gains less from settlement-free peering, and that when the traffic ratio exceeds 2:1 an ISP is likely to perceive insufficient value. Finally, we turn to interconnection between an ISP and a CDN. Large ISPs often assert that CDNs should meet the same requirements on the number of interconnection points and traffic ratio to qualify for settlement-free peering. We show that if the CDN delivers traffic to the ISP locally, then a requirement to interconnect at a minimum number of interconnection points is rational, but a limit on the traffic ratio is not rational. We also show that if the CDN delivers traffic using hot potato routing, the ISP is unlikely to perceive sufficient value to offer settlement-free peering. 
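The diminishing-returns finding about the number of interconnection points can be illustrated with a toy calculation that is not the paper's cost model: if each user's traffic must be hauled from the nearest peering point, then the volume-weighted haul distance, a rough proxy for traffic-sensitive cost, falls as peering points are added, but each additional point saves less than the last. All locations and scales below are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
users = rng.uniform(0, 3000, size=(100_000, 2))   # stylized user locations (km)
hubs = rng.uniform(0, 3000, size=(16, 2))         # candidate interconnection points

def mean_haul_distance(k):
    # Peer at the first k hubs; traffic is carried from the nearest one to each user.
    d = np.linalg.norm(users[:, None, :] - hubs[None, :k, :], axis=2)
    return d.min(axis=1).mean()

for k in (1, 2, 4, 8, 16):
    print(f"{k:2d} interconnection points -> mean haul distance {mean_haul_distance(k):7.1f} km")

Because the distance to the nearest of k points in two dimensions shrinks roughly as 1/sqrt(k), the savings per additional peering point diminish, which matches the qualitative pattern behind the 4-to-8 interconnection point requirement discussed above.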
  4. Obeid, I. ; Selesnik, I. ; Picone, J. (Ed.)
    The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data [1]. This heterogeneous cluster uses innovative scheduling technology, Slurm [2], that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus [2]. We use TensorFlow [3] as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment – performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce identical results. (3) A job should produce comparable results if the data is presented in a different order. System optimization requires an ability to directly compare error rates for algorithms evaluated under comparable operating conditions. However, it is a difficult task to exactly reproduce the results for large, complex deep learning systems that often require more than a trillion calculations per experiment [5]. This is a fairly well-known issue and one we will explore in this poster. Researchers must be able to replicate results on a specific data set to establish the integrity of an implementation. They can then use that implementation as a baseline for comparison purposes. A lack of reproducibility makes it very difficult to debug algorithms and validate changes to the system. Equally important, since many results in deep learning research are dependent on the order in which the system is exposed to the data, the specific processors used, and even the order in which those processors are accessed, it becomes a challenging problem to compare two algorithms since each system must be individually optimized for a specific data set or processor. This is extremely time-consuming for algorithm research in which a single run often taxes a computing environment to its limits. Well-known techniques such as cross-validation [5,6] can be used to mitigate these effects, but this is also computationally expensive. These issues are further compounded by the fact that most deep learning algorithms are susceptible to the way computational noise propagates through the system. GPUs are particularly notorious for this because, in a clustered environment, it becomes more difficult to control which processors are used at various points in time. Another equally frustrating issue is that upgrades to the deep learning package, such as the transition from TensorFlow v1.9 to v1.13, can also result in large fluctuations in error rates when re-running the same experiment. Since TensorFlow is constantly updating functions to support GPU use, maintaining an historical archive of experimental results that can be used to calibrate algorithm research is quite a challenge. This makes it very difficult to optimize the system or select the best configurations. 
The overall impact of the issues described above is significant, as error rates can fluctuate by as much as 25% due to these computational effects. Cross-validation is one technique used to mitigate this, but it is expensive since multiple runs over the data are required, which further taxes a computing infrastructure already running at maximum capacity. GPUs are preferred when training a large network since these systems train at least two orders of magnitude faster than CPUs [7]. Large-scale experiments are simply not feasible without using GPUs. However, there is a tradeoff to gain this performance. Since all our GPUs use the NVIDIA CUDA® Deep Neural Network library (cuDNN) [8], a GPU-accelerated library of primitives for deep neural networks, an element of randomness is added to the experiment. When a GPU is used to train a network in TensorFlow, it automatically searches for a cuDNN implementation. NVIDIA’s cuDNN implementation provides algorithms that increase performance and help the model train faster, but these algorithms are non-deterministic [9,10]. Since our networks have many complex layers, there is no easy way to avoid this randomness. Instead of comparing each epoch, we compare the average performance of the experiment because it indicates how the model is performing per experiment and whether the changes we make are effective. In this poster, we will discuss a variety of issues related to reproducibility and introduce ways we mitigate these effects. For example, TensorFlow uses a random number generator (RNG) which is not seeded by default. TensorFlow uses the RNG to determine the initialization point and how certain functions execute. The solution for this is to seed all the necessary components before training the model. This forces TensorFlow to use the same initialization point and sets how certain layers work (e.g., dropout layers). However, seeding all the RNGs will not guarantee a controlled experiment. Other variables can affect the outcome of the experiment, such as training on GPUs, allowing multi-threading on CPUs, using certain layers, etc. To mitigate our problems with reproducibility, we first make sure that the data is processed in the same order during training. Therefore, we save the data order from the previous experiment and make sure the newer experiment follows the same order. If we allow the data to be shuffled, it can affect the performance due to how the model was exposed to the data. We also specify the float data type to be 32-bit since Python defaults to 64-bit. We try to avoid using 64-bit precision because the numbers produced by a GPU can vary significantly depending on the GPU architecture [11-13]. Controlling precision somewhat reduces differences due to computational noise even though technically it increases the amount of computational noise. We are currently developing more advanced techniques for preserving the efficiency of our training process while also maintaining the ability to reproduce models. In our poster presentation we will demonstrate these issues using some novel visualization tools, present several examples of the extent to which these issues influence research results on electroencephalography (EEG) and digital pathology experiments, and introduce new ways to manage such computational issues.
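A minimal sketch of the seeding, data-ordering, and precision controls described above, written against the current TensorFlow 2 API rather than the v1.x releases mentioned in the text; the exact knobs vary by version, and none of this removes the non-determinism of cuDNN kernels, so treat it as illustrative rather than a guaranteed recipe for bit-exact results.

import os
import random

import numpy as np
import tensorflow as tf

SEED = 1337

# Seed every RNG the framework may consult (weight initializers, dropout, shuffling).
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# Prefer 32-bit floats throughout, as discussed above.
tf.keras.backend.set_floatx("float32")

# Fix the data order: shuffle once with a fixed seed (or skip shuffling entirely)
# so every rerun presents the examples in the same sequence.
features = np.random.rand(1024, 64).astype("float32")   # placeholder data
dataset = (
    tf.data.Dataset.from_tensor_slices(features)
    .shuffle(buffer_size=1024, seed=SEED, reshuffle_each_iteration=False)
    .batch(32)
)

# Recent TensorFlow releases also expose a switch that requests deterministic kernels,
# at some cost in speed; guard it since older versions lack the function.
if hasattr(tf.config.experimental, "enable_op_determinism"):
    tf.config.experimental.enable_op_determinism()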
  5. Research Abstract

    We examine how the US Federal Government governs R&D contracts with private‐sector firms. The government chooses between two contractual forms: grants and cooperative agreements. The latter provides the government substantially greater discretion over, and monitoring of, project progress. Using novel data on R&D contracts and on the technical expertise available in specific government bureau locations, we test implications from the organizational economics and capabilities literatures. We find that cooperative agreements are more likely to be used for early‐stage projects and those for which local government scientific personnel have relevant technical expertise; in turn, cooperative agreements yield greater innovative output as measured by patents, controlling for endogeneity of contract form. The results are consistent with multitask agency and transaction‐cost approaches that emphasize decision rights and monitoring.

    Managerial Abstract

    When one private firm outsources an R&D project to another, it can use a range of sophisticated contractual provisions to elicit proper innovative effort. However, government entities are often constrained from employing such provisions due to legal and regulatory restrictions. Policymakers thus face a difficult challenge when contracting with private firms for innovation. We study the US Federal government's R&D contracts, which are restricted to two contractual types: “grants,” which offer little in‐process oversight, and “cooperative agreements,” which provide decision rights during the project. We demonstrate that policymakers can enhance outcomes by using cooperative agreements for earlier‐stage, higher‐uncertainty projects, but only when government scientists with relevant expertise are located near the firm's R&D site.
