Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner, which is time-consuming.
In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with
The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples from four species, DHL-PPI is shown to be superior or competitive when compared with other state-of-the-art methods in terms of precision, recall, and F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from
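The pre-screening idea described above can be sketched in a few lines: once every protein has a fixed-length binary hash code, a database lookup becomes a numeric binary search rather than a pairwise sequence comparison. Everything below is a self-contained toy sketch: `hash_protein` is a stand-in encoder (the paper learns its codes with a deep model), and identical codes here merely flag candidates for a later fine-grained stage.

```python
import bisect
import hashlib

def hash_protein(seq: str, bits: int = 16) -> int:
    """Stand-in binary encoder. The paper derives hash codes from
    learned deep features; here we take deterministic pseudo-random
    bits from a digest so the example is self-contained."""
    digest = hashlib.sha256(seq.encode()).digest()
    code = 0
    for i in range(bits):
        byte = digest[i % len(digest)]
        code = (code << 1) | ((byte >> (i % 8)) & 1)
    return code

# Pre-screening index: each database protein becomes one integer code,
# kept sorted so a lookup is a binary search instead of a pairwise scan.
database = ["MKVLA", "GAVLI", "MKTAYIAK", "PLWTR"]
index = sorted((hash_protein(s), s) for s in database)
codes = [c for c, _ in index]

def candidates(query: str) -> list[str]:
    """Proteins whose hash code equals the query's (O(log n) search)."""
    q = hash_protein(query)
    i = bisect.bisect_left(codes, q)
    hits = []
    while i < len(codes) and codes[i] == q:
        hits.append(index[i][1])
        i += 1
    return hits

print(candidates("GAVLI"))  # retrieves at least "GAVLI" itself
```

With 16-bit codes and a tiny database, code collisions are rare, so a query usually retrieves only itself; in the real model, semantically similar proteins are trained to share codes.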
 Award ID(s):
 1816005
 NSFPAR ID:
 10368623
 Publisher / Repository:
 Springer Science + Business Media
 Date Published:
 Journal Name:
 BMC Bioinformatics
 Volume:
 23
 Issue:
 1
 ISSN:
 1471-2105
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Abstract Sequence mappability is an important task in genome re-sequencing. In the (k,m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose i-th entry is the number of indices $j \ne i$ such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=O(1)$, works in $O(n)$ space and, with high probability, in $O(n \cdot \min\{m^k, \log^k n\})$ time. Our algorithm requires a careful adaptation of the k-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $O(n^2)$-time algorithms to compute all (k,m)-mappability tables for a fixed m and all $k \in \{0,\ldots,m\}$ or a fixed k and all $m \in \{k,\ldots,n\}$. Finally, we show that, for $k, m = \Theta(\log n)$, the (k,m)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.
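The definition of the (k,m)-mappability table can be checked directly on a toy string with a brute-force O(n²·m) computation; this naive version exists only to make the definition concrete and is not one of the paper's algorithms.

```python
def mappability_table(T: str, k: int, m: int):
    """Entry i counts indices j != i such that the length-m substrings
    of T starting at positions i and j differ in at most k positions."""
    n = len(T)
    table = []
    for i in range(n - m + 1):
        count = 0
        for j in range(n - m + 1):
            if j == i:
                continue
            mismatches = sum(a != b for a, b in zip(T[i:i + m], T[j:j + m]))
            if mismatches <= k:
                count += 1
        table.append(count)
    return table

# Substrings of length 2 in "abab": "ab", "ba", "ab".
print(mappability_table("abab", 0, 2))  # → [1, 0, 1]
```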
Abstract Background Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a “similar sequence”. Traditionally, “similar sequence” was defined as having a high alignment score, and read mappers were viewed as heuristic solutions to this well-defined problem. For sketch-based mappers, however, there has not been a problem formulation to capture what problem an exact sketch-based mapping algorithm should solve. Moreover, there is no sketch-based method that can find all possible mapping positions for a read above a certain score threshold.
Results In this paper, we formulate the problem of read mapping at the level of sequence sketches. We give an exact dynamic programming algorithm that finds all hits above a given similarity threshold. It runs in $O(t + p + \ell^2)$ time and $O(\ell \log \ell)$ space, where t is the number of k-mers inside the sketch of the reference, p is the number of k-mers inside the read’s sketch, and $\ell$ is the number of times that k-mers from the pattern sketch occur in the sketch of the text. We evaluate our algorithm’s performance in mapping long reads to the T2T assembly of human chromosome Y, where ampliconic regions make it desirable to find all good mapping positions. For an equivalent level of precision as minimap2, the recall of our algorithm is 0.88, compared to only 0.76 of minimap2.
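The general idea of sketch-based mapping can be illustrated with a toy minimizer scheme: both the reference and the read are reduced to small sets of representative k-mers, and shared sketch k-mers imply candidate mapping offsets. This is only an illustration of the sketch-then-anchor idea, not the paper's exact dynamic programming algorithm, and the minimizer scheme here is a deliberately simplistic assumption.

```python
def minimizer_sketch(s: str, k: int, w: int):
    """Toy minimizer sketch: for each window of w consecutive k-mers,
    keep the lexicographically smallest k-mer (ties broken by position)."""
    kmers = [(s[i:i + k], i) for i in range(len(s) - k + 1)]
    sketch = {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}
    return sorted(sketch, key=lambda t: t[1])

ref = "ACGTACGTTACG"
read = "ACGTTACG"   # occurs in ref at offset 4
ref_sketch = minimizer_sketch(ref, k=3, w=2)
read_sketch = minimizer_sketch(read, k=3, w=2)

# Candidate mapping offsets: for every k-mer shared between the two
# sketches, reference position minus read position implies a possible
# start of the read in the reference.
hits = sorted({rp - qp
               for km_r, rp in ref_sketch
               for km_q, qp in read_sketch
               if km_r == km_q and rp - qp >= 0})
print(hits)  # → [0, 4, 9]
```

The true occurrence at offset 4 is supported by the most shared k-mers; a real mapper scores the candidates (the paper does so exactly, at the sketch level) rather than reporting every offset.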
Abstract Let $X \rightarrow \mathbb{P}^1$ be an elliptically fibered K3 surface, admitting a sequence $\omega_i$ of Ricci-flat metrics collapsing the fibers. Let V be a holomorphic SU(n) bundle over X, stable with respect to $\omega_i$. Given the corresponding sequence $\Xi_i$ of Hermitian–Yang–Mills connections on V, we prove that, if E is a generic fiber, the restricted sequence $\Xi_i|_{E}$ converges to a flat connection $A_0$. Furthermore, if the restriction $V|_E$ is of the form $\oplus_{j=1}^n \mathcal{O}_E(q_j - 0)$ for n distinct points $q_j \in E$, then these points uniquely determine $A_0$.
Abstract Matrix reduction is the standard procedure for computing the persistent homology of a filtered simplicial complex with m simplices. Its output is a particular decomposition of the total boundary matrix, from which the persistence diagrams and generating cycles are derived. Persistence diagrams are known to vary continuously with respect to their input, motivating the study of their computation for time-varying filtered complexes. Computing persistence dynamically can be reduced to maintaining a valid decomposition under adjacent transpositions in the filtration order. Since there are $O(m^2)$ such transpositions, this maintenance procedure exhibits limited scalability and is often too fine for many applications. We propose a coarser strategy for maintaining the decomposition over a 1-parameter family of filtrations. By reduction to a particular longest common subsequence problem, we show that the minimal number of decomposition updates d can be found in $O(m \log \log m)$ time and O(m) space, and that the corresponding sequence of permutations, which we call a schedule, can be constructed in $O(d m \log m)$ time. We also show that, in expectation, the storage needed to employ this strategy is actually sublinear in m. Exploiting this connection, we show experimentally that the decrease in operations to compute diagrams across a family of filtrations is proportional to the difference between the expected quadratic number of states and the proposed sublinear coarsening. Applications to video data, dynamic metric space data, and multiparameter persistence are also presented.
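To build intuition for the longest-common-subsequence reduction, consider two filtration orders of the same simplices. Under the simple (illustrative, not the paper's exact construction) assumption that the minimal number of coarse updates is d = m − |LCS| of the two orders, d can be computed with a longest-increasing-subsequence routine, since the LCS of two permutations reduces to an LIS after relabelling.

```python
import bisect

def min_moves(order_a, order_b):
    """Illustrative update count: m - |LCS(order_a, order_b)|.
    For two permutations of the same simplices, the LCS equals a
    longest increasing subsequence after relabelling, computed here
    with patience sorting in O(m log m)."""
    pos_in_b = {s: i for i, s in enumerate(order_b)}
    seq = [pos_in_b[s] for s in order_a]   # order_a in b-coordinates
    tails = []                             # patience-sorting piles
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(order_a) - len(tails)

# Two filtration orders of the same six simplices (hypothetical names).
a = ["v0", "v1", "v2", "e01", "e12", "t012"]
b = ["v1", "v0", "v2", "e12", "e01", "t012"]
print(min_moves(a, b))  # → 2
```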
Abstract Let M(x) denote the largest cardinality of a subset of $\{n \in \mathbb{N} : n \le x\}$ on which the Euler totient function $\varphi(n)$ is non-decreasing. We show that $M(x) = \left(1 + O\left(\frac{(\log\log x)^5}{\log x}\right)\right)\pi(x)$ for all $x \ge 10$, answering questions of Erdős and Pollack–Pomerance–Treviño. A similar result is also obtained for the sum-of-divisors function $\sigma(n)$.
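Since a subset of {1,…,x} on which φ is non-decreasing is exactly a non-decreasing subsequence of the values φ(1),…,φ(x), M(x) can be computed for small x and set beside the prime-counting function π(x). A minimal sketch (the asymptotic equality in the theorem only becomes visible at much larger x):

```python
import bisect

def totients(x):
    """phi(1), ..., phi(x) via a standard sieve."""
    phi = list(range(x + 1))
    for p in range(2, x + 1):
        if phi[p] == p:                 # p is prime
            for mult in range(p, x + 1, p):
                phi[mult] -= phi[mult] // p
    return phi[1:]

def M(x):
    """Longest non-decreasing subsequence of phi(1..x) by patience
    sorting; bisect_right allows repeated values in the subsequence."""
    tails = []
    for v in totients(x):
        i = bisect.bisect_right(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def pi(x):
    """Prime-counting function by trial division (fine for small x)."""
    return sum(1 for n in range(2, x + 1)
               if all(n % d for d in range(2, int(n ** 0.5) + 1)))

for x in (10, 100, 1000):
    print(x, M(x), pi(x))
```

For x = 10 the maximal set is {1, 2, 3, 4, 6, 8, 9} (totients 1, 1, 2, 2, 2, 4, 6), so M(10) = 7 while π(10) = 4; the theorem says the lower-order excess over π(x) shrinks relatively as x grows.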