Title: HTCondor data movement at 100 Gbps
HTCondor is a major workload management system used in distributed high-throughput computing (dHTC) environments such as the Open Science Grid. One of HTCondor's distinguishing features is its native support for data movement, which allows it to operate without a shared filesystem. Coupling data handling with compute scheduling is convenient for users and allows significant infrastructure flexibility, but it does introduce some limitations. The default HTCondor data transfer mechanism routes both the input and output data through the submission node, making that node a potential bottleneck. In this document we show that, using a node equipped with a 100 Gbps network interface card (NIC), HTCondor can serve data at up to 90 Gbps, which is sufficient for most current use cases: at the time of writing, it would saturate the border network links of most research universities.
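To make the transfer path concrete, here is a minimal sketch of a job that relies on HTCondor's built-in file transfer, written against the htcondor Python bindings (v2 Submit API). The executable and file names (process.sh, input.dat, result.dat) are hypothetical placeholders; both the input and output files in this sketch would flow through the access point where the job is submitted, which is exactly the node the paper pushes toward 100 Gbps.

```python
# Minimal sketch: an HTCondor job whose input/output files are moved by
# HTCondor's built-in transfer mechanism (no shared filesystem needed).
import htcondor  # HTCondor Python bindings

job = htcondor.Submit({
    "executable": "process.sh",             # hypothetical worker script
    "arguments": "input.dat",
    "should_transfer_files": "YES",         # enable HTCondor file transfer
    "when_to_transfer_output": "ON_EXIT",
    "transfer_input_files": "input.dat",    # staged to the worker via the submit node
    "transfer_output_files": "result.dat",  # staged back the same way
    "request_cpus": "1",
    "request_memory": "2GB",
})

schedd = htcondor.Schedd()                  # local scheduler daemon
result = schedd.submit(job, count=1)        # v2 submit API
print("submitted cluster", result.cluster())
```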
Award ID(s):
2030508
PAR ID:
10357989
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2021 IEEE 17th International Conference on eScience (eScience)
Issue:
September 2021
Page Range / eLocation ID:
239 to 240
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Network embedding has attracted a surge of attention in recent years. Its goal is to learn low-dimensional representations for the nodes in a network, which benefits downstream tasks such as node classification and link prediction. Most existing approaches learn node representations from the topological structure alone, yet in many real-world applications nodes are associated with rich attributes. It is therefore important to learn node representations from both the topological structure and the node attributes. In this paper, we propose a novel deep attributed network embedding approach that can capture high non-linearity and preserve various proximities in both the topological structure and the node attributes. At the same time, a novel strategy is proposed to guarantee that the learned node representations encode consistent and complementary information from the topological structure and the node attributes. Extensive experiments on benchmark datasets have verified the effectiveness of our proposed approach.
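The paper's deep, non-linear model is not reproduced here; as a rough illustration of the underlying idea of embedding from both views at once, the following is a minimal numpy sketch that factorizes a weighted concatenation of the adjacency and attribute matrices. The embed function, its alpha weight, and the toy graph are all assumptions for illustration.

```python
# Minimal sketch (NOT the paper's deep model): embed nodes from both
# topology and attributes by factorizing a weighted concatenation of
# the two views. `alpha` balances structure against attributes.
import numpy as np

def embed(adj, attrs, dim=16, alpha=0.5):
    """adj: (n, n) adjacency matrix; attrs: (n, f) node attributes."""
    # Row-normalize each view so neither dominates the factorization.
    A = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1)
    X = attrs / np.maximum(np.linalg.norm(attrs, axis=1, keepdims=True), 1e-12)
    M = np.hstack([alpha * A, (1 - alpha) * X])
    # A truncated SVD yields a joint low-dimensional representation.
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :dim] * S[:dim]

# Toy usage: a 4-node path graph whose attributes split it in half.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
attrs = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
print(embed(adj, attrs, dim=2))
```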
  2. Biscarat, C.; Campana, S.; Hegner, B.; Roiser, S.; Rovelli, C.I.; Stewart, G.A. (Ed.)
    CMS is tackling the exploitation of CPU resources at HPC centers where compute nodes do not have network connectivity to the Internet. Pilot agents and payload jobs need to interact with external services from the compute nodes: access to the application software (CernVM-FS) and conditions data (Frontier), management of input and output data files (data management services), and job management (HTCondor). Finding an alternative route to these services is challenging. Seamless integration into the CMS production system without causing any operational overhead is a key goal. The case of the Barcelona Supercomputing Center (BSC), in Spain, is particularly challenging, due to its especially restrictive network setup. In this paper we describe the solutions developed within CMS to overcome these restrictions and to integrate this resource into production. Singularity containers with application software releases are built and pre-placed in the HPC facility's shared filesystem, together with conditions data files. HTCondor has been extended to relay communications between running pilot jobs and HTCondor daemons through the HPC shared filesystem. This operation mode also allows piping input and output data files through the HPC filesystem. Results, issues encountered during the integration process, and remaining concerns are discussed.
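The following is a minimal sketch, not CMS's actual implementation, of the general pattern the paper describes: a process on an isolated compute node communicates with the outside world by dropping request files on the HPC shared filesystem and polling for responses written back by a gateway host that does have connectivity. RELAY_DIR and the file-naming scheme are hypothetical.

```python
# Minimal sketch (NOT CMS's implementation): relay a request through a
# shared filesystem when the compute node has no outbound connectivity.
# A gateway host with network access watches RELAY_DIR, performs the real
# call, and writes "<id>.resp" next to each "<id>.req" it consumes.
import json
import os
import time
import uuid

RELAY_DIR = "/shared/relay"   # hypothetical path on the HPC shared filesystem

def call_via_shared_fs(payload, timeout=600, poll=2.0):
    """Worker side: publish a request file, then poll for the response."""
    msg_id = uuid.uuid4().hex
    req = os.path.join(RELAY_DIR, f"{msg_id}.req")
    resp = os.path.join(RELAY_DIR, f"{msg_id}.resp")
    tmp = req + ".tmp"
    with open(tmp, "w") as f:
        json.dump(payload, f)
    os.rename(tmp, req)               # atomic publish: gateway never sees a partial file
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(resp):      # gateway has answered
            with open(resp) as f:
                return json.load(f)
        time.sleep(poll)              # shared-FS polling stands in for a network socket
    raise TimeoutError(f"no response to request {msg_id}")
```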
  3. Ensuring high availability (HA) for software-based networks is a critical design feature that will help the adoption of software-based network functions (NFs) in production networks. It is important for NFs to avoid outages and maintain mission-critical operations. However, HA support for NFs on the critical data path can result in unacceptable performance degradation. We present REINFORCE, an integrated framework to support efficient resiliency for NFs and NF service chains. REINFORCE includes timely failure detection and consistent failover mechanisms. REINFORCE replicates state to standby NFs (local and remote) while enforcing correctness. It minimizes the number of state transfers by exploiting the concept of external synchrony, and leverages opportunistic batching and multi-buffering to optimize performance. Experimental results show that, even at line-rate packet processing (10 Gbps), REINFORCE achieves chain-level failover across servers in a LAN (or within the same node) within 10 ms (100 μs), incurring less than 10% (1%) performance overhead, and adds an average latency of only ~400 μs (5 μs), with a worst-case latency of less than 1 ms (10 μs).
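As a rough illustration of external synchrony combined with batched state replication, rather than REINFORCE's actual data-plane implementation, the sketch below withholds processed packets until the state changes they depend on have been replicated to a standby, then releases them in one batch. The class and field names are invented for illustration.

```python
# Minimal sketch (NOT REINFORCE's data plane): external synchrony for NF
# failover. Packets are processed immediately but their output is withheld
# until the state they produced has been replicated to a standby; one
# batched state transfer then releases the whole pending batch.
from collections import deque

class Standby:
    """Stand-in replica that absorbs batched state deltas."""
    def __init__(self):
        self.state = {}
    def update(self, delta):
        self.state.update(delta)

class ExternallySynchronousNF:
    def __init__(self, standby):
        self.state = {}          # per-flow NF state (here: packet counters)
        self.standby = standby
        self.pending = deque()   # processed packets not yet released
        self.dirty = {}          # state changes not yet replicated

    def process(self, pkt):
        flow = pkt["flow"]
        self.state[flow] = self.state.get(flow, 0) + 1
        self.dirty[flow] = self.state[flow]
        self.pending.append(pkt)          # defer output: external synchrony

    def flush(self):
        self.standby.update(self.dirty)   # one batched state transfer...
        self.dirty = {}
        released = list(self.pending)     # ...releases every held packet
        self.pending.clear()
        return released

nf = ExternallySynchronousNF(Standby())
for flow in ("a", "a", "b"):
    nf.process({"flow": flow})
print(len(nf.flush()), "packets released after one batched state transfer")
```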
  4. A source node updates its status as a point process and also forwards its updates to a network of observer nodes. Within the network of observers, these updates are forwarded as point processes from node to node. Each node wishes its knowledge of the source to be as timely as possible. In this network, timeliness is measured by a discrete form of age of information: each status change at the source is referred to as a version, and the age at a node is the number of versions by which its most recent update from the source is out of date. This work introduces a method for evaluating the average version age at each node in the network when nodes forward updates using a memoryless gossip protocol. The method is then demonstrated by a version-age analysis of a collection of simple networks. For gossip on a complete graph with symmetric updating rates, it is shown that each node's average age grows as the logarithm of the network size.
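The logarithmic-growth claim can be probed with a small event-driven simulation of push-style memoryless gossip on a complete graph. The sketch below is an illustration only; its rate parameters and update rules are assumptions rather than the paper's exact model.

```python
# Minimal sketch: event-driven simulation of version age under push-style
# memoryless gossip on a complete graph. Rates below are illustrative.
import random

def avg_version_age(n, lam=1.0, src_push=1.0, gossip=1.0, steps=100_000):
    """lam: source version updates; src_push: source -> random node;
    gossip: per-node rate of pushing to a uniformly random peer."""
    version, have = 0, [0] * n    # source version, version held per node
    age_sum, t = 0.0, 0.0
    total = lam + src_push + n * gossip
    for _ in range(steps):
        dt = random.expovariate(total)
        age_sum += dt * sum(version - v for v in have)  # integrate total age
        t += dt
        r = random.uniform(0, total)
        if r < lam:
            version += 1                         # source produces a new version
        elif r < lam + src_push:
            have[random.randrange(n)] = version  # source pushes its version
        else:
            i, j = random.sample(range(n), 2)    # peer gossip: receiver keeps
            have[j] = max(have[j], have[i])      # the fresher of the two copies
    return age_sum / (t * n)                     # time-average age per node

for n in (8, 16, 32, 64):                        # age should grow roughly like log n
    print(n, round(avg_version_age(n), 2))
```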
  5. Network alignment, which aims to find the node correspondence across multiple networks, is a fundamental task in many areas, ranging from social network analysis to adversarial activity detection. State-of-the-art methods in the data mining community often view the node correspondence as a probabilistic cross-network node similarity, and thus inevitably introduce an O(n²) lower bound on the computational complexity. Moreover, they may ignore the rich patterns (e.g., clusters) that accompany real networks. In this paper, we propose a multilevel network alignment algorithm (Moana) which consists of three key steps. It first efficiently coarsens the input networks into their structured representations, then aligns the coarsest representations of the input networks, and finally interpolates to obtain the alignment at multiple levels, down to the node level at the finest granularity. The proposed coarsen-align-interpolate method bears two key advantages. First, it overcomes the O(n²) lower bound, achieving linear complexity. Second, it helps reveal the alignment between rich patterns of the input networks at multiple levels (e.g., nodes, clusters, super-clusters, etc.). Extensive experimental evaluations demonstrate the efficacy of the proposed algorithm on both node-level alignment and alignment among rich patterns (e.g., clusters) at different granularities.
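A toy rendering of the coarsen-align-interpolate pattern, not Moana itself, is sketched below: node pairs are greedily merged into super-nodes, the small coarse graphs are matched by a deliberately crude degree-sorting heuristic, and the coarse match is expanded back to node level. All function names and the matching heuristic are assumptions for illustration.

```python
# Toy sketch (NOT Moana): the coarsen-align-interpolate pattern.
import numpy as np

def coarsen(adj):
    """Greedily merge adjacent node pairs into super-nodes."""
    n = adj.shape[0]
    group, groups = [-1] * n, []
    for i in range(n):
        if group[i] != -1:
            continue
        group[i], members = len(groups), [i]
        for j in range(i + 1, n):          # merge with first free neighbor
            if adj[i, j] and group[j] == -1:
                group[j] = group[i]
                members.append(j)
                break
        groups.append(members)
    k = len(groups)
    coarse = np.zeros((k, k))
    for a in range(n):                     # edge counts between super-nodes
        for b in range(n):
            if adj[a, b]:
                coarse[group[a], group[b]] += 1
    return coarse, groups

def align_coarse(adj1, adj2):
    """Deliberately crude alignment: match super-nodes by sorted degree."""
    order1 = np.argsort(-adj1.sum(axis=1))
    order2 = np.argsort(-adj2.sum(axis=1))
    return dict(zip(order1.tolist(), order2.tolist()))

def multilevel_align(adj1, adj2):
    c1, g1 = coarsen(adj1)
    c2, g2 = coarsen(adj2)
    node_match = {}
    for a, b in align_coarse(c1, c2).items():  # interpolate back to nodes
        for u, v in zip(g1[a], g2[b]):
            node_match[u] = v
    return node_match

# Toy usage: aligning a 4-cycle with itself recovers the identity map.
A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], float)
print(multilevel_align(A, A))
```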