- Award ID(s):
- 1835135
- NSF-PAR ID:
- 10295517
- Date Published:
- Journal Name:
- 2020 IEEE International Conference on Cluster Computing (CLUSTER)
- Page Range / eLocation ID:
- 454 to 458
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Large-scale parallel file systems (PFSs) play an essential role in high-performance computing (HPC). However, despite their importance, their reliability is much less studied or understood compared with that of local storage systems or cloud storage systems. Recent failure incidents at real HPC centers have exposed the latent defects in PFS clusters as well as the urgent need for a systematic analysis. To address the challenge, we perform a study of the failure recovery and logging mechanisms of PFSs in this article. First, to trigger the failure recovery and logging operations of the target PFS, we introduce a black-box fault injection tool called PFault , which is transparent to PFSs and easy to deploy in practice. PFault emulates the failure state of individual storage nodes in the PFS based on a set of pre-defined fault models and enables examining the PFS behavior under fault systematically. Next, we apply PFault to study two widely used PFSs: Lustre and BeeGFS. Our analysis reveals the unique failure recovery and logging patterns of the target PFSs and identifies multiple cases where the PFSs are imperfect in terms of failure handling. For example, Lustre includes a recovery component called LFSCK to detect and fix PFS-level inconsistencies, but we find that LFSCK itself may hang or trigger kernel panics when scanning a corrupted Lustre. Even after the recovery attempt of LFSCK, the subsequent workloads applied to Lustre may still behave abnormally (e.g., hang or report I/O errors). Similar issues have also been observed in BeeGFS and its recovery component BeeGFS-FSCK. We analyze the root causes of the abnormal symptoms observed in depth, which has led to a new patch set to be merged into the coming Lustre release. In addition, we characterize the extensive logs generated in the experiments in detail and identify the unique patterns and limitations of PFSs in terms of failure logging. We hope this study and the resulting tool and dataset can facilitate follow-up research in the communities and help improve PFSs for reliable high-performance computing.more » « less
-
Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the
binder.sqlite
file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in theevents
table, which has the following schema.CREATE TABLE events( version INTEGER, timestamp TEXT, provider TEXT, spec TEXT, origin TEXT, ref TEXT, guessed_ref TEXT ); CREATE INDEX idx_timestamp ON events(timestamp);
version
indicates the version of the record as assigned by Binder. Theorigin
field became available with version 3, and theref
field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.timestamp
is the ISO timestamp of the launchprovider
gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub, other providers may differ.spec
gives the particular branch/release/commit being built. It consists of<github-id>/<repo>/<branch>
.origin
indicates which backend was used. Each has its own storage, compute, etc. so this info might be important for evaluating caching and performance. Note that only recent records include this field. May be null.ref
specifies the git commit that was actually used, rather than the named branch referenced byspec
. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.- For records where
ref
is not available, we attempted to clone the named reference given byspec
rather than the specific commit (see below). Theguessed_ref
field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or non-nullref
) will exclude these guessed commits. May be null.
The Binder launch dataset identifies the source repos that were used, but doesn't give any indication of their contents. We crawled GitHub to get the actual specification files in the repos which were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted/made private at some point, and were thus skipped. This is indicated by the absence of any row for the given commit (or absence of both
ref
andguessed_ref
in theevents
table). The schema is as follows.CREATE TABLE spec_files ( ref TEXT NOT NULL PRIMARY KEY, ls TEXT, runtime BLOB, apt BLOB, conda BLOB, pip BLOB, pipfile BLOB, julia BLOB, r BLOB, nix BLOB, docker BLOB, setup BLOB, postbuild BLOB, start BLOB );
Here
ref
corresponds toref
and/orguessed_ref
from theevents
table. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are). The records in the database are simply the verbatim file contents, with no parsing or further processing performed.runtime
:runtime.txt
apt
:apt.txt
conda
:environment.yml
pip
:requirements.txt
pipfile
:Pipfile.lock
orPipfile
julia
:Project.toml
orREQUIRE
r
:install.R
nix
:default.nix
docker
:Dockerfile
setup
:setup.py
postbuild
:postBuild
start
:start
The
ls
field gives a metadata listing of the repo contents (excluding the.git
directory). This field is JSON encoded with the following structure based on JSON types:- Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
- String: symlink. The string value gives the link target.
- Number: regular file. The number value gives the file size in bytes.
CREATE TABLE clean_specs ( ref TEXT NOT NULL PRIMARY KEY, conda_channels TEXT, conda_packages TEXT, pip_packages TEXT, apt_packages TEXT );
The
clean_specs
table provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON encoded list of package requirements, or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from the pkg_resources package of setuptools. Conda packages have been parsed and normalized using theconda.models.match_spec.MatchSpec
class included with the library form of Conda (distinct from the command line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.The
missing
table gives the repos that were not accessible, andevent_logs
records which log files have already been added. These tables are used for updating the dataset and should not be of interest to users. -
null (Ed.)Performance analysis is critical for pinpointing bottlenecks in parallel applications. Several profilers exist to instrument parallel programs on HPC systems and gather performance data. Hatchet is an open-source Python library that can read profiling output of several tools, and enables the user to perform a variety of programmatic analyses on hierarchical performance profiles. In this paper, we augment Hatchet to support new features: a query language for representing call path patterns that can be used to filter a calling context tree, visualization support for displaying and interacting with performance profiles, and new operations for performing analyses on multiple datasets. Additionally, we present performance optimizations in Hatchet’s HPCToolkit reader and the unify operation to enable scalable analysis of large datasets.more » « less
-
In recent times, geospatial datasets are growing in terms of size, complexity and heterogeneity. High performance systems are needed to analyze such data to produce actionable insights in an efficient manner. For polygonal a.k.a vector datasets, operations such as I/O, data partitioning, communication, and load balancing becomes challenging in a cluster environment. In this work, we present MPI-Vector-IO, a parallel I/O library that we have designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well Known Text. It makes MPI aware of spatial data, spatial primitives and provides support for spatial data types embedded within collective computation and communication using MPI message-passing library. These abstractions along with parallel I/O support are useful for parallel Geographic Information System (GIS) application development on HPC platforms. Performance evaluation is done on Lustre and GPFS filesystems. MPI-Vector-IO scales well with MPI processes and file size and achieves bandwidth up to 22 GB/s for common spatial data access patterns. We observed that independent file read functions performed better than collective functions in MPI-IO for contiguous access pattern on Lustre. In general, the I/O is improved by one to two orders of magnitude over real-world datasets using up to 1152 CPU cores. Spatial Join query is used as an exemplar to demonstrate an end-to-end application using MPI-Vector-IO.more » « less
-
Content delivery networks (CDNs) commonly use DNS to map end-users to the best edge servers. A recently proposed EDNS0-Client-Subnet (ECS) extension allows recursive resolvers to include end-user subnet information in DNS queries, so that authoritative DNS servers, especially those belonging to CDNs, could use this information to improve user mapping. In this paper, we study the ECS behavior of ECS-enabled recursive resolvers from the perspectives of the opposite sides of a DNS interaction, the authoritative DNS servers of a major CDN and a busy DNS resolution service. We find a range of erroneous (i.e., deviating from the protocol specification) and detrimental (even if compliant) behaviors that may unnecessarily erode client privacy, reduce the effectiveness of DNS caching, diminish ECS benefits, and in some cases turn ECS from facilitator into an obstacle to authoritative DNS servers' ability to optimize user-to-edge-server mappings.more » « less