Creators/Authors contains: "Song, Jie"


  1. Statistical data manipulation is a crucial component of many data science analytic pipelines, particularly as part of data ingestion. This task is generally accomplished by writing transformation scripts in languages such as SPSS, Stata, SAS, R, and Python (Pandas). The disparate data models, language representations, and transformation operations supported by these tools make it hard for end users to understand and document the transformations performed, and for developers to port transformation code across languages. To tackle these challenges, we present a formal paradigm for statistical data transformation. It consists of a data model, called Structured Data Transformation Data Model (SDTDM), inspired by the data models of multiple statistical transformation frameworks; an algebra, Structured Data Transformation Algebra (SDTA), with the ability to transform not only data within SDTDM but also metadata at multiple structural levels; and an equivalent descriptive counterpart, called Structured Data Transformation Language (SDTL), recently adopted by the DDI Alliance, which maintains international standards for metadata, as part of its suite of products. Experiments with real statistical transformations on socio-economic data show that SDTL can successfully represent 86.1% of 4,185 SAS commands and 91.6% of 9,087 SPSS commands obtained from a repository. We illustrate with examples how SDTA/SDTL could assist with the documentation of statistical data transformation, an important aspect often neglected in the metadata of datasets. We propose a system called C2Metadata that automatically captures the transformation and provenance information in SDTL as part of the metadata. Moreover, given the conversion mechanism from a source statistical language to SDTA/SDTL, we show how transformation programs could be converted to functionally equivalent programs, in the same or a different language, permitting code reuse and result reproducibility. We also illustrate the possibility of using SDTA to optimize SDTL transformations using rule-based rewrites similar to SQL optimizations.
  2. Laser powder bed fusion (LPBF) has been increasingly used in the fabrication of dense metallic structures. However, the corrosion-related properties of LPBF alloys, in particular environment-assisted cracking such as corrosion fatigue, are not well understood. In this study, the corrosion and corrosion fatigue characteristics of LPBF 316L stainless steels (SS) in 3.5 wt.% NaCl solution have been investigated using an electrochemical method, high cycle fatigue, and fatigue crack propagation testing. The LPBF 316L SSs demonstrated significantly improved corrosion properties compared to conventionally manufactured 316L, as reflected by the increased pitting and repassivation potentials, as well as retarded crack initiation. However, the printing parameters did not strongly affect the pitting potentials. LPBF samples also demonstrated enhanced capabilities of repassivation during fatigue crack propagation. The unique microstructural features introduced during the printing process are discussed. The improved corrosion and corrosion fatigue properties are attributed to the presence of columnar/cellular subgrains formed by dislocation networks that serve as high diffusion paths to transport anti-corrosion elements.
  3. Among metal additive manufacturing technologies, additive friction stir deposition stands out for its ability to create freeform and fully dense structures without melting and solidification. Here, we employ a comparative approach to investigate the process-microstructure linkages in additive friction stir deposition, utilizing two materials with distinct thermomechanical behavior, an Al-Mg-Si alloy and Cu, both of which are challenging to print using beam-based additive processes. The deposited Al-Mg-Si is shown to exhibit a relatively homogeneous microstructure with extensive subgrain formation and a strong shear texture, whereas the deposited Cu is characterized by a wide distribution of grain sizes and a weaker shear texture. We show evidence that the microstructure in Al-Mg-Si primarily evolves by continuous dynamic recrystallization, including geometric dynamic recrystallization and progressive lattice rotation, while the heterogeneous microstructure of Cu results from discontinuous recrystallization during both deposition and cooling. In Al-Mg-Si, the continuous recrystallization progresses with an increase of the applied strain, which correlates with the ratio between the tool rotation rate and travel velocity. Conversely, the microstructure evolution in Cu is found to be less dependent on , instead varying more with changes to . This difference originates from the absence of Cu rotation in the deposition zone, which reduces the influence of tool rotation on strain development. We attribute the distinct process-microstructure linkages and the underlying mechanisms in Al-Mg-Si and Cu to their differences in intrinsic thermomechanical properties and interactions with the tool head.
  4. Structured Data Transformation Language (SDTL) provides structured, machine-actionable representations of data transformation commands found in statistical analysis software. The Continuous Capture of Metadata for Statistical Data Project (C2Metadata) created SDTL as part of an automated system that captures provenance metadata from data transformation scripts and adds variable derivations to standard metadata files. SDTL also has potential for auditing scripts and for translating scripts between languages. SDTL is expressed in a set of JSON schemas, which are machine-actionable and easily serialized to other formats. Statistical software languages have a number of special features that have been carried into SDTL. We explain how SDTL handles differences among statistical languages and complex operations, such as merging files and reshaping data tables from “wide” to “long”.
  5. Datasets are often derived by manipulating raw data with statistical software packages. The derivation of a dataset must be recorded in terms of both the raw input and the manipulations applied to it. Statistics packages typically provide limited help in documenting provenance for the resulting derived data. At best, the operations performed by the statistical package are described in a script. Disparate representations make these scripts hard for users to understand. To address these challenges, we created Continuous Capture of Metadata (C2Metadata), a system to capture data transformations in scripts for statistical packages and represent them as metadata in a standard format that is easy to understand. We do so by devising a Structured Data Transformation Algebra (SDTA), which uses a small set of algebraic operators to express a large fraction of the data manipulation performed in practice. We then implement SDTA, inspired by relational algebra, in a data transformation specification language we call SDTL. In this demonstration, we showcase C2Metadata’s capture of data transformations from a pool of sample transformation scripts in at least two languages, SPSS® and Stata® (SAS® and R are under development), for social science data in a large academic repository. We will allow the audience to explore C2Metadata using a web-based interface, visualize the intermediate steps, and trace the provenance and changes of data at different levels for a better understanding of the process.
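
As a rough illustration of two ideas that recur in the abstracts above (reshaping a table from "wide" to "long", and keeping a machine-readable record of the step alongside the data), here is a minimal Python sketch. It is not SDTL and does not follow the SDTL JSON schemas: the function names `wide_to_long` and `describe_reshape`, the record fields, and the sample data are all hypothetical, chosen only for this example.

```python
import json


def wide_to_long(rows, id_key, value_keys, name_col="variable", value_col="value"):
    """Reshape wide rows (one column per measurement) into long rows
    (one row per id/measurement pair)."""
    long_rows = []
    for row in rows:
        for key in value_keys:
            long_rows.append({id_key: row[id_key], name_col: key, value_col: row[key]})
    return long_rows


def describe_reshape(id_key, value_keys):
    """Return a JSON description of the reshape step.
    Hypothetical record layout for illustration; NOT the SDTL schema."""
    record = {
        "command": "reshape_wide_to_long",
        "id": id_key,
        "columns": list(value_keys),
    }
    return json.dumps(record, sort_keys=True)


# Made-up sample data: one income column per year.
wide = [
    {"person": 1, "income_2019": 40, "income_2020": 42},
    {"person": 2, "income_2019": 55, "income_2020": 50},
]
value_columns = ["income_2019", "income_2020"]

long_rows = wide_to_long(wide, "person", value_columns)
record = describe_reshape("person", value_columns)

for r in long_rows:
    print(r)
print(record)
```

Keeping the description separate from the reshaping function mirrors, very loosely, the idea of capturing what a script did independently of the language it was written in.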