Engineering of an artificial intelligence safety data sheet document processing system for environmental, health, and safety compliance

Fenton, Kevin; Simske, Steven

doi:10.1145/3469096.3474933

Chemical Safety Data Sheets (SDS) are the primary method by which chemical manufacturers communicate the ingredients and hazards of their products to the public. These SDSs are used for a wide variety of purposes ranging from environmental calculations to occupational health assessments to emergency response measures. Although a few companies have provided direct digital data transfer platforms using xml or equivalent schemata, the vast majority of chemical ingredient and hazard communication to product users still occurs through the use of millions of PDF documents that are largely loaded through manual data entry into downstream user databases. This research focuses on the reverse engineering of SDS document types to adapt to various layouts and the harnessing of meta-algorithmic and neural network approaches to provide a means of moving industrial institutions towards a digital universal SDS processing methodology. The complexities of SDS documents including the lack of format standardization, text and image combinations, and multi-lingual translation needs, combined, limit the accuracy and precision of optical character recognition tools. The approach in this document is to translate entire SDSs from thousands of chemical vendors, each with distinct formatting, to machine-encoded text with a high degree of accuracy and precision. Then the system will "read" and assess these documents as a human would; that is, ensuring that the documents are compliant, determining whether chemical formulations have changed, ensuring reported values are within expected thresholds, and comparing them to similar products for more environmentally friendly alternatives.

More Like this