To facilitate a better understanding of building codes, the visualization of the embedded structures of the provisions and requirements of the codes is needed. Existing research efforts in building code compliance checking mostly do not purposefully represent building codes in formats that facilitate human understanding and interaction with the codes, such as XML and hypertext (text with links to other text). Visual programming commonly represents building codes more visually as flowcharts. However, flowcharts are static, and the generation of flowcharts is still manual. To address this lack of interactive visual representation of building code requirement structures, this paper proposes an automated building code structure extraction and visualization method for visualizing building code contents in a way that clearly shows the inter-connections between requirements and allows intuitive user interaction. In this method, to extract the chapter-section-subsection hierarchical structure and cross-reference structure, a new extraction method named Building Code Network Generator (BCNG) is proposed to automatically generate an interactive visualization using a directed network. The performance of the proposed BCNG was empirically tested on Chapters 5 and 10 of the International Building Code 2015, with a resulting precision, recall, and F1-score of 99.4%, 96.3%, and 97.8%, respectively. In addition, the extracted hierarchical and cross-reference structures were displayed using an open-source network visualization tool to facilitate human understanding and interactions with the building code requirements in automated compliance checking systems.
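As a quick consistency check on the figures reported above, the F1-score can be recomputed from the stated precision and recall. This is a minimal sketch that assumes the standard harmonic-mean definition of F1; the function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract, in percent.
reported_precision = 99.4
reported_recall = 96.3

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1f}%")  # prints "F1 = 97.8%"
```

The recomputed value agrees with the reported 97.8% after rounding to one decimal place.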
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex scientific flowcharts and simulated flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature and the simulated subset contains 10,000 flowcharts created using a customizable script. The dataset is enriched with annotations for visual components, OCR, Mermaid code representation, and VQA question-answer pairs. Despite the proven capabilities of Large Vision-Language Models (LVLMs) in various visual understanding tasks, their effectiveness in decoding flowcharts, a crucial element of scientific communication, has yet to be thoroughly investigated. The FlowLearn test set is crafted to assess the performance of LVLMs in flowchart comprehension. Our study thoroughly evaluates state-of-the-art LVLMs, identifying existing limitations and establishing a foundation for future enhancements in this relatively underexplored domain. For instance, in tasks involving simulated flowcharts, GPT-4V achieved the highest accuracy (58%) in counting the number of nodes, while Claude recorded the highest accuracy (83%) in OCR tasks. Notably, no single model excels in all tasks within the FlowLearn framework, highlighting significant opportunities for further development.
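To make the idea of a simulated flowchart with a Mermaid code representation concrete, the sketch below generates a small random flowchart and serializes it as Mermaid text, keeping the node list as ground truth for a node-counting question. It is an illustration only: the functions `simulate_flowchart` and `to_mermaid`, the node labels, and the sampling scheme are assumptions and are not the script used to build FlowLearn.

```python
import random


def simulate_flowchart(num_nodes: int, num_extra_edges: int = 0, seed: int = 0):
    """Return (nodes, edges) for a random directed flowchart (hypothetical scheme)."""
    rng = random.Random(seed)
    nodes = [f"N{i}" for i in range(num_nodes)]
    # A simple chain keeps every node connected to the rest of the chart.
    edges = [(nodes[i], nodes[i + 1]) for i in range(num_nodes - 1)]
    # Optionally add a few extra arrows between distinct, not-yet-linked pairs.
    candidates = [
        (a, b) for a in nodes for b in nodes if a != b and (a, b) not in edges
    ]
    edges += rng.sample(candidates, min(num_extra_edges, len(candidates)))
    return nodes, edges


def to_mermaid(nodes, edges, direction: str = "TD") -> str:
    """Serialize the flowchart as Mermaid code (top-down layout by default)."""
    lines = [f"flowchart {direction}"]
    lines += [f'    {name}["Step {i}"]' for i, name in enumerate(nodes)]
    lines += [f"    {src} --> {dst}" for src, dst in edges]
    return "\n".join(lines)


nodes, edges = simulate_flowchart(num_nodes=5, num_extra_edges=2, seed=1)
print(to_mermaid(nodes, edges))
# Ground truth for a node-counting question is simply len(nodes).
print("node count:", len(nodes))
```

Because the chart is generated programmatically, annotations such as the node count or the edge list are known exactly, which is what makes simulated flowcharts convenient for scoring model answers.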
- Award ID(s): 2333789
- PAR ID: 10544557
- Publisher / Repository: European Conference on Artificial Intelligence (ECAI)
- Date Published:
- Format(s): Medium: X
- Location: Spain
- Sponsoring Org: National Science Foundation
More Like this
-
Applying AI power to predict syntheses of novel materials requires high-quality, large-scale datasets. Extraction of synthesis information from scientific publications is still challenging, especially for extracting synthesis actions, because of the lack of a comprehensive labeled dataset using a solid, robust, and well-established ontology for describing synthesis procedures. In this work, we propose the first unified language of synthesis actions (ULSA) for describing inorganic synthesis procedures. We created a dataset of 3040 synthesis procedures annotated by domain experts according to the proposed ULSA scheme. To demonstrate the capabilities of ULSA, we built a neural network-based model to map arbitrary inorganic synthesis paragraphs into ULSA and used it to construct synthesis flowcharts for synthesis procedures. Analysis of the flowcharts showed that (a) ULSA covers essential vocabulary used by researchers when describing synthesis procedures and (b) it can capture important features of synthesis protocols. The present work focuses on the synthesis protocols for solid-state, sol–gel, and solution-based inorganic synthesis, but the language could be extended in the future to include other synthesis methods. This work is an important step towards creating a synthesis ontology and a solid foundation for autonomous robotic synthesis.
-
Computational thinking has been widely recognized as a crucial skill for engineers engaged in problem-solving. Multidisciplinary learning environments such as integrated STEM courses are powerful spaces where computational thinking skills can be cultivated. However, it is not clear how best to integrate computational thinking instruction or how students develop computational thinking in those spaces. Thus, we ask: To what extent does engaging students in integrated engineering design and physics labs impact their development of computational thinking? We have incorporated engineering design within a traditional introductory calculus-based physics lab to promote students’ conceptual understanding of physics while fostering scientific inquiry, mathematical modeling, engineering design, and computational thinking. Using a generic qualitative research approach, we explored the development of computational thinking for six teams when completing an engineering design challenge to propose an algorithm to remotely control an autonomous guided vehicle throughout a warehouse. Across five consecutive lab sessions, teams represented their algorithms using a flowchart, completing four iterations of their initial flowchart. Twenty-four flowcharts were open coded for evidence of four computational thinking facets: decomposition, abstraction, algorithms, and debugging. Our results suggest that students’ initial flowcharts focused on decomposing the problem and abstracting aspects that teams initially found to be more relevant. After each iteration, teams refined their flowcharts using pattern recognition, algorithm design, efficiency, and debugging. The teams would benefit from having more feedback about their understanding of the problem, the relevant physics concepts, and the logic and efficiency of the flowcharts.
-
CNNs (Convolutional Neural Networks) are becoming increasingly important for real-time applications, such as image classification in traffic control, visual surveillance, and smart manufacturing. It is challenging, however, to meet timing constraints of image processing tasks using CNNs due to their complexity. Performing dynamic trade-offs between inference accuracy and time for image data analysis in CNNs is challenging too: in evaluating hundreds of CNN models in terms of time and accuracy on two popular data sets, MNIST and CIFAR-10, we observe that more complex CNNs that take longer to run in many cases yield lower accuracy. To address these challenges, we propose a new approach that (1) generates CNN models and analyzes their average inference time and accuracy for image classification, (2) stores a small subset of the CNNs with monotonic time and accuracy relationships offline, and (3) efficiently selects an effective CNN expected to support the highest possible accuracy among the stored CNNs subject to the remaining time to the deadline at run time. In our extensive evaluation, we verify that the CNNs derived by our approach are more flexible and cost-efficient than two baseline approaches. We verify that our approach can effectively build a compact set of CNNs and efficiently support systematic time vs. accuracy trade-offs, if necessary, to meet the user-specified timing and accuracy requirements. Moreover, the overhead of our approach is small and acceptable in terms of latency and memory consumption.
-
Historical data sources, like medical records or biological collections, consist of unstructured heterogeneous content: handwritten text, different sizes and types of fonts, and text overlapped with lines, images, stamps, and sketches. The information these documents can provide is important, from a historical perspective and mainly because we can learn from it. The automatic digitization of these historical documents is a complex machine learning process that usually produces poor results, requiring costly interventions by experts, who have to transcribe and interpret the content. This paper describes hybrid (Human- and Machine-Intelligent) workflows for scientific data extraction, combining machine-learning and crowdsourcing software elements. Our results demonstrate that the mix of human and machine processes has advantages in data extraction time and quality, when compared to a machine-only workflow. More specifically, we show how OCRopus and Tesseract, two widely used open source Optical Character Recognition (OCR) tools, can improve their accuracy by more than 42%, when text areas are cropped by humans prior to OCR, while the total time can increase or decrease depending on the OCR selection. The digitization of 400 images, with Entomology, Bryophyte, and Lichen specimens, is evaluated following four different approaches: processing the whole specimen image (machine-only), processing crowd cropped labels (hybrid), processing crowd cropped fields (hybrid), and cleaning the machine-only output. As a secondary result, our experiments reveal differences in speed and quality between Tesseract and OCRopus.