This content will become publicly available on July 31, 2026

Title: Leveraging Data Characteristics for Bug Localization in Deep Learning Programs
Deep Learning (DL) is a class of machine learning algorithms used in a wide variety of applications. Like any software system, DL programs can have bugs, and several tools have been proposed in the past to support bug localization in them. Most bugs caused by an improper model structure, known as structural bugs, lead to inadequate performance during training, which makes it challenging for developers to identify their root cause and address them. To support bug detection and localization in DL programs, in this article we propose Theia, which detects and localizes structural bugs in DL programs. Unlike previous work, Theia considers the characteristics of the training dataset to automatically detect bugs in DL programs developed using two DL libraries, Keras and PyTorch. Since training DL models is a time-consuming process, Theia detects these bugs at the beginning of the training process and alerts the developer with informative messages containing the bug's location and actionable fixes that help improve the structure of the model. We evaluated Theia on a benchmark of 40 real-world buggy DL programs obtained from Stack Overflow. Our results show that Theia successfully localizes 57/75 structural bugs in the 40 buggy programs, whereas NeuraLint, a state-of-the-art approach capable of localizing structural bugs before training, localizes 17/75 bugs.
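The abstract does not list Theia's individual checks. As a rough, hypothetical illustration of what a dataset-aware structural check that runs before any training epochs could look like, the Keras sketch below compares the model's output layer against the training labels; the function name, the specific rules, and the example model are assumptions for illustration, not Theia's actual implementation.

```python
# Illustrative sketch only (not Theia): a dataset-aware check that runs before
# training and flags a common structural bug -- an output layer whose size or
# activation does not match the training labels.
import numpy as np
from tensorflow import keras

def check_output_layer(model, y_train):
    """Return warning messages if the last layer disagrees with the labels."""
    n_classes = len(np.unique(y_train))
    last = model.layers[-1]
    issues = []
    if hasattr(last, "units") and last.units != n_classes:
        issues.append(f"Output layer '{last.name}' has {last.units} units, "
                      f"but the training labels contain {n_classes} classes.")
    if getattr(last, "activation", None) is keras.activations.sigmoid and n_classes > 2:
        issues.append(f"Output layer '{last.name}' uses sigmoid, but the task has "
                      f"{n_classes} classes; softmax is usually expected here.")
    return issues

# Example: a 10-class dataset paired with a model whose output layer has 5 units.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(5, activation="softmax"),   # structural bug: should be 10 units
])
y_train = np.random.randint(0, 10, size=1000)
for msg in check_output_layer(model, y_train):
    print("WARNING:", msg)
```

Reporting such a mismatch before the first epoch, together with the offending layer's name, mirrors the kind of early, actionable feedback the abstract describes.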
Award ID(s):
2512857 2512858
PAR ID:
10636843
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Software Engineering and Methodology
Volume:
34
Issue:
6
ISSN:
1049-331X
Page Range / eLocation ID:
1 to 29
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep Learning (DL) techniques are increasingly being incorporated into critical software systems today. DL software is buggy too. Recent work in software engineering has characterized these bugs, studied fix patterns, and proposed detection and localization strategies. In this work, we introduce a preventative measure. We propose design by contract for DL libraries, DL Contract for short, to document the properties of DL libraries and provide developers with a mechanism to identify bugs during development. While DL Contract builds on traditional design-by-contract techniques, it must address unique challenges; in particular, it needs to document properties of the training process that are not visible at the functional interface of the DL libraries. To solve these problems, we have introduced mechanisms that allow developers to specify properties of the model architecture, data, and training process. We have designed and implemented DL Contract for Python-based DL libraries and used it to document the properties of Keras, a well-known DL library. We evaluate DL Contract in terms of effectiveness, runtime overhead, and usability. To evaluate its utility, we developed 15 sample contracts specifically for training problems and structural bugs, and adopted four well-vetted benchmarks from prior work on DL bug detection and repair. In terms of effectiveness, DL Contract correctly detects 259 bugs in 272 real-world buggy programs drawn from these benchmarks. We found that DL Contract's runtime overhead is fairly minimal on the benchmarks we used. Lastly, to evaluate usability, we conducted a survey of twenty participants who used DL Contract to find and fix bugs. The results reveal that DL Contract can be very helpful to DL application developers when debugging their code.
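DL Contract's actual specification mechanism is not shown in the abstract. As a minimal sketch of the design-by-contract idea applied to DL training code, the hypothetical `training_contract` decorator below attaches a precondition to a training routine and fails before training begins if the contract is violated; the decorator name and the example normalization contract are assumptions for illustration only.

```python
# Minimal design-by-contract sketch for DL training code (illustrative only;
# DL Contract's real specification and checking mechanisms differ).
import functools
import numpy as np

def training_contract(precondition, message):
    """Attach a precondition to a training function; fail fast if it is violated."""
    def decorator(train_fn):
        @functools.wraps(train_fn)
        def wrapper(model, x, y, *args, **kwargs):
            if not precondition(model, x, y):
                raise AssertionError(f"Contract violated: {message}")
            return train_fn(model, x, y, *args, **kwargs)
        return wrapper
    return decorator

# Example contract: image inputs should be scaled to [0, 1] before training.
@training_contract(
    precondition=lambda model, x, y: float(np.max(x)) <= 1.0,
    message="input features look unnormalized (max > 1.0); scale them before fit()",
)
def train(model, x, y, epochs=5):
    return model.fit(x, y, epochs=epochs)

# train(model, x_train, y_train) now raises before model.fit() if x_train is unnormalized.
```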
  2. A growing number of studies have shown that bugs in multi-language software, especially those induced by language interactions (i.e., multilingual bugs), are a critical loophole in modern software quality assurance. Yet existing tool support for bug detection/localization remains largely limited to single-language software, despite the long-standing prevalence of multi-language systems in various real-world software domains. Existing static/dynamic analysis and deep learning (DL) based approaches all face major challenges in addressing multilingual bugs. In this paper, we present xLoc, a DL-based technique/tool for detecting and localizing multilingual bugs. Motivated by the results of our bug-characteristics study on the top locations of multilingual bugs, xLoc first learns general knowledge relevant to differentiating various multilingual control-flow structures. This is achieved by pre-training a Transformer model with customized position encoding against novel objectives. Then, xLoc learns task-specific knowledge for multilingual bug detection/localization through another new position encoding scheme (based on cross-language API vicinity) that allows the model, during fine-tuning, to attend particularly to the control-flow constructs that bear most multilingual bugs. We have implemented xLoc for Python-C software and curated a dataset of 3,770 buggy and 15,884 non-buggy Python-C samples, which enabled our extensive evaluation of xLoc against two state-of-the-art baselines: fine-tuned CodeT5 and zero-shot ChatGPT. Our results show that xLoc achieved 94.98% F1 and 87.24%@Top-1 accuracy, which are significantly (up to 162.88% and 511.75%) higher than the baselines. Ablation studies further confirmed the significant contribution of each of the novel design elements in xLoc. With corresponding bug-location characteristics and labeled bug datasets for fine-tuning, our design may be applied to other language combinations beyond Python-C.
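As a rough, hypothetical illustration of the "cross-language API vicinity" idea (xLoc's real position encoding and pre-training objectives are considerably more involved), the sketch below assigns each token a value equal to its distance from the nearest cross-language API token; the token list and API names are made up.

```python
# Hypothetical sketch of an API-vicinity signal: per-token distance to the
# nearest cross-language API token.  Not xLoc's actual encoding.
CROSS_LANG_APIS = {"PyArg_ParseTuple", "PyObject_CallObject", "ctypes.CDLL"}

def api_vicinity_positions(tokens):
    """Return, for each token, the distance to the closest cross-language API token."""
    api_idx = [i for i, t in enumerate(tokens) if t in CROSS_LANG_APIS]
    if not api_idx:
        return [len(tokens)] * len(tokens)   # no language-interaction point in this snippet
    return [min(abs(i - j) for j in api_idx) for i in range(len(tokens))]

tokens = ["if", "args", ":", "PyArg_ParseTuple", "(", "args", ")", "else", ":", "return"]
print(api_vicinity_positions(tokens))   # smaller values = closer to the language boundary
```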
  3.
    Compiler bugs can be disastrous since they could affect all the software systems built on the buggy compilers. Meanwhile, diagnosing compiler bugs is extremely challenging since usually only limited debugging information is available and a large number of compiler files can be suspicious. More specifically, when compiling a given bug-triggering test program, hundreds of compiler files are usually involved, and all of them can be treated as suspicious buggy files. To facilitate compiler debugging, in this paper we propose RecBi, the first reinforcement-learning-based compiler bug isolation approach via structural mutation. For a given bug-triggering test program, RecBi first augments traditional local mutation operators with structural ones to transform it into a set of passing test programs. Since not all passing test programs help isolate compiler bugs effectively, RecBi further leverages reinforcement learning to intelligently guide the generation of passing test programs. Then, RecBi ranks all the suspicious files by analyzing the compiler execution traces of the generated passing test programs and the given failing test program, following common practice in compiler bug isolation. The experimental results on 120 real bugs from the two most popular open-source C compilers, GCC and LLVM, show that RecBi is able to isolate about 23%/58%/78% of bugs within the Top-1/Top-5/Top-10 compiler files, and significantly outperforms the state-of-the-art compiler bug isolation approach, improving isolation effectiveness by 92.86%/55.56%/25.68% in terms of Top-1/Top-5/Top-10 results.
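The abstract does not give RecBi's ranking formula. As a generic sketch of ranking suspicious compiler files from file-level execution traces of one failing program and several generated passing programs, the code below uses the Ochiai suspiciousness formula purely as a stand-in; the file names and traces are invented.

```python
# Generic spectrum-based ranking sketch over file-level execution traces
# (Ochiai used only as an example formula; RecBi's actual strategy may differ).
import math

def rank_suspicious_files(failing_trace, passing_traces):
    """failing_trace: set of files executed by the failing program;
    passing_traces: list of sets, one per generated passing program."""
    scores = {}
    for f in failing_trace:
        passed = sum(1 for trace in passing_traces if f in trace)
        failed = 1                                    # a single failing test program
        scores[f] = failed / math.sqrt(failed * (failed + passed))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

failing = {"gcc/tree-ssa.c", "gcc/cfgexpand.c", "gcc/toplev.c"}
passing = [{"gcc/toplev.c", "gcc/cfgexpand.c"}, {"gcc/toplev.c"}]
for name, score in rank_suspicious_files(failing, passing):
    print(f"{score:.2f}  {name}")   # files rarely covered by passing runs rank higher
```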
  4. Gonnord, Laure; Titolo, Laura (Ed.)
    Just-in-Time (JIT) compilers are widely used to improve the performance of interpreter-based language implementations by creating optimized code at runtime. However, bugs in the JIT compiler's code manipulation and optimization can result in the generation of incorrect code. Such bugs can be difficult to diagnose and fix, and can result in exploitable vulnerabilities. Unfortunately, existing approaches to automatic bug localization do not carry over well to such bugs. This paper discusses a different approach to analyzing JIT compiler optimization behaviors, based on using dynamic analysis to construct abstract models of the JIT compiler's optimizer and back end. By comparing the models obtained for buggy and non-buggy executions of the JIT compiler, we can pinpoint the components of the JIT compiler's internal representation that have been affected by the bug; this can then be mapped back to identify the buggy code. Our experiments with two real bugs in the Google V8 JIT compiler, TurboFan, show the utility and practicality of our approach.
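As a highly simplified, hypothetical illustration of the model-comparison step, each execution's abstract model can be treated as a set of facts about IR components and optimizations, and the buggy run diffed against the non-buggy one; the facts below are invented, and the paper's models of TurboFan are far richer.

```python
# Toy illustration of comparing abstract models from a buggy and a non-buggy
# JIT execution; the (node, optimization, outcome) facts are invented.
def diff_models(buggy, non_buggy):
    """Return facts that appear in only one of the two abstract models."""
    return {
        "only_in_buggy": sorted(buggy - non_buggy),
        "only_in_non_buggy": sorted(non_buggy - buggy),
    }

non_buggy = {("CheckBounds", "LoadElimination", "kept"),
             ("NumberAdd", "TypedLowering", "lowered")}
buggy     = {("CheckBounds", "LoadElimination", "removed"),   # divergence to inspect
             ("NumberAdd", "TypedLowering", "lowered")}

for bucket, facts in diff_models(buggy, non_buggy).items():
    print(bucket, facts)
```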
  5. Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of the deep learning pipeline are more bug-prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, and development practices, and to encourage the development of analysis and verification frameworks. Therefore, we study 2,716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether there are common antipatterns in this buggy software. The key findings of our study include: data bugs and logic bugs are the most severe bug types in deep learning software, appearing more than 48% of the time, and the major root causes of these bugs are Incorrect Model Parameter or Structure (IPS) and Structural Inefficiency (SI), showing up more than 43% of the time. We also found that bugs in the usage of deep learning libraries follow some common antipatterns.