LSTM Neural Network Assisted Regex Development for Qualitative Coding

Cai, Z.; Eagan, B.; Marquart, C.; Shaffer, D.


Regular expression (regex) based automated qualitative coding helps reduce researchers’ effort in manually coding text data without sacrificing the transparency of the coding process. However, researchers using regex-based approaches struggle with low recall (a high false negative rate) during classifier development. Advanced natural language processing techniques, such as topic modeling, latent semantic analysis, and neural network classification models, help solve this problem in various ways. The latest advance in this direction is the discovery of the so-called “negative reversion set” (NRS), in which false negative items appear more frequently than in the negative set. This helps regex classifier developers identify missing items more quickly and thus improve classification recall. This paper simulates the use of NRS in real coding scenarios and compares the number of items that must be manually coded under NRS sampling and under random sampling during classifier refinement. Results from one data set with 50,818 items and six associated qualitative codes show that, on average, NRS sampling reduces the required manual coding size by 50% to 63% compared with random sampling.

Award ID(s): 2100320

PAR ID: 10354430

Author(s) / Creator(s): Cai, Z.; Eagan, B.; Marquart, C.; Shaffer, D.

Editor(s): Barany, A.; Damsa, C.

Date Published: 2022-01-01

Journal Name: Advances in Quantitative Ethnography: Fourth International Conference, International Conference on Quantitative Ethnography 2022

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Conference Paper:
The DOI is not currently available.
