Barany, A.; Damsa, C. (Ed.)

Regular expression (regex) based automated qualitative coding helps reduce researchers’ effort in manually coding text data without sacrificing the transparency of the coding process. However, researchers using regex-based approaches struggle with low recall, i.e., a high false negative rate, during classifier development. Advanced natural language processing techniques, such as topic modeling, latent semantic analysis, and neural network classification models, help solve this problem in various ways. The latest advance in this direction is the discovery of the so-called “negative reversion set (NRS)”, in which false negative items appear more frequently than in the negative set. This helps regex classifier developers identify missing items more quickly and thus improve classification recall. This paper simulates the use of NRS in real coding scenarios and compares the number of items that must be manually coded under NRS sampling and under random sampling during classifier refinement. Results on one data set with 50,818 items and six associated qualitative codes show that, on average, NRS sampling reduces the required manual coding size by 50% to 63% compared with random sampling.
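The recall problem described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the code name "greeting", the pattern, the toy items, and the human labels are hypothetical and are not taken from the paper's data set. The sketch does not construct an NRS; it only shows what a false negative is for a regex classifier and how recall is computed from it.

```python
import re

# Hypothetical qualitative code "greeting" with a deliberately incomplete pattern.
PATTERN = re.compile(r"\b(hello|hi)\b", re.IGNORECASE)

# Toy items with made-up human labels (True means the code applies).
ITEMS = [
    ("Hello everyone", True),
    ("hi, can you hear me?", True),
    ("hey team, good morning", True),    # positive, but the pattern misses it
    ("greetings from the lab", True),    # positive, but the pattern misses it
    ("the results are in", False),
    ("this is the final version", False),
]


def regex_code(text):
    """Return True when the regex classifier assigns the code to the text."""
    return PATTERN.search(text) is not None


def recall_and_false_negatives(items):
    """Compute recall and collect the positive items the regex failed to match."""
    true_positives = 0
    false_negatives = []
    for text, label in items:
        predicted = regex_code(text)
        if label and predicted:
            true_positives += 1
        elif label and not predicted:
            false_negatives.append(text)
    positives = true_positives + len(false_negatives)
    recall = true_positives / positives if positives else 0.0
    return recall, false_negatives


RECALL, FALSE_NEGATIVES = recall_and_false_negatives(ITEMS)
print(f"recall = {RECALL:.2f}")
for text in FALSE_NEGATIVES:
    print("missed:", text)
```

In practice the human labels are not known in advance, so false negatives have to be found by manually coding items the regex rejected. A sample in which such missed items are denser than in the negative set as a whole would surface them with fewer manually coded items, which is the effect the abstract attributes to NRS sampling.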