PhD Dissertation Defense - Stephen Ash

Improving Accuracy of Patient Demographic Matching and Identity Resolution

Stephen Ash, PhD Candidate

Wednesday, March 15, 2017, 9:30 am
Dunn Hall 311

Committee Members:

Prof. Vasile Rus, Chair
Prof. David Lin
Prof. Lan Wang
Prof. Vinhthuy Phan

The American healthcare system does not use a national patient identifier to locate medical information about an individual. Instead, healthcare organizations must rely on demographic searches, which are imprecise because attributes change naturally over time and are subject to common typographical variation. To clean up the erroneous duplicate records that this process introduces, many systems use simple string similarity techniques and the Fellegi-Sunter probabilistic theory of record linkage. Our work focuses on improving the accuracy of patient record matching by leveraging modern Information Retrieval and Natural Language Processing techniques.

First, we empirically demonstrate the importance of incorporating rich semantic parsing techniques and dependence relationships in the Fellegi-Sunter framework. Second, we explore grapheme-to-phoneme (G2P) translation using supervised machine learning methods. This approach allows us to build phonetic encoders that are optimized to increase recall in multicultural personal name queries. Lastly, we propose a method of generating synthetic patient demographic records using statistical profiles derived from real data. The lack of high-quality public datasets for benchmarking hinders innovation on the demographic matching problem. Previous synthetic data generators produce datasets that differ measurably from real data in ways that oversimplify the matching problem. We suggest a simulation-based method using probabilistic graphical models and statistical disclosure control techniques. To quantify our results, we propose several measures for evaluating the data quality and complexity of semi-structured demographic attributes.
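
For readers unfamiliar with the Fellegi-Sunter framework named in the abstract, the following is a minimal sketch of its classic scoring rule, not the method developed in the dissertation. Each compared field contributes log2(m/u) when the two records agree and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u is the probability of agreement among non-matches. The sketch assumes conditional independence between fields, which is the simplification that the dependence relationships mentioned above are meant to relax. The field names and the m and u values are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative values only,
# not taken from the dissertation or from any real dataset.
PARAMS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.98, 0.005),
    "zip_code": (0.90, 0.05),
}


def field_weight(agrees, m, u):
    """Log-likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(agreement, params=PARAMS):
    """Sum per-field weights, assuming fields are conditionally independent."""
    return sum(field_weight(agreement[name], m, u)
               for name, (m, u) in params.items())


if __name__ == "__main__":
    # Two records that agree on last name and birth date but not on zip code.
    pattern = {"last_name": True, "birth_date": True, "zip_code": False}
    print(match_score(pattern))
```

In practice the total score is compared against an upper and a lower threshold to classify a record pair as a match, a non-match, or a candidate for manual review; choosing those thresholds is outside the scope of this sketch.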