Addressing the risk of data bias
It also takes a clinical expert to understand the limitations of a model or algorithm, and to see how bias could inadvertently creep in.
A ‘data-driven’ approach carries a connotation of objectivity. This can blind us to the fact that the output of a deep learning program depends entirely on the data it was trained on and the input it is given.
For example, if I train a deep learning program to suggest a care plan for patients with congestive heart failure, the model will only work for the particular type of patients it was trained on. If I apply the same model to a patient who has a related but different heart condition, like hypertrophic cardiomyopathy, the model will still give me a recommendation – but it won’t be a reliable one.
You can see how bias could easily creep in if your training data doesn’t fully reflect the target population. As an industry, we need to keep this in mind as we scale AI-enabled solutions across the globe. For example, a deep learning program trained on data from patients in Beijing could lead to erroneous conclusions when applied to patients in New York, and vice versa. Of course, the program can be retrained with local data, or other corrections can be applied. But again, it takes a clinical expert to understand contextual nuances like this and to deal with them wisely.
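To make this concrete, here is a deliberately simplified sketch of the underlying problem, often called dataset shift. Everything in it is hypothetical: a one-feature ‘risk’ classifier whose decision threshold is fitted on one population and then reused, unchanged, on another population with a different baseline.

```python
# Toy illustration of dataset shift (hypothetical data and model, not a
# real clinical algorithm): a single-feature classifier fitted on one
# population and reused on another.

def fit_threshold(samples):
    """Fit the simplest possible classifier: flag values above the
    midpoint between the two class means."""
    pos = [x for x, label in samples if label]
    neg = [x for x, label in samples if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, samples):
    """Fraction of samples where (value > threshold) matches the label."""
    return sum((x > threshold) == label for x, label in samples) / len(samples)

# Population A: the condition shows up above a biomarker level of ~50.
train_a = [(40, False), (45, False), (48, False),
           (55, True), (60, True), (65, True)]

# Population B: same condition, but the healthy baseline is higher,
# so the true boundary sits near ~80.
test_b = [(70, False), (75, False), (78, False),
          (85, True), (90, True), (95, True)]

t = fit_threshold(train_a)      # threshold learned from population A (~52)
print(accuracy(t, train_a))     # perfect on the population it was trained on
print(accuracy(t, test_b))      # no better than a coin flip on population B
```

The model still produces an answer for every patient in population B; it just produces the wrong one half the time, with no indication that anything is amiss. Retraining on local data, as mentioned above, amounts to refitting the threshold on population B.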
What these examples also highlight is that current-day applications of AI are inherently narrow: they perform a specific task, which can be extremely useful, but they can’t reason beyond that (as opposed to general AI, which emulates human thinking and which today exists only in sci-fi movies).
If I describe my car to a deep learning program that was designed to detect measles, and if I tell it that my car has rust spots and tends to overheat, the best the program can do is tell me that my car has the measles!
Of course, this is another absurd example, but it shows why data science and clinical domain knowledge need to go hand in hand.
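The absurdity has a precise technical root: a narrow classifier lives in a closed world, and every input must be mapped to one of the classes it was built for. This hypothetical sketch (invented labels and scoring, not a real diagnostic model) shows the mechanism:

```python
# A minimal sketch of a narrow, closed-world classifier (hypothetical
# symptom list and scoring rule): whatever input it receives, it must
# answer with one of the two classes it was designed for.

KNOWN_SYMPTOMS = {"fever", "rash", "cough"}  # the only features it "knows"

def measles_classifier(observations):
    """Return one of the model's two classes, no matter the input."""
    score = sum(1 for o in observations if o in KNOWN_SYMPTOMS)
    return "measles" if score >= 2 else "no measles"

print(measles_classifier({"fever", "rash"}))            # a plausible patient
print(measles_classifier({"rust spots", "overheats"}))  # a car still gets an answer
```

Note that the car is not rejected; it receives a diagnosis. The model has no way to say ‘this input is not a patient at all’, because that option was never part of its task. Recognizing when an input falls outside a model’s scope is exactly where clinical domain knowledge comes in.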