If you put 10,000 scientists and researchers together, can they rapidly come up with better ways to diagnose lung cancer? That was the goal of the 2017 Data Science Bowl (DSB), which is described as a competition that harnesses the power of data science and crowdsourcing to tackle some of the world’s toughest problems.
“This year’s competition brought together specialized image-based analytics experts to solve a narrowly focused problem: teaching machines to read low-dose computed tomography (CT) scans,” explained Eric Syphard, Chief Technologist at Booz Allen Hamilton, which sponsors the annual competition.
For this event, nearly 10,000 participants had access to high-resolution lung scans from an extremely large dataset provided by the National Cancer Institute (NCI). The goal: to create computer algorithms that accurately determine when lesions in the lungs are cancerous. If developed, such algorithms could dramatically decrease the high false positive rate of current low-dose CT technology.
In this interview, Keyvan Farahani, PhD, who provided scientific guidance for the competition’s design and dataset, discusses the potential for computer-assisted detection of lung cancer and how a competition such as the DSB can advance the field of precision medicine.
MDLinx: What is the current status of computer-assisted diagnostics for lung cancer detection?
Dr. Farahani: Most of what’s commercially available for computer-aided detection and diagnosis of lung cancer is based on older technologies and not very accurate. New approaches based on machine learning, including algorithms that were submitted in the DSB challenge, are expected to be much more accurate. Of course, more data from a variety of sources are needed to further train, validate, and translate these models, but the early results—even based on the 2017 DSB challenge—are promising.
MDLinx: How can an open, crowd-sourced competition such as the Data Science Bowl help to develop new algorithms?
Dr. Farahani: The purpose of a crowd-sourcing approach, as taken in DSB, is to reach a wider global pool of developers, some of whom may come from other disciplines and may offer innovative approaches. Virtually all of the top 10 teams competing in the DSB lung screening challenge used deep learning in their algorithms. Furthermore, prize winners were required to make their solutions (code) open source and publicly available, which puts these solutions in the public domain and allows others to build on the success of the winning teams to further advance the field. In addition, it may be possible to combine the ‘best of breed’ into a super algorithm that could offer advantages over the original algorithms.
MDLinx: The goal was to develop machine-learning algorithms that could decrease the false positive rate of low-dose CT scans. What hurdles did participants have to overcome to attempt to reach this goal?
Dr. Farahani: The fundamental problem is the precise identification of suspicious nodules and their classification as cancerous or non-cancerous. As a result, there could be wide-ranging observer variability among radiologists (or machines, for that matter) in the number and size of nodules identified, and in their type and cancer classification (benign vs malignant).
MDLinx: What types of imaging can machine-learning algorithms take into account? Can different types of imaging be combined to improve the accuracy of computer-assisted diagnostics?
Dr. Farahani: Current approaches in deep learning in medical imaging are largely based on a single imaging modality, either CT or magnetic resonance imaging (MRI). Theoretically it is possible to use knowledge gained from the deep learning application of one data type to further train an algorithm operating on another data type—a concept that is referred to as transfer learning.
Currently, there are two types of imaging applied to lung cancer screening: X-ray radiography and low-dose CT. However, X-ray radiography has not proved to be very effective in screening. Again, the best hope is to have access to additional datasets to improve the performance of deep-learning algorithms. In addition, adding other clinical and/or demographic data is likely to help improve the accuracy of computer-assisted techniques. This has already been shown in what is known as the McWilliams (or the PanCan) model.
MDLinx: Will computers replace radiologists and/or oncologists for identifying and diagnosing lung cancer?
Dr. Farahani: Computers could play an increasingly important role in medicine, particularly in imaging screening exams, where large amounts of data need to be processed to find what may be subtle signs of a disease. Therefore, they could prove to be valuable tools in support of radiologists and/or oncologists, not only in detection and diagnosis, but also prediction and evaluation of response to therapies, and implementation of precision medicine.
About Dr. Farahani: Keyvan Farahani, PhD, is Program Director of Image-Guided Interventions in the Cancer Imaging Program at the National Cancer Institute, in Bethesda, MD.