Machine Learning for High Content Screening
Biological insights that might take months to generate using time-consuming lab experiments and human visual inspection can be revealed much faster using automated computer algorithms. Machine learning has the potential to shrink drug discovery timelines, helping patients gain quicker access to new therapies.
Existing approaches to training neural networks on large images typically require cropping or down-sampling the data during preprocessing, using small batch sizes, or splitting the model across devices, mainly because of the limited memory capacity of GPUs and emerging accelerators. These workarounds often lengthen the time to convergence, or time to train (TTT), and in some cases lower model accuracy. CPUs, on the other hand, can address significantly larger amounts of memory. While much work has been done on parallelizing neural network training across multiple CPUs, little attention has been given to tuning neural network training with large images on CPUs. In this work, we train a multiscale convolutional neural network (M-CNN) to classify large biomedical images for high content screening in one hour. The large memory capacity of CPUs lets us scale to larger batch sizes without cropping or down-sampling the input images. In conjunction with large batch sizes, we identify a generalized methodology for linearly scaling the learning rate and train M-CNN to state-of-the-art (SOTA) accuracy of 99% within one hour. We achieve this fast time to convergence using 128 two-socket Intel Xeon 6148 processor nodes with 192 GB of DDR4 memory, connected by a 100 Gbps Intel Omni-Path Architecture fabric.
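The linear scaling idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the learning rate is multiplied by the ratio of the global batch size to a reference batch size. The gradual warmup function is an assumption added here because it is a common companion to linear scaling (Goyal et al., 2017); the source does not say whether M-CNN training used it. All numeric values and function names below are hypothetical.

```python
def scaled_learning_rate(base_lr: float, base_batch: int, global_batch: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the global batch size."""
    return base_lr * global_batch / base_batch


def warmup_lr(step: int, warmup_steps: int, target_lr: float, start_lr: float) -> float:
    """Assumed linear warmup: ramp from start_lr to target_lr over warmup_steps, then hold."""
    if step >= warmup_steps:
        return target_lr
    frac = step / warmup_steps
    return start_lr + frac * (target_lr - start_lr)


if __name__ == "__main__":
    # Hypothetical values for illustration only; not taken from the paper.
    base_lr = 0.001          # learning rate tuned for the reference batch size
    base_batch = 32          # reference (single-worker) batch size
    workers = 64             # hypothetical number of data-parallel workers
    per_worker_batch = 32    # hypothetical per-worker batch size
    global_batch = workers * per_worker_batch  # 2048 images per step

    target = scaled_learning_rate(base_lr, base_batch, global_batch)
    print(f"global batch {global_batch} -> target LR {target:.4f}")

    # Show how the assumed warmup ramps toward the scaled learning rate.
    for step in (0, 250, 500, 1000):
        print(step, round(warmup_lr(step, 500, target, base_lr), 5))
```

In this sketch a 64x larger global batch yields a 64x larger learning rate; warmup is one common way to avoid instability in the first steps when that scaled rate is large.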
Read the latest scientific paper to learn more
Citation: Kushal Datta, Imtiaz Hossain, Sun Choi, Vikram Saletore, Kyle Ambert, William J. Godinez, Xian Zhang; Training Multiscale-CNN for Large Microscopy Image Classification in One Hour, Workshop on Scalable Data Analytics in Scientific Computing, International SuperComputing 2019, Frankfurt, Germany, arXiv:1910.04852

Kristina Kermanshahche, Founder and CEO of Perspicace Inc., presented “Machine Learning for High-Content Screening” at the DDN Life Sciences Field Day, hosted by Rockefeller University, New York.
Learn how TensorFlow MCNNs can dramatically scale the phenotyping of high-content cellular images.
Watch the video.
Access the slides.
Learn from the other presenters, #LSFD18.
Read the scientific article to learn more about the algorithms and methodology applied.
Citation: William J. Godinez, Imtiaz Hossain, Stanley E. Lazic, John W. Davies, Xian Zhang; A multi-scale convolutional neural network for phenotyping high-content cellular images, Bioinformatics, Volume 33, Issue 13, 1 July 2017, Pages 2010–2019, https://doi.org/10.1093/bioinformatics/btx069.
Watch Novartis scientist Mark Bray explain how machine learning is used to accelerate drug discovery
View Novartis press release, "Machine learning poised to accelerate drug discovery"
Watch Intel research scientist Kushal Datta present at the 2018 Intel AI Developers Conference in San Francisco.
Achieved state-of-the-art accuracy with a time to train of 31 minutes, an overall speedup of 21.7x going from 1x Intel® Xeon Phi™ 7290F to 8x Intel® Xeon® 6148 processor. Configuration: Intel-optimized TensorFlow 1.7, Horovod, MKL-DNN, MPI, 4 workers per node, Intel® Omni-Path Fabric.
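As a quick sanity check on the figures above, a speedup is conventionally defined as baseline time divided by new time, so the stated numbers imply a single Xeon Phi baseline of roughly 31 x 21.7 ≈ 673 minutes, or about 11 hours. This is a value derived from the quoted figures under that standard definition, not a separately measured result; the helper name is hypothetical.

```python
def implied_baseline_minutes(new_minutes: float, speedup: float) -> float:
    """Assuming speedup = baseline_time / new_time, recover baseline_time."""
    return new_minutes * speedup


# Figures quoted on this page: 31-minute time to train, 21.7x overall speedup.
baseline = implied_baseline_minutes(31.0, 21.7)
print(f"implied baseline TTT: {baseline:.1f} min (~{baseline / 60:.1f} h)")
```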
Access the slides
Watch Novartis scientist Jeremy Jenkins describe his vision for machine learning at Novartis Institute for BioMedical Research (NIBR)
Learn More
Novartis Institutes for BioMedical Research (NIBR), visit https://www.novartis.com/tags/novartis-institutes-biomedical-research
Intel® AI Builders, visit https://builders.intel.com/ai
Intel® AI Academy, visit https://software.intel.com/en-us/ai-academy
Get the latest Intel optimizations: Intel®-optimized TensorFlow wheel, Intel®-optimized Python for Anaconda
Read Perspicace Data Points blog including additional machine learning resources
Acknowledgements
Novartis: Imtiaz Hossain, Xian Zhang, William J. Godinez, Mark Bray, Jeremy Jenkins, Steve Litster, Michael Derby, Michael Steeves, Wolfgang Zipfel
Intel: Kushal Datta, Sun Choi, Vikram Saletore, Rakib Sarwar, Kyle Ambert, Ananth Sankaranarayanan, Joe Bailey, Oleg Stepanov, Mike Demshki, Patrick Messmer, Nisrine Belhaj-Soulami
Marketing & Video Production: Julie Choi, Kristine Raabe, Kathleen Ellertson, Amber Jackson