Investigators at Nationwide Children’s Hospital have developed an analysis “pipeline” that slashes the time it takes to search a person’s genome for disease-causing variations from weeks to hours. An article describing the fast, highly scalable software was published recently in Genome Biology.
To overcome the challenges of analyzing the large amount of data that genome sequencing produces, the team developed a computational pipeline called “Churchill.” By using novel computational techniques, Churchill allows efficient analysis of a whole-genome sample in as little as 90 minutes. The output of Churchill was validated using National Institute of Standards and Technology (NIST) benchmarks. In comparison with other computational pipelines, Churchill was shown to have the highest sensitivity at 99.7 percent, highest accuracy at 99.99 percent, and highest overall diagnostic effectiveness at 99.66 percent.
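For readers unfamiliar with these metrics, the sketch below shows how sensitivity and accuracy are conventionally computed from counts of correct and incorrect variant calls against a benchmark. It is an illustration only, not the study's own evaluation code: the function names and the counts are hypothetical, and the article does not say how "diagnostic effectiveness" was calculated, so that figure is left out.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that the pipeline detected."""
    return true_pos / (true_pos + false_neg)


def accuracy(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    """Fraction of all calls (variant or non-variant) that were correct."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


# Hypothetical counts chosen for illustration; these are NOT the study's data.
tp, fn, fp, tn = 997, 3, 1, 99_999

print(f"sensitivity: {sensitivity(tp, fn):.2%}")
print(f"accuracy:    {accuracy(tp, tn, fp, fn):.4%}")
```

Because most positions in a genome match the reference, true negatives dominate the total, which is why accuracy figures for variant calling tend to sit much closer to 100 percent than sensitivity figures do.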
By examining computational resource use during the data analysis process, the team was able to demonstrate that Churchill was both highly efficient (>90 percent resource utilization) and scaled very effectively across many servers. Alternative approaches limit analysis to a single server and have resource utilization as low as 30 percent. This efficiency and capability to scale enable population-scale genomic analysis.
To demonstrate Churchill’s capability to perform population-scale analysis, the researchers received an award from the Amazon Web Services (AWS) in Education Research Grants program that enabled them to successfully analyze phase 1 of the raw data generated by the 1000 Genomes Project, an international collaboration to produce an extensive public catalog of human genetic variation, representing multiple populations from around the globe. Using cloud-computing resources from AWS, Churchill completed the analysis of 1,088 whole-genome samples in seven days and identified millions of new genetic variants.