By Allison Proffitt
June 25, 2008 | Sophic has announced $1.3 million in funding from the National Cancer Institute to complete the Cancer Gene Index Project over the next 12 months. Sophic started the project in June 2004 with the goal of mining 8.8 million Medline abstracts to identify suspected cancer genes and manually annotate gene-disease and gene-compound relationships. So far, 4,658 cancer genes have been made publicly available on the NCI website.
“We’ve completed four years of work and this phase is the completion phase, which means we will have completely analyzed and annotated the 6,610 identified cancer genes with manual annotations for role codes and evidence codes,” Patrick Blake, Sophic’s CEO, told Bio-IT World. “I just attended the caBIG conference and they are identifying this dataset as the backbone for cancer research across the cancer community. We’re proud and we’re thrilled to be able to offer this asset to people who are fighting this terrible disease.”
The fifth phase of the project, announced on Monday, will bring the total number of cancer-related genes indexed to 6,610. Sophic has completed the work in conjunction with NCI and Biomax Informatics AG of Munich, Germany, in what Blake calls “a true collaboration” using Biomax’s BioLT literature mining tool. “We developed a ‘factory assembly line’ methodology that allows the automated text mining results to be fed into the scientific team who curate and annotate the information in an efficient, quality-controlled, work-flow process,” said Klaus Heumann, CEO of Biomax. The phase-based strategy has been designed so that “nothing is missed” and so that all cancer genes, cancer types, and compounds and treatments related to cancer genes are examined.