Pfizer Data Mining Focuses on Clinical Trials
Pfizer is stepping up its efforts to get more information from existing clinical trial data. The company is turning to sophisticated data mining techniques to help improve the design of new trials, to better understand possible new uses for existing drugs, and to help examine how drugs are being used after they have been approved.
"We want to milk as much out of the data as possible," says Mani Lakshminarayanan, director/statistical scientist, at Pfizer.
While mining data from past clinical trials makes great sense, companies rarely take another look at such data. Typically, a pharmaceutical company working on an FDA submission conducts a set of trials as part of a new drug application and summarizes the drug's efficacy and safety in that submission. After that, the remaining clinical trial data is simply archived, not examined again unless additional analyses are requested by regulatory agencies or warranted by internal marketing requirements.
"Today, once a company submits a new drug application to the FDA all the data [from clinical trials] sits collecting dust whether or not the drug is approved," says Michael O'Connell, director of life science solutions at Insightful Corporation.
Deviating from this routine, Pfizer is doing additional exploratory analysis of clinical trial data. "We're using data mining techniques to look for specific or unknown patterns," says Lakshminarayanan.
One way the information gleaned from secondary analysis is being used is to help design new studies. "There is a huge amount of clinical trial data available," says Lakshminarayanan. "We're going through that data (after a submission) and mining the data to better design new studies."
To that end, the information obtained from mining completed studies might be used to determine a sample size or select a population when designing a new trial. For instance, if a company wants to bring a drug approved in the U.S. to Japan, it would have to conduct a bridging study to show that the drug works within the Japanese population.
"From existing data, you can look at the statistics from old trial data and use this information to design a new study," says Lakshminarayanan.
In a similar vein, a company might re-examine clinical trial data once a drug is, say, halfway through its patent life. "A company might look to see if there are other uses for an already approved drug or to explore subgroups within the trial population," says O'Connell.
Or a company may simply look, in more detail, for ways to minimize risks associated with a drug. For instance, a company could use data mining techniques to look across many studies for drug interaction or safety issues that impact a particular population (e.g., all people with brown hair and blue eyes).
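Pooling safety data across studies and scanning subgroups can be sketched in a few lines. The records, subgroup labels, and flagging threshold below are all hypothetical; a real analysis would use proper statistical tests rather than a raw rate cutoff.

```python
from collections import defaultdict

# Hypothetical pooled records: (study, subgroup, n_patients, n_adverse_events)
records = [
    ("study_a", "age>=65", 120, 18),
    ("study_a", "age<65",  480, 24),
    ("study_b", "age>=65", 200, 31),
    ("study_b", "age<65",  600, 33),
]


def flag_subgroups(records, threshold=0.10):
    """Pool adverse-event counts per subgroup across studies and flag
    any subgroup whose overall event rate exceeds the threshold."""
    totals = defaultdict(lambda: [0, 0])  # subgroup -> [patients, events]
    for _, subgroup, n, events in records:
        totals[subgroup][0] += n
        totals[subgroup][1] += events
    return {sg: ev / n for sg, (n, ev) in totals.items() if ev / n > threshold}


print(flag_subgroups(records))  # only the elevated subgroup is flagged
```

A signal that is invisible within any single study can become apparent once the studies are pooled, which is why the cross-study view matters.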
One factor helping Pfizer with its data mining effort is the advent of newer analysis tools. Specifically, while the work Lakshminarayanan is doing can be done using many standard statistical analysis applications, new tools (in this case, the data mining and analysis workbench Insightful Miner) are helping Lakshminarayanan and his group work closely with other researchers.
In the past, statisticians might use analysis tools that required substantial programming skills. Newer tools, like Insightful Miner, give statisticians and researchers the ability to apply a wide variety of analysis techniques to a dataset without having to be experts at writing command-line code. With a tool like Insightful Miner, icons representing analysis steps can be dragged and dropped onto a workflow palette. As this is done, the software handles much of the underlying programming, off-loading those tasks from the user.
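Insightful Miner itself is a visual product, but the underlying idea — chaining analysis steps so each node handles its own plumbing — can be sketched in code. This is a minimal illustration of the workflow pattern, not the tool's actual API.

```python
class Workflow:
    """Minimal sketch of a node-based analysis workflow: each step is a
    function, and the workflow threads the dataset through the steps in
    order, the way a visual tool wires icons on a palette."""

    def __init__(self):
        self.steps = []

    def add(self, step):
        self.steps.append(step)
        return self  # return self so steps can be chained fluently

    def run(self, data):
        for step in self.steps:
            data = step(data)
        return data


# Hypothetical two-step analysis: drop missing values, then take the mean
wf = (Workflow()
      .add(lambda rows: [r for r in rows if r is not None])
      .add(lambda rows: sum(rows) / len(rows)))
print(wf.run([3, None, 5, 4]))  # mean of the non-missing values: 4.0
```

The appeal for a mixed team is that each node's internals are written once, so statisticians and bench researchers can assemble pipelines without touching the implementation of each step.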
Are you mining existing clinical trial data? What statistical analysis tools are you using? For what purpose are you undertaking the task? Drop me a note at [email protected] and let me know what you are doing along these lines.
BlueArc Doubles Its Performance
Storage vendor BlueArc has introduced its next-generation Titan 2 family of storage systems targeting high-performance computing (HPC) environments commonly found in life science research.
The Titan 2 is designed for situations where high performance is needed, but a storage area network (SAN) is deemed too costly a solution.
Such high-performance network-attached storage (NAS) systems are increasingly being eyed as a key to improving the speed of high-performance computing applications.
Compared to its previous Titan product line, the Titan 2 offers twice the performance and twice the throughput. Titan 2 scales to 512 terabytes, which is twice the capacity of the original Titan line. Additionally, the new systems support 10-Gigabit Ethernet.
As noted, one approach to improving HPC performance would be to install a SAN. But the equipment, infrastructure, and management costs of SANs are often much higher than those associated with NAS.
Unfortunately, many NAS systems bog down and become a performance limiter, particularly when used in applications where many cluster nodes all simultaneously read and write to a shared storage system. This scenario is common in many life science applications.
BlueArc is aiming its systems at the space between high-cost SANs and generic NAS alternatives.
To deliver higher performance than a traditional NAS, BlueArc has taken an architectural approach to its storage systems that mimics what router vendors have done over the years to improve their products' performance.
Specifically, routers and storage servers both started as PCs running software. With early routers, the PCs would run algorithms like RIP or OSPF (routing information protocol and open shortest path first, respectively). With early storage servers, PCs would run file server protocols CIFS or NFS (common Internet file system and network file system, respectively).
As performance demands increased, router and storage vendors started offering devices that included, respectively, a dedicated router or a dedicated file server appliance running a custom operating system. These systems typically incorporated higher-performance backplanes and data input/output interfaces (compared to their PC counterparts) to improve data handling chores. But they still relied on software to perform routing and file services, respectively.
As performance demands increased even more, router vendors went to dedicated routing appliances that performed the actual routing in hardware, not software, as was the case with the previous generation of products.
BlueArc has done the same thing with its Titan storage systems. The architecture is highly parallelized so as not to get bogged down under heavy simultaneous requests to read and write data.
Within the life sciences, BlueArc sees its storage systems being used in high-throughput applications that deal with very large databases.
High-performance storage will be one of the many IT-related topics discussed at the upcoming Life Sciences Conference & Expo to be held in Boston from April 3 to 5. A session titled "Data Storage and Availability" will include speakers from BlueArc, EMC, Isilon, and Revivio. For more information about this and other sessions, go to www.lifesciencesexpo.com.
IT Solutions for Drug Discovery
The lineup for this year's "IT/Informatics Solutions for Drug Discovery" track at the Bio-IT World annual conference (April 3-5, 2006; Boston) makes for impressive reading. Over the course of three days, more than 30 speakers will present the latest examples of technology driving computational drug design, workflow management, integrative informatics, grid computing, the Semantic Web, data storage, and much more.
Here is the complete breakdown of sessions, speakers, and topics. Readers of Inside IT can receive a special early-bird discount by registering for either a Platinum Pass (3 days) or Gold Pass (1 day), by going to www.lifesciencesexpo.com. This offer has been extended until March 10 for readers of Inside IT.
Session 201 Computational Drug Discovery
Derek Debe (Sertanty-Eidogen) De "Know"-vo Drug Discovery
Zhenbin Li (Neurogen) Compound management and data mining in a drug discovery platform using JChem
Jim Wikel (Coalesix) In silico development and optimization of small molecule libraries
Daniel Keesman (GeneData) Understanding bioactivity: Opening discovery informatics to high-content screening
Session 202 Discovery Informatics
John Reynders (Eli Lilly) Integrative informatics
Bryan Koontz (Tripos) Harnessing the collective wisdom of discovery project teams
John Keilty (Infinity Pharmaceuticals) Managing data for "disposable" applications
Dimitris Agrafiotis (Johnson & Johnson) A year into ABCD—has anything changed?
Session 203 IT Infrastructure
Timothy Dion (Biogen-Idec) The road less traveled—A biotech's journey to enterprise architecture
Samuel Aaronson (HPCGG) IT support for the practice of genetic- and genomic-based personalized medicine
Edwin Addison (TeraDisc) System X and the TeraScale Discovery Consortium
Don Rule (Microsoft) Applying user-centric application design to life science inquiry
Session 204 The Semantic Web
Eric Neumann (Teranode) / Tonya Hongsermeier (Partners HealthCare) / Eric Miller (W3C) The semantic Web for healthcare and life sciences: Applications in translational medicine
Susie Stephens (Oracle) Applications of the semantic Web in life sciences and healthcare
Ken Baclawski (Northeastern University) The Bayesian Web: Adding reasoning with uncertainty to the semantic Web
Matt Shanahan (Teranode) (Title TBA)
Session 205 Text, Data Mining and the Web
Manuel Peitsch (Novartis) The UltraLink: An expert system for contextual hyperlinking in knowledge management
Mike McManus (Fujitsu) High-speed searching of large biological and chemical datasets
Don Taylor (Vivisimo) Bridging information repositories and bioinformatics tools via intelligent query routing, federated search and post-retrieval methodologies.
Eric Gerritsen (BioPeer) Blogs, tags, user-generated taxonomies, and the semantic Web
Session 206 Data Storage and Availability
Sean Lanagan (EMC) Intelligent Archive for Life Sciences—A new class of Storage Solutions
Sam Grocott (Isilon) The Paradigm Shift to Clustered Computing and Storage—and What it Means for Health & Life Science
Mich Fisher (Revivio) Design, Architecture and Attributes of Continuous Data Protection Solutions
James Reaney (BlueArc) Bigger, faster, more capable: Meeting storage challenges for the life sciences
Session 207 Pipelines and Workflow Environments
Stephen Calvert (GSK) Evolution of workflow tools—Meeting the challenges of today's "electronic" science
Mike Peeler (SciTegic) A flexible framework for the creation and publishing of best practices in drug discovery
Christopher Ahlberg (Spotfire) Pharmaceuticals visualize success: Analytics reveal competitive advantage
Liz Kerr (Apple) Visualizing scientific data: Seeing the forest without losing sight of the trees
Chris Dwan (BioTeam) Web services for bioinformatics
Related Bio-IT Stories
Notes from the Lab: Multicore and More
Google, Venter Mum on Collaboration Reports
Oracle Discusses Agenda for Life Sciences, Healthcare