Thursday, December 9, 2010
Wednesday, December 1, 2010
Richa Agarwala and Alejandro Schäffer are working together and separately on various software packages for analysis of genetic data. This page briefly summarizes several ongoing projects and provides hyperlinks to a more detailed page about each project, to software downloads, and to references for papers.
Summary of ongoing projects
Alejandro Schäffer has led the development of the FASTLINK software package for genetic linkage analysis, a statistical technique used to map genes and find the approximate locations of disease genes. FASTLINK aims to replace the main programs of the widely used LINKAGE package by doing the same computations faster. It can also run in parallel, either on a shared-memory computer or on a network of workstations, and it comes with much new documentation. FASTLINK has been used in over 1000 published genetic studies. It is freely available by ftp; follow the hyperlink to the FASTLINK page for more details.
In collaboration with Sandeep Gupta, Alejandro Schäffer developed a significantly faster and more space-efficient version of the program MSA to do multiple sequence alignment. Follow the hyperlink to the MSA page to retrieve the paper and software.
Richa Agarwala, Jeremy Buhler (Washington U.), and Alejandro Schäffer have developed software to do conditional linkage analysis of polygenic diseases such as diabetes, asthma, and glaucoma. The software is called CASPAR (Computerized Affected Sibling Pair Analyzer and Reporter). Other participants in the design of CASPAR are: Kenneth Gabbay (Baylor College of Medicine), Prof. Marek Kimmel (Rice University) and David Owerbach (Baylor College of Medicine). Follow the hyperlink to the CASPAR page to retrieve the software.
Richa Agarwala has developed software called PedHunter to query a genealogical database. Among the problems PedHunter solves is how best to connect a set of relatives with the same disease into a pedigree suitable for input to genetic linkage analysis. PedHunter is currently being used at NCBI to query the Amish Genealogy database (AGDB), a database of over 295,000 members of the Amish and Mennonite religious groups and their relatives. Other participants in the design of PedHunter and AGDB include Leslie Biesecker (NHGRI/NIH), Clair Francomano (now at NIA/NIH), and Alejandro Schäffer. PedHunter is also being used by other research groups to query other genealogical databases. The PedHunter query software comes in two flavors, depending on how the genealogy is stored: in a SYBASE database or in ASCII text files. Follow one of the two PedHunter hyperlinks to retrieve a paper and software.
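To make the pedigree-connection problem concrete, here is a minimal sketch of one query of this kind: finding the ancestors shared by a set of affected relatives. It is not PedHunter's code or interface. The three-column text layout (person, father, mother, with 0 for an unknown parent), the identifiers, and all function names are assumptions made up for illustration.

```python
from collections import deque

# Hypothetical ASCII genealogy: "person father mother", 0 = unknown parent.
GENEALOGY_TEXT = """\
1 0 0
2 0 0
3 1 2
4 1 2
5 0 0
6 0 0
7 3 5
8 6 4
"""

def parse_genealogy(text):
    """Build a map from each person to the list of their known parents."""
    parents = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        person, father, mother = line.split()
        parents[person] = [p for p in (father, mother) if p != "0"]
    return parents

def ancestor_depths(person, parents):
    """Map each ancestor of `person` to the fewest generations separating them."""
    depths = {}
    queue = deque([(person, 0)])
    while queue:
        current, depth = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in depths:
                depths[parent] = depth + 1
                queue.append((parent, depth + 1))
    return depths

def common_ancestors(people, parents):
    """Return {ancestor: total generations to all people} for shared ancestors."""
    all_depths = [ancestor_depths(p, parents) for p in people]
    shared = set(all_depths[0])
    for depths in all_depths[1:]:
        shared &= set(depths)
    return {a: sum(d[a] for d in all_depths) for a in shared}

parents = parse_genealogy(GENEALOGY_TEXT)
shared = common_ancestors(["7", "8"], parents)  # 7 and 8 are first cousins
print(sorted(shared.items()))
```

In this toy genealogy, individuals 7 and 8 share grandparents 1 and 2, so a pedigree linking them would run through that founder couple. A real tool must also choose among many candidate ancestors and keep the resulting pedigree small enough for linkage analysis, which this sketch does not attempt.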
Software to analyze comparative genomic hybridization data
Richard Desper and Alejandro Schäffer have developed software, called oncotrees, to analyze data on tumors to study models of oncogenesis. The software is designed to analyze data generated by a technique called comparative genomic hybridization, but it has also been used to analyze cytogenetic breakpoint data. The focus of the software is to infer tree models that relate genetic aberrations to tumor progression. Participants in the design of the software include Olli Kallioniemi (NHGRI/NIH) and Christos Papadimitriou (UC Berkeley).
Software for radiation hybrid mapping and map integration
Richa Agarwala and Alejandro Schäffer developed software, called rh_tsp_map, to construct radiation hybrid maps and to integrate maps that contain overlapping marker sets. Many improvements in version 3.0 of rh_tsp_map were implemented by Edward Rice. He is also first author of an extensive tutorial and set of man pages that now accompany the rh_tsp_map download shown as a link on the left. The radiation hybrid mapping methods are based on three ideas: a new strategy to select framework markers, a known reduction from the radiation hybrid mapping problem to the traveling salesman problem, and the use of the existing software CONCORDE to solve large instances of the traveling salesman problem. The map construction software was used at NCBI to construct dense human radiation hybrid maps. Follow the link on the right to learn more about these maps. The software has also been used to construct maps of the cat and the dog, which are described in some of the references, as well as maps of other vertebrates. Participants at NCBI include Donna Maglott, Greg Schuler, and Alejandro Schäffer. David Applegate and William Cook, co-developers of CONCORDE, collaborated on its usage for radiation hybrid map construction. William Murphy (Texas A&M) supplied the data for and collaborated on constructing maps of the cat. Christophe Hitte (University of Rennes, France) constructed the maps of the dog and independently compared our software to other, competing packages.
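The reduction to the traveling salesman problem can be illustrated on a toy scale. The sketch below is not rh_tsp_map and does not call CONCORDE. It assumes a simplified obligate-breaks criterion in which the cost of placing two markers next to each other is the Hamming distance between their 0/1 retention vectors, adds a dummy city at distance 0 from every marker so that a marker order (a path) becomes a tour, and solves the tiny instance by brute force. The marker names and vectors are invented for illustration.

```python
from itertools import permutations

# Hypothetical retention vectors: one 0/1 entry per hybrid cell line.
MARKERS = {
    "M1": [1, 1, 0, 0, 0, 1],
    "M2": [1, 1, 1, 0, 0, 1],
    "M3": [0, 1, 1, 1, 0, 1],
    "M4": [0, 0, 1, 1, 1, 0],
}

def hamming(u, v):
    """Number of hybrids in which two markers differ (simplified break count)."""
    return sum(a != b for a, b in zip(u, v))

def build_tsp_matrix(vectors):
    """City 0 is a dummy at distance 0 from all markers; cities 1..n are markers."""
    size = len(vectors) + 1
    d = [[0] * size for _ in range(size)]
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            d[i + 1][j + 1] = hamming(u, v)
    return d

def solve_tsp_brute_force(d):
    """Exhaustive search; only feasible for a handful of cities."""
    size = len(d)
    best_cost, best_tour = None, None
    for perm in permutations(range(1, size)):
        tour = (0,) + perm
        cost = sum(d[tour[k]][tour[(k + 1) % size]] for k in range(size))
        if best_cost is None or cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

names = list(MARKERS)
matrix = build_tsp_matrix([MARKERS[n] for n in names])
best_cost, best_tour = solve_tsp_brute_force(matrix)
# Drop the dummy city to recover the marker order.
order = [names[city - 1] for city in best_tour[1:]]
print(best_cost, order)
```

Cutting the optimal tour at the dummy city yields the marker order with the fewest breaks between neighbors. Exhaustive search grows factorially with the number of markers, which is why a dedicated solver is needed for dense maps.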
Software to analyze microarray data
In collaboration with Javed Khan (NCI), Richard Desper and Alejandro Schäffer have developed a software package as an aid to classification problems generated by gene expression data. The software package METrics on EXPression data (METREX) calculates any of a variety of metrics on gene expression data.
Expression data typically comes in the form of a matrix of values for a number of genes that have each been measured in a number of different tissues, tumors, or cell lines. One common problem is that the number of variables can be enormous and defy simple comprehension. A number of techniques have been developed to classify the genes (or the cell lines or tumors) based on the patterns seen in the data matrix.
The main program, metrex, provides metrics on the data matrix that can be used by various classification programs to classify the rows or columns of the input matrix. The input format is described in the file readme.metrex that comes with the distribution. The program outputs a distance matrix in the popular Phylip format that can be used as input to most phylogeny-building programs, including Fitch and Neighbor from the Phylip package of Joseph Felsenstein, the FastME program of Desper and Gascuel, and the comprehensive phylogeny program Paup of Swofford.
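As a rough illustration of this pipeline, the sketch below computes pairwise Euclidean distances between the rows of a small expression matrix and writes them as a square Phylip-style distance matrix. It is not the METREX code and does not read the METREX input format. The gene names, values, choice of metric, and function names are assumptions made for illustration.

```python
import math

# Hypothetical expression matrix: one row per gene, one column per sample.
EXPRESSION = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.0, 2.0, 5.0],
    "geneC": [4.0, 6.0, 3.0],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(rows, metric):
    """Full symmetric matrix of pairwise distances between rows."""
    n = len(rows)
    return [[metric(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def to_phylip(names, dist):
    """Square Phylip layout: a count line, then a 10-character name and distances."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, dist):
        lines.append(name[:10].ljust(10) + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines) + "\n"

names = list(EXPRESSION)
dist = distance_matrix([EXPRESSION[n] for n in names], euclidean)
phylip_text = to_phylip(names, dist)
print(phylip_text, end="")
```

Swapping `euclidean` for another metric, such as one based on correlation, changes the distances but not the output layout, so the same file can be passed to any distance-based tree program.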
Research Overview
The research program in the Computational Biology Branch is carried out by Senior Investigators, tenure-track Investigators, Staff Scientists, Postdoctoral Fellows, and students. The program focuses on theoretical, analytical and applied approaches to a broad range of fundamental problems in molecular biology.
The expertise of the group is concentrated in sequence analysis, protein structure/function analysis, and gene identification, yet research interests cover a wide range of topics in computational biology and information science. Briefly, these include but are not limited to: database searching algorithms, low-complexity sequences, sequence signals, mathematical models of evolution, statistical methods in virology, dynamic behavior of chemical reaction systems, statistical text-retrieval algorithms, protein structure and function prediction, comparative genomics, taxonomic trees, and population genetics.
Many of the basic research projects conducted by CBB investigators serve to enhance and strengthen NCBI’s suite of publicly available databases and software application tools. Collaborative research efforts, among NCBI investigators as well as with the external research community, have led to the development of innovative algorithms (BLAST, PSI-BLAST, SEG, VAST, and COGs) and novel research approaches (text neighboring) that have transformed the field of computational biology. Algorithms and applications currently under development have the potential to further advance scientific discovery.
Members of the CBB contribute significantly to the validity and reliability of NCBI’s online resources by reviewing the quality and accuracy of the data deposited in the databases, as well as the accuracy of the information used to annotate the data. Members also provide leadership and guidance to the extramural community by planning and organizing scientific consortia to determine the most effective use of public sequence resources for large-scale or high-throughput experimental biology. Researchers collaborate to define known research gaps and to identify mechanisms to bridge these gaps.
Computational Biology Branch
NCBI, NLM, NIH
8600 Rockville Pike MSC 6075
Building 38A, Room 6N601
Bethesda, MD 20894-6075
Revised: July 1, 2010