Bioinformatics Tools Survey



This survey gives short descriptions of some of the bioinformatics tools we use in our research. For some of these tools we also provide tutorial materials, such as PowerPoint presentations on BLAST, primer design, and so on. One thing we always keep in mind is to evaluate a bioinformatics tool before taking its output for granted. For the same reason, your evaluations of and comments on this survey or other bioinformatics tools are very welcome and highly appreciated. Please send your comments to Li Liu, MD.




BLAST (Basic Local Alignment Search Tool)


If you think you know BLAST after running a few searches, think again. Although it looks simple and straightforward, BLAST is actually a set of complex programs with many features that will help you a great deal once you know them. Unfortunately, many people run BLAST blindly and lose much of the valuable information that can be extracted from a search. If you have not systematically reviewed how BLAST searches work, we highly recommend reading the BLAST Tutorial at NCBI. We also provide a PowerPoint presentation on BLAST searching for biologists who have done some BLASTing but are not fully aware of BLAST's features and output.


BLAST stands for "Basic Local Alignment Search Tool". It is designed to detect local sequence similarity, in contrast to global alignment tools such as ClustalW, which we discuss later. Local means that BLAST reports similar regions without trying to align the sequences over their entire length (the blastp program is an exception under certain circumstances). BLAST is often the first choice for identifying an unknown sequence. It is also a good way to select candidate genes to submit for global alignment analysis.
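To make "local" concrete, here is a minimal sketch of local alignment scoring by dynamic programming (the Smith-Waterman recurrence). It is not how BLAST works internally: BLAST uses a faster heuristic, and the scoring values below are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scores are never allowed to drop below zero, so a poor region
    simply ends the alignment instead of penalizing the whole sequence.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The two sequences share only the region ACGTACGT.
    print(local_alignment_score("GGGGACGTACGT", "ACGTACGTTTTT"))
```

Only the shared region contributes to the score; the unrelated flanking bases are ignored, which is the behavior that distinguishes a local from a global alignment.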


BLAST is highly flexible and scalable. In addition to using the website at NCBI, you can obtain standalone BLAST programs that run on your own computer. You can also create a BLASTable database from your own sequences and run the BLAST programs against it; even batch processing is possible. We have developed a relational database called "BlastQuest" to store BLAST results for our sequencing projects. It lets us search any customized database we build, store the results permanently, and analyze and manipulate the data further in various ways. Like BlastQuest, many programs on the Internet extend BLAST with valuable add-on features that address various problems. They are worth exploring if you use BLAST frequently and want to answer questions that can be deduced from BLAST results.
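As an illustration of post-processing batch results, the sketch below reads tab-delimited BLAST output and keeps the best hit per query. It assumes the twelve default columns produced by the BLAST+ `-outfmt 6` option; other output formats need a different parser. The sequence identifiers in the sample are hypothetical, and this is not how BlastQuest itself is implemented.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle, max_evalue=1e-5):
    """Yield hits (as dicts) whose e-value is at or below max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        if hit["evalue"] <= max_evalue:
            yield hit


def best_hit_per_query(hits):
    """Keep, for each query, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


# Hypothetical sample rows, for illustration only.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["query1", "subjectA", "95.0", "100", "5", "0", "1", "100", "1", "100", "1e-40", "180"],
    ["query1", "subjectB", "99.0", "100", "1", "0", "1", "100", "1", "100", "1e-50", "200"],
    ["query2", "subjectC", "30.0", "50", "35", "0", "1", "50", "1", "50", "0.5", "25"],
])

if __name__ == "__main__":
    for query, hit in best_hit_per_query(read_hits(io.StringIO(SAMPLE))).items():
        print(query, hit["sseqid"], hit["evalue"])
```

The e-value cutoff is a parameter rather than a constant because a sensible threshold depends on database size and on the question being asked.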


After learning BLAST and trying some searches, you may think you can find genes homologous to your query sequences. Keep in mind that BLAST detects sequence similarity using statistical methods. Significant similarity often suggests homology, but the inference does not always hold.




FASTA


Besides BLAST, FASTA is another set of similarity search tools. Although FASTA and BLAST differ slightly in their underlying algorithms, both essentially perform pairwise local alignment. We have searched with the same query sequence using both, and obtained almost the same set of hits whenever the hit sequences showed significant similarity to the query. We therefore consider the results of FASTA and BLAST consistent with each other in most cases. Although FASTA has a reputation for speed based on its algorithm, it is hard to tell which of the two is faster when you use them through a public server. There, response time is determined mainly by the load on the server, and the EBI and NCBI servers are frequently overloaded.


If you are interested in FASTA, please refer to the FASTA Help Document provided at EBI, which is the official FASTA tutorial site.




ClustalW & ClustalX


In contrast to BLAST and FASTA, ClustalW performs multiple global alignment. Multiple means that ClustalW can align two or more sequences. Global means that it tries to find the best alignment of the sequences over their entire length. You should expect the same two sequences to be aligned differently by BLAST/FASTA and by ClustalW. Nevertheless, BLAST/FASTA is often used to obtain the initial set of sequences that ClustalW then aligns.
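The meaning of "global" can be illustrated with a minimal pairwise sketch using the Needleman-Wunsch recurrence. ClustalW itself builds multiple alignments progressively and uses more elaborate scoring; the scoring values here are arbitrary assumptions.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best global alignment score between sequences a and b.

    Every position of both sequences must be aligned to a residue or a
    gap, so unmatched ends are penalized rather than ignored.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Because the whole length of each sequence is scored, adding unrelated sequence to either end lowers a global score, whereas a local score would be unaffected.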


The most basic and common use of multiple global alignment tools such as ClustalW is to identify conserved and variable regions. Based on these regions, you can design primers, construct new hypotheses, design experiments to test and modify the function of specific proteins, predict the function and structure of proteins, identify new members of protein families, and so on.


One problem we often see is that bioinformatics novices tend to align too wide a range of sequences. They sometimes mix genomic DNA with mRNA, or related with unrelated sequences. The noise introduced by including too many sequences degrades the alignment. If possible, align paralogs or orthologs, or at least sequences among which you expect similarity. Keep the "apples and oranges" problem in mind.


In addition to the command-line ClustalW, a version of the program with a graphical user interface, ClustalX, is available at the EBI ftp site. First choose the right operating system; MS Windows users should choose "DOS". Then go to the ClustalW or ClustalX folder to start the download.




Phred/Phrap System


The Phred/Phrap package is the most popular software for processing chromatograms generated by sequencing projects. In this package, phred is the base caller: it reads the sequence from the chromatogram and assigns every base a quality value, called the "phd value". Phrap is the assembler: it assembles sequence fragments into contigs, leaving unassembled reads as singlets.


After phred has run, the sequences retrieved from the chromatograms need to be trimmed, both by phd value and to remove the vector sequences used for cloning, before they are sent to phrap for assembly. Protocols for quality and vector trimming differ from lab to lab.
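As a rough illustration of quality trimming, the sketch below removes low-quality bases from both ends of a read. It relies on the standard definition of a phred quality value, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The end-trimming rule and the cutoff of 20 are assumptions for illustration, not any particular lab's protocol, and vector trimming is not covered.

```python
def error_probability(q):
    """Convert a phred quality value to an estimated error probability."""
    return 10 ** (-q / 10)


def trim_by_quality(seq, quals, min_q=20):
    """Trim bases with quality below min_q from both ends of a read.

    Returns the trimmed sequence and its quality values. Low-quality
    bases in the interior of the read are left in place.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists differ in length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


if __name__ == "__main__":
    read = "ACGTACGT"
    quality = [5, 10, 30, 40, 35, 25, 8, 4]
    print(trim_by_quality(read, quality))
    print(error_probability(20))  # about one error in 100 base calls
```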


Phrap is a program for assembling shotgun DNA sequence data. Its authors plan to add an EST assembly feature to the next version. For now, we do not recommend phrap as an assembler for EST sequences, although many people use it that way.




TranscriptAssembler


The ICBR Sanger Sequencing Lab now uses Paracel TranscriptAssembler as its standard program for assembling EST sequences. It is a complete, high-capacity solution for EST-based transcript reconstruction. TranscriptAssembler provides a comprehensive pipeline for all of the steps required to accurately filter, mask, cluster, and assemble transcripts. In addition to sequence assembly, it supports detection of alternative splice forms, chimera detection, and user-friendly visualization. In our experience, we are very satisfied with its performance, accuracy, flexibility, and scalability. It is very easy to construct different assembly pipelines for sequences from different species or for different purposes.




Glimmer & GlimmerM


Glimmer and GlimmerM identify possible genes in prokaryotic and eukaryotic sequences, respectively. They use interpolated Markov models to identify coding regions and distinguish them from noncoding DNA. As with all other Markov-chain-based programs, building a good model for your species of interest is the critical step. Our bioinformatics group at ICBR has successfully built models for three prokaryotic species for our sequencing projects. In our experience, genes predicted by Glimmer are highly consistent with blastx search results. We have not had many chances to test the performance of GlimmerM. However, TIGR provides models built for Arabidopsis thaliana, Oryza sativa (rice), and Plasmodium falciparum (the malaria parasite), which should work well on closely related organisms.




Grail & GrailExp


Grail and GrailExp were developed and are used for gene discovery at Oak Ridge National Laboratory. In addition to exon prediction, they provide CpG island prediction and repetitive element prediction. You can also choose to align your sequences with known mRNAs. Combined, these results usually reveal important information for gene structure prediction.
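To show what CpG island prediction measures, here is a minimal sketch based on the widely cited Gardiner-Garden and Frommer criteria: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. Whether Grail applies these exact thresholds is an assumption; the sketch also tests a whole sequence rather than scanning it with a sliding window.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")   # CpG dinucleotides cannot overlap
    expected = c * g / n         # expected count given the base composition
    ratio = observed / expected if expected else 0.0
    return (c + g) / n, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three thresholds to the whole sequence."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
    print(looks_like_cpg_island("AT" * 100))
```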




GenScan


If you are working on human or other vertebrate sequences, GenScan is a very good choice for predicting exons. Its prediction accuracy is quite satisfactory. It has been used at NCBI to annotate vertebrate sequences, and the resulting putative genes can even be searched with BLAST.




ESTScan


Since sequencing errors are common in EST projects, gene structure prediction tools that do not take such errors into account are not appropriate. Indel errors and errors in start/stop codons can cause serious problems in gene structure prediction. ESTScan can be used to detect and correct such errors. It works quite well on mammalian sequences. If you are interested in another organism, you can train ESTScan with coding sequences from that species or a related one. However, ESTScan requires a very high-quality model; otherwise, it will predict some unexpected genes.




General Molecular Biology Tutorial Sites





BioTools (GeneTool & PepTool)


The BioTools suite includes GeneTool and PepTool, for nucleotide and peptide sequence analysis respectively. Compared with GCG and Vector NTI, BioTools has the most user-friendly interface. It covers almost all routine bioinformatics jobs, although its analysis functions are less powerful than those of GCG and Vector NTI. We find BioTools very good software for bioinformatics novices and for basic analysis. It is much better developed and more reliable than the freeware you can get from the Internet. Here we list some features of BioTools.

In some graduate courses offered at ICBR, BioTools is used as the platform for basic bioinformatics training. We also hold a regular workshop on BioTools for people who are interested in using it to assist their research. ICBR is authorized to license BioTools to UF students, faculty, and staff. If you are interested in licensing BioTools, please contact the ICBR biological computing group.




BioEdit


Besides being a sequence alignment editor, BioEdit connects to a wide range of free bioinformatics programs on the Internet, covering topics from BLAST searching to promoter prediction. If you are looking for freeware, BioEdit is worth a look.




pDRAW32


pDRAW32 produces good graphical displays for sequence annotation, restriction maps, and gel simulation, and it offers other analysis functions such as primer design and similarity search. It is good freeware for in silico cloning. We provide a tutorial on cloning in silico that uses pDRAW32 as a demo.




BRB Tools


BRB ArrayTools is a powerful microarray data analysis package. It supports both supervised and unsupervised analysis, including permutation t-tests and F-tests, cluster analysis, principal component analysis, and so on. Users can perform data cleaning, normalization, class comparison, class prediction, survival analysis, and other statistical analyses on their data according to the specific experimental design. Annotations can also be retrieved from NCI or Affymetrix if available. It works as an Excel add-in and is very easy to use.
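The idea behind a permutation test can be sketched in a few lines. The example below computes an exact two-sided p-value for one gene by enumerating every way of splitting the pooled values into two groups. It uses the difference in group means as the test statistic instead of the t-statistic, a simplification made here for brevity, and it is not taken from BRB ArrayTools.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in means.

    Enumerates all relabelings, so it is only practical for the small
    group sizes typical of a single gene in a microarray experiment.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in indices if i in chosen_set]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # The small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    control = [1, 2, 3]
    treated = [10, 11, 12]
    print(permutation_p_value(control, treated))
```

With three values per group there are only 20 possible splits, so the smallest attainable two-sided p-value is 0.1; this is one reason small experiments rarely reach conventional significance under a permutation test.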




ICBR AnalyzeIt Tools


AnalyzeIt is designed and developed at ICBR and serves as one of the major analysis tools in our research. Depending on the experimental design, 1-way, 2-way, or 3-way ANOVA followed by the Tukey HSD test can be applied in batch mode. The results are written to an Excel file for further processing. Probes from Affymetrix chips can be annotated within a second and used to generate a Gene Ontology tree showing the functional relationships among differentially expressed genes. AnalyzeIt also has utility tools, such as set operations and data preparation for GenMAPP pathways, to facilitate customized data analysis. The next version of AnalyzeIt will add data analysis for paired designs and pathway analysis using KEGG data. If you are interested in obtaining AnalyzeIt, please contact the ICBR Bioinformatics group.




Cluster & TreeView


Cluster performs cluster analysis using various algorithms, including hierarchical clustering, k-means clustering, self-organizing maps, and PCA. Some simple data cleaning, transformation, and normalization are also supported. The resulting clusters can be viewed using TreeView.
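As an illustration of one of these algorithms, here is a minimal k-means sketch in plain Python. It is not the Cluster program's implementation: the starting centroids are supplied by the caller, the number of iterations is fixed, and the expression profiles in the example are made-up values.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [sum((x - y) ** 2 for x, y in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        centroids = updated
    return assign(points, centroids), centroids


if __name__ == "__main__":
    # Made-up log-ratio profiles for four genes across two conditions.
    profiles = [(1.0, 1.2), (0.9, 1.1), (-1.0, -0.8), (-1.1, -1.0)]
    labels, centers = kmeans(profiles, [profiles[0], profiles[2]])
    print(labels)
    print(centers)
```

The result of k-means depends on the starting centroids, so in practice it is common to repeat the run from several starting points and compare the clusterings.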




Pathway & Function Analysis (KEGG, GenMAPP & Gene Ontology)


Pathway and function analysis is an integral part of microarray data analysis. GenMAPP and KEGG pathway information and Gene Ontology terms are included in many probe annotations. Linking gene expression data to this pathway and function information can reveal very important expression and regulation patterns.


GenMAPP provides a program that applies a user-defined color scheme to its pathway maps based on gene expression data. A utility that helps users prepare data sets from Affymetrix chips for GenMAPP is integrated into the ICBR AnalyzeIt software. Currently, the GenMAPP program works only on its own pathway maps. It does not accept pathway maps from the KEGG database unless users manually construct GenMAPP-formatted files for them. The ICBR bioinformatics group is seeking a way to integrate the KEGG database into AnalyzeIt so that KEGG pathway maps can be used efficiently in microarray data analysis.


Gene Ontology is a controlled vocabulary used to classify molecular function, cellular component, and biological process. Gene Ontology terms are maintained in a hierarchical structure, which we call the "gene ontology tree". AnalyzeIt allows users to populate this tree with their expression data to investigate whether there are observable functional groups or regulation patterns.
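One way to picture "populating the tree" is to propagate each gene's annotations up to all ancestor terms, so that broader terms collect every gene annotated to their more specific descendants. The sketch below does this for a tiny hierarchy. The gene names are hypothetical, the term relationships are simplified for illustration, and this is not AnalyzeIt's code.

```python
def genes_per_term(parents, annotations):
    """Map each term to the set of genes annotated to it or to a descendant.

    parents: term -> list of parent terms
    annotations: gene -> list of directly annotated terms
    """
    result = {}
    for gene, terms in annotations.items():
        seen = set()
        stack = list(terms)
        while stack:  # walk upward from each annotated term
            term = stack.pop()
            if term in seen:
                continue
            seen.add(term)
            stack.extend(parents.get(term, []))
        for term in seen:
            result.setdefault(term, set()).add(gene)
    return result


# Simplified, illustrative hierarchy and hypothetical gene names.
PARENTS = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "DNA binding": ["binding"],
    "binding": ["molecular function"],
}
ANNOTATIONS = {
    "geneA": ["kinase activity"],
    "geneB": ["DNA binding"],
    "geneC": ["kinase activity"],
}

if __name__ == "__main__":
    for term, genes in sorted(genes_per_term(PARENTS, ANNOTATIONS).items()):
        print(term, sorted(genes))
```

Because a term may have more than one parent, the traversal tracks visited terms so that each gene is counted once per term.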




GPC VisualGrid


VisualGrid is a free, scaled-down sample version of BioChip Explorer, developed at GPC Biotech AG. It allows automatic, semi-automatic, and manual adjustment of the position of the grid overlaid on microarray/macroarray images. Foreground and background intensities are estimated and can be exported to a spreadsheet. We found two useful features in VisualGrid that we could not find in other image processing tools. First, a correlation graph can be generated if there are replicate spots on the membrane, so that we can identify which genes do not generate reproducible data. Second, spots can be sorted by their estimated intensity values. Since each spot image is displayed alongside its intensity value, it is very easy to identify cells that are not spotted correctly or over which the grid is not positioned correctly. We find it works better than other software if you spot your own high-density membranes, where clones/genes are not spotted as perfectly as on industrially produced arrays.
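The replicate check described above can be approximated outside VisualGrid once intensities have been exported. The sketch below computes the Pearson correlation between two replicate spot sets and flags genes whose replicates disagree. The two-fold cutoff, the function names, and the sample values are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def irreproducible(names, rep1, rep2, max_ratio=2.0):
    """Return names whose replicate intensities differ by more than max_ratio."""
    flagged = []
    for name, a, b in zip(names, rep1, rep2):
        low, high = sorted((a, b))
        if low <= 0 or high / low > max_ratio:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    genes = ["g1", "g2", "g3"]
    first = [100, 200, 50]
    second = [110, 190, 400]
    print(pearson(first, second))
    print(irreproducible(genes, first, second))
```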




TIGR SpotFinder


TIGR Spotfinder reads paired 16-bit TIFF image files generated by most microarray scanners. It's a good choice for processing 2-dye cDNA array images.




ScanAlyze


ScanAlyze processes 2-dye fluorescent images of microarrays. However, the spots on the array cannot be too dense or too skewed; otherwise, the program's performance is not very satisfactory.




RasMol


RasMol is a very popular program for viewing PDB-formatted structure files. It allows users to move, zoom, rotate, or change display colors so that the best image or viewpoint can be obtained. However, if you want to do more than just view the structure, you may need other tools, such as Cn3D.




Cn3D


Cn3D reads files in the mmCIF format, which is the format used for structure records in NCBI databases. Unlike other structure visualization tools, Cn3D allows users to correlate structure and sequence information. Because NCBI adds analysis data to the mmCIF structure files, you can access these additional features through Cn3D: for example, displaying structure alignments, highlighting conserved domains, searching for motifs/patterns, and so on.


The Cn3D Tutorial at NCBI will give you a better understanding of Cn3D and the NCBI Structure Database.




2D Structure Prediction Tools


Because scientists have more insight into the patterns conserved in alpha-helices, beta-sheets, transmembrane segments, and RNA structures, programs that predict such structures usually give better results than programs for other structures. Since there are many good 2D structure prediction tools, we do not list them all here. Please search the Internet for programs suitable for your research, or refer to Daisuke Kihara's list of 2D tools. As we always say, evaluate software before you take its output for granted.


If you would like to learn more about 2D structure prediction and the current state of research, an online PowerPoint presentation from SWBIC gives a good overview.




Swiss-Model


Compared with 2D structure prediction, 3D structure prediction is still at an early stage of research. Among current approaches, homology-based prediction is probably the most reliable, provided that significant sequence homology can be identified. Swiss-Model supports template searching based on sequence similarity. Users paste in their query sequence; if macromolecules with known 3D structures in the Swiss-Model database show similarity to the query, their structures are retrieved and displayed as templates. Some first-time users expect to see a 3D structure for their query sequence, but what is displayed is the exact structure stored in the database. So do not be surprised if you get a huge structure for your 20-residue query. Check the regions where similarity is observed, and take the structure of only those regions as the possible 3D structure of your query sequence.




Other Bioinformatics Resources Surveys




InterProScan


InterPro is a database of protein families, domains, and functional sites that have been identified in known proteins. If such a pattern, motif, or family signature is observed in an unknown protein sequence, we can hypothesize that the protein has the corresponding function. InterProScan is a program that scans a query sequence for the patterns stored in the InterPro database. The InterPro member databases PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, and PIR SuperFamily are all searched by InterProScan at the same time.




HMMER


If you are interested in finding members of a profile-defined protein family, HMMER can be used for sensitive database searching. Given a query profile (a statistical description of a sequence family's consensus), HMMER searches an arbitrary sequence database and retrieves sequences that show significant similarity to the query, using profile hidden Markov models. Note that the query is a profile, which can be built from the results of a BLAST search, ClustalW, or other tools; it is not just a single sequence.




Protein Clustering (GeneRAGE & TRIBE-MCL)


In many sequencing projects, researchers want to discover different protein families. The major problems in protein clustering include multi-domain proteins, peptide fragments, and proteins possessing promiscuous domains. This is a very active research field, and bioinformatics scientists have been trying various algorithms to reduce the false positive rate. GeneRAGE uses the Smith-Waterman dynamic programming alignment algorithm, while TribeMCL uses the Markov Clustering (MCL) method. Users need to balance speed against sensitivity.




Protein Structure Classification (SCOP & CATH)


As you might expect, proteins can be classified not only by sequence similarity but also by their 2D and 3D structure, which may reveal important structure-function relationships. In fact, it has been argued that nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The SCOP (Structural Classification of Proteins) database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. As such, it provides a broad survey of all known protein folds, detailed information about the close relatives of any particular protein, and a framework for future research and classification. All PDB entries have been analyzed to build the SCOP database.


CATH is a novel hierarchical classification of protein domain structures, which clusters proteins at four major levels, Class(C), Architecture(A), Topology(T) and Homologous superfamily (H). Class is derived from secondary structure content. Architecture describes the gross orientation of secondary structures, independent of connectivities. The topology level clusters structures according to their topological connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to topology families and homologous superfamilies are made by sequence and structure comparisons.




MACAW


MACAW is a program for locating, analyzing, and editing blocks of localized sequence similarity among multiple sequences and linking them into a composite multiple alignment. It uses a Gibbs sampling strategy for multiple alignment. As a result, it works only for detecting a pattern/motif that occurs just once in your region of interest.




Oligo


Given a template sequence, OLIGO helps researchers design oligonucleotide primers/probes for PCR, DNA sequencing, site-directed mutagenesis, and various hybridization applications. Primers/probes are designed according to a set of well-accepted criteria. If no perfect primers/probes are identified, it lets the user choose the best available oligonucleotide sequences and reports their biochemical features, such as delta-G profile, hybridization temperature, possible secondary structure, and so on. It is also a good tool for constructing synthetic genes, finding an appropriate sequencing primer among those already synthesized, finding and multiplexing consensus primers and probes, and even finding potential restriction sites in a protein.




Primer3


Primer3 is free software developed at the Whitehead Institute/MIT Center for Genome Research. It supports a wide range of parameters that users can adjust to obtain primers suited to their needs and PCR conditions. We provide a primer and probe design presentation that uses Primer3 as a demo.




Melting Temperature Calculation Tools (BioMath)


Melting temperature (Tm) is a very important property of primers/probes. Currently, three algorithms are used for Tm calculation (please refer to our primer and probe design presentation), each with a different field of application. The Tms calculated by these three algorithms are sometimes similar and sometimes very different. It is therefore important to know the field of application of each and to make your decision based on your wet-lab experience.


BioMath provides a good Tm calculation service that gives results using all three algorithms.
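To show how different formulas can disagree, the sketch below implements two simple, widely published Tm approximations: the Wallace rule, Tm = 2(A+T) + 4(G+C), intended for short oligonucleotides, and a GC-content formula, Tm = 64.9 + 41(G+C-16.4)/N, intended for longer ones. That these correspond to the algorithms covered in the presentation or offered by BioMath is an assumption; neither formula accounts for salt or primer concentration, and the more accurate nearest-neighbor method is not shown.

```python
def base_counts(seq):
    """Return (A+T count, G+C count) for a DNA sequence."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if not seq or at + gc != len(seq):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return at, gc


def wallace_tm(seq):
    """Wallace rule, in degrees Celsius: 2 per A/T plus 4 per G/C."""
    at, gc = base_counts(seq)
    return 2 * at + 4 * gc


def gc_content_tm(seq):
    """GC-content approximation, in degrees Celsius, for longer oligos."""
    at, gc = base_counts(seq)
    return 64.9 + 41.0 * (gc - 16.4) / (at + gc)


if __name__ == "__main__":
    primer = "ACGTACGTACGTACGTACGT"
    print(wallace_tm(primer))
    print(round(gc_content_tm(primer), 2))
```

For the same 20-mer the two formulas give noticeably different values, which is why the field of application of each matters.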




NCBI (National Center for Biotechnology Information) Databases


As the main biotechnology information resource in the US, NCBI maintains a large collection of databases and bioinformatics tools for molecular biology research. Knowing how to retrieve exactly the information you need, efficiently, is the fundamental and most important skill for anyone interested in bioinformatics. Very often, searching these databases before you turn to other bioinformatics tools gives you a clearer definition, a deeper understanding, or even the answer to your question.


On the other hand, every NCBI database is designed and created for a specific purpose. The most common mistake bioinformatics novices make is searching for information in an inappropriate database. We therefore highly recommend that researchers explore NCBI resources thoroughly and identify which databases are best suited to their questions before jumping into a common database such as "Nucleotide" or "Protein".


NCBI provides a general description of the databases and tools it maintains at the NCBI Site Map. Each database also has its own "Help" page describing its usage in more detail. It is worth reading this documentation before you start your search. People who search databases blindly will, most of the time, get wrong or incomplete answers to their questions. Entrez is an integrated search engine for many NCBI databases, so knowing exactly how Entrez works and what search options it offers will help you retrieve complete data sets efficiently. The Entrez Help Document is a good reference. However, some databases are not supported by Entrez, for example LocusLink. In such cases, you need to learn the best way to query that specific database.
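Entrez searches can also be scripted. The sketch below builds a query URL for NCBI's E-utilities `esearch` service, which is an assumption on our part since the text above describes only the web interface. The field-restricted query shows how choosing the database and the search fields narrows the result set. The code only constructs the URL and does not contact NCBI; please check NCBI's current usage guidelines before sending requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Build an E-utilities esearch URL for one Entrez database.

    db: Entrez database name, for example "protein" or "nucleotide"
    term: Entrez query, optionally with field tags in square brackets
    retmax: maximum number of record identifiers to return
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


if __name__ == "__main__":
    # Restrict the search to a gene name and an organism.
    print(esearch_url("protein", "BRCA1[Gene Name] AND Homo sapiens[Organism]", retmax=5))
```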


We provide a PowerPoint presentation introducing how to search NCBI databases. It is a general description and by no means in depth. However, if you want a quick overview of some important databases maintained at NCBI, along with some search tips, it is a good starting point.


Here we list some useful links where people can get all levels of tutorials on NCBI resources:




EBI (European Bioinformatics Institute) Databases and DDBJ (DNA Data Bank of Japan) Databases


NCBI, EBI, and DDBJ maintain the three major bioinformatics databases in the world. They exchange sequence data daily to ensure that the basic sequence information stored in their "primary databases" is equivalent. However, each of them also maintains other databases whose data are derived from that primary sequence information. We call these "derived databases". The information in the derived databases is not necessarily equivalent. Moreover, even for the same sequence, the three databases assign different ID/accession numbers. Among the databases maintained at EBI, Swiss-Prot is regarded as the most reliable because it stores curated information, which is not always the case for other databases at NCBI, EBI, and DDBJ. This raises another point about errors in databases: since many records are direct submissions from end users, it is not uncommon to find errors in the sequence data.


Like NCBI, EBI and DDBJ have developed their own search engines and other bioinformatics tools to facilitate data mining in their databases. Both EBI and DDBJ use SRS, developed by LION Bioscience AG, as their major search engine, although their user interfaces are not the same. SRS is known for its powerful cross-database searching and project management features. We cannot say that any one of the three databases is better than the others. Once you are familiar with the structure of one of them, you will find that you seldom need to consult the other two to get the information you are interested in. Most of the time, the three databases cross-reference each other's data.


On the other hand, some of the bioinformatics tools developed at NCBI, EBI, and DDBJ serve different purposes and have different advantages and limitations. You will find some of them more suitable for your project than others. We discuss some of these tools in other sections, according to the categories they fall into.


If you need to use SRS, please read the SRS Documentation before you start your search.




TIGR (The Institute for Genomic Research) Databases


TIGR's bioinformatics department has conducted extensive analyses for all of TIGR's genome sequencing projects, creating and maintaining curated databases of the genes of each organism. We find it an excellent resource for sequence information on important pathogenic microbes. It also provides comprehensive, high-quality data for the genomes of other organisms sequenced at TIGR.


During the analysis and annotation of genomes sequenced at TIGR, its bioinformatics department has developed many bioinformatics software tools. Since these tools were developed for real sequencing projects and tested on large-scale data sets at TIGR, we find them more practical, and usually more reliable, than some tools developed in a purely academic setting. We discuss some of these tools in other sections, according to the categories they fall into.




PDB (Protein Data Bank) Database


PDB was the first, and still serves as the single, worldwide repository for the processing and distribution of 3D biological macromolecular structure data. PDB stores only the original structure data, with no further analysis data, but this does not hurt its position as the most comprehensive 3D structure resource for biological macromolecules.


As more organizations, such as NCBI, have started to collect 3D structure data, the data stored in PDB are processed in various ways and used to populate other databases. Among these, MMDB (Molecular Modeling DataBase) at NCBI is the most popular. As we said above, since PDB stores original data, we would expect some errors and room for further processing. That is why NCBI started to build MMDB. All structures in MMDB are retrieved from PDB, but with errors corrected and other information added, such as conserved domains. Still, PDB is regarded as the largest repository of 3D biological macromolecule structure data.




Gene Symbol Database (HUGO Gene Nomenclature)


To solve the problem of people using different names for the same gene, or the same name for different genes, HUGO (The Human Genome Organization) started to build an official gene nomenclature system. It has approved symbols for nearly half of the genes in the human genome. We highly recommend using these official gene symbols to search databases, since doing so eliminates ambiguity. Another useful feature is that, for each gene symbol, HUGO provides links to major databases such as Ensembl, KEGG, GDB, LocusLink, OMIM, RefSeq, SwissProt, and so on.




KEGG (Kyoto Encyclopedia of Genes and Genomes) Pathway Database


Rather than trying to store all kinds of biological sequence information, KEGG is designed as a bioinformatics resource for understanding the higher-order functional meaning and utility of the cell or organism from its genome information. It maintains data in two main categories, "Pathway Information" and "Genomic Information". We use KEGG's pathway information very often when analyzing data from sequencing projects and microarray experiments. It is a good place to go beyond looking at individual genes. By placing genes in a network, which is closer to how they operate in living organisms, you gain a better understanding of how genes work together and are co-regulated.




EPD (Eukaryote Promoter Database)


EPD is an annotated, non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. The data in EPD are not limited to promoter sites but also include cross-references to other databases and bibliographic references, and are structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis.


A natural extension of EPD that facilitates comparative sequence analysis is to store gene expression data for promoters. EPDEX is such a complementary database; it allows users to view available gene expression data for human EPD promoters. It is a valuable add-on to the EPD data.




Transcription Factor Database (TRANSFAC)


TRANSFAC is a database of eukaryotic cis-acting regulatory DNA elements and trans-acting factors. Although it stores useful data on transcription factors, the site's performance is not satisfactory: expect some dead links and slow response when accessing the data.

