NFINtb Nothobranchius furzeri transcriptome browser

Welcome to the NFINtb Nothobranchius furzeri transcriptome browser

The Nothobranchius furzeri Information Network transcriptome browser (NFINtb) is a database of cDNA contig assemblies of the short-lived teleost fish Nothobranchius furzeri, supplemented with the best currently available annotation.

Content:

  1. Introduction
  2. Start page
  3. Statistics page
  4. Query page
  5. Transcript table page
  6. Sequence page
  7. Blast page
  8. GO page
  9. Help page
  10. Contact page
  11. Methods table
  12. Data resources
  13. How to cite


1. Introduction

The NFINtb Nothobranchius furzeri transcriptome browser provides access to the annotated transcript catalogue, including detailed BLAST results, predicted protein domains, associated gene ontology (GO) terms, microsatellites and gene expression data. Further information is given on the sequenced libraries, the distribution of contig lengths and annotation numbers, and the GO terms with their corresponding transcript contigs. A query mask allows keyword searches (e.g. by annotation, gene symbol or GO category) as well as BLAST searches by sequence similarity. N. furzeri sequences can be retrieved by batch download, including annotation details.

 


2. Start page

[Screenshot of the start page]

This page is shown when you enter the NFINtb Nothobranchius furzeri transcriptome browser or when you click on the logo.

  1. First, choose the catalogue of interest. A table of all assembled catalogues is given further down the page.
  2. Then, use the links in the link bar to get the information you are interested in. Each page is explained in more detail in the following sections.
  3. Finally, the quick search field lets you search either by NFINtbID or by gene symbol. If you search by gene symbol, tick the check box to get only the best (longest) transcript contig per gene.
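
Selecting the best (longest) transcript contig per gene can be sketched as follows. This is a minimal illustration, not the browser's own code: it assumes contigs are available as hypothetical (NFINtbID, gene symbol, length) tuples, skips contigs without a gene symbol, and keeps the first contig seen when two contigs of the same gene have equal length.

```python
def best_contig_per_gene(contigs):
    """Keep the longest transcript contig for each gene symbol.

    contigs: iterable of (nfintb_id, gene_symbol, length) tuples (assumed layout).
    Returns {gene_symbol: nfintb_id}.
    """
    best = {}
    for contig_id, gene, length in contigs:
        if not gene:  # unannotated contig, no gene symbol to group by
            continue
        # Strictly greater: on equal length the first contig seen is kept.
        if gene not in best or length > best[gene][1]:
            best[gene] = (contig_id, length)
    return {gene: contig_id for gene, (contig_id, _) in best.items()}
```

For example, with made-up IDs, `best_contig_per_gene([("c1", "tp53", 900), ("c2", "tp53", 1500)])` returns `{"tp53": "c2"}`.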

 


3. Statistics page

[Screenshot of the statistics page]

The statistics page shows general information about the transcript catalogue.

This includes:

  1. Information about assembled sequencing data and libraries
  2. Transcript contig length distribution
  3. BLAST, protein domain prediction and annotation numbers
  4. Detected SNPs (single nucleotide polymorphisms) and SSRs (simple sequence repeats)
  5. Information about possible contamination
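
A length distribution similar in spirit to the one on the statistics page can be computed from transcript contigs downloaded in FASTA format. The sketch below is only an illustration under assumptions: the 500 bp bin width is arbitrary, and the browser's own binning is not documented here.

```python
from collections import Counter


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def length_histogram(fasta_text, bin_width=500):
    """Count contigs per length bin, keyed by the bin's lower bound in bp."""
    counts = Counter()
    for _, seq in read_fasta(fasta_text):
        counts[(len(seq) // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))
```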

 


4. Query page

[Screenshot of the query page]

The query page provides an interface for complex queries to search the transcript catalogue.

4.1 Transcript contigs can be searched by:

  1. NFINtbID
  2. Minimum and maximum length
  3. Gene symbol
  4. Annotation (Text search)
  5. Annotation status - yes/no
  6. Show only the best transcript contig per annotated protein
  7. Show only the best transcript contig per annotated gene
  8. Minimum coding sequence coverage in percent
  9. Search BLAST results (Text search)
  10. GOslim annotation in one of the three ontologies/descriptions
  11. Search protein domain predictions (Text search)
  12. Predicted miRNA homology
  13. SNP ID
  14. SNP type
  15. SSR ID or type
  16. Contamination status

Fields marked with 'Text search' support advanced search strategies such as wildcards, grouping of alternatives and logical operators. The rules are explained below:

4.2 Text Search means:

4.3 Search Operators include:

+
A leading plus sign indicates that this word MUST be present in a result. Example: +protein +kinase
-
A leading minus sign indicates that the word MUST NOT be present in a result. Example: +protease -kinase
( )
Parentheses group words to look for several possibilities. Example: (RNA|DNA) polymerase
*
A trailing asterisk acts as a wildcard for truncated words: it matches every word that begins with the given prefix. Example: Protea*
" "
A phrase that is enclosed within double quote (") characters matches only results that contain the phrase literally. Example: "this is an example"
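
The operators above resemble MySQL's boolean full-text search mode. The sketch below illustrates how they could be evaluated against an annotation string; it assumes MySQL-like semantics (a query without any '+' term requires at least one plain term, and a query consisting only of '-' terms matches nothing), which the browser may not follow exactly. Grouping with parentheses is left out to keep it short.

```python
import re
from dataclasses import dataclass


@dataclass
class Term:
    sign: str     # '+' must be present, '-' must not be present, '' optional
    words: list   # a single word, or the words of a quoted phrase
    prefix: bool  # True if the term ended with '*' (truncated word)


def parse_query(query):
    """Split a text-search query into signed terms (no parentheses support)."""
    terms = []
    for m in re.finditer(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        sign, phrase, word = m.groups()
        if phrase is not None:
            words, prefix = re.findall(r"\w+", phrase.lower()), False
        else:
            words, prefix = re.findall(r"\w+", word.lower()), word.endswith("*")
        if words:
            terms.append(Term(sign, words, prefix))
    return terms


def contains(text_words, term):
    """True if the term's words occur consecutively in text_words."""
    n = len(term.words)
    for i in range(len(text_words) - n + 1):
        window = text_words[i:i + n]
        if window[:-1] != term.words[:-1]:
            continue
        last, target = window[-1], term.words[-1]
        hit = last.startswith(target) if term.prefix else last == target
        if hit:
            return True
    return False


def matches(query, text):
    """Evaluate a query against one annotation string (case-insensitive)."""
    text_words = re.findall(r"\w+", text.lower())
    terms = parse_query(query)
    required = [t for t in terms if t.sign == "+"]
    excluded = [t for t in terms if t.sign == "-"]
    optional = [t for t in terms if t.sign == ""]
    if not required and not optional:
        return False  # assumed: a query of only '-' terms matches nothing
    if any(contains(text_words, t) for t in excluded):
        return False
    if not all(contains(text_words, t) for t in required):
        return False
    # Assumed: without '+' terms, at least one plain term must occur.
    return bool(required) or any(contains(text_words, t) for t in optional)
```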

 


5. Transcript table page

[Screenshot of the transcript table page]

This page summarises the results of a search. It shows the retrieved transcript contigs as a table and provides tools to retrieve them in batch.

The page is composed as follows:

  1. The number of retrieved transcript contigs.
  2. The table with transcript contigs: each row contains the NFINtbID, length, gene symbol, annotation and e-value of the annotation. NFINtbIDs are linked to the individual transcript contig pages, and gene symbols are linked to the NCBI Gene database. Note that, for technical reasons, the e-values of unannotated transcript contigs are set to 1.
  3. The navigation lets you go to the next page, set the number of transcript contigs per page and change the order in which transcript contigs are listed. Sort criteria are NFINtbID, length, gene, annotation and e-value; 'asc' means ascending and 'desc' means descending.
  4. Tools let you retrieve the set of transcript contigs as a sequence file in FASTA format or as a list file in CSV format. Further tools, not yet activated, include the download of gene expression data, the download of corresponding zebrafish, medaka, stickleback and Tetraodon orthologs, and basic functional gene enrichment analysis via Babelomics.
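
Because unannotated transcript contigs carry an e-value of 1, an e-value cutoff applied to the CSV list file also removes unannotated contigs. A minimal sketch, assuming the exported file has a header row with columns named NFINtbID and evalue; these column names and the comma delimiter are assumptions, so check the header of the actual download before use.

```python
import csv
import io


def annotated_ids(csv_text, max_evalue=1e-5, delimiter=","):
    """Return NFINtbIDs whose annotation e-value is at most max_evalue.

    Unannotated contigs have an e-value of 1, so any cutoff below 1 drops
    them. Column names 'NFINtbID' and 'evalue' are hypothetical.
    """
    ids = []
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        if float(row["evalue"]) <= max_evalue:
            ids.append(row["NFINtbID"])
    return ids
```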

 


6. Sequence page

 


7. Blast page

[Screenshot of the BLAST page]

The BLAST server allows you to do similarity searches to find transcript contigs of interest.

Here are some general remarks for its use:

 


8. GO page


The GO page

 


9. Help page

 


10. Contact page

 


11. Methods table

Task | Program | Version | Arguments | Description

1. Read processing
Sanger quality clipping | Lucy | 1.19p | -error 0.02 0.015 -bracket 10 0.015 -window 50 0.08 10 0.3 -vector vector_db insert_sites | Trims low-quality regions and removes remaining vector sequences in Sanger reads.
Sanger and 454/Roche vector/contaminant/poly(A) removal | SeqClean | 1.0 | -n 5000 -l 80 -v vector_db -s contaminant_db -N -L | Trims additional vector sequences and poly(A) tails in Sanger and 454/Roche reads, and removes contaminant reads.
Sanger and 454/Roche low-complexity masking | SeqClean | 1.0 | -n 5000 -l 80 -A -M | Masks low-complexity repeats in Sanger and 454/Roche reads.
Sanger and 454/Roche repeat masking | RepeatMasker | 3.2.9 | -norna -xsmall -noint -qq -gccalc -lib nfurzeri_repeats | Masks complex repeats in Sanger and 454/Roche reads.
Solexa/Illumina read filtering | SGA | 0.9.12 | sga preprocess --phred64 -p 1 -f 20 --dust | Filters Solexa/Illumina reads for quality and low complexity.
Solexa/Illumina error correction | SGA | 0.9.12 | sga index reads; sga correct reads | Corrects obvious sequencing errors in Solexa/Illumina reads.
Solexa/Illumina duplicate read removal | SGA | 0.9.12 | sga index reads; sga filter --no-kmer-check reads | Removes exact duplicates in Solexa/Illumina reads.

2. Assembly
Sanger and 454/Roche assembly | PAVE | 2.0 | default cfg file | Assembles Sanger and 454/Roche reads.
Solexa/Illumina reads, PAVE contigs assembly | CLC Assembly Cell | 3.2.2 | -m 300 -p fb se 0 10000 | Assembles Solexa/Illumina reads onto PAVE contigs.
Solexa/Illumina/PAVE contigs reassembly | TGICL | 2.0 | -l 100 -O '-d 30000 -f 5000 -p 90' | Extends and joins contigs with relaxed parameters.
Solexa/Illumina/PAVE removal of redundant contigs | CD-HIT-EST | 4.5.5 | -T 8 -c 0.99 -aS 0.9 -n 10 -B 1 -M 0 | Removes highly redundant/overlapping contigs.

3. Annotation
BLAST contamination screen | WU-tBLASTx | 2.0MP-WashU | wordmask=seg W=4 T=999 hitdist=40 nogap V=25 B=25 | Transcript contigs are compared against a contamination database.
BLAST annotation | WU-BLASTx, WU-BLASTn, WU-tBLASTx | 2.0MP-WashU | wordmask=seg lcmask W=4 T=20 (protein); M=1 N=-3 Q=3 R=3 W=15 wordmask=seg lcmask (nucleotide) | Transcript contigs are compared against several protein and nucleotide databases for annotation.
Open reading frame (ORF) prediction | prot4EST | 2.0 | BLASTx against NCBI nr, ESTScan with N. furzeri matrix, longest ORF | Determines the ORFs in the transcript contigs, which are translated into proteins.
Protein domain prediction | HMMER | 3.0 | default parameters | Searches translated ORFs for conserved protein domains.
Orthologs in other fish species | WU-BLASTx, WU-tBLASTn | 2.0MP-WashU | wordmask=seg lcmask W=4 T=20; wordmask=seg W=4 T=20 | Transcript contigs are compared against the proteins of other fish, and fish proteins are compared against transcript contigs. Best bidirectional hits are calculated.
Detection of paralogs | WU-BLASTx | 2.0MP-WashU | wordmask=seg lcmask W=4 T=20 | Transcript contigs are compared against the proteins of other fish. The associated gene symbols of the proteins are used to fetch
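
The best bidirectional hit (reciprocal best hit) step for orthologs can be illustrated with the sketch below. It assumes the BLASTx (contig against fish protein) and tBLASTn (fish protein against contig) results have already been parsed into (query, subject, score) tuples; ranking by bit score and keeping the first hit on ties are assumptions, not the pipeline's documented rules.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    On equal scores the first hit seen is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_bidirectional_hits(forward, reverse):
    """Return sorted (contig, protein) pairs that are each other's best hit.

    forward: contig -> fish protein hits (e.g. parsed BLASTx output)
    reverse: fish protein -> contig hits (e.g. parsed tBLASTn output)
    """
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((contig, protein) for contig, protein in fwd.items()
                  if rev.get(protein) == contig)
```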

 


12. Data resources

The sequencing data and the assembly are available online in several databases. The NCBI BioProject page "Sequencing and assembly of the Nothobranchius furzeri transcriptome" (PRJNA85613) provides a good overview and starting point. The 454/Roche and Solexa/Illumina data are also summarised in the NCBI SRA under study accession SRP010930.

Individual libraries in public databases:

Transcriptome Shotgun Assembly (TSA)/GenBank:

Note that TSA does not allow stretches of more than 14 Ns. Therefore, transcript contigs containing such stretches were split, and the parts were numbered accordingly. Moreover, since TSA only accepts contigs > 200 bp, some parts may not have been submitted at all.
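
The splitting rule described above can be sketched as follows. This is a minimal illustration using the thresholds stated in the text (runs of more than 14 Ns are not allowed, and only parts longer than 200 bp are kept); the '_<number>' suffix for the parts is a hypothetical naming scheme, since the actual numbering format is not given here.

```python
import re


def split_for_tsa(contig_id, seq, max_n_run=14, min_len=200):
    """Split a contig at runs of more than max_n_run Ns and drop short parts.

    Parts are numbered by their position in the original contig, so a dropped
    part leaves a gap in the numbering. The '_<number>' suffix is hypothetical.
    """
    parts = re.split("[Nn]{%d,}" % (max_n_run + 1), seq)
    if len(parts) == 1:  # no disallowed N stretch, nothing to split
        return [(contig_id, seq)] if len(seq) > min_len else []
    return [
        (f"{contig_id}_{number}", part)
        for number, part in enumerate(parts, start=1)
        if len(part) > min_len
    ]
```

For instance, a contig made of 300 bp, a 20 N stretch, 100 bp, a 15 N stretch and 250 bp yields parts 1 and 3; the 100 bp part 2 is dropped.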

 


13. How to cite

Please cite the following paper if you use data from this browser:

Petzold A, Reichwald K, Hartmann N, Groth M, Taudien S, Shagin D, Priebe S, Englert C, Platzer M: The transcript catalogue of the short-lived fish Nothobranchius furzeri provides insights into age-dependent changes of mRNA levels. BMC Genomics 2013, 14:185. doi: 10.1186/1471-2164-14-185.
(http://www.ncbi.nlm.nih.gov/pubmed/23496936)