NFINtb - Help

NFINtb Nothobranchius furzeri transcriptome browser

Welcome to the NFINtb Nothobranchius furzeri transcriptome browser

The NFINtb Nothobranchius furzeri Information Network transcriptome browser is a database of cDNA contig assemblies of the short-lived teleost fish Nothobranchius furzeri supplemented with the best currently available annotation.

1. Introduction

The NFINtb Nothobranchius furzeri transcriptome browser provides access to the annotated transcript catalogue, including detailed BLAST results, predicted protein domains, associated gene ontology (GO) terms, microsatellites and gene expression data. Further information is given on the sequenced libraries, the distribution of contig lengths, annotation numbers, and GO terms with their corresponding transcript contigs. A query mask allows keyword searches (e.g. by annotation, gene symbol or GO category) and BLAST searches by sequence similarity. N. furzeri sequences, including annotation details, can be retrieved by batch download.

2. Start page

This page is shown when you enter the NFINtb Nothobranchius furzeri transcriptome browser or when you click on the logo.

1. First, choose the catalogue of interest. A table of all assembled catalogues is given further below.
2. Then, click on the links in the link bar to get the information you are interested in. A more detailed explanation of each page follows.
3. Finally, use the quick search field to search either by NFINtbID or by gene symbol. If you search by gene symbol, tick the check box to get only the best (longest) transcript contig per gene.
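
Conceptually, the gene-symbol check box keeps one transcript contig per gene. The following is a minimal sketch, not the browser's code. It assumes each contig is a hypothetical record with an ID, a length and an optional gene symbol, and that "best" simply means "longest", as stated above. The example IDs are made up.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Contig(NamedTuple):
    nfintb_id: str              # hypothetical IDs in the example below
    length: int                 # contig length in bp
    gene_symbol: Optional[str]  # None for contigs without a gene symbol


def best_contig_per_gene(contigs: Iterable[Contig]) -> Dict[str, Contig]:
    """Keep only the longest transcript contig for each gene symbol."""
    best: Dict[str, Contig] = {}
    for contig in contigs:
        if contig.gene_symbol is None:
            continue  # nothing to group by
        # Assumption: gene symbols are compared case-insensitively.
        key = contig.gene_symbol.lower()
        # Strict '>' keeps the first contig seen when two are equally long.
        if key not in best or contig.length > best[key].length:
            best[key] = contig
    return best


if __name__ == "__main__":
    example = [
        Contig("contig_A", 850, "tp53"),
        Contig("contig_B", 2310, "tp53"),
        Contig("contig_C", 1200, "sirt1"),
        Contig("contig_D", 400, None),
    ]
    for symbol, contig in sorted(best_contig_per_gene(example).items()):
        print(symbol, contig.nfintb_id, contig.length)
```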

3. Statistics page

The statistics page shows general information about the transcript catalogue.

This includes:

Information about assembled sequencing data and libraries
Transcript contig length distribution
BLAST, protein domain prediction and annotation numbers
Detected SNPs (single nucleotide polymorphisms) and SSRs (simple sequence repeats)
Information about possible contamination

4. Query page

The query page provides an interface for complex queries to search the transcript catalogue.

4.1 Transcript contigs can be searched by:

NFINtbID
Minimum and maximum length
Gene symbol
Annotation (Text search)
Annotation status - yes/no
Show only the best transcript contig per annotated protein
Show only the best transcript contig per annotated gene
Minimum coding sequence coverage in percent
Search BLAST results (Text search)
GOslim annotation in one of the three ontologies/descriptions
Search protein domain predictions (Text search)
Predicted miRNA homology
SNP ID
SNP type
SSR ID or type
Contamination status

Fields marked with 'Text search' support advanced search strategies such as wildcards, pattern grouping and logical operators. The rules are explained below:

4.2 Text search means that:

the search is always case insensitive
all words must be longer than 3 characters
words that are too common like 'the' or 'some' are ignored
words are matched only as complete words
use '*' for incomplete matches (Example: Pol*)
use search operators for inexact searches

4.3 Search Operators include:

+: A leading plus sign indicates that this word MUST be present in a result. Example: +protein +kinase
-: A leading minus sign indicates that the word MUST NOT be present in a result. Example: +protease -kinase
( ): Parentheses group words to look for several possibilities. Example: (RNA|DNA) polymerase
*: Use the '*' operator to look for truncated words. Example: Protea*
" ": A phrase that is enclosed within double quote (") characters matches only results that contain the phrase literally. Example: "this is an example"

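The rules in 4.2 and 4.3 resemble the boolean full-text search offered by common database engines. As a rough illustration only, the sketch below implements a simplified matcher for a single annotation text. It covers case-insensitive whole-word matching, a minimum word length of four characters, an ignored-word list, '*' truncation, the '+' and '-' operators, and quoted phrases. The stop-word list is hypothetical, and grouping with parentheses is not modelled. This is not the browser's actual search engine.

```python
import re
from typing import List, Tuple

# Hypothetical, abbreviated stop-word list; the server's real list is not given here.
STOPWORDS = {"the", "some", "this", "that", "with", "from"}
MIN_WORD_LEN = 4  # "all words must be longer than 3 characters"

# An optional +/- operator followed by a "quoted phrase" or a plain word.
TOKEN_RE = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def words(text: str) -> List[str]:
    """Split text into lower-case alphanumeric words (case-insensitive search)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse(query: str) -> List[Tuple[str, str, bool]]:
    """Return (operator, term, is_phrase) triples, skipping ignored words."""
    terms = []
    for op, phrase, word in TOKEN_RE.findall(query):
        if phrase:
            terms.append((op, phrase, True))
            continue
        word = word.lower()
        # Assumption: truncated terms such as 'pol*' are exempt from the length
        # rule, so that the documented 'Pol*' example still works.
        if not word.endswith("*") and (len(word) < MIN_WORD_LEN or word in STOPWORDS):
            continue
        terms.append((op, word, False))
    return terms


def term_hit(term: str, text_words: List[str], is_phrase: bool) -> bool:
    if is_phrase:
        # A phrase must occur literally, i.e. as a contiguous run of words.
        pw = words(term)
        n = len(pw)
        return n > 0 and any(text_words[i:i + n] == pw
                             for i in range(len(text_words) - n + 1))
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in text_words)  # truncated word
    return term in text_words  # complete-word match only


def matches(query: str, text: str) -> bool:
    """Simplified boolean matching; parentheses/grouping are not modelled."""
    text_words = words(text)
    required, optional = [], []
    for op, term, is_phrase in parse(query):
        hit = term_hit(term, text_words, is_phrase)
        if op == "-" and hit:
            return False  # a word that MUST NOT be present was found
        if op == "+":
            required.append(hit)
        elif op == "":
            optional.append(hit)
    return all(required) if required else any(optional)
```

When a query has no '+' term, this sketch accepts a text that contains at least one of the remaining words. That behaviour is an assumption borrowed from common boolean full-text engines.
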
5. Transcript table page

This page summarises the results of a search. It shows the matching transcript contigs as a table and provides tools to retrieve them in batch.

The page is composed as follows:

The number of retrieved transcript contigs.
The table of transcript contigs: each row contains the NFINtbID, length, gene symbol, annotation and e-value of the annotation. NFINtbIDs are linked to the individual transcript contig pages, and gene symbols are linked to the NCBI Gene database. Note that, for technical reasons, the e-value of unannotated transcript contigs is set to 1.
The navigation allows you to go to the next page, set the number of transcript contigs per page and change the order in which transcript contigs are listed in the table. Sort criteria are NFINtbID, length, gene, annotation and e-value. 'Asc' means ascending and 'desc' means descending.
Tools allow you to retrieve the set of transcript contigs as a sequence file in FASTA format or as a list file in CSV format. Further tools, not yet activated, include the download of gene expression data, the download of corresponding zebrafish, medaka, stickleback and tetraodon orthologs, and basic functional gene enrichment analysis via Babelomics.
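
If you already hold the table data, the sorting and batch-download steps can be approximated offline. The minimal sketch below assumes a hypothetical record type with an ID, sequence, gene symbol, annotation and optional e-value. It mirrors the sort criteria and the convention that unannotated contigs have an e-value of 1. The CSV column names and the 60-character FASTA line width are assumptions, not the browser's documented export format.

```python
import csv
import io
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    """One line of the transcript table (field names are hypothetical)."""
    nfintb_id: str
    sequence: str
    gene_symbol: Optional[str]
    annotation: Optional[str]
    evalue: Optional[float]  # None for unannotated contigs


def effective_evalue(row: Row) -> float:
    # Unannotated contigs are treated as having an e-value of 1.
    return row.evalue if row.evalue is not None else 1.0


def sort_rows(rows: List[Row], by: str = "evalue", descending: bool = False) -> List[Row]:
    """Sort like the 'asc'/'desc' navigation (only three criteria shown)."""
    keys = {
        "id": lambda r: r.nfintb_id,
        "length": lambda r: len(r.sequence),
        "evalue": effective_evalue,
    }
    return sorted(rows, key=keys[by], reverse=descending)


def to_fasta(rows: List[Row], width: int = 60) -> str:
    """Write one FASTA record per contig, wrapping sequences at `width`."""
    lines = []
    for r in rows:
        lines.append(">" + r.nfintb_id)
        lines.extend(r.sequence[i:i + width] for i in range(0, len(r.sequence), width))
    return "\n".join(lines) + "\n"


def to_csv(rows: List[Row]) -> str:
    """Write the table columns (ID, length, gene, annotation, e-value) as CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["NFINtbID", "length", "gene_symbol", "annotation", "evalue"])
    for r in rows:
        writer.writerow([r.nfintb_id, len(r.sequence), r.gene_symbol or "",
                         r.annotation or "", effective_evalue(r)])
    return buf.getvalue()
```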

6. Sequence page

7. Blast page

The BLAST server allows you to do similarity searches to find transcript contigs of interest.

Here are some general remarks for its use:

the query sequence can either be copy-and-pasted or uploaded as a file
the search can be restricted to a subsequence of the query
the BLAST server also contains parameter sets optimised for different searches - it will choose the appropriate set depending on your input
additional parameters can be set below - but will be overridden by the optimised parameter sets above
the search can be made more stringent by choosing a smaller e-value (Expect)
the default output is a pairwise alignment, but other formats are also available
to import the results into Excel, choose the format "Hit Table" and load the table as a CSV file
limit the number of hits and alignments to avoid large BLAST reports
IMPORTANT: this is only a small-scale BLAST server - please blast only one sequence at a time - longer searches will be terminated by timeout
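
The hit table can also be processed in a script instead of Excel. This is a minimal sketch. It assumes the export uses the 12 standard BLAST tabular columns in this order: query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value and bit score. It also assumes the fields are comma- or tab-separated and that comment lines start with '#'. Check the header of your own export before relying on that column order. The e-value cutoff is an arbitrary example.

```python
import csv
import io
from typing import List, NamedTuple

# Assumed column order: the 12 standard BLAST tabular fields.
COLUMNS = ["query", "subject", "pct_identity", "aln_length", "mismatches", "gap_opens",
           "q_start", "q_end", "s_start", "s_end", "evalue", "bitscore"]


class Hit(NamedTuple):
    query: str
    subject: str
    pct_identity: float
    aln_length: int
    evalue: float
    bitscore: float


def parse_hit_table(text: str) -> List[Hit]:
    """Parse a comma- or tab-separated BLAST hit table, skipping '#' comments."""
    delimiter = "\t" if "\t" in text else ","
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row or row[0].startswith("#"):
            continue
        if len(row) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}")
        rec = dict(zip(COLUMNS, (field.strip() for field in row)))
        hits.append(Hit(rec["query"], rec["subject"], float(rec["pct_identity"]),
                        int(rec["aln_length"]), float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def significant(hits: List[Hit], max_evalue: float = 1e-10) -> List[Hit]:
    """Keep hits at or below an e-value cutoff, smallest e-value first."""
    return sorted((h for h in hits if h.evalue <= max_evalue), key=lambda h: h.evalue)
```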

8. GO page

The GO page

9. Help page

10. Contact page

11. Methods table

Task	Program	Version	Arguments	Description
1. Read processing
Sanger quality clipping	Lucy	1.19p	-error 0.02 0.015 -bracket 10 0.015 -window 50 0.08 10 0.3 -vector vector_db insert_sites	Trims low-quality regions and removes remaining vector sequences in Sanger reads.
Sanger and 454/Roche vector/contaminant/poly(A) removal	SeqClean	1.0	-n 5000 -l 80 -v vector_db -s contaminant_db -N -L	Trims additional vector sequences and poly(A) tails in Sanger and 454/Roche reads, and removes contaminant reads.
Sanger and 454/Roche low-complexity masking	SeqClean	1.0	-n 5000 -l 80 -A -M	Masks low-complexity repeats in Sanger and 454/Roche reads.
Sanger and 454/Roche repeat masking	RepeatMasker	3.2.9	-norna -xsmall -noint -qq -gccalc -lib nfurzeri_repeats	Masks complex repeats in Sanger and 454/Roche reads.
Solexa/Illumina read filtering	SGA	0.9.12	sga preprocess --phred64 -p 1 -f 20 --dust	Filters Solexa/Illumina reads for quality and low complexity.
Solexa/Illumina error correction	SGA	0.9.12	sga index reads;sga correct reads;	Corrects obvious sequencing errors in Solexa/Illumina reads.
Solexa/Illumina duplicate read removal	SGA	0.9.12	sga index reads;sga filter --no-kmer-check reads;	Removes exact duplicates in Solexa/Illumina reads.
2. Assembly
Sanger and 454/Roche assembly	PAVE	2.0	default cfg file	Assembly of Sanger and 454/Roche reads.
Solexa/Illumina reads, PAVE contigs assembly	CLC Assembly Cell	3.2.2	-m 300 -p fb se 0 10000	Assembly of Solexa/Illumina reads onto PAVE contigs.
Solexa/Illumina/PAVE contigs reassembly	TGICL	2.0	-l 100 -O '-d 30000 -f 5000 -p 90'	Extends and joins contigs with relaxed parameters.
Solexa/Illumina/PAVE - removal of redundant contigs	CD-HIT-EST	4.5.5	-T 8 -c 0.99 -aS 0.9 -n 10 -B 1 -M 0	Removes highly redundant/overlapping contigs.
3. Annotation
BLAST Contamination screen	WU-tBLASTx	2.0MP-WashU	wordmask=seg W=4 T=999 hitdist=40 nogap V=25 B=25	Transcript contigs are compared against a contamination database.
BLAST annotation	WU-BLASTx,WU-BLASTn,WU-tBLASTx	2.0MP-WashU	wordmask=seg lcmask W=4 T=20 (protein);M=1 N=-3 Q=3 R=3 W=15 wordmask=seg lcmask (nucleotide)	Transcript contigs are compared against several protein and nucleotide databases for annotation.
Open reading frame (ORF) prediction	prot4EST	2.0	BLASTx against NCBI nr, ESTScan with N.furzeri matrix, longest ORF	Determines the ORFs in the transcript contigs, which are then translated into proteins.
Protein domain prediction	HMMER	3.0	default parameters	Searches translated ORF for conserved protein domains.
Orthologs in other fish species	WU-BLASTx,WU-tBLASTn	2.0MP-WashU	wordmask=seg lcmask W=4 T=20;wordmask=seg W=4 T=20	Transcript contigs are compared against proteins of other fish. Fish proteins are compared against transcript contigs. Best bidirectional hits are calculated.
Detection of paralogs	WU-BLASTx	2.0MP-WashU	wordmask=seg lcmask W=4 T=20	Transcript contigs are compared against other fish's proteins. The associated gene symbols of the proteins are used to fetch
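
The best-bidirectional-hit step used for orthologs can be illustrated as follows. This is a minimal sketch under stated assumptions. Each BLAST direction is taken as already reduced to (query, subject, bit score) triples, and "best" means the highest bit score, with the first hit winning on ties. The table does not state which score or tie-breaking rule the pipeline actually used, and the IDs in the test data are made up.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # (query ID, subject ID, bit score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first one wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_bidirectional_hits(contig_vs_protein: Iterable[Hit],
                            protein_vs_contig: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Pairs (contig, protein) that are each other's best hit in both searches."""
    forward = best_hits(contig_vs_protein)  # e.g. BLASTx: contig -> fish protein
    reverse = best_hits(protein_vs_contig)  # e.g. tBLASTn: fish protein -> contig
    return sorted((c, p) for c, p in forward.items() if reverse.get(p) == c)
```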

12. Data resources

Sequencing data and the assembly are available online in several public databases. The NCBI BioProject page "Sequencing and assembly of the Nothobranchius furzeri transcriptome" (PRJNA85613) provides a good overview and starting point. 454/Roche and Solexa/Illumina data are also summarised in the NCBI SRA, study accession SRP010930.

Individual libraries in public databases:

Sanger libraries, NCBI dbEST:
1. JZ200028 - JZ330399. Please note that dbEST only accepts processed high-quality sequences (no raw data). Therefore, the number of submitted sequences is lower than that given in the manuscript's Table
454/Roche libraries, NCBI SRA:
Solexa/Illumina libraries, NCBI SRA:
1. SRX120894
2. SRX120895, SRX120896
3. SRX120897, SRX120898
4. SRX121034
5. SRX121035
6. SRX121036
7. SRX121037
8. SRX121038
9. SRX121039

Transcriptome Shotgun Assembly (TSA)/GenBank:

assembly 2011/09: GAIB01000000 (GAIB01000001 - GAIB01210031)

Note that TSA does not allow stretches of Ns > 14. Therefore, such transcript contigs were split at these stretches, and the parts were numbered accordingly. Moreover, since TSA only accepts contigs > 200 bp, some parts may not have been submitted at all.
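
The splitting rule can be sketched as follows. This is a minimal illustration that assumes only the two thresholds stated above. The suffix scheme `_1`, `_2`, ... is hypothetical, because the names actually used for the split parts in TSA are not given here. Whether a dropped part still consumes a number is also an assumption.

```python
import re
from typing import List, Tuple

MAX_N_RUN = 14    # TSA does not allow stretches of Ns longer than 14
MIN_LENGTH = 200  # TSA only accepts contigs longer than 200 bp

# Any run of 15 or more Ns is treated as a split point.
LONG_N_RUN = re.compile("[Nn]{%d,}" % (MAX_N_RUN + 1))


def split_for_tsa(contig_id: str, seq: str) -> List[Tuple[str, str]]:
    """Split a contig at long N runs and drop parts that are too short."""
    parts = LONG_N_RUN.split(seq)
    if len(parts) == 1:
        return [(contig_id, seq)] if len(seq) > MIN_LENGTH else []
    result = []
    # Hypothetical naming: parts keep their position number even if a
    # neighbouring part is dropped for being too short.
    for number, part in enumerate(parts, start=1):
        if len(part) > MIN_LENGTH:
            result.append((f"{contig_id}_{number}", part))
    return result
```

Shorter runs of Ns, up to 14, stay inside a part unchanged. Only the longer runs are removed.
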

13. How to cite

Please cite the following paper if you use data from this browser:

Petzold A, Reichwald K, Hartmann N, Groth M, Taudien S, Shagin D, Priebe S, Englert C, Platzer M: The transcript catalogue of the short-lived fish Nothobranchius furzeri provides insights into age-dependent changes of mRNA levels. BMC Genomics 2013, 14:185. doi: 10.1186/1471-2164-14-185.
(http://www.ncbi.nlm.nih.gov/pubmed/23496936)

Beutenbergstraße 11
D-07745 Jena • Germany

Phone: +49 3641 65-6000
Fax: +49 3641 65-6351

E-mail: info@leibniz-fli.de
www.leibniz-fli.de
