Introduction to RNA-seq and functional interpretation 2024

Next steps in gene prioritisation

This page has links to materials and example data used during the "Next steps in gene prioritisation" session of the Introduction to RNA-seq and functional interpretation course run in 2024 at the European Bioinformatics Institute.

Slides

next-steps-in-gene-prioritisation.pdf

Exercises

next-steps-in-gene-prioritisation-exercises.pdf

Data

Before starting the exercises, copy all of the data from "penelopeCloud" to your home directory or download the data from this page.

Experiment:

Zebrafish were exposed to amphetamine, nicotine or oxycodone from 24 hours post fertilisation to 5 days post fertilisation (dpf), and behavioural assays were performed on the larvae. At 5 dpf, 6 samples, each consisting of a pool of 6-7 embryos, were collected for each of the three treatments and for unexposed controls. In total, 24 samples were collected, although two later failed QC and were excluded from the analysis, leaving 22.

RNA was extracted and sequencing libraries were made using Illumina’s TruSeq Stranded mRNA kit. The libraries were sequenced on one lane of NovaSeq SP PE50, resulting in 16-24 million reads per sample. The reads were aligned to the GRCz11 reference genome with STAR, and differentially expressed genes were identified with DESeq2.

Each of the 22 samples has a name like "Cnt_3", where "Cnt" indicates a control sample (the others being "Amp", "Nic" and "Oxy") and 3 is a number indicating the replicate.
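
The naming scheme can be split programmatically when grouping samples by condition. The following is a minimal Python sketch, assuming every sample name is one of the four prefixes followed by an underscore and a replicate number; the function name `parse_sample_name` and the lookup table are illustrative and not part of the course materials.

```python
import re

# Prefixes described in the text above, mapped to the condition they indicate.
CONDITIONS = {
    "Cnt": "control",
    "Amp": "amphetamine",
    "Nic": "nicotine",
    "Oxy": "oxycodone",
}

# Assumed pattern: prefix, underscore, replicate number (e.g. "Cnt_3").
SAMPLE_NAME = re.compile(r"^(Cnt|Amp|Nic|Oxy)_(\d+)$")


def parse_sample_name(name):
    """Split a sample name such as 'Cnt_3' into (condition, replicate)."""
    match = SAMPLE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised sample name: {name!r}")
    prefix, replicate = match.groups()
    return CONDITIONS[prefix], int(replicate)


print(parse_sample_name("Cnt_3"))  # ('control', 3)
```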

Downloads:

Treatment	Differentially Expressed Genes	All Metadata (including counts)	Just Gene Annotation	Just Ensembl Stable ID
Amphetamine	All (irrespective of adjusted p-value)	Amp.counts.all.tsv	Amp.annotation.all.tsv	Amp.id.all.tsv
Amphetamine	Significant (adjusted p-value <= 0.05)	Amp.counts.sig.tsv	Amp.annotation.sig.tsv	Amp.id.sig.tsv
Nicotine	All (irrespective of adjusted p-value)	Nic.counts.all.tsv	Nic.annotation.all.tsv	Nic.id.all.tsv
Nicotine	Significant (adjusted p-value <= 0.05)	Nic.counts.sig.tsv	Nic.annotation.sig.tsv	Nic.id.sig.tsv
Oxycodone	All (irrespective of adjusted p-value)	Oxy.counts.all.tsv	Oxy.annotation.all.tsv	Oxy.id.all.tsv
Oxycodone	Significant (adjusted p-value <= 0.05)	Oxy.counts.sig.tsv	Oxy.annotation.sig.tsv	Oxy.id.sig.tsv
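
The file names in the table follow a treatment.content.subset.tsv pattern, so a copied or downloaded data directory can be checked for completeness. This is a rough Python sketch under that assumption; the helper names `expected_files` and `missing_files` are made up for illustration, and the directory you pass in depends on where you saved the data.

```python
from itertools import product
from pathlib import Path

# The three parts of each file name, as they appear in the table above.
TREATMENTS = ("Amp", "Nic", "Oxy")
SUBSETS = ("all", "sig")
CONTENTS = ("counts", "annotation", "id")


def expected_files():
    """List the 18 file names in the same order as the table."""
    return [
        f"{treatment}.{content}.{subset}.tsv"
        for treatment, subset, content in product(TREATMENTS, SUBSETS, CONTENTS)
    ]


def missing_files(directory):
    """Return the expected file names that are not present in directory."""
    directory = Path(directory)
    return [name for name in expected_files() if not (directory / name).is_file()]


# Check the current directory; an empty list means every file is present.
print(missing_files("."))
```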

Headings:

The column headings are:

1	Gene	Ensembl ID
2	pval	p-value
3	adjp	Adjusted p-value
4	log2fc	Log2 fold change
5	Chr	Chromosome (or scaffold) name
6	Start	Gene start (in bp)
7	End	Gene end (in bp)
8	Strand	Gene strand (1 or -1)
9	Biotype	Gene biotype (e.g. protein coding or lincRNA)
10	Name	Gene name
11	Description	Gene description
12	Cnt_1 count	Counts for 1st control replicate
13	Cnt_2 count	Counts for 2nd control replicate
...	...	...
23	Cnt_1 normalised count	Normalised counts for 1st control replicate
24	Cnt_2 normalised count	Normalised counts for 2nd control replicate
...	...	...

The "counts" files contain all of the above columns. The "annotation" files contain only the first 11 columns. The "id" files contain only the first column (the Ensembl ID).
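
To illustrate how the column layout relates the three file types, here is a minimal Python sketch that reads a tab-separated table, keeps rows with an adjusted p-value of at most 0.05, and pulls out the Ensembl IDs. It relies on column positions from the table above. Whether the files start with a header row is an assumption (controlled by `has_header`), the function names are illustrative, and the toy rows are invented placeholders truncated to the first four columns, not real results.

```python
import csv
import io

ADJP_COLUMN = 3          # column 3 above: adjusted p-value
ANNOTATION_COLUMNS = 11  # the "annotation" files keep columns 1-11


def read_rows(handle, has_header=True):
    """Return (header, rows) from a tab-separated file handle."""
    rows = list(csv.reader(handle, delimiter="\t"))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


def significant(rows, threshold=0.05):
    """Keep rows whose adjusted p-value is <= threshold."""
    kept = []
    for row in rows:
        try:
            adjp = float(row[ADJP_COLUMN - 1])
        except ValueError:
            continue  # skip non-numeric entries such as "NA"
        if adjp <= threshold:
            kept.append(row)
    return kept


# Invented toy table (first four columns only); replace with e.g.
# open("Amp.counts.all.tsv", newline="") to read a real file.
toy = io.StringIO(
    "Gene\tpval\tadjp\tlog2fc\n"
    "toy_gene_1\t0.001\t0.01\t1.5\n"
    "toy_gene_2\t0.2\t0.6\t-0.3\n"
    "toy_gene_3\t0.5\tNA\t0.1\n"
)
header, rows = read_rows(toy)
sig = significant(rows)
annotation = [row[:ANNOTATION_COLUMNS] for row in sig]  # like an "annotation" file
ids = [row[0] for row in sig]                           # like an "id" file
print(ids)  # ['toy_gene_1']
```

Filtering an "all" file this way should give the same genes as the matching "sig" file, provided the same 0.05 threshold was used to produce it.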