Busch Lab

Introduction to RNA-seq and functional interpretation 2024

Next steps in gene prioritisation

This page has links to materials and example data used during the "Next steps in gene prioritisation" session of the Introduction to RNA-seq and functional interpretation course run in 2024 at the European Bioinformatics Institute.

Slides

Exercises

Data

Before starting the exercises, copy all of the data from "penelopeCloud" to your home directory or download the data from this page.

Experiment:

Zebrafish were exposed to amphetamine, nicotine or oxycodone from 24 hours post fertilisation to 5 days post fertilisation (dpf), and behavioural assays were performed on the larvae. At 5 dpf, 6 samples, each consisting of a pool of 6-7 embryos, were collected for each condition (the three treatments plus unexposed controls). In total, 24 samples were collected, although two later failed QC and were excluded from the analysis, leaving 22.

RNA was extracted and sequencing libraries were made using Illumina’s TruSeq Stranded mRNA kit. They were sequenced on one lane of NovaSeq SP PE50, resulting in 16-24 million reads per sample. The reads were aligned to the GRCz11 reference genome with STAR and differentially expressed genes were determined with DESeq2.

Each of the 22 samples has a name like "Cnt_3", where "Cnt" indicates a control sample (the others being "Amp", "Nic" and "Oxy") and 3 is a number indicating the replicate.
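The naming scheme can be unpacked programmatically, which is handy when grouping count columns by condition. The following is a minimal Python sketch; the function name `parse_sample_name` and the `CONDITIONS` lookup are illustrative and not part of the course materials.

```python
import re

# Prefixes as described above; the long names are chosen for this sketch.
CONDITIONS = {
    "Cnt": "control",
    "Amp": "amphetamine",
    "Nic": "nicotine",
    "Oxy": "oxycodone",
}


def parse_sample_name(name):
    """Split a sample name such as 'Cnt_3' into (condition, replicate)."""
    match = re.fullmatch(r"(Cnt|Amp|Nic|Oxy)_(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name!r}")
    prefix, replicate = match.groups()
    return CONDITIONS[prefix], int(replicate)


print(parse_sample_name("Cnt_3"))  # ('control', 3)
```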

Downloads:

| Treatment | Differentially Expressed Genes | All Metadata (including counts) | Just Gene Annotation | Just Ensembl Stable ID |
|---|---|---|---|---|
| Amphetamine | All (irrespective of adjusted p-value) | Amp.counts.all.tsv | Amp.annotation.all.tsv | Amp.id.all.tsv |
| Amphetamine | Significant (adjusted p-value <= 0.05) | Amp.counts.sig.tsv | Amp.annotation.sig.tsv | Amp.id.sig.tsv |
| Nicotine | All (irrespective of adjusted p-value) | Nic.counts.all.tsv | Nic.annotation.all.tsv | Nic.id.all.tsv |
| Nicotine | Significant (adjusted p-value <= 0.05) | Nic.counts.sig.tsv | Nic.annotation.sig.tsv | Nic.id.sig.tsv |
| Oxycodone | All (irrespective of adjusted p-value) | Oxy.counts.all.tsv | Oxy.annotation.all.tsv | Oxy.id.all.tsv |
| Oxycodone | Significant (adjusted p-value <= 0.05) | Oxy.counts.sig.tsv | Oxy.annotation.sig.tsv | Oxy.id.sig.tsv |

Headings:

The column headings are:

| Column | Heading | Description |
|---|---|---|
| 1 | Gene | Ensembl ID |
| 2 | pval | p-value |
| 3 | adjp | Adjusted p-value |
| 4 | log2fc | Log2 fold change |
| 5 | Chr | Chromosome (or scaffold) name |
| 6 | Start | Gene start (in bp) |
| 7 | End | Gene end (in bp) |
| 8 | Strand | Gene strand (1 or -1) |
| 9 | Biotype | Gene biotype (e.g. protein coding or lincRNA) |
| 10 | Name | Gene name |
| 11 | Description | Gene description |
| 12 | Cnt_1 count | Counts for 1st control replicate |
| 13 | Cnt_2 count | Counts for 2nd control replicate |
| ... | ... | ... |
| 23 | Cnt_1 normalised count | Normalised counts for 1st control replicate |
| 24 | Cnt_2 normalised count | Normalised counts for 2nd control replicate |
| ... | ... | ... |

The "counts" files contain all of the above headings. The "annotation" files just contain the first 11 columns. The "id" files just contain the first column.
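The relationship between the file types can be sketched in Python using only the standard library. The sketch assumes that the files are tab-separated with a header row using the headings listed above, and that missing adjusted p-values appear as a non-numeric string such as "NA"; neither is confirmed on this page. The rows are made up for illustration and are not taken from the course data, and the function names are hypothetical.

```python
import csv
import io

HEADER = ["Gene", "pval", "adjp", "log2fc", "Chr", "Start", "End", "Strand",
          "Biotype", "Name", "Description", "Cnt_1 count"]

# Made-up rows for illustration only; not real genes or results.
ROWS = [
    ["GENE_A", "0.0001", "0.01", "1.5", "1", "100", "900", "1",
     "protein_coding", "a", "example A", "10"],
    ["GENE_B", "0.2", "0.6", "-0.3", "2", "200", "800", "-1",
     "protein_coding", "b", "example B", "20"],
    ["GENE_C", "0.5", "NA", "0.1", "3", "300", "700", "1",
     "lincRNA", "c", "example C", "30"],
]
EXAMPLE_TSV = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def read_table(handle):
    """Read a tab-separated table and return (header, data rows)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    return header, list(reader)


def significant(header, rows, threshold=0.05):
    """Keep rows whose adjusted p-value is numeric and <= threshold."""
    column = header.index("adjp")
    kept = []
    for row in rows:
        try:
            adjp = float(row[column])
        except ValueError:
            continue  # skip non-numeric values such as "NA" (assumed)
        if adjp <= threshold:
            kept.append(row)
    return kept


def first_columns(rows, n):
    """Keep only the first n columns of every row."""
    return [row[:n] for row in rows]


header, rows = read_table(io.StringIO(EXAMPLE_TSV))
sig = significant(header, rows)          # like a "sig" file
annotation = first_columns(sig, 11)      # like an "annotation" file
ids = first_columns(sig, 1)              # like an "id" file
print(ids)  # [['GENE_A']]
```

To try this on a downloaded file, replace the `io.StringIO(EXAMPLE_TSV)` argument with a handle from `open("Amp.counts.all.tsv", newline="")`. The result should resemble the provided "sig" files, although an exact match is not guaranteed.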