Introduction to RNA-seq and functional interpretation 2024
Next steps in gene prioritisation
This page has links to materials and example data used during the "Next steps in gene prioritisation" session of the Introduction to RNA-seq and functional interpretation course run in 2024 at the European Bioinformatics Institute.
Slides
Exercises
Data
Before starting the exercises, copy all of the data from "penelopeCloud" to your home directory or download the data from this page.
Experiment:
Zebrafish were exposed to amphetamine, nicotine or oxycodone from 24 hours post fertilisation to 5 days post fertilisation (dpf) and behavioural assays were performed on the larvae. At 5 dpf, 6 samples, each consisting of a pool of 6-7 larvae, were collected for each of the three exposure conditions and for the unexposed controls. In total, 24 samples were collected, although two later failed QC and were excluded from the analysis.
RNA was extracted and sequencing libraries were made using Illumina’s TruSeq Stranded mRNA kit. They were sequenced on one lane of NovaSeq SP PE50, resulting in 16-24 million reads per sample. The reads were aligned to the GRCz11 reference genome with STAR and differentially expressed genes were determined with DESeq2.
Each of the 22 samples has a name like "Cnt_3", where "Cnt" indicates a control sample (the others being "Amp", "Nic" and "Oxy") and 3 is a number indicating the replicate.
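The naming scheme above can be split into its two parts programmatically. The following is a minimal sketch, not part of the course materials; the function name `parse_sample_name` and the `TREATMENTS` lookup are hypothetical and only assume that every sample name has the form prefix, underscore, replicate number.

```python
import re

# Treatment prefixes taken from the sample naming scheme described above.
# The dictionary name and the long-form labels are illustrative choices.
TREATMENTS = {
    "Cnt": "control",
    "Amp": "amphetamine",
    "Nic": "nicotine",
    "Oxy": "oxycodone",
}


def parse_sample_name(name):
    """Split a sample name like 'Cnt_3' into (treatment, replicate)."""
    match = re.fullmatch(r"(Cnt|Amp|Nic|Oxy)_(\d+)", name)
    if match is None:
        raise ValueError(f"Unrecognised sample name: {name!r}")
    return TREATMENTS[match.group(1)], int(match.group(2))


print(parse_sample_name("Cnt_3"))  # ('control', 3)
```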
Downloads:
| Treatment | Differentially Expressed Genes | All Metadata (including counts) | Just Gene Annotation | Just Ensembl Stable ID |
|---|---|---|---|---|
| Amphetamine | All (irrespective of adjusted p-value) | Amp.counts.all.tsv | Amp.annotation.all.tsv | Amp.id.all.tsv |
| Amphetamine | Significant (adjusted p-value <= 0.05) | Amp.counts.sig.tsv | Amp.annotation.sig.tsv | Amp.id.sig.tsv |
| Nicotine | All (irrespective of adjusted p-value) | Nic.counts.all.tsv | Nic.annotation.all.tsv | Nic.id.all.tsv |
| Nicotine | Significant (adjusted p-value <= 0.05) | Nic.counts.sig.tsv | Nic.annotation.sig.tsv | Nic.id.sig.tsv |
| Oxycodone | All (irrespective of adjusted p-value) | Oxy.counts.all.tsv | Oxy.annotation.all.tsv | Oxy.id.all.tsv |
| Oxycodone | Significant (adjusted p-value <= 0.05) | Oxy.counts.sig.tsv | Oxy.annotation.sig.tsv | Oxy.id.sig.tsv |
Headings:
The column headings are:
| Column | Heading | Description |
|---|---|---|
| 1 | Gene | Ensembl ID |
| 2 | pval | p-value |
| 3 | adjp | Adjusted p-value |
| 4 | log2fc | Log2 fold change |
| 5 | Chr | Chromosome (or scaffold) name |
| 6 | Start | Gene start (in bp) |
| 7 | End | Gene end (in bp) |
| 8 | Strand | Gene strand (1 or -1) |
| 9 | Biotype | Gene biotype (e.g. protein coding or lincRNA) |
| 10 | Name | Gene name |
| 11 | Description | Gene description |
| 12 | Cnt_1 count | Counts for 1st control replicate |
| 13 | Cnt_2 count | Counts for 2nd control replicate |
| ... | ... | ... |
| 23 | Cnt_1 normalised count | Normalised counts for 1st control replicate |
| 24 | Cnt_2 normalised count | Normalised counts for 2nd control replicate |
| ... | ... | ... |
The "counts" files contain all of the above headings. The "annotation" files just contain the first 11 columns. The "id" files just contain the first column.
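As an illustration of how these files relate to each other, the sketch below reads a "counts" or "annotation" file and keeps the rows that pass the adjusted p-value cut-off used for the "sig" files. It is a minimal example under stated assumptions, not the method used to produce the downloads: it assumes the files are tab-separated with the column headings above on the first line, and the function names `significant_genes` and `ensembl_ids` are hypothetical.

```python
import csv


def significant_genes(handle, threshold=0.05):
    """Return rows (as dicts) whose adjusted p-value is <= threshold.

    Assumes a tab-separated file whose first line holds the column
    headings listed above ('Gene', 'pval', 'adjp', ...).
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            adjp = float(row["adjp"])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric values such as 'NA'
        if adjp <= threshold:
            rows.append(row)
    return rows


def ensembl_ids(rows):
    """Return the first column ('Gene', the Ensembl ID) of each row."""
    return [row["Gene"] for row in rows]
```

For example, opening `Amp.counts.all.tsv` with `open("Amp.counts.all.tsv", newline="")` and passing the handle to `significant_genes` should, if the assumptions above hold, give the same genes as `Amp.counts.sig.tsv`, and `ensembl_ids` would then give the contents of the corresponding "id" file.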