Aim: Detect genes and pathways that are differentially expressed or spliced in response to disease in ATG7 patients.



1 Background

1.1 Patient metadata

Table: ATG7 patient metadata - ATG7_patient_metadata.csv The 5 patient of interest are clinically diagnosed with mitochondrial disease and genetically diagnosed with mutations in the gene ATG7. The table describes the families, patient ids and ATG7s for each patient.

load(here::here("results/aberrant_expression/OUTRIDER/50_controls/gene_count_ods.rda"))

# remove the technical replicate
gene_count_ods <- gene_count_ods[, colData(gene_count_ods)[["samp_id_tidy"]] != "S2557"]

which_ATG7 <- which(colData(gene_count_ods)[["gene_name"]] == "ATG7")
which_ATG7 <- colData(gene_count_ods)[["samp_id_tidy"]][which_ATG7] %>% 
  as.character()

ATG7_patient_metadata <- colData(gene_count_ods)[which_ATG7,] %>% 
  as_tibble() %>% 
  dplyr::select(samp_id_tidy, gene_name, mutation_type, mutation_hgvs)

# convert list cols to strings 
ATG7_patient_metadata$mutation_type <- 
  ATG7_patient_metadata$mutation_type %>% 
  lapply(FUN = str_c, collapse = ";") %>% 
  unlist()

ATG7_patient_metadata$mutation_hgvs <- 
  ATG7_patient_metadata$mutation_hgvs %>% 
  lapply(FUN = str_c, collapse = ";") %>% 
  unlist()

# add family data 
ATG7_patient_metadata <- 
  ATG7_patient_metadata %>% 
  mutate(family = c(2, 2, 3, 4, 1)) %>% 
  arrange(family) 

ATG7_patient_metadata
# save patient metadata
ATG7_patient_metadata %>%
  write_delim(here::here("results/tables_figures/ATG7_patient_metadata.csv"), delim = ",")

1.2 OUTRIDER

OUTRIDER is a bioinformatics software that aims at detecting aberrantly expressed genes from RNA-seq data. It is specifically designed for a rare disease context (1 vs all) by correcting for co-variations that are not known a-priori using an autoencoder to to help distinguish outliers more readily. OUTRIDER then applies a negative binomial test to find outlier expressed genes.

1.3 DESeq2

DESeq2 aims at finding differentially expressed genes by modelling reads using a negative binomial distribution (similar to OUTRIDER), then using a binomial test to ascertain significance.

1.4 dasper

dasper uses an outlier detection model to find aberrant splicing events. As input it uses junction (reads with a gapped alignment that signify the splicing out of an intron) counts and coverage in associated regions as features. Then for each patient using a 1 vs all method, compares these features to a set of controls.



2 Methods

All source code can be found at: https://github.com/dzhang32/ATG7_RNAseq

Patient and control samples

RNA-sequencing was performed on a total of 58 individuals. 5 of these were ATG7 patients detailed in Table: ATG7 patient metadata - ATG7_patient_metadata.csv. In particular, samples were obtained from patients M1856.17, M0920.18, M0921.18, M1111.19 and M1716.19 from families 1, 2, 2, 3 and 4 respectively. The remaining 53 samples were used as the control samples consisting of a cohort of 46 patients diagnosed with mitochondrial disease, 7 patients with congenital muscular dystrophy and 2 unaffected individuals.

Culturing fibroblasts

Fibroblast cell lines were obtained from 58 individuals were obtained through collaboration with Dr. Haiyan Zhou and the Lily Consortium consisting of Dr. Charu Deshpande, Dr. Ines Barbosa, Prof. Joanna Poulton, Prof. Michael Simpson, Prof. Robert McFarland, Dr. Robert Pitceathly and Prof. Robert Taylor. DMEM supplemented with 10% Fetal Bovine Serum and 0.05 g/ml Uridine was used for culturing cells. Harvesting cells was performed by detaching cells using TrypLE Enzyme, washing cells once with washed with DPBS before storage at -80°C.

RNA-sequencing

RNA from fibroblast pellets was either extracted and sequenced at Eurofins Genomics or extracted through Bioxpedia, then sequenced at UCL Genomics. RNA integrity numbers (RIN) were measured using Agilent Technologies 2100 Bioanalyzer or Agilent 4200 Tapestation by Eurofins/Bioxpedia respectively. All RIN values exceeded 8.0. RNA library preparation was performed using the INVIEW Transcriptome Discover protocol from Eurofins. Specifically, the cDNA library was prepared using a random primed, strand specific, poly-A selection protocol. Paired-end sequencing was performed on illumina NovaSeq 6000 Sequencing System machines to an coverage of ~100 million pair-end reads per sample. Reads of 150-bp length were used to increase the proportion of reads with a gapped alignment, which represent splicing events.

RNA-sequencing alignment and quality control of patient samples

Pre-alignment quality control including adapter trimming, read filtering and base correction was performed using fastp, an all-in-one FASTQ preprocessor (v0.20.0) (1 - fastp). Reads were aligned using STAR 2-pass (v2.7.0) to the hg38 build of the reference human genome (hg38) using gene annotation from Ensembl v97 (2 - STAR). 1st pass alignment was used to discover novel junctions, which were used as input to the 2nd pass to improve the sensitivity of junction detection. Via the option --outFilterMultimapNmax 1 reads were required to be uniquely mapped to only a single position in the genome. The minimum required overhang length of an annotated and unannotated junction was required to be 8 and 3 base pairs respectively. The output BAM files underwent post-alignment QC using RSeQC (v2.6.4) with all samples passing quality control after manual assessment (3 - RSeQC).

Detecting abberantly expressed genes and pathways

Gene count matrices were generated using RNA-SeQC (v2.3.4) with Ensembl v97 reference annotation matching the protocol used in https://github.com/broadinstitute/gtex-pipeline (4 - RNA-SeQC). OUTRIDER (v1.6.1) was used to find outlier expression of genes in each ATG7 patient, hence an all vs 1 experiment design was used, whereby each ATG7 patient were compared to the remaining 53 controls (5 - OUTRIDER). OUTRIDER was run with default setting and the autoencoder was fit using the optimal encoding dimension found to the be 12. P-values were corrected for multiple testing using the Benjamin Hochberg method and 0.05 was used as the signifance threshold. DESeq2 (v1.26.0) was applied to detect common genes or pathways that were dysregulated across all ATG7 patients (6 - DESeq2). Thus, all 5 ATG7 were grouped together as the case cohort and compared to the 53 control samples. The Benjamin Hochberg method was used for multiple test correction and genes with a p-value less than 0.05 and an absolute log-fold change greater than 1.5 were considered differentially expressed. gprofiler2 (v0.1.9) was then run to with significantly differentially expressed genes as input, ranked by p-value to find dysregulated pathways (7 - gprofiler2).

Detecting abberantly expressed splicing events

dasper (v0.99.0) was used to detect aberrant splicing events in each of the 5 ATG7 patients. Foremost, BAM files were converted into the BigWig format using megadepth (8 - dasper, 9/10 - megadepth). Junctions outputted by STAR and BigWig files were inputted into the dasper pipeline. Foremost, dasper loads, filters junction data. Here, we required junctions to have at least 5 counts in at least 1 sample, a length between 20-1,000,000 base pairs and to not overlap any ENCODE blacklist regions (11 - ENCODE blacklist regions). Junctions were then annotated using Ensembl v97 reference annotation and classified in the categories "annotated", "novel_acceptor", "novel_donor", "novel_combo", "novel_exon_skip", "ambig_gene" and "unannotated". Junctions were then clustered by their shared acceptor or donor sites and their counts were normalised. Coverage data associated with each junction was loaded and normalised for each sample. Junction and coverage normalised counts were scored using the z-score approach. Resulting z-scores for each patient were placed into a isolation forest model to rank splicing events by how abberrant they looked in relation to controls. Sashimi plots describing the splicing across the ATG7 patients were plotted for each of the 5 patients using dasper.



3 Results

3.1 Aberrant expression

3.1.1 ATG7 is detected an expression outlier in family 1

3.1.1.1 Text

OUTRIDER detected ATG7 as an expression outlier in family 1 (pvalue: 0.002421725, zscore: -4.59) (Table: ATG7 OUTRIDER - ATG7_OUTRIDER_results.csv). Consistent with this, when plotting the expression of ATG7 as fragments per kilobase of exon model per million reads mapped (FPKM) against the remaining patient samples, family 1 was observed to have lower ATG7 expression compared to the remaining ATG7 patients and controls. All other families looked to have similar ATG7 expression to the remaining patient samples (Figure: ATG7 expression - ATG7_expr_plot.csv).



3.1.1.2 Tables

Table: ATG7 OUTRIDER - ATG7_OUTRIDER_results.csv

OUTRIDER results for the gene ATG7. The full set of OUTRIDER results can be found at ATG7_OUTRIDER_results.csv.

load(here::here("results/aberrant_expression/OUTRIDER/50_controls/ATG7_res_all.rda"))

ATG7_res_ATG7_only <- ATG7_res_all %>% 
  filter(sampleID %in% ATG7_patient_metadata[["samp_id_tidy"]]) %>%
  mutate(padjust = p.adjust(pValue, method = "fdr")) %>% 
  dplyr::select(Description, everything())

ATG7_res_ATG7_only %>% 
  filter(Description == "ATG7") %>% 
  dplyr::select(-geneID)
write_delim(ATG7_res_ATG7_only, 
            here::here("results/tables_figures/ATG7_OUTRIDER_results.csv"), 
            delim = ",")



3.1.1.3 Figures

Figure: ATG7 expression - ATG7_expr_plot.png

plot_fpkm <- function(gene_id, gene_name, gene_count_ods, ATG7_patient_metadata){
  
  gene_count_ods_filtered <- gene_count_ods[names(gene_count_ods) == gene_id, ]
  
  expr_to_plot <- 
    assays(gene_count_ods_filtered)[["fpkm"]] %>% 
    as.data.frame() %>% 
    gather(key = "samp_id", value = "fpkm")  %>% 
    mutate(ATG7_patient = samp_id %in% ATG7_patient_metadata[["samp_id_tidy"]]) 
  
  families <- tibble(samp_id = c("M0920.18", "M0921.18", "M1111.19", "M1716.19", "M1856.17", "S2557"), 
                     family = c(2, 2, 3, 4, 1, 1))
  
  expr_plot <- expr_to_plot %>% 
    left_join(families) %>% 
    mutate(family = ifelse(is.na(family), "control", family)) %>% 
    ggplot(aes(x = ATG7_patient, y = fpkm)) + 
    geom_boxplot() +
    geom_jitter(aes(colour = family)) + 
    scale_x_discrete(name = "ATG7 Patient") + 
    scale_y_continuous(name = "FPKM") + 
    scale_colour_manual(name = "Family", 
                        values = c(ggpubr::get_palette("npg", 4), "black")) + 
    theme(legend.position = "right")
  
  ggsave(expr_plot, 
         filename = str_c(gene_name, "_expr_plot.png"), 
         path = here::here("results/tables_figures"), 
         width = 6, height = 4, dpi = 600)
  
  return(expr_plot + labs(subtitle = str_c(gene_name, " expression")))
  
}
         
plot_fpkm(gene_id = "ENSG00000197548", 
          gene_name = "ATG7", 
          gene_count_ods, 
          ATG7_patient_metadata)



3.1.2 OUTRIDER detects 9 genes that are expression outliers common to at least 2 of the ATG7 patients

3.1.2.1 Text

OUTRIDER detected 264 unique genes as expression outliers across all 5 samples. In order to find genes that were commonly disrupted across ATG7 patients we filtered results by genes that were outliers in at least two ATG7 patient samples. 9 genes remained after this filter (Table: Common outliers - ATG7_OUTRIDER_common_outliers.csv).

VPS41, HMG20A and TBC1D3L were specifically down-regulated in family 2 (Figure: VPS41/HMG20A/TBC1D3L expression - GENE_NAME_expr_plot.png). TBC1D3L is a GTPase activating protein for RAB5, which looks to have a role in macroautophagy (https://jcs.biologists.org/content/121/10/e1002). VPS41 has a role in vesicular transport and is suggested to be neuroprotective in AD/PD acting via an autophagy mechanism (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276837/).



3.1.2.2 Tables

Table: Common outliers - ATG7_OUTRIDER_common_outliers.csv Each row represents a patient and each column represents a gene that was detected as a outlier in at least 2 of the ATG7 patients. Values in each cell represent the “z-score;pvalue” for each patient/gene respectively. Green indicates a positive z-score and red a negative z-score with respect to controls. Bold font is used for those that are significant outliers (p-value < 0.05 after FDR correction).

ATG7_res_ATG7_only_signif <- ATG7_res_ATG7_only %>% 
  filter(padjust <= 0.05) %>% 
  dplyr::select(Description, everything())

common_genes <- ATG7_res_ATG7_only_signif %>% 
  dplyr::count(Description) %>% 
  filter(n > 1) %>% 
  arrange(desc(n)) 

common_genes <- 
  ATG7_res_ATG7_only %>% 
  filter(Description %in% common_genes$Description, 
         sampleID %in% ATG7_patient_metadata[["samp_id_tidy"]]) %>% 
  arrange(Description) %>% 
  mutate(pval_zscore = str_c(zScore, ";", round(padjust, 2))) %>% 
  dplyr::select(`Sample ID` = sampleID, pval_zscore, gene_name = Description) %>% 
  arrange(gene_name) %>% 
  spread(key = gene_name, value = pval_zscore) %>% 
  left_join(ATG7_patient_metadata %>% 
              dplyr::select(samp_id_tidy, family), 
            by = c("Sample ID" = "samp_id_tidy")) %>% 
  dplyr::select(Family = family, everything()) %>% 
  arrange(Family)

common_genes
write_delim(common_genes, 
            here::here("results/tables_figures/ATG7_OUTRIDER_common_outliers.csv"), 
            delim = ",")



3.1.2.3 Figures

Figure: VPS41/HMG20A/TBC1D3L expression - GENE_NAME_expr_plot.png

plot_fpkm(gene_id = "ENSG00000140382", 
          gene_name = "HMG20A", 
          gene_count_ods, 
          ATG7_patient_metadata)

plot_fpkm(gene_id = "ENSG00000274512", 
          gene_name = "TBC1D3L", 
          gene_count_ods, 
          ATG7_patient_metadata)

plot_fpkm(gene_id = "ENSG00000006715", 
          gene_name = "VPS41", 
          gene_count_ods, 
          ATG7_patient_metadata)



3.2 Aberrant splicing

3.2.1 dasper detects aberrant splicing in ATG7 in family 1 but not families 2, 3 and 4

3.2.1.1 Text

Aberrant splicing is detected in ATG7 in family 1 with a rank of 22 when using patient samples as controls (Figure: ATG7 outlier scores - ATG7_outlier_scores_plot.png). Sashimi plots describing the splicing changes across ATG7 demonstrate a novel acceptor position used in the family 1 which causes an exonic extension and as a consequence, a premature stop codon (Figure: ATG7 sashimi plots - PATIENT_ID_sashimi_plot_whole_gene.png ). For the remaining patient samples, the splicing within ATG7 was ranked above 3000 suggesting that splicing is not the pathogenic mechanism in families 2, 3 and 4. Consistent with this, no abnormal splicing events are observed in the sashimi plots in the areas surrounding the missense mutations.



3.2.1.2 Tables

Table: ATG7 patient metadata - ATG7_dasper_outlier_scores.csv dasper outlier scores for the top 100 most aberrant splicing events in each patient is shown below. The full table of all dasper results can be found in the file ATG7_dasper_outlier_scores.csv.

load("/home/dzhang/projects/ATG7_RNAseq/results/aberrant_splicing/dasper/outlier_scores.rda")

outlier_scores[["junctions"]] <- NULL

# generate a character for saving output
outlier_scores[["gene_id"]] <- outlier_scores[["gene_id_cluster"]] %>% 
  lapply(str_c, collapse = ",") %>% 
  unlist()

outlier_scores %>% 
  as_tibble() %>% 
  filter(rank %in% 1:100) %>% 
  dplyr::select(-gene_id_cluster)
outlier_scores %>% 
  as_tibble() %>% 
  dplyr::select(-gene_id_cluster) %>% 
  write_delim(here::here("results/tables_figures/ATG7_dasper_outlier_scores.csv"), delim = ",")



3.2.1.3 Figures

Figure: ATG7 outlier scores - ATG7_outlier_scores_plot.png The y-axis ranks represent the degree of aberrant splicing in ATG7 in each patient. The x-axis details each of the 5 patients. The fill of the bars represent the family the patient belongs to and consequently, they're mutation type. A rank of 1 corresponds to the most aberrant splicing event in each patient.

ATG7_outlier_scores <- 
  outlier_scores[any(outlier_scores[["gene_id_cluster"]] == "ENSG00000197548"),] %>% 
  as_tibble() %>% 
  group_by(samp_id) %>% 
  filter(rank == min(rank))

ATG7_outlier_scores <- ATG7_outlier_scores %>% 
  left_join(
    ATG7_patient_metadata %>% 
      mutate(fam_mut = mutation_type %>% 
               str_replace(".*;", "") %>% 
               str_c(family, " - ", .)) %>% 
      dplyr::select(samp_id = samp_id_tidy, 
                    fam_mut, 
                    family)
  ) %>% 
  arrange(family)

ATG7_outlier_scores <- ATG7_outlier_scores %>% 
  mutate(fam_mut = fam_mut %>% 
           factor(unique(ATG7_outlier_scores[["fam_mut"]])))

ATG7_outlier_scores_plot <- ATG7_outlier_scores %>% 
  ggplot(aes(x = samp_id, y = rank, fill = fam_mut)) + 
  geom_col(colour = "black") + 
  geom_text(aes(label = rank), 
            nudge_y = 1000) + 
  scale_fill_manual(name = "Family (mutation)", 
                    values = ggpubr::get_palette("npg", 4)) +
  scale_y_continuous(name = "ATG7 rank") + 
  scale_x_discrete(name = "Sample ID")
  
ATG7_outlier_scores_plot

ggsave(plot = ATG7_outlier_scores_plot, 
       filename = "ATG7_outlier_scores_plot.png", 
       path = here::here("results/tables_figures"), 
       width = 8, 
       height = 5)

Figure: ATG7 sashimi plots - PATIENT_ID_sashimi_plot_whole_gene.png The sashimi plots are arranged in two panels. The top demonstrates the coverage for each base across the entire ATG7 gene, normalised by division by the total exonic coverage of ATG7. The bottom describes the splicing across the ATG7, with curved lines representing junctions. Junctions are coloured by their annotation and labelled with their normalised read count in cases and controls. Red x's mark the mutation sites in each patient.

3.2.1.3.1 M1856.17
knitr::include_graphics(here::here("results/tables_figures", "M1856.17_sashimi_whole_gene.png")) 

3.2.1.3.2 M0920.18
knitr::include_graphics(here::here("results/tables_figures", "M0920.18_sashimi_whole_gene.png")) 

3.2.1.3.3 M0921.18
knitr::include_graphics(here::here("results/tables_figures", "M0921.18_sashimi_whole_gene.png")) 

3.2.1.3.4 M1111.19
knitr::include_graphics(here::here("results/tables_figures", "M1111.19_sashimi_whole_gene.png")) 

3.2.1.3.5 M1716.19
knitr::include_graphics(here::here("results/tables_figures", "M1716.19_sashimi_whole_gene.png")) 



3.3 Exploratory analysis

3.3.1 Common outliers to M1856-17, M1111-19, and M1716-19

3.3.1.1 Text

There are no OUTRIDER hits in common between the 3 samples. Given that OUTRIDER is conservative, in order to find those that are most likely shared hits between the 3 samples we ranked all OUTRIDER genes in each sample. Then obtained a mean rank across samples - the lower the rank, the more commonly dysrupted the gene across the three samlples (Table: M1856-17, M1111-19, M1716-19 common outliers - ATG7_OUTRIDER_M1856_M1111_M1716_common_outliers.csv).



3.3.1.2 Tables

Table: M1856-17, M1111-19, M1716-19 common outliers - ATG7_OUTRIDER_M1856_M1111_M1716_common_outliers.csv The below is a slice of the full table in ATG7_OUTRIDER_M1856_M1111_M1716_common_outliers.csv. It contains the top 100 genes that have potentially sharing between the 3 samples M1856-17, M1111-19, M1716-19.

common_outliers_M1856_M1111_M1716 <- 
  ATG7_res_ATG7_only %>% 
  filter(sampleID %in% c("M1856.17", "M1111.19", "M1716.19")) %>% 
  group_by(sampleID) %>% 
  mutate(rank = rank(padjust)) %>% 
  group_by(geneID) %>% 
  mutate(mean_rank = mean(rank)) %>% 
  ungroup() %>% 
  arrange(mean_rank) 

common_outliers_M1856_M1111_M1716 %>% 
  write_delim(here::here("results/tables_figures/ATG7_OUTRIDER_M1856_M1111_M1716_common_outliers.csv"), 
              delim = ",")

common_outliers_M1856_M1111_M1716 %>% 
  filter(!duplicated(Description)) %>% 
  dplyr::select(Description, geneID, mean_rank) %>% 
  head(100)



3.3.2 NFE2L2 expression

3.3.2.1 Text

NFE2L2 may be up-regulated in ATG7 patients. NFE2L2 is not an expression outlier and just misses the p-value cut-off for DESeq2 (p-value: 0.07). However, when plotting the expression of NFE2L2 it looks like it may be up-regulated in ATG7 patients (Figure: NFE2L2 expression - NFE2L2_expr_plot.png).



3.3.2.2 Figures

Figure: NFE2L2 expression - NFE2L2_expr_plot.png

plot_fpkm(gene_id = "ENSG00000116044", 
          gene_name = "NFE2L2", 
          gene_count_ods, 
          ATG7_patient_metadata)



3.3.3 RAB9A expression

3.3.3.1 Text

RAB9A expression does not look to be significantly different in the ATG7 patients compared with controls. It is neither an OUTRIDER or DESeq2 hit (Figure: RAB9A expression - RAB9A_expr_plot.png).



3.3.3.2 Figures

Figure: RAB9A expression - RAB9A_expr_plot.png

plot_fpkm(gene_id = "ENSG00000123595", 
          gene_name = "RAB9A", 
          gene_count_ods, 
          ATG7_patient_metadata)



4 Supplementary/intermediary analysis

Here sits any key analysis steps that were run, however were not part of the main results section. This is included for the purpose of reproducibility.

4.1 Obtain gene counts

4.1.1 Generate genes-only version of Ensembl v97 gtf


cd /tools
git clone https://github.com/broadinstitute/gtex-pipeline
chmod -R 775 gtex-pipeline

# generate the genes gtf
/tools/gtex-pipeline/gene_model/collapse_annotation.py \
/data/references/ensembl/gtf_gff3/v97/Homo_sapiens.GRCh38.97.gtf \
/data/references/ensembl/gtf_gff3/v97/Homo_sapiens.GRCh38.97.genes.gtf

4.1.2 RNA-SEQC

source(here::here("scripts/aberrant_expression/get_gene_count_RSE.R"))

4.2 OUTRIDER intermediary plots

These are intermediate plots returned during the OUTRIDER analysis and can be ignored unless you're interested.

4.2.1 Filter by expression

  • There are 12339 genes remaining after filtering by expression.
knitr::include_graphics(here::here("results/aberrant_expression/OUTRIDER/50_controls/FPKM.png"))

knitr::include_graphics(here::here("results/aberrant_expression/OUTRIDER/50_controls/expressed_genes.png"))

4.2.2 Pre-correction gene count heatmaps

knitr::include_graphics(here::here("results//aberrant_expression/OUTRIDER/50_controls/count_cor_pre_correction.png"))

knitr::include_graphics(here::here("results//aberrant_expression/OUTRIDER/50_controls/count_gene_sample_pre_correction.png"))

4.2.3 Optimising the autoencoder correction dimension

knitr::include_graphics(here::here("results//aberrant_expression/OUTRIDER/50_controls/encoding_dim.png"))

4.2.4 Post-correction gene count heatmaps

Here, I have included one example (M0920.18). In reality, there's plots like these for each ATG7 sample as I have re-run each ATG7 with the other ATG7 samples removed.

knitr::include_graphics(here::here("results//aberrant_expression/OUTRIDER/50_controls/count_cor_post_correction_M0920.18.png"))

knitr::include_graphics(here::here("results//aberrant_expression/OUTRIDER/50_controls/count_gene_sample_post_correction_M0920.18.png"))

4.3 dasper

As dasper dependencies require a more recent version of Ubuntu we'll need to use a docker to run this analysis. First we have to set up the docker:


sudo docker run -it --rm --name ATG7_analysis \
-v /home/dzhang/projects/RNA_seq_diag_mito/results/tidy_samp_metadata/mito_samp_metadata_tidy.rda:\
/home/dzhang/projects/RNA_seq_diag_mito/results/tidy_samp_metadata/mito_samp_metadata_tidy.rda \
-v /data/references/ENCODE_blacklist_v2/hg38-blacklist.v2.bed:\
/data/references/ENCODE_blacklist_v2/hg38-blacklist.v2.bed \
-v /data/references/ensembl/gtf_gff3/v97/Homo_sapiens.GRCh38.97.gtf:\
/data/references/ensembl/gtf_gff3/v97/Homo_sapiens.GRCh38.97.gtf \
-v /data/RNA_seq_diag/mito/STAR/:\
/data/RNA_seq_diag/mito/STAR/ \
-v /data/RNA_seq_diag/mito_add_pos_ctrls/STAR/:\
/data/RNA_seq_diag/mito_add_pos_ctrls/STAR/ \
-v /data/RNA_seq_diag/mito/bw/:\
/data/RNA_seq_diag/mito/bw/ \
-v /data/RNA_seq_diag/mito_add_pos_ctrls/bw/:\
/data/RNA_seq_diag/mito_add_pos_ctrls/bw/ \
dasper:v1 bash

Then run the script to conduct the dasper analysis:

source(here::here("scripts/aberrant_splicing/dasper.R"))

Finally, copy the data back over to our server from the docker.


sudo docker cp \
ATG7_analysis:/data/junctions.rda \
/home/dzhang/projects/ATG7_RNAseq/results/aberrant_splicing/dasper/junctions.rda

sudo docker cp \
ATG7_analysis:/data/junctions_normed_for_plotting.rda \
/home/dzhang/projects/ATG7_RNAseq/results/aberrant_splicing/dasper/junctions_normed_for_plotting.rda

sudo docker cp \
ATG7_analysis:/data/outlier_scores.rda \
/home/dzhang/projects/ATG7_RNAseq/results/aberrant_splicing/dasper/outlier_scores.rda



6 Reproducibility

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       Ubuntu 16.04.6 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  ctype    en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2020-10-24                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package              * version  date       lib source                              
##  abind                  1.4-5    2016-07-21 [2] CRAN (R 3.6.1)                      
##  annotate               1.64.0   2019-10-29 [1] Bioconductor                        
##  AnnotationDbi        * 1.48.0   2019-10-29 [1] Bioconductor                        
##  askpass                1.1      2019-01-13 [2] CRAN (R 3.6.1)                      
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 3.6.1)                      
##  backports              1.1.10   2020-09-15 [1] CRAN (R 3.6.1)                      
##  base64enc              0.1-3    2015-07-28 [2] CRAN (R 3.6.1)                      
##  BBmisc                 1.11     2017-03-10 [1] CRAN (R 3.6.1)                      
##  Biobase              * 2.46.0   2019-10-29 [1] Bioconductor                        
##  BiocFileCache          1.10.2   2019-11-08 [1] Bioconductor                        
##  BiocGenerics         * 0.32.0   2019-10-29 [1] Bioconductor                        
##  BiocParallel         * 1.20.1   2019-12-21 [1] Bioconductor                        
##  biomaRt                2.42.1   2020-03-26 [1] Bioconductor                        
##  Biostrings             2.54.0   2019-10-29 [1] Bioconductor                        
##  bit                    4.0.4    2020-08-04 [1] CRAN (R 3.6.1)                      
##  bit64                  4.0.5    2020-08-30 [1] CRAN (R 3.6.1)                      
##  bitops                 1.0-6    2013-08-17 [2] CRAN (R 3.6.1)                      
##  blob                   1.2.1    2020-01-20 [1] CRAN (R 3.6.1)                      
##  broom                  0.5.6    2020-04-20 [1] CRAN (R 3.6.1)                      
##  car                    3.0-8    2020-05-21 [1] CRAN (R 3.6.1)                      
##  carData                3.0-4    2020-05-22 [1] CRAN (R 3.6.1)                      
##  caTools                1.18.0   2020-01-17 [1] CRAN (R 3.6.1)                      
##  cellranger             1.1.0    2016-07-27 [2] CRAN (R 3.6.1)                      
##  checkmate              2.0.0    2020-02-06 [1] CRAN (R 3.6.1)                      
##  cli                    2.1.0    2020-10-12 [1] CRAN (R 3.6.1)                      
##  cluster                2.1.0    2019-06-19 [4] CRAN (R 3.6.1)                      
##  codetools              0.2-16   2018-12-24 [4] CRAN (R 3.6.1)                      
##  colorspace             1.4-1    2019-03-18 [2] CRAN (R 3.6.1)                      
##  crayon                 1.3.4    2017-09-16 [2] CRAN (R 3.6.1)                      
##  crosstalk              1.1.0.1  2020-03-13 [1] CRAN (R 3.6.1)                      
##  curl                   4.3      2019-12-02 [1] CRAN (R 3.6.1)                      
##  dasper               * 0.99.0   2020-10-23 [1] Github (dzhang32/dasper@148c5cf)    
##  data.table           * 1.13.0   2020-07-24 [1] CRAN (R 3.6.1)                      
##  DBI                    1.1.0    2019-12-15 [2] CRAN (R 3.6.1)                      
##  dbplyr                 1.4.4    2020-05-27 [1] CRAN (R 3.6.1)                      
##  DelayedArray         * 0.12.3   2020-04-09 [1] Bioconductor                        
##  dendextend             1.14.0   2020-08-26 [1] CRAN (R 3.6.1)                      
##  DESeq2                 1.26.0   2019-10-29 [1] Bioconductor                        
##  digest                 0.6.25   2020-02-23 [1] CRAN (R 3.6.1)                      
##  dplyr                * 1.0.2    2020-08-18 [1] CRAN (R 3.6.1)                      
##  ellipsis               0.3.1    2020-05-15 [1] CRAN (R 3.6.1)                      
##  evaluate               0.14     2019-05-28 [2] CRAN (R 3.6.1)                      
##  fansi                  0.4.1    2020-01-08 [1] CRAN (R 3.6.1)                      
##  farver                 2.0.3    2020-01-16 [1] CRAN (R 3.6.1)                      
##  fastmap                1.0.1    2019-10-08 [1] CRAN (R 3.6.1)                      
##  forcats              * 0.5.0    2020-03-01 [2] CRAN (R 3.6.1)                      
##  foreach                1.5.0    2020-03-30 [1] CRAN (R 3.6.1)                      
##  foreign                0.8-72   2019-08-02 [4] CRAN (R 3.6.1)                      
##  Formula                1.2-3    2018-05-03 [2] CRAN (R 3.6.1)                      
##  fs                     1.5.0    2020-07-31 [1] CRAN (R 3.6.1)                      
##  gclus                  1.3.2    2019-01-07 [1] CRAN (R 3.6.1)                      
##  gdata                  2.18.0   2017-06-06 [1] CRAN (R 3.6.1)                      
##  genefilter             1.68.0   2019-10-29 [1] Bioconductor                        
##  geneplotter            1.64.0   2019-10-29 [1] Bioconductor                        
##  generics               0.0.2    2018-11-29 [1] CRAN (R 3.6.1)                      
##  GenomeInfoDb         * 1.22.1   2020-03-27 [1] Bioconductor                        
##  GenomeInfoDbData       1.2.2    2020-08-13 [1] Bioconductor                        
##  GenomicAlignments      1.22.1   2019-11-12 [1] Bioconductor                        
##  GenomicFeatures      * 1.38.2   2020-02-15 [1] Bioconductor                        
##  GenomicRanges        * 1.38.0   2019-10-29 [1] Bioconductor                        
##  ggplot2              * 3.3.2    2020-06-19 [1] CRAN (R 3.6.1)                      
##  ggpubr                 0.4.0    2020-06-27 [1] CRAN (R 3.6.1)                      
##  ggsci                  2.9      2018-05-14 [1] CRAN (R 3.6.1)                      
##  ggsignif               0.6.0    2019-08-08 [1] CRAN (R 3.6.1)                      
##  glue                   1.4.2    2020-08-27 [1] CRAN (R 3.6.1)                      
##  gplots                 3.0.4    2020-07-05 [1] CRAN (R 3.6.1)                      
##  gprofiler2             0.1.9    2020-04-23 [1] CRAN (R 3.6.1)                      
##  gridExtra              2.3      2017-09-09 [2] CRAN (R 3.6.1)                      
##  gtable                 0.3.0    2019-03-25 [2] CRAN (R 3.6.1)                      
##  gtools                 3.8.2    2020-03-31 [1] CRAN (R 3.6.1)                      
##  haven                  2.3.1    2020-06-01 [1] CRAN (R 3.6.1)                      
##  heatmaply              1.1.1    2020-08-25 [1] CRAN (R 3.6.1)                      
##  here                   0.1      2017-05-28 [1] CRAN (R 3.6.1)                      
##  Hmisc                  4.4-1    2020-08-10 [1] CRAN (R 3.6.1)                      
##  hms                    0.5.3    2020-01-08 [1] CRAN (R 3.6.1)                      
##  htmlTable              2.1.0    2020-09-16 [1] CRAN (R 3.6.1)                      
##  htmltools              0.5.0    2020-06-16 [1] CRAN (R 3.6.1)                      
##  htmlwidgets            1.5.1    2019-10-08 [1] CRAN (R 3.6.1)                      
##  httpuv                 1.5.2    2019-09-11 [2] CRAN (R 3.6.1)                      
##  httr                   1.4.2    2020-07-20 [1] CRAN (R 3.6.1)                      
##  IRanges              * 2.20.2   2020-01-13 [1] Bioconductor                        
##  iterators              1.0.12   2019-07-26 [2] CRAN (R 3.6.1)                      
##  jpeg                   0.1-8.1  2019-10-24 [1] CRAN (R 3.6.1)                      
##  jsonlite               1.7.1    2020-09-07 [1] CRAN (R 3.6.1)                      
##  KernSmooth             2.23-17  2020-04-26 [1] CRAN (R 3.6.1)                      
##  knitr                * 1.29     2020-06-23 [1] CRAN (R 3.6.1)                      
##  labeling               0.3      2014-08-23 [2] CRAN (R 3.6.1)                      
##  later                  1.1.0.1  2020-06-05 [1] CRAN (R 3.6.1)                      
##  lattice                0.20-38  2018-11-04 [4] CRAN (R 3.6.1)                      
##  latticeExtra           0.6-29   2019-12-19 [2] CRAN (R 3.6.1)                      
##  lazyeval               0.2.2    2019-03-15 [2] CRAN (R 3.6.1)                      
##  lifecycle              0.2.0    2020-03-06 [1] CRAN (R 3.6.1)                      
##  locfit                 1.5-9.4  2020-03-25 [1] CRAN (R 3.6.1)                      
##  lubridate              1.7.9    2020-06-08 [1] CRAN (R 3.6.1)                      
##  magrittr               1.5      2014-11-22 [2] CRAN (R 3.6.1)                      
##  MASS                   7.3-51.6 2020-04-26 [1] CRAN (R 3.6.1)                      
##  Matrix                 1.2-18   2019-11-27 [1] CRAN (R 3.6.1)                      
##  matrixStats          * 0.57.0   2020-09-25 [1] CRAN (R 3.6.1)                      
##  memoise                1.1.0    2017-04-21 [2] CRAN (R 3.6.1)                      
##  mime                   0.9      2020-02-04 [1] CRAN (R 3.6.1)                      
##  modelr                 0.1.5    2019-08-08 [2] CRAN (R 3.6.1)                      
##  munsell                0.5.0    2018-06-12 [2] CRAN (R 3.6.1)                      
##  nlme                   3.1-148  2020-05-24 [1] CRAN (R 3.6.1)                      
##  nnet                   7.3-12   2016-02-02 [4] CRAN (R 3.6.1)                      
##  openssl                1.4.3    2020-09-18 [1] CRAN (R 3.6.1)                      
##  openxlsx               4.1.5    2020-05-06 [1] CRAN (R 3.6.1)                      
##  OUTRIDER             * 1.5.1    2020-09-17 [1] Github (gagneurlab/OUTRIDER@7fbdaaf)
##  pcaMethods             1.78.0   2019-10-29 [1] Bioconductor                        
##  pheatmap               1.0.12   2019-01-04 [1] CRAN (R 3.6.1)                      
##  pillar                 1.4.6    2020-07-10 [1] CRAN (R 3.6.1)                      
##  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 3.6.1)                      
##  plotly                 4.9.2.1  2020-04-04 [1] CRAN (R 3.6.1)                      
##  plyr                   1.8.6    2020-03-03 [2] CRAN (R 3.6.1)                      
##  png                    0.1-7    2013-12-03 [1] CRAN (R 3.6.1)                      
##  prettyunits            1.1.1    2020-01-24 [1] CRAN (R 3.6.1)                      
##  progress               1.2.2    2019-05-16 [2] CRAN (R 3.6.1)                      
##  promises               1.1.1    2020-06-09 [1] CRAN (R 3.6.1)                      
##  PRROC                  1.3.1    2018-06-19 [1] CRAN (R 3.6.1)                      
##  purrr                * 0.3.4    2020-04-17 [1] CRAN (R 3.6.1)                      
##  R6                     2.4.1    2019-11-12 [1] CRAN (R 3.6.1)                      
##  rappdirs               0.3.1    2016-03-28 [1] CRAN (R 3.6.1)                      
##  RColorBrewer           1.1-2    2014-12-07 [2] CRAN (R 3.6.1)                      
##  Rcpp                   1.0.5    2020-07-06 [1] CRAN (R 3.6.1)                      
##  RCurl                  1.98-1.2 2020-04-18 [1] CRAN (R 3.6.1)                      
##  reactable            * 0.2.3    2020-10-04 [1] CRAN (R 3.6.1)                      
##  reactR                 0.4.3    2020-07-12 [1] CRAN (R 3.6.1)                      
##  readr                * 1.4.0    2020-10-05 [1] CRAN (R 3.6.1)                      
##  readxl                 1.3.1    2019-03-13 [2] CRAN (R 3.6.1)                      
##  registry               0.5-1    2019-03-05 [1] CRAN (R 3.6.1)                      
##  reprex                 0.3.0    2019-05-16 [2] CRAN (R 3.6.1)                      
##  reshape2               1.4.4    2020-04-09 [2] CRAN (R 3.6.1)                      
##  reticulate             1.16     2020-05-27 [1] CRAN (R 3.6.1)                      
##  rio                    0.5.16   2018-11-26 [1] CRAN (R 3.6.1)                      
##  rlang                  0.4.8    2020-10-08 [1] CRAN (R 3.6.1)                      
##  rmarkdown              2.3      2020-06-18 [1] CRAN (R 3.6.1)                      
##  rpart                  4.1-15   2019-04-12 [4] CRAN (R 3.6.1)                      
##  rprojroot              1.3-2    2018-01-03 [1] CRAN (R 3.6.1)                      
##  Rsamtools              2.2.3    2020-02-23 [1] Bioconductor                        
##  RSQLite                2.2.1    2020-09-30 [1] CRAN (R 3.6.1)                      
##  rstatix                0.6.0    2020-06-18 [1] CRAN (R 3.6.1)                      
##  rstudioapi             0.11     2020-02-07 [2] CRAN (R 3.6.1)                      
##  rtracklayer            1.46.0   2019-10-29 [1] Bioconductor                        
##  rvest                  0.3.5    2019-11-08 [1] CRAN (R 3.6.1)                      
##  S4Vectors            * 0.24.4   2020-04-09 [1] Bioconductor                        
##  scales                 1.1.1    2020-05-11 [1] CRAN (R 3.6.1)                      
##  seriation              1.2-8    2019-08-27 [1] CRAN (R 3.6.1)                      
##  sessioninfo          * 1.1.1    2018-11-05 [2] CRAN (R 3.6.1)                      
##  shiny                  1.5.0    2020-06-23 [1] CRAN (R 3.6.1)                      
##  stringi                1.5.3    2020-09-09 [1] CRAN (R 3.6.1)                      
##  stringr              * 1.4.0    2019-02-10 [2] CRAN (R 3.6.1)                      
##  SummarizedExperiment * 1.16.1   2019-12-19 [1] Bioconductor                        
##  survival               3.2-3    2020-06-13 [1] CRAN (R 3.6.1)                      
##  tibble               * 3.0.4    2020-10-12 [1] CRAN (R 3.6.1)                      
##  tidyr                * 1.1.2    2020-08-27 [1] CRAN (R 3.6.1)                      
##  tidyselect             1.1.0    2020-05-11 [1] CRAN (R 3.6.1)                      
##  tidyverse            * 1.3.0    2019-11-21 [1] CRAN (R 3.6.1)                      
##  TSP                    1.1-10   2020-04-17 [1] CRAN (R 3.6.1)                      
##  vctrs                  0.3.4    2020-08-29 [1] CRAN (R 3.6.1)                      
##  viridis                0.5.1    2018-03-29 [2] CRAN (R 3.6.1)                      
##  viridisLite            0.3.0    2018-02-01 [2] CRAN (R 3.6.1)                      
##  webshot                0.5.2    2019-11-22 [1] CRAN (R 3.6.1)                      
##  withr                  2.3.0    2020-09-22 [1] CRAN (R 3.6.1)                      
##  xfun                   0.18     2020-09-29 [1] CRAN (R 3.6.1)                      
##  XML                    3.99-0.3 2020-01-20 [2] CRAN (R 3.6.1)                      
##  xml2                   1.3.2    2020-04-23 [1] CRAN (R 3.6.1)                      
##  xtable                 1.8-4    2019-04-21 [1] CRAN (R 3.6.1)                      
##  XVector                0.26.0   2019-10-29 [1] Bioconductor                        
##  yaml                   2.2.1    2020-02-01 [1] CRAN (R 3.6.1)                      
##  zip                    2.0.4    2019-09-01 [1] CRAN (R 3.6.1)                      
##  zlibbioc               1.32.0   2019-10-29 [1] Bioconductor                        
## 
## [1] /home/dzhang/R/x86_64-pc-linux-gnu-library/3.6
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library