Given a set of exons, to_intron() will return the corresponding introns.

Usage

to_intron(exons, group_var = NULL)

Arguments

exons

data.frame() containing exons, which may originate from multiple transcripts distinguished by group_var.

group_var

character() if the input data originates from more than one transcript, group_var must name the column that distinguishes transcripts (e.g. "transcript_id").
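
To illustrate why group_var matters, the following is a minimal base R sketch, not the ggtranscript implementation. The helper sketch_introns() and the toy coordinates are hypothetical and exist only for this illustration. It contrasts deriving introns within each transcript against treating all exons as a single transcript.

```r
# Toy exons from two hypothetical transcripts (coordinates are made up).
toy_exons <- data.frame(
    transcript_name = c("tx_A", "tx_A", "tx_B", "tx_B"),
    start = c(100L, 500L, 100L, 300L),
    end = c(200L, 600L, 200L, 400L),
    stringsAsFactors = FALSE
)

# Hypothetical helper, NOT part of ggtranscript: within one set of exons,
# pair each exon end with the start of the next exon.
sketch_introns <- function(exons) {
    exons <- exons[order(exons$start, exons$end), ]
    n <- nrow(exons)
    data.frame(start = exons$end[-n], end = exons$start[-1])
}

# Grouped: split by transcript, then derive introns within each transcript.
per_tx <- lapply(split(toy_exons, toy_exons$transcript_name), sketch_introns)
grouped <- do.call(rbind, per_tx)

# Ungrouped: every exon is treated as belonging to one transcript, so gaps
# are computed between exons of different transcripts.
ungrouped <- sketch_introns(toy_exons)

grouped
ungrouped
```

In this toy case the grouped result holds one intron per transcript, whereas the ungrouped result holds three gaps, none of which corresponds to a single transcript's structure.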

Value

data.frame() containing the intron co-ordinates.

Details

Note that, for visualization purposes, to_intron() places intron boundaries exactly on the flanking exon boundaries: an intron starts at the upstream exon's end and ends at the downstream exon's start, rather than at (exon end + 1) and (exon start - 1).
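
The boundary convention can be made concrete with a small base R sketch. This is an illustration under stated assumptions rather than the package's own code: the helper sketch_introns() and the toy exon coordinates are hypothetical.

```r
# Toy exons for a single hypothetical transcript (coordinates are made up).
toy_exons <- data.frame(
    start = c(100L, 300L, 500L),
    end = c(200L, 400L, 600L)
)

# Hypothetical helper, NOT part of ggtranscript: derive introns for one
# transcript using the exon-boundary convention described above.
sketch_introns <- function(exons) {
    exons <- exons[order(exons$start, exons$end), ]
    n <- nrow(exons)
    data.frame(
        start = exons$end[-n], # intron starts at the upstream exon end
        end = exons$start[-1]  # intron ends at the downstream exon start
    )
}

toy_introns <- sketch_introns(toy_exons)

# For comparison, the stricter convention shifts each boundary by one base:
# (exon end + 1) and (exon start - 1).
strict_introns <- data.frame(
    start = toy_introns$start + 1L,
    end = toy_introns$end - 1L
)

toy_introns
strict_introns
```

Here the sketched introns span 200 to 300 and 400 to 500, so they touch the exons they connect, while the stricter convention would give 201 to 299 and 401 to 499.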

Examples

library(magrittr)
library(ggplot2)

# to illustrate the package's functionality
# ggtranscript includes example transcript annotation
sod1_annotation %>% head()
#> # A tibble: 6 × 8
#>   seqnames  start    end strand type  gene_name transcript_name transcript_biot…
#>   <fct>     <int>  <int> <fct>  <fct> <chr>     <chr>           <chr>           
#> 1 21       3.17e7 3.17e7 +      gene  SOD1      NA              NA              
#> 2 21       3.17e7 3.17e7 +      tran… SOD1      SOD1-202        protein_coding  
#> 3 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-202        protein_coding  
#> 4 21       3.17e7 3.17e7 +      CDS   SOD1      SOD1-202        protein_coding  
#> 5 21       3.17e7 3.17e7 +      star… SOD1      SOD1-202        protein_coding  
#> 6 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-202        protein_coding  

# extract exons
sod1_exons <- sod1_annotation %>% dplyr::filter(type == "exon")
sod1_exons %>% head()
#> # A tibble: 6 × 8
#>   seqnames  start    end strand type  gene_name transcript_name transcript_biot…
#>   <fct>     <int>  <int> <fct>  <fct> <chr>     <chr>           <chr>           
#> 1 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-202        protein_coding  
#> 2 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-202        protein_coding  
#> 3 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-202        protein_coding  
#> 4 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-202        protein_coding  
#> 5 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-202        protein_coding  
#> 6 21       3.17e7 3.17e7 +      exon  SOD1      SOD1-204        processed_trans…

# to_intron() is a helper function included in ggtranscript
# which is useful for converting exon co-ordinates to introns
sod1_introns <- sod1_exons %>% to_intron(group_var = "transcript_name")
sod1_introns %>% head()
#> # A tibble: 6 × 8
#>   seqnames strand type  gene_name transcript_name transcript_biot…  start    end
#>   <fct>    <fct>  <chr> <chr>     <chr>           <chr>             <int>  <int>
#> 1 21       +      intr… SOD1      SOD1-204        processed_trans… 3.17e7 3.17e7
#> 2 21       +      intr… SOD1      SOD1-202        protein_coding   3.17e7 3.17e7
#> 3 21       +      intr… SOD1      SOD1-204        processed_trans… 3.17e7 3.17e7
#> 4 21       +      intr… SOD1      SOD1-201        protein_coding   3.17e7 3.17e7
#> 5 21       +      intr… SOD1      SOD1-203        processed_trans… 3.17e7 3.17e7
#> 6 21       +      intr… SOD1      SOD1-202        protein_coding   3.17e7 3.17e7

# this can be particularly useful when combined with
# geom_range() and geom_intron()
# to visualize the core components of transcript annotation
sod1_exons %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range() +
    geom_intron(
        data = to_intron(sod1_exons, "transcript_name")
    )