Given a set of exons, to_intron() will return the corresponding introns.
Arguments
- exons
  data.frame() contains exons which can originate from multiple transcripts differentiated by group_var.
- group_var
  character() if input data originates from more than 1 transcript, group_var must specify the column that differentiates transcripts (e.g. "transcript_id").
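To see why group_var matters, the following base R sketch derives introns separately for each transcript. It is an illustration under stated assumptions, not the package's implementation: the exons data, the transcript names and the helper introns_one_tx() are all hypothetical.

```r
# Hypothetical exons from two transcripts (names and coordinates invented)
exons <- data.frame(
  transcript_id = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  start = c(100, 300, 500, 100, 500),
  end = c(200, 400, 600, 200, 600)
)

# Hypothetical helper: derive the introns of a single transcript,
# using the flanking exon boundaries as the intron start/end
introns_one_tx <- function(tx) {
  tx <- tx[order(tx$start), ]
  n <- nrow(tx)
  if (n < 2) return(NULL) # a single-exon transcript has no introns
  data.frame(
    transcript_id = tx$transcript_id[1],
    start = tx$end[-n],  # intron starts where an exon ends
    end = tx$start[-1]   # intron ends where the next exon starts
  )
}

# Split by the grouping column so exons of different transcripts
# are never paired with each other
introns <- do.call(
  rbind,
  lapply(split(exons, exons$transcript_id), introns_one_tx)
)
rownames(introns) <- NULL
introns
```

Without the split, sorting all five exons together would pair exons from different transcripts and produce introns that exist in neither.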
Value
data.frame() contains the intron co-ordinates.
Details
Note that, for visualization purposes, to_intron() places each intron's start and end
exactly on the flanking exon boundaries (intron start = exon end, intron end = next
exon start), rather than at (exon end + 1) and (exon start - 1).
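The difference between the two conventions can be shown with a small base R sketch. The exon coordinates are invented for illustration, and the code does not call to_intron(); it only mirrors the two definitions described above.

```r
# Hypothetical exons of a single transcript (coordinates invented)
exons <- data.frame(
  start = c(100, 300, 500),
  end = c(200, 400, 600)
)
exons <- exons[order(exons$start), ]
n <- nrow(exons)

# Plotting convention: introns share their boundaries with the exons
introns_shared <- data.frame(
  start = exons$end[-n],
  end = exons$start[-1]
)

# Strict convention: introns exclude the exonic bases
introns_strict <- data.frame(
  start = exons$end[-n] + 1,
  end = exons$start[-1] - 1
)

introns_shared
introns_strict
```

Under the shared-boundary convention the intron line meets the exon box with no one-base gap, which is what a plot needs. If exact intron co-ordinates are required downstream, shifting the start by + 1 and the end by - 1 recovers the strict form.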
Examples
library(magrittr)
library(ggplot2)
library(ggtranscript)
# to illustrate the package's functionality
# ggtranscript includes example transcript annotation
sod1_annotation %>% head()
#> # A tibble: 6 × 8
#> seqnames start end strand type gene_name transcript_name transcript_biot…
#> <fct> <int> <int> <fct> <fct> <chr> <chr> <chr>
#> 1 21 3.17e7 3.17e7 + gene SOD1 NA NA
#> 2 21 3.17e7 3.17e7 + tran… SOD1 SOD1-202 protein_coding
#> 3 21 3.17e7 3.17e7 + exon SOD1 SOD1-202 protein_coding
#> 4 21 3.17e7 3.17e7 + CDS SOD1 SOD1-202 protein_coding
#> 5 21 3.17e7 3.17e7 + star… SOD1 SOD1-202 protein_coding
#> 6 21 3.17e7 3.17e7 + exon SOD1 SOD1-202 protein_coding
# extract exons
sod1_exons <- sod1_annotation %>% dplyr::filter(type == "exon")
sod1_exons %>% head()
#> # A tibble: 6 × 8
#> seqnames start end strand type gene_name transcript_name transcript_biot…
#> <fct> <int> <int> <fct> <fct> <chr> <chr> <chr>
#> 1 21 3.17e7 3.17e7 + exon SOD1 SOD1-202 protein_coding
#> 2 21 3.17e7 3.17e7 + exon SOD1 SOD1-202 protein_coding
#> 3 21 3.17e7 3.17e7 + exon SOD1 SOD1-202 protein_coding
#> 4 21 3.17e7 3.17e7 + exon SOD1 SOD1-202 protein_coding
#> 5 21 3.17e7 3.17e7 + exon SOD1 SOD1-202 protein_coding
#> 6 21 3.17e7 3.17e7 + exon SOD1 SOD1-204 processed_trans…
# to_intron() is a helper function included in ggtranscript
# which is useful for converting exon co-ordinates to introns
sod1_introns <- sod1_exons %>% to_intron(group_var = "transcript_name")
sod1_introns %>% head()
#> # A tibble: 6 × 8
#> seqnames strand type gene_name transcript_name transcript_biot… start end
#> <fct> <fct> <chr> <chr> <chr> <chr> <int> <int>
#> 1 21 + intr… SOD1 SOD1-204 processed_trans… 3.17e7 3.17e7
#> 2 21 + intr… SOD1 SOD1-202 protein_coding 3.17e7 3.17e7
#> 3 21 + intr… SOD1 SOD1-204 processed_trans… 3.17e7 3.17e7
#> 4 21 + intr… SOD1 SOD1-201 protein_coding 3.17e7 3.17e7
#> 5 21 + intr… SOD1 SOD1-203 processed_trans… 3.17e7 3.17e7
#> 6 21 + intr… SOD1 SOD1-202 protein_coding 3.17e7 3.17e7
# this can be particularly useful when combined with
# geom_range() and geom_intron()
# to visualize the core components of transcript annotation
sod1_exons %>%
    ggplot(aes(
        xstart = start,
        xend = end,
        y = transcript_name
    )) +
    geom_range() +
    geom_intron(
        data = to_intron(sod1_exons, "transcript_name")
    )