Vignette in development

In this vignette, we walk through the basic usage of the readthis R package, which provides fast reading of output files from various programs, including Mutect2, Strelka, ASCAT, and FACETS. The examples use the Strelka VCF files that are installed with the readthis package.

library(readthis)

strelka_vcf <- system.file("extdata", "Strelka", "S1.somatic.snvs.vcf.gz", package = "readthis")

VCF files with Strelka somatic SNVs can be read with the read_strelka_somatic_snvs() function. Its main argument is path, which can be a path to a single VCF file, a vector of paths to several files, or a single path to a directory containing several files.

Reading a single file

In the simplest case, path points to a single VCF file:

read_strelka_somatic_snvs(strelka_vcf, verbose = FALSE)
#> # A tibble: 9 × 9
#>   sample_id chrom   pos ref   alt   ref_reads alt_reads    VAF    DP
#>   <chr>     <chr> <int> <chr> <chr>     <int>     <int>  <dbl> <int>
#> 1 TUMOR     chr1      1 T     G            40         2 0.0476  1000
#> 2 TUMOR     chr2      3 A     T            27         3 0.1      554
#> 3 TUMOR     chr3      5 C     A            23         3 0.115    412
#> 4 TUMOR     chr4      7 T     C            39        10 0.204    932
#> 5 TUMOR     chr5      8 A     C            59         3 0.0484   945
#> 6 TUMOR     chr6      9 C     A            25         2 0.0741   500
#> 7 TUMOR     chr7     10 A     C            35         2 0.0541   870
#> 8 TUMOR     chrX     11 A     T            32         1 0.0303   893
#> 9 TUMOR     chrY     12 T     A            62         3 0.0462   740
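
The printed VAF values are consistent with alt_reads / (ref_reads + alt_reads). The sketch below checks that relationship on the first four rows of the output above, using base R only. The formula is inferred from the printed numbers and is an assumption, not a statement of how readthis computes VAF. The name VAF_check is introduced here for illustration.

```r
# Values copied from the first four rows of the output shown above.
snvs <- data.frame(
  chrom     = c("chr1", "chr2", "chr3", "chr4"),
  ref_reads = c(40L, 27L, 23L, 39L),
  alt_reads = c(2L, 3L, 3L, 10L),
  VAF       = c(0.0476, 0.1, 0.115, 0.204)
)

# Assumed relationship (inferred from the printed values):
# VAF = alt_reads / (ref_reads + alt_reads)
snvs$VAF_check <- snvs$alt_reads / (snvs$ref_reads + snvs$alt_reads)

# The printed VAF is rounded, so compare with a small tolerance.
all(abs(snvs$VAF_check - snvs$VAF) < 5e-4)
```

Note that DP in the output is much larger than ref_reads + alt_reads, so the check deliberately does not use it.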

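Reading a list of files

path can also be a vector of paths to several VCF files. In the example below the vector is named, and the names (S1, S2) appear in the patient_id column of the result: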

strelka_vcf2 <- system.file("extdata", "Strelka", "S2.somatic.snvs.vcf.gz", package = "readthis")
files <- c(S1 = strelka_vcf, S2 = strelka_vcf2)
read_strelka_somatic_snvs(files, verbose = FALSE)
#> # A tibble: 18 × 10
#>    patient_id sample_id chrom   pos ref   alt   ref_reads alt_reads    VAF    DP
#>    <chr>      <chr>     <chr> <int> <chr> <chr>     <int>     <int>  <dbl> <int>
#>  1 S1         TUMOR     chr1      1 T     G            40         2 0.0476  1000
#>  2 S1         TUMOR     chr2      3 A     T            27         3 0.1      554
#>  3 S1         TUMOR     chr3      5 C     A            23         3 0.115    412
#>  4 S1         TUMOR     chr4      7 T     C            39        10 0.204    932
#>  5 S1         TUMOR     chr5      8 A     C            59         3 0.0484   945
#>  6 S1         TUMOR     chr6      9 C     A            25         2 0.0741   500
#>  7 S1         TUMOR     chr7     10 A     C            35         2 0.0541   870
#>  8 S1         TUMOR     chrX     11 A     T            32         1 0.0303   893
#>  9 S1         TUMOR     chrY     12 T     A            62         3 0.0462   740
#> 10 S2         TUMOR     chr1      1 T     G            40         2 0.0476  1000
#> 11 S2         TUMOR     chr2      3 A     T            27         3 0.1      554
#> 12 S2         TUMOR     chr3      5 C     A            23         3 0.115    412
#> 13 S2         TUMOR     chr4      7 T     C            39        10 0.204    932
#> 14 S2         TUMOR     chr5      8 A     C            59         3 0.0484   945
#> 15 S2         TUMOR     chr6      9 C     A            25         2 0.0741   500
#> 16 S2         TUMOR     chr7     10 A     C            35         2 0.0541   870
#> 17 S2         TUMOR     chrX     11 A     T            32         1 0.0303   893
#> 18 S2         TUMOR     chrY     12 T     A            62         3 0.0462   740
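
When there are many files, typing the names by hand becomes tedious. One possible way to build such a named vector is to derive the names from the file names, sketched below in base R. The paths are hypothetical placeholders, and the suffix pattern assumes files named like the bundled examples (S1.somatic.snvs.vcf.gz).

```r
# Hypothetical paths; replace them with your own Strelka output files.
paths <- c(
  "results/S1.somatic.snvs.vcf.gz",
  "results/S2.somatic.snvs.vcf.gz"
)

# Drop the directory and the ".somatic.snvs.vcf.gz" suffix to get one label per file.
labels <- sub("\\.somatic\\.snvs\\.vcf\\.gz$", "", basename(paths))

# Named character vector: names are the labels, values are the paths.
files <- setNames(paths, labels)
names(files)
```

A vector built this way has the same shape as the files object passed to read_strelka_somatic_snvs() above.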

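Reading all files from a directory

path can also be a single path to a directory containing several files. In the output below, patient_id takes the values S1 and S2, matching the prefixes of the file names in that directory: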

strelka_dir <- system.file("extdata", "Strelka", package = "readthis")
read_strelka_somatic_snvs(strelka_dir, verbose = FALSE)
#> # A tibble: 18 × 10
#>    patient_id sample_id chrom   pos ref   alt   ref_reads alt_reads    VAF    DP
#>    <chr>      <chr>     <chr> <int> <chr> <chr>     <int>     <int>  <dbl> <int>
#>  1 S1         TUMOR     chr1      1 T     G            40         2 0.0476  1000
#>  2 S1         TUMOR     chr2      3 A     T            27         3 0.1      554
#>  3 S1         TUMOR     chr3      5 C     A            23         3 0.115    412
#>  4 S1         TUMOR     chr4      7 T     C            39        10 0.204    932
#>  5 S1         TUMOR     chr5      8 A     C            59         3 0.0484   945
#>  6 S1         TUMOR     chr6      9 C     A            25         2 0.0741   500
#>  7 S1         TUMOR     chr7     10 A     C            35         2 0.0541   870
#>  8 S1         TUMOR     chrX     11 A     T            32         1 0.0303   893
#>  9 S1         TUMOR     chrY     12 T     A            62         3 0.0462   740
#> 10 S2         TUMOR     chr1      1 T     G            40         2 0.0476  1000
#> 11 S2         TUMOR     chr2      3 A     T            27         3 0.1      554
#> 12 S2         TUMOR     chr3      5 C     A            23         3 0.115    412
#> 13 S2         TUMOR     chr4      7 T     C            39        10 0.204    932
#> 14 S2         TUMOR     chr5      8 A     C            59         3 0.0484   945
#> 15 S2         TUMOR     chr6      9 C     A            25         2 0.0741   500
#> 16 S2         TUMOR     chr7     10 A     C            35         2 0.0541   870
#> 17 S2         TUMOR     chrX     11 A     T            32         1 0.0303   893
#> 18 S2         TUMOR     chrY     12 T     A            62         3 0.0462   740
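
Because the result is a single table with a patient_id column, per-patient summaries need only ordinary data-frame operations. The sketch below uses base R on a few rows copied from the output above. The threshold min_vaf is a hypothetical value chosen for illustration, not a recommendation.

```r
# A few rows copied from the multi-sample output shown above.
snvs <- data.frame(
  patient_id = c("S1", "S1", "S1", "S2", "S2", "S2"),
  chrom      = c("chr1", "chr2", "chr4", "chr1", "chr2", "chr4"),
  alt_reads  = c(2L, 3L, 10L, 2L, 3L, 10L),
  VAF        = c(0.0476, 0.1, 0.204, 0.0476, 0.1, 0.204)
)

# Hypothetical VAF cutoff, used only to illustrate filtering.
min_vaf <- 0.05

# Keep variants at or above the cutoff, then count them per patient.
kept <- snvs[snvs$VAF >= min_vaf, ]
table(kept$patient_id)
```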

readthis also provides functions for bulk reading of output files from other programs. For the full list of functions implemented in the package, see the Reference page.