| Title: | Colocalization analysis of transcriptome elements in the presence of isoform heterogeneity and ambiguity |
|---|---|
| Description: | RgnTX allows the integration of transcriptome annotations so as to model the complex alternative splicing patterns. It supports the testing of transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. It involves functions that do permutaion test for evaluating association between features and transcriptome regions. |
| Authors: | Yue Wang [aut, cre], Jia Meng [aut] |
| Maintainer: | Yue Wang <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 1.14.0 |
| Built: | 2026-06-10 10:45:17 UTC |
| Source: | https://github.com/bioc/RgnTX |
The first step of calculating positional shift over transcriptome regions.
calculateShift(regions, disp, direction = "right", strand = "+")calculateShift(regions, disp, direction = "right", strand = "+")
regions |
A feature set, which should be a GRangesList object. |
disp |
A data frame object. It should have three columns, which are |
direction |
Either to be character "left" or "right", which means the direction to which the starting position is shifting. The former means moving to the direction of 5' while the latter means moving to 3'. |
strand |
Either to be "+" or "-". |
A GRanges object.
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # Download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift( regions = cds.p, disp = disp.p.l, direction = "left", strand = "+" )# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # Download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift( regions = cds.p, disp = disp.p.l, direction = "left", strand = "+" )
Evaluation function. This function calculates the mean of the distance from each region of set RS1 to the closest region in RS2.
distanceTx(A, B, beta = 0.2, ...)distanceTx(A, B, beta = 0.2, ...)
A |
Region set 1. A Granges or GRangesList object. |
B |
Region set 2. A Granges or GRangesList object. |
beta |
It is a user-defined argument that can filter out the corresponding percent of largest distance values. Default value is 0.2. |
... |
Any additional parameters needed. |
A numeric object.
overlapWidthTx, overlapCountsTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) B <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) distanceTx(A, B, beta = 0.2)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) B <- randomizeTx( txdb, trans.ids, random_num = 20, random_length = 100 ) distanceTx(A, B, beta = 0.2)
This function receives three arguments: the scope region set, the target region set and the type of strand. It returns a subset of target region set, which is the intersection of the target region set and the scope region set.
extractRegions(regions_A, R, strand = "+")extractRegions(regions_A, R, strand = "+")
regions_A |
The scope region set. A |
R |
The target region set. A |
strand |
The strand type of the transcripts. It has options "+" and "-". |
A GRangesList object.
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift(regions = cds.p, disp = disp.p.l, direction = "left", strand = "+") R.cds.last200 <- extractRegions(regions_A = cds.p, R = R.p.l, strand = "+")# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 disp.p.l <- data.frame( start = as.numeric(max(end(cds.p))), distance = width - 1, names = trans.id.pstv ) R.p.l <- calculateShift(regions = cds.p, disp = disp.p.l, direction = "left", strand = "+") R.cds.last200 <- extractRegions(regions_A = cds.p, R = R.p.l, strand = "+")
This function makes sure the two input region sets are in the correct format required by RgnTX evaluation functions.
getFormatCorrect(A, B)getFormatCorrect(A, B)
A |
Region set 1. A Granges or GRangesList object. |
B |
Region set 2. A Granges or GRangesList object. |
A list object.
This function returns a default permutation space for features with isoform ambiguity. The default permutation space of a feature is the aggregate of the multiple transcripts it may overlap with. It requires the input feature to be GRanges format.
getPermSpaceByFeatures(features, txdb, type = "mature")getPermSpaceByFeatures(features, txdb, type = "mature")
features |
A |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of regions over transcriptome. |
A list object, which contains two elements.
perm.space: A GRangesList object that includes all the transcripts input features may overlap with.
index: It contains a series of numbers indicating which feature these transcripts are respectively associated with.
getPermSpaceByTxID, getPermSpaceByType
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) permSpace <- getPermSpaceByFeatures(features = m6A_sites_data[1:100], txdb)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) permSpace <- getPermSpaceByFeatures(features = m6A_sites_data[1:100], txdb)
This function returns 5'UTR/CDS/3'UTR/mRNA/full part of transcriptome regions grouped by corresponding transcript ids.
getPermSpaceByTxID(trans_ids = "all", txdb, type = "mature")getPermSpaceByTxID(trans_ids = "all", txdb, type = "mature")
trans_ids |
A character object. The transcript ids. Default is "all". If it takes the default value "all", the space that users get will be the whole transcriptome. |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
A GRangesList object.
getPermSpaceByType, getPermSpaceByFeatures
trans.ids <- c("170", "782", "974", "1364", "1387") library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permspace <- getPermSpaceByTxID(trans.ids, txdb)trans.ids <- c("170", "782", "974", "1364", "1387") library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permspace <- getPermSpaceByTxID(trans.ids, txdb)
This function can return 5'UTR/CDS/3'UTR/mRNA/full part of transcriptome regions, following the format required by the main permutation test functions.
getPermSpaceByType(txdb, type = "mature")getPermSpaceByType(txdb, type = "mature")
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
A GRangesList object.
getPermSpaceByTxID, getPermSpaceByFeatures
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permSpace <- getPermSpaceByType(txdb, type = "CDS")library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene permSpace <- getPermSpaceByType(txdb, type = "CDS")
Calculate a p-value and z-score based on observed value and random evaluation values.
getPvalZscore(orig.ev, rand.ev, pval_z = FALSE)getPvalZscore(orig.ev, rand.ev, pval_z = FALSE)
orig.ev |
Observed value. |
rand.ev |
Random evaluation values. |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
A p-value and a z-score.
Get stop codon regions for input transcripts. This is an example of customPick function.
getStopCodon(trans_ids, txdb, ...)getStopCodon(trans_ids, txdb, ...)
trans_ids |
A character object containing transcript ids. |
txdb |
A TxDb object. |
... |
Any additional parameters needed. |
A numeric object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS2 <- getStopCodon(trans.ids, txdb)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS2 <- getStopCodon(trans.ids, txdb)
Generate a data frame object that contains information about input genomic feature set and its mapping results over the transcriptome.
getTransInfo(A, txdb)getTransInfo(A, txdb)
A |
Genomic feature set, which should be a |
txdb |
A TxDb object. |
A data.frame object containing the following components:
index_trans: The label of transcripts.
index_features: The label of genomic features.
seqnames: The chr name.
features_pos: The starting coordinate of each genomic feature.
width_features: The width of each genomic feature.
strand: The strand type of each genomic feature.
trans_ID: The ids of the transcripts that each feature can be mapped to.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene getTransInfo(A = m6A_sites_data[1:100], txdb)library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene getTransInfo(A = m6A_sites_data[1:100], txdb)
Convert a GRanges object to a GRangesList object. The output region set follows the format required by the main permutation test functions.
GRanges2GRangesList(A = NULL)GRanges2GRangesList(A = NULL)
A |
A |
If input GRanges object has a metadata named as "group",
ranges having the same group number represent a region. If not, a range is a region.
A region in the input set will be outputted as a list element IN returned
GRangesList object.
A GRangesList object.
library(GenomicRanges) GRanges.object <- GRanges( Rle(c("chr2", "chr2", "chr1", "chr3")), IRanges(1:4, width = 5) ) # Assign the first and the second ranges to the same element. GRanges.object$group <- c(1, 1, 2, 3) GRangesList.object <- GRanges2GRangesList(GRanges.object)library(GenomicRanges) GRanges.object <- GRanges( Rle(c("chr2", "chr2", "chr1", "chr3")), IRanges(1:4, width = 5) ) # Assign the first and the second ranges to the same element. GRanges.object$group <- c(1, 1, 2, 3) GRangesList.object <- GRanges2GRangesList(GRanges.object)
Convert a GRangesList object to a GRanges object. The output region set follows the format required by the RgnTX permutation test functions, which should have metadata columns 'group' and 'transcriptsHits'.
GRangesList2GRanges(A = NULL)GRangesList2GRanges(A = NULL)
A |
A |
A GRanges object. Its transcript ids (if available) should be contained in a metadata column named “transcriptsHits”, which are provided by the names of input GRangesList object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS1 <- GRangesList2GRanges(RS1)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS1 <- GRangesList2GRanges(RS1)
This function receives two region sets and returns the number of their overlaps.
overlapCountsTx(A, B, count_once = TRUE, over_trans = TRUE, ...)overlapCountsTx(A, B, count_once = TRUE, over_trans = TRUE, ...)
A |
Region set 1. A |
B |
Region set 2. A |
count_once |
Whether the overlap of multiple B regions with a single A region should be counted once or multiple times. |
over_trans |
Whether the overlapping is counted over the transcriptome or over the genome. |
... |
Any additional parameters needed. |
A numeric object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") exons.tx0 <- exonsBy(txdb) regions.A <- exons.tx0[trans.ids] A <- randomizeTransByOrder(regions.A, random_length = 200) B <- randomizeTransByOrder(regions.A, random_length = 200) overlapCountsTx(A, B)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") exons.tx0 <- exonsBy(txdb) regions.A <- exons.tx0[trans.ids] A <- randomizeTransByOrder(regions.A, random_length = 200) B <- randomizeTransByOrder(regions.A, random_length = 200) overlapCountsTx(A, B)
Evaluation function. This function receives a feature set (with isoform ambiguity) and a transcriptome region set (without isoform ambiguity), and returns a weighted number of overlaps between them.
overlapCountsTxIA(A, B, ...)overlapCountsTxIA(A, B, ...)
A |
A feature set, which should be |
B |
A region set, which should be |
... |
Any additional parameters needed. |
A numeric object.
overlapWidthTx, distanceTx, overlapCountsTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] trans.info <- getTransInfo(RS1, txdb) trans.ids <- trans.info[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb = txdb) # Evaluation step. orig.ev <- overlapCountsTxIA(RS1, RS2)library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] trans.info <- getTransInfo(RS1, txdb) trans.ids <- trans.info[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb = txdb) # Evaluation step. orig.ev <- overlapCountsTxIA(RS1, RS2)
Evaluation function. This function returns the sum of widths of each overlap between two region sets, i.e., the total number of overlapping nucleotides between two input region sets.
overlapWidthTx(A, B, ...)overlapWidthTx(A, B, ...)
A |
Region set 1. A |
B |
Region set 2. A |
... |
Any additional parameters needed. |
A numeric object.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random.length = 100 ) B <- randomizeTx( txdb, trans.ids = trans.ids, random_num = 20, random_length = 100 ) overlapWidthTx(A, B)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") A <- randomizeTx( txdb, trans.ids, random_num = 20, random.length = 100 ) B <- randomizeTx( txdb, trans.ids = trans.ids, random_num = 20, random_length = 100 ) overlapWidthTx(A, B)
Perform permutation test for evaluating spatial association between a feature set and a region set.
permTestTx(RS1 = NULL, RS2 = NULL, txdb = NULL, type = "mature", ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)permTestTx(RS1 = NULL, RS2 = NULL, txdb = NULL, type = "mature", ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The region set to be randomized. It should be in the |
RS2 |
The region set to be compared with. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTxIA only needs users to input two region sets. It will automatically randomize the first region set into transcriptome.
A list object, which is defined to be permTestTx.results class. It contains the following items:
RSL: Randomized region sets of RS1.
RS1: The feature set to be randomized.
RS2: The region set to be compared with the feature set.
orig.ev: The value of overlapping counts between RS1 and RS2.
rand.ev: The values of overlapping counts between each element in RSL and RS2.
pval: p-value of the test.
zscore: Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) A <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) B <- c(randomizeTx(txdb, trans.ids, random_num = 75, random_length = 100), A[1:25]) permTestTx_results <- permTestTx(A, B, txdb, ntimes = 5)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) A <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) B <- c(randomizeTx(txdb, trans.ids, random_num = 75, random_length = 100), A[1:25]) permTestTx_results <- permTestTx(A, B, txdb, ntimes = 5)
Perform permutation test for evaluating spatial association between region sets. This permutation test function receives two region sets and a set of randomized region sets of one of them. It evaluates if there is an association between these two region sets.
permTestTx_customAll(RSL = NULL, RS1 = NULL, RS2 = NULL, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)permTestTx_customAll(RSL = NULL, RS1 = NULL, RS2 = NULL, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RSL |
Randomized region sets of |
RS1 |
The region set. It should be in the |
RS2 |
The region set to be compared with. It should be in the |
ev_function_1 |
Evaluation function defines what statistic to be tested between RS1 and RS2. Default is |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in RSL and RS2. Default is |
pval_z |
Boolean. Default is FALSE. If FALSE, p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTx_customAll will use evaluation function ev_function_1 to calculate the test statistic between RS1 and RS2, and use ev_function_2 to evaluate the statistic between RSL and RS2. It will also return a p-value and a z-score.
A list object, which is defined to be permTestTx.results class. It contains the following items:
RSL: Randomized region sets of RS1.
RS1: The feature set to be randomized.
RS2: The region set to be compared with the feature set.
orig.ev: The value of overlapping counts between RS1 and RS2.
rand.ev: The values of overlapping counts between each element in RSL and RS2.
pval: p-value of the test.
zscore: Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids1<- c("170") RS1 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) RS2 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) trans.ids2 <- c("170", "782", "974", "1364", "1387") RSL <- randomizeTx(txdb = txdb, trans_ids = trans.ids2, random_num = 20, random_length = 100, N = 10) permTestTx_results <- permTestTx_customAll(RSL = RSL, RS1 = RS1, RS2 = RS2)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids1<- c("170") RS1 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) RS2 <- randomizeTx(txdb = txdb, trans_ids = trans.ids1, random_num = 20, random_length = 100) trans.ids2 <- c("170", "782", "974", "1364", "1387") RSL <- randomizeTx(txdb = txdb, trans_ids = trans.ids2, random_num = 20, random_length = 100, N = 10) permTestTx_results <- permTestTx_customAll(RSL = RSL, RS1 = RS1, RS2 = RS2)
Perform permutation test for evaluating spatial association between a feature set and the customPick regions. The latter is defined by the customPick_function argument provided by users.
permTestTx_customPick(RS1 = NULL, txdb = NULL, type = "mature", customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)permTestTx_customPick(RS1 = NULL, txdb = NULL, type = "mature", customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The feature set to be randomized. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
customPick_function |
A custom function needs to be inputted by users. The custom function should have two arguments: a TxDb object and a character object of transcript ids. It returns a part of region of each transcript. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
Each feature in RS1 is only mapped with the customPick regions over its transcript (picked by the customPick_function). The output orig.ev is the number of features that have overlap with its customPick region.
The set of randomized region sets is outputted as RSL. The overlapping counts between each set in RSL with RS2 is outputted as rand.ev.
A list object, which is defined to be permTestTx.results class. It contains the following items:
RSL: Randomized region sets of RS1.
RS1: The feature set to be randomized.
RS2: The region set to be compared with the feature set.
orig.ev: The value of overlapping counts between RS1 and RS2.
rand.ev: The values of overlapping counts between each element in RSL and RS2.
pval: p-value of the test.
zscore: Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 100) RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 200, type = 'CDS') getCDS = function(txdb, trans.id){ cds.tx0 <- cdsBy(txdb, use.names=FALSE) cds.names <- as.character(intersect(names(cds.tx0), trans.id)) cds = cds.tx0[cds.names] return(cds) } permTestTx_results <- permTestTx_customPick(RS1,txdb, customPick_function = getCDS, ntimes = 5)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 100) RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 200, type = 'CDS') getCDS = function(txdb, trans.id){ cds.tx0 <- cdsBy(txdb, use.names=FALSE) cds.names <- as.character(intersect(names(cds.tx0), trans.id)) cds = cds.tx0[cds.names] return(cds) } permTestTx_results <- permTestTx_customPick(RS1,txdb, customPick_function = getCDS, ntimes = 5)
Perform permutation test for evaluating spatial association between some features (with isoform ambiguity) and a region set. It randomizes the features and compares it with the region set to see if there is an association between the features and the region set. The difference between this function and permTestTx is that it is for RNA-related genomic features that have isoform ambiguity, i.e., features that one does not know which transcript they comes from.
permTestTxIA(RS1 = NULL, RS2 = NULL, txdb = NULL, type = 'mature', ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)permTestTxIA(RS1 = NULL, RS2 = NULL, txdb = NULL, type = 'mature', ntimes = 50, ev_function_1 = overlapCountsTx, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The feature set to be randomized. It should be in the |
RS2 |
The region set to be compared with. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTxIA only needs users to input two region sets. It will automatically randomize the first region set into transcriptome.
A list object, which is defined to be permTestTx.results class. It contains the following items:
RSL: Randomized region sets of RS1.
RS1: The feature set to be randomized.
RS2: The region set to be compared with the feature set.
orig.ev: The value of overlapping counts between RS1 and RS2.
rand.ev: The values of overlapping counts between each element in RSL and RS2.
pval: p-value of the test.
zscore: Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] trans.ids <- getTransInfo(RS1, txdb)[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb) permTestTx_results <- permTestTxIA(RS1 = RS1, RS2 = RS2, txdb = txdb, ntimes = 5)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] trans.ids <- getTransInfo(RS1, txdb)[, "trans_ID"] RS2 <- getStopCodon(trans.ids, txdb) permTestTx_results <- permTestTxIA(RS1 = RS1, RS2 = RS2, txdb = txdb, ntimes = 5)
Perform permutation test for evaluating spatial association between RNA features and a specified kind of regions. The latter is defined by the customPick_function argument input by users. The difference between this function and permTestTx_customPick is that it is for RNA-related genomic features that have isoform ambiguity, i.e., features that one does not know which transcript they comes from.
permTestTxIA_customPick(RS1 = NULL, txdb = NULL, type = 'mature', customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTxIA, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)permTestTxIA_customPick(RS1 = NULL, txdb = NULL, type = 'mature', customPick_function = NULL, ntimes = 50, ev_function_1 = overlapCountsTxIA, ev_function_2 = overlapCountsTx, pval_z = FALSE, ...)
RS1 |
The region set to be randomized. It should be in the |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
customPick_function |
A custom function needs to be inputted by users. The customPick function should have two arguments: a TxDb object and a character object of transcript ids. It returns a specified region over each transcript. |
ntimes |
Randomization times. |
ev_function_1 |
Evaluation function defines what statistic to be tested between |
ev_function_2 |
Evaluation function defines what statistic to be tested between each element in |
pval_z |
Boolean. Default is FALSE. If FALSE, the p-value is calculated based on the number of random evaluations is larger or less than the initial evaluation. If TRUE, the p-value is calculated based on a z-test. |
... |
Any additional parameters needed. |
permTestTxIA_customPick will assess the test statistic between RS1 and each region in RSL, and the relation between RS1 and RS2.
Each RNA feature is only mapped with a part of region on its transcript (picked by the customPick_function). The output orig.ev is the weighted counts between RS1 and RS2. Each feature in RS1 related to n1 isoforms in RS2 and overlapped with n2 RS2 regions will contribute a value of n2/n1 to the total number of overlaps.
This test function also randomizes input features per transcript. The set of randomized results is outputted as RSL. The overlapping counts between each set in RSL with RS2 is outputted as rand.ev.
A list object, which is defined to be permTestTx.results class. It contains the following information:
RSL: Randomized region sets of RS1.
RS1: The feature set to be randomized.
RS2: The region set to be compared with the feature set.
orig.ev: The value of overlapping counts between RS1 and RS2.
rand.ev: The values of overlapping counts between each element in RSL and RS2.
pval: p-value of the test.
zscore: Standard score of the test.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, type = 'mature', customPick_function = getStopCodon, ntimes = 5)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, type = 'mature', customPick_function = getStopCodon, ntimes = 5)
Show a graphical representation of permutation test.
plotPermResults(permTestTx_results = NULL, breaks = 15, alpha = 0.05, test_type = "one-sided", binwidth = NULL)plotPermResults(permTestTx_results = NULL, breaks = 15, alpha = 0.05, test_type = "one-sided", binwidth = NULL)
permTestTx_results |
A |
breaks |
Histogram breaks. Default is 15. |
alpha |
Significance level. |
test_type |
The type of the test. This argument only receives either two options "one-sided" or "two-sided". Default is "one-sided". |
binwidth |
Histogram binwidth. |
A plot object.
file <- system.file(package="RgnTX", "extdata", "permTestTx_results.rds") permTestTx_results <- readRDS(file) p_a <- plotPermResults(permTestTx_results, binwidth = 1) p_afile <- system.file(package="RgnTX", "extdata", "permTestTx_results.rds") permTestTx_results <- readRDS(file) p_a <- plotPermResults(permTestTx_results, binwidth = 1) p_a
Plot shifted z scores for permutation test results.
plotShiftedZScoreTx(shitedZScoresTx_results)plotShiftedZScoreTx(shitedZScoresTx_results)
shitedZScoresTx_results |
A |
A plot.
file <- system.file(package="RgnTX", "extdata", "shiftedZScoreTx_results1.rds") shiftedZScoreTx_results <- readRDS(file) p1 <- plotShiftedZScoreTx(shiftedZScoreTx_results) p1file <- system.file(package="RgnTX", "extdata", "shiftedZScoreTx_results1.rds") shiftedZScoreTx_results <- readRDS(file) p1 <- plotShiftedZScoreTx(shiftedZScoreTx_results) p1
Randomize features into transcriptome.
randomizeFeaturesTx(RS, txdb, type = "mature", N = 1, ...)randomizeFeaturesTx(RS, txdb, type = "mature", N = 1, ...)
RS |
The feature to be randomized. It should be a |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
N |
The number of iterations. |
... |
Any additional parameters needed. |
A GRangesList object. The name of each element is the id of the transcript where the corresponding range is located.
randomizeTransByOrder, randomizeFeaturesTxIA, randomizeTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS <- randomizeFeaturesTx(RS1, txdb, N = 1)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100) RS <- randomizeFeaturesTx(RS1, txdb, N = 1)
Randomize features into transcriptome, especially for the features that have isoform ambiguity.
randomizeFeaturesTxIA(RS, txdb, type = "mature", N = 1, ...)randomizeFeaturesTxIA(RS, txdb, type = "mature", N = 1, ...)
RS |
The feature being randomized. It should be a |
txdb |
A TxDb object. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
N |
Randomization times. |
... |
Any additional parameters needed. |
A GRangesList object. The name of each element is the id of the transcript where the corresponding range is located.
randomizeTransByOrder, randomizeFeaturesTx, randomizeTx
library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] RS <- randomizeFeaturesTxIA(RS1, txdb, N = 1)library(TxDb.Hsapiens.UCSC.hg19.knownGene) file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene RS1 <- m6A_sites_data[1:100] RS <- randomizeFeaturesTxIA(RS1, txdb, N = 1)
This function receives a GRangesList object and picks a random region within each list element of this object. The length of the region to be picked is decided by the input random_length argument.
randomizeTransByOrder(regions_A, random_length = 20)randomizeTransByOrder(regions_A, random_length = 20)
regions_A |
A |
random_length |
A |
A GRangesList object. The name of each list element should be the corresponding transcript id.
randomizeTx, randomizeFeaturesTx, randomizeFeaturesTxIA
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) regions.A <- exons.tx0[trans.ids] RS <- randomizeTransByOrder(regions.A, random_length = 20)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene exons.tx0 <- exonsBy(txdb) trans.ids <- sample(names(exons.tx0), 500) regions.A <- exons.tx0[trans.ids] RS <- randomizeTransByOrder(regions.A, random_length = 20)
Pick random regions over specified transcripts.
randomizeTx(txdb, trans_ids = 'all', random_num = 100, random_length = 20, type = 'mature', N = 1, ...)randomizeTx(txdb, trans_ids = 'all', random_num = 100, random_length = 20, type = 'mature', N = 1, ...)
txdb |
A TxDb object. |
trans_ids |
The ids of transcripts, which should be a character object. Random regions will be picked from these transcripts. If this argument takes the default value 'all', the scope of picking random regions will be the whole transcriptome. |
random_num |
The number of regions to be picked. |
random_length |
The length of regions to be picked. |
type |
A character object. Default is "mature". It accepts options "mature", "full", "fiveUTR", "CDS" or "threeUTR", with which one can get corresponding types of transcriptome regions. |
N |
Randomization times. |
... |
Any additional parameters needed. |
A GRangesList object. The name of each element is the id of the transcript where the corresponding range is located.
randomizeTransByOrder, randomizeFeaturesTx, randomizeFeaturesTxIA
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene trans.ids <- c("170", "782", "974", "1364", "1387") RS1 <- randomizeTx(txdb, trans.ids, random_num = 100, random_length = 100)
Calculate shifted z scores for permutation test results.
shiftedZScoreTx(permTestTx_results = NULL, txdb = NULL, window = 200, step = 20, ev_function_1 = overlapCountsTx, ...)shiftedZScoreTx(permTestTx_results = NULL, txdb = NULL, window = 200, step = 20, ev_function_1 = overlapCountsTx, ...)
permTestTx_results |
A |
txdb |
A TxDb object. |
window |
The window of the whole shifting. |
step |
The step of each shifting. |
ev_function_1 |
Evaluation function. Default is |
... |
Any additional parameters needed. |
see examples in plotShiftedZScoreTx
A list object, which is defined to be shitedZScore.results class. It contains the following items:
shifted.z.scores: Standard z-scores after shifting.
window: Window of the whole shifting.
step: Step of each shifting.
original.z.score: Original standard score.
library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, customPick_function = getStopCodon, ntimes = 5) shiftedZScoreTx_results <- shiftedZScoreTx(permTestTx_results, txdb = txdb, window = 2000, step = 200, ev_function_1 = overlapCountsTxIA)library(TxDb.Hsapiens.UCSC.hg19.knownGene) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene file <- system.file(package="RgnTX", "extdata/m6A_sites_data.rds") m6A_sites_data <- readRDS(file) RS1 <- m6A_sites_data[1:500] permTestTx_results <- permTestTxIA_customPick(RS1 = RS1, txdb = txdb, customPick_function = getStopCodon, ntimes = 5) shiftedZScoreTx_results <- shiftedZScoreTx(permTestTx_results, txdb = txdb, window = 2000, step = 200, ev_function_1 = overlapCountsTxIA)
Calculate positional shifting over transcript regions. This function accepts a feature set and outputs a region set from it. Each output region is from each input feature.
shiftTx(regions, start, width, direction, strand)shiftTx(regions, start, width, direction, strand)
regions |
A feature set following the format indicated in vignette section 3. Either to be |
start |
Starting positions. Each value represents a starting position in each input feature. |
width |
Widths. Each value represents a width of each region to be picked from each feature. |
direction |
Either to be character "left" or "right", which means the direction to which the starting position is shifting. The former means moving to the direction of 5' while the latter means moving to 3'. |
strand |
The strand type of the transcripts. It receives "+" or "-". |
A Granges object.
# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 start <- as.numeric(max(end(cds.p))) R.cds.last200 <- shiftTx(cds.p, start = start, width = width, direction = 'left', strand = "+")# Take five transcripts. # Extract the last 200 nt regions from their CDS part. library(TxDb.Hsapiens.UCSC.hg19.knownGene) trans.id.pstv <- c("170", "782", "974", "1364", "1387") txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene # download the CDS part of all transcriptome cds.tx0 <- cdsBy(txdb, use.names = FALSE) # pick the CDS part of these five transcripts cds.p <- cds.tx0[trans.id.pstv] width <- 200 start <- as.numeric(max(end(cds.p))) R.cds.last200 <- shiftTx(cds.p, start = start, width = width, direction = 'left', strand = "+")
Generate GRangesList object from vectors. The output region set follows the format required by the main permutation test functions.
vector2GRangesList(RefSeqID, targetName, strand, blockSizes, targetStart)vector2GRangesList(RefSeqID, targetName, strand, blockSizes, targetStart)
RefSeqID |
The name of each element. |
targetName |
The seqnames of each range. |
strand |
The strand of each range. |
blockSizes |
The width of each range. |
targetStart |
The stard coordinate of each range. |
A GRangesList object.