Performs gene set enrichment testing using a gene-wise permutation procedure,
while accounting for confounding variables. Note that the deconfounding step
requires that appropriately labeled covariate columns are provided in the
obj.info file (otherwise standard gene score residuals are passed to
permutations). Applies a Generalized Pareto Distribution (GPD) to the tail of
the empirical null for accurate estimation of small p-values.
Usage
plR_permute(
plR.input,
permute = TRUE,
n.perm = 500000L,
n.boot = 30L,
alt = "upper",
md.meth = "robust",
kern.wt.max = 0.05,
kern.bound = "auto",
kern.func = "normal",
kern.scale = "auto",
gpd.cutoff = 5000L/n.perm,
seed = NULL,
verbose = TRUE,
n.cores = "auto",
fut.plan = "auto"
)Arguments
- plR.input
plRclass object; output frompolylinkR::plR_read. Required.- permute
logical; should the permutation step be performed? Defaults toTRUE. IfFALSE, only gene score deconfounding is performed without estimating set scores. IfTRUE, users must provide confounder covariates inobj.info. This is useful for exploring how parameter settings impact deconfounded gene scores; the permutation step can be performed later by passing the output back intoplR_permute.- n.perm
integer; number of permutations. Defaults to1e5. Must be in the range[5e4L, Inf)and be exactly divisible by1e4.- n.boot
integer; number of bootstrap replicates for null inference. Defaults to30L. Must be in the range[5, Inf).- alt
character; direction of the hypothesis test. Defaults to"upper"for enrichment in the upper tail (large set scores). Alternatively,"lower"tests for enrichment in the lower tail (small values). When"lower"is chosen, data are internally negated in functions performing p-value estimation.- md.meth
character; determines whether raw covariate data or ranks are used in Mahalanobis distance calculations. Defaults to"robust", where Mahalanobis distances are converted to ranks and Spearman's metric is used to calculate the covariance matrix. The alternate option"raw"uses the original covariates and Pearson's covariance for scaling.- kern.wt.max
numeric; maximum probability weight for a single gene. Defaults to0.05. Must be in the range(1 / (n.genes - 1), 1]. Set to1if no upper bound is desired. Ignored if covariate columns are not detected inobj.info.- kern.bound
numeric; flanking region around each gene where weights of overlapping genes inside the region are set to 0. Weights of partially overlapping genes are downscaled by the proportion overlapping the excluded region. Defaults to 0.1 Mbp or 0.1 cM (depending on genetic coordinates). Range(0, Inf];0denotes standard gene boundaries andInfexcludes all genes on the same chromosome. Ignored if appropriate covariate columns are not detected inobj.info.- kern.func
character; kernel function used to generate probability weights from distances between the focal gene and other genes in confounder space. Default is"normal"(Gaussian kernel). Alternate options include"exponential"and"inverse". Ignored if covariate columns are not detected inobj.info.- kern.scale
numeric; scalar used in the kernel function to convert Mahalanobis distances to probabilities. Defaults to2for Gaussian decay,log(10)for exponential decay, or2for inverse decay. Range(0, Inf]. Ignored if covariate columns are not detected inobj.info.- gpd.cutoff
numeric; threshold tail probability at which to apply GPD tail fitting. Defaults to500 / n.perm. Must be in the range[max(c(1e-04, 500 / n.perm)), 0.05]. The lower bound constraint ensures that a minimum of500exceedances are available for GPD estimation, while also ensuring compatibility with the empirical CDF, where the lowest evaluated quantile is1e-4.- seed
integer; random seed for reproducibility. Preserved across subsequent polylinkR functions (plR_permute,plR_rescale). Defaults toNULL, in which case a seed is generated automatically. Must be within[-.Machine$integer.max, .Machine$integer.max].- verbose
logical; should progress messages be printed to the console? Defaults toTRUE.- n.cores
integer; number of cores for parallel processing. Defaults to1ormaximum cores - 1. Must be in the range[1, maximum cores].- fut.plan
character; parallel backend from thefuturepackage. Defaults to usern.coreschoice or checks available cores, choosing"sequential"on single-core and"multisession"on multi-core systems. Options:"multisession","multicore","cluster","sequential".
Value
A plR S3 object containing:
obj.infodata.tablefor each gene (object).set.infodata.tablefor each gene set.set.objdata.tablemapping genes to gene sets.
Deconfounded gene scores are recorded in obj.info as
objStat.std. Set scores are recorded in set.info as
setScore.std. Results from enrichment tests are in
setScore.std.p.
All plR S3 objects include auxiliary information and summaries as
attributes:
plr.dataReusable datasets and parameters, including GPD fitting results and autocovariance estimation.
plr.argsArgument settings used in the function.
plr.sessionR session information and function run time.
plr.trackInternal tracking number indicating functional steps.
classS3 object class (i.e.,
plr).
Each attribute can be accessed using attr() or attributes().
The first four attribute classes aggregate information over successive
functions. For example, to access the plr.data attribute for the
plR object output after running plR_permute, use
attributes(X)$plR.data$permute.data, where X is the object
name. Similarly, the arguments used in plR_read are in
attributes(X)$plR.args$read.args.
The primary data structure of the plR object can be accessed using
print() or by simply typing the object's name.
Examples
if (FALSE) { # \dontrun{
# Assuming `my_plr` is the result of `polylinkR::plR_read`
# Example 1: Basic usage
new_plr <- plR_permute(plR.input = my_plr)
# Example 2: Less permutations, more bootstraps
new_plr <- plR_permute(
plR.input = my_plr,
n.perm = 1e5,
n.boot = 100
)
# Example 3: Modified covariate handling, single processor
new_plr <- plR_permute(
plR.input = my_plr,
kern.wt.max = 0.2,
md.meth = "raw",
n.cores = 1
)
# Example 4: Modified GPD estimation, user-specified seed
new_plr <- plR_permute(
plR.input = my_plr,
gpd.cutoff = 0.01,
seed = 1000
)
# Example 5: Only deconfound scores (no enrichment analysis)
new_plr <- plR_permute(
plR.input = my_plr,
permute = FALSE
)
# Example 6: Use deconfounded scores from Example 5 to run enrichment
new_plr <- plR_permute(plR.input = new_plr)
} # }
