| Title: | Flexible simulation of paired-insertion counts for single-cell ATAC-sequencing data |
|---|---|
| Description: | simPIC is a package for simulating single-cell ATAC-seq count data. It provides a user-friendly, well documented interface for data simulation. Functions are provided for parameter estimation, realistic scATAC-seq data simulation, and comparing real and simulated datasets. |
| Authors: | Sagrika Chugh [aut, cre] (ORCID: <https://orcid.org/0000-0002-8050-5214>), Heejung Shim [aut], Davis McCarthy [aut] |
| Maintainer: | Sagrika Chugh <[email protected]> |
| License: | GPL-3 |
| Version: | 1.8.0 |
| Built: | 2026-06-06 06:08:30 UTC |
| Source: | https://github.com/bioc/simPIC |
Add additional feature statistics to a SingleCellExperiment object
addFeatureStats(sce, value = "counts", log = FALSE, offset = 1, no.zeros = FALSE)

| Argument | Description |
|---|---|
| sce | SingleCellExperiment to add feature statistics to. |
| value | the count value to calculate statistics. |
| log | logical. Whether to take log2 before calculating statistics. |
| offset | offset to add to avoid taking log of zero. |
| no.zeros | logical. Whether to remove all zeros from each feature before calculating statistics. |

Currently adds the following statistics: mean and variance. Statistics
are added to the rowData slot and are named
Stat[Log]Value[No0] where Log and No0 are added if
those arguments are true.
SingleCellExperiment with additional feature statistics
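The naming scheme described above can be illustrated with a minimal base R sketch. It is not the simPIC implementation: the helper name `feature_stats_sketch`, the capitalised value label `Counts`, and the choice to mask zeros as NA before taking logs are all assumptions made for illustration.

```r
# Base R sketch of per-feature mean and variance with column names that
# follow the Stat[Log]Value[No0] pattern. Hypothetical helper, not simPIC code.
feature_stats_sketch <- function(counts, log = FALSE, offset = 1,
                                 no.zeros = FALSE) {
    values <- counts
    if (no.zeros) {
        # Assumption: zeros are masked as NA so they are skipped below
        values[values == 0] <- NA
    }
    if (log) {
        values <- log2(values + offset)
    }
    stats <- data.frame(
        rowMeans(values, na.rm = TRUE),
        apply(values, 1, var, na.rm = TRUE)
    )
    # Assumed column names, for example "MeanLogCountsNo0"
    suffix <- paste0(if (log) "Log" else "", "Counts",
                     if (no.zeros) "No0" else "")
    colnames(stats) <- paste0(c("Mean", "Var"), suffix)
    stats
}

# Two features (rows) measured in three cells (columns)
counts <- matrix(c(0, 2, 4,
                   1, 1, 1), nrow = 2, byrow = TRUE)
plain <- feature_stats_sketch(counts)
no_zero <- feature_stats_sketch(counts, no.zeros = TRUE)
```

With `no.zeros = TRUE` the first feature is summarised over its two non-zero entries only, which is why its mean rises relative to the plain version.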
This function converts a sparse matrix into a SingleCellExperiment (SCE) object.
convert_to_SCE(sparse_data)

| Argument | Description |
|---|---|
| sparse_data | A sparse matrix containing count data, where rows are peaks and columns represent cells. |

A SingleCellExperiment (SCE) object with the sparse matrix stored in the "counts" assay.
Get counts matrix from a SingleCellExperiment object. If counts is missing a warning is issued and the first assay is returned.
getCounts(sce)

| Argument | Description |
|---|---|
| sce | SingleCellExperiment object |

counts matrix
simPIC: Simulate single-cell ATAC-seq data
globalvariables
Create a new simPICcount object to store parameters.
newsimPICcount(...)

| Argument | Description |
|---|---|
| ... | Variables to set newsimPICcount object parameters. |

This function creates the object that is passed to all other simPIC functions.
A new object of class simPICcount.
object <- newsimPICcount()
This function defines a custom theme for ggplot2 to ensure consistent visual appearance across multiple plots.
plot_theme()
A ggplot2 theme object with predefined settings.
Bind the rows of two data frames, keeping only the columns that are common to both.
rbindMatched(df1, df2)

| Argument | Description |
|---|---|
| df1 | first data.frame to bind. |
| df2 | second data.frame to bind. |

data.frame containing rows from df1 and df2 but
only common columns.
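The behaviour described above can be sketched in base R as follows. This is an illustrative stand-in, not the package source; the name `rbind_matched_sketch` and the example columns are hypothetical.

```r
# Keep only the columns shared by both data frames, then stack the rows.
rbind_matched_sketch <- function(df1, df2) {
    common <- intersect(colnames(df1), colnames(df2))
    if (length(common) == 0) {
        stop("the data frames have no columns in common")
    }
    rbind(df1[, common, drop = FALSE], df2[, common, drop = FALSE])
}

df1 <- data.frame(Peak = c("p1", "p2"), Mean = c(0.1, 0.4), Var = c(0.2, 0.5))
df2 <- data.frame(Peak = "p3", Mean = 0.3, Zeros = 90)

# Var and Zeros are dropped because each appears in only one input
combined <- rbind_matched_sketch(df1, df2)
```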
Try two fitting methods and select the best one.
selectFit(data, distr, verbose = TRUE)

| Argument | Description |
|---|---|
| data | The data to fit. |
| distr | Name of the distribution to fit. |
| verbose | logical. To print messages or not. |

The distribution is fitted to the data using each of the
fitdist fitting methods. The fit with the
smallest Cramer-von Mises statistic is selected.
The selected fit object
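To make the selection rule concrete, here is a base R sketch that fits a log-normal distribution in two ways and keeps the fit with the smaller Cramer-von Mises statistic. It does not call fitdist; the two estimators (maximum likelihood and moment matching), the helper names, and the simulated library sizes are assumptions chosen so that the example runs without extra packages.

```r
# Cramer-von Mises statistic of a sample against a fitted CDF:
# W2 = 1/(12n) + sum_i (F(x_(i)) - (2i - 1)/(2n))^2
cvm_statistic <- function(x, cdf, ...) {
    n <- length(x)
    u <- cdf(sort(x), ...)
    1 / (12 * n) + sum((u - (2 * seq_len(n) - 1) / (2 * n))^2)
}

# Hypothetical helper: fit a log-normal two ways and keep the better fit
select_lnorm_fit_sketch <- function(x) {
    logx <- log(x)
    # Maximum likelihood estimates
    mle <- c(meanlog = mean(logx),
             sdlog = sqrt(mean((logx - mean(logx))^2)))
    # Moment matching estimates
    sdlog2 <- log(1 + var(x) / mean(x)^2)
    mme <- c(meanlog = log(mean(x)) - sdlog2 / 2, sdlog = sqrt(sdlog2))

    fits <- list(mle = mle, mme = mme)
    scores <- vapply(fits, function(est) {
        cvm_statistic(x, plnorm,
                      meanlog = est[["meanlog"]], sdlog = est[["sdlog"]])
    }, numeric(1))
    best <- names(which.min(scores))
    list(method = best, estimate = fits[[best]], cvm = scores)
}

set.seed(1)
lib_sizes <- rlnorm(500, meanlog = 9, sdlog = 0.5)  # illustrative data
fit <- select_lnorm_fit_sketch(lib_sizes)
```

The returned `cvm` vector keeps the score of every candidate, which makes it easy to see how close the competing fits were.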
Set input parameters of the simPICcount object.
setsimPICparameters(object, update = NULL, ...)

| Argument | Description |
|---|---|
| object | input simPICcount object. |
| update | new parameters. |
| ... | set new parameters for simPICcount object. |

simPICcount object with updated parameters.
object <- newsimPICcount()
object <- setsimPICparameters(object, nCells = 200, nPeaks = 500)
Combine data from several SingleCellExperiment objects and produce some basic plots comparing them.
simPICcompare(
  sces,
  point.size = 0.2,
  point.alpha = 0.1,
  fits = TRUE,
  colours = NULL
)

| Argument | Description |
|---|---|
| sces | named list of SingleCellExperiment objects to combine and compare. |
| point.size | size of points in scatter plots. |
| point.alpha | opacity of points in scatter plots. |
| fits | whether to include fits in scatter plots. |
| colours | vector of colours to use for each dataset. |

The returned list has three items:
- RowData: Combined row data from the provided SingleCellExperiments.
- ColData: Combined column data from the provided SingleCellExperiments.
- Plots: Comparison plots
  - Means: Boxplot of mean distribution.
  - Variances: Boxplot of variance distribution.
  - MeanVar: Scatter plot with fitted lines showing the mean-variance relationship.
  - LibrarySizes: Boxplot of the library size distribution.
  - ZerosPeak: Boxplot of the percentage of each peak that is zero.
  - ZerosCell: Boxplot of the percentage of each cell that is zero.
  - MeanZeros: Scatter plot with fitted lines showing the mean-zeros relationship.
The plots returned by this function are created using
ggplot and are only a sample of the kind of plots
you might like to consider. The data used to create these plots is also
returned and should be in the correct format to allow you to create
further plots using ggplot.
List containing the combined datasets and plots.
sim1 <- simPICsimulate(
  nPeaks = 1000, nCells = 500,
  pm.distr = "weibull", seed = 7856
)
sim2 <- simPICsimulate(
  nPeaks = 1000, nCells = 500,
  pm.distr = "gamma", seed = 4234
)
comparison <- simPICcompare(list(weibull = sim1, gamma = sim2))
names(comparison)
names(comparison$Plots)
S4 class that holds parameters for simPIC simulation.
a simPIC class object.
The parameters not shown in brackets can be estimated from real data
using simPICestimate. For details of the simPIC simulation,
see simPICsimulate. The default parameters are based on the PBMC10k
dataset and can be reproduced using the test data and script provided in
inst/script.
simPIC simulation parameters:
- nPeaks: The number of peaks to simulate.
- nCells: The number of cells to simulate.
- [seed]: Seed to use for generating random numbers.
- [default]: Logical. Whether to use the default parameters (TRUE) or learn them from data (FALSE).
- lib.size.meanlog: meanlog (location) parameter for the library size log-normal distribution.
- lib.size.sdlog: sdlog (scale) parameter for the library size log-normal distribution.
- mean.scale: scale parameter for the peak mean Weibull distribution.
- mean.shape: shape parameter for the peak mean Weibull distribution.
- sparsity: probability of openness, multiplied with the input of the Poisson distribution to generate the final simulated matrix.
Parameters are estimated using the estimateDisp function
in the edgeR package.
simPICEstBCV(counts, object, verbose)

| Argument | Description |
|---|---|
| counts | counts matrix to estimate parameters from. |
| object | simPICcount object to store estimated values in. |
| verbose | logical. To print progress messages or not. |

The estimateDisp function is used to estimate the common
dispersion and prior degrees of freedom. See
estimateDisp for details. When estimating parameters on
simulated data we found a broadly linear relationship between the true
underlying common dispersion and the edgeR estimate, therefore we
apply a small correction, disp = -0.3 + 0.15 * edgeR.disp.
simPICcount object with estimated values.
Estimate simPIC simulation parameters for library size, peak means, and sparsity from a real peak-by-cell input matrix.
simPICestimate(
  counts,
  object = newsimPICcount(),
  pm.distr = c("gamma", "weibull", "pareto", "lngamma"),
  method = c("single", "groups"),
  verbose = TRUE
)

## S3 method for class 'SingleCellExperiment'
simPICestimate(
  counts,
  object = newsimPICcount(),
  pm.distr = "weibull",
  method = "single",
  verbose = TRUE
)

## S3 method for class 'dgCMatrix'
simPICestimate(
  counts,
  object = newsimPICcount(),
  pm.distr = "weibull",
  method = "single",
  verbose = TRUE
)

| Argument | Description |
|---|---|
| counts | either a sparse peak-by-cell count matrix, or a SingleCellExperiment object containing count data to estimate parameters from. |
| object | simPICcount object to store estimated parameters and counts. |
| pm.distr | statistical distribution for estimating peak mean parameters. Available distributions: gamma, weibull, lngamma, pareto. Default is weibull. |
| method | method to use for simulation: "single" for simulating one cell type or "groups" for simulating distinct cell types. |
| verbose | logical variable. Prints the simulation progress if TRUE. |

simPICcount object containing all estimated parameters.
counts <- readRDS(system.file("extdata", "test.rds", package = "simPIC"))
est <- newsimPICcount()
est <- simPICestimate(counts, pm.distr = "weibull")
Estimate the library size parameters for simPIC simulation.
simPICestimateLibSize(counts, object, verbose)

| Argument | Description |
|---|---|
| counts | count matrix. |
| object | simPICcount object to store estimated values. |
| verbose | logical. To print messages or not. |

Parameters for the lognormal distribution are estimated by fitting the
library sizes using fitdist. All the fitting
methods are tried and the fit with the best Cramer-von Mises statistic is
selected.
simPICcount object with estimated library size parameters.
Estimate peak mean parameters for simPIC simulation.
simPICestimatePeakMean(norm.counts, object, pm.distr, verbose)

| Argument | Description |
|---|---|
| norm.counts | library size normalised counts matrix. |
| object | simPICcount object to store estimated values. |
| pm.distr | distribution parameter for peak means. |
| verbose | logical. To print progress messages or not. |

Parameters for gamma distribution are estimated by fitting the mean
normalised counts using fitdist.
All the fitting methods are tried and the fit with the best Cramer-von
Mises statistic is selected.
simPICcount object containing all estimated parameters
This function estimates the sparsity of cells based on a normalized counts matrix and updates the parameters of a simPIC object accordingly.
simPICestimateSparsity(norm.counts, object, verbose)

| Argument | Description |
|---|---|
| norm.counts | A normalized count matrix to estimate parameters from. |
| object | simPICcount object to store estimated parameters. |
| verbose | logical. To print messages or not. |

simPICcount object with updated sparsity parameter.
Get the value of a single variable from input simPICcount object.
simPICget(object, name)

| Argument | Description |
|---|---|
| object | input simPICcount object. |
| name | name of the parameter. |

Value of the input parameter.
object <- newsimPICcount()
nPeaks <- simPICget(object, "nPeaks")
Get multiple parameter values from a simPIC object.
simPICgetparameters(object, names)

| Argument | Description |
|---|---|
| object | input object to get values from. |
| names | vector of names of the parameters to get. |

List with the values of the selected parameters.
object <- newsimPICcount()
simPICgetparameters(object, c("nPeaks", "nCells", "peak.mean.shape"))
Simulate a peak by cell matrix given the mean accessibility for each peak in each cell. Cells start with the mean accessibility for the group they belong to (when simulating groups). The selected means are adjusted for each cell's expected library size.
simPICsimSingleCellMeans(object, sim)

simPICsimulateGroupCellMeans(object, sim)

| Argument | Description |
|---|---|
| object | simPIC object with simulation parameters. |
| sim | SingleCellExperiment to add cell means to. |

SingleCellExperiment with added cell means.
Simulate a peak-by-cell count matrix from a sparse single-cell ATAC-seq peak-by-cell input using simPIC methods.
simPICsimulate(
  object = newsimPICcount(),
  pm.distr = "weibull",
  method = c("single", "groups"),
  verbose = TRUE,
  ...
)

simPICsimulatesingle(object = newsimPICcount(), verbose = TRUE, ...)

simPICsimulatemulti(
  object = newsimPICcount(),
  pm.distr = "weibull",
  method = c("groups"),
  verbose = TRUE,
  ...
)

| Argument | Description |
|---|---|
| object | simPICcount object with simulation parameters. See |
| pm.distr | distribution parameter for peak means. Available distributions: gamma, weibull, lngamma, pareto. Default is weibull. |
| method | method to use for simulation: "single" for simulating one cell type or "groups" for simulating distinct cell types. |
| verbose | logical variable. Prints the simulation progress if TRUE. |
| ... | Any additional parameter settings to override what is provided in |

simPIC provides the option to manually adjust each of the
simPICcount object parameters by calling
setsimPICparameters.
The simulation involves the following steps:
1. Set up simulation parameters
2. Set up SingleCellExperiment object
3. Simulate library sizes
4. Simulate sparsity
5. Simulate peak means
6. Create final synthetic counts
The final output is a
SingleCellExperiment object that
contains the simulated count matrix. The parameters are stored in the
colData (for cell specific information),
rowData (for peak specific information) or
assays (for peak by cell matrix) slots. This additional
information includes:
SingleCellExperiment object containing the simulated counts.
# default simulation
sim <- simPICsimulate(pm.distr = "weibull")
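The simulation steps listed for simPICsimulate can be mimicked in a few lines of base R. The sketch below is a simplified illustration under stated assumptions, not the package's code: every parameter value is made up, sparsity is treated as a per-peak probability of openness that scales the Poisson mean, and the result is a plain matrix rather than a SingleCellExperiment.

```r
set.seed(42)
n_peaks <- 200
n_cells <- 50

# Hypothetical parameter values, chosen only for illustration
lib_meanlog <- 8
lib_sdlog <- 0.5
mean_shape <- 0.7
mean_scale <- 1e-4
openness <- runif(n_peaks, 0.05, 0.6)  # assumed per-peak probability of openness

# Simulate library sizes: one log-normal draw per cell
lib_sizes <- rlnorm(n_cells, meanlog = lib_meanlog, sdlog = lib_sdlog)

# Simulate peak means: one Weibull draw per peak, rescaled to proportions
peak_means <- rweibull(n_peaks, shape = mean_shape, scale = mean_scale)
peak_props <- peak_means / sum(peak_means)

# Expected count of each peak in each cell (peaks in rows, cells in columns)
cell_means <- outer(peak_props, lib_sizes)

# Final counts: Poisson draws whose mean is scaled by the openness probability.
# The length-n_peaks vector recycles down each column, so row i uses openness[i].
lambda <- cell_means * openness
counts <- matrix(rpois(n_peaks * n_cells, lambda = lambda),
                 nrow = n_peaks, ncol = n_cells)
```

Because the peak proportions sum to one, each column of `cell_means` adds up to that cell's library size before the openness scaling is applied.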
Simulate means for each peak in each cell that are adjusted to follow a mean-variance trend using a Biological Coefficient of Variation (BCV) taken from an inverse gamma distribution.
simPICsimulateBCVMeans(object, sim)

| Argument | Description |
|---|---|
| object | simPICcount object with simulation parameters. |
| sim | SingleCellExperiment to add BCV means to. |

SingleCellExperiment with simulated BCV means.
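One common way to impose a mean-variance trend of this kind is to draw a BCV per peak from a scaled inverse chi-squared distribution (a special case of the inverse gamma) and then draw gamma-distributed means whose coefficient of variation equals that BCV. The sketch below follows that general recipe; the exact formula simPIC uses is not stated here, so the trend term `1 / sqrt(mean)` and all parameter values are assumptions.

```r
set.seed(7)
n_peaks <- 100
n_cells <- 20

# Hypothetical inputs: base cell means and BCV parameters
base_means <- matrix(rgamma(n_peaks * n_cells, shape = 2, rate = 1),
                     nrow = n_peaks)
common_disp <- 0.1  # assumed common dispersion
bcv_df <- 60        # assumed prior degrees of freedom

# Per-peak BCV: larger for low means, scaled by an inverse chi-squared draw
peak_mean <- rowMeans(base_means)
bcv <- (common_disp + 1 / sqrt(peak_mean)) *
    sqrt(bcv_df / rchisq(n_peaks, df = bcv_df))

# Gamma draws with expectation base_means and coefficient of variation bcv.
# The per-peak shape recycles down each column, matching the row layout.
bcv_means <- matrix(
    rgamma(n_peaks * n_cells, shape = 1 / bcv^2, scale = base_means * bcv^2),
    nrow = n_peaks
)
```

With this parameterisation the expected value of each entry stays at its base mean (shape times scale), while the spread around it grows with the BCV.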
Generate library sizes for cells in simPIC simulation based on the estimated values of mus and sigmas.
simPICsimulateLibSize(object, sim, verbose)

| Argument | Description |
|---|---|
| object | simPICcount object with simulation parameters. |
| sim | SingleCellExperiment object containing simulation parameters. |
| verbose | logical. To print progress messages. |

SingleCellExperiment object with simulated library sizes.
Simulate differential accessibility. Differential accessibility factors for each
group are produced using getLNormFactors and these are added
along with updated means for each group. For paths care is taken to make sure
they are simulated in the correct order.
simPICsimulatemultiDA(object, sim)

| Argument | Description |
|---|---|
| object | simPICcount object with simulation parameters. |
| sim | SingleCellExperiment to add differential accessibility to. |

SingleCellExperiment with simulated differential accessibility.
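The idea of multiplicative log-normal factors can be sketched in base R as below. This is a simplified illustration and not the getLNormFactors function itself: the helper name `da_factors_sketch`, its arguments, and the parameter values are all hypothetical.

```r
# Hypothetical helper: a factor of 1 leaves a peak unchanged; selected peaks
# get a log-normal factor that is inverted when the peak is less accessible.
da_factors_sketch <- function(n, da.prob, down.prob, loc, scale) {
    selected <- rbinom(n, 1, da.prob) == 1
    n_sel <- sum(selected)
    direction <- ifelse(rbinom(n_sel, 1, down.prob) == 1, -1, 1)
    magnitude <- rlnorm(n_sel, meanlog = loc, sdlog = scale)
    factors <- rep(1, n)
    factors[selected] <- magnitude^direction
    factors
}

set.seed(3)
base_means <- rweibull(500, shape = 0.7, scale = 1)  # illustrative peak means
factors <- da_factors_sketch(500, da.prob = 0.1, down.prob = 0.5,
                             loc = 0.1, scale = 0.4)

# Updated means for one group: base mean times the group's factor
group_means <- base_means * factors
```

Repeating the draw once per group gives each group its own set of differentially accessible peaks while the remaining peaks keep their base means.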
Generate peak means for cells in simPIC simulation based on the estimated values of shape and rate parameters.
simPICsimulatePeakMean(object, sim, pm.distr, verbose)

| Argument | Description |
|---|---|
| object | simPICcount object with simulation parameters. |
| sim | SingleCellExperiment object containing simulation parameters. |
| pm.distr | distribution parameter for peak means. Available distributions: gamma, weibull, lngamma, pareto. Default is weibull. |
| verbose | logical. Whether to print progress messages. |

SingleCellExperiment object with simulated peak means.
Counts are simulated from a Poisson distribution where each peak has a mean, expected library size and proportion of accessible chromatin.
simPICsimulateTrueCounts(object, sim)

| Argument | Description |
|---|---|
| object | simPICcount object with simulation parameters. |
| sim | SingleCellExperiment object containing simulation parameters. |

SingleCellExperiment object with simulated true counts.
Counts are simulated from a Poisson distribution where each peak has a mean, expected library size and proportion of accessible chromatin.
simPICsimulateTrueCountsGroups(object, sim)

| Argument | Description |
|---|---|
| object | simPICcount object with simulation parameters. |
| sim | SingleCellExperiment object containing simulation parameters. |

SingleCellExperiment object with simulated true counts.