NEWS
Rarr 2.1
Breaking changes
- The name and configuration options for the fixed-length-ascii (|S in Zarr v2) and fixed-length-ucs4 (U in Zarr v2) data types have been updated to null_terminated_bytes and fixed_length_utf32 respectively to match their newly specified format in Zarr v3.
Rarr 1.99
Breaking changes
- The DelayedArray backend (writeZarrArray() and ZarrArray() functions) has
been migrated to a separate, dedicated package. This reduces the number of
dependencies from 37 to 24. It also greatly improves performance in the
standard case (when the DelayedArray backend is not used).
- write_zarr_array() now writes Zarr v3 by default. Writing Zarr v2 is still
possible by explicitly setting the argument zarr_version = 2.
New features
- Zarr v3 arrays with data types and codecs that already existed in v2
can now be read via read_zarr_array() and written via write_zarr_array().
- Zarr v3 consolidated metadata is now returned by zarr_overview(), in the
same way as v2 consolidated metadata.
- More data types are available when writing Zarr arrays:
- boolean / logical
- int8
- int16
- int64 (up to values that can be represented as R integers)
- uint8
- uint16
- uint32 (up to values that can be represented as R integers)
- uint64 (up to values that can be represented as R integers)
- float32 / single
- Scalar arrays (i.e., arrays with zero dimensions) can now be read.
Thanks to Artür Manukyan for the bug report.
- Zarr attributes can now be read by passing an S3 URL directly as
the first argument of read_zarr_attributes(). This makes
read_zarr_attributes() consistent with read_zarr_array() and
zarr_overview().
- "Simple" structured data types (i.e., only one level of nesting and
no arrays) can now be read from Zarr v2 arrays.
- simplifyVector = FALSE is now passed to fromJSON() in read_zarr_attributes(),
so attributes of local and S3 Zarr stores are read identically.
- The optional dimension_names field is supported in both v2 (where it is not
strictly part of the spec) and v3. It is mapped to names(dimnames(.)) in R.
- NA_real_ is now an allowed fill value in write_zarr_array() when
writing numeric arrays, following a request from Hervé Pagès.
- Fill values stored as their byte representation are now understood
when reading Zarr arrays.
- write_zarr_array() now supports writing NA_character_, which means
it is possible to preserve NAs when round-tripping an R character
array, based on a request from Hervé Pagès.
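The mapping between dimension_names and names(dimnames(.)) mentioned above can be illustrated in base R alone. This is only a minimal sketch: the dimension names "sample" and "gene" are made up for the example, and no Rarr function is called.

```r
# A 2 x 3 array whose dimensions themselves carry names.
# "sample" and "gene" are hypothetical names chosen for this example.
x <- array(
  1:6,
  dim = c(2, 3),
  dimnames = list(sample = c("s1", "s2"), gene = c("g1", "g2", "g3"))
)

# These are the names that correspond to the dimension_names field.
dim_names <- names(dimnames(x))
print(dim_names)
```

An array created without named dimnames returns NULL from names(dimnames(x)).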
Minor improvements
- There is now a dedicated vignette describing the supported Zarr features
in Rarr, available at
https://huber-group-embl.github.io/Rarr/articles/features.html.
This makes it more easily discoverable on the Bioconductor landing page.
- Rarr initializes empty/missing chunks only once per read operation, which
significantly improves performance when reading arrays with many missing chunks.
- Reading fixed-length string and unicode arrays is now ~20% faster.
- The shape and chunks fields in v2 metadata are now always encoded as
JSON arrays, even when they contain a single element. This makes Rarr more
compatible with other Zarr implementations. Thanks to Artür Manukyan for the
bug report and pull request.
- Empty Zarr arrays (i.e., arrays with shape and chunks equal to zero)
can now be written.
- Compression for writing Zarr arrays now defaults to zstd rather than zlib.
zstd achieves similar or better compression levels while being much faster
at compressing (= writing Zarr arrays) and decompressing (= reading Zarr
arrays). This matches the default used by the Zarr Python implementation.
- write_zarr_array() now fails early with an explicit error message when
x is not an array.
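The difference between a JSON scalar and a single-element JSON array, which matters for the shape and chunks fields mentioned above, can be reproduced with the jsonlite package. This sketch only illustrates the encoding issue; it is an assumption that something similar is involved here, and it is not a description of Rarr's internal code.

```r
library(jsonlite)

# With auto_unbox = TRUE, a length-1 vector is written as a JSON scalar.
scalar_json <- as.character(
  toJSON(list(shape = 10L, chunks = 5L), auto_unbox = TRUE)
)

# Wrapping the values in I() marks them as arrays, so they keep their brackets.
array_json <- as.character(
  toJSON(list(shape = I(10L), chunks = I(5L)), auto_unbox = TRUE)
)

print(scalar_json)
print(array_json)
```

Implementations that expect shape and chunks to always be arrays can parse the second form but may reject the first.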
Bug fixes
- Rarr is now fully compatible with big endian platforms.
- ZSTD decompression now also works in cases where the buffer size cannot be
guessed a priori from the data type, such as when using variable length strings.
Thanks to Artür Manukyan for the bug report and test data.
- zarr_overview() no longer fails on consolidated metadata containing uncompressed
arrays. This bug was introduced in https://github.com/Huber-group-EMBL/Rarr/pull/45.
Thanks to Sharla Gelfand for reporting the issue and providing test data.
- The fill_value is now correctly interpreted when reading Zarr v2 string or
unicode arrays. This is visible, for example, when trying to read missing chunks
from such arrays. Thanks to Artür Manukyan for the bug report.
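For readers unfamiliar with why byte order matters for the big endian fix listed above, the following base R sketch reads the same four bytes under both byte orders. The byte values are made up for illustration and no Rarr function is involved.

```r
# Four bytes, 00 00 00 01, as they might appear in a chunk of 32-bit integers.
bytes <- as.raw(c(0x00, 0x00, 0x00, 0x01))

# Interpret the same bytes with each byte order.
as_big <- readBin(bytes, what = "integer", n = 1L, size = 4L, endian = "big")
as_little <- readBin(bytes, what = "integer", n = 1L, size = 4L, endian = "little")

# Byte order of the machine running this code: "little" or "big".
native <- .Platform$endian

print(c(big = as_big, little = as_little))
print(native)
```

The two interpretations differ, so code that assumes the native byte order gives wrong values when the stored byte order is different.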
Internal changes
- Some internal changes are preparing the transition to support Zarr v3:
- "C" and "F" fill orders are now handled via a codec mechanism, which also
supports a wider range of transpose operations.
- The endian configuration is now handled via a codec.
- A GitHub Actions workflow has been added to occasionally test this package on
a big endian platform.
- Bundled libraries have been updated:
- blosc 1.20.1 -> 1.21.6
- snappy 1.1.1 -> 1.2.2
- zstd 1.5.5 -> 1.5.7
- lz4 1.9.2 -> 1.10.0
- The resizable vector used for compression in the C code now relies on the
official exported R C API instead of internal R functions.
- The const qualifier is used where appropriate in the C code.
Rarr 1.9
New features
- New functions to work with Zarr attributes have been added:
- read_zarr_attributes() reads Zarr v2 and v3 attributes.
- write_zarr_attributes() only supports writing Zarr v2 attributes for now.
- This package now has a pkgdown website, available at
https://huber-group-embl.github.io/Rarr/.
- Zarr v3 arrays are now supported for reading metadata via zarr_overview().
Breaking changes
- zarr_overview(as_data_frame = TRUE) now returns information more in line with the
output of zarr_overview(as_data_frame = FALSE). In particular:
- A new endianness column has been added to indicate the byte order of the
array data.
- The nchunks column is now a list column specifying the number of chunks in
each dimension, rather than a single integer giving the total number of
chunks.
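The relationship between the per-dimension chunk counts and the former total can be sketched in base R. The array and chunk dimensions below are hypothetical, and the calculation assumes the usual Zarr convention that a partial chunk at the array edge counts as a chunk.

```r
# Hypothetical array and chunk dimensions.
array_dim <- c(10, 7)
chunk_dim <- c(4, 3)

# Number of chunks along each dimension; partial edge chunks are counted.
nchunks_per_dim <- ceiling(array_dim / chunk_dim)

# Total number of chunks, i.e. the single value reported previously.
nchunks_total <- prod(nchunks_per_dim)

print(nchunks_per_dim)
print(nchunks_total)
```

The per-dimension form keeps strictly more information, since the total can always be recovered with prod().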
Minor improvements
- An explicit error message is now given when attempting to read a Zarr v3
array. This version will be supported in a future release of Rarr.
Bug fixes
- .url_parse_other() now accounts for port numbers in host names and colons in
S3 buckets.
- writeZarrArray() now allows writing character arrays, and no longer errors
complaining about a null 'nchar' argument value. The default of 'nchar' is now
NULL.
- writeZarrArray() no longer silently and incorrectly fills the last
rows/columns when dim is not divisible by chunk_dim.
- The object name is no longer repeated (e.g., name.zarrname.zarr) when
writing a Zarr array to a file in the current working directory.
- Invalid URLs for examples with S3 storage in read_zarr_array() and
zarr_overview() have been updated.
- read_zarr_array() no longer errors on arrays with numeric values other than
float, int, uint and complex.
- zarr_overview() now returns an explicit error message when the .zarray file
is absent.
Internal changes
- Coding style throughout the package has been harmonized using the air tool.
Contributors using RStudio, Positron or VS Code should have their code styled
automatically on save.
- Continuous integration checks have been made stricter by setting the
biocCheck() error level to "error" rather than "never", and the R CMD check
error level to "warning" rather than "error".
- Static analysis via the lintr package is now performed on each push and PR.
It should mostly be invisible to users but might result in slightly increased
performance in some cases.
- The superseded httr dependency has been replaced with the lighter curl
package, thus reducing the total number of dependencies for the package from
42 to 40.
- The unused stringr dependency has been removed, reducing the total number of
dependencies for the package from 40 to 38.
- A minor PROTECT()/UNPROTECT() imbalance in the C code, exposed by rchk, has
been fixed. It is not likely to cause problems in real-world situations but
it could theoretically lead to crashes in some cases.
- Argument path in internal function read_array_metadata() has been renamed
to zarr_path for consistency with other internal functions.
- Some internal functions have been renamed with a leading dot, in line with
the officially recommended style for Bioconductor packages.
- This package now uses testthat instead of tinytest as a testing framework.
This comes with more utilities to handle snapshot tests and mocked tests.
- Function calls are now counted in tests to ensure we don't repeatedly perform
a task (in particular, an expensive I/O task) more often than necessary.
Rarr 1.7
- Added a path() method for the ZarrArray class that returns the location of
the zarr array root.
- Removed use of the non-API call SETLENGTH in the C code.
- Small changes to compilation of internal blosc libraries to cope with
C23 becoming the default C standard in R-4.5.0.
Rarr 1.5
- Fixed bug when creating an empty array with a floating-point datatype. The fill
value would be interpreted as an integer by read_metadata() and create
an array of the wrong type.
- Fixed bug in update_zarr_array() when NULL was provided to one or more
dimensions in the index argument. This was parsed incorrectly and the
underlying zarr was not modified.
- Fixed bug in reading 64-bit integer arrays compressed with ZLIB or LZ4.
The calculated decompression buffer size was too small and reading would
fail. (Thanks to Dan Auerbach for the report:
https://github.com/grimbough/Rarr/issues/10)
- Added support for the ZarrArray S4 class and the DelayedArray framework.
- Improvements to read and write performance.
Rarr 1.3
- Added support for using the zstd compression library for reading and writing.
Rarr 1.1
- Fixed bug when reading an array if the fill value in .zarray was null.
- Addressed bug in makevars where Rarr.so could be compiled before libblosc.a
was ready. Also backported to Rarr 1.0.2.
(Thanks to Michael Sumner for reporting this issue:
https://github.com/grimbough/Rarr/issues/5)
- Corrected issue where fixed length string datatypes would be written with
null terminators, resulting in strings that were one byte longer than the
dtype value written in the .zarray metadata. Also backported to Rarr 1.0.3.
- Added support for reading and writing the fixed length Unicode datatype, and
for reading the variable length UTF-8 datatype.
Rarr 0.99.9
- Response to initial package review (thanks @Kayla-Morrell)
- Provided manual page examples for use_* compression filter functions.
- Add details of how example data in inst/extdata/zarr_examples was created.
- General code tidying
Rarr 0.99.8
- Patch compression libraries to remove R CMD check warnings about C functions
that might crash R or write to something other than the R console. This
currently works on Linux only.
Rarr 0.99.7
- Allow reading and writing chunks with GZIP compression.
- Add compression level arguments to several compression tools.
Rarr 0.99.6
- Allow reading and writing chunks with no compression.
- Enable LZ4 compression for writing.
- Fix bug in blosc compression that could result in larger chunks than necessary.
- Improve speed of indexing when combining chunks into the final output array.
Rarr 0.99.5
- Fixed bug when specifying nested chunks, where the chunk couldn't be written
unless the directory already existed.
Rarr 0.99.4
- When writing chunks that overlap the array edge, even the undefined overhang
region should be written to disk.
Rarr 0.99.3
- Allow choice between column and row ordering when creating a Zarr array
Rarr 0.99.2
- Catch bug when chunk files contain values outside the array extent.
- Address manual page issues identified by BBS
Rarr 0.99.1
- Switch from aws.s3 to paws.storage for S3 data retrieval.
Rarr 0.99.0
- Initial Bioconductor submission.
Rarr 0.0.1
- Added a NEWS.md file to track changes to the package.