
anndata - Annotated Data

Report issues and see the code on GitHub.

AnnData provides a scalable way of keeping track of data and learned annotations. It was initially built for Scanpy (Genome Biology, 2018).


Upcoming changes:

0.7 January 22, 2020


Breaking changes introduced between 0.6.22.post1 and 0.7:

  • Elements of an AnnData object no longer have their dimensionality reduced when the main object is subset, so that all elements are subset consistently. See discussion in #145.

  • Internal modules such as anndata.core are private and their contents are not stable; see #174.

  • The old deprecated attributes .smp*, .add and .data have been removed.
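To illustrate the first change above: NumPy-style indexing with a scalar drops an axis, whereas since 0.7 subsetting an AnnData keeps every axis (for example, adata[0, :] is expected to have shape (1, n_vars)). The following stdlib-only sketch models that rule on a plain list of lists; subset_2d is a hypothetical helper written for this illustration, not anndata's implementation.

```python
# Toy model of the 0.7 subsetting rule: a scalar index is treated like a
# length-1 selection, so subsetting never drops an axis.
# `subset_2d` is a hypothetical helper, not part of anndata.
from typing import List, Sequence, Union

Index = Union[int, slice, Sequence[int]]


def _normalize(index: Index, length: int) -> List[int]:
    """Turn a scalar, slice, or sequence of ints into a list of positions."""
    if isinstance(index, int):
        return [index]  # keep the axis: scalar -> length-1 selection
    if isinstance(index, slice):
        return list(range(length))[index]
    return list(index)


def subset_2d(matrix: List[List[float]], rows: Index, cols: Index) -> List[List[float]]:
    """Subset a row-major 2D list, always returning a 2D result."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    row_pos = _normalize(rows, n_rows)
    col_pos = _normalize(cols, n_cols)
    return [[matrix[r][c] for c in col_pos] for r in row_pos]


X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
print(subset_2d(X, 0, slice(None)))  # [[0.0, 1.0, 2.0]], still two-dimensional
```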

Currently broken features

  • sc.pp.normalize_per_cell doesn’t work on dask arrays: it silently leaves the matrix unmodified.

View overhaul – PR #164
  • Indexing into a view no longer keeps a reference to the intermediate view; see #62.

  • Views are now lazy: elements of a view of an AnnData object are not indexed until they are accessed.

  • Indexing with scalars no longer reduces dimensionality of contained arrays, see #145.

  • All elements of AnnData should now follow the same rules about how they’re subset, see #145.

  • Can now index by observations and variables at the same time.
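The first two view changes can be pictured with a small model: a view records only which positions it selects, defers indexing until an element is read, and composes its positions against the original object when it is indexed again, so no intermediate view is kept alive. ToyData and ToyView below are hypothetical classes written for this illustration; anndata's actual view machinery is more involved.

```python
# Toy lazy view: positions are stored and only applied on access, and a view of
# a view composes its positions against the original parent instead of holding
# a reference to the intermediate view. `ToyData`/`ToyView` are hypothetical.
from typing import Dict, List


class ToyData:
    def __init__(self, columns: Dict[str, List[float]]):
        self.columns = columns  # each column has one value per observation

    def __getitem__(self, positions: List[int]) -> "ToyView":
        return ToyView(self, list(positions))


class ToyView:
    def __init__(self, parent: ToyData, positions: List[int]):
        self.parent = parent        # always the original object, never a view
        self.positions = positions  # positions into parent, not yet applied

    def __getitem__(self, positions: List[int]) -> "ToyView":
        # Compose: position i in this view maps to self.positions[i] in parent.
        composed = [self.positions[i] for i in positions]
        return ToyView(self.parent, composed)

    def column(self, key: str) -> List[float]:
        # Indexing happens only here, when an element is actually accessed.
        values = self.parent.columns[key]
        return [values[i] for i in self.positions]


data = ToyData({"n_genes": [10.0, 20.0, 30.0, 40.0]})
inner = data[[1, 2, 3]]
outer = inner[[0, 2]]
print(outer.parent is data, outer.positions, outer.column("n_genes"))
# True [1, 3] [20.0, 40.0]
```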

IO overhaul – PR #167
  • Reading and writing has been overhauled for simplification and speed.

    • Time and memory usage can be halved compared to previous versions in typical use cases.

  • The Zarr backend now supports sparse arrays and is generally closer to feature parity with HDF5.

  • Backed mode should see significant speed and memory improvements for access along compressed dimensions and IO. PR #241.

  • Categoricals can now be ordered (PR #230) and written to disk with a large number of categories (PR #217).

Mapping attributes overhaul (obsm, varm, layers, …)
  • New attributes obsp and varp have been added for two-dimensional arrays whose axes both correspond to the same axis of the AnnData object (n_obs × n_obs for obsp, n_vars × n_vars for varp). PR #207.

    • These are intended to store values like cell-by-cell graphs, which are currently stored in uns.

  • Sparse arrays are now allowed as values in all mapping attributes.

  • DataFrames are now allowed as values in obsm and varm.

  • All mapping attributes now share an implementation and will have the same behaviour. PR #164.
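The shared mapping implementation enforces alignment: an obsm or varm value must have one row per observation or variable, and an obsp or varp value must match that axis in both dimensions. A hypothetical stdlib sketch of that check for the observation axis follows; the keys X_pca and connectivities are only illustrative names, and real values would be arrays, sparse matrices, or (for obsm/varm) DataFrames rather than lists.

```python
# Toy aligned mapping: every value must line up with the object's axes.
# obsm-style values need shape (n_obs, k); obsp-style values need (n_obs, n_obs).
# `AlignedMapping` is a hypothetical class, not anndata's implementation.
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def shape_of(value: Matrix) -> Tuple[int, int]:
    return len(value), (len(value[0]) if value else 0)


class AlignedMapping:
    def __init__(self, n_obs: int, pairwise: bool):
        self.n_obs = n_obs
        self.pairwise = pairwise  # True for obsp-like, False for obsm-like
        self._data: Dict[str, Matrix] = {}

    def __setitem__(self, key: str, value: Matrix) -> None:
        n_rows, n_cols = shape_of(value)
        if n_rows != self.n_obs:
            raise ValueError(f"{key!r}: expected {self.n_obs} rows, got {n_rows}")
        if self.pairwise and n_cols != self.n_obs:
            raise ValueError(f"{key!r}: expected {self.n_obs} columns, got {n_cols}")
        self._data[key] = value

    def __getitem__(self, key: str) -> Matrix:
        return self._data[key]


obsm = AlignedMapping(n_obs=3, pairwise=False)
obsm["X_pca"] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # (3, 2) is accepted

obsp = AlignedMapping(n_obs=3, pairwise=True)
try:
    obsp["connectivities"] = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]  # (3, 2)
except ValueError as err:
    print(err)  # 'connectivities': expected 3 columns, got 2
```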

Miscellaneous improvements
  • Mapping attributes now have IPython tab completion (e.g. pressing Tab after adata.obsm[" suggests keys). PR #183.

  • AnnData attributes are now deletable (e.g. del adata.raw). PR #242.

  • Many, many bug fixes
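Both conveniences rest on standard Python hooks: IPython asks an object's _ipython_key_completions_() method for candidate keys when completing obj[", and a property with a deleter makes del obj.attr work. The classes below are hypothetical, and that PR #183 and PR #242 use exactly these hooks is an assumption.

```python
# Toy sketch of two Python hooks (hypothetical classes, not anndata's code):
# - IPython calls `_ipython_key_completions_()` to suggest keys after `obj["`.
# - A property with a deleter makes `del obj.attr` work.
from typing import Dict, List, Optional


class ToyMapping:
    def __init__(self, data: Dict[str, list]):
        self._data = dict(data)

    def __getitem__(self, key: str) -> list:
        return self._data[key]

    def _ipython_key_completions_(self) -> List[str]:
        return list(self._data.keys())


class ToyAnnData:
    def __init__(self) -> None:
        self.obsm = ToyMapping({"X_pca": [], "X_umap": []})
        self._raw: Optional[str] = "raw snapshot"

    @property
    def raw(self) -> Optional[str]:
        return self._raw

    @raw.deleter
    def raw(self) -> None:
        self._raw = None  # deleting resets the attribute instead of raising


adata = ToyAnnData()
print(adata.obsm._ipython_key_completions_())  # ['X_pca', 'X_umap']
del adata.raw
print(adata.raw)  # None
```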

Versions 0.6.*

  • better support for aligned mappings (obsm, varm, layers) 0.6.22 #155 thanks to I Virshup

  • convenience accessors obs_vector(), var_vector() for 1d arrays. 0.6.21 #144 thanks to I Virshup

  • compatibility with Scipy >=1.3 by removing IndexMixin dependency. 0.6.20 #151 thanks to P Angerer

  • bug fix for second-indexing into views. 0.6.19 0ab553f thanks to P Angerer

  • bug fix for reading Excel files. 0.6.19 90bea2c thanks to A Wolf

  • changed default compression to None in write_h5ad() to speed up reading and writing; disk space use is usually less critical. 0.6.16 21d8033 thanks to A Wolf

  • maintain dtype upon copy. 0.6.13 534bea4 thanks to A Wolf

  • layers, inspired by .loom files, allow lossless reading of their information via read_loom(). #46 & #48 thanks to S Rybakov

  • support for reading zarr files: read_zarr() 0.6.7 #38 thanks to T White

  • initialization from pandas DataFrames 0.6. 648bcc8 thanks to A Wolf

  • iteration over chunks chunked_X() and chunk_X() 0.6.1 #20 thanks to S Rybakov
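Chunked iteration lets row-wise computations run without materializing all of X at once. To our understanding, AnnData.chunked_X(chunk_size) yields tuples of (chunk, start, end); the hypothetical chunked_rows function below mimics that contract on a list of lists so the loop structure is visible.

```python
# Toy version of chunked iteration over the rows of a data matrix.
# Yields (chunk, start, end); `chunked_rows` is hypothetical, not anndata code.
from typing import Iterator, List, Tuple

Matrix = List[List[float]]


def chunked_rows(X: Matrix, chunk_size: int = 2) -> Iterator[Tuple[Matrix, int, int]]:
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    n_obs = len(X)
    for start in range(0, n_obs, chunk_size):
        end = min(start + chunk_size, n_obs)
        yield X[start:end], start, end  # the last chunk may be shorter


X = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [4.0, 0.0], [5.0, 1.0]]
row_sums = []
for chunk, start, end in chunked_rows(X, chunk_size=2):
    row_sums.extend(sum(row) for row in chunk)
    print(start, end)
print(row_sums)  # [1.0, 2.0, 6.0, 4.0, 6.0]
```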

Version 0.6 May 1, 2018

  • compatibility with Seurat converter

  • tremendous speedup for concatenate()

  • bug fix for deep copy of unstructured annotation after slicing

  • bug fix for reading HDF5 stored single-category annotations

  • “outer join” concatenation: adds zeros for concatenation of sparse data and nans for dense data

  • better memory efficiency in loom exports
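The outer join rule can be sketched without any dependencies: take the union of variable names and fill the entries a dataset does not have, with 0.0 standing in for the sparse case and NaN for the dense case. outer_concat is a hypothetical helper; it ignores observation annotations and is not concatenate() itself.

```python
# Toy outer-join concatenation of two datasets with partly different variables.
# Missing entries get a fill value: 0.0 mirrors the sparse case, NaN the dense
# case. `outer_concat` is a hypothetical helper.
import math
from typing import Dict, List, Tuple

Dataset = Tuple[List[str], List[List[float]]]  # (variable names, rows)


def outer_concat(a: Dataset, b: Dataset, fill: float) -> Dataset:
    # Union of variable names, keeping first-seen order.
    var_names = list(a[0]) + [v for v in b[0] if v not in a[0]]
    rows: List[List[float]] = []
    for names, data in (a, b):
        position: Dict[str, int] = {v: i for i, v in enumerate(names)}
        for row in data:
            rows.append([row[position[v]] if v in position else fill for v in var_names])
    return var_names, rows


a = (["g1", "g2"], [[1.0, 2.0]])
b = (["g2", "g3"], [[3.0, 4.0]])
print(outer_concat(a, b, fill=0.0))
# (['g1', 'g2', 'g3'], [[1.0, 2.0, 0.0], [0.0, 3.0, 4.0]])
names, dense_rows = outer_concat(a, b, fill=math.nan)
print(math.isnan(dense_rows[1][0]))  # True
```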

Version 0.5 February 9, 2018

Version 0.4 December 23, 2017

  • read/write .loom files

  • scalability beyond dataset sizes that fit into memory: see this blog post

  • AnnData has a raw attribute, which simplifies storing the data matrix when you consider it raw: see the clustering tutorial
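The idea behind raw is a snapshot taken before variables are filtered, so the full matrix remains available afterwards; in anndata this is commonly written as adata.raw = adata. The toy class below, whose freeze_raw and keep_vars methods are hypothetical, shows the same workflow with plain lists.

```python
# Toy model of the `raw` idea: snapshot the full matrix before filtering
# variables, so the unfiltered values stay available. Hypothetical class.
import copy
from typing import List, Optional, Tuple

Matrix = List[List[float]]


class ToyAnnData:
    def __init__(self, X: Matrix, var_names: List[str]):
        self.X = X
        self.var_names = var_names
        self.raw: Optional[Tuple[Matrix, List[str]]] = None

    def freeze_raw(self) -> None:
        # Deep copy so later in-place edits of X do not leak into raw.
        self.raw = (copy.deepcopy(self.X), list(self.var_names))

    def keep_vars(self, keep: List[str]) -> None:
        cols = [i for i, name in enumerate(self.var_names) if name in keep]
        self.X = [[row[i] for i in cols] for row in self.X]
        self.var_names = [self.var_names[i] for i in cols]


adata = ToyAnnData([[1.0, 0.0, 5.0], [2.0, 3.0, 0.0]], ["g1", "g2", "g3"])
adata.freeze_raw()
adata.keep_vars(["g1", "g3"])
print(adata.var_names, adata.raw[1])  # ['g1', 'g3'] ['g1', 'g2', 'g3']
```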