From scientific reproducibility to software environment management
Kolen Cheung
Research Software & Analytics Group, University of Exeter
November 26th, 2025
According to Hernández and Colom (2023),
| Goodman | Claerbout | ACM |
|---|---|---|
| Repeatability | | |
| Methods reproducibility | Reproducibility | Replicability |
| Results reproducibility | Replicability | Reproducibility |
| Inferential reproducibility | | |
“Package manager” can refer to multiple things:
conda, pip, apt)
By scope:
By distribution method:
By platform:
By linking strategy:
From pixell/setup.py at b41248618ce92277a19a4efccadfc3b7403d67f5 · simonsobs/pixell
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""The setup script."""
from __future__ import print_function
import setuptools
from setuptools import find_packages
from distutils.errors import DistutilsError
from numpy.distutils.core import setup, Extension, build_ext, build_src
import versioneer
import os, sys
import subprocess as sp
import numpy as np
build_ext = build_ext.build_ext
build_src = build_src.build_src
compile_opts = {
    #'extra_compile_args': ['-std=c99','-fopenmp', '-Wno-strict-aliasing', '-g', '-O0', '-fPIC', '-fsanitize=address', '-fsanitize=undefined'],
    'extra_compile_args': ['-std=c99','-fopenmp', '-Wno-strict-aliasing', '-g', '-Ofast', '-fPIC'],
    'extra_f90_compile_args': ['-fopenmp', '-Wno-conversion', '-Wno-tabs', '-fPIC'],
    'f2py_options': ['skip:', 'map_border', 'calc_weights', ':'],
    'extra_link_args': ['-fopenmp', '-g', '-fPIC', '-fno-lto']
}
# Set compiler options
# Windows
if sys.platform == 'win32':
    raise DistutilsError('Windows is not supported.')
elif sys.platform == 'darwin' or sys.platform == 'linux':
    environment = os.environ
    if not 'CC' in environment:
        environment["CC"] = "gcc"
    if not "CXX" in environment:
        environment["CXX"] = "g++"
    if not "FC" in environment:
        environment["FC"] = "gfortran"
    # Now, try out our environment!
    c_return = sp.call([environment["CC"], *compile_opts["extra_compile_args"], "scripts/omp_hello.c", "-o", "/tmp/pixell-cc-test"], env=environment)
    if c_return != 0:
        raise EnvironmentError(
            "Your C compiler does not support the following flags, required by pixell: "
            f"{' '.join(compile_opts['extra_compile_args'])}"
            ". Consider setting the value of environment variable CC to a known good gcc install. "
            "The built-in Apple clang does not support OpenMP. Use Homebrew to install either gcc or llvm. "
            f"Current value of $CC is {environment['CC']}.",
        )
    else:
        print(f"C compiler found ({environment['CC']}) and supports OpenMP.")
    cxx_return = sp.call([environment["CXX"], *compile_opts["extra_compile_args"], "scripts/omp_hello.c", "-o", "/tmp/pixell-cxx-test"], env=environment)
    if cxx_return != 0:
        raise EnvironmentError(
            "Your CXX compiler does not support the following flags, required by pixell: "
            f"{' '.join(compile_opts['extra_compile_args'])}"
            ". Consider setting the value of environment variable CXX to a known good gcc install. "
            "The built-in Apple clang does not support OpenMP. Use Homebrew to install either gcc or llvm. "
            f"Current value of $CXX is {environment['CXX']}.",
        )
    else:
        print(f"CXX compiler found ({environment['CXX']}) and supports OpenMP.")
    fc_return = sp.call([environment["FC"], *compile_opts["extra_f90_compile_args"], "scripts/omp_hello.f90", "-o", "/tmp/pixell-fc-test"], env=environment)
    if fc_return != 0:
        raise EnvironmentError(
            "Your Fortran compiler does not support the following flags, required by pixell: "
            f"{' '.join(compile_opts['extra_f90_compile_args'])}"
            ". Consider setting the value of environment variable FC to a known good gfortran install."
            f"Current value of $FC is {environment['FC']}.",
        )
    else:
        print(f"Fortran compiler found ({environment['FC']}) and supports OpenMP.")
    # Why do we remove -fPIC here?
    compile_opts['extra_link_args'] = ['-fopenmp']
else:
    raise EnvironmentError("Unknown platform. Please file an issue on GitHub.")
def pip_install(package):
    import pip
    if hasattr(pip, 'main'):
        pip.main(['install', package])
    else:
        pip._internal.main(['install', package])
with open('README.rst') as readme_file:
    readme = readme_file.read()
with open('HISTORY.rst') as history_file:
    history = history_file.read()
requirements = ['numpy>=1.20.0',
                'astropy>=2.0',
                'setuptools>=39',
                'h5py>=2.7',
                'scipy>=1.0',
                'python_dateutil>=2.7',
                'cython<3.0.4',
                'healpy>=1.13',
                'matplotlib>=2.0',
                'pyyaml>=5.0',
                'Pillow>=5.3.0',
                'pytest-cov>=2.6',
                'coveralls>=1.5',
                'pytest>=4.6',
                'ducc0>=0.31.0']
test_requirements = ['pip>=9.0',
                     'bumpversion>=0.5',
                     'wheel>=0.30',
                     'watchdog>=0.8',
                     'flake8>=3.5',
                     'coverage>=4.5',
                     'Sphinx>=1.7',
                     'twine>=1.10',
                     'numpy>=1.20',
                     'astropy>=2.0',
                     'setuptools>=39.2',
                     'h5py>=2.7,<=2.10',
                     'scipy>=1.0',
                     'python_dateutil>=2.7',
                     'cython<3.0.4',
                     'matplotlib>=2.0',
                     'pyyaml>=5.0',
                     'pytest-cov>=2.6',
                     'coveralls>=1.5',
                     'pytest>=4.6']
# Why are we doing this instead of allowing the environment to do this? We should just use -O3 and -fPIC.
fcflags = os.getenv('FCFLAGS')
if fcflags is None or fcflags.strip() == '':
    fcflags = ['-O3','-fPIC']
    #fcflags = ['-O0','-fPIC', '-fsanitize=address', '-fsanitize=undefined']
else:
    print('User supplied fortran flags: ', fcflags)
    print('These will supersede other optimization flags.')
    fcflags = fcflags.split()
compile_opts['extra_f90_compile_args'].extend(fcflags)
compile_opts['extra_f77_compile_args'] = compile_opts['extra_f90_compile_args']
def presrc():
    # Create f90 files for f2py.
    if sp.call('make -C fortran', shell=True) != 0:
        raise DistutilsError('Failure in the fortran source-prep step.')
def prebuild():
    # Handle cythonization
    no_cython = sp.call('cython --version',shell=True)
    if no_cython:
        try:
            print("Cython not found. Attempting a conda install first.")
            import conda.cli
            conda.cli.main('conda', 'install', '-y', 'cython')
        except:
            try:
                print("conda install of cython failed. Attempting a pip install.")
                pip_install("cython")
            except:
                raise DistutilsError('Cython not found and all attempts at installing it failed. User intervention required.')
    if sp.call('make -C cython', shell=True) != 0:
        raise DistutilsError('Failure in the cython pre-build step.')
class CustomBuild(build_ext):
    def run(self):
        print("Running build...")
        prebuild()
        # Then let setuptools do its thing.
        return build_ext.run(self)
class CustomSrc(build_src):
    def run(self):
        print("Running src...")
        presrc()
        # Then let setuptools do its thing.
        return build_src.run(self)
class CustomEggInfo(setuptools.command.egg_info.egg_info):
    def run(self):
        print("Running EggInfo...")
        presrc()
        prebuild()
        return setuptools.command.egg_info.egg_info.run(self)
# Cascade your overrides here.
cmdclass = {
    'build_ext': CustomBuild,
    'build_src': CustomSrc,
    'egg_info': CustomEggInfo,
}
cmdclass = versioneer.get_cmdclass(cmdclass)
setup(
    author="Simons Observatory Collaboration Analysis Library Task Force",
    author_email='mathewsyriac@gmail.com',
    classifiers=[
        'Development Status :: 2 - Pre-Alpha',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: BSD License',
        'Natural Language :: English',
        "Programming Language :: Python :: 2",
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
    ],
    description="pixell",
    package_dir={"pixell": "pixell"},
    entry_points={
    },
    ext_modules=[
        Extension('pixell.cmisc',
                  sources=['cython/cmisc.c','cython/cmisc_core.c'],
                  libraries=['m'],
                  include_dirs=[np.get_include()],
                  **compile_opts),
        Extension('pixell.distances',
                  sources=['cython/distances.c','cython/distances_core.c'],
                  libraries=['m'],
                  include_dirs=[np.get_include()],
                  **compile_opts),
        Extension('pixell.srcsim',
                  sources=['cython/srcsim.c','cython/srcsim_core.c'],
                  libraries=['m'],
                  include_dirs=[np.get_include()],
                  **compile_opts),
        Extension('pixell._interpol_32',
                  sources=['fortran/interpol_32.f90'],
                  **compile_opts),
        Extension('pixell._interpol_64',
                  sources=['fortran/interpol_64.f90'],
                  **compile_opts),
        Extension('pixell._colorize',
                  sources=['fortran/colorize.f90'],
                  **compile_opts),
        Extension('pixell._array_ops_32',
                  sources=['fortran/array_ops_32.f90'],
                  **compile_opts),
        Extension('pixell._array_ops_64',
                  sources=['fortran/array_ops_64.f90'],
                  **compile_opts),
    ],
    include_dirs = [],
    library_dirs = [],
    install_requires=requirements,
    extras_require = {'fftw':['pyFFTW>=0.10'],'mpi':['mpi4py>=2.0']},
    license="BSD license",
    long_description=readme + '\n\n' + history,
    package_data={'pixell': ['pixell/tests/data/*.fits','pixell/tests/data/*.dat','pixell/tests/data/*.pkl']},
    include_package_data=True,
    data_files=[('pixell', ['pixell/arial.ttf'])],
    keywords='pixell',
    name='pixell',
    packages=find_packages(),
    test_suite='pixell.tests',
    tests_require=test_requirements,
    url='https://github.com/simonsobs/pixell',
    version=versioneer.get_version(),
    zip_safe=False,
    cmdclass=cmdclass,
    scripts=['scripts/test-pixell']
)
print('\n[setup.py request was successful.]')
it really sounds like your needs are so unusual compared to the larger Python community that you’re just better off building your own
From 2012 PyData Workshop Panel Discussion with Guido van Rossum. See Conda: Myths and Misconceptions | Pythonic Perambulations.
| Section | Who needs it? | Architecture? | Example |
|---|---|---|---|
| Build | The compiler machine | Build Platform (e.g., x86) | cmake, gcc, make |
| Host | The package being built (linking phase) | Target Platform (e.g., ARM64) | openssl, python, libpng |
| Run | The final user | Target Platform (e.g., ARM64) | python, requests, numpy |
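The table can be read as a lookup from requirements section to platform. A minimal sketch, assuming a cross-compilation from `linux-64` to `linux-aarch64`; the class and method names are hypothetical and not part of any conda tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossBuild:
    """Hypothetical helper: which platform does each requirements section use?"""

    build_platform: str   # where the compiler and build tools run
    target_platform: str  # where the finished package will run

    def platform_for(self, section: str) -> str:
        # Build tools execute on the build machine itself.
        if section == "build":
            return self.build_platform
        # Host (link-time) and run dependencies must match the target.
        if section in ("host", "run"):
            return self.target_platform
        raise ValueError(f"unknown requirements section: {section!r}")


cross = CrossBuild(build_platform="linux-64", target_platform="linux-aarch64")
for section in ("build", "host", "run"):
    print(f"{section:>5}: {cross.platform_for(section)}")
```

When the two platforms are equal (a native build), all three sections resolve to the same platform and the distinction is invisible, which is why it is easy to overlook.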
Multi-platform
- Linux-x86_64
- Linux-aarch64
- Linux-ppc64le
- MacOSX-x86_64
- MacOSX-arm64
- Windows-x86_64

Language agnostic
juliaup
meta.yaml at minimum to define the packages
conda, mamba, python, etc. This is similar to the Anaconda distribution or Miniconda distribution.
Quoting directly from Knowledge Base | conda-forge | community-driven packaging for conda
You can switch your BLAS implementation by doing,
conda install "libblas=*=*_mkl"
conda install "libblas=*=*_openblas"
conda install "libblas=*=*_blis"
conda install "libblas=*=*_accelerate"
conda install "libblas=*=*_newaccelerate"
conda install "libblas=*=*_netlib"
MPI:
pkg=*=mpi_{{ mpi }}_*
pkg=*=mpi_*
pkg=*=nompi_*
Or even microarch! See Microarchitecture-optimized builds
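The `name=version=build` specs above select a variant by globbing on the build string. A rough sketch of that idea, assuming plain shell-style globs on both version and build; conda's real matching rules are richer, and the package records below are made up for illustration:

```python
from fnmatch import fnmatchcase

# Hypothetical (name, version, build string) records, not real index data.
CANDIDATES = [
    ("libblas", "3.9.0", "20_linux64_mkl"),
    ("libblas", "3.9.0", "20_linux64_openblas"),
    ("pkg", "1.0", "mpi_openmpi_h123"),
    ("pkg", "1.0", "mpi_mpich_h456"),
    ("pkg", "1.0", "nompi_h789"),
]


def parse_spec(spec: str) -> tuple[str, str, str]:
    """Split 'name=version=build', filling missing fields with '*'."""
    parts = spec.split("=")
    parts += ["*"] * (3 - len(parts))
    name, version, build = parts[:3]
    return name, version, build


def select_builds(spec: str, candidates=CANDIDATES):
    """Return the records whose name matches exactly and whose
    version and build string match the glob patterns in the spec."""
    name, version, build = parse_spec(spec)
    return [
        record
        for record in candidates
        if record[0] == name
        and fnmatchcase(record[1], version)
        and fnmatchcase(record[2], build)
    ]


print(select_builds("libblas=*=*_mkl"))
print(select_builds("pkg=*=mpi_*"))
print(select_builds("pkg=*=nompi_*"))
```

The point is only that the variant (BLAS flavour, MPI flavour) is encoded in the build string, so one package name can carry several mutually exclusive builds.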
A conda/mamba environment is designed to be reproducible under a different prefix (see the placeholder trick in the Detailed operations documentation).
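A simplified sketch of the placeholder idea: packages are built under a long dummy prefix, and at install time that prefix is rewritten to the real one. In text files a plain substitution is enough; in binaries the string must keep its length, so the remainder is padded with NUL bytes. The function names and the placeholder value here are assumptions for illustration, not conda's actual implementation:

```python
import re


def relocate_text(data: bytes, placeholder: bytes, prefix: bytes) -> bytes:
    """Text files (scripts, configs): replace the placeholder directly."""
    return data.replace(placeholder, prefix)


def relocate_binary(data: bytes, placeholder: bytes, prefix: bytes) -> bytes:
    """Binary files: replace inside each NUL-terminated string and pad
    with NUL bytes so that every byte offset in the file is preserved."""
    if len(prefix) > len(placeholder):
        raise ValueError("new prefix must not be longer than the placeholder")
    # The placeholder plus the rest of its C string, including the NUL.
    pattern = re.compile(re.escape(placeholder) + rb"[^\x00]*\x00")

    def swap(match: re.Match) -> bytes:
        original = match.group(0)
        replaced = original.replace(placeholder, prefix)
        return replaced + b"\x00" * (len(original) - len(replaced))

    return pattern.sub(swap, data)


placeholder = b"/opt/placehold_placehold_placehold"  # hypothetical dummy prefix
prefix = b"/home/u/env"                              # hypothetical real prefix
blob = b"\x7fELF" + placeholder + b"/lib/libfoo.so\x00" + b"tail"
relocated = relocate_binary(blob, placeholder, prefix)
print(len(blob) == len(relocated))
```

This is why the build-time prefix is deliberately long: the real prefix can be shorter than the placeholder, but never longer.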
To reproduce an environment
Create an environment (env2) as a clone of an existing environment (env1):
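For reference, a small sketch that assembles the corresponding conda invocations as argument lists without running them. The helper names are hypothetical; the subcommands and flags (`conda create --clone`, `conda list --explicit`) are the documented ones:

```python
import shlex


def clone_command(source_env: str, new_env: str) -> list[str]:
    """argv that creates `new_env` as a clone of `source_env`."""
    return ["conda", "create", "--name", new_env, "--clone", source_env]


def explicit_spec_command(env: str) -> list[str]:
    """argv that lists the exact package URLs of `env`; the output can be
    saved and replayed on another machine of the same platform."""
    return ["conda", "list", "--name", env, "--explicit"]


print(shlex.join(clone_command("env1", "env2")))
print(shlex.join(explicit_spec_command("env1")))
```

To actually execute one of these, pass the list to `subprocess.run(..., check=True)` on a machine where conda is installed.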
See more in conda/grayskull: Grayskull - Recipe generator for Conda.
{% set name = "ducc0" %}
{% set version = "0.39.1" %}
package:
  name: {{ name|lower }}
  version: {{ version }}
source:
  url: https://pypi.org/packages/source/{{ name[0] }}/{{ name }}/ducc0-{{ version }}.tar.gz
  sha256: 38eda188733d43c3602726e28bc9928d3117cdc23b5c1e7d89fdc26004a1d847
build:
  number: 1
  skip: true  # [py<=36]
  script_env: DUCC0_OPTIMIZATION=portable
  script: {{ PYTHON }} -m pip install . -vv
requirements:
  build:
    - python  # [build_platform != target_platform]
    - cross-python_{{ target_platform }}  # [build_platform != target_platform]
    - pybind11  # [build_platform != target_platform]
    - nanobind
    - make
    - cmake
    - {{ compiler('c') }}
    - {{ stdlib("c") }}
    - {{ compiler('cxx') }}
  host:
    - pip
    - pybind11
    - nanobind
    - python
    - make
    - cmake
    - scikit-build
    - scikit-build-core
  run:
    - numpy >=1.17.0
    - python
test:
  imports:
    - ducc0
  commands:
    - pip check
  requires:
    - pip
about:
  home: https://gitlab.mpcdf.mpg.de/mtr/ducc
  summary: Distinctly useful code collection
  license: GPL-2.0-or-later
  license_file: LICENSE
extra:
  recipe-maintainers:
    - ickc
    - MarkWieczorek
    - mreineck
{% set version = "2.3.14" %}
{% set sha256 = "924912213af3bbacd622b9318bd6d79055c4d57f58c2da486f4b3f62a12466f1" %}
{% set build = 2 %}
{% if blas_impl == 'openblas' %}
{% set build = build + 100 %}
{% endif %}
{% set blas_prefix = blas_impl %}
package:
  name: toast
  version: {{ version }}
source:
  url: https://github.com/hpc4cmb/toast/archive/{{ version }}.tar.gz
  sha256: {{ sha256 }}
build:
  skip: True  # [py<37]
  skip: True  # [win]
  number: {{ build }}
  string: "{{ blas_prefix }}_py{{ py }}h{{ PKG_HASH }}_{{ build }}"
  run_exports:
    - toast * {{ blas_prefix }}_*
requirements:
  build:
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}
    - cmake
    - make  # [unix]
    - llvm-openmp >=4.0.1  # [osx]
  host:
    - llvm-openmp >=4.0.1  # [osx]
    - python
    - fftw  # [blas_impl == 'openblas']
    - openblas * openmp_*  # [blas_impl == 'openblas']
    - mkl-devel  # [blas_impl == 'mkl']
    - liblapack
    - suitesparse
    - libaatm
  run:
    - llvm-openmp >=4.0.1  # [osx]
    - python
    - {{ pin_compatible("fftw") }}  # [blas_impl == 'openblas']
    - openblas * openmp_*  # [blas_impl == 'openblas']
    - {{ pin_compatible("mkl") }}  # [blas_impl == 'mkl']
    - {{ pin_compatible("liblapack") }}
    - {{ pin_compatible("suitesparse") }}
    - {{ pin_compatible("libaatm") }}
    - numpy
    - scipy
    - astropy
    - healpy
    - h5py
    - ephem
test:
  files:
    - run_test.sh
  commands:
    - ./run_test.sh
about:
  home: https://github.com/hpc4cmb/toast
  license: BSD-2-Clause
  license_family: BSD
  license_file: LICENSE
  summary: 'Time Ordered Astrophysics Scalable Tools'
  description: |
    TOAST is a software framework for simulating and processing timestream data
    collected by microwave telescopes.
  dev_url: https://github.com/hpc4cmb/toast
extra:
  recipe-maintainers:
    - tskisner
From envoy/conda/system_linux-aarch64.yml at main · ickc/envoy
channels:
- conda-forge
dependencies:
- bash
- bat
- bat-extras
- bottom
- btop
- bzip2
- clang-format
- coreutils
- curl
- difftastic
- diffutils
- direnv
- dua-cli
- dust
- exiftool
- fastfetch
- fd-find
- ffmpeg
- file
- findutils
- fzf
- gawk
- gh
- ghostscript
- git
- git-delta
- gnu-units
- go-shfmt
- go-task
- graphviz
- grep
- gzip
- htop
- hyperfine
- imagemagick
- inetutils
- joshuto
- jq
- juliaup
- libarchive
- lsdeluxe
- make
- mediainfo
- mosh
- nano
- nvtop
- onefetch
- openssh
- pandoc
- parallel
- patch
- pdf2svg
- pixi
- poppler
- prettier
- ripgrep
- rsync
- sed
- shellcheck
- starship
- tar
- tmux
- tokei
- tree
- unzip
- uv
- wget
- which
- zellij
- zsh
- zstd
name: system
SO:UK Data Centre example:
x86_64-v3, x86_64-v4, etc.)
make).
conda, pip, conda-lock, etc.), inspired by modern package managers like Cargo and npm.
pixi global install (similar to pipx)
From python-autojax/pixi.toml at c8a71287dd42752e95e06d3339eb44bc472c5d99 · ickc/python-autojax
[project]
authors = ["Kolen Cheung <christian.kolen@gmail.com>"]
channels = ["conda-forge"]
description = "DiRAC: revealing the nature of dark matter with the James Webb space telescope and JAX"
name = "autojax"
platforms = ["osx-arm64", "linux-64", "linux-aarch64"]
version = "0.1.0"
[tasks]
[dependencies]
python = ">=3.9"
numpy = "*"
numba = "*"
jax = "*"
# build
poetry = "*"
# extras
bump-my-version = "*"
# tests
coverage = "*"
pytest = "*"
pytest-benchmark = "*"
# docs
furo = "*"
linkify-it-py = "*"
myst-parser = "*"
sphinx = "*"
sphinx-autobuild = "*"
pygal = ">=3.0.5,<4"
defopt = ">=6.4.0,<7"
ipykernel = ">=6.29.5,<7"
[pypi-dependencies]
sphinx-last-updated-by-git = "*"
sphinxcontrib-apidoc = ">=0.5.0,<1"
autojax = { path = ".", editable = true}
[feature.cuda]
system-requirements = {cuda = "12"}
platforms = ["linux-64", "linux-aarch64"]
[feature.cuda.target.linux-64.dependencies]
jaxlib = { version = "*", build = "*cuda*" }
[environments]
cuda = ["cuda"]
[workspace]
channels = ["conda-forge"]
platforms = ["win-64", "linux-64", "linux-aarch64", "osx-64", "osx-arm64"]
[tasks]
# bootstrap
bootstrap-julia = { cmd = "juliaup add $JULIAUP_CHANNEL", description = "install julia version specified by JULIAUP_CHANNEL" }
# resolve
resolve = { depends-on = ["resolve-root", "resolve-library", "resolve-docs"], description = "resolve environments" }
resolve-root = { cmd = "julia --project=. -e 'using Pkg; Pkg.develop(PackageSpec(path=\"BrownianSpinDynamics\")); Pkg.resolve()'", description = "resolve root environment" }
resolve-library = { cmd = "julia --project=BrownianSpinDynamics -e 'using Pkg; Pkg.resolve()'", description = "resolve library environment" }
resolve-docs = { cmd = "julia --project=BrownianSpinDynamics/docs -e 'using Pkg; Pkg.develop(PackageSpec(path=\"BrownianSpinDynamics\")); Pkg.resolve()'", description = "resolve docs environment" }
# update
update = { depends-on = ["update-root", "update-library", "update-docs"], description = "update environments" }
update-root = { cmd = "julia --project=. -e 'using Pkg; Pkg.develop(PackageSpec(path=\"BrownianSpinDynamics\")); Pkg.update()'", description = "update root environment" }
update-library = { cmd = "julia --project=BrownianSpinDynamics -e 'using Pkg; Pkg.update()'", description = "update library environment" }
update-docs = { cmd = "julia --project=BrownianSpinDynamics/docs -e 'using Pkg; Pkg.develop(PackageSpec(path=\"BrownianSpinDynamics\")); Pkg.update()'", description = "update docs environment" }
update-precompile = { cmd = "julia --project=BrownianSpinDynamics -e 'using Pkg; Pkg.precompile()'", description = "update precompile environment" }
# precompile
precompile = { depends-on = ["precompile-root", "precompile-library", "precompile-docs"], description = "precompile environments" }
precompile-root = { cmd="julia --project=. -e 'using Pkg; Pkg.develop(PackageSpec(path=\"BrownianSpinDynamics\")); Pkg.instantiate(); Pkg.precompile()'", description = "precompile root environment" }
precompile-library = { cmd="julia --project=BrownianSpinDynamics -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'", description = "precompile library environment" }
precompile-docs = { cmd="julia --project=BrownianSpinDynamics/docs -e 'using Pkg; Pkg.develop(PackageSpec(path=\"BrownianSpinDynamics\")); Pkg.instantiate(); Pkg.precompile()'", description = "precompile docs environment" }
# test
test = { cmd = "julia --project=BrownianSpinDynamics integration_tests/runtests_all.jl", description = "run all tests" }
test-unit = { cmd = "julia --project=BrownianSpinDynamics -e 'using Pkg; Pkg.test(test_args=ARGS, allow_reresolve = false)' {{ case }}", args = [{ arg = "case", default = ""}], description = "run unit tests" }
test-integration = { cmd = "julia --project=BrownianSpinDynamics integration_tests/runtests.jl {{ case }}", args = [{ arg = "case", default = ""}], description = "run integration tests" }
# linting
lint-aqua = { cmd = "julia --project=. scripts/lint_package.jl", description = "lint the library with Aqua.jl"}
# benchmarks
bench = { cmd = "julia --project=. BrownianSpinDynamics/bench/bench.jl {{ case }}", args = [{ arg = "case", default = ""}], description = "run benchmarks" }
# format
format = { depends-on = ["pre-sync", "julia-format", "post-sync"], description = "format everything" }
pre-sync = { cmd = "jupytext --sync 'tutorials/*.ipynb'", description = "Synchronize ipynb,jl pairs using jupytext" }
post-sync = { cmd = "jupytext --sync 'tutorials/*.ipynb'", description = "Synchronize ipynb,jl pairs using jupytext" }
julia-format = { cmd = "julia -e 'using JuliaFormatter; format(\".\")'", description = "format all files using JuliaFormatter"}
format-library = { cmd = "julia -e 'using JuliaFormatter; format(\"BrownianSpinDynamics\")'", description = "format BrownianSpinDynamics using JuliaFormatter" }
# docs
docs-build = { cmd = "julia --project=BrownianSpinDynamics/docs BrownianSpinDynamics/docs/make.jl", description = "build docs" }
docs-serve = { cmd = "julia --project=BrownianSpinDynamics/docs BrownianSpinDynamics/docs/serve.jl", description = "serve docs" }
# install
install-kernel = { cmd = "julia --project=. scripts/install-julia-brownian-spin-dynamics.jl --overwrite", description = "install Jupyter kernel for BrownianSpinDynamics" }
# dev
find-version = { cmd = "scripts/find-version.sh {{ pkg }}", args = ["pkg"], description = "find version of a package from Manifest.toml" }
[dependencies]
juliaup = ">=1.17.21,<2"
jupytext = ">=1.17.2,<2"
[activation.env]
JULIA_PROJECT = "@."
JULIAUP_CHANNEL = "1.11.7"
# this put the .julia directory typically available in ~/.julia
# to the conda prefix that the pixi environment resides in
[target.unix.activation.env]
JULIA_DEPOT_PATH = "$CONDA_PREFIX/.julia"
JULIAUP_DEPOT_PATH = "$CONDA_PREFIX/.julia"
[target.win.activation.env]
JULIA_DEPOT_PATH = "%CONDA_PREFIX%\\.julia"
JULIAUP_DEPOT_PATH = "%CONDA_PREFIX%\\.julia"
If we represent the lifecycle of reproducibility from source code and data to result via functions:
1. \(c_i = C(s_i, g_i(s_j))\): Compilation takes source code and the dependency graph and produces compiled binaries.
2. \(e = G(c_i)\): the environment is constructed from the whole dependency Graph of all precompiled binaries.
3. \(p_i = f_i(e, d_j)\): filters or functions, each an individual part of your scientific workflow, execute in the environment and act on your data to produce data products.
4. \(r = W(e, f_i, d_j)\): a Workflow chains all of these to obtain the final result.
It then becomes clear that (3) is the job of the programmer and (4) is the job of the workflow manager: to ensure that these are pure functions, so that the result is reproducible given the same inputs.
The remaining tasks, (1) and (2), are the job of a package manager.
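A toy sketch of this lifecycle as ordinary Python functions. Every name and value below is hypothetical: "binaries" are just strings, and the workflow steps are trivial arithmetic. It only illustrates that if \(C\), \(G\), \(f_i\) and \(W\) are pure, the result depends on nothing but the sources and the data:

```python
# s_i: sources, and g_i: which packages each one depends on (made-up values).
SOURCES = {"libm": "src-libm", "numpy": "src-numpy", "app": "src-app"}
DEPENDS = {"libm": (), "numpy": ("libm",), "app": ("numpy",)}


def compile_package(name, sources, depends):
    """C: a source plus its (recursively compiled) dependencies -> binary."""
    deps = tuple(compile_package(d, sources, depends) for d in depends[name])
    return f"bin[{sources[name]}|{'+'.join(deps)}]"


def build_environment(sources, depends):
    """G: the environment is the set of all compiled binaries."""
    return frozenset(compile_package(n, sources, depends) for n in sources)


def normalise(env, data):
    """f_1: scale the data by its maximum."""
    peak = max(data)
    return [x / peak for x in data]


def total(env, data):
    """f_2: reduce the data to a single number."""
    return sum(data)


def run_workflow(env, steps, data):
    """W: chain the steps, each running in the same environment."""
    for step in steps:
        data = step(env, data)
    return data


env = build_environment(SOURCES, DEPENDS)
result = run_workflow(env, [normalise, total], [1.0, 2.0, 4.0])
print(result)
```

Changing any source string changes the environment `env`, which is how a difference in a deep dependency propagates all the way to the inputs of the workflow.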
What if we can make them pure functions? That’s basically what a functional package manager does.
What could make it impure?
On top of these, a functional package manager guarantees that building software is a pure function; hence the build is always reproducible.
(In contrast, despite all these efforts, non-functional package managers cannot guarantee purity, and hence cannot guarantee reproducibility.)
Nix (and also Guix, another functional package manager inspired by Nix) have various levels of integration with Software Heritage to automatically mitigate against link rot.
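One way to picture the purity guarantee is to address each build output by a hash of all of its inputs, so that identical inputs always map to the same location and any changed input maps to a new one. The sketch below is an illustration of that idea only; the function name, the `/store` path and the hashing layout are assumptions and do not reproduce the actual scheme used by Nix or Guix:

```python
import hashlib
import json


def store_path(name, source_hash, build_script, dep_paths):
    """Derive an output path from every input of the build (hypothetical layout)."""
    payload = json.dumps(
        {
            "name": name,
            "source": source_hash,
            "script": build_script,
            "deps": sorted(dep_paths),  # order of dependencies must not matter
        },
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
    return f"/store/{digest}-{name}"


libm = store_path("libm", "sha256:aaaa", "make install", [])
numpy_a = store_path("numpy", "sha256:bbbb", "pip install .", [libm])
numpy_b = store_path("numpy", "sha256:bbbb", "pip install .", [libm])

# A patched dependency yields a different path for everything built on it.
libm_patched = store_path("libm", "sha256:cccc", "make install", [])
numpy_c = store_path("numpy", "sha256:bbbb", "pip install .", [libm_patched])

print(numpy_a == numpy_b, numpy_a == numpy_c)
```

Because the path of a dependency is itself part of the hashed input, a change anywhere in the dependency graph propagates to every package built on top of it.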
Thompson (1984)