Installation
============
This guide covers installing L-SURF and setting up GPU acceleration for high-performance ray tracing.
Prerequisites
-------------
NVIDIA GPU (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~
For GPU-accelerated simulations, you need:

* **NVIDIA GPU** with Compute Capability 3.5+ (most GPUs from 2012 onwards)
* **NVIDIA Driver** version 450 or later
* **CUDA Toolkit** version 11.0 or later
.. note::

   L-SURF works without a GPU using CPU fallback, but simulations will be 10-100x slower.
Check Your System
~~~~~~~~~~~~~~~~~
Before installing, verify your GPU setup::

    # Check NVIDIA driver installation and GPU info
    nvidia-smi

    # Check CUDA toolkit version (if installed)
    nvcc --version
Example output from ``nvidia-smi``::

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 535.154.05    Driver Version: 535.154.05    CUDA Version: 12.2   |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
    |  0%   45C    P8    10W / 250W |    512MiB / 11264MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
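If you need this check in a script (for example in CI), the header line can be parsed with a small regex. This is a sketch, not part of L-SURF; the ``parse_nvidia_smi_header`` helper is our own name:

.. code-block:: python

   import re
   import subprocess

   def parse_nvidia_smi_header(text):
       """Extract driver and CUDA versions from nvidia-smi output."""
       driver = re.search(r"Driver Version:\s*([\d.]+)", text)
       cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
       return {
           "driver": driver.group(1) if driver else None,
           "cuda": cuda.group(1) if cuda else None,
       }

   try:
       # Query the live driver if nvidia-smi is on PATH
       output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
   except FileNotFoundError:
       # Fall back to the sample header shown above
       output = "| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |"

   print(parse_nvidia_smi_header(output))

Returning ``None`` for a missing field lets the caller distinguish "no GPU tooling" from a parse failure.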
Python Requirements
~~~~~~~~~~~~~~~~~~~

* Python >= 3.13
* conda/mamba (recommended) or pip
Dependencies
------------
.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Category
     - Packages
     - Notes
   * - **Core**
     - numpy >= 1.24, matplotlib >= 3.7, pydantic >= 2.0
     - Required for all functionality
   * - **GPU**
     - numba >= 0.58, CUDA Toolkit >= 11.0
     - Required for GPU acceleration
   * - **Optional**
     - h5py >= 3.8, astropy-healpix >= 1.0
     - HDF5 support, spherical analysis
   * - **Development**
     - pytest, black, ruff, mypy, pre-commit
     - For development and testing
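To confirm the core minimums above are satisfied in the active environment, you can compare installed versions with ``importlib.metadata``. A minimal sketch (not part of L-SURF; it compares numeric version components only, so pre-release suffixes are ignored):

.. code-block:: python

   from importlib import metadata

   def version_tuple(version):
       """Turn '1.24.3' into (1, 24, 3), stopping at non-numeric parts."""
       parts = []
       for part in version.split("."):
           if part.isdigit():
               parts.append(int(part))
           else:
               break
       return tuple(parts)

   def check_minimum(package, minimum):
       """Return True if `package` is installed at or above `minimum`."""
       try:
           installed = metadata.version(package)
       except metadata.PackageNotFoundError:
           print(f"{package}: NOT INSTALLED (need >= {minimum})")
           return False
       ok = version_tuple(installed) >= version_tuple(minimum)
       status = "OK" if ok else f"need >= {minimum}"
       print(f"{package}: {installed} ({status})")
       return ok

   for pkg, minimum in [("numpy", "1.24"), ("matplotlib", "3.7"), ("pydantic", "2.0")]:
       check_minimum(pkg, minimum)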
Installation Methods
--------------------
Recommended: Conda Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This method handles all dependencies automatically::

    # 1. Clone the repository
    git clone https://github.com/your-org/lsurf.git
    cd lsurf

    # 2. Create the conda environment
    conda env create -f environment.yml

    # 3. Activate the environment
    conda activate lsurf

    # 4. Verify installation
    python -c "import lsurf; print('L-SURF installed successfully')"
Alternative: pip
~~~~~~~~~~~~~~~~
If you prefer pip without conda::

    git clone https://github.com/your-org/lsurf.git
    cd lsurf
    pip install -e ".[dev]"
GPU Setup
---------
L-SURF uses `Numba <https://numba.pydata.org/>`_ for GPU acceleration via CUDA.
There are two approaches to setting up CUDA:
Option 1: System CUDA (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Install NVIDIA drivers and CUDA toolkit at the system level. This is the most reliable method.
**Ubuntu/Debian:**
.. code-block:: bash

   # Install NVIDIA driver
   sudo apt update
   sudo apt install nvidia-driver-535  # Use latest available version

   # Install CUDA toolkit
   # Download from: https://developer.nvidia.com/cuda-downloads
   # Or use the package manager:
   sudo apt install nvidia-cuda-toolkit
**Fedora:**
.. code-block:: bash

   # Install NVIDIA driver (RPM Fusion required)
   sudo dnf install akmod-nvidia

   # Install CUDA toolkit
   sudo dnf install cuda
**Arch Linux:**
.. code-block:: bash

   sudo pacman -S nvidia cuda
**After installation**, add CUDA to your ``PATH`` (append to ``~/.bashrc``)::

    export CUDA_HOME=/usr/local/cuda
    export PATH=$CUDA_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
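A quick way to confirm the shell environment is wired up is to check ``CUDA_HOME`` and the CUDA binaries from Python. A small sketch using only the standard library (the ``cuda_env_report`` helper is illustrative, not part of L-SURF):

.. code-block:: python

   import os
   import shutil

   def cuda_env_report():
       """Summarize CUDA-related settings visible to the current process."""
       return {
           "CUDA_HOME": os.environ.get("CUDA_HOME"),
           "nvcc": shutil.which("nvcc"),
           "nvidia-smi": shutil.which("nvidia-smi"),
       }

   for key, value in cuda_env_report().items():
       print(f"{key}: {value or 'not found'}")

If ``nvcc`` shows ``not found`` after editing ``~/.bashrc``, remember to open a new shell (or ``source ~/.bashrc``) so the exports take effect.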
Option 2: Conda CUDA
~~~~~~~~~~~~~~~~~~~~
Install the CUDA toolkit via conda-forge (useful for isolated environments)::

    conda activate lsurf
    conda install -c conda-forge cudatoolkit=12.0
.. warning::

   Conda CUDA requires matching versions between ``cudatoolkit`` and your NVIDIA driver.
   Check compatibility in the `CUDA Toolkit Release Notes <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/>`_.
Verifying GPU Setup
~~~~~~~~~~~~~~~~~~~
After installation, verify GPU acceleration works:
.. code-block:: python

   import numpy as np
   from numba import cuda

   # Check CUDA availability
   print(f"CUDA available: {cuda.is_available()}")

   if cuda.is_available():
       gpu = cuda.get_current_device()
       print(f"GPU: {gpu.name}")
       print(f"Compute Capability: {gpu.compute_capability}")
       print(f"Total Memory: {gpu.total_memory / 1e9:.1f} GB")

       # Test a simple kernel
       @cuda.jit
       def test_kernel(arr):
           i = cuda.grid(1)
           if i < arr.size:
               arr[i] *= 2

       test_arr = cuda.to_device(np.ones(1000, dtype=np.float32))
       test_kernel[10, 100](test_arr)
       print("GPU kernel test: PASSED")
   else:
       print("GPU not available - will use CPU fallback")
Run a Quick Benchmark
~~~~~~~~~~~~~~~~~~~~~
Test GPU performance with a simple ray tracing simulation:
.. code-block:: python

   import time

   import lsurf as sr

   # Create a simple surface and source
   surface = sr.create_planar_surface(
       point=(0, 0, 0),
       normal=(0, 0, 1),
   )
   source = sr.CollimatedBeam(
       center=(0, 0, 1),
       direction=(0, 0, -1),
       radius=0.1,
       num_rays=100000,
       wavelength=532e-9,
   )
   rays = source.generate()

   # Time the intersection calculation
   start = time.perf_counter()
   distances, hit_mask = surface.intersect(rays.positions, rays.directions)
   elapsed = time.perf_counter() - start

   print(f"Intersected {rays.num_rays:,} rays in {elapsed*1000:.1f} ms")
   print(f"Throughput: {rays.num_rays/elapsed/1e6:.1f} million rays/second")
Typical performance:

* **GPU (RTX 3080)**: ~500 million rays/second
* **CPU (8-core)**: ~5 million rays/second
Troubleshooting
---------------
CUDA Not Available
~~~~~~~~~~~~~~~~~~
If ``cuda.is_available()`` returns ``False``:

1. **Check NVIDIA driver**::

       nvidia-smi

   If this fails, install or reinstall the NVIDIA drivers.

2. **Check CUDA toolkit**::

       nvcc --version

   If this fails, install the CUDA toolkit or set ``CUDA_HOME``.

3. **Verify GPU compute capability**:

   Your GPU must have Compute Capability >= 3.5. Check the list at
   `CUDA GPUs <https://developer.nvidia.com/cuda-gpus>`_.

4. **Reinstall numba**::

       pip install --upgrade --force-reinstall numba
Conda Package Not Found
~~~~~~~~~~~~~~~~~~~~~~~
If you see ``PackagesNotFoundError``::

    # Update conda
    conda update -n base conda

    # Clear cache and retry
    conda clean --all
    conda env create -f environment.yml
Out of GPU Memory
~~~~~~~~~~~~~~~~~
If you see ``CudaAPIError: Out of memory``:

1. Reduce ``num_rays`` in your simulation
2. Use batched processing for large simulations
3. Close other GPU applications
4. Check memory usage with ``nvidia-smi``

.. code-block:: python

   # Process rays in batches
   batch_size = 100000
   for i in range(0, total_rays, batch_size):
       batch = rays[i:i+batch_size]
       # Process batch...
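Rather than a fixed constant, the batch size can be derived from free GPU memory. The sketch below assumes a hypothetical per-ray footprint (float32 positions and directions, a float32 distance, a 1-byte hit flag); your simulation's real footprint will differ, and free memory can be queried with Numba's ``cuda.current_context().get_memory_info()``:

.. code-block:: python

   def max_batch_size(free_bytes, bytes_per_ray, safety=0.8):
       """Largest batch that fits in `safety` fraction of free memory."""
       return int(free_bytes * safety) // bytes_per_ray

   # Assumed footprint per ray: 3 float32 positions + 3 float32 directions
   # (24 bytes), plus a float32 distance and a 1-byte hit flag (5 bytes).
   bytes_per_ray = 3 * 4 + 3 * 4 + 4 + 1

   # Example with ~2 GB of free GPU memory:
   print(max_batch_size(2_000_000_000, bytes_per_ray))

The ``safety`` margin leaves headroom for kernel scratch space and other allocations on the device.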
Slow Performance
~~~~~~~~~~~~~~~~
If simulations are slower than expected:

1. **Verify GPU is being used**:

   .. code-block:: python

      from numba import cuda
      print(cuda.is_available())  # Should be True

2. **Monitor GPU utilization** during simulation::

       watch -n 0.5 nvidia-smi

3. **Increase ray count** - GPUs perform better with more parallel work::

       # Too few rays - GPU underutilized
       source = sr.CollimatedBeam(num_rays=1000)    # Bad

       # Better GPU utilization
       source = sr.CollimatedBeam(num_rays=100000)  # Good

4. **Check for Numba warnings** about suboptimal grid sizes.
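The launch configuration in ``kernel[blocks, threads]``-style calls is just ceil division over the number of elements; a grid that fails to cover the array, or wildly over-covers it, is what those warnings point at. A sketch of the usual computation (the ``launch_config`` helper name is ours, not Numba's):

.. code-block:: python

   def launch_config(n, threads_per_block=256):
       """Return (blocks, threads_per_block) covering n elements."""
       blocks = (n + threads_per_block - 1) // threads_per_block
       return blocks, threads_per_block

   print(launch_config(100_000))  # plenty of parallel work per launch
   print(launch_config(1_000))    # small grids leave the GPU idle

A guard like ``if i < arr.size`` inside the kernel (as in the verification example above) handles the overshoot from rounding up.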
Import Errors
~~~~~~~~~~~~~
If you get ``ModuleNotFoundError: No module named 'lsurf'``::

    # Verify installation
    pip show lsurf

    # Reinstall if needed
    pip install -e ".[dev]"
Platform-Specific Notes
-----------------------
Windows
~~~~~~~
* Install NVIDIA drivers from `NVIDIA Driver Downloads <https://www.nvidia.com/Download/index.aspx>`_
* Install the CUDA Toolkit from `CUDA Downloads <https://developer.nvidia.com/cuda-downloads>`_
* Use Anaconda/Miniconda for Python environment management
macOS
~~~~~
* CUDA is **not supported** on macOS (Apple Silicon or Intel)
* L-SURF will use CPU-only mode automatically
* Performance will be limited compared to NVIDIA GPU systems
WSL2 (Windows Subsystem for Linux)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GPU passthrough works with WSL2:

1. Install the latest NVIDIA Windows driver (it includes WSL2 GPU support)
2. **Do not** install the NVIDIA driver inside WSL2 - CUDA in WSL2 uses the Windows driver
3. Install L-SURF normally inside WSL2
::

    # Inside WSL2
    nvidia-smi  # Should show your Windows GPU
    conda env create -f environment.yml
    conda activate lsurf