Installation
============

This guide covers installing L-SURF and setting up GPU acceleration for
high-performance ray tracing.

Prerequisites
-------------

NVIDIA GPU (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~

For GPU-accelerated simulations, you need:

* **NVIDIA GPU** with Compute Capability 3.5+ (most GPUs from 2012 onwards)
* **NVIDIA Driver** version 450 or later
* **CUDA Toolkit** version 11.0 or later

.. note::

   L-SURF works without a GPU using a CPU fallback, but simulations will be
   10-100x slower.

Check Your System
~~~~~~~~~~~~~~~~~

Before installing, verify your GPU setup::

   # Check NVIDIA driver installation and GPU info
   nvidia-smi

   # Check CUDA toolkit version (if installed)
   nvcc --version

Example output from ``nvidia-smi``::

   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 535.154.05   Driver Version: 535.154.05   CUDA Version: 12.2     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |===============================+======================+======================|
   |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
   |  0%   45C    P8    10W / 250W |    512MiB / 11264MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+

Python Requirements
~~~~~~~~~~~~~~~~~~~

* Python >= 3.13
* conda/mamba (recommended) or pip
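The interpreter requirement above can be checked programmatically before
installing. A minimal sketch using only the standard library (no L-SURF
needed); the ``(3, 13)`` tuple mirrors the requirement stated above:

```python
import sys

# L-SURF requires Python 3.13 or newer; report whether this
# interpreter satisfies that requirement.
required = (3, 13)
ok = sys.version_info[:2] >= required
print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
      f"meets >= {required[0]}.{required[1]}: {ok}")
```

If this prints ``False``, create the conda environment described below, which
pins a compatible Python for you.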
Dependencies
------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Category
     - Packages
     - Notes
   * - **Core**
     - numpy >= 1.24, matplotlib >= 3.7, pydantic >= 2.0
     - Required for all functionality
   * - **GPU**
     - numba >= 0.58, CUDA Toolkit >= 11.0
     - Required for GPU acceleration
   * - **Optional**
     - h5py >= 3.8, astropy-healpix >= 1.0
     - HDF5 support, spherical analysis
   * - **Development**
     - pytest, black, ruff, mypy, pre-commit
     - For development and testing

Installation Methods
--------------------

Recommended: Conda Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This method handles all dependencies automatically::

   # 1. Clone the repository
   git clone https://github.com/your-org/lsurf.git
   cd lsurf

   # 2. Create the conda environment
   conda env create -f environment.yml

   # 3. Activate the environment
   conda activate lsurf

   # 4. Verify the installation
   python -c "import lsurf; print('L-SURF installed successfully')"

Alternative: pip
~~~~~~~~~~~~~~~~

If you prefer pip without conda::

   git clone https://github.com/your-org/lsurf.git
   cd lsurf
   pip install -e ".[dev]"

GPU Setup
---------

L-SURF uses `Numba <https://numba.pydata.org/>`_ for GPU acceleration via
CUDA. There are two approaches to setting up CUDA:

Option 1: System CUDA (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Install the NVIDIA driver and CUDA toolkit at the system level. This is the
most reliable method.

**Ubuntu/Debian:**

.. code-block:: bash

   # Install NVIDIA driver
   sudo apt update
   sudo apt install nvidia-driver-535  # Use latest available version

   # Install CUDA toolkit
   # Download from: https://developer.nvidia.com/cuda-downloads
   # Or use the package manager:
   sudo apt install nvidia-cuda-toolkit

**Fedora:**

.. code-block:: bash

   # Install NVIDIA driver (RPM Fusion required)
   sudo dnf install akmod-nvidia

   # Install CUDA toolkit
   sudo dnf install cuda
**Arch Linux:**

.. code-block:: bash

   sudo pacman -S nvidia cuda

**After installation**, add CUDA to your ``PATH`` (append to ``~/.bashrc``)::

   export CUDA_HOME=/usr/local/cuda
   export PATH=$CUDA_HOME/bin:$PATH
   export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Option 2: Conda CUDA
~~~~~~~~~~~~~~~~~~~~

Install the CUDA toolkit via conda-forge (useful for isolated environments)::

   conda activate lsurf
   conda install -c conda-forge cudatoolkit=12.0

.. warning::

   Conda CUDA requires matching versions between ``cudatoolkit`` and your
   NVIDIA driver. Check compatibility in the `CUDA Toolkit Release Notes
   <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/>`_.

Verifying GPU Setup
~~~~~~~~~~~~~~~~~~~

After installation, verify that GPU acceleration works:

.. code-block:: python

   import numpy as np
   from numba import cuda

   # Check CUDA availability
   print(f"CUDA available: {cuda.is_available()}")

   if cuda.is_available():
       gpu = cuda.get_current_device()
       print(f"GPU: {gpu.name}")
       print(f"Compute Capability: {gpu.compute_capability}")
       print(f"Total Memory: {gpu.total_memory / 1e9:.1f} GB")

       # Test a simple kernel that doubles every element
       @cuda.jit
       def test_kernel(arr):
           i = cuda.grid(1)
           if i < arr.size:
               arr[i] *= 2

       test_arr = cuda.to_device(np.ones(1000, dtype=np.float32))
       test_kernel[10, 100](test_arr)
       print("GPU kernel test: PASSED")
   else:
       print("GPU not available - will use CPU fallback")
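The ``test_kernel[10, 100]`` launch in the verification snippet above means
10 blocks of 100 threads, enough to cover the 1,000-element array. For an
arbitrary element count, the block count is the element count divided by the
threads per block, rounded up. A minimal sketch; ``launch_config`` is an
illustrative helper, not part of the L-SURF API, and 256 threads per block is
just a common default, not a tuned value:

```python
import math

def launch_config(num_elements, threads_per_block=256):
    """Return a (blocks, threads) 1D launch configuration that
    covers num_elements, rounding the block count up."""
    blocks = math.ceil(num_elements / threads_per_block)
    return blocks, threads_per_block

# 1,000 elements at 100 threads/block needs 10 blocks,
# matching the test_kernel[10, 100] launch above.
print(launch_config(1000, threads_per_block=100))  # (10, 100)
print(launch_config(100_000))                      # (391, 256)
```

Rounding up means the last block may have idle threads, which is why the
kernel guards with ``if i < arr.size``.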
Run a Quick Benchmark
~~~~~~~~~~~~~~~~~~~~~

Test GPU performance with a simple ray tracing simulation:

.. code-block:: python

   import time

   import lsurf as sr

   # Create a simple surface and source
   surface = sr.create_planar_surface(
       point=(0, 0, 0),
       normal=(0, 0, 1),
   )
   source = sr.CollimatedBeam(
       center=(0, 0, 1),
       direction=(0, 0, -1),
       radius=0.1,
       num_rays=100000,
       wavelength=532e-9,
   )
   rays = source.generate()

   # Time the intersection calculation
   start = time.perf_counter()
   distances, hit_mask = surface.intersect(rays.positions, rays.directions)
   elapsed = time.perf_counter() - start

   print(f"Intersected {rays.num_rays:,} rays in {elapsed*1000:.1f} ms")
   print(f"Throughput: {rays.num_rays/elapsed/1e6:.1f} million rays/second")

Typical performance:

* **GPU (RTX 3080)**: ~500 million rays/second
* **CPU (8-core)**: ~5 million rays/second

Troubleshooting
---------------

CUDA Not Available
~~~~~~~~~~~~~~~~~~

If ``cuda.is_available()`` returns ``False``:

1. **Check the NVIDIA driver**::

      nvidia-smi

   If this fails, install or reinstall the NVIDIA driver.

2. **Check the CUDA toolkit**::

      nvcc --version

   If this fails, install the CUDA toolkit or set ``CUDA_HOME``.

3. **Verify GPU compute capability**: your GPU must have Compute Capability
   >= 3.5. Check yours at `CUDA GPUs <https://developer.nvidia.com/cuda-gpus>`_.

4. **Reinstall numba**::

      pip install --upgrade --force-reinstall numba

Conda Package Not Found
~~~~~~~~~~~~~~~~~~~~~~~

If you see ``PackagesNotFoundError``::

   # Update conda
   conda update -n base conda

   # Clear the cache and retry
   conda clean --all
   conda env create -f environment.yml

Out of GPU Memory
~~~~~~~~~~~~~~~~~

If you see ``CudaAPIError: Out of memory``:

1. Reduce ``num_rays`` in your simulation
2. Use batched processing for large simulations
3. Close other GPU applications
4. Check memory usage with ``nvidia-smi``

.. code-block:: python

   # Process rays in batches instead of all at once
   batch_size = 100000
   for i in range(0, total_rays, batch_size):
       batch = rays[i:i+batch_size]
       # Process batch...
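The batching loop above can be sketched as a standalone generator.
``iter_batches`` is a hypothetical helper (not part of the L-SURF API), shown
here slicing a plain NumPy array standing in for ray positions:

```python
import numpy as np

def iter_batches(array, batch_size):
    """Yield successive slices of at most batch_size rows each."""
    for start in range(0, len(array), batch_size):
        yield array[start:start + batch_size]

# 250,000 ray positions split into 100,000-row batches;
# the final batch holds the 50,000-row remainder.
positions = np.zeros((250_000, 3), dtype=np.float32)
batch_sizes = [len(b) for b in iter_batches(positions, 100_000)]
print(batch_sizes)  # [100000, 100000, 50000]
```

Because the slices are NumPy views, no extra host memory is copied; only one
batch at a time needs to fit on the GPU.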
Slow Performance
~~~~~~~~~~~~~~~~

If simulations are slower than expected:

1. **Verify the GPU is being used**:

   .. code-block:: python

      from numba import cuda
      print(cuda.is_available())  # Should be True

2. **Monitor GPU utilization** during the simulation::

      watch -n 0.5 nvidia-smi

3. **Increase the ray count** - GPUs perform better with more parallel work::

      # Too few rays - GPU underutilized
      source = sr.CollimatedBeam(num_rays=1000)    # Bad

      # Better GPU utilization
      source = sr.CollimatedBeam(num_rays=100000)  # Good

4. **Check for Numba warnings** about suboptimal grid sizes.

Import Errors
~~~~~~~~~~~~~

If you get ``ModuleNotFoundError: No module named 'lsurf'``::

   # Verify the installation
   pip show lsurf

   # Reinstall if needed
   pip install -e ".[dev]"

Platform-Specific Notes
-----------------------

Windows
~~~~~~~

* Install NVIDIA drivers from `NVIDIA Driver Downloads
  <https://www.nvidia.com/download/index.aspx>`_
* Install the CUDA Toolkit from `CUDA Downloads
  <https://developer.nvidia.com/cuda-downloads>`_
* Use Anaconda/Miniconda for Python environment management

macOS
~~~~~

* CUDA is **not supported** on macOS (Apple Silicon or Intel)
* L-SURF will use CPU-only mode automatically
* Performance will be limited compared to NVIDIA GPU systems

WSL2 (Windows Subsystem for Linux)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GPU passthrough works with WSL2:

1. Install the latest NVIDIA Windows driver (it includes WSL2 GPU support)
2. **Do not** install a separate NVIDIA driver inside WSL2 - it uses the
   Windows driver
3. Install L-SURF normally inside WSL2

::

   # Inside WSL2
   nvidia-smi   # Should show your Windows GPU
   conda env create -f environment.yml
   conda activate lsurf
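On any of these platforms, a script can report which mode it will run in
before doing real work. A minimal defensive sketch; the ``ImportError`` guard
covers environments where numba is absent entirely (e.g. a CPU-only macOS
install), and ``cuda.is_available()`` handles installed-but-no-GPU cases:

```python
def gpu_available():
    """Return True if Numba can see a usable CUDA GPU, else False."""
    try:
        from numba import cuda
        return bool(cuda.is_available())
    except ImportError:
        # numba not installed (e.g. CPU-only macOS environment)
        return False

mode = "GPU" if gpu_available() else "CPU fallback"
print(f"L-SURF will run in {mode} mode")
```

This check never raises, so it is safe to run unconditionally at startup on
Windows, macOS, Linux, or WSL2.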