Installation
============

This guide covers installing L-SURF and setting up GPU acceleration for
high-performance ray tracing.

Prerequisites
-------------

NVIDIA GPU (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~

For GPU-accelerated simulations, you need:

* **NVIDIA GPU** with Compute Capability 3.5+ (most GPUs from 2012 onwards)
* **NVIDIA Driver** version 450 or later
* **CUDA Toolkit** version 11.0 or later

.. note::

   L-SURF works without a GPU using a CPU fallback, but simulations will be
   10-100x slower.

Check Your System
~~~~~~~~~~~~~~~~~

Before installing, verify your GPU setup::

   # Check NVIDIA driver installation and GPU info
   nvidia-smi

   # Check CUDA toolkit version (if installed)
   nvcc --version

Example output from ``nvidia-smi``::

   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 535.154.05   Driver Version: 535.154.05   CUDA Version: 12.2     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |===============================+======================+======================|
   |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
   |  0%   45C    P8    10W / 250W |    512MiB / 11264MiB |      0%      Default |
   +-------------------------------+----------------------+----------------------+

Python Requirements
~~~~~~~~~~~~~~~~~~~

* Python >= 3.13
* conda/mamba (recommended) or pip
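The interpreter requirement above can be checked programmatically before
installing. A minimal sketch using only the standard library (no L-SURF
needed); the ``(3, 13)`` tuple mirrors the requirement stated above:

```python
import sys

# L-SURF requires Python 3.13 or newer; report whether this
# interpreter satisfies that requirement.
required = (3, 13)
ok = sys.version_info[:2] >= required
print(f"Python {sys.version_info.major}.{sys.version_info.minor} "
      f"meets >= {required[0]}.{required[1]}: {ok}")
```

If this prints ``False``, create the conda environment described below, which
pins a compatible Python for you.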
Dependencies
------------

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Category
     - Packages
     - Notes
   * - **Core**
     - numpy >= 1.24, matplotlib >= 3.7, pydantic >= 2.0
     - Required for all functionality
   * - **GPU**
     - numba >= 0.58, CUDA Toolkit >= 11.0
     - Required for GPU acceleration
   * - **Optional**
     - h5py >= 3.8, astropy-healpix >= 1.0
     - HDF5 support, spherical analysis
   * - **Development**
     - pytest, black, ruff, mypy, pre-commit
     - For development and testing

Installation Methods
--------------------

Recommended: Conda Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This method handles all dependencies automatically::

   # 1. Clone the repository
   git clone https://github.com/your-org/lsurf.git
   cd lsurf

   # 2. Create the conda environment
   conda env create -f environment.yml

   # 3. Activate the environment
   conda activate lsurf

   # 4. Verify the installation
   python -c "import lsurf; print('L-SURF installed successfully')"

Alternative: pip
~~~~~~~~~~~~~~~~

If you prefer pip without conda::

   git clone https://github.com/your-org/lsurf.git
   cd lsurf
   pip install -e ".[dev]"

GPU Setup
---------

L-SURF uses `Numba <https://numba.pydata.org/>`_ for GPU acceleration via
CUDA. There are two approaches to setting up CUDA:

Option 1: System CUDA (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Install the NVIDIA driver and CUDA toolkit at the system level. This is the
most reliable method.

**Ubuntu/Debian:**

.. code-block:: bash

   # Install NVIDIA driver
   sudo apt update
   sudo apt install nvidia-driver-535  # Use latest available version

   # Install CUDA toolkit
   # Download from: https://developer.nvidia.com/cuda-downloads
   # Or use the package manager:
   sudo apt install nvidia-cuda-toolkit

**Fedora:**

.. code-block:: bash

   # Install NVIDIA driver (RPM Fusion required)
   sudo dnf install akmod-nvidia

   # Install CUDA toolkit
   sudo dnf install cuda
**Arch Linux:**

.. code-block:: bash

   sudo pacman -S nvidia cuda

**After installation**, add CUDA to your ``PATH`` (append to ``~/.bashrc``)::

   export CUDA_HOME=/usr/local/cuda
   export PATH=$CUDA_HOME/bin:$PATH
   export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

Option 2: Conda CUDA
~~~~~~~~~~~~~~~~~~~~

Install the CUDA toolkit via conda-forge (useful for isolated environments)::

   conda activate lsurf
   conda install -c conda-forge cudatoolkit=12.0

.. warning::

   Conda CUDA requires matching versions between ``cudatoolkit`` and your
   NVIDIA driver. Check compatibility in the `CUDA Toolkit Release Notes
   <https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/>`_.

Verifying GPU Setup
~~~~~~~~~~~~~~~~~~~

After installation, verify that GPU acceleration works:

.. code-block:: python

   import numpy as np
   from numba import cuda

   # Check CUDA availability
   print(f"CUDA available: {cuda.is_available()}")

   if cuda.is_available():
       gpu = cuda.get_current_device()
       print(f"GPU: {gpu.name}")
       print(f"Compute Capability: {gpu.compute_capability}")
       print(f"Total Memory: {gpu.total_memory / 1e9:.1f} GB")

       # Test a simple kernel that doubles every element
       @cuda.jit
       def test_kernel(arr):
           i = cuda.grid(1)
           if i < arr.size:
               arr[i] *= 2

       test_arr = cuda.to_device(np.ones(1000, dtype=np.float32))
       test_kernel[10, 100](test_arr)
       print("GPU kernel test: PASSED")
   else:
       print("GPU not available - will use CPU fallback")
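The ``test_kernel[10, 100]`` launch in the verification snippet above means
10 blocks of 100 threads, enough to cover the 1,000-element array. For an
arbitrary element count, the block count is the element count divided by the
threads per block, rounded up. A minimal sketch; ``launch_config`` is an
illustrative helper, not part of the L-SURF API, and 256 threads per block is
just a common default, not a tuned value:

```python
import math

def launch_config(num_elements, threads_per_block=256):
    """Return a (blocks, threads) 1D launch configuration that
    covers num_elements, rounding the block count up."""
    blocks = math.ceil(num_elements / threads_per_block)
    return blocks, threads_per_block

# 1,000 elements at 100 threads/block needs 10 blocks,
# matching the test_kernel[10, 100] launch above.
print(launch_config(1000, threads_per_block=100))  # (10, 100)
print(launch_config(100_000))                      # (391, 256)
```

Rounding up means the last block may have idle threads, which is why the
kernel guards with ``if i < arr.size``.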
Run a Quick Benchmark
~~~~~~~~~~~~~~~~~~~~~

Test GPU performance with a simple ray tracing simulation:

.. code-block:: python

   import time

   import lsurf as sr

   # Create a simple surface and source
   surface = sr.create_planar_surface(
       point=(0, 0, 0),
       normal=(0, 0, 1),
   )
   source = sr.CollimatedBeam(
       center=(0, 0, 1),
       direction=(0, 0, -1),
       radius=0.1,
       num_rays=100000,
       wavelength=532e-9,
   )
   rays = source.generate()

   # Time the intersection calculation
   start = time.perf_counter()
   distances, hit_mask = surface.intersect(rays.positions, rays.directions)
   elapsed = time.perf_counter() - start

   print(f"Intersected {rays.num_rays:,} rays in {elapsed*1000:.1f} ms")
   print(f"Throughput: {rays.num_rays/elapsed/1e6:.1f} million rays/second")

Typical performance:

* **GPU (RTX 3080)**: ~500 million rays/second
* **CPU (8-core)**: ~5 million rays/second

Troubleshooting
---------------

CUDA Not Available
~~~~~~~~~~~~~~~~~~

If ``cuda.is_available()`` returns ``False``:

1. **Check the NVIDIA driver**::

      nvidia-smi

   If this fails, install or reinstall the NVIDIA driver.

2. **Check the CUDA toolkit**::

      nvcc --version

   If this fails, install the CUDA toolkit or set ``CUDA_HOME``.

3. **Verify GPU compute capability**: your GPU must have Compute Capability
   >= 3.5. Check yours at `CUDA GPUs <https://developer.nvidia.com/cuda-gpus>`_.

4. **Reinstall numba**::

      pip install --upgrade --force-reinstall numba

Conda Package Not Found
~~~~~~~~~~~~~~~~~~~~~~~

If you see ``PackagesNotFoundError``::

   # Update conda
   conda update -n base conda

   # Clear the cache and retry
   conda clean --all
   conda env create -f environment.yml

Out of GPU Memory
~~~~~~~~~~~~~~~~~

If you see ``CudaAPIError: Out of memory``:

1. Reduce ``num_rays`` in your simulation
2. Use batched processing for large simulations
3. Close other GPU applications
4. Check memory usage with ``nvidia-smi``

.. code-block:: python

   # Process rays in batches instead of all at once
   batch_size = 100000
   for i in range(0, total_rays, batch_size):
       batch = rays[i:i+batch_size]
       # Process batch...
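The batching loop above can be sketched as a standalone generator.
``iter_batches`` is a hypothetical helper (not part of the L-SURF API), shown
here slicing a plain NumPy array standing in for ray positions:

```python
import numpy as np

def iter_batches(array, batch_size):
    """Yield successive slices of at most batch_size rows each."""
    for start in range(0, len(array), batch_size):
        yield array[start:start + batch_size]

# 250,000 ray positions split into 100,000-row batches;
# the final batch holds the 50,000-row remainder.
positions = np.zeros((250_000, 3), dtype=np.float32)
batch_sizes = [len(b) for b in iter_batches(positions, 100_000)]
print(batch_sizes)  # [100000, 100000, 50000]
```

Because the slices are NumPy views, no extra host memory is copied; only one
batch at a time needs to fit on the GPU.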
Slow Performance
~~~~~~~~~~~~~~~~

If simulations are slower than expected:

1. **Verify the GPU is being used**:

   .. code-block:: python

      from numba import cuda
      print(cuda.is_available())  # Should be True

2. **Monitor GPU utilization** during the simulation::

      watch -n 0.5 nvidia-smi

3. **Increase the ray count** - GPUs perform better with more parallel work::

      # Too few rays - GPU underutilized
      source = sr.CollimatedBeam(num_rays=1000)    # Bad

      # Better GPU utilization
      source = sr.CollimatedBeam(num_rays=100000)  # Good

4. **Check for Numba warnings** about suboptimal grid sizes.

Import Errors
~~~~~~~~~~~~~

If you get ``ModuleNotFoundError: No module named 'lsurf'``::

   # Verify the installation
   pip show lsurf

   # Reinstall if needed
   pip install -e ".[dev]"

Platform-Specific Notes
-----------------------

Windows
~~~~~~~

* Install NVIDIA drivers from `NVIDIA Driver Downloads
  <https://www.nvidia.com/download/index.aspx>`_
* Install the CUDA Toolkit from `CUDA Downloads
  <https://developer.nvidia.com/cuda-downloads>`_
* Use Anaconda/Miniconda for Python environment management

macOS
~~~~~

* CUDA is **not supported** on macOS (Apple Silicon or Intel)
* L-SURF will use CPU-only mode automatically
* Performance will be limited compared to NVIDIA GPU systems

WSL2 (Windows Subsystem for Linux)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GPU passthrough works with WSL2:

1. Install the latest NVIDIA Windows driver (it includes WSL2 GPU support)
2. **Do not** install a separate NVIDIA driver inside WSL2 - it uses the
   Windows driver
3. Install L-SURF normally inside WSL2

::

   # Inside WSL2
   nvidia-smi   # Should show your Windows GPU
   conda env create -f environment.yml
   conda activate lsurf
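On any of these platforms, a script can report which mode it will run in
before doing real work. A minimal defensive sketch; the ``ImportError`` guard
covers environments where numba is absent entirely (e.g. a CPU-only macOS
install), and ``cuda.is_available()`` handles installed-but-no-GPU cases:

```python
def gpu_available():
    """Return True if Numba can see a usable CUDA GPU, else False."""
    try:
        from numba import cuda
        return bool(cuda.is_available())
    except ImportError:
        # numba not installed (e.g. CPU-only macOS environment)
        return False

mode = "GPU" if gpu_available() else "CPU fallback"
print(f"L-SURF will run in {mode} mode")
```

This check never raises, so it is safe to run unconditionally at startup on
Windows, macOS, Linux, or WSL2.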