https://github.com/Jong-Min-Moon/FShybridPLS

FSHybridPLS is an R package designed to perform Partial Least Squares (PLS) regression on “hybrid” predictors. A hybrid predictor is a single mathematical object containing both functional data (curves, time-series represented as fd objects) and scalar covariates (standard numeric matrices).

This package defines a joint Hilbert space $\mathcal{H} = \mathcal{F} \times \mathbb{R}^p$ and implements the arithmetic and algorithms necessary to perform penalized PLS directly on this space. This package provides an R implementation and simulation replications of my paper, “Hybrid Partial Least Squares Regression with Multiple Functional and Scalar Predictors,” co-authored with Professor Jeong Hoon Jang of the University of Texas Medical Branch.

Installation

This package is generated using litr, a literate programming tool developed by Jacob Bien and Patrick Vossler (https://jacobbien.github.io/litr-project/). I used litr to build the package and you don’t have to install litr just to use the package. To install the package:

devtools::install("your_downloaded_directory/FSHybridPLS")

Table of Contents

Helper Functions

compute_gram_matrix

Description: Calculates the Gram matrix for a given functional basis. The Gram matrix $G$ is a symmetric positive semi-definite matrix containing the inner products of the basis functions. Specifically, the entry $(i, j)$ corresponds to the integral of the product of the $i$-th and $j$-th basis functions over the domain: \(G_{ij} = \langle b_i, b_j \rangle = \int b_i(t) b_j(t) \, dt\)

Inputs:

Output:

#' Compute the Gram matrix for a basis object
#'
#' This function calculates the inner product of a basis with itself to form
#' the Gram matrix.
#'
#' @param basis A basis object of class `basisfd` from the `fda` package.
#' @return A square matrix where entry (i,j) is the inner product of basis functions i and j.
#' @export
compute_gram_matrix <- function(basis) {
  # Ensure the input is a valid basis object from the fda package
  stopifnot(inherits(basis, "basisfd"))
  
  # Compute the inner product of the basis with itself
  # This calculates the integral of b_i(t) * b_j(t) using the metric defined on the basis
  fda::inprod(basis, basis) 
}

is_same_basis

Description: Checks if two composite functional data objects (represented as lists containing functional data) share the exact same basis functions. It verifies that both lists have the same length and that the basis for every corresponding element is identical.

Inputs:

Output:

#' Check if two functional data lists share the same basis
#'
#' @description
#' Checks if two composite functional data objects share the exact same basis functions
#' by comparing their internal list structures.
#'
#' @param input A list containing functional data objects (expected to have a `$functional_list` structure).
#' @param other Another list containing functional data objects to compare against.
#' @return `TRUE` if both lists have the same length and every corresponding basis is equivalent; otherwise `FALSE`.
#' @export
is_same_basis <- function(input, other) {
  all(
    # Check if both lists have the same number of functional elements
    length(input$functional_list) == length(other$functional_list),
    
    # Check if the basis for each corresponding element is identical
    all(
      mapply(function(fd1, fd2) {
        # Use fda package function to check basis equality
        fda::is.eqbasis(fd1$basis, fd2$basis)
      }, input$functional_list, other$functional_list)
    )
  )
}

are_all_gram_matrices_identical

Description: Iterates through a list of matrices to verify if they are all numerically identical to the first matrix in the list. This function handles floating-point inaccuracies by using a tolerance parameter.

Inputs:

Output:

#' Check if all matrices in a list are identical (with tolerance for numeric comparison)
#'
#' This function iterates through a list of matrices and checks if all subsequent
#' matrices are numerically identical to the first matrix in the list.
#' It uses `all.equal` for robust comparison of numeric matrices, which accounts
#' for small floating-point differences.
#'
#' @param gram_list A list of matrices.
#' @param tolerance A numeric tolerance for comparing floating-point numbers.
#'                  Defaults to `sqrt(.Machine$double.eps)`.
#' @return A logical value: `TRUE` if all matrices are identical, `FALSE` otherwise.
#' @export
are_all_gram_matrices_identical <- function(gram_list, tolerance = sqrt(.Machine$double.eps)) {
  # Handle edge cases: empty list or list with a single matrix
  if (length(gram_list) == 0) {
    message("The input list is empty. Returning TRUE as there are no differences.")
    return(TRUE)
  }
  if (length(gram_list) == 1) {
    message("The input list contains only one matrix. Returning TRUE.")
    return(TRUE)
  }
  
  # Get the first matrix as the reference
  first_matrix <- gram_list[[1]]
  
  # Ensure the first element is a matrix
  if (!is.matrix(first_matrix)) {
    stop("The first element of 'gram_list' is not a matrix.")
  }
  
  # Iterate from the second matrix onwards and compare with the first
  for (i in 2:length(gram_list)) {
    current_matrix <- gram_list[[i]]
    
    # Ensure current element is a matrix
    if (!is.matrix(current_matrix)) {
      stop(paste("Element", i, "of 'gram_list' is not a matrix."))
    }
    
    # Compare the current matrix with the first matrix
    # all.equal returns TRUE if identical, or a character string describing differences
    # We convert the result to a logical TRUE/FALSE
    if (!isTRUE(all.equal(first_matrix, current_matrix, tolerance = tolerance))) {
      # If not equal, return FALSE immediately
      message(paste("Matrices at index 1 and", i, "are not identical."))
      return(FALSE)
    }
  }
  
  # If the loop completes, all matrices are identical
  return(TRUE)
}

rep_fd

Description: Takes a list of functional data objects (where each object represents a single sample/curve) and broadcasts them by replicating their coefficients. This creates new functional objects containing $n$ identical copies of the original curve.

Inputs:

Output:

#' Replicate a list of single-sample fd objects multiple times
#'
#' This function broadcasts a list of single-sample functional predictors
#' by replicating their coefficient columns.
#'
#' @param fd_list A list of `fd` objects, each representing a single sample.
#' @param n The number of replications desired.
#'
#' @return A list of `fd` objects, each with `n` replications.
#' @export
rep_fd <- function(fd_list, n) {
  # Validate input: must be a list of 'fd' objects
  if (!is.list(fd_list) || any(!vapply(fd_list, inherits, logical(1), "fd"))) {
    stop("Input must be a list of 'fd' objects.")
  }
  
  lapply(fd_list, function(fd_obj) {
    coef_mat <- fd_obj$coefs
    
    # Ensure the object represents a single sample (one column of coefficients)
    if (is.null(dim(coef_mat)) || ncol(coef_mat) != 1) {
      stop("Each fd object in the list must have one column of coefficients.")
    }
    
    # Create a new coefficient matrix by repeating the column 'n' times
    new_coefs <- matrix(rep(coef_mat, n), nrow = nrow(coef_mat), ncol = n)
    
    # Return a new fd object with the replicated coefficients and original basis
    fd(coef = new_coefs, basisobj = fd_obj$basis)
  })
}

Hybrid predictor class

Class definition and constructor

predictor_hybrid

Description: The predictor_hybrid class represents a hybrid random object $\mathbf{W} = (X, \mathbf{Z})$ that combines functional and scalar covariates into a unified Hilbert space structure.

Inputs:

Output:

```{r Create a predictor_hybrid object} #’ Create a predictor_hybrid object #’ #’ @param Z Numeric matrix (n_sample x n_scalar). #’ @param functional_list List of ‘fd’ objects. #’ @param eval_point Evaluation points. #’ @param penalty_order Integer. Derivative order for roughness penalty (default 2). #’ #’ @export predictor_hybrid <- function(Z, functional_list, eval_point, penalty_order = 2) { stopifnot(is.matrix(Z), is.numeric(Z), is.list(functional_list))

n_sample <- nrow(Z) n_functional <- length(functional_list)

gram_list <- vector(“list”, n_functional) gram_deriv_list <- vector(“list”, n_functional) n_basis_list <- numeric(n_functional)

for (i in seq_len(n_functional)) { fd_i <- functional_list[[i]] basis_i <- fd_i$basis

# 1. Validation
if (ncol(fd_i$coefs) != n_sample) {
  stop(sprintf("Functional predictor %d sample count mismatch.", i))
}

# 2. Pre-compute Gram Matrix (Inner product of basis functions)
# This is the "J" matrix: J_ij = <phi_i, phi_j>
gram_list[[i]] <- fda::eval.penalty(basis_i, Lfdobj = 0)

# 3. Pre-compute Penalty Matrix
# This is the "R" matrix: R_ij = <D^2 phi_i, D^2 phi_j>
gram_deriv_list[[i]] <- fda::eval.penalty(basis_i, Lfdobj = penalty_order)

n_basis_list[i] <- basis_i$nbasis   }

structure( list( Z = Z, functional_list = functional_list, gram_list = gram_list, # J matrices gram_deriv_list = gram_deriv_list, # R matrices eval_point = eval_point, n_basis_list = n_basis_list, n_sample = n_sample, n_functional = n_functional, n_scalar = ncol(Z) ), class = “predictor_hybrid” ) }




### predictor_hybrid_from_coef

**Description:**
An alternative constructor that creates a single-sample `predictor_hybrid` object from a flat numeric coefficient vector. It reconstructs both the functional and scalar components by mapping the coefficients back onto the basis functions defined in a template `format` object.

**Important:** This function strictly assumes that all functional predictors in the `format` object share the same number of basis functions (). It verifies this assumption before proceeding.

**Inputs:**

- `format`: A `predictor_hybrid` object serving as a template. It must contain the necessary basis information (`functional_list`) and structural metadata (`n_scalar`, `n_functional`).
- `coef`: A numeric vector of length , where:
-  is the number of functional predictors.
-  is the number of basis functions (strictly enforced to be identical across all functional predictors).
-  is the number of scalar predictors.



**Output:**

- Returns a `predictor_hybrid` object representing a single sample (`n_sample = 1`), containing:
- Functional predictors rebuilt from the first  coefficients.
- A scalar predictor matrix `Z` derived from the remaining  coefficients.

```{r}
#' Construct a Single-Sample Predictor Hybrid Object from Coefficients
#'
#' Reconstructs a \code{predictor_hybrid} object representing one observation, using a numeric
#' coefficient vector. This alternative constructor maps the coefficients back into their functional
#' and scalar predictor representations based on the structure of a template \code{predictor_hybrid} object.
#'
#' @param format A \code{predictor_hybrid} object that provides the structure and basis information.
#' @param coef A numeric vector containing coefficients for both functional and scalar predictors.
#'
#' @return A \code{predictor_hybrid} object with updated \code{functional_list}, \code{Z}, and \code{n_sample = 1}.
#' @export
predictor_hybrid_from_coef <- function(format, coef) {
  # Extract metadata from the template
  basis_counts <- format$n_basis_list
  
  # Check if all functional predictors have the same number of basis functions
  if (length(unique(basis_counts)) != 1) {
    stop("All functional predictors must have the same number of basis functions (M).")
  }
  
  M <- basis_counts[1]  # Number of basis functions (identical across predictors)
  K <- format$n_functional # Number of functional predictors
  
  # Reconstruct each functional predictor
  for (ii in 1:K) {
    # Calculate indices for the current functional predictor in the flat vector
    start_idx <- (ii - 1) * M + 1
    end_idx   <- ii * M
    
    # Create a new fd object using the sliced coefficients and original basis
    format$functional_list[[ii]] <- fd(
      coef = as.matrix(coef[start_idx:end_idx]),
      basisobj = format$functional_list[[ii]]$basis
    )
  }
  
  # Reconstruct the scalar predictor part (Z)
  # The remaining coefficients after K*M belong to the scalar predictors
  scalar_start_idx <- K * M + 1
  format$Z <- t(as.matrix(coef[scalar_start_idx:length(coef)]))
  
  # Update sample count to 1 as this constructor creates a single observation
  format$n_sample <- 1
  
  return(format)
}

Basic arithmetic

add.predictor_hybrid

Description: Performs element-wise addition of two predictor_hybrid objects. Functional predictors are combined using plus.fd() and times.fd() from the fda package.

Usage:

add.predictor_hybrid(input, other, alpha = 1)

Arguments:

Value:

Details:

#' Add two predictor_hybrid objects
#'
#' Performs element-wise addition of two \code{predictor_hybrid} objects. 
#' Supports broadcasting if one object is a single sample.
#'
#' @param xi_1 A \code{predictor_hybrid} object.
#' @param xi_2 Another \code{predictor_hybrid} object to be added.
#' @param alpha A scalar multiplier applied to \code{xi_2} before addition (default is 1).
#'
#' @return A new \code{predictor_hybrid} object representing the result of the addition.
#' @export
add.predictor_hybrid <- function(xi_1, xi_2, alpha = 1) {
  # Safe access to is.eqbasis() from the fda namespace
  is_eqbasis <- getFromNamespace("is.eqbasis", "fda")
  
  # Type checks
  if (!inherits(xi_1, "predictor_hybrid") || !inherits(xi_2, "predictor_hybrid")) {
    stop("Both inputs must be of class 'predictor_hybrid'.")
  }
  
  # Structural checks: Number of functional and scalar predictors must match
  if (xi_1$n_functional != xi_2$n_functional) {
    stop("Mismatch in number of functional predictors.")
  }
  if (xi_1$n_scalar != xi_2$n_scalar) {
    stop("Mismatch in number of scalar predictors.")
  }
  
  # Basis checks: Ensure corresponding functional predictors share the same basis
  for (i in seq_len(xi_1$n_functional)) {
    if (!is_eqbasis(xi_1$functional_list[[i]]$basis, xi_2$functional_list[[i]]$basis)) {
      stop("Functional predictors must have the same basis.")
    }
  }
  
  # Broadcasting Logic:
  # Ensure xi_1 is always the larger object (or equal) to simplify broadcasting
  if (xi_1$n_sample == 1 && xi_2$n_sample > 1) {
    tmp <- xi_1
    xi_1 <- xi_2
    xi_2 <- tmp
  }
  
  n1 <- xi_1$n_sample
  n2 <- xi_2$n_sample
  
  # Validate sample compatibility
  if (!(n1 == n2 || n2 == 1)) {
    stop("Sample sizes are incompatible for broadcasting.")
  }
  
  # Prepare components
  f1 <- xi_1$functional_list
  f2 <- xi_2$functional_list
  Z1 <- xi_1$Z
  Z2 <- xi_2$Z
  
  # Replicate fd and Z if broadcasting is needed (n2 == 1)
  if (n2 == 1 && n1 > 1) {
    f2 <- rep_fd(f2, n1)
    Z2 <- matrix(rep(c(Z2), n1), nrow = n1, byrow = TRUE)
  }
  
  # Combine functional predictors: f1 + alpha * f2
  new_functional_list <- Map(
    function(fd1, fd2) plus.fd(fd1, times.fd(alpha, fd2)),
    f1,
    f2
  )
  
  # Combine scalar predictors: Z1 + alpha * Z2
  new_Z <- Z1 + alpha * Z2
  
  # Construct and return the new predictor_hybrid object
  # Note: eval_point is carried over from the first object
  predictor_hybrid(Z = new_Z, functional_list = new_functional_list, eval_point = xi_1$eval_point)
}

subtr.predictor_hybrid

Description: Performs element-wise subtraction of two predictor_hybrid objects. Internally uses add.predictor_hybrid to compute the result by negating the second operand.

Usage:

subtr.predictor_hybrid(input, other, alpha = 1)

Arguments:

Value:

Details:

#' Subtract two predictor_hybrid objects
#'
#' Performs element-wise subtraction of two `predictor_hybrid` objects.
#' Internally uses `add.predictor_hybrid(input, other, alpha = -1)`.
#'
#' @param input A `predictor_hybrid` object.
#' @param other Another `predictor_hybrid` object to subtract.
#' @param alpha A scalar multiplier applied to `other` before subtraction (default is 1).
#'
#' @return A new `predictor_hybrid` object representing the result of subtraction.
#' @export
subtr.predictor_hybrid <- function(input, other, alpha = 1) {
  
  add.predictor_hybrid(input, other, alpha = -alpha)
}

scalar_mul.predictor_hybrid

Description: Multiplies all components of a predictor_hybrid object by a scalar. Functional components are scaled using times.fd() from the fda package, and scalar predictors are multiplied directly using matrix operations.

Usage:

scalar_mul.predictor_hybrid(input, scalar)

Arguments:

Value:

Details:

#' Multiply a predictor_hybrid object by a scalar
#'
#' Performs scalar multiplication on both the scalar and functional components
#' of a `predictor_hybrid` object. Functional predictors are scaled using
#' `times.fd()` from the `fda` package.
#'
#' @param input A `predictor_hybrid` object.
#' @param scalar A numeric value to multiply all components by.
#'
#' @return A new `predictor_hybrid` object scaled by `scalar`.
#' @export
scalar_mul.predictor_hybrid <- function(input, scalar) {
  if (!inherits(input, "predictor_hybrid")) {
    stop("Input must be of class 'predictor_hybrid'.")
  }
  if (!is.numeric(scalar) || length(scalar) != 1) {
    stop("Scalar must be a single numeric value.")
  }

  # Scale functional components
  new_functional_list <- lapply(input$functional_list, function(fd_obj) {
    times.fd(scalar, fd_obj)
  })

  # Scale scalar predictors
  new_Z <- scalar * input$Z

  # Reconstruct hybrid object
  # Passing eval_point from the original input to maintain metadata
  predictor_hybrid(
    Z = new_Z,
    functional_list = new_functional_list,
    eval_point = input$eval_point
  )
}

inprod.predictor_hybrid

Description: Computes the inner product between two predictor_hybrid objects. The inner product in the hybrid space is defined as the sum of the inner products of the functional components in and the scalar components in :

Inputs:

Output:

Details:

inprod_pen.predictor_hybrid

Description: Computes the penalized inner product between two predictor_hybrid objects. This adds a roughness penalty term to the standard inner product, typically used for regularization.

Inputs:

Output:

subset_predictor_hybrid

Description: Extracts a specific observation from a multi-sample predictor_hybrid object and returns it as a new, single-sample predictor_hybrid object.

Inputs:

Output:

#' Extract a single observation from a predictor_hybrid object
#'
#' @param W A predictor_hybrid object with multiple samples.
#' @param i Integer index of the sample to extract.
#' @return A single-sample predictor_hybrid object.
#' @export
subset_predictor_hybrid <- function(W, i) {
  # Extract the i-th row of the scalar matrix
  new_Z <- matrix(W$Z[i, ], nrow = 1)
  
  # Extract the i-th column of coefficients for each functional object
  new_functional_list <- lapply(W$functional_list, function(fdobj) {
    fd(coef = matrix(coef(fdobj)[, i], ncol = 1), basisobj = fdobj$basis)
  })
  
  # Construct new object
  new_predictor <- predictor_hybrid(
    Z = new_Z, 
    functional_list = new_functional_list,
    eval_point = W$eval_point
  )
  
  return(new_predictor)
}

replace_obs_hybrid

Description: Replaces a specific observation within a predictor_hybrid object with a new single-sample hybrid object. This effectively updates the -th row of the scalar matrix and the -th column of the functional coefficients.

Inputs:

Output:

#' Replace a single observation in a predictor_hybrid object
#'
#' Replaces the i-th observation of a predictor_hybrid object with a new
#' single-sample predictor_hybrid object.
#'
#' @param W A predictor_hybrid object with one or more samples.
#' @param i An integer index specifying the observation to replace.
#' @param new_W A single-sample predictor_hybrid object to use for replacement.
#'
#' @return A predictor_hybrid object with the i-th observation replaced.
#' @export
replace_obs_hybrid <- function(W, i, new_W) {
  # Input validation
  if (!inherits(W, "predictor_hybrid") || !inherits(new_W, "predictor_hybrid")) {
    stop("Both W and new_W must be of class 'predictor_hybrid'.")
  }
  if (new_W$n_sample != 1) {
    stop("new_W must be a single-sample predictor_hybrid object.")
  }
  if (i < 1 || i > W$n_sample) {
    stop(paste("Index i must be between 1 and", W$n_sample))
  }
  if (W$n_scalar != new_W$n_scalar) {
    stop("Mismatch in number of scalar predictors.")
  }
  if (W$n_functional != new_W$n_functional) {
    stop("Mismatch in number of functional predictors.")
  }

  # Check for compatible basis objects
  is_eqbasis <- getFromNamespace("is.eqbasis", "fda")
  for (j in seq_len(W$n_functional)) {
    if (!is_eqbasis(W$functional_list[[j]]$basis, new_W$functional_list[[j]]$basis)) {
      stop("Functional predictors must have the same basis objects.")
    }
  }

  # Replace the i-th observation in the scalar matrix Z
  W$Z[i, ] <- new_W$Z[1, ]

  # Replace the i-th observation in each functional predictor
  for (j in seq_along(W$functional_list)) {
    # The coefficients are stored as a matrix, with columns corresponding to samples
    W$functional_list[[j]]$coefs[, i] <- new_W$functional_list[[j]]$coefs[, 1]
  }

  return(W)
}

add_broadcast (Matrix)

Description: A function that performs row-wise addition of a vector to a matrix. It broadcasts a single observation (vector) across all rows of the target matrix, verifying that the input is a matrix and the operand is a single observation.

Inputs:

Output:

#' Broadcast Addition for Matrices
#'
#' Adds a single observation (vector) to every row of a matrix.
#' Checks that the input is a matrix and the operand is a single observation.
#'
#' @param input A numeric matrix.
#' @param other A numeric vector or a 1-row matrix.
#' @param alpha A scalar multiplier (default 1).
#' @return A matrix with the broadcasted addition applied.
#' @export
add_broadcast <- function(input, other, alpha = 1) {
  
  # 1. Type Check: Ensure input is a matrix
  if (!is.matrix(input)) {
    stop("Argument 'input' must be a matrix.")
  }

  # 2. Structural Check: Ensure RHS is a single observation.
  # This uses length() for vectors, and dim()[1] for matrices.
  is_matrix_other <- is.matrix(other)
  if (is_matrix_other && dim(other)[1] > 1) {
    stop("Argument 'other' must be a vector or a single-row matrix.")
  }

  # 3. Conversion: Convert a single-row matrix into a vector for broadcasting.
  # If 'other' is already a vector, this does nothing.
  if (is_matrix_other) {
    other <- as.vector(other)
  }
  
  # 4. Dimension Check: Ensure vector length matches matrix columns
  if (length(other) != ncol(input)) {
    stop("Length of 'other' must match number of columns in 'input'.")
  }

  # 5. Native Arithmetic: Use simple R arithmetic for high performance.
  # R broadcasts the vector 'other' across the rows of the matrix 'input'.
  # Transposing twice (t(t(...) ...)) ensures correct column-wise recycling 
  # logic is applied to rows.
  return(t(t(input) + alpha * other))
}

subtr_broadcast (Matrix)

Description: A function that performs row-wise subtraction of a vector from a matrix. It is a wrapper around add_broadcast that applies a negative multiplier.

Inputs:

Output:

#' Broadcast Subtraction for Matrices
#'
#' Subtracts a single observation (vector) from every row of a matrix.
#'
#' @param input A numeric matrix.
#' @param other A numeric vector or a 1-row matrix.
#' @param alpha A scalar multiplier (default 1).
#' @return A matrix with the broadcasted subtraction applied.
#' @export
subtr_broadcast <- function(input, other, alpha = 1) {
  # Reuse add_broadcast with negated alpha
  add_broadcast(input, other, (-1 * alpha))
}

One iteration

small functions

get_gram_matrix_block

Description: Constructs the block-diagonal Gram matrix for a hybrid predictor object. This matrix represents the inner product structure of the combined functional and scalar space. It is defined as:

where:

Inputs:

Output:

get_smoothing_param_hybrid

Description: Constructs the block-diagonal smoothing parameter matrix . This matrix defines the regularization penalties applied to the coefficients. It is defined as:

where:

Inputs:

Output:

PLS component computation

get_xi_hat_linear_pen

Description: Computes the penalized PLS weight vector (direction) , represented as a predictor_hybrid object. This function solves for the coefficients that maximize the covariance between the predictor and the response , subject to a roughness penalty.

The optimization problem leads to a system of linear equations for each functional component :

where:

For the scalar part, the weights are simply the inner products . The final coefficient vector is normalized by the -norm (defined by the penalized inner product structure).

Inputs:

Output:

#' Compute Penalized PLS Weight Vector (Linear)
#'
#' Solves for the PLS weight direction xi that maximizes the covariance with y,
#' subject to roughness penalties on the functional components.
#'
#' @param W A `predictor_hybrid` object.
#' @param y A numeric vector of response values.
#' @param lambda A numeric vector of smoothing parameters.
#'
#' @return A single-sample `predictor_hybrid` object representing the weight direction xi.
#' @export
get_xi_hat_linear_pen <- function(W, y, lambda) {
  
  n <- W$n_sample
  K <- W$n_functional
  u <- gamma <- list()

  # --- Scalar Component ---
  # Compute scalar weights: v = Z^T * y
  v <- t(W$Z) %*% y
  
  # Initialize squared norm q with scalar part contribution
  # q = ||xi||^2_pen = sum(v^2) + functional_parts
  q <- sum(v^2)

  # --- Functional Components ---
  for (j in 1:K) {
    # Theta_t: coefficient matrix of the functional data (basis x samples)
    Theta_t <- W$functional_list[[j]]$coefs
    
    # B: Gram matrix for the j-th basis
    B <- W$gram_list[[j]]
    
    # Compute RHS of the linear system: u = B * Theta * y
    # This represents the unpenalized gradient term
    u[[j]] <- B %*% Theta_t %*% y

    # Linear System Matrix: R = Gram + lambda * Penalty
    R <- W$gram_list[[j]] + lambda[j] * W$gram_deriv_list[[j]]
    
    # Solve for gamma_j: (Gram + Penalty) * gamma = Gram * Theta * y
    gamma[[j]] <- solve(R, u[[j]])

    # Update squared norm q:
    # Add contribution gamma^T * R * gamma (penalized norm of functional weight)
    q <- q + sum(gamma[[j]] * (R %*% gamma[[j]]))
  }
  
  # --- Combine and Normalize ---
  # Flatten gamma list and append scalar weights v
  d_vec <- c(do.call(c, gamma), v)
  
  # Normalize the vector to have unit penalized norm
  d_vec <- d_vec / sqrt(q)
  
  # Reconstruct the result as a predictor_hybrid object
  xi_hat <- predictor_hybrid_from_coef(format = W, coef = d_vec)
  
  return(xi_hat)
}

get_rho

Description: Computes the vector of PLS scores by projecting the hybrid predictor data onto the weight direction vector (which represents ).

The projection is computed as:

where:

Inputs:

Output:

#' Exact Hybrid Inner Product
#' Computes <W, xi> using basis inner product matrices.
#' @param W The hybrid predictor object (contains coefs for N samples)
#' @param xi The hybrid weight object (contains coefs for 1 direction)
#' @return A vector of scores (length N)
get_rho <- function(W, xi) {
  
  # 1. Scalar Part (Standard dot product)
  # (N x p_scalar) %*% (p_scalar x 1)
  scores <- as.matrix(W$Z) %*% as.matrix(xi$Z)
  
  # 2. Functional Part (Basis Matrix Algebra)
  # Loop through each functional variable
  for(k in seq_along(W$functional_list)) {
    
    # Extract basis and coefficients
    # Coefs_W is (Basis_Dim x N)
    # Coefs_xi is (Basis_Dim x 1)
    fd_W  <- W$functional_list[[k]]
    fd_xi <- xi$functional_list[[k]]
    
    # A. Compute J Matrix (Basis Inner Product)
    # J[i,j] = Integral(phi_i(t) * phi_j(t) dt)
    # Note: computed once per variable, very fast for B-splines
    J <- fda::inprod(fd_W$basis, fd_W$basis)
    
    # B. Compute Integral via Matrix Multiplication
    # Formula: Score = Coefs_W^T * J * Coefs_xi
    # Dimensions: (N x B) * (B x B) * (B x 1) -> (N x 1)
    
    # Optimization: Pre-multiply J * xi to get a "weighted weight"
    weighted_xi <- J %*% fd_xi$coefs 
    
    # Project data onto this weighted weight
    scores <- scores + t(fd_W$coefs) %*% weighted_xi
  }
  
  return(as.vector(scores))
}

Response Residualization

get_nu

Description: Calculates the scalar regression coefficient obtained by regressing the response vector onto the PLS score vector . This minimizes the least squares error .

Inputs:

Output:

residualize_y

Description: Computes the residual of the response vector after removing the component explained by the current PLS score .

Inputs:

Output:

#' Residualize the response vector
#'
#' Subtracts the projection of y onto rho from y.
#'
#' @param y The current response vector.
#' @param rho The current PLS score vector.
#' @param nu The regression coefficient.
#'
#' @return The residualized response vector.
#' @export
residualize_y <- function(y, rho, nu) {
  y_next <- y - nu * rho
  return(y_next)
}

Predictor Residualization

get_delta

Description: Computes the hybrid regression coefficient for regressing the predictor onto the PLS scores . is a single-sample predictor_hybrid object representing the direction in the predictor space that corresponds to .

Inputs:

Output:

residualize_predictor

Description: Computes the residuals of the predictor object by subtracting the component explained by .

Inputs:

Output:

Main algorithm

fit.hybridPLS

Description: Implements the Hybrid Penalized Partial Least Squares (PLS) algorithm. This function iteratively extracts latent components that maximize the covariance between the hybrid predictors and the response, subject to roughness penalties.

Algorithm Steps: For each iteration :

  1. Weight Direction (): Computes the penalized weight vector that maximizes covariance with the current response residual .
  2. Scores (): Projects the current predictor residuals onto the weight direction to obtain PLS scores.
  3. loadings (): Computes the regression coefficients for projecting and onto the scores .
  4. Deflation: Updates the predictor and response by removing the variance explained by the current component (residualization).
  5. Coefficient Update: Computes the “whitened” direction (transforming weights back to the original predictor space) and updates the cumulative regression coefficient .

Inputs:

Output:

#' Fit Hybrid Penalized PLS with Optional Validation
#'
#' Runs the iterative PLS algorithm. If 'validation_data' is provided,
#' it computes and stores the RMSE on the validation set for each component.
#'
#' @param W Initial `predictor_hybrid` object (Training).
#' @param y Numeric vector of response values (Training).
#' @param n_iter Integer, number of components to extract.
#' @param lambda Numeric vector of smoothing parameters.
#' @param validation_data (Optional) A list containing `W_test` (predictor_hybrid) and `y_test`.
#'
#' @return A list containing the model (rho, xi, beta) and optional `validation_rmse`.
#' @export
fit.hybridPLS <- function(W, y, n_iter, lambda, validation_data = NULL) {
  # 1. Initialize storage
  W_now <- rho <- xi <- delta <- nu <- iota <- beta <- list()
  validation_rmse <- numeric(n_iter)
  
  # 2. Initialize current state
  W_now[[1]] <- W
  y_now <- y
  
  # 3. Main PLS Loop
  for (l in 1:n_iter) {
    # --- Step A: Weight Direction (xi) ---
    xi[[l]] <- get_xi_hat_linear_pen(W_now[[l]], y_now, lambda)
    
    # --- Step B: Scores (rho) ---
    rho[[l]] <- inprod.predictor_hybrid(W_now[[l]], xi[[l]])
    
    # --- Step C: Loadings (delta, nu) ---
    delta[[l]] <- get_delta(W_now[[l]], rho[[l]])
    nu[[l]] <- get_nu(y_now, rho[[l]])
    
    # --- Step D: Deflation ---
    W_now[[l + 1]] <- residualize_predictor(W_now[[l]], rho[[l]], delta[[l]])
    y_now <- residualize_y(y_now, rho[[l]], nu[[l]])
    
    # --- Step E: Cumulative Coefficient (Beta) ---
    iota[[l]] <- xi[[l]]
    
    if (l == 1) {
      beta[[l]] <- scalar_mul.predictor_hybrid(iota[[l]], nu[[l]])
    } else {
      for (u in 1:(l - 1)) {
        adjustment <- inprod.predictor_hybrid(delta[[u]], xi[[l]])
        iota[[l]] <- subtr.predictor_hybrid(iota[[l]], iota[[u]], adjustment)
      }
      beta[[l]] <- add.predictor_hybrid(beta[[l - 1]], iota[[l]], nu[[l]])
    }
    
    # --- Step F: Validation (Optional) ---
    if (!is.null(validation_data)) {
      # Predict on test set using the current cumulative beta
      y_pred_test <- inprod.predictor_hybrid(validation_data$W_test, beta[[l]])
      
      # Compute RMSE (standardized scale usually, but here just raw RMSE)
      validation_rmse[l] <- sqrt(mean((validation_data$y_test - y_pred_test)^2))
    }
  }
  
  return(list(
    rho = rho,
    xi = xi,
    beta = beta,
    W = W_now,
    delta = delta,
    nu = nu,
    validation_rmse = if (!is.null(validation_data)) validation_rmse else NULL
  ))
}

Simulation tools

sample splitting

create_idx_train_test

Description: Randomly splits the indices of a dataset into training and testing sets based on a specified ratio. This function ensures that the indices are sorted and returned as a list containing both sets.

Usage:

create_idx_train_test(n_sample, train_ratio)

Arguments:

Value:

#' Split Sample Indices into Train and Test Sets
#'
#' Randomly partitions the indices of a dataset into training and testing sets.
#'
#' @param n_sample Integer. The total number of samples.
#' @param train_ratio Numeric. The proportion of samples for the training set (0 to 1).
#'
#' @return A list with named elements `idx_train` and `idx_test`.
#' @export
create_idx_train_test <- function(n_sample, train_ratio) {
  # Calculate number of training samples
  n_train <- floor(n_sample * train_ratio)
  
  # Sample indices without replacement
  idx_train <- sort(sample(seq_len(n_sample), n_train, replace = FALSE))
  
  # The remaining indices form the test set
  idx_test <- sort(setdiff(seq_len(n_sample), idx_train))
  
  return(list(
    idx_train = idx_train,
    idx_test = idx_test
  ))
}

get_idx_train

Description: A simplified version of the splitting function that returns only the training indices. Useful when the testing set is defined implicitly as the complement or not immediately needed.

Usage:

get_idx_train(n_sample, train_ratio)

Arguments:

Value:

#' Generate Train Indices
#'
#' Randomly generates a sorted vector of indices for a training set.
#'
#' @param n_sample Integer. Number of samples.
#' @param train_ratio Numeric. Ratio of train samples.
#' @return Vector of sorted train indices.
#' @export
get_idx_train <- function(n_sample, train_ratio) {
  n_train <- floor(n_sample * train_ratio)
  idx_train <- sort(sample(seq_len(n_sample), n_train, replace = FALSE))
  return(idx_train)
}

create_idx_kfold

Description: Generates indices for K-fold cross-validation. It relies on the caret package to create balanced folds and organizes them into pairs of training and validation indices for each fold.

Usage:

create_idx_kfold(n_sample, n_fold)

Arguments:

Value:

#' Generate K-Fold Cross-Validation Indices
#'
#' Creates a list of training and validation indices for K-fold cross-validation.
#' Uses `caret::createFolds` to ensure stratified or random splitting.
#'
#' @param n_sample Integer. The total number of samples.
#' @param n_fold Integer. The number of folds.
#'
#' @return A list of lists, where each sub-list contains `idx_train` and `idx_valid`.
#' @export
create_idx_kfold <- function(n_sample, n_fold) {
  # Generate folds using caret (returns validation indices by default)
  idx_list <- caret::createFolds(seq_len(n_sample), k = n_fold, list = TRUE, returnTrain = FALSE)
  
  idx_list_list <- vector("list", length(idx_list))
  
  for (i in seq_along(idx_list)) {
    valid_hyper_index <- idx_list[[i]]
    # Training set is the complement of the validation set
    train_hyper_index <- setdiff(seq_len(n_sample), valid_hyper_index)
    
    idx_list_list[[i]] <- list(
      "idx_train" = train_hyper_index, 
      "idx_valid" = valid_hyper_index
    )
  }
  
  return(idx_list_list)
}

n_sample.fd

Description: Extracts the number of samples (replications) from a functional data object (fd class from the fda package). This implementation robustly checks the dimensions of the coefficient matrix.

Usage:

n_sample.fd(fd_obj)

Arguments:

Value:

#' Get Number of Samples from an fd Object
#'
#' Extracts the number of observations (replications) stored in a functional data object.
#'
#' @param fd_obj An object of class `fd`.
#' @return Integer representing the number of samples.
#' @export
n_sample.fd <- function(fd_obj) {
  if (!inherits(fd_obj, "fd")) {
    stop("Input must be of class 'fd'.")
  }
  # The coefficient matrix has dimensions (nbasis x n_sample)
  return(ncol(fd_obj$coefs))
}

split.all

Description: Splits a predictor_hybrid object and a corresponding response vector into training and testing sets based on a specified ratio. This function ensures that both the scalar matrix and the list of functional predictors are subsetted consistently.

Inputs:

Output:

#' Split Hybrid Predictor and Response Data
#'
#' Randomly partitions the hybrid predictor object and response vector into 
#' training and testing sets based on a given ratio. Handles any number of 
#' functional predictors.
#'
#' @param W_hybrid A `predictor_hybrid` object containing all functional and scalar data.
#' @param response A numeric vector. The response variable corresponding to the samples in W_hybrid.
#' @param train_ratio A numeric scalar between 0 and 1 indicating the proportion of data to use for training.
#'
#' @return A list containing split train and test sets for predictors and response.
#' @export
split.all <- function(W_hybrid, response, train_ratio) {
  # --- Input Validation ---
  if (!inherits(W_hybrid, "predictor_hybrid")) {
    stop("'W_hybrid' must be a predictor_hybrid object.")
  }
  if (!is.vector(response) && !is.factor(response)) {
    stop("'response' must be a vector.")
  }
  if (length(response) != W_hybrid$n_sample) {
    stop("Response length must match the number of samples in W_hybrid.")
  }
  if (!is.numeric(train_ratio) || train_ratio <= 0 || train_ratio >= 1) {
    stop("'train_ratio' must be a numeric value strictly between 0 and 1.")
  }

  N <- W_hybrid$n_sample
  N_train <- floor(N * train_ratio)

  # Generate random indices
  # set.seed() should be called by the user before this function if reproducibility is needed
  train_idx <- sort(sample(seq_len(N), N_train))
  test_idx  <- sort(setdiff(seq_len(N), train_idx))

  # --- Subset Scalar Predictors ---
  Z_train <- W_hybrid$Z[train_idx, , drop = FALSE]
  Z_test  <- W_hybrid$Z[test_idx,  , drop = FALSE]

  # --- Subset Functional Predictors ---
  # Use lapply to handle any number of functional predictors (K)
  # The '[' operator on an fd object subsets the samples (replicates)
  fun_list_train <- lapply(W_hybrid$functional_list, function(fd_obj) fd_obj[train_idx])
  fun_list_test  <- lapply(W_hybrid$functional_list, function(fd_obj) fd_obj[test_idx])

  # --- Reconstruct Hybrid Objects ---
  # We reuse the original evaluation points
  W_train <- predictor_hybrid(Z = Z_train, functional_list = fun_list_train, eval_point = W_hybrid$eval_point)
  W_test  <- predictor_hybrid(Z = Z_test,  functional_list = fun_list_test,  eval_point = W_hybrid$eval_point)

  return(list(
    predictor_train = W_train,
    predictor_test  = W_test,
    response_train  = response[train_idx],
    response_test   = response[test_idx],
    train_idx       = train_idx, # Useful to return indices for reference
    test_idx        = test_idx
  ))
}

normalization

The goal is to perform a two-step standardization: first normalizing within each data type (functional vs. scalar) to account for unit differences, and second normalizing between the two types to balance their variances for the PLS framework.

curve_normalize

Description: Performs pointwise centering and scalar scaling on a functional data object. It subtracts a mean function and divides by a scaling constant (denominator). This implementation manually broadcasts the mean subtraction to ensure compatibility with fda objects where standard minus.fd can sometimes struggle with broadcasting single mean functions across multiple samples.

Usage:

curve_normalize(functional, mean_functional, deno)

Arguments:

Value:

#' Normalize a functional data object
#'
#' Subtracts a mean function and divides by a scalar constant.
#' Manual broadcasting is used to ensure robust subtraction across samples.
#'
#' @param functional An `fd` object to normalize.
#' @param mean_functional An `fd` object representing the mean.
#' @param deno A numeric scalar for scaling (division).
#'
#' @return A normalized `fd` object.
#' @export
curve_normalize <- function(functional, mean_functional, deno) {
  # Replicate the mean coefficients to match the number of samples in 'functional'
  # This creates a matrix of identical mean columns for broadcasting
  mean_coefs_matrix <- coef(mean_functional) %*% matrix(1, ncol = ncol(coef(functional)))
  
  # Subtract mean and create new fd object
  functional_normalized <- fd(
    coef = coef(functional) - mean_coefs_matrix,
    basisobj = functional$basis
  )
  
  # Apply scaling factor
  # times.fd usually handles scalar multiplication well
  functional_normalized <- times.fd(1 / deno, functional_normalized)
  
  return(functional_normalized)
}

curve_normalize_train_test

Description: Standardizes the functional components of hybrid predictors. It centers them to have mean zero and scales them to have unit integrated variance. Crucially, the mean and standard deviation are derived only from the training data to prevent data leakage. These training statistics are then applied to normalize the test set.

Usage:

curve_normalize_train_test(train, test)

Arguments:

Value:

#' Normalize Functional Predictors (Train/Test Split)
#'
#' Standardizes functional predictors to have mean zero and unit integrated variance.
#' Statistics (mean and SD) are computed from the training set and applied to the test set.
#'
#' @param train A `predictor_hybrid` object representing the training set.
#' @param test A `predictor_hybrid` object representing the testing set.
#'
#' @return A list containing:
#'   \item{predictor_train}{The normalized training object.}
#'   \item{predictor_test}{The normalized testing object.}
#'   \item{mean_train}{List of mean functions used for centering.}
#'   \item{deno_train}{Vector of scaling factors used.}
#' @export
curve_normalize_train_test <- function(train, test) {
  stopifnot(
    inherits(train, "predictor_hybrid"),
    inherits(test, "predictor_hybrid"),
    train$n_functional == test$n_functional
  )

  train_normalized <- train
  test_normalized <- test
  deno_train <- numeric(train$n_functional)
  mean_train <- list()

  # Iterate through each functional predictor
  for (k in seq_len(train$n_functional)) {
    # 1. Compute Statistics from TRAINING data
    train_fd <- train$functional_list[[k]]
    
    mean_train[[k]] <- fda::mean.fd(train_fd)
    sd_train <- fda::sd.fd(train_fd)
    
    # Scaling factor is the L2 norm of the standard deviation function
    deno_train[k] <- sqrt(fda::inprod(sd_train, sd_train))

    # Safety check for zero variance
    if (deno_train[k] <= 1e-10) {
      warning(paste("Functional predictor", k, "has near-zero variance. Skipping scaling."))
      deno_train[k] <- 1 
    }

    # 2. Normalize TRAINING set
    train_normalized$functional_list[[k]] <- curve_normalize(
      train_fd, 
      mean_train[[k]], 
      deno_train[k]
    )

    # 3. Normalize TEST set using TRAIN stats
    test_normalized$functional_list[[k]] <- curve_normalize(
      test$functional_list[[k]], 
      mean_train[[k]], 
      deno_train[k]
    )
  }

  return(list(
    predictor_train = train_normalized,
    predictor_test = test_normalized,
    mean_train = mean_train,
    deno_train = deno_train
  ))
}

scalar_normalize

Description: Helper function to normalize a scalar matrix. It subtracts a mean vector and divides by a standard deviation vector.

Usage:

scalar_normalize(scalar_predictor, mean, deno)

Arguments:

Value:

#' Normalize a scalar predictor matrix
#'
#' Centers columns by 'mean' and scales them by 'deno'.
#'
#' @param scalar_predictor Numeric matrix.
#' @param mean Numeric vector of column means.
#' @param deno Numeric vector of column standard deviations.
#' @return Normalized matrix.
#' @export
scalar_normalize <- function(scalar_predictor, mean, deno) {
  # Subtract mean from each row (broadcasted)
  # Uses the helper subtr_broadcast function defined previously
  scalar_predictor_normalized <- subtr_broadcast(scalar_predictor, mean)
  
  # Create a matrix of denominators matching the dimensions of the predictor
  deno_mat <- matrix(rep(deno, each = nrow(scalar_predictor)), 
                     nrow = nrow(scalar_predictor), 
                     ncol = length(deno))
  
  # Element-wise division
  scalar_predictor_normalized <- scalar_predictor_normalized / deno_mat
  return(scalar_predictor_normalized)
}

scalar_normalize_train_test

Description: Standardizes the scalar components () of the hybrid predictors. It ensures each scalar covariate has mean zero and unit variance. As with the functional part, statistics are derived solely from the training set.

Usage:

scalar_normalize_train_test(predictor_train, predictor_test)

Arguments:

Value:

#' Normalize Scalar Predictors (Train/Test Split)
#'
#' Standardizes scalar predictors (Z) to have mean zero and unit variance.
#' Statistics are computed from the training set.
#'
#' @param predictor_train A `predictor_hybrid` object (train).
#' @param predictor_test A `predictor_hybrid` object (test).
#'
#' @return A list containing normalized objects and statistics.
#' @export
scalar_normalize_train_test <- function(predictor_train, predictor_test) {
  stopifnot(
    inherits(predictor_train, "predictor_hybrid"),
    inherits(predictor_test, "predictor_hybrid"),
    predictor_train$n_scalar == predictor_test$n_scalar
  )

  predictor_train_normalized <- predictor_train
  predictor_test_normalized <- predictor_test

  # 1. Compute Statistics from TRAINING data
  mean_train <- colMeans(predictor_train$Z)
  sd_train <- apply(predictor_train$Z, 2, sd)
  
  # Handle zero variance columns
  sd_train[sd_train == 0] <- 1

  # 2. Normalize TRAINING set
  predictor_train_normalized$Z <- scalar_normalize(
    predictor_train$Z,
    mean_train,
    sd_train
  )

  # 3. Normalize TEST set using TRAIN stats
  predictor_test_normalized$Z <- scalar_normalize(
    predictor_test$Z,
    mean_train,
    sd_train
  )

  return(list(
    predictor_train = predictor_train_normalized,
    predictor_test = predictor_test_normalized,
    mean_train = mean_train,
    sd_train = sd_train
  ))
}

btwn_normalize_train_test

Description: Performs the second step of standardization: balancing the variance between functional and scalar predictors. It scales the scalar component by a factor so that the total variance of the scalar part matches the total variance of the functional parts.

The weight is calculated as:

where the numerator is the sum of squared norms of all functional predictors, and the denominator is the sum of squared Euclidean norms of the scalar predictors (calculated on the training set).

Usage:

btwn_normalize_train_test(train, test)

Arguments:

Value:

#' Inter-Modality Normalization (Between Functional and Scalar)
#'
#' Scales the scalar predictors (Z) so that their total variability is comparable
#' to the total variability of the functional predictors. This addresses the 
#' scale invariance issue in PLS by weighting the scalar part.
#'
#' @param train A `predictor_hybrid` object (train).
#' @param test A `predictor_hybrid` object (test).
#'
#' @return A list containing normalized objects and the scaling factor.
#' @export
btwn_normalize_train_test <- function(train, test) {
  stopifnot(inherits(train, "predictor_hybrid"), inherits(test, "predictor_hybrid"))

  train_normalized <- train
  test_normalized <- test
  Z_s <- train$Z

  # --- 1. Calculate Numerator (Total Variance of Functional Parts) ---
  omega_numerator <- 0
  for (k in seq_len(train$n_functional)) {
    # Compute inner product matrix <X_i, X_j>
    # The diagonal elements <X_i, X_i> represent the squared L2 norm ||X_i||^2
    functional_inprod_matrix <- fda::inprod(
      train$functional_list[[k]],
      train$functional_list[[k]]
    )
    omega_numerator <- omega_numerator + sum(diag(functional_inprod_matrix))
  }

  # --- 2. Calculate Denominator (Total Variance of Scalar Part) ---
  omega_denominator <- sum(Z_s^2)

  # --- 3. Compute Weight omega ---
  if (omega_denominator == 0) {
    warning("Scalar component has zero variance. Skipping inter-modality scaling.")
    omega <- 1
  } else {
    omega <- omega_numerator / omega_denominator
  }

  # --- 4. Apply Scaling (sqrt(omega)) ---
  # This effectively weights the scalar inner product by omega
  scaling_factor <- sqrt(omega)
  
  train_normalized$Z <- scaling_factor * train$Z
  test_normalized$Z  <- scaling_factor * test$Z

  return(list(
    predictor_train = train_normalized,
    predictor_test = test_normalized,
    scaling_factor = scaling_factor
  ))
}

split_and_normalize.all

Description: A wrapper function that orchestrates the entire data preprocessing pipeline. It performs the following steps in order:

  1. Splitting: divides predictors and response into training and testing sets.
  2. Within-Normalization (Functional): Standardizes functional predictors (mean 0, unit integrated var).
  3. Within-Normalization (Scalar): Standardizes scalar predictors (mean 0, unit var).
  4. Between-Normalization: Balances variance between functional and scalar components.
  5. Response Normalization: Standardizes the response variable (mean 0, unit var).

Usage:

split_and_normalize.all(W_hybrid, response, train_ratio)

Arguments:

Value:

#' Split and Preprocess Hybrid Data
#'
#' Executes the full preprocessing pipeline: splitting data, standardizing within
#' modalities (functional/scalar), balancing variance between modalities, and 
#' standardizing the response.
#'
#' @param W_hybrid The full `predictor_hybrid` object.
#' @param response The numeric response vector.
#' @param train_ratio Numeric scalar (0-1) for train/test split.
#'
#' @return A list containing processed datasets and normalization parameters.
#' @export
split_and_normalize.all <- function(W_hybrid, response, train_ratio) {
  
  # 1. Split Data
  # Note: Requires the previously defined split.all function
  split_result <- split.all(W_hybrid, response, train_ratio)

  # 2. Normalize Functional Predictors (Within)
  curve_norm <- curve_normalize_train_test(
    split_result$predictor_train, 
    split_result$predictor_test
  )
  
  # 3. Normalize Scalar Predictors (Within)
  scalar_norm <- scalar_normalize_train_test(
    curve_norm$predictor_train, 
    curve_norm$predictor_test
  )
  
  # 4. Normalize Between Modalities (Balance Variances)
  btwn_norm <- btwn_normalize_train_test(
    scalar_norm$predictor_train,
    scalar_norm$predictor_test
  )
  
  # 5. Normalize Response Variable
  response_mean_train <- mean(split_result$response_train)
  response_sd_train   <- sd(split_result$response_train)
  
  # Prevent division by zero if response is constant
  if (response_sd_train == 0) response_sd_train <- 1
  
  response_train_std <- (split_result$response_train - response_mean_train) / response_sd_train
  response_test_std  <- (split_result$response_test  - response_mean_train) / response_sd_train
  
  response_norm <- list(mean_train = response_mean_train, sd_train = response_sd_train)

  # Return comprehensive results
  return(list(
    predictor_train = btwn_norm$predictor_train,
    predictor_test  = btwn_norm$predictor_test,
    response_train  = response_train_std,
    response_test   = response_test_std,
    
    # Store intermediate results and statistics for reference/reconstruction
    details = list(
      curve_normalize_result  = curve_norm,
      scalar_normalize_result = scalar_norm,
      btwn_normalize_result   = btwn_norm,
      response_normalize_result = response_norm,
      split_indices = list(train = split_result$train_idx, test = split_result$test_idx)
    )
  ))
}

Baseline methods

fit_hybrid_pcr_iterative

Description: This function implements a Hybrid Principal Component Regression (PCR) algorithm designed to handle high-dimensional data containing both scalar vectors and functional curves. The core strategy is dimensionality reduction via PCA applied separately to each data modality, followed by a linear regression on the resulting scores.

1. Dimensionality Reduction

The function decouples the dimensionality reduction step for scalar and functional predictors:

2. Iterative Model Fitting

Instead of selecting a single fixed number of components, the function iterates through a range of model complexities ($l = 1, \dots, \text{n_comp_max}$).

At iteration $l$, the regression model is formulated as: \(Y = \beta_0 + \sum_{j=1}^{\min(l, \text{rank}(Z))} \alpha_j \text{Score}_{Z,j} + \sum_{p=1}^{P} \sum_{k=1}^{l} \gamma_{p,k} \text{Score}_{X^{(p)}, k} + \epsilon\)

This allows us to observe the full trajectory of the Root Mean Squared Error (RMSE) as the model complexity increases, facilitating the selection of the optimal number of components.

```{r fit_hybrid_pcr_iterative} #’ Iterative Hybrid Principal Component Regression #’ #’ @description #’ Fits a Principal Component Regression (PCR) model that integrates both scalar and #’ functional predictors. The function performs dimensionality reduction separately #’ for each modality and iteratively fits linear models with an increasing number #’ of principal components. #’ #’ @details #’ The workflow consists of three main stages: #’ \enumerate{ #’ \item \strong{Scalar PCA:} Performs standard PCA on the scalar matrix \code{Z}. #’ Test data is projected onto the training rotation matrix. #’ \item \strong{Functional PCA:} Performs FPCA on each functional predictor using #’ the \code{fda} package. Crucially, test curves are centered using the #’ \emph{training mean function} before projection onto training harmonics #’ to prevent data leakage. #’ \item \strong{Iterative Regression:} Fits a series of linear models (\code{lm}). #’ Model \code{l} includes the first \code{l} principal components from #’ both the scalar and functional sets. #’ } #’ #’ @param W_train A \code{predictor_hybrid} object containing training data. #’ @param y_train A numeric vector of training response values. #’ @param W_test A \code{predictor_hybrid} object containing test data. #’ @param y_test A numeric vector of test response values. #’ @param n_comp_max Integer. The maximum number of principal components to include #’ in the regression models. Defaults to 10. #’ #’ @return A list containing: #’ \item{validation_rmse}{A numeric vector of length \code{n_comp_max} containing #’ the RMSE on the test set for each model complexity level.} #’ #’ @importFrom stats prcomp predict lm #’ @importFrom fda pca.fd fd inprod #’ @export fit_hybrid_pcr_iterative <- function(W_train, y_train, W_test, y_test, n_comp_max = 10) {

validation_rmse <- numeric(n_comp_max)

# 1. Scalar PCA (Train) # Computes rotation matrix on training data only k_z_max <- min(n_comp_max, ncol(W_train$Z)) pca_z <- prcomp(W_train$Z, center = TRUE, scale. = TRUE)

# Extract training scores and project test data onto training rotation all_z_scores_train <- pca_z$x[, 1:k_z_max, drop = FALSE] all_z_scores_test <- predict(pca_z, newdata = W_test$Z)[, 1:k_z_max, drop = FALSE]

# 2. Functional PCA (Train) n_funcs <- length(W_train$functional_list) all_f_scores_train_list <- list() all_f_scores_test_list <- list()

for(i in 1:n_funcs) { # Fit FPCA on Training Data to get harmonics and mean pca_fd <- fda::pca.fd(W_train$functional_list[[i]], nharm = n_comp_max) all_f_scores_train_list[[i]] <- pca_fd$scores

# Project Test Data (Robust Manual Centering)
mean_fd <- pca_fd$meanfd
harmonics <- pca_fd$harmonics

# Extract test coefficients and training mean coefficients
coefs_test <- W_test$functional_list[[i]]$coefs
coefs_mean <- as.vector(mean_fd$coefs)

# Sweep subtract: Test_Centered = Test_Raw - Mean_Train
coefs_centered <- sweep(coefs_test, 1, coefs_mean, "-")
centered_test  <- fd(coefs_centered, W_test$functional_list[[i]]$basis)

# Project centered test data onto training harmonics
all_f_scores_test_list[[i]] <- fda::inprod(centered_test, harmonics)   }

# 3. Iterative Regression Loop for (l in 1:n_comp_max) { # Scalar Scores Subset k_curr <- min(l, k_z_max) z_train_curr <- all_z_scores_train[, 1:k_curr, drop=FALSE] z_test_curr <- all_z_scores_test[, 1:k_curr, drop=FALSE]

# Functional Scores Subset
f_train_curr_list <- list()
f_test_curr_list  <- list()
for(i in 1:n_funcs) {
  f_train_curr_list[[i]] <- all_f_scores_train_list[[i]][, 1:l, drop=FALSE]
  f_test_curr_list[[i]]  <- all_f_scores_test_list[[i]][, 1:l, drop=FALSE]
}

# Construct Design Matrices
df_train <- data.frame(z_train_curr, do.call(cbind, f_train_curr_list))
df_test  <- data.frame(z_test_curr,  do.call(cbind, f_test_curr_list))

# Fit Linear Model
df_train$Y <- y_train
model <- lm(Y ~ ., data = df_train)

# Predict and Compute RMSE
y_pred <- predict(model, newdata = df_test)
validation_rmse[l] <- sqrt(mean((y_test - y_pred)^2))   }

return(list(validation_rmse = validation_rmse)) }


### fit_pfr

**Description:**
The `fit_pfr` function serves as a wrapper for fitting a Penalized Functional Regression (PFR) model (using the `refund` package in R) within a simulation or cross-validation pipeline. It is designed to handle a specific data structure where predictors are split into scalar matrices and functional data objects.

**Key Implementation Details:**

* **Scalar Predictor Unrolling:** A common issue when using `pfr()` or `gam()` is that passing a matrix column (e.g., `df$Z`) can sometimes trigger "length mismatch" errors during prediction. This function addresses this by converting the scalar matrix `Z` into individual columns (`Z1`, `Z2`, etc.) within the data frame.

* **Functional Data Evaluation:** The function assumes inputs are `fd` objects (from the `fda` package). It evaluates these functions at specified time points (`eval_pts`) to create a dense matrix representation required by the `lf()` (linear functional) term in `pfr`.

* **Dynamic Formula Construction:** Since the number of scalar predictors may vary, the regression formula is constructed dynamically as a string before being passed to `pfr`.

* **FPCA Smoothing:** The term `lf(..., presmooth = 'fpca.sc')` indicates that the functional predictors are pre-smoothed using Functional Principal Component Analysis (FPCA). This allows the model to handle noisy or irregular functional data effectively by regressing on the principal component scores.
 

```{r}
#' Fit Penalized Functional Regression (PFR)
#'
#' This function formats training and testing data, fits a PFR model using the `refund` package,
#' and evaluates predictive performance via RMSE. It explicitly handles the "unrolling" of
#' scalar matrix predictors to avoid common dimension errors in `gam`/`pfr`.
#'
#' @param W_train A list containing training predictors:
#'   \itemize{
#'     \item \code{Z}: Matrix of scalar predictors.
#'     \item \code{functional_list}: List of `fd` objects (functional predictors).
#'     \item \code{eval_point}: Vector of evaluation time points.
#'   }
#' @param y_train Vector of scalar response values for training.
#' @param W_test A list containing testing predictors (same structure as \code{W_train}).
#' @param y_test Vector of scalar response values for testing (used for RMSE calculation).
#'
#' @return A numeric value representing the Root Mean Squared Error (RMSE) on the test set.
#'         Returns \code{NA} if the model fit fails.
#' @export
fit_pfr <- function(W_train, y_train, W_test, y_test) {
  
  # Extract common evaluation points for the functional curves
  eval_pts <- W_train$eval_point
  
  # ------------------------------------------------------------------
  # 1. Prepare Scalar Predictors (Unrolling)
  # ------------------------------------------------------------------
  # Convert the scalar matrix Z into a data frame with explicit column names 
  # (Z1, Z2, ...). This "unrolling" is critical because `pfr`/`gam` can 
  # throw "variable length differs" errors if predictors are stored as a 
  # single matrix column inside the data frame during prediction.
  Z_train_df <- as.data.frame(W_train$Z)
  colnames(Z_train_df) <- paste0("Z", 1:ncol(Z_train_df))
  
  Z_test_df  <- as.data.frame(W_test$Z)
  colnames(Z_test_df) <- paste0("Z", 1:ncol(Z_test_df))
  
  # ------------------------------------------------------------------
  # 2. Construct Data Frames for Model Fitting
  # ------------------------------------------------------------------
  # Combine response, unrolled scalars, and evaluated functional matrices.
  
  # Training Data
  df_train <- cbind(response = y_train, Z_train_df)
  
  # Evaluate functional objects (fd) at discrete points to create predictor matrices.
  # t() is used because eval.fd returns (time x subjects), but pfr expects (subjects x time).
  df_train$F1 <- t(eval.fd(eval_pts, W_train$functional_list[[1]])) 
  df_train$F2 <- t(eval.fd(eval_pts, W_train$functional_list[[2]]))
  
  # Testing Data
  df_test <- cbind(response = y_test, Z_test_df)
  df_test$F1 <- t(eval.fd(eval_pts, W_test$functional_list[[1]]))
  df_test$F2 <- t(eval.fd(eval_pts, W_test$functional_list[[2]]))
  
  # ------------------------------------------------------------------
  # 3. Dynamically Build the Formula
  # ------------------------------------------------------------------
  # Create the scalar part of the formula: "Z1 + Z2 + ... + Zp"
  scalar_formula_part <- paste(colnames(Z_train_df), collapse = " + ")
  
  # Construct the full formula string.
  # lf(): Defines a linear functional term.
  # presmooth = 'fpca.sc': Applies FPCA to smooth functions and reduce dimensionality 
  # prior to regression, which is robust for noisy data.
  f_str <- paste0(
    "response ~ ", scalar_formula_part, " + ",
    "lf(F1, argvals = eval_pts, presmooth = 'fpca.sc') + ",
    "lf(F2, argvals = eval_pts, presmooth = 'fpca.sc')"
  )
  
  # ------------------------------------------------------------------
  # 4. Fit Model and Predict
  # ------------------------------------------------------------------
  tryCatch({
    # Fit the PFR model.
    # method = "GCV.Cp": Uses Generalized Cross-Validation for smoothing parameter selection.
    # gamma = 1.2: Inflation factor for GCV to prevent overfitting (smoother results).
    fit <- pfr(as.formula(f_str), data = df_train, method = "GCV.Cp", gamma = 1.2)
    
    # Predict on test data
    pred <- predict(fit, newdata = df_test)
    
    # Calculate RMSE
    rmse <- sqrt(mean((y_test - pred)^2))
    return(rmse)
    
  }, error = function(e) {
    # Graceful error handling prevents the entire simulation loop from crashing
    message("PFR Error: ", e$message)
    return(NA)
  })
}

Kidney Data Preprocessing Pipeline

This section details the transformation of raw renogram data into a structured Hybrid Predictor object suitable for the FSHybridPLS package. The process involves extracting patient metadata, averaging and transforming response variables, and applying a domain-specific normalization to the functional curves.

#' Load and Preprocess Kidney Data for FSHybridPLS
#'
#' This function processes the raw 'kidney' dataframe into a `predictor_hybrid` object.
#' It performs:
#' 1. Extraction of scalar and functional covariates.
#' 2. Specific medical normalization (scaling by baseline max).
#' 3. Smoothing of functional curves into B-spline basis objects.
#' 4. Transformation of Response: Min-Max scaling followed by Logit transformation.
#' 5. Construction of the hybrid predictor object.
#'
#' @param kidney_df The raw dataframe containing kidney data (columns: ID, Study, renogram_value, etc.)
#' @param n_basis Integer. Number of B-spline basis functions to use for smoothing (default 20).
#'
#' @return A list containing:
#'   \item{W}{A `predictor_hybrid` object containing the predictors.}
#'   \item{y}{A numeric vector representing the transformed response (logit of min-max scaled mean diagnosis metrics).}
#' @export
load_and_preprocess_kidney_data <- function(kidney_df, n_basis = 20) {
  
  # --- 1. Data Extraction ---
  # Split data by ID to ensure order consistency
  ids <- unique(kidney_df$ID)
  n_sample <- length(ids)
  
  # Helper to extract scalar/response from the first row of each patient's baseline
  base_data_idx <- which(kidney_df$Study == "Baseline")
  patient_meta <- kidney_df[base_data_idx, ]
  patient_meta <- patient_meta[!duplicated(patient_meta$ID), ]
  
  # --- Response Extraction & Transformation ---
  # 1. Raw Mean (Columns 13, 14, 15)
  y_mat <- as.matrix(patient_meta[, c(13, 14, 15)])
  y_raw <- rowMeans(y_mat, na.rm = TRUE)
  
  # 2. Min-Max Scaling to [0, 1]
  y_min <- min(y_raw, na.rm = TRUE)
  y_max <- max(y_raw, na.rm = TRUE)
  # Prevent division by zero if all y are identical
  if (y_max == y_min) {
    warning("Response variable has zero variance. Skipping scaling/logit.")
    y <- y_raw
  } else {
    y_scaled <- (y_raw - y_min) / (y_max - y_min)
    
    # 3. Safety Squeeze (avoid Inf/-Inf at boundaries)
    # Maps [0, 1] -> [epsilon, 1-epsilon]
    epsilon <- 1e-5
    y_safe <- y_scaled * (1 - 2 * epsilon) + epsilon
    
    # 4. Logit Transformation: log(p / (1-p))
    #y <- qlogis(y_safe)
    y <- y_scaled
    # <- y_raw
  }
  
  # --- Scalar Predictors (Z) ---
  # Columns: 12 (Age) + 16:29
  Z <- as.matrix(patient_meta[, c(12, 16:29)])
  colnames(Z) <- c("Age", colnames(patient_meta)[16:29])
  
  # --- Functional Data Extraction ---
  # Filter and sort
  df_base <- kidney_df[kidney_df$Study == "Baseline", ]
  df_post <- kidney_df[kidney_df$Study != "Baseline", ]
  
  # Reshape to (ID x Time) matrices
  reno_base_mat <- matrix(df_base$renogram_value, nrow = n_sample, byrow = TRUE)
  reno_post_mat <- matrix(df_post$renogram_value, nrow = n_sample, byrow = TRUE)
  
  # --- 2. Medical Preprocessing (Normalization) ---
  # Calculate max of baseline for each patient
  max_base <- apply(reno_base_mat, 1, max)
  max_base[max_base == 0] <- 1 
  
  # Scale matrices
  reno_base_norm <- reno_base_mat / max_base
  reno_post_norm <- reno_post_mat / max_base
  
  # --- 3. Smoothing (Convert to fd objects) ---
  # Define time grids normalized to [0,1]
  t_base <- seq(0, 1, length.out = ncol(reno_base_norm))
  t_post <- seq(0, 1, length.out = ncol(reno_post_norm))
  
  # Create Basis (B-spline)
  basis_base <- create.bspline.basis(rangeval = c(0, 1), nbasis = n_basis)
  basis_post <- create.bspline.basis(rangeval = c(0, 1), nbasis = n_basis)
  
  # Smooth (Transpose: Time x Samples)
  fd_base <- Data2fd(t_base, t(reno_base_norm), basis_base)
  fd_post <- Data2fd(t_post, t(reno_post_norm), basis_post)
  
  # Assign Names
  fd_base$fdnames <- list("Time", "Patient", "Baseline Renogram")
  fd_post$fdnames <- list("Time", "Patient", "Post-Furosemide Renogram")
  
  # --- 4. Construct Hybrid Predictor ---
  common_eval <- seq(0, 1, length.out = 100)
  
  W <- predictor_hybrid(
    Z = Z,
    functional_list = list(fd_base, fd_post),
    eval_point = common_eval
  )
  
  return(list(W = W, y = y))
}

Numerical Studies

This repository contains R scripts to replicate the numerical studies presented in the paper. All scripts are located in the numerical_studies directory.

1. 6_1_geometric_validation.R

Validates the geometric properties of the hybrid space (orthonormality, weighted norms).

Rscript numerical_studies/6_1_geometric_validation.R [RepID]

2. 6_2_beta_estimation.R

Estimates the $\beta$ coefficient function for a single hyperparameter pair.

Rscript numerical_studies/6_2_beta_estimation.R [RepID] [N] [Lam1] [Lam2]

3. 6_3_1_scenario_1.R

Runs the simulation for Scenario 1: Orthogonal Nuisance Variance.

Rscript numerical_studies/6_3_1_scenario_1.R [RepID]

4. 6_3_2_scenario_2.R

Runs the simulation for Scenario 2: Function-Driven Intermodal Correlation.

Rscript numerical_studies/6_3_2_scenario_2.R [RepID]

5. 6_4_kidney_single_rep.R

Runs the comparative analysis on the kidney dataset (HybridPLS vs. OLS, PCR, PFR). Note: Requires renogram_data.csv in the ../data/ directory.

Rscript numerical_studies/6_4_kidney_single_rep.R [Lam1] [Lam2] [RepID]