Wavelet Basis & Function Spaces: Besov vs. Hölder
I. Context: Signal Processing vs. Statistical Estimation
Before diving into Besov spaces, it is crucial to understand the fundamental shift from Convolution (used in neuroscience/signal processing) to Basis Expansion (used in statistics).
| Feature | Wavelet Convolution (CWT) | Wavelet Basis (DWT) |
|---|---|---|
| Goal | Phase, Power Extraction & Visualization | Estimation, Compression, Denoising |
| Method | Sliding window (Redundant) | Tiling / Grid (Orthogonal) |
| Structure | Smooth, overlapping coefficients | Sparse, independent coefficients |
| Wavelet | Complex Morlet (Smooth) | Haar, Daubechies (Compact/Step) |
In the statistical context (e.g., nonparametric regression), we prioritize efficiency and sparsity: the goal is to reconstruct a function $f(t)$ from as few coefficients as possible.
II. The Haar Wavelet Basis
The text provided focuses on the Haar Multivariate Wavelet Basis. This system creates a hierarchical representation of a function using step-like “building blocks.”
1. The Structure (Setting $J=0$)
The analysis starts at the coarsest level ($J=0$) and splits the coefficients into two categories:
- Scaling Coefficients ($\theta_\phi$):
  - Symbol: $\Phi_0$
  - Role: Represents the Global Trend, i.e., the average of the function over the domain $[0,1]^d$. The coefficient is simply the mean of the function.
- Wavelet Coefficients ($\theta_\psi$):
  - Symbol: $\Psi_j$ (for levels $j \ge 0$)
  - Role: Captures Abrupt Oscillations and details that deviate from the global trend.
  - Resolution: As level $j$ increases, the wavelets become narrower and taller, capturing higher-frequency details.
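The two coefficient types can be illustrated with a minimal sketch. It assumes the one-dimensional case ($d=1$), a function that is piecewise constant on $2^n$ equal bins of $[0,1]$, and the usual orthonormal normalization $\psi_{j,k}(t) = 2^{j/2}\psi(2^j t - k)$; the function name `haar_coefficients` is hypothetical and not taken from any library.

```python
import numpy as np

def haar_coefficients(samples):
    """Return (scaling coefficient, list of wavelet-coefficient arrays).

    `samples` holds the values of a piecewise-constant function on 2**n
    equal bins of [0, 1]. The returned list has one array per level j,
    and the array for level j has 2**j entries.
    """
    a = np.asarray(samples, dtype=float)
    n_levels = int(round(np.log2(a.size)))
    if 2 ** n_levels != a.size:
        raise ValueError("number of samples must be a power of two")
    details = []
    for j in range(n_levels - 1, -1, -1):      # finest level first
        left, right = a[0::2], a[1::2]
        # <f, psi_{j,k}> = 2^(-j/2 - 1) * (left-half mean - right-half mean)
        details.append(2.0 ** (-j / 2 - 1) * (left - right))
        a = 0.5 * (left + right)               # bin averages for the next coarser level
    details.reverse()                          # details[j] now belongs to level j
    return a[0], details

f = np.array([4.0, 2.0, 1.0, 1.0])
scaling, details = haar_coefficients(f)

# Parseval check: squared coefficients add up to the integral of f^2 on [0, 1].
energy = scaling ** 2 + sum(np.sum(d ** 2) for d in details)
print("scaling coefficient:", scaling, "mean of f:", f.mean())
print("wavelet coefficients by level:", [d.tolist() for d in details])
print("coefficient energy:", energy, "integral of f^2:", np.mean(f ** 2))
```

In this toy example the scaling coefficient coincides with the mean of the samples, and the second wavelet at level 1 vanishes because the function is flat on the right half of the domain.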
2. Why Haar?
The Haar basis is chosen because projecting a density onto this basis, truncated at a given resolution level, is mathematically equivalent to Equal-Sized Binning, i.e., building a histogram. The resolution level sets the bin width, and the projected function takes the bin averages as its bin heights. This allows researchers to link abstract function space theory directly to the discretization error inherent in statistical testing.
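The equivalence with binning can be checked numerically. The sketch below assumes $d=1$, uses a finely sampled stand-in density $6t(1-t)$ chosen only for illustration, and projects it onto the scaling function plus all Haar wavelets at levels $j < J$; `haar_projection` is a hypothetical helper written for this example.

```python
import numpy as np

def haar_projection(samples, J):
    """Project onto the scaling function and the Haar wavelets at levels j < J."""
    f = np.asarray(samples, dtype=float)
    n = f.size
    t = (np.arange(n) + 0.5) / n               # midpoints of the fine grid
    proj = np.full(n, f.mean())                # scaling part: the global mean
    for j in range(J):
        for k in range(2 ** j):
            u = 2 ** j * t - k
            # Haar wavelet: +2^(j/2) on the left half of its support, -2^(j/2) on the right
            psi = 2 ** (j / 2) * (((u >= 0) & (u < 0.5)).astype(float)
                                  - ((u >= 0.5) & (u < 1)).astype(float))
            theta = np.mean(f * psi)           # inner product on the grid
            proj += theta * psi
    return proj

n, J = 256, 3
t = (np.arange(n) + 0.5) / n
density = 6 * t * (1 - t)                      # illustrative density on [0, 1]

# Histogram with 2**J equal-width bins: each bin height is the bin average.
bin_heights = density.reshape(2 ** J, -1).mean(axis=1)
histogram = np.repeat(bin_heights, n // 2 ** J)

projection = haar_projection(density, J)
max_gap = np.max(np.abs(projection - histogram))
print("largest difference between projection and histogram:", max_gap)
```

The difference is at the level of floating-point rounding, which reflects the fact that the span of these Haar functions is exactly the set of functions that are constant on each of the $2^J$ bins.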
III. The Besov Ball: A “Budgeting Game”
How do we decide whether a function is “smooth”? The Besov Norm ($|||f|||_{s,2,q}$, i.e., the case $p=2$) measures smoothness by calculating the “cost” of building the function from these wavelet blocks.
\[|||f|||_{s,2,q} := \left[ \sum_{j=0}^{\infty} \underbrace{2^{jsq}}_{\text{Price Tag}} \underbrace{\left( \sum_{\psi \in \Psi_j} |\theta_{\psi}(f)|^2 \right)^{q/2}}_{\text{Energy at Level } j} \right]^{1/q}\]
1. The “Price Tag” ($2^{js}$)
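This term acts as a level-dependent penalty weight; inside the sum it appears raised to the power $q$, as $2^{jsq}$.
- Low $j$ (Coarse levels): Cheap. You can use these blocks freely.
- High $j$ (Fine details): Expensive. The cost grows exponentially in $j$ ($2^{js}$), and faster for larger $s$.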
2. The Rule of the Besov Ball
To stay inside the Besov Ball (i.e., to keep the norm below a fixed radius), you must be “thrifty.” You are allowed to use high-frequency wavelets (high $j$), but you must use them sparingly (Sparsity).
This definition allows the Besov space to accommodate Spatially Inhomogeneous functions: functions that are smooth in most places but have occasional sharp spikes or jumps.
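The budgeting rule can be made concrete with a small sketch that evaluates the norm from a list of wavelet coefficients. It assumes the infinite sum is truncated at the finest level supplied, and `besov_norm` is a hypothetical helper name.

```python
import numpy as np

def besov_norm(details, s, q):
    """Truncated Besov sequence norm |||f|||_{s,2,q}.

    `details[j]` holds the wavelet coefficients at level j; the sum over
    levels stops at the finest level present in `details` (an assumption
    of this sketch, since the definition sums over all j).
    """
    total = 0.0
    for j, theta in enumerate(details):
        level_energy = np.sum(np.asarray(theta, dtype=float) ** 2)
        total += 2.0 ** (j * s * q) * level_energy ** (q / 2)   # price tag * energy
    return total ** (1.0 / q)

# The same unit of energy, spent at level 0 versus level 3.
coarse = [np.array([1.0]), np.zeros(2), np.zeros(4), np.zeros(8)]
fine = [np.zeros(1), np.zeros(2), np.zeros(4), np.array([1.0] + [0.0] * 7)]

cost_coarse = besov_norm(coarse, s=1.0, q=2.0)
cost_fine = besov_norm(fine, s=1.0, q=2.0)
print("cost at level 0:", cost_coarse)
print("cost at level 3:", cost_fine)
```

With $s=1$, a single unit coefficient costs $1$ at level 0 but $2^{3s} = 8$ at level 3, so a function can only afford fine-scale coefficients if they are few or small.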
IV. Comparison: Besov ($B^s_{p,q}$) vs. Hölder ($C^s$)
This is the critical distinction for statistical modeling.
1. The Hölder Class ($C^s$)
- Philosophy: Uniform Regularity (Worst-Case).
- The Rule: The function must be smooth everywhere.
- Sensitivity: If the function has a single “bad” point (a sharp corner, a jump, a spike) anywhere, the entire function is rejected.
- Norm Analogy: Based on $L^\infty$ (Maximum error).
2. The Besov Space ($B^s_{p,q}$)
- Philosophy: Average Regularity.
- The Rule: The function must be smooth on average.
- Flexibility: It tolerates local irregularities (like jumps or spikes) as long as they are spatially sparse. The “cost” of one bad point is averaged out over the smooth regions.
- Norm Analogy: Based on $L^p$ (Integrated error).
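The contrast between worst-case and average regularity can be seen on a step function. The sketch below assumes $d=1$, a jump at $t=1/3$, smoothness $s=0.25$ and $q=2$, all chosen for illustration; it compares a grid-based Hölder quotient (a lower bound on the Hölder seminorm) with the Haar-based Besov norm truncated at the grid resolution. All function names are hypothetical.

```python
import numpy as np

def haar_details(samples):
    """Haar wavelet coefficients by level for 2**n samples on [0, 1]."""
    a = np.asarray(samples, dtype=float)
    n_levels = int(round(np.log2(a.size)))
    details = []
    for j in range(n_levels - 1, -1, -1):      # finest level first
        left, right = a[0::2], a[1::2]
        details.append(2.0 ** (-j / 2 - 1) * (left - right))
        a = 0.5 * (left + right)
    details.reverse()
    return details

def besov_norm(details, s, q):
    """Besov sequence norm, truncated at the finest available level."""
    total = sum(2.0 ** (j * s * q) * np.sum(d ** 2) ** (q / 2)
                for j, d in enumerate(details))
    return total ** (1.0 / q)

def holder_quotient(samples, s):
    """Largest |f(x) - f(y)| / |x - y|^s over neighbouring grid points."""
    h = 1.0 / len(samples)
    return np.max(np.abs(np.diff(samples))) / h ** s

s, q = 0.25, 2.0
results = {}
for m in (8, 11, 14):                          # grids with 2**m points
    n = 2 ** m
    t = (np.arange(n) + 0.5) / n
    step = (t >= 1 / 3).astype(float)          # flat, jump, flat
    results[m] = (holder_quotient(step, s), besov_norm(haar_details(step), s, q))
    print(f"n = 2^{m}: Holder quotient = {results[m][0]:.3f}, "
          f"Besov norm = {results[m][1]:.3f}")
```

As the grid is refined, the Hölder quotient keeps growing like $n^s$ because it is driven entirely by the single jump, while the Besov norm stays bounded. The choice $s < 1/2$ matters here: for larger $s$ the Besov norm of a step function also grows with resolution.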
3. Specific “Different Functions”
Because of this flexibility, Besov spaces contain functions that are excluded from Hölder classes of the same smoothness order $s$:
| Function Type | Description | Hölder Class ($C^s$) | Besov Space ($B^s_{p,q}$) |
|---|---|---|---|
| Step Function | Flat $\rightarrow$ Jump $\rightarrow$ Flat | REJECTED (Discontinuous at the jump, so it fails for every $s > 0$) | ACCEPTED for small enough $s$ ($s < 1/p$); the jump touches only a sparse set of coefficients |
| Local Spike | Smooth line with one sharp burst | REJECTED once $s$ exceeds the regularity at the burst location | ACCEPTED for a wider range of $s$ (Averaged out by smooth regions) |
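A short calculation shows why the step function fits the budget. As a sketch, assume $d=1$, $p=2$, the step $f = \mathbf{1}_{[a,1]}$ with $0 < a < 1$, and the normalization $\psi_{j,k}(t) = 2^{j/2}\psi(2^j t - k)$ (these choices are assumptions made for illustration). A Haar wavelet integrates to zero against $f$ unless the jump lies strictly inside its support, so at most one coefficient per level is nonzero, and it satisfies
\[|\theta_{\psi_{j,k}}(f)| = 2^{j/2}\left|\int_a^1 \psi(2^j t - k)\,dt\right| \le 2^{j/2} \cdot 2^{-j-1} = 2^{-j/2-1}.\]
The energy at level $j$ is therefore at most $2^{-j-2}$, and
\[|||f|||_{s,2,q}^q \le \sum_{j=0}^{\infty} 2^{jsq} \left(2^{-j-2}\right)^{q/2} = 2^{-q} \sum_{j=0}^{\infty} 2^{jq(s - 1/2)},\]
which is finite whenever $s < 1/2$. The price tag grows like $2^{js}$, but the jump only ever spends $2^{-j/2}$ per level, so the budget holds for small $s$. No Hölder class with $s > 0$ can contain $f$, since every Hölder function is continuous.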
V. Summary
By selecting the Haar Basis and Besov Spaces, one explicitly chooses a framework that can model discontinuities and abrupt changes. Unlike Fourier/Hölder methods, which assume data is uniformly smooth (like a sine wave), Besov/Haar methods assume data may have sharp edges and handle them robustly without “blowing up” the error metric.