Gaussian Error Linear Unit (GELU)

A smooth, ReLU-like activation commonly used in transformer models. GELU(x) = x * Φ(x), where Φ is the cumulative distribution function of the standard Gaussian distribution.
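For reference, a minimal NumPy sketch of the exact form above (this is an illustrative sketch, not the core implementation; scipy.special.erf is assumed to be available):

import numpy as np
from scipy.special import erf

def gelu(x: np.ndarray) -> np.ndarray:
    # GELU(x) = x * Phi(x), where Phi is the standard Gaussian CDF:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(gelu(x))  # small negative inputs are damped rather than zeroed, unlike ReLU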

Shape Contract:

  • Input: [*shape] arbitrary shape
  • Output: [*shape] same shape as input

Notes:

  • Used in BERT, GPT-2, and many modern transformers (a sketch of the commonly used tanh approximation appears after these notes)
  • Smoother than ReLU, which can help with gradient flow
  • Element-wise operation (preserves all dimensions)
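
A sketch of the tanh approximation often found in BERT/GPT-2-style code (an assumption for illustration, not the core implementation); it also shows the element-wise shape contract:

import numpy as np

def gelu_tanh(x: np.ndarray) -> np.ndarray:
    # GELU(x) ~ 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.random.randn(2, 3)
assert gelu_tanh(x).shape == x.shape  # element-wise: a [2, 3] input stays [2, 3]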

Signature

neuron GELU()

Ports

Inputs:

  • default: [*shape]

Outputs:

  • default: [*shape]

Implementation

Source { source: "core", path: "activations/GELU" }