Lecture 8

Discontinuous Galerkin Methods

Unstructured grids

Most problems have complex domains.
Use unstructured grids.
Use Finite Elements.
How to recapture spectral (like) convergence?

An unstructured grid representing parts of the UK

Communication

Parallel computing splits tasks across cores.
Communicate information as needed - ghost points.
Exascale - millions of GPUs: communication limit.
Bigger “numerical stencils” hit limit earlier.
Simple spectral method couples every point.

When the problem gets too big to solve on one compute core we have to split it across multiple cores. This reduces memory usage (on an individual core) and speeds up the calculation, as individual tasks are done in parallel and the results later recombined. The sketch figure shows this: the global grid in black is split across three cores. Each core gets about one third of the points. However, each point on the grid needs its neighbours to do the calculation. Therefore there has to be an overlap in the grid points that each core has access to. After the calculation is done on the interior points (shown with solid colour), the overlap (or ghost) points need to be updated. This is done by communicating the data from the other core on which the values have been computed.

The number of parallel cores has been steadily growing, and with the next generation of machines they will be able to do millions of tasks in parallel. That means the amount of data being communicated will be crucial in determining the performance. In general, communication is much slower than calculation - four to six orders of magnitude slower is not unusual.

Finite difference and finite volume methods have an easily understood stencil: the set of neighbours needed to update a single point. The sketch shows the case where two ghost points are needed, which is not unusual for a simple method of this type. However, methods that rely on solving linear systems, such as simple spectral and finite element methods, effectively couple every point in the domain. The resulting communication costs will wipe out any gains from parallel computing.
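The ghost point idea can be sketched in a few lines. This is a minimal illustration, not a real parallel code: the three "cores" are just array slices, the "communication" is an array copy, and the fourth-order centred difference stencil (two neighbours on each side) and the grid of 30 points are assumptions chosen for the example.

```python
import numpy as np

def stencil_update(f):
    """Apply a five-point stencil; f must include 2 ghost points at each end."""
    # Fourth-order centred first derivative on unit grid spacing
    return (f[:-4] - 8 * f[1:-3] + 8 * f[3:-1] - f[4:]) / 12

n_ghost = 2
f_global = np.sin(np.linspace(0, 2 * np.pi, 30, endpoint=False))

# Reference: update the whole periodic grid on a single "core"
padded = np.concatenate([f_global[-n_ghost:], f_global, f_global[:n_ghost]])
reference = stencil_update(padded)

# Split across three "cores": each owns one third of the points
chunks = np.split(f_global, 3)
result = []
for rank, owned in enumerate(chunks):
    left = chunks[(rank - 1) % 3][-n_ghost:]   # "communicated" from left neighbour
    right = chunks[(rank + 1) % 3][:n_ghost]   # "communicated" from right neighbour
    local = np.concatenate([left, owned, right])
    result.append(stencil_update(local))
result = np.concatenate(result)
```

Each core receives only `2 * n_ghost` numbers per update, however many points it owns. A wider stencil increases `n_ghost` and hence the data communicated.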

Basis functions

Standard FE function basis continuous at nodes - couples neighbouring elements.
Discontinuous Galerkin basis functions only inside elements:
- only couples to direct neighbour;
- solution multi-valued at element boundary.

The reason why the finite element methods that we have looked at so far have bad communication patterns is that the basis functions used are hat functions, each of which is non-zero across more than one element. Each basis function therefore overlaps with other basis functions in two neighbouring elements (at least: in higher dimensions it gets worse). As a result the final global stiffness matrix implicitly couples every point in the domain. Now, fast parallel linear solvers do exist, and the matrix is sparse, so this is not disastrous (unlike the spectral method, where the matrix formed is not sparse, killing performance). However, we can do better.

In the Discontinuous Galerkin method the basis functions only have support inside a single element. We would now typically expect to use multiple basis functions inside a single element to get higher accuracy. This means that basis functions in neighbouring elements only couple through a single value at the boundary, rather than via a complete overlap integral. This minimizes inter-element coupling, massively reducing communication costs.

The downside of this approach is that the resulting solution is not continuous at element boundaries, hence the name of the method. We have to be able to update the basis function coefficients inside the element taking into account both the intra-element overlaps and the jumps at the element boundaries.

Basis functions and weak form

For e.g. advection \(\partial_t \psi + u \partial_x \psi = 0\), basis

\[ \psi = \sum_a \hat{\psi}_a(t) N_a(x) \]

gives mass-stiffness discrete form

\[ \hat{M}_{ab} \partial_t \hat{\psi}_a + \hat{K}_{ab} \hat{\psi}_a = \hat{F}_b \]

within one single element. \(\hat{\mathbf{F}}\) is inter-element boundary flux.

From the weak form applied to the single element we introduce the basis expansion within that single element alone. Note that we are now indexing using an ordered integer (written \(a\) here and \(n\) later) that counts individual modes, rather than the label A which (at least in principle) has no order at all.

At this stage the mathematical steps are identical to the Galerkin finite element approaches; the weak form, expanded using the basis, gives a mass matrix multiplying the time derivatives, where the mass matrix is the integral over the element of two basis functions. The weak form also gives the stiffness matrix which, for the first derivative advective term, is the integral over the element of one basis function and one differentiated basis function. The force vector links one element to the next through the “flux” of the given mode number between elements.
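As a sketch of the steps just described, the derivation within one element \([x_L, x_R]\) can be written out as follows. The symbols \(x_L\), \(x_R\) and \(\psi^*\) (the boundary value chosen by the flux rule) are notation assumed here, and the sign and the factor of \(u\) absorbed into \(\hat{K}\) and \(\hat{F}\) are one possible convention, not necessarily the one used elsewhere in the lecture.

\[ \begin{aligned} \int_{x_L}^{x_R} N_b \left( \partial_t \psi + u \partial_x \psi \right) \, dx &= 0 \, , \\ \int_{x_L}^{x_R} N_b \, \partial_t \psi \, dx - u \int_{x_L}^{x_R} \left( \partial_x N_b \right) \psi \, dx &= - \left[ u \, \psi^* N_b \right]_{x_L}^{x_R} \, . \end{aligned} \]

Inserting \(\psi = \sum_a \hat{\psi}_a(t) N_a(x)\) then gives the mass-stiffness form with

\[ \hat{M}_{ab} = \int_{x_L}^{x_R} N_a N_b \, dx \, , \quad \hat{K}_{ab} = -u \int_{x_L}^{x_R} N_a \, \partial_x N_b \, dx \, , \quad \hat{F}_b = - \left[ u \, \psi^* N_b \right]_{x_L}^{x_R} \, . \]

Only \(\hat{F}_b\) involves information from outside the element, through \(\psi^*\).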

Legendre polynomials

Mass matrix \(\hat{M}_{ab} = \int N_a N_b\). Choose \(N_a\) orthonormal \(\implies \hat{M}\) identity.

In 1d, map \(x \to \xi \in [-1, 1]\): use Legendre polynomials \(P_a(\xi)\).

Need boundary flux. Have \(P_a(1) = 1, P_a(-1) = (-1)^a\). Use upwind, with \(-\) and \(+\) the values to the left and right of the boundary:

\[ \hat{\psi}_a(x; \hat{\psi}_a^{-}, \hat{\psi}_a^{+}) = \begin{cases} \hat{\psi}_a^{-} & u > 0 \\ \hat{\psi}_a^{+} & u < 0 \end{cases} \]

We now pick the basis functions for our convenience. If we can make the mass matrix diagonal then we are following the steps in the spectral method case, where we saw exponential convergence. We do this by choosing the basis functions to be orthogonal polynomials.

In one dimension we can, as in the Galerkin case, map to the reference element. On this reference element the Legendre polynomials have nice convergence properties. Typically when using library functions to provide the Legendre polynomials we need to rescale in order to make them orthonormal rather than orthogonal.
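The rescaling can be checked numerically. The sketch below assumes `m = 4` as the highest polynomial order and uses numpy's Gauss-Legendre rule to evaluate the mass matrix integrals; the variable names are illustrative only.

```python
import numpy as np
from numpy.polynomial import legendre

m = 4  # highest polynomial order, so m+1 basis functions (assumed value)
xi, w = legendre.leggauss(m + 1)  # Gauss-Legendre rule, exact up to degree 2m+1

# Columns are P_0..P_m evaluated at the quadrature points
P = legendre.legvander(xi, m)
mass_raw = P.T @ (w[:, None] * P)      # diagonal but not the identity

# numpy's Legendre polynomials satisfy int P_p^2 = 2 / (2p + 1)
norms = np.sqrt(2 / (2 * np.arange(m + 1) + 1))
N = P / norms                          # rescaled, orthonormal basis
mass = N.T @ (w[:, None] * N)          # identity to rounding error
```

Without the rescaling the mass matrix is still diagonal, with entries \(2 / (2 p + 1)\), so the only effect of the normalisation is to remove those factors.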

At this point the algorithm would appear very similar to the standard Galerkin case. We have a discrete algorithm using mass and stiffness matrices. One difference is that we have one set of matrices per element which we solve separately for each. They are never combined into a global matrix. A second difference is that the modal coefficients we are solving for do not correspond to the nodal values of the function. Instead they are closer to the spectral case, giving a series approximation separately within each element.

This algorithm would work (given more detailed implementation). It would show spectral, or exponential, convergence if the number of elements were held fixed whilst the number of basis functions per element were increased. It would show polynomial convergence if the number of basis functions were held fixed whilst the number of elements increased. The order of polynomial convergence is linked to the number of basis functions used.

However, this algorithm has one crucial problem. The communication pattern of the algorithm is determined by how many numbers need transferring between elements. This is done through the boundary flux. This modal DG algorithm transfers one number per mode per variable. As the number of modes increases, that becomes prohibitive.

Nodal DG

Count: \(\sum_{n=1}^{N_\text{modes}} \hat{\psi}_n N_n \leftrightarrow \{ \psi( \xi_n ) \equiv \psi_n \}\).

If nodes \(\{ \xi_n \}\) chosen well, solve \(\psi_n\) directly.

Gauss-Lobatto points:

nodes \(\xi = \{ \pm 1 \}\): communicate one number;
natural for Gauss quadrature.

Vandermonde matrix \(\hat{V}_{in} = P_n(\xi_i)\):

\[ \hat{V}_{in} \hat{\psi}_n = \psi_i \, . \]

If communication is the problem, then using the value of the solution itself at the boundary has to be part of the answer. As the solution at the boundary of the element gives a single number for the boundary flux, if that were known directly we would only have to communicate one number.

We can immediately note that, by a counting argument, we should be able to choose nodes inside the element so that the information contained within the solution values at the nodes is precisely the information in the modal coefficients. This is not saying that the values are the same. It is instead saying that there should be a map from one set of information to the other.

The standard choice is to use Gauss-Lobatto points for the nodes. These are like the nodes used in Gauss-Legendre quadrature, except that they force there to be points on the boundaries of the element. It’s essential that we have the boundary points to solve the communication problem. It’s useful to use these points as quadrature formulas are well known, making it easy to compute the integrals needed for the mass and stiffness matrices.
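For reference, the Gauss-Lobatto points can also be computed with numpy alone. The sketch below uses the standard result that the \((m+1)\)-point rule has nodes at \(\pm 1\) and at the roots of \(P_m'\), with weights \(2 / (m (m+1) P_m(\xi_i)^2)\). The function name `gauss_lobatto` is hypothetical, and this is an alternative to a library routine, not a replacement for one.

```python
import numpy as np
from numpy.polynomial import legendre

def gauss_lobatto(m):
    """Nodes and weights of the (m+1)-point Gauss-Lobatto rule on [-1, 1], m >= 1."""
    c = np.zeros(m + 1)
    c[m] = 1.0                                         # Legendre coefficients of P_m
    interior = legendre.legroots(legendre.legder(c))   # roots of P_m'
    nodes = np.concatenate([[-1.0], np.sort(interior.real), [1.0]])
    weights = 2 / (m * (m + 1) * legendre.legval(nodes, c) ** 2)
    return nodes, weights

nodes, weights = gauss_lobatto(4)
```

The first and last nodes sit exactly on the element boundary, which is what allows the boundary flux to be read off from a single nodal value.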

Nodal DG equations

Nodal values \(\psi_i\) also give

\[ M_{ab} \partial_t \psi_a + K_{ab} \psi_a = F_b \, . \]

Use Vandermonde matrix to map to nodes:

\[ \begin{aligned} M &= \left( \hat{V} \hat{V}^T \right)^{-1} \, , \\ K^T &= M \left( \partial_x \hat{V} \right) \hat{V}^{-1} \, . \end{aligned} \]

Force vector is flux on boundary nodes only. Map to reference scales \(M\) by \(\det(J)\), not \(K\).

When working with the nodes rather than the modes we still get the same form of the equations, with mass and stiffness matrices. We can use the Vandermonde matrix to compute the nodal matrices from the modal case, where they are simpler.

In particular, the modal mass matrix was the identity. To map to the nodal case we need to map to modes and back, hence getting the displayed combination of the Vandermonde matrix. For the stiffness matrix here where only a single derivative is included we get a single derivative of the Vandermonde matrix, and a combination of mappings that is most easily expressed using the mass matrix.
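A small numerical check of these formulas is sketched below, assuming `m = 4` and writing the five Gauss-Lobatto nodes out by hand so that the block is self-contained. The test function `exp` is an arbitrary choice.

```python
import numpy as np
from numpy.polynomial import legendre

m = 4
# Gauss-Lobatto nodes for m = 4, written out explicitly
nodes = np.array([-1.0, -np.sqrt(3 / 7), 0.0, np.sqrt(3 / 7), 1.0])

norms = np.sqrt(2 / (2 * np.arange(m + 1) + 1))
V = legendre.legvander(nodes, m) / norms      # V[i, n]: orthonormal mode n at node i
coeffs = np.diag(1 / norms)                   # column n holds the coefficients of mode n
dV = legendre.legval(nodes, legendre.legder(coeffs)).T   # mode derivatives at the nodes

M = np.linalg.inv(V @ V.T)                    # nodal mass matrix
K = (M @ dV @ np.linalg.inv(V)).T             # nodal stiffness matrix

# Round trip: nodal values -> modal coefficients -> nodal values
psi_nodes = np.exp(nodes)
psi_modes = np.linalg.solve(V, psi_nodes)
```

Two properties are worth noting. The entries of `M` sum to 2, the length of the reference element. The sum `K + K.T` is zero except for \(-1\) and \(+1\) in the first and last diagonal entries, which is integration by parts in matrix form: only the boundary nodes survive, matching the statement that the force vector acts on boundary nodes only.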

For implementation purposes we have to note that we’ve computed everything on the reference element. To work with multiple elements we need to scale some terms by the determinant of the coordinate transform, which (in this case) applies to the mass matrix but not the stiffness matrix.
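The scaling can be seen directly from the change of variables. Assuming an element of width \(\Delta x\) with left edge \(x_L\) (notation introduced here for illustration), the map is \(x = x_L + \tfrac{\Delta x}{2} (\xi + 1)\), so \(dx = \tfrac{\Delta x}{2} \, d\xi\) and \(\partial_x = \tfrac{2}{\Delta x} \partial_\xi\). Then

\[ \begin{aligned} \int N_a N_b \, dx &= \frac{\Delta x}{2} \int_{-1}^{1} N_a N_b \, d\xi \, , \\ \int N_a \, \partial_x N_b \, dx &= \int_{-1}^{1} N_a \, \frac{2}{\Delta x} \partial_\xi N_b \, \frac{\Delta x}{2} \, d\xi = \int_{-1}^{1} N_a \, \partial_\xi N_b \, d\xi \, . \end{aligned} \]

The factor \(\det(J) = \Delta x / 2\) survives in the mass matrix but cancels against the derivative in the stiffness matrix.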

Implementation

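import numpy as np
from numpy.polynomial import legendre
from scipy.integrate import ode
import quadpy

m = 4  # highest polynomial order: m+1 modes (and nodes) per element
Ne = 10  # number of elements
u = 1.0  # advection speed (assumed value; the upwind flux below needs u > 0)
dx_e = 1.0 / Ne  # element width (assumes a periodic domain of unit length)

# Gauss-Lobatto nodes and weights on the reference element [-1, 1]
GL = quadpy.c1.gauss_lobatto(m+1)
nodes = GL.points
weights = GL.weights

# Vandermonde matrix, rescaled so that the Legendre basis is orthonormal
V_hat = legendre.legvander(nodes, m)
c = np.eye(m+1)
for p in range(m+1):
    V_hat[:, p] /= np.sqrt(2/(2*p+1))
    c[p, p] /= np.sqrt(2/(2*p+1))
V_hat_inv = np.linalg.inv(V_hat)
# Derivatives of the orthonormal modes evaluated at the nodes
d_V_hat = legendre.legval(nodes, legendre.legder(c)).T

# Nodal mass matrix (and its inverse) and stiffness matrix
M = np.linalg.inv(V_hat @ V_hat.T)
M_inv = V_hat @ V_hat.T
K = (M @ (d_V_hat @ V_hat_inv)).T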

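def dpsidt(t, psi):
    # psi holds Ne elements plus one ghost element at each end
    rhs = np.zeros_like(psi)
    dpdt = np.zeros_like(psi)
    for e in range(1, Ne+1):
        lo = e*(m+1)
        hi = (e+1)*(m+1)
        # Volume term from the stiffness matrix
        rhs[lo:hi] += u * K @ psi[lo:hi]
        # Upwind boundary fluxes (u > 0): inflow from the left neighbour, outflow on the right
        rhs[lo] += u * psi[lo-1]
        rhs[hi-1] -= u * psi[hi-1]
        # Inverse mass matrix, scaled from the reference element to width dx_e
        dpdt[lo:hi] = (M_inv / dx_e * 2) @ rhs[lo:hi]
    # Periodic boundaries: each ghost element copies the opposite interior element
    dpdt[:m+1] = dpdt[-2*(m+1):-(m+1)]
    dpdt[-(m+1):] = dpdt[(m+1):2*(m+1)]
    return dpdt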

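# Initial data and end time are assumed values: a sine wave on the unit periodic domain
x = ((np.arange(-1, Ne+1)[:, None] + (nodes + 1) / 2) * dx_e).ravel()
psi0 = np.sin(2 * np.pi * x)
t_end = 5.0  # five periods when u = 1

cfl = 0.5/(2*m+1)
# nsteps raised from the default of 500 so that one integrate call can reach t_end
r = ode(dpsidt).set_integrator('dopri5', max_step=cfl*dx_e/abs(u), nsteps=100000)
r.set_initial_value(psi0)
r.integrate(t_end)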

Having put in all that work, here is a sketch implementation.

The largest chunk of code is preliminary setup. We get the Gauss-Lobatto node locations within the reference element from quadpy. We then get the Vandermonde matrix for moving between modal and nodal coefficients from numpy. However, the Legendre definitions are not normalised, so we need to re-normalize them to make the basis functions orthonormal. From this we can define the derivatives of the Vandermonde matrix and hence the mass and stiffness matrices.

The next chunk of code defines the time derivative. Inside each element we set the update initially just using the stiffness matrix. We then correct the update using the boundary fluxes. This is using upwind corrections at the right and left edge of the element. Finally, we multiply by the inverse mass matrix. This is scaled to take into account that the element in real space is a different size to the reference element. Usually, using the explicit inverse is a bad idea. In this case, because the mass matrix is defined using an inversion, it is easy to accurately define the inverse. Boundary conditions are imposed using ghost elements.

The final step uses the method of lines to evolve the solution in time.

The results are first shown after five periods, using only ten elements and five modes (including the zero mode). There is next to no amplitude error, but the phase error is visible.

By increasing the number of modes whilst keeping the number of elements fixed we see exponential convergence. This is another main selling point of DG methods.
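The convergence behaviour can be explored with a compact, self-contained variant of the scheme. This is a sketch under stated assumptions, not the code used for the lecture results: it assumes sine wave initial data on a unit periodic domain with \(u = 1\), evolves for one period, swaps the adaptive `dopri5` integrator for a fixed-step classical RK4, and computes the Gauss-Lobatto nodes with numpy. The names `gauss_lobatto_nodes` and `dg_error` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def gauss_lobatto_nodes(m):
    """The m+1 Gauss-Lobatto nodes on [-1, 1]: the endpoints plus the roots of P_m'."""
    c = np.zeros(m + 1)
    c[m] = 1.0
    interior = legendre.legroots(legendre.legder(c)).real
    return np.concatenate([[-1.0], np.sort(interior), [1.0]])

def dg_error(m, Ne=10, u=1.0, t_end=1.0):
    """Max nodal error after advecting sin(2 pi x) on the periodic domain [0, 1]."""
    nodes = gauss_lobatto_nodes(m)
    norms = np.sqrt(2 / (2 * np.arange(m + 1) + 1))
    V = legendre.legvander(nodes, m) / norms
    dV = legendre.legval(nodes, legendre.legder(np.diag(1 / norms))).T
    M_inv = V @ V.T
    K = (np.linalg.inv(M_inv) @ dV @ np.linalg.inv(V)).T
    dx = 1.0 / Ne
    x = (np.arange(Ne)[:, None] + (nodes + 1) / 2) * dx   # shape (Ne, m+1)

    def rhs(psi):
        r = u * psi @ K.T                       # stiffness term, element by element
        r[:, 0] += u * np.roll(psi[:, -1], 1)   # upwind inflow from left neighbour (u > 0)
        r[:, -1] -= u * psi[:, -1]              # outflow through the right edge
        return (2 / dx) * r @ M_inv.T           # scaled inverse mass matrix

    psi = np.sin(2 * np.pi * x)
    n_steps = int(np.ceil(t_end / (0.1 * dx / (u * (2 * m + 1)))))
    dt = t_end / n_steps
    for _ in range(n_steps):                    # classical fourth-order Runge-Kutta
        k1 = rhs(psi)
        k2 = rhs(psi + dt / 2 * k1)
        k3 = rhs(psi + dt / 2 * k2)
        k4 = rhs(psi + dt * k3)
        psi = psi + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return np.max(np.abs(psi - np.sin(2 * np.pi * (x - u * t_end))))

errors = {m: dg_error(m) for m in (1, 2, 3, 4)}
```

Plotting `errors` against `m` on a logarithmic scale should show the error falling rapidly as modes are added with the number of elements held at ten. The periodic coupling here is done with `np.roll` instead of ghost elements, purely to keep the sketch short.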

Summary

Discontinuous Galerkin methods work on unstructured grids and give spectral convergence.
DG methods are efficient on exascale machines.
Setting up a DG scheme is more work.
Making DG methods work with discontinuous data is hard.