Purpose: Speech-language pathologists (SLPs) typically examine narrative performance when completing a comprehensive language assessment. However, the methods used to evaluate narration vary considerably. The primary aims of this systematic review and meta-analysis were to a) investigate how narrative assessment type (e.g., macrostructure, microstructure, internal state language) differentiates typically developing (TD) children from children with developmental language disorder (DLD), hereafter TD–DLD group differences; b) identify specific narrative assessment measures (e.g., number of different words) that yield larger TD–DLD differences; and c) evaluate participant and sample characteristics (e.g., DLD inclusionary criteria) that may uniquely influence these differences. Method: Three electronic databases (PsycINFO, ERIC, and PubMed) and ASHAWire were searched on July 30, 2019, for studies that reported oral narrative language measures for both DLD and TD groups aged 4 to 12 years; studies focusing only on written narration or only on other developmental disorders were excluded. Thirty-seven primary studies were identified through a three-step study selection procedure. We extracted data on the sample participants, the narrative task(s) and assessment measures, and the research design. Effect sizes ($N = 382$) were calculated as standardized mean differences using bias-corrected Hedges’ $g$. Research questions were analyzed using mixed-effects meta-regression with robust variance estimation to account for dependencies among effect sizes. Results: Searches identified eligible studies published between 1987 and 2019. An overall meta-analysis of 382 effect sizes from 37 studies showed that children with DLD had poorer narrative performance than their TD peers, with summary estimates ranging from -0.850, 95% CI [-1.016, -0.685] to -0.794, 95% CI [-0.963, -0.624], depending on the assumed within-study correlation among effect sizes.
Across all models, effect size estimates showed significant heterogeneity both between and within studies, even after accounting for effect size-, sample-, and study-level predictors. Grammatical accuracy (microstructure) and story grammar (macrostructure) yielded the most consistent evidence of significant TD–DLD group differences across statistical models. Conclusions: Present findings suggest that some narrative assessment measures may yield significantly different performance between children with and without DLD. However, researchers need to be consistent in their inclusionary criteria, in their descriptions of sample characteristics, and in their reporting of correlations among measures in order to determine which assessment measures are more likely to yield group differences.
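The effect sizes described in the Method are bias-corrected Hedges’ $g$ values. The following is a minimal sketch of how one such value and its sampling variance could be computed from group summary statistics. It assumes the common textbook formulas (pooled SD, the small-sample correction $J = 1 - 3/(4\,df - 1)$, and the usual large-sample variance approximation) and a DLD-minus-TD sign convention, so that negative values indicate lower DLD performance. The abstract does not state the review's exact variance formula, and the function name `hedges_g` and the example numbers are hypothetical.

```python
import math


def hedges_g(mean_dld, sd_dld, n_dld, mean_td, sd_td, n_td):
    """Return (g, variance) for a DLD-minus-TD standardized mean difference.

    Assumed sign convention: negative g means the DLD group scored lower.
    """
    df = n_dld + n_td - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n_dld - 1) * sd_dld**2 + (n_td - 1) * sd_td**2) / df)
    # Uncorrected standardized mean difference (Cohen's d)
    d = (mean_dld - mean_td) / sd_pooled
    # Small-sample bias correction factor J
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate sampling variance of d, rescaled by J^2 for g
    var_d = (n_dld + n_td) / (n_dld * n_td) + d**2 / (2 * (n_dld + n_td))
    return g, j**2 * var_d


# Hypothetical example: 20 children per group, equal SDs, DLD mean 5 points lower
g, v = hedges_g(mean_dld=20, sd_dld=5, n_dld=20, mean_td=25, sd_td=5, n_td=20)
print(round(g, 3), round(v, 4))
```

In a robust variance estimation meta-regression, each study would typically contribute several such $(g, v)$ pairs, and the assumed within-study correlation among them is what shifts the summary estimate between the values reported in the Results.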