Role of regression model selection and station distribution on the estimation of oceanic anthropogenic carbon change by eMLR

  • Plancherel Y
  • Rodgers K
  • Key R
  • et al.

Abstract

Quantifying oceanic anthropogenic carbon uptake by monitoring interior dissolved inorganic carbon (DIC) concentrations is complicated by the influence of natural variability. The "eMLR method" aims to address this issue by using empirical regression fits of the data instead of the data themselves, inferring the change in anthropogenic carbon in time by difference between predictions generated by the regressions at each time. The advantages of the method are that it provides in principle a means to filter out natural variability, which theoretically becomes the regression residuals, and a way to deal with sparsely and unevenly distributed data. The degree to which these advantages are realized in practice is unclear, however. The ability of the eMLR method to recover the anthropogenic carbon signal is tested here using a global circulation and biogeochemistry model in which the true signal is known. Results show that regression model selection is particularly important when the observational network changes in time. When the observational network is fixed, the likelihood that co-located systematic misfits between the empirical model and the underlying, yet unknown, true model cancel is greater, improving eMLR results. Changing the observational network modifies how the spatio-temporal variance pattern is captured by the respective datasets, resulting in empirical models that are dynamically or regionally inconsistent, leading to systematic errors. In consequence, the use of regression formulae that change in time to represent systematically best-fit models at all times does not guarantee the best estimates of anthropogenic carbon change if the spatial distributions of the stations emphasize hydrographic features differently in time.
Other factors, such as a balanced and representative station coverage, vertical continuity of the regression formulae consistent with the hydrographic context and resiliency of the spatial distribution of the residual field can be used to help guide model selection. The characteristic spatial scales of the modes of inter-annual to decadal variability in relation to the size of the North Atlantic, in concert with the station coverage available, place practical limits on the ability of eMLR to fully account for natural variability. Due to its statistical nature, eMLR only efficiently removes the natural variability whose spatial scales are smaller than the system analyzed.
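
The differencing step described in the abstract can be illustrated with a minimal sketch. It assumes a first-order linear regression of DIC on a handful of predictors and uses only the Python standard library. The predictor set (intercept, temperature, salinity anomaly), the coefficient values, the noise level and the helper names `fit_ols`, `predict`, `emlr_delta` and `make_survey` are all hypothetical and chosen for illustration; they are not taken from the article.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (small, well-conditioned problems only)."""
    n, p = len(X), len(X[0])
    # Augmented normal-equation system [X^T X | X^T y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, X):
    """Evaluate the linear model at each row of X."""
    return [sum(c * x for c, x in zip(coef, row)) for row in X]

def emlr_delta(X1, dic1, X2, dic2, X_eval):
    """Fit one regression per survey, then difference the two predictions at X_eval."""
    b1 = fit_ols(X1, dic1)
    b2 = fit_ols(X2, dic2)
    return predict([c2 - c1 for c1, c2 in zip(b1, b2)], X_eval)

def make_survey(rng, n, coef, noise_sd):
    """Hypothetical survey: n stations with predictors [1, T, S - 35] and noisy linear DIC."""
    X = [[1.0, rng.uniform(2.0, 25.0), rng.uniform(-1.0, 2.0)] for _ in range(n)]
    y = [v + rng.gauss(0.0, noise_sd) for v in predict(coef, X)]
    return X, y

rng = random.Random(0)
natural = [2150.0, -8.0, 20.0]   # hypothetical natural DIC relation
anthro = [5.0, 0.5, 0.0]         # hypothetical anthropogenic increase: 5 + 0.5 * T
later = [a + b for a, b in zip(natural, anthro)]

# Two surveys at different stations; the noise stands in for natural variability.
X1, dic1 = make_survey(rng, 300, natural, noise_sd=2.0)
X2, dic2 = make_survey(rng, 300, later, noise_sd=2.0)

estimate = emlr_delta(X1, dic1, X2, dic2, X2)
truth = predict(anthro, X2)
mean_abs_err = sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)
print(f"mean absolute error: {mean_abs_err:.2f} umol/kg")
```

In this toy setting the same regression formula is used for both surveys and the noise is uncorrelated in space, so it ends up in the residuals. The abstract's caveat concerns the harder case in which the variability has spatial scales comparable to the basin and the station network changes between surveys.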

Figures

  • Fig. 1. (a) Change in anthropogenic carbon column inventory, in mol m−2, between July 1995 and 2005 calculated on the original MOM4/TOPAZ grid. (b) Inventory change calculated after mapping the true values sampled at GLODAP stations. (c) Mapping error, difference between (b) and (a) for GLODAP. (d) Mapping error for CLIVAR. (e) Changes in contemporary and (f) natural carbon column inventories between July 1995 and 2005 mapped from GLODAP stations. Station locations are shown in green (GLODAP) or magenta (CLIVAR). Both GLODAP and CLIVAR stations are plotted in (a). In (c), (d) and (f), thin dashed (negative) and solid (positive) contour lines are drawn in increments of 6 mol m−2. Thick contours mark 0 mol m−2.
  • Fig. 2. Summary of the best fitting linear models for the July 1995 GLODAP synoptic synthetic dataset. Background colors identify model size classes (1 to 8). (a) Relative frequency (FN/max(FN)) with which models are selected in each size class (minimum root-mean-square error, black bars) and overall (minimum AIC, white bars). Frequency is computed based on the number of model layers (FN) normalized to the most frequently identified model (max(FN)). (b) Same as (a) but for frequency weighted by the thickness of each layer (FD/max(FD)). (c) Models with the lowest AIC in each size class (black bars) and overall (white bars, red ticks on top and bottom x-axes) and each depth layer. Tick marks on the right show boundaries between model layers. Tick marks on top and bottom show model number (in steps of 5). The first model number of each size class is indicated, except for size classes 1 and 8 (number 1 and 255).
  • Fig. 3. Same as for Fig. 2 but using the July 2005 CLIVAR synoptic synthetic dataset.
  • Fig. 4. Summary of the frequency of occurrence of the variables in the formulae of the best fitting models in each size class given the 1995 GLODAP (a, b) or the 2005 CLIVAR (c, d) stations and the difference between the two cases (e, f). The color scale indicates the total number of times a variable is present summed over each best-fit formula across all size classes for each horizontal layer (vi, the maximum is 8 for each layer). (a, c, e) Relative frequency of occurrence of each variable integrated over all depth layers and normalized to the maximum
  • Fig. 5. (a) AIC values as a function of model number (strategy 2) and depth for the July GLODAP 1995 dataset. All models with AIC values within 10 % of the depth-specific range in AIC of the minimum AIC value at each depth (highlighted in magenta) are highlighted in black. Tick marks on the right show the vertical location of model layers. Corresponding vertical profiles of (b) the depth-specific range in AIC and (c) the minimum AIC values.
  • Fig. 6. Relative error (left y-axis) and absolute value (right y-axis) of the anthropogenic carbon inventory change calculated month-by-month between 1995 and 2005. (MOM4/TOPAZ, black) Inventory changes calculated from the “true” values on the original model grid, (ΔCmapping
  • Fig. 7. Relative (left y-axis) and absolute (right y-axis) error in the change in anthropogenic carbon inventory for strategy 2 (constant model structure for all layers) between July 1995 and July 2005 for all possible 255 first-order linear models. Hybrid results obtained by using combinations of regression models specific for 1995 GLODAP and 2005 CLIVAR sampling networks and projected either onto the 1995 GLODAP or the 2005 CLIVAR data (ΔChybrid
  • Fig. 8. (a) Absolute errors between the North Atlantic eMLR predicted inventory change, mapped from estimates at GLODAP stations, and the true inventory changes integrated on each horizontal model layer (Σh) and for each first-order regression model (strategy 2). (b) Vertical profiles of the layer inventory changes and (c) vertically integrated layer inventory change (from the bottom to the surface, Σv). The true, natural and contemporary (Cont.) layer inventory changes between July 1995 and July 2005 are shown, together with the “best AIC” composite solution and results from models Z100 and Z140 (dotted) and their merged products spliced at 1500 m (gray).
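
The model search summarized in Figs. 2 to 7 (255 first-order linear models grouped into size classes 1 to 8, ranked by AIC) can be sketched as follows. The count of 255 is consistent with all non-empty subsets of 8 candidate predictors (2^8 − 1). The sketch assumes the common least-squares form AIC = n ln(RSS/n) + 2k, which may differ from the exact expression used in the article. The synthetic data, the predictor indices and the helper names `fit_ols` and `aic_for` are hypothetical.

```python
import math
import random
from itertools import combinations

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (small, well-conditioned problems only)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def aic_for(cols, data, y):
    """AIC of a least-squares fit using an intercept plus the chosen predictor columns."""
    X = [[1.0] + [row[c] for c in cols] for row in data]
    coef = fit_ols(X, y)
    rss = sum((yi - sum(b * x for b, x in zip(coef, xi))) ** 2 for xi, yi in zip(X, y))
    n, k = len(y), len(coef)
    return n * math.log(rss / n) + 2 * k  # assumed AIC form, up to an additive constant

# Hypothetical synthetic data: 8 candidate predictors, of which only 0 and 3 matter.
rng = random.Random(0)
N_PRED = 8
data = [[rng.uniform(-1, 1) for _ in range(N_PRED)] for _ in range(120)]
y = [3.0 * r[0] - 2.0 * r[3] + rng.gauss(0, 0.1) for r in data]

# Score every non-empty subset of predictors.
scores = {cols: aic_for(cols, data, y)
          for size in range(1, N_PRED + 1)
          for cols in combinations(range(N_PRED), size)}

# Best model within each size class, and best model overall.
best_by_size = {size: min((c for c in scores if len(c) == size), key=scores.get)
                for size in range(1, N_PRED + 1)}
best_overall = min(scores, key=scores.get)
print(len(scores), best_by_size[2], best_overall)
```

Within a size class every model has the same number of parameters, so ranking by AIC there is equivalent to ranking by root-mean-square error, in line with the Fig. 2 caption. The AIC penalty only matters when comparing across size classes.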

Cite

Plancherel, Y., Rodgers, K. B., Key, R. M., Jacobson, A. R., & Sarmiento, J. L. (2013). Role of regression model selection and station distribution on the estimation of oceanic anthropogenic carbon change by eMLR. Biogeosciences, 10(7), 4801–4831. https://doi.org/10.5194/bg-10-4801-2013
