Data Parsing Functions

gcmprocpy provides a range of functions for data extraction and manipulation. Below are the key data parsing routines along with their detailed parameters and usage examples.

Note

For live examples with output, see the Data Exploration and Data Extraction notebooks.

Data Containers

These dataclasses are used throughout gcmprocpy to hold dataset metadata and extracted plot data.

class gcmprocpy.containers.PlotData(values: numpy.ndarray, variable_unit: str, variable_long_name: str, model: str, filename: str, levs: numpy.ndarray | None = None, lats: numpy.ndarray | None = None, lons: numpy.ndarray | None = None, mtime: list | None = None, mtime_values: list | None = None, selected_lat: float | None = None, selected_lon: float | None = None, selected_lev: float | None = None)[source]

Container for data returned by arr_* functions when plot_mode=True.

values

The extracted variable values (numpy array).

Type:

numpy.ndarray

variable_unit

The unit string after any conversion.

Type:

str

variable_long_name

The long descriptive name of the variable.

Type:

str

model

The model type (‘TIE-GCM’ or ‘WACCM-X’).

Type:

str

filename

The source dataset filename.

Type:

str

levs

Level/ilevel coordinate array (if applicable).

Type:

numpy.ndarray

lats

Latitude coordinate array (if applicable).

Type:

numpy.ndarray

lons

Longitude coordinate array (if applicable).

Type:

numpy.ndarray

mtime

Single model time as [day, hour, min, sec] (for single-time plots).

Type:

list

mtime_values

List of model times (for multi-time plots like lev_time, lat_time).

Type:

list

selected_lat

The latitude value used for selection (if applicable).

Type:

float

selected_lon

The longitude value used for selection (if applicable).

Type:

float

selected_lev

The level value used for selection (if applicable).

Type:

float
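The field layout above can be mirrored in a small stand-alone dataclass to see how the optional coordinate fields behave. This is a minimal sketch, not the library's class: PlotDataSketch and describe are hypothetical names, only a subset of the fields is reproduced, and plain lists stand in for numpy arrays so the block has no dependencies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlotDataSketch:
    """Hypothetical stand-in that mirrors a subset of the PlotData fields."""
    values: list
    variable_unit: str
    variable_long_name: str
    model: str
    filename: str
    levs: Optional[list] = None
    lats: Optional[list] = None
    lons: Optional[list] = None
    selected_lev: Optional[float] = None


def describe(pd: PlotDataSketch) -> str:
    """Build a short label, listing only the coordinate arrays that are set."""
    axes = [name for name in ("levs", "lats", "lons") if getattr(pd, name) is not None]
    return f"{pd.variable_long_name} [{pd.variable_unit}] over {', '.join(axes)}"


# A lat-lon slice: levs stays None because a single level was selected.
slice_2d = PlotDataSketch(
    values=[[900.0, 910.0], [905.0, 915.0]],  # made-up sample values
    variable_unit="K",
    variable_long_name="Neutral Temperature",
    model="TIE-GCM",
    filename="example.nc",  # placeholder filename
    lats=[-2.5, 2.5],
    lons=[0.0, 5.0],
    selected_lev=4.0,
)
label = describe(slice_2d)
print(label)
```

Checking a coordinate field against None before using it, as describe does, is one way for downstream code to handle results from different arr_* functions uniformly.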

Model Defaults

MODEL_DEFAULTS is a dictionary containing model-specific default variable names, species mappings, wind scale factors, and color scheme configurations for TIE-GCM and WACCM-X.

gcmprocpy.containers.MODEL_DEFAULTS = {'TIE-GCM': {'density': {'cmap': 'viridis', 'line_color': 'white', 'vars': ['NE', 'DEN', 'O2', 'O1', 'N2', 'NO', 'N4S', 'HE', 'OP', 'NMF2', 'TEC']}, 'electric': {'cmap': 'bwr', 'line_color': 'black', 'vars': ['POTEN']}, 'electron_density': 'NE', 'species': {'co2': 'CO2', 'h': 'H', 'ho2': 'HO2', 'hox': 'HOX', 'n2': 'N2', 'no': 'NO', 'no2': 'NO2', 'noz': 'NOZ', 'o': 'O1', 'o2': 'O2', 'o3': 'O3', 'oh': 'OH', 'ox': 'OX', 'temp': 'TN'}, 'temperature': 'TN', 'temperature_type': {'cmap': 'inferno', 'line_color': 'white', 'vars': ['TN', 'TE', 'TI', 'QJOULE']}, 'wind': {'cmap': 'bwr', 'line_color': 'black', 'vars': ['UN', 'VN', 'WN', 'UI_ExB', 'VI_ExB', 'WI_ExB']}, 'wind_scale': 0.01, 'wind_u': 'UN', 'wind_v': 'VN', 'wind_w': 'WN'}, 'WACCM-X': {'density': {'cmap': 'viridis', 'line_color': 'white', 'vars': ['EDens', 'OpDens', 'O2p', 'NOp', 'N2p', 'Op', 'ElecColDens', 'O3', 'NO', 'NO2', 'N2O', 'CO', 'CO2', 'CH4', 'H2O', 'HE', 'O', 'O2', 'N2', 'HNO3', 'NOY', 'CLOY', 'BROY']}, 'electric': {'cmap': 'bwr', 'line_color': 'black', 'vars': ['ED1', 'ED2', 'POTEN']}, 'electron_density': 'EDens', 'radiation': {'cmap': 'plasma', 'line_color': 'white', 'vars': ['FSDS', 'FSNS', 'FSNT', 'FLDS', 'FLNS', 'FLNT', 'FLUT', 'QRL_TOT', 'QRS_TOT', 'QRS_EUV', 'QRS_AUR', 'QTHERMAL', 'SWCF', 'LWCF']}, 'species': {'co2': 'CO2', 'h': 'H', 'ho2': 'HO2', 'hox': 'HOX', 'n2': 'N2', 'no': 'NO', 'no2': 'NO2', 'noz': 'NOZ', 'o': 'O', 'o2': 'O2', 'o3': 'O3', 'oh': 'OH', 'ox': 'OX', 'temp': 'T'}, 'temperature': 'T', 'temperature_type': {'cmap': 'inferno', 'line_color': 'white', 'vars': ['T', 'TREFHT', 'THETA']}, 'wind': {'cmap': 'bwr', 'line_color': 'black', 'vars': ['U', 'V', 'OMEGA', 'UTGW_TOTAL', 'VTGW_TOTAL']}, 'wind_scale': 1.0, 'wind_u': 'U', 'wind_v': 'V', 'wind_w': 'OMEGA'}}

Example:

Access default wind variable names for a model.

from gcmprocpy import MODEL_DEFAULTS

# TIE-GCM wind variables
print(MODEL_DEFAULTS['TIE-GCM']['wind_u'])  # 'UN'
print(MODEL_DEFAULTS['TIE-GCM']['wind_v'])  # 'VN'

# WACCM-X wind variables
print(MODEL_DEFAULTS['WACCM-X']['wind_u'])  # 'U'
print(MODEL_DEFAULTS['WACCM-X']['wind_v'])  # 'V'

# Species name mapping
print(MODEL_DEFAULTS['TIE-GCM']['species']['temp'])  # 'TN'
print(MODEL_DEFAULTS['WACCM-X']['species']['temp'])  # 'T'

# Wind unit scale factor (cm/s → m/s for TIE-GCM)
print(MODEL_DEFAULTS['TIE-GCM']['wind_scale'])  # 0.01
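To illustrate how the per-model defaults might be applied, the sketch below copies a few entries from the dictionary above into a local DEFAULTS_SUBSET and uses wind_scale to convert a raw wind value to m/s. It does not import gcmprocpy; DEFAULTS_SUBSET and wind_to_m_per_s are hypothetical names used for illustration only.

```python
# Local copy of a few MODEL_DEFAULTS entries so the block runs stand-alone.
DEFAULTS_SUBSET = {
    "TIE-GCM": {"wind_u": "UN", "wind_v": "VN", "wind_scale": 0.01},
    "WACCM-X": {"wind_u": "U", "wind_v": "V", "wind_scale": 1.0},
}


def wind_to_m_per_s(model: str, raw_value: float) -> float:
    """Scale a raw wind value by the model's wind_scale factor."""
    return raw_value * DEFAULTS_SUBSET[model]["wind_scale"]


# TIE-GCM winds are stored in cm/s, so the factor 0.01 converts them to m/s.
tiegcm_wind = wind_to_m_per_s("TIE-GCM", 5000.0)
# WACCM-X winds use a factor of 1.0, so the value is unchanged.
waccmx_wind = wind_to_m_per_s("WACCM-X", 50.0)
print(tiegcm_wind, waccmx_wind)
```

Keeping the scale factor next to the variable names lets the same code path handle both models without branching on the model type.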

Species Name Lookup

gcmprocpy.containers.get_species_names(model)[source]

Return species name mapping for a model type.

Uses MODEL_DEFAULTS as the single source of truth for mapping canonical role names to dataset variable names.

Parameters:

model (str) – Model type ('TIE-GCM' or 'WACCM-X').

Returns:

Mapping from canonical names (e.g. 'temp', 'o', 'o2') to dataset variable names (e.g. 'TN', 'O1', 'O2').

Return type:

dict

Raises:

ValueError – If model is not recognized.

Example:

Get species variable names for a specific model.

from gcmprocpy import get_species_names

sp = get_species_names('TIE-GCM')
print(sp['temp'])  # 'TN'
print(sp['o'])     # 'O1'
print(sp['o2'])    # 'O2'

sp = get_species_names('WACCM-X')
print(sp['temp'])  # 'T'
print(sp['o'])     # 'O'
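The lookup behaviour described above, including the ValueError for an unrecognized model, can be sketched without the library. This is an assumption-laden illustration rather than the actual implementation: SPECIES holds only three of the documented entries per model, and species_names_sketch is a hypothetical name.

```python
# Subset of the documented species mappings, copied locally for the sketch.
SPECIES = {
    "TIE-GCM": {"temp": "TN", "o": "O1", "o2": "O2"},
    "WACCM-X": {"temp": "T", "o": "O", "o2": "O2"},
}


def species_names_sketch(model: str) -> dict:
    """Return a copy of the canonical-name mapping, or raise for unknown models."""
    try:
        return dict(SPECIES[model])  # copy so callers cannot mutate the defaults
    except KeyError:
        raise ValueError(f"Unknown model type: {model!r}") from None


tiegcm = species_names_sketch("TIE-GCM")
try:
    species_names_sketch("UNKNOWN-MODEL")
    raised = False
except ValueError:
    raised = True
print(tiegcm["o"], raised)
```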

Data Exploration

Listing Dimensions

This function reads all the datasets and returns the unique dimensions present.

gcmprocpy.data_parse.dim_list(datasets)[source]

Retrieves a sorted list of unique dimension names across all datasets.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of unique dimension names across all datasets.

Return type:

list

Example:

Load datasets and list unique dimensions.

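import gcmprocpy as gy

# directory and dataset_filter are placeholders for the data location and filename filter
datasets = gy.load_datasets(directory, dataset_filter)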
dims = gy.dim_list(datasets)
print(dims)
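Collecting unique dimension names across a list of (dataset, filename) tuples can be sketched as follows. This is not the library's code: dim_list_sketch is a hypothetical name, and SimpleNamespace objects with a dims mapping stand in for xarray datasets so the block runs without any data files.

```python
from types import SimpleNamespace


def dim_list_sketch(datasets):
    """Return the sorted union of dimension names over all datasets."""
    dims = set()
    for ds, _filename in datasets:
        dims.update(ds.dims)  # iterating a mapping yields its keys
    return sorted(dims)


# Stand-in datasets: one on lev, one on ilev (sizes are made up).
fake_datasets = [
    (SimpleNamespace(dims={"time": 24, "lev": 57, "lat": 72, "lon": 144}), "a.nc"),
    (SimpleNamespace(dims={"time": 24, "ilev": 58, "lat": 72, "lon": 144}), "b.nc"),
]
dims = dim_list_sketch(fake_datasets)
print(dims)
```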

Listing Variables

This function reads all the datasets and returns the variables listed in them.

gcmprocpy.data_parse.var_list(datasets)[source]

Reads all the datasets and returns the variables listed in them.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of variable entries in the datasets.

Return type:

list

Example:

Load datasets and list unique variables.

datasets = gy.load_datasets(directory, dataset_filter)
variables = gy.var_list(datasets)  # avoid shadowing the built-in vars()
print(variables)

Listing Timestamps

This function compiles and returns a list of all timestamps present in the provided datasets.

gcmprocpy.data_parse.time_list(datasets)[source]

Compiles and returns a list of all timestamps present in the provided datasets. This function is particularly useful for aggregating time data from multiple sources.

Parameters:

datasets (list of tuples) – Each tuple in the list contains an xarray dataset and its corresponding filename. The function will iterate through each dataset to gather timestamps.

Returns:

A list containing all the datetime64 timestamps found in the datasets.

Return type:

list of np.datetime64

Example:

Load datasets and list unique timestamps.

datasets = gy.load_datasets(directory, dataset_filter)
times = gy.time_list(datasets)
print(times)

Listing Levels

This function reads all the datasets and returns the unique lev and ilev entries in sorted order.

gcmprocpy.data_parse.level_list(datasets, log_level=True)[source]

Reads all the datasets and returns the unique lev and ilev entries in sorted order.

Parameters:
  • datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

  • log_level (bool) – A flag indicating whether to display level in log values. Default is True.

Returns:

A sorted list of unique lev and ilev entries from the datasets.

Return type:

lev_ilevs (list)

Example:

Load datasets and list unique lev and ilev entries.

datasets = gy.load_datasets(directory, dataset_filter)
lev_ilevs = gy.level_list(datasets)
print(lev_ilevs)

Listing Longitudes

This function reads all the datasets and returns the unique longitude (lon) entries in sorted order.

gcmprocpy.data_parse.lon_list(datasets)[source]

Reads all the datasets and returns the unique longitude (lon) entries in sorted order.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of unique longitude entries from the datasets.

Return type:

list

Example:

Load datasets and list unique longitude entries.

datasets = gy.load_datasets(directory, dataset_filter)
lons = gy.lon_list(datasets)
print(lons)

Listing Latitudes

This function reads all the datasets and returns the unique latitude (lat) entries in sorted order.

gcmprocpy.data_parse.lat_list(datasets)[source]

Reads all the datasets and returns the unique latitude (lat) entries in sorted order.

Parameters:

datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

Returns:

A sorted list of unique latitude entries from the datasets.

Return type:

list

Example:

Load datasets and list unique latitude entries.

datasets = gy.load_datasets(directory, dataset_filter)
lats = gy.lat_list(datasets)
print(lats)

Variable Information

This function provides detailed information about a specific variable in the datasets.

gcmprocpy.data_parse.var_info(datasets, variable_name)[source]

Retrieves the attributes and dimension information of a specified variable from all datasets.

Parameters:
  • datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

  • variable_name (str) – The name of the variable to retrieve attributes for.

Returns:

A dictionary where keys are filenames and values are dictionaries of attributes for the specified variable.

Return type:

dict

Example:

Load datasets and get information about a specific variable.

datasets = gy.load_datasets(directory, dataset_filter)
info = gy.var_info(datasets, 'variable_name')
print(info)

Dimension Information

This function provides detailed information about a specific dimension in the datasets.

gcmprocpy.data_parse.dim_info(datasets, dimension)[source]

Retrieves information about a specified dimension’s size across all datasets.

Parameters:
  • datasets (list of tuples) – A list of tuples, where each tuple contains an xarray dataset and its filename.

  • dimension (str) – The name of the dimension to retrieve information for.

Returns:

A dictionary where keys are filenames and values are the size of the specified dimension.

If the dimension does not exist in a dataset, the value is None.

Return type:

dict

Example:

Load datasets and get information about a specific dimension.

datasets = gy.load_datasets(directory, dataset_filter)
info = gy.dim_info(datasets, 'dimension_name')
print(info)
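The return shape described above, a dictionary keyed by filename with None for datasets that lack the dimension, can be sketched as follows. As before, dim_info_sketch is a hypothetical name and SimpleNamespace objects stand in for xarray datasets; the sizes are made up.

```python
from types import SimpleNamespace


def dim_info_sketch(datasets, dimension):
    """Map each filename to the size of `dimension`, or None if it is absent."""
    return {filename: ds.dims.get(dimension) for ds, filename in datasets}


fake_datasets = [
    (SimpleNamespace(dims={"time": 24, "lev": 57}), "a.nc"),
    (SimpleNamespace(dims={"time": 24, "ilev": 58}), "b.nc"),
]
info = dim_info_sketch(fake_datasets, "lev")
print(info)
```

A None entry is a quick way to spot files written on a different vertical grid before calling an extraction function on them.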

Data Xarrays

Selected Time

This function extracts and processes data for a given variable at a specific time from multiple datasets. It also handles unit conversion and provides additional information if needed for plotting.

gcmprocpy.data_parse.arr_var(datasets, variable_name, time, selected_unit=None, log_level=True, plot_mode=False)
Example:

Extract all level data for a variable at a specific time.

datasets = gy.load_datasets(directory, dataset_filter)
time_value = '2022-01-01T12:00:00'

# Get raw xarray DataArray
data = gy.arr_var(datasets, 'TN', time=time_value)
print(data.shape)  # (nlev, nlat, nlon)

# Get PlotData object with metadata
result = gy.arr_var(datasets, 'TN', time=time_value, plot_mode=True)
print(result.variable_unit, result.variable_long_name)

# Using model time (TIE-GCM)
data = gy.arr_var(datasets, 'TN', mtime=[360, 0, 0, 0])

Selected Time, Level

This function extracts data from the dataset based on the specified variable, time, and level (lev/ilev).

gcmprocpy.data_parse.arr_lat_lon(datasets, variable_name, time, selected_lev_ilev=None, selected_unit=None, plot_mode=False)
Example:

Extract a latitude-longitude slice at a specific time and pressure level.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (lat x lon)
data = gy.arr_lat_lon(datasets, 'TN', time='2022-01-01T12:00:00', selected_lev_ilev=4.0)
print(data.shape)  # (nlat, nlon)

# PlotData object for use with custom plotting
result = gy.arr_lat_lon(datasets, 'TN', time='2022-01-01T12:00:00',
                        selected_lev_ilev=4.0, plot_mode=True)
print(result.lats, result.lons, result.values.shape)

# Using model time (TIE-GCM)
data = gy.arr_lat_lon(datasets, 'TN', mtime=[360, 0, 0, 0], selected_lev_ilev=4.0)

# Specify level as height in km
data = gy.arr_lat_lon(datasets, 'TN', time='2022-01-01T12:00:00',
                      selected_lev_ilev=300.0, level_type='height')

Batch Selected Time, Level (Multiple Variables)

This function extracts multiple variables at once for a given time and level, reducing redundant dataset traversal.

gcmprocpy.data_parse.batch_arr_lat_lon(datasets, variable_names, time, selected_lev_ilev=None, selected_unit=None, plot_mode=False)
Example:

Load datasets and extract multiple variables in a single pass.

datasets = gy.load_datasets(directory, dataset_filter)
time_value = '2022-01-01T12:00:00'
results = gy.batch_arr_lat_lon(datasets, ['TN', 'O1', 'NO'], time=time_value,
                               selected_lev_ilev=4.0, plot_mode=True)
for name, result in results.items():
    print(f'{name}: {result.values.shape}')

Selected Time, Latitude, Longitude

This function extracts data from the dataset for a given variable name, latitude, longitude, and time.

gcmprocpy.data_parse.arr_lev_var(datasets, variable_name, time, selected_lat, selected_lon, selected_unit=None, log_level=True, plot_mode=False)
Example:

Extract a vertical profile at a specific latitude, longitude, and time.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (1D array of values at each level)
data = gy.arr_lev_var(datasets, 'TN', selected_lat=30.0,
                      time='2022-01-01T12:00:00', selected_lon=45.0)

# PlotData object with level information
result = gy.arr_lev_var(datasets, 'TN', selected_lat=30.0,
                        time='2022-01-01T12:00:00', selected_lon=45.0,
                        plot_mode=True)
print(result.levs, result.values)

# Using local time instead of longitude
data = gy.arr_lev_var(datasets, 'TN', selected_lat=0.0,
                      time='2022-01-01T12:00:00', local_time=12.0)

Variable vs Latitude (Meridional 1D)

This function extracts a 1D meridional profile of a variable along latitude at a fixed pressure level and longitude (or zonal mean).

gcmprocpy.data_parse.arr_var_lat(datasets, variable_name, time, selected_lev_ilev, selected_lon, selected_unit=None, plot_mode=False)
Example:

Extract a 1D meridional slice at a specific level, time, and longitude.

datasets = gy.load_datasets(directory, dataset_filter)

# PlotData object with 1D values aligned to latitudes
result = gy.arr_var_lat(datasets, 'TN',
                        time='2022-01-01T12:00:00',
                        selected_lev_ilev=4.0, selected_lon=30.0,
                        plot_mode=True)
print(result.lats, result.values)

# Zonal mean across all longitudes
result = gy.arr_var_lat(datasets, 'TN',
                        time='2022-01-01T12:00:00',
                        selected_lev_ilev=4.0, selected_lon='mean',
                        plot_mode=True)

Variable vs Longitude (Zonal 1D)

This function extracts a 1D zonal profile of a variable along longitude at a fixed pressure level and latitude (or meridional mean).

gcmprocpy.data_parse.arr_var_lon(datasets, variable_name, time, selected_lev_ilev, selected_lat, selected_unit=None, plot_mode=False)
Example:

Extract a 1D zonal slice at a specific level, time, and latitude.

datasets = gy.load_datasets(directory, dataset_filter)

# PlotData object with 1D values aligned to longitudes
result = gy.arr_var_lon(datasets, 'TN',
                        time='2022-01-01T12:00:00',
                        selected_lev_ilev=4.0, selected_lat=2.5,
                        plot_mode=True)
print(result.lons, result.values)

# Meridional mean across all latitudes
result = gy.arr_var_lon(datasets, 'TN',
                        time='2022-01-01T12:00:00',
                        selected_lev_ilev=4.0, selected_lat='mean',
                        plot_mode=True)

# Area-weighted meridional mean (cos-lat) — see note below
result = gy.arr_var_lon(datasets, 'TN',
                        time='2022-01-01T12:00:00',
                        selected_lev_ilev=4.0, selected_lat='wmean',
                        plot_mode=True)

Note

'mean' vs 'wmean'. Anywhere a selected_lat / selected_lon accepts 'mean' (an unweighted average over that axis), it also accepts 'wmean' for a cos(lat) area-weighted average. Weighting only changes the result when latitude is the collapsed axis, because cells around a longitude circle are equal-area. 'wmean' therefore matters for meridional and global means, where a plain mean over-weights the poles, and is identical to 'mean' for a zonal (longitude) mean. For a true global mean use selected_lat='wmean', selected_lon='mean' in arr_lev_var().
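The difference between the two averages can be seen on a toy latitude profile. The sketch below is plain Python and does not call gcmprocpy; plain_mean and cos_lat_weighted_mean are hypothetical helper names, and the field (the absolute latitude) is made up so that values grow toward the poles.

```python
import math


def plain_mean(values):
    """Unweighted average over the latitude axis."""
    return sum(values) / len(values)


def cos_lat_weighted_mean(values, lats_deg):
    """Average weighted by cos(latitude), proportional to grid-cell area."""
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# 5-degree cell centres from -87.5 to 87.5.
lats = [-87.5 + 5.0 * i for i in range(36)]
# Toy field that increases toward the poles.
field = [abs(lat) for lat in lats]

unweighted = plain_mean(field)
weighted = cos_lat_weighted_mean(field, lats)
print(unweighted, weighted)
```

Because polar cells cover less area, the weighted result is smaller than the unweighted one for this field, while a constant field gives the same answer either way.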

Selected Time Latitude

This function extracts and processes data from the dataset based on a specific variable, time, and latitude.

gcmprocpy.data_parse.arr_lev_lon(datasets, variable_name, time, selected_lat, selected_unit=None, log_level=True, plot_mode=False)
Example:

Extract a level-longitude cross section at a specific latitude and time.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (nlev x nlon)
data = gy.arr_lev_lon(datasets, 'TN', selected_lat=30.0,
                      time='2022-01-01T12:00:00')
print(data.shape)

# PlotData object for custom contour plotting
result = gy.arr_lev_lon(datasets, 'TN', selected_lat=30.0,
                        time='2022-01-01T12:00:00', plot_mode=True)
print(result.levs, result.lons, result.values.shape)

Selected Time, Longitude

This function extracts data from a dataset based on the specified variable name, time, and longitude.

gcmprocpy.data_parse.arr_lev_lat(datasets, variable_name, time, selected_lon, selected_unit=None, log_level=True, plot_mode=False)
Example:

Extract a level-latitude cross section at a specific longitude and time.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (nlev x nlat)
data = gy.arr_lev_lat(datasets, 'TN', time='2022-01-01T12:00:00',
                      selected_lon=45.0)
print(data.shape)

# PlotData object
result = gy.arr_lev_lat(datasets, 'TN', time='2022-01-01T12:00:00',
                        selected_lon=45.0, plot_mode=True)
print(result.levs, result.lats, result.values.shape)

Selected Latitude, Longitude Over Time-range

This function extracts and processes data across all levels and times for a given latitude and longitude, combining results from multiple datasets.

gcmprocpy.data_parse.arr_lev_time(datasets, variable_name, selected_lat, selected_lon, selected_unit=None, log_level=True, plot_mode=False)
Example:

Extract a level-time cross section at a specific latitude and longitude.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (nlev x ntime)
data = gy.arr_lev_time(datasets, 'TN', selected_lat=30.0, selected_lon=45.0)
print(data.shape)

# With time range filter
data = gy.arr_lev_time(datasets, 'TN', selected_lat=30.0, selected_lon=45.0,
                       time_minimum='2022-01-01T00:00:00',
                       time_maximum='2022-01-02T00:00:00')

# PlotData object
result = gy.arr_lev_time(datasets, 'TN', selected_lat=30.0, selected_lon=45.0,
                         plot_mode=True)
print(result.levs, result.mtime_values, result.values.shape)

Selected Level, Longitude Over Time-range

This function extracts and processes data from the dataset based on the specified variable name, longitude, and level/ilev.

gcmprocpy.data_parse.arr_lat_time(datasets, variable_name, selected_lon, selected_lev_ilev=None, selected_unit=None, plot_mode=False)
Example:

Extract a latitude-time cross section at a specific level and longitude.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (nlat x ntime)
data = gy.arr_lat_time(datasets, 'TN', selected_lev_ilev=4.0,
                       selected_lon=45.0)
print(data.shape)

# With time range filter
data = gy.arr_lat_time(datasets, 'TN', selected_lev_ilev=4.0,
                       selected_lon=45.0,
                       time_minimum='2022-01-01T00:00:00',
                       time_maximum='2022-01-02T00:00:00')

# PlotData object
result = gy.arr_lat_time(datasets, 'TN', selected_lev_ilev=4.0,
                         selected_lon=45.0, plot_mode=True)
print(result.lats, result.mtime_values, result.values.shape)

# Specify level as height in km
data = gy.arr_lat_time(datasets, 'TN', selected_lev_ilev=300.0,
                       selected_lon=0.0, level_type='height')

Selected Level, Latitude Over Time-range

This function extracts and processes data from the dataset based on the specified variable name, latitude, and level/ilev. Returns a 2D array of longitudes x time.

gcmprocpy.data_parse.arr_lon_time(datasets, variable_name, selected_lat, selected_lev_ilev=None, selected_unit=None, plot_mode=False)
Example:

Extract a longitude-time cross section at a specific level and latitude.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (nlon x ntime)
data = gy.arr_lon_time(datasets, 'TN', selected_lat=0.0,
                       selected_lev_ilev=4.0)
print(data.shape)

# PlotData object
result = gy.arr_lon_time(datasets, 'TN', selected_lat=0.0,
                         selected_lev_ilev=4.0, plot_mode=True)
print(result.lons, result.mtime_values, result.values.shape)

# Specify level as height in km
data = gy.arr_lon_time(datasets, 'TN', selected_lat=0.0,
                       selected_lev_ilev=250.0, level_type='height')

Selected Level, Latitude, Longitude Over Time-range

This function extracts a 1D time series of a variable at a specific latitude, longitude, and level/ilev.

gcmprocpy.data_parse.arr_var_time(datasets, variable_name, selected_lat, selected_lon, selected_lev_ilev=None, selected_unit=None, plot_mode=False)
Example:

Extract a time series at a specific latitude, longitude, and level.

datasets = gy.load_datasets(directory, dataset_filter)

# Raw xarray DataArray (1D time series)
data = gy.arr_var_time(datasets, 'TN', selected_lat=0.0, selected_lon=45.0,
                       selected_lev_ilev=4.0)
print(data.shape)

# PlotData object
result = gy.arr_var_time(datasets, 'TN', selected_lat=0.0, selected_lon=45.0,
                         selected_lev_ilev=4.0, plot_mode=True)
print(result.mtime_values, result.values)

# Specify level as height in km
data = gy.arr_var_time(datasets, 'TN', selected_lat=0.0, selected_lon=45.0,
                       selected_lev_ilev=300.0, level_type='height')

Satellite Track Interpolation

This function interpolates model data along a satellite trajectory using trilinear interpolation (time, latitude, longitude). Input is three arrays of equal length representing the satellite’s position at each point along its orbit.

gcmprocpy.data_parse.arr_sat_track(datasets, variable_name, sat_time, sat_lat, sat_lon, selected_lev_ilev=None, selected_unit=None, plot_mode=False)[source]

Interpolates model data along a satellite trajectory.

Takes arrays of satellite time/lat/lon points and interpolates the model field to those locations using xarray’s built-in interpolation.

Parameters:
  • datasets (list[ModelDataset]) – Loaded model datasets.

  • variable_name (str) – The name of the variable to extract.

  • sat_time (array-like) – Satellite timestamps as numpy datetime64 values.

  • sat_lat (array-like) – Satellite latitudes in degrees.

  • sat_lon (array-like) – Satellite longitudes in degrees.

  • selected_lev_ilev (Union[float, str, None]) – Level value to extract at, ‘mean’ to average over all levels, or None to return all levels.

  • selected_unit (str, optional) – Desired unit for the variable.

  • plot_mode (bool, optional) – If True, returns a PlotData object.

Returns:

If selected_lev_ilev is given: 1D array of shape (n_points,). If selected_lev_ilev is None: 2D array of shape (n_levels, n_points). If plot_mode is True, returns a PlotData object.

Return type:

Union[numpy.ndarray, PlotData]

Example:

Interpolate temperature along a satellite track.

import numpy as np
datasets = gy.load_datasets(directory, dataset_filter)
times = gy.time_list(datasets)

sat_time = np.array([times[0] + np.timedelta64(i * 6, 'm') for i in range(20)])
sat_lat = np.linspace(-60, 60, 20)
sat_lon = np.linspace(-120, 120, 20)

# 1D array at a specific level
values = gy.arr_sat_track(datasets, 'TN', sat_time, sat_lat, sat_lon, selected_lev_ilev=5.0)

# 2D array (levels x track points)
values = gy.arr_sat_track(datasets, 'TN', sat_time, sat_lat, sat_lon)

# PlotData object
result = gy.arr_sat_track(datasets, 'TN', sat_time, sat_lat, sat_lon, selected_lev_ilev=5.0, plot_mode=True)
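For readers unfamiliar with trilinear interpolation, the idea can be sketched on a tiny regular grid in plain Python. This is not how the library does it (the docstring above says it relies on xarray's built-in interpolation). The names trilinear and _bracket are hypothetical, time is a plain number of minutes rather than datetime64, and longitude wrap-around is ignored.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (lower index, fractional position) of x on a sorted axis."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    frac = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, frac


def trilinear(field, times, lats, lons, t, lat, lon):
    """Interpolate field[time][lat][lon] at a single (t, lat, lon) point."""
    it, ft = _bracket(times, t)
    ia, fa = _bracket(lats, lat)
    io, fo = _bracket(lons, lon)
    total = 0.0
    # Weighted sum over the 8 corners of the enclosing grid cell.
    for dt, wt in ((0, 1 - ft), (1, ft)):
        for da, wa in ((0, 1 - fa), (1, fa)):
            for do, wo in ((0, 1 - fo), (1, fo)):
                total += wt * wa * wo * field[it + dt][ia + da][io + do]
    return total


times = [0.0, 60.0]          # minutes (made-up axis)
lats = [-10.0, 0.0, 10.0]
lons = [0.0, 20.0, 40.0]
# A field that is linear in each coordinate, so interpolation should be exact.
field = [[[t + 2.0 * la + 3.0 * lo for lo in lons] for la in lats] for t in times]

# Two track points given as (time, lat, lon).
track = [(15.0, -5.0, 10.0), (45.0, 2.5, 30.0)]
values = [trilinear(field, times, lats, lons, *point) for point in track]
print(values)
```

Points outside the grid are extrapolated from the nearest cell in this sketch; a real satellite-track workflow would need to decide how to treat them.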

Data Utilities

mTime to Time

This function searches for a specific time in a dataset based on the provided model time (mtime) and returns the corresponding np.datetime64 time value. It iterates through multiple datasets to find a match.

gcmprocpy.data_parse.get_time(datasets, mtime)[source]

Searches for a specific time in a dataset based on the provided model time (mtime) and returns the corresponding np.datetime64 time value. It iterates through multiple datasets to find a match.

Parameters:
  • datasets (list[tuple]) – Each tuple contains an xarray dataset and its filename. The function will search each dataset for the time value.

  • mtime (list[int]) – Model time represented as a list of integers in the format [day, hour, minute].

Returns:

The corresponding datetime value in the dataset for the given mtime. Returns None if no match is found.

Return type:

np.datetime64

Example:

Convert a model time (mtime) to a datetime value.

datasets = gy.load_datasets(directory, dataset_filter)

# TIE-GCM model time: [Day, Hour, Min, Sec]
mtime = [360, 0, 0, 0]
time = gy.get_time(datasets, mtime)
print(time)  # np.datetime64 value
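The search described above, iterating over datasets until a matching mtime is found and returning None otherwise, can be sketched with plain lists. Everything here is an assumption made for illustration: get_time_sketch is a hypothetical name, each dataset is reduced to a list of (timestamp, mtime) pairs, timestamps are strings, and the sample values are made up.

```python
def get_time_sketch(datasets, mtime):
    """Return the first timestamp whose mtime equals `mtime`, else None."""
    target = list(mtime)
    for records, _filename in datasets:
        for timestamp, candidate in records:
            if list(candidate) == target:
                return timestamp
    return None


# Each dataset is a list of (timestamp, [day, hour, minute]) pairs plus a filename.
fake_datasets = [
    ([("2022-12-25T23:00", [359, 23, 0])], "a.nc"),
    ([("2022-12-26T00:00", [360, 0, 0]), ("2022-12-26T01:00", [360, 1, 0])], "b.nc"),
]
found = get_time_sketch(fake_datasets, [360, 0, 0])
missing = get_time_sketch(fake_datasets, [1, 0, 0])
print(found, missing)
```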

Time to mTime

This function finds and returns the model time (mtime) array that corresponds to a specific time in a dataset. The mtime is an array representing [Day, Hour, Min].

gcmprocpy.data_parse.get_mtime(ds, time)[source]

Finds and returns the model time (mtime) array that corresponds to a specific time in a dataset. The mtime is an array representing [Day, Hour, Min].

Parameters:
  • ds (xarray.Dataset) – The dataset opened using xarray, containing time and mtime data.

  • time (Union[str, numpy.datetime64]) – The timestamp for which the corresponding mtime is to be found.

Returns:

The mtime array containing [Day, Hour, Min] for the given timestamp.

Returns None if no corresponding mtime is found.

Return type:

numpy.ndarray

Example:

Convert a datetime string to model time (mtime).

datasets = gy.load_datasets(directory, dataset_filter)

# get_mtime takes a single xarray dataset, so unpack one (dataset, filename) tuple
ds, filename = datasets[0]

# Get mtime for a specific datetime
mtime = gy.get_mtime(ds, '2022-01-01T12:00:00')
print(mtime)  # e.g., [1, 12, 0] for Day 1, 12:00

# Use with time_list to convert the first few times;
# None means the time is not present in this dataset
times = gy.time_list(datasets)
for t in times[:5]:
    mt = gy.get_mtime(ds, t)
    print(f'{t} -> mtime {mt}')

Data Caching

All arr_* data extraction functions (and derived-variable handlers) are transparently memoized by a bounded LRU cache. Repeated calls with the same (datasets, variable, time, level, ...) tuple return the cached result in O(1), which speeds up timeline scrubbing, re-plotting, and composite plots that extract the same field multiple times.

The cache is keyed on the Python identity (id) of the datasets list plus all positional and keyword arguments (lists are normalized to tuples so batch calls cache correctly). Unhashable arguments (e.g. raw numpy arrays in arr_sat_track) transparently bypass the cache.

gcmprocpy.containers.clear_data_cache()[source]

Drop all cached results. Call on dataset reload.

The default cache holds up to 128 entries and evicts least-recently-used results. Call clear_data_cache() after reloading datasets or otherwise mutating them in place, so that stale results don’t leak across sessions. The GUI does this automatically on dataset reload.

clear_derived_cache is kept as a backwards-compatible alias for clear_data_cache.

Example:

Invalidate the cache after reloading datasets.

from gcmprocpy import clear_data_cache, load_datasets

datasets = load_datasets(directory, dataset_filter)
# ... use datasets ...

# Reload from disk — drop stale cached extractions
datasets = load_datasets(directory, dataset_filter)
clear_data_cache()

Height Interpolation

gcmprocpy supports converting between pressure levels and geometric height (km) using the model’s height variable (ZG for TIE-GCM, Z3 for WACCM-X). This enables specifying levels as heights and plotting vertical axes in km instead of pressure coordinates.

Height to Pressure Level

This function converts a target height in km to the nearest pressure level by looking up the model’s geometric height field (ZG or Z3).

gcmprocpy.data_parse.height_to_pres_level(datasets, time, target_height_km, latitude=None, longitude=None)[source]

Convert a target height (km) to the nearest pressure level.

Finds the pressure level whose average geometric height is closest to the requested height. Optionally narrows to a specific lat/lon.

Parameters:
  • datasets – Loaded datasets.

  • time – Timestamp for height lookup.

  • target_height_km (float) – Desired height in km.

  • latitude (float, optional) – Latitude to evaluate height at.

  • longitude (float, optional) – Longitude to evaluate height at.

Returns:

The pressure level (lev or ilev value) closest to target_height_km.

Return type:

float

Example:

Find the pressure level closest to 300 km altitude.

datasets = gy.load_datasets(directory, dataset_filter)
time = '2022-01-01T12:00:00'

# Global average — find the level whose mean height is closest to 300 km
pres_level = gy.height_to_pres_level(datasets, time, 300.0)
print(f'300 km ≈ pressure level {pres_level}')

# At a specific location
pres_level = gy.height_to_pres_level(datasets, time, 300.0, latitude=0.0, longitude=45.0)
print(f'300 km at equator, 45°E ≈ pressure level {pres_level}')
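The nearest-level lookup described above can be sketched in a few lines: average the geometric height on each pressure level, then pick the level whose mean height is closest to the target. The function name nearest_level_by_height and all sample heights are hypothetical; the real function reads ZG or Z3 from the datasets.

```python
def nearest_level_by_height(levs, heights_km, target_km):
    """Pick the level whose mean height is closest to target_km.

    heights_km[i] holds the heights of level i across the horizontal grid.
    """
    mean_heights = [sum(row) / len(row) for row in heights_km]
    best = min(range(len(levs)), key=lambda i: abs(mean_heights[i] - target_km))
    return levs[best]


levs = [-2.0, 0.0, 2.0, 4.0]
# Made-up heights at two grid points per level (km).
heights_km = [[118.0, 122.0], [150.0, 160.0], [215.0, 235.0], [290.0, 320.0]]

level_300 = nearest_level_by_height(levs, heights_km, 300.0)
level_150 = nearest_level_by_height(levs, heights_km, 150.0)
print(level_300, level_150)
```

Passing a single column of heights instead of the full grid corresponds to narrowing the lookup to a specific latitude and longitude.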

Interpolate to Height

This function interpolates a 2D field from pressure levels to constant height surfaces using the model’s geometric height field. Supports both linear and exponential (log) interpolation.

gcmprocpy.data_parse.interpolate_to_height(datasets, variable_values, levs, time, target_heights=None, n_heights=50, log_interp=False)[source]

Interpolate a field from pressure levels to constant height surfaces.

Parameters:
  • datasets – Loaded datasets (to access ZG/Z3).

  • variable_values (np.ndarray) – 2D array (nlev, nlat) or (nlev, nlon) on pressure levels.

  • levs (np.ndarray) – Pressure level coordinate values matching axis 0 of variable_values.

  • time – Timestamp for height field lookup.

  • target_heights (np.ndarray, optional) – Desired height levels in km. If None, auto-generates n_heights levels spanning the data range.

  • n_heights (int) – Number of height levels if target_heights is None.

  • log_interp (bool) – If True, use exponential interpolation (for densities).

Returns:

(interpolated_values, target_heights_km)

interpolated_values: 2D array (n_heights, n_spatial) target_heights_km: 1D array of height levels in km

Return type:

tuple

Example:

Interpolate a latitude-altitude cross section from pressure to height coordinates.

import numpy as np
datasets = gy.load_datasets(directory, dataset_filter)
time = '2022-01-01T12:00:00'

# Extract lev vs lat data on pressure levels
result = gy.arr_lev_lat(datasets, 'TN', time, selected_lon=0.0, plot_mode=True)

# Interpolate to 40 evenly spaced height levels
interp_values, heights_km = gy.interpolate_to_height(
    datasets, result.values, result.levs, time, n_heights=40)
print(f'Height range: {heights_km[0]:.1f} to {heights_km[-1]:.1f} km')
print(f'Interpolated shape: {interp_values.shape}')

# Interpolate to specific heights
target_heights = np.array([100, 200, 300, 400, 500])
interp_values, _ = gy.interpolate_to_height(
    datasets, result.values, result.levs, time, target_heights=target_heights)

# Use exponential interpolation for density-like variables
ne_result = gy.arr_lev_lat(datasets, 'NE', time, selected_lon=0.0, plot_mode=True)
interp_ne, heights = gy.interpolate_to_height(
    datasets, ne_result.values, ne_result.levs, time, log_interp=True)
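Why log_interp is suggested for densities can be seen on a single vertical column. The sketch below compares linear interpolation with interpolation that is linear in log(value) on an exponentially decaying profile. It is a plain-Python illustration with hypothetical names (interp_column) and made-up numbers; returning NaN outside the height range is a choice of this sketch, not documented library behaviour.

```python
import math


def interp_column(heights_km, values, target_km, log_interp=False):
    """Interpolate one column to target_km; heights_km must be increasing."""
    for i in range(len(heights_km) - 1):
        z0, z1 = heights_km[i], heights_km[i + 1]
        if z0 <= target_km <= z1:
            frac = (target_km - z0) / (z1 - z0)
            v0, v1 = values[i], values[i + 1]
            if log_interp:
                # Linear in log(value): exact for exponential profiles.
                return math.exp(math.log(v0) + frac * (math.log(v1) - math.log(v0)))
            return v0 + frac * (v1 - v0)
    return float("nan")  # target lies outside the column's height range


heights = [100.0, 200.0, 300.0]
# Exponential profile with a 50 km scale height (made-up values).
density = [1.0e12 * math.exp(-(z - 100.0) / 50.0) for z in heights]
true_at_150 = 1.0e12 * math.exp(-1.0)

linear_est = interp_column(heights, density, 150.0)
log_est = interp_column(heights, density, 150.0, log_interp=True)
print(linear_est / true_at_150, log_est / true_at_150)
```

On this profile the linear estimate overshoots the true value, while the logarithmic one recovers it to rounding error, which is the motivation for using exponential interpolation on density-like variables.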

Height in Plot Functions

All plot functions that accept a level parameter also accept level_type to specify whether the level value is a pressure level (default) or a height in km. When level_type='height', the height is automatically converted to the nearest pressure level using the model’s geometric height field (ZG for TIE-GCM, Z3 for WACCM-X).

All level-axis plots (plt_lev_var, plt_lev_lon, plt_lev_lat, plt_lev_time) also accept y_axis='height' to display the vertical axis in km instead of pressure coordinates.

Example:

Specify a level as height instead of pressure.

datasets = gy.load_datasets(directory, dataset_filter)

# Lat-lon plot at 300 km altitude (automatically finds nearest pressure level)
plot = gy.plt_lat_lon(datasets, 'TN', time='2022-01-01T12:00:00',
                      level=300.0, level_type='height')

# Latitude vs time at 400 km altitude
plot = gy.plt_lat_time(datasets, 'TN', level=400.0, level_type='height',
                       longitude=0.0)

# Longitude vs time at 250 km altitude
plot = gy.plt_lon_time(datasets, 'TN', latitude=0.0, level=250.0,
                       level_type='height')

# Variable vs time at 300 km altitude
plot = gy.plt_var_time(datasets, 'TN', latitude=0.0, longitude=0.0,
                       level=300.0, level_type='height')

Example:

Plot vertical axis in km instead of pressure.

datasets = gy.load_datasets(directory, dataset_filter)

# Vertical profile with height axis
plot = gy.plt_lev_var(datasets, 'TN', latitude=0.0,
                      time='2022-01-01T12:00:00', longitude=0.0,
                      y_axis='height')

# Longitude cross-section with height axis
plot = gy.plt_lev_lon(datasets, 'TN', latitude=0.0,
                      time='2022-01-01T12:00:00', y_axis='height')

# Latitude cross-section with height axis
plot = gy.plt_lev_lat(datasets, 'TN', time='2022-01-01T12:00:00',
                      longitude=0.0, y_axis='height')

# Level vs time with height axis
plot = gy.plt_lev_time(datasets, 'TN', latitude=0.0, longitude=0.0,
                       y_axis='height')