Demo for the AUS2200 intake catalogue

How to load the output of an experiment without knowing where it is stored on NCI

[1]:
import intake
import cf_xarray
[2]:
catalog = intake.cat.access_nri['AUS2200']

# Until https://github.com/ACCESS-NRI/access-nri-intake-catalog/pull/621 is merged
catalog = catalog.unwrap()

You can also explore the datasets in the intake catalogue interactively in a browser at https://access-nri.github.io/interactive-data-catalogue/#/

List the available datasets (only the first five are shown here).

[3]:
list(catalog)[:5]
[3]:
['f.AUS2200.6hrPlev.zg.v1-0',
 'f.AUS2200.1hr.pfull.v1-0',
 'f.AUS2200.1hr.rsdsdiff.v1-0',
 'f.AUS2200.1hr.hus.v1-0',
 'f.AUS2200.1hr.va.v1-0']
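The keys above look like dot-separated fields. As a minimal sketch, assuming the pattern is `<file_type>.<model_id>.<frequency>.<variable_id>.<version>` (the names of the first two fields are a guess; the last three match the `intake_esm_attrs` shown further down), they can be split with plain Python. `parse_key` is a hypothetical helper, not part of intake.

```python
from collections import defaultdict

# The five keys printed above
keys = [
    "f.AUS2200.6hrPlev.zg.v1-0",
    "f.AUS2200.1hr.pfull.v1-0",
    "f.AUS2200.1hr.rsdsdiff.v1-0",
    "f.AUS2200.1hr.hus.v1-0",
    "f.AUS2200.1hr.va.v1-0",
]


def parse_key(key):
    """Split a dataset key into named fields (field names are assumed)."""
    file_type, model_id, frequency, variable_id, version = key.split(".")
    return {
        "file_type": file_type,
        "model_id": model_id,
        "frequency": frequency,
        "variable_id": variable_id,
        "version": version,
    }


# Group the variables by output frequency
by_frequency = defaultdict(list)
for key in keys:
    parts = parse_key(key)
    by_frequency[parts["frequency"]].append(parts["variable_id"])

print(dict(by_frequency))
```

This only reads the key strings; the catalogue columns themselves are the reliable source for these values.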
[4]:
# This is not very helpful, so let's figure out what is in the datastore

catalog

AUS2200 catalog with 61 dataset(s) from 21471 asset(s):

unique
path 21471
file_type 1
realm 2
model_id 1
experiment_id 16
frequency 6
variable_id 48
version 1
time_range 2054
derived_variable_id 0
[5]:
# experiment_id looks like it might be helpful!
catalog.unique().experiment_id
[5]:
['mjo-lanina2018',
 'mjo-elnino2016',
 'coralsea-sstreduced',
 'ashwed1983',
 'mjo-neutral2013',
 'canberra2003',
 'flood2022',
 'ashwed1980',
 'ecoastlow-corclimsst',
 'blacksat2009',
 'ecoastlow-smooth',
 'ecoastlow-evolvsst',
 'ecoastlow-climsst',
 'ecoastlow-tasclimsst',
 'coralsea-sstobs',
 'ecoastlow-fixsst']
[6]:
# Let's pick the Canberra 2003 experiment
experiment = catalog.search(experiment_id='canberra2003')

At the moment we have one dataset for each separate simulation and a combined dataset which includes all simulations.

Example of a dataset for a single simulation: canberra2003

[7]:
# Same search as in the previous cell; display the result to summarise the subset
experiment = catalog.search(experiment_id='canberra2003')
experiment

AUS2200 catalog with 56 dataset(s) from 341 asset(s):

unique
path 341
file_type 1
realm 2
model_id 1
experiment_id 1
frequency 4
variable_id 47
version 1
time_range 35
derived_variable_id 0
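To make the filtering idea concrete, here is a toy model of a keyword search over catalogue rows. It is a sketch only, not how intake-esm is implemented, and it assumes the usual behaviour that all keywords must match while a list of values for one keyword matches any of them. The function `search` and the rows in `records` are made up for illustration and are not real catalogue contents.

```python
def search(records, **query):
    """Keep the records that match every keyword in the query.

    A single value must match exactly; a list, tuple or set matches
    if the record's value is any one of its members.
    """
    def matches(record):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            if record.get(column) not in allowed:
                return False
        return True

    return [record for record in records if matches(record)]


# Hypothetical rows, one per asset (not taken from the real catalogue)
records = [
    {"experiment_id": "canberra2003", "variable_id": "tas", "frequency": "1hr"},
    {"experiment_id": "canberra2003", "variable_id": "uas", "frequency": "1hr"},
    {"experiment_id": "canberra2003", "variable_id": "hus", "frequency": "1hr"},
    {"experiment_id": "flood2022", "variable_id": "tas", "frequency": "1hr"},
]

subset = search(records, experiment_id="canberra2003", variable_id=["tas", "uas"])
print([row["variable_id"] for row in subset])
```

Chained searches narrow the result step by step, so searching for the experiment first and the variable afterwards gives the same rows as one combined search.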

What are the available variables?

[8]:
experiment.unique()['variable_id']
[8]:
['rainmxrat',
 'vas',
 'refl',
 'cl',
 'wa',
 'clmed',
 'clw',
 'ta',
 'cli',
 'grplmxrat',
 'hus',
 'pralsns',
 'eow',
 'theta',
 'tke',
 'uas',
 'hfss',
 'va',
 'pralsprof',
 'hfls',
 'ua',
 'pfull',
 'huss',
 'clmxro',
 'rsds',
 'mrsol',
 'psl',
 'rsdt',
 'reflmax',
 'estot',
 'wsgmax10m',
 'evspsbl',
 'lmask',
 'z0',
 'clhigh',
 'cllow',
 'rsut',
 'rss',
 'rsdsdir',
 'orog',
 'mrso',
 'tas',
 'zmla',
 'rlds',
 'rls',
 'rsdsdiff',
 'rlut']

Let’s get one (e.g., the temperature) and do some super-duper analysis!

[9]:
ds = experiment.search(variable_id='tas', frequency="1hr").to_dask()
ds
[9]:
<xarray.Dataset> Size: 2GB
Dimensions:    (time: 96, bnds: 2, lat: 2120, lon: 2600)
Coordinates:
  * time       (time) datetime64[ns] 768B 2003-01-16T00:29:59.999999872 ... 2...
  * lat        (lat) float64 17kB -48.79 -48.77 -48.75 ... -6.871 -6.852 -6.832
  * lon        (lon) float64 21kB 107.5 107.5 107.6 107.6 ... 158.9 159.0 159.0
    height     float64 8B ...
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) datetime64[ns] 2kB dask.array<chunksize=(96, 2), meta=np.ndarray>
    lat_bnds   (lat, bnds) float64 34kB dask.array<chunksize=(2120, 2), meta=np.ndarray>
    lon_bnds   (lon, bnds) float64 42kB dask.array<chunksize=(2600, 2), meta=np.ndarray>
    tas        (time, lat, lon) float32 2GB dask.array<chunksize=(6, 2120, 2600), meta=np.ndarray>
Attributes: (12/57)
    Conventions:                     CF-1.7 ACDD1.3
    creation_date:                   2023-10-19T06:04:51Z
    data_specs_version:              01.00.00
    date_created:                    2023-06-05
    exp_description:                 A limited area model study of the entire...
    external_variables:              areacella
    ...                              ...
    intake_esm_attrs:frequency:      1hr
    intake_esm_attrs:variable_id:    tas
    intake_esm_attrs:version:        v1-0
    intake_esm_attrs:time_range:     200301160030-200301192330
    intake_esm_attrs:_data_format_:  netcdf
    intake_esm_dataset_key:          f.AUS2200.1hr.tas.v1-0
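The `time_range` attribute can be checked against the length of the time dimension. This is a small sketch that assumes the string is formatted as `YYYYMMDDhhmm-YYYYMMDDhhmm` and that the steps are exactly one hour apart; `parse_time_range` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def parse_time_range(time_range, fmt="%Y%m%d%H%M"):
    """Split 'start-end' and parse both halves (format is an assumption)."""
    start_text, end_text = time_range.split("-")
    return datetime.strptime(start_text, fmt), datetime.strptime(end_text, fmt)


# Value copied from the dataset attributes above
start, end = parse_time_range("200301160030-200301192330")

# Number of hourly time steps, counting both end points
n_steps = (end - start) // timedelta(hours=1) + 1

print(start, end, n_steps)
```

The count comes out as 96, which agrees with `time: 96` in the dataset dimensions.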
[10]:
tas = ds['tas']
tas
[10]:
<xarray.DataArray 'tas' (time: 96, lat: 2120, lon: 2600)> Size: 2GB
dask.array<open_dataset-tas, shape=(96, 2120, 2600), dtype=float32, chunksize=(6, 2120, 2600), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 768B 2003-01-16T00:29:59.999999872 ... 200...
  * lat      (lat) float64 17kB -48.79 -48.77 -48.75 ... -6.871 -6.852 -6.832
  * lon      (lon) float64 21kB 107.5 107.5 107.6 107.6 ... 158.9 159.0 159.0
    height   float64 8B ...
Attributes:
    standard_name:          air_temperature
    long_name:              Near-Surface Air Temperature
    comment:                near-surface (for access 1.5 meters) air temperature
    units:                  K
    cell_methods:           area: mean time: mean
    cell_measures:          area: areacella
    history:                2023-10-19T06:04:25Z altered by CMOR: Treated sca...
    coverage_content_type:  modelResult
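The reported size and chunking can be verified with a little arithmetic. The sketch below uses only the shape, chunk shape and dtype printed above (float32, so 4 bytes per value).

```python
from math import prod

shape = (96, 2120, 2600)        # (time, lat, lon) from the repr above
chunk_shape = (6, 2120, 2600)   # dask chunksize from the repr above
itemsize = 4                    # bytes per float32 value

total_bytes = prod(shape) * itemsize
chunk_bytes = prod(chunk_shape) * itemsize

# Ceiling division per dimension gives the number of chunks along it
n_chunks = prod(-(-size // chunk) for size, chunk in zip(shape, chunk_shape))

print(f"total: {total_bytes / 1e9:.2f} GB")
print(f"chunk: {chunk_bytes / 1e6:.2f} MB")
print(f"chunks: {n_chunks}")
```

Each chunk covers the full domain for six time steps, so selecting a single time touches one chunk, while selecting a single longitude over all times touches all 16.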
[11]:
tas.cf.sel(time = '2003-01-16T02:30:00').plot()
[11]:
<matplotlib.collections.QuadMesh at 0x15278ab032c0>
../_images/Recipes_AUS2200_intake_demo_18_1.png

Plot a Hovmöller diagram (time against latitude at a fixed longitude)

[12]:
tas.cf.sel(longitude = 130, method='nearest').plot()
[12]:
<matplotlib.collections.QuadMesh at 0x15271aea5310>
../_images/Recipes_AUS2200_intake_demo_20_1.png
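With `method='nearest'` the selection returns the grid point closest to the requested longitude rather than requiring an exact match. The idea can be sketched in plain Python as below. This is not xarray's implementation, `nearest_index` is a hypothetical helper, and the longitudes in `lons` are invented values rather than the real AUS2200 grid.

```python
from bisect import bisect_left


def nearest_index(sorted_values, target):
    """Return the index of the value closest to target in an ascending list."""
    pos = bisect_left(sorted_values, target)
    if pos == 0:
        return 0
    if pos == len(sorted_values):
        return len(sorted_values) - 1
    before, after = sorted_values[pos - 1], sorted_values[pos]
    # Prefer the lower neighbour when the two distances are equal
    return pos if (after - target) < (target - before) else pos - 1


# Hypothetical grid longitudes near 130 degrees east
lons = [129.94, 129.96, 129.98, 130.01, 130.03]

idx = nearest_index(lons, 130.0)
print(idx, lons[idx])
```

A request outside the grid simply returns the first or last point, so it is worth checking the selected coordinate before interpreting the plot.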