neptoon.hub

crns_data_hub

CRNSDataHub

CRNSDataHub(crns_data_frame, flags_data_frame=None, sensor_info=None, quality_assessor=None, validation=True, calibration_samples_data=None, data_storage=None)

The CRNSDataHub is used to manage the time series data throughout the processing steps. Some key features:

  • It stores a DataFrame for a site
  • As processing progresses through the steps, data can be added to the DataFrame and the shadow DataFrame is updated.

Raw data is checked against the RawDataSchema, which is a first line of defense against incorrectly formatted tables. If this check fails, the data must either be reformatted using one of the provided routines or manually formatted to match the standard.

Inputs to the CRNSDataHub.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| crns_data_frame | DataFrame | CRNS data in a dataframe format. It will be validated to ensure it has been formatted correctly. | required |
| configuration_manager | ConfigurationManager | A ConfigurationManager instance storing configuration YAML information, by default None | required |
| quality_assessor | SaQC | SaQC object which is used for quality assessment. Used for the creation of flags to define poor data. | None |
| validation | bool | Toggle for whether to have continued validation of data tables during processing (see data_management>data_validation_tables.py for examples of tables being validated). These checks ensure data is correctly formatted for internal processing. | True |
| calibration_samples_data | DataFrame | The sample data taken during the calibration campaign. | None |
| data_storage | DataStorageConfig | Data storage configuration from the sensor config. When provided, save_data() uses save_location as the default output directory instead of cwd(). | None |

validate_dataframe

validate_dataframe(schema)

Validates the dataframe against a pandera schema. See data_validation_table.py for schemas.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| schema | str | The name of the schema to use for the check. | required |

attach_nmdb_data

attach_nmdb_data(station='JUNG', new_column_name=str(INCOMING_NEUTRON_INTENSITY), resolution='60', nmdb_table='revori', reference_value=None)

Utilises the NMDBDataAttacher class to attach NMDB incoming intensity data to the crns_data_frame. Data is collected from www.NMDB.eu.

See NMDBDataAttacher documentation for more information.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| station | str | The station to collect data from, by default "JUNG" | 'JUNG' |
| new_column_name | str | The name of the column where data will be written, by default "incoming_neutron_intensity" | str(INCOMING_NEUTRON_INTENSITY) |
| resolution | str | The resolution in minutes, by default "60" | '60' |
| nmdb_table | str | The table to pull data from, by default "revori" | 'revori' |
| reference_value | int | The reference value of the neutron monitor. If left as None, the value from the first data point in the time series is used. | None |
Report

Neutron monitoring data was attached from NMDB.eu. The station used was {station} at a resolution of {resolution} minutes. The data table used was {nmdb_table}.

add_quality_flags

add_quality_flags(custom_flags=None, add_check=None)

Adds quality checks to be undertaken on the dataframe.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| custom_flags | QualityAssessmentFlagBuilder | A QualityAssessmentFlagBuilder that the user has built with checks, attached as a whole, by default None | None |
| add_check | Check | An individual Check, or a list of Checks, to be added to the QualityAssessmentFlagBuilder, by default None | None |

apply_quality_flags

apply_quality_flags()

Flags data based on quality assessment. A user can supply a custom-built QualityAssessmentFlagBuilder object, flag using the config file (if supplied), or choose a standard flagging routine.

Everything is off by default so a user must choose.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| custom_flags | QualityAssessmentFlagBuilder | A custom-built set of flags, by default None | required |
| flags_from_config | bool | Whether to conduct QA using the supplied configuration, by default False | required |
| flags_default | str | A string representing a default version of flagging, by default None | required |

select_correction

select_correction(correction_type='empty', correction_theory=None)

Method to select corrections to be applied to data.

Individual corrections can be applied using a CorrectionType and CorrectionTheory. If a user assigns a CorrectionType without a CorrectionTheory, then the default correction for that CorrectionType is applied.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| correction_type | CorrectionType | A CorrectionType, by default "empty" | 'empty' |
| correction_theory | CorrectionTheory | A CorrectionTheory, by default None | None |

correct_neutrons

correct_neutrons()

Create correction factors as well as the corrected epithermal neutrons column.

smooth_data

smooth_data(column_to_smooth, smooth_method='rolling_mean', window=12, min_proportion_good_data=0.7, poly_order=4, auto_update_final_col=True)

Applies a smoothing method to a series of data in the crns_data_frame using the SmoothData class.

A column_to_smooth argument must be supplied and should be written in the "str(ColumnInfo.Name.COLUMN)" format. The two most likely to be used are:

  • str(ColumnInfo.Name.SOIL_MOISTURE)
  • str(ColumnInfo.Name.EPI_NEUTRONS)

If parameters are left as None, it uses defaults from SmoothData (i.e., rolling_mean, window size == 12).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| column_to_smooth | str(VALUE) | The column in the crns_data_frame that needs to be smoothed. Automatically | required |
Report

Data smoothing was done on {column_to_smooth}. This was done using {smooth_method} with a window of {window}.
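
As a rough illustration of what a rolling mean with a good-data threshold can look like, the sketch below uses only the standard library. It is not the SmoothData implementation: the trailing window and the reading of min_proportion_good_data as "fraction of non-missing values required in each window" are assumptions, and the function name rolling_mean is hypothetical.

```python
import math


def rolling_mean(values, window=12, min_proportion_good_data=0.7):
    """Trailing rolling mean that tolerates some missing (NaN) values.

    Assumption: a window yields a value only when the share of non-NaN
    entries is at least min_proportion_good_data.
    """
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet to fill a complete window.
            smoothed.append(math.nan)
            continue
        chunk = values[i + 1 - window : i + 1]
        good = [v for v in chunk if not math.isnan(v)]
        if len(good) / window >= min_proportion_good_data:
            smoothed.append(sum(good) / len(good))
        else:
            smoothed.append(math.nan)
    return smoothed


counts = [100.0, 102.0, math.nan, 98.0, 101.0, 99.0]
result = rolling_mean(counts, window=4, min_proportion_good_data=0.7)
strict = rolling_mean(counts, window=4, min_proportion_good_data=0.8)
```

With a window of 4, every complete window here contains one missing value (75% good data), so the 0.7 threshold produces means while the 0.8 threshold rejects all of them.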

calibrate_station

calibrate_station(config=None)

Calibrates the sensor.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| config | CalibrationConfiguration | Config file which contains all the required info for calibration, by default None | None |

Raises:

| Type | Description |
|---|---|
| ValueError | When no calibration data is provided |

Report

Calibration was undertaken. The N0 number was calculated as {n0}, using the {config.neutron_conversion_method} method. From the samples, the average dry soil bulk density is {avg_dry_soil_bulk_density}, the average soil organic carbon is {avg_soil_organic_carbon}, and the average lattice water content is {avg_lattice_water}.

align_time_stamps

align_time_stamps(align_method='time')

Aligns timestamps to occur on the hour. E.g., 01:00 not 01:05.

Uses the TimeStampAligner class.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| align_method | str | Method to use for shifting. The default, "time", shifts to the nearest hour. | 'time' |
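
To make the idea of "on the hour" alignment concrete, here is a minimal standard-library sketch that rounds timestamps to the nearest full hour. It is only an illustration: the TimeStampAligner may shift or interpolate values differently, and round_to_nearest_hour is a hypothetical helper, not part of neptoon.

```python
from datetime import datetime, timedelta


def round_to_nearest_hour(ts):
    """Round a datetime to the nearest full hour (30 minutes rounds up)."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        return floored + timedelta(hours=1)
    return floored


stamps = [datetime(2024, 5, 1, 1, 5), datetime(2024, 5, 1, 1, 58)]
aligned = [round_to_nearest_hour(ts) for ts in stamps]
```

Here 01:05 becomes 01:00 and 01:58 becomes 02:00; a timestamp late in the day rolls over to midnight of the next day.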

aggregate_data_frame

aggregate_data_frame(output_resolution, max_na_fraction=0.3, aggregate_method='bagg')

Aggregate a crns data frame to a new resolution.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| output_resolution | str | Desired output resolution (e.g., '1h' or '1day') | required |
| max_na_fraction | float | Fraction of acceptable NaN values in the aggregation period, by default 0.3 | 0.3 |
| aggregate_method | str | description, by default "bagg" | 'bagg' |
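
The role of max_na_fraction can be sketched with a small standard-library example that averages fixed-size blocks and rejects any block with too many missing values. This is an assumption-laden illustration, not the library's aggregation routine: aggregate_mean and samples_per_period are hypothetical names, and treating a fraction exactly equal to the threshold as acceptable is a guess.

```python
import math


def aggregate_mean(values, samples_per_period, max_na_fraction=0.3):
    """Average consecutive blocks, returning NaN for blocks with too many NaNs."""
    aggregated = []
    for start in range(0, len(values), samples_per_period):
        block = values[start : start + samples_per_period]
        good = [v for v in block if not math.isnan(v)]
        missing_fraction = (len(block) - len(good)) / len(block)
        if not good or missing_fraction > max_na_fraction:
            # Too little valid data to trust this aggregation period.
            aggregated.append(math.nan)
        else:
            aggregated.append(sum(good) / len(good))
    return aggregated


# Eight 15-minute values aggregated into two hourly values.
quarter_hourly = [10.0, 12.0, math.nan, 14.0, math.nan, math.nan, 11.0, 13.0]
hourly = aggregate_mean(quarter_hourly, samples_per_period=4)
```

The first hour has one missing value out of four (0.25) and is averaged; the second has two out of four (0.5) and is returned as NaN.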

produce_soil_moisture_estimates

produce_soil_moisture_estimates(n0=None, conversion_theory='desilets_etal_2010', dry_soil_bulk_density=None, lattice_water=None, soil_organic_carbon=None, koehli_parameters='Mar21_mcnp_drf')

Produces soil moisture (SM) estimates with the NeutronsToSM class. If values for n0, dry_soil_bulk_density, lattice_water, or soil_organic_carbon are not supplied, they are taken from the internal sensor_info class.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| n0 | float | n0 calibration term, by default None | None |
| dry_soil_bulk_density | float | Given in g/cm3, by default None | None |
| lattice_water | float | Given as decimal percent, e.g., 0.01, by default None | None |
| soil_organic_carbon | float | Given as decimal percent, e.g., 0.001, by default None | None |
Report

Soil moisture was estimated using an n0 of {default_params[n0]}, a bulk density of {default_params[dry_soil_bulk_density]}, a lattice water content of {default_params[lattice_water]}, and a soil organic carbon content of
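
For orientation, the relation published by Desilets et al. (2010), which the default conversion_theory name refers to, can be sketched as below. This is not the NeutronsToSM code: the function name and the water_equiv_soc argument are hypothetical, and how neptoon converts soil organic carbon into a water equivalent is not stated here, so the sketch simply expects that value to be supplied already converted.

```python
def desilets_soil_moisture(
    neutrons,
    n0,
    dry_soil_bulk_density=1.4,
    lattice_water=0.0,
    water_equiv_soc=0.0,
    a0=0.0808,
    a1=0.372,
    a2=0.115,
):
    """Volumetric soil moisture from corrected neutron counts.

    Uses the Desilets et al. (2010) shape parameters a0, a1, a2.
    Assumption: lattice water and the organic-carbon water equivalent
    are subtracted as gravimetric offsets before scaling by bulk density.
    """
    gravimetric = a0 / (neutrons / n0 - a1) - a2
    return (gravimetric - lattice_water - water_equiv_soc) * dry_soil_bulk_density


theta = desilets_soil_moisture(
    neutrons=1500.0, n0=2500.0, dry_soil_bulk_density=1.4, lattice_water=0.01
)
```

Higher neutron counts correspond to drier soil, so the function decreases as the ratio of neutrons to n0 increases.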

mask_flagged_data

mask_flagged_data(data_frame)

Returns a pd.DataFrame in which flagged data has been replaced with np.nan values.

prepare_static_values

prepare_static_values()

Attaches the static values from the SensorInfo Pydantic model as columns of values in the crns_data_frame.

This method:

  1. Converts the Pydantic model to a dictionary
  2. Checks if each key already exists in the DataFrame
  3. Skips None values
  4. Adds the remaining values as new columns

The method preserves existing column values if they are already present in the DataFrame to avoid accidental overwrites.
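
The four steps can be sketched with plain Python containers. This is an illustrative stand-in only: a dataclass replaces the SensorInfo Pydantic model, a dict of column lists replaces the DataFrame, and the names SiteInfo and add_static_columns are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SiteInfo:
    """Hypothetical stand-in for the SensorInfo model."""

    latitude: Optional[float] = None
    elevation: Optional[float] = None
    n0: Optional[float] = None


def add_static_columns(table, info):
    """Add each non-None field of info as a constant column, keeping existing columns."""
    n_rows = len(next(iter(table.values())))
    for key, value in asdict(info).items():  # 1. model -> dictionary
        if key in table:  # 2. existing columns are left untouched
            continue
        if value is None:  # 3. None values are skipped
            continue
        table[key] = [value] * n_rows  # 4. remaining values become columns
    return table


table = {
    "epithermal_neutrons": [1500, 1510, 1495],
    "elevation": [200.0, 200.0, 200.0],
}
table = add_static_columns(table, SiteInfo(latitude=51.4, elevation=250.0))
```

In this example latitude is added as a new column, the existing elevation column keeps its original values, and n0 is skipped because it is None.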

prepare_additional_columns

prepare_additional_columns()

Prepares and adds additional columns required for processing.

Such as:

  • absolute humidity

create_figures

create_figures(create_all=True, ignore_sections=[], selected_figures=[], show_figures=False)

Handles creating the figures using the FigureHandler.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| create_all | bool | Whether to create all figures in the FigureHandler._register, by default True | True |
| ignore_sections | list | Ignore a whole topic section of figure names, by default [] | [] |
| selected_figures | list | A list of the figures to be created if not using create_all. See FigureHandler._figure_registry for the names of possible figures, by default [] | [] |
| show_figures | bool | Set to True to show figures in the kernel, by default False | False |

save_data

save_data(folder_name=None, save_folder_location=None, use_custom_column_names=False, custom_column_names_dict=None, append_timestamp=True)

Save processed outputs to disk.

Output directory precedence:

  1. save_folder_location argument: resolved relative to cwd.
  2. data_storage.save_location from the sensor config: already resolved to an absolute path at config load time.
  3. Current working directory as a last-resort fallback.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| folder_name | str | Sub-folder name inside the save location (defaults to site name). | None |
| save_folder_location | str or Path | Explicit output directory; relative paths are cwd-relative. | None |
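
The output directory precedence described for save_data can be sketched as a small pathlib helper. It is a reading of the documented order under stated assumptions, not the actual implementation: resolve_output_dir and config_save_location are hypothetical names, the latter standing in for data_storage.save_location.

```python
from pathlib import Path


def resolve_output_dir(save_folder_location=None, config_save_location=None):
    """Pick the output directory following the documented precedence."""
    if save_folder_location is not None:
        # 1. Explicit argument: relative paths are taken relative to cwd.
        #    Joining an absolute path onto cwd leaves it unchanged.
        return Path.cwd() / save_folder_location
    if config_save_location is not None:
        # 2. Location from the sensor config, assumed already absolute.
        return Path(config_save_location)
    # 3. Last-resort fallback.
    return Path.cwd()


explicit = resolve_output_dir("outputs", "/data/site_a")
from_config = resolve_output_dir(None, "/data/site_a")
fallback = resolve_output_dir()
```

An explicit argument wins even when the config supplies a location; the config location is used only when no argument is given.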