neptoon.hub

crns_data_hub

Classes:

CRNSDataHub

Functions:

crns_data_frame
flags_data_frame
sensor_info
validation
quality_assessor
correction_factory
calibration_samples_data
correction_builder
validate_dataframe
attach_nmdb_data
add_quality_flags
apply_quality_flags
select_correction
correct_neutrons
smooth_data
calibrate_station
align_time_stamps
aggregate_data_frame
produce_soil_moisture_estimates
mask_flagged_data
prepare_static_values
prepare_additional_columns
create_figures
save_data

CRNSDataHub

CRNSDataHub(crns_data_frame, flags_data_frame=None, sensor_info=None, quality_assessor=None, validation=True, calibration_samples_data=None, data_storage=None)

The CRNSDataHub manages the time series data throughout the processing steps. Key features:

- It stores a DataFrame for a site.
- As processing progresses, data can be added to the DataFrame and the shadow DataFrames are updated.

Raw data is checked against the RawDataSchema, which is a first line of defense against incorrectly formatted tables. If this check fails, the data must either be reformatted using one of the provided routines or manually formatted to match the standard.

Inputs to the CRNSDataHub.

Parameters:

Name	Type	Description	Default
`crns_data_frame`	`DataFrame`	CRNS data in a dataframe format. It will be validated to ensure it has been formatted correctly.	required
`configuration_manager`	`ConfigurationManager`	A ConfigurationManager instance storing configuration YAML information, by default None	required
`quality_assessor`	`SaQC`	SaQC object which is used for quality assessment. Used for the creation of flags to define poor data.	`None`
`validation`	`bool`	Whether to keep validating data tables during processing (see data_management>data_validation_tables.py for examples of tables being validated). These checks ensure data is correctly formatted for internal processing.	`True`
`calibration_samples_data`	`DataFrame`	The sample data taken during the calibration campaign.	`None`
`data_storage`	`DataStorageConfig`	Data storage configuration from the sensor config. When provided, save_data() uses save_location as the default output directory instead of the current working directory.	`None`

validate_dataframe

validate_dataframe(schema)

Validates the dataframe against a pandera schema. See data_validation_table.py for the schemas.

Parameters:

Name	Type	Description	Default
`schema`	`str`	The name of the schema to use for the check.	required
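To illustrate what a schema check does, here is a minimal stdlib sketch that checks rows (as dicts) for required columns and types. This is not pandera and not neptoon's implementation: `EXAMPLE_SCHEMA`, its column names, and `validate_rows` are all hypothetical, chosen only to show the idea of rejecting incorrectly formatted tables before processing.

```python
from datetime import datetime

# Hypothetical schema for illustration only: column name -> required Python type.
# neptoon's real schemas are pandera schemas defined in its validation module.
EXAMPLE_SCHEMA = {
    "date_time": datetime,
    "epithermal_neutrons_raw": float,
    "air_pressure": float,
}


def validate_rows(rows, schema):
    """Return a list of error messages; an empty list means every row passed."""
    errors = []
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            # A missing column is reported once per row; type checks are skipped.
            errors.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for column, expected_type in schema.items():
            if not isinstance(row[column], expected_type):
                errors.append(
                    f"row {i}: column {column!r} expected {expected_type.__name__}, "
                    f"got {type(row[column]).__name__}"
                )
    return errors
```

A real pandera schema additionally supports value checks (ranges, nullability) and coercion, which this sketch leaves out.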

attach_nmdb_data

attach_nmdb_data(station='JUNG', new_column_name=str(INCOMING_NEUTRON_INTENSITY), resolution='60', nmdb_table='revori', reference_value=None)

Utilises the NMDBDataAttacher class to attach NMDB incoming intensity data to the crns_data_frame. Data are collected from www.NMDB.eu.

See NMDBDataAttacher documentation for more information.

Parameters:

Name	Type	Description	Default
`station`	`str`	The station to collect data from, by default "JUNG"	`'JUNG'`
`new_column_name`	`str`	The name of the column where data will be written, by default "incoming_neutron_intensity"	`str(INCOMING_NEUTRON_INTENSITY)`
`resolution`	`str`	The resolution in minutes, by default "60"	`'60'`
`nmdb_table`	`str`	The table to pull data from, by default "revori"	`'revori'`
`reference_value`	`int`	The reference value of the neutron monitor. If left as None, the value from the first data point in the time series is used.	`None`

Report

Neutron monitoring data was attached from NMDB.eu. The station used was {station} at a resolution of {resolution} minutes. The data table used was {nmdb_table}.
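The `reference_value` fallback described above can be sketched as follows. This assumes the incoming-intensity correction takes the common multiplicative form f_I = I_ref / I_t; the function name is hypothetical and neptoon's own correction classes may differ.

```python
def incoming_intensity_factors(intensity, reference_value=None):
    """Return one correction factor per time step.

    Assumes the ratio form f_I = I_ref / I_t. If no reference value is
    given, the first data point of the series is used as the reference.
    """
    if not intensity:
        raise ValueError("intensity series is empty")
    ref = intensity[0] if reference_value is None else reference_value
    return [ref / value for value in intensity]
```

With the default, the first factor is always 1.0, so the corrected series is expressed relative to conditions at the start of the record.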

add_quality_flags

add_quality_flags(custom_flags=None, add_check=None)

Add quality checks to be undertaken on the dataframe.

Parameters:

Name	Type	Description	Default
`custom_flags`	`QualityAssessmentFlagBuilder`	A user can build a QualityAssessmentFlagBuilder with checks and attach it as a whole, by default None	`None`
`add_check`	`Check`	A user can add an individual Check, or a list of Checks. These will then be added to the QualityAssessmentFlagBuilder, by default None	`None`
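The "single check or list of checks" behaviour of `add_check` can be sketched with a small builder. `RangeCheck` and `FlagBuilder` are hypothetical stand-ins for neptoon's `Check` and `QualityAssessmentFlagBuilder`, whose real constructors and attributes are not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class RangeCheck:
    """Hypothetical check: flags values outside [lower, upper]."""
    column: str
    lower: float
    upper: float

    def flag(self, rows):
        # True means the value at that row is flagged as poor data.
        return [not (self.lower <= row[self.column] <= self.upper) for row in rows]


@dataclass
class FlagBuilder:
    """Hypothetical stand-in for QualityAssessmentFlagBuilder."""
    checks: List[RangeCheck] = field(default_factory=list)

    def add_check(self, check: Union[RangeCheck, List[RangeCheck]]):
        # Accept either one check or a list of checks.
        if isinstance(check, list):
            self.checks.extend(check)
        else:
            self.checks.append(check)
        return self
```

Returning `self` allows chained calls such as `FlagBuilder().add_check(a).add_check([b, c])`.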

apply_quality_flags

apply_quality_flags()

Flags data based on the quality assessment. A user can supply a custom-built QualityAssessmentFlagBuilder object, flag using the config file (if supplied), or choose a standard flagging routine.

Everything is off by default, so the user must choose.

Parameters:

Name	Type	Description	Default
`custom_flags`	`QualityAssessmentFlagBuilder`	A custom-built set of flags, by default None	required
`flags_from_config`	`bool`	Whether to conduct QA using the configuration supplied in the config file, by default False	required
`flags_default`	`str`	A string naming a default flagging routine, by default None	required

select_correction

select_correction(correction_type='empty', correction_theory=None)

Method to select corrections to be applied to data.

Individual corrections can be applied using a CorrectionType and CorrectionTheory. If a user assigns a CorrectionType without a CorrectionTheory, then the default correction for that CorrectionType is applied.
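The fallback rule described above (a CorrectionType without a CorrectionTheory gets that type's default theory) can be sketched with a lookup table. The enum members and the default mapping below are hypothetical placeholders, not neptoon's actual CorrectionType or CorrectionTheory values.

```python
from enum import Enum


class CorrectionType(Enum):
    """Hypothetical members for illustration."""
    PRESSURE = "pressure"
    HUMIDITY = "humidity"


class CorrectionTheory(Enum):
    """Hypothetical members for illustration."""
    DEFAULT_PRESSURE = "default_pressure"
    ALTERNATIVE_PRESSURE = "alternative_pressure"
    DEFAULT_HUMIDITY = "default_humidity"


# Hypothetical mapping from each correction type to its default theory.
DEFAULT_THEORY = {
    CorrectionType.PRESSURE: CorrectionTheory.DEFAULT_PRESSURE,
    CorrectionType.HUMIDITY: CorrectionTheory.DEFAULT_HUMIDITY,
}


def resolve_correction(correction_type, correction_theory=None):
    """Return (type, theory), filling in the default theory when none is given."""
    if correction_theory is None:
        correction_theory = DEFAULT_THEORY[correction_type]
    return correction_type, correction_theory
```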

Parameters:

Name	Type	Description	Default
`correction_type`	`CorrectionType`	A CorrectionType, by default "empty"	`'empty'`
`correction_theory`	`CorrectionTheory`	A CorrectionTheory, by default None	`None`

correct_neutrons

correct_neutrons()

Create correction factors as well as the corrected epithermal neutrons column.
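Assuming every selected correction yields a multiplicative factor per time step (the usual convention for CRNS pressure, humidity and incoming-intensity corrections), the corrected epithermal neutron column is the raw count times the product of all factors. The function name and the dict-of-lists layout are hypothetical; this is a sketch, not neptoon's code.

```python
import math


def correct_counts(raw_counts, factor_columns):
    """Multiply each raw count by every correction factor at that time step.

    factor_columns maps a correction name to a list of factors aligned
    with raw_counts. With no factors, the counts are returned unchanged.
    """
    corrected = []
    for i, raw in enumerate(raw_counts):
        combined = math.prod(factors[i] for factors in factor_columns.values())
        corrected.append(raw * combined)
    return corrected
```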

smooth_data

smooth_data(column_to_smooth, smooth_method='rolling_mean', window=12, min_proportion_good_data=0.7, poly_order=4, auto_update_final_col=True)

Applies a smoothing method to a series of data in the crns_data_frame using the SmoothData class.

The column_to_smooth argument must be supplied and should be written in the "str(ColumnInfo.Name.COLUMN)" format. The two most likely to be used are:

str(ColumnInfo.Name.SOIL_MOISTURE)
str(ColumnInfo.Name.EPI_NEUTRONS)

If the optional parameters are left at their defaults, the SmoothData defaults are used (i.e., rolling_mean with a window size of 12).

Parameters:

Name	Type	Description	Default
`column_to_smooth`	`str(VALUE)`	The column in the crns_data_frame that needs to be smoothed. Automatically	required

Report

Data smoothing was done on {column_to_smooth}. This was done using {smooth_method} with a window of {window}.
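A rolling mean that respects `min_proportion_good_data` can be sketched as below. This is an assumption-laden illustration: it uses a trailing window and skips NaN values, returning NaN when fewer than `min_proportion_good_data * window` valid points are available. The SmoothData class may instead centre the window or treat edges differently.

```python
import math


def rolling_mean(values, window=12, min_proportion_good_data=0.7):
    """Trailing rolling mean that ignores NaN values.

    Returns NaN at a position when the number of valid points in the
    window is below min_proportion_good_data * window.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        good = [v for v in chunk if not math.isnan(v)]
        if len(good) < min_proportion_good_data * window:
            smoothed.append(math.nan)
        else:
            smoothed.append(sum(good) / len(good))
    return smoothed
```

Note that the first few entries of the output are NaN until enough valid points have accumulated, which is why a minimum data proportion matters at the start of a record.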

calibrate_station

calibrate_station(config=None)

Calibrate the sensor.

Parameters:

Name	Type	Description	Default
`config`	`CalibrationConfiguration`	Configuration containing all the information required for calibration, by default None	`None`

Raises:

Type	Description
`ValueError`	If no calibration data is provided

Report

Calibration was undertaken. The N0 number was calculated as {n0}, using the {config.neutron_conversion_method} method. From the samples, the average dry soil bulk density is {avg_dry_soil_bulk_density}, the average soil organic carbon is {avg_soil_organic_carbon}, and the average lattice water content is {avg_lattice_water}.
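When the conversion method follows Desilets et al. (2010), where gravimetric water content theta = a0 / (N/N0 - a1) - a2 with a0 = 0.0808, a1 = 0.372 and a2 = 0.115, N0 can be obtained by inverting that equation for a measured neutron count and a sample-derived water content. This is a sketch of the inversion only. How neptoon weights the samples, or whether lattice water and organic carbon are folded into `gravimetric_water`, is not shown here and is left as an assumption.

```python
# Desilets et al. (2010) shape parameters.
A0, A1, A2 = 0.0808, 0.372, 0.115


def n0_from_sample(neutron_count, gravimetric_water):
    """Invert theta = a0 / (N/N0 - a1) - a2 to solve for N0.

    gravimetric_water is the (assumed total) gravimetric water content in g/g.
    """
    return neutron_count / (A0 / (gravimetric_water + A2) + A1)
```

A round trip (choose N0, compute N from the forward equation, then invert) recovers the original N0, which is a quick sanity check for the algebra.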

align_time_stamps

align_time_stamps(align_method='time')

Aligns timestamps to occur on the hour, e.g., 01:00 rather than 01:05.

Uses the TimeStampAligner class.

Parameters:

Name	Type	Description	Default
`align_method`	`str`	Method to use for shifting; defaults to shifting to the nearest hour, by default "time"	`'time'`
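The basic idea of snapping timestamps onto the hour can be sketched with the standard library. This only rounds each timestamp to the nearest full hour (half past rounds up). It is an illustration, not the TimeStampAligner, and it does not reproduce any interpolation the "time" method may perform on the data values.

```python
from datetime import datetime, timedelta


def round_to_hour(ts: datetime) -> datetime:
    """Round a timestamp to the nearest full hour; exactly half past rounds up."""
    floored = ts.replace(minute=0, second=0, microsecond=0)
    if ts - floored >= timedelta(minutes=30):
        floored += timedelta(hours=1)
    return floored
```

Adding a `timedelta` rather than incrementing the hour field means rollovers across midnight, month ends and year ends are handled automatically.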

aggregate_data_frame

aggregate_data_frame(output_resolution, max_na_fraction=0.3, aggregate_method='bagg')

Aggregate a crns data frame to a new resolution.

Parameters:

Name	Type	Description	Default
`output_resolution`	`str`	Desired output resolution (e.g., '1h' or '1day')	required
`max_na_fraction`	`float`	Fraction of acceptable NaN values in an aggregation period, by default 0.3	`0.3`
`aggregate_method`	`str`	The aggregation method, by default "bagg"	`'bagg'`
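How `max_na_fraction` gates an aggregation period can be sketched with hourly bins. Assumptions: a plain mean is used (not the "bagg" method), and the NaN fraction is computed over the samples present in each bin, so gaps with no rows at all are not counted. The function name is hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(series, max_na_fraction=0.3):
    """Aggregate (timestamp, value) pairs to hourly means.

    A bin whose fraction of NaN values exceeds max_na_fraction is set to NaN.
    """
    bins = defaultdict(list)
    for ts, value in series:
        bins[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    result = {}
    for hour, values in sorted(bins.items()):
        good = [v for v in values if not math.isnan(v)]
        na_fraction = 1 - len(good) / len(values)
        if good and na_fraction <= max_na_fraction:
            result[hour] = sum(good) / len(good)
        else:
            result[hour] = math.nan
    return result
```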

produce_soil_moisture_estimates

produce_soil_moisture_estimates(n0=None, conversion_theory='desilets_etal_2010', dry_soil_bulk_density=None, lattice_water=None, soil_organic_carbon=None, koehli_parameters='Mar21_mcnp_drf')

Produces soil moisture (SM) estimates with the NeutronsToSM class. If values for n0, dry_soil_bulk_density, lattice_water, or soil_organic_carbon are not supplied, they are taken from the internal sensor_info.

Parameters:

Name	Type	Description	Default
`n0`	`float`	n0 calibration term, by default None	`None`
`dry_soil_bulk_density`	`float`	Dry soil bulk density in g/cm3, by default None	`None`
`lattice_water`	`float`	Lattice water content as a decimal fraction, e.g., 0.01, by default None	`None`
`soil_organic_carbon`	`float`	Soil organic carbon content as a decimal fraction, e.g., 0.001, by default None	`None`
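For the default `conversion_theory='desilets_etal_2010'`, the core relation is theta_g = a0 / (N/N0 - a1) - a2 with a0 = 0.0808, a1 = 0.372 and a2 = 0.115. The sketch below converts to volumetric soil moisture by subtracting lattice water and soil organic carbon from the gravimetric value and multiplying by dry bulk density (water density assumed 1 g/cm3). Treating soil organic carbon directly as a water equivalent, without any conversion factor, is a simplifying assumption; NeutronsToSM may handle these terms differently.

```python
# Desilets et al. (2010) shape parameters.
A0, A1, A2 = 0.0808, 0.372, 0.115


def volumetric_soil_moisture(n, n0, dry_soil_bulk_density,
                             lattice_water=0.0, soil_organic_carbon=0.0):
    """Convert a corrected neutron count to volumetric soil moisture (cm3/cm3)."""
    gravimetric_total = A0 / (n / n0 - A1) - A2
    # Remove non-soil-water hydrogen pools (assumed already in water-equivalent g/g).
    gravimetric_soil = gravimetric_total - lattice_water - soil_organic_carbon
    # g/g times g/cm3 gives cm3/cm3 when water density is taken as 1 g/cm3.
    return gravimetric_soil * dry_soil_bulk_density
```

Lower corrected counts correspond to wetter soil, because hydrogen in soil water moderates and absorbs epithermal neutrons.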

Report

Soil moisture was estimated using an n0 of {default_params[n0]}, a bulk density of {default_params[dry_soil_bulk_density]}, a lattice water content of {default_params[lattice_water]}, and a soil organic carbon content of

mask_flagged_data

mask_flagged_data(data_frame)

Returns a pd.DataFrame() in which flagged data have been replaced with np.nan values.
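The masking step can be sketched without pandas, using a dict of columns and a parallel dict of boolean flags (True means flagged). This flag representation is an assumption for illustration; SaQC stores flags in its own scheme. As in the description above, a new structure is returned and the input is not modified.

```python
import math


def mask_flagged(columns, flags):
    """Return a copy of columns with flagged values replaced by NaN.

    Columns without an entry in flags are copied unchanged.
    """
    masked = {}
    for name, values in columns.items():
        column_flags = flags.get(name, [False] * len(values))
        masked[name] = [math.nan if flagged else v
                        for v, flagged in zip(values, column_flags)]
    return masked
```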

prepare_static_values

prepare_static_values()

Attaches the static values from the SensorInfo Pydantic model as columns of values in the crns_data_frame.

This method:

1. Converts the Pydantic model to a dictionary.
2. Checks whether each key already exists in the DataFrame.
3. Skips None values.
4. Adds the remaining values as new columns.

The method preserves existing column values if they are already present in the DataFrame to avoid accidental overwrites.
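The steps listed above can be sketched on a plain dict of columns. The sketch assumes the Pydantic model has already been converted to a dict (for example with Pydantic v2's `model_dump()`); the function name and the example keys are hypothetical.

```python
def attach_static_values(columns, static_values, n_rows):
    """Add each static value as a constant column.

    Keys already present in columns are left untouched, and None values
    are skipped, so existing data are never overwritten.
    """
    for key, value in static_values.items():
        if key in columns or value is None:
            continue
        columns[key] = [value] * n_rows
    return columns
```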

prepare_additional_columns

prepare_additional_columns()

Prepares and adds additional columns required for processing.

Such as:

- absolute humidity
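As an example of such a derived column, absolute humidity can be computed from air temperature and relative humidity. This sketch uses the Magnus approximation for saturation vapour pressure and the ideal gas law for water vapour (R_v = 461.5 J/(kg K)). It is a common formulation, but it is an assumption that neptoon uses exactly these constants.

```python
import math


def absolute_humidity(air_temperature_c, relative_humidity_pct):
    """Absolute humidity in g/m3 from air temperature (deg C) and RH (%)."""
    # Magnus approximation for saturation vapour pressure, in hPa.
    saturation_vp_hpa = 6.112 * math.exp(
        17.67 * air_temperature_c / (air_temperature_c + 243.5)
    )
    # Actual vapour pressure in Pa (hPa * 100, scaled by RH as a fraction).
    vapour_pressure_pa = saturation_vp_hpa * 100 * relative_humidity_pct / 100
    # Ideal gas law for water vapour: rho_v = e / (R_v * T), converted kg -> g.
    return vapour_pressure_pa / (461.5 * (air_temperature_c + 273.15)) * 1000
```

At 20 deg C and 50 % relative humidity this gives roughly 8.6 g/m3.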

create_figures

create_figures(create_all=True, ignore_sections=[], selected_figures=[], show_figures=False)

Handles creating the figures using the FigureHandler.

Parameters:

Name	Type	Description	Default
`create_all`	`bool`	Create all figures registered in the FigureHandler, by default True	`True`
`ignore_sections`	`list`	Whole topic sections of figures to skip, by default []	`[]`
`selected_figures`	`list`	A list of the figures to create when not using create_all. See FigureHandler._figure_registry for the names of possible figures, by default []	`[]`
`show_figures`	`bool`	Set to True to show figures in the kernel, by default False	`False`
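The interplay of `create_all`, `ignore_sections` and `selected_figures` can be sketched as a selection over a registry. The registry layout (section name mapped to a list of figure names) and the figure names are hypothetical, and whether `ignore_sections` also applies when `create_all=False` is not stated above, so this sketch applies it only with `create_all`. Tuples are used as defaults to avoid Python's shared-mutable-default pitfall.

```python
def figures_to_create(registry, create_all=True, ignore_sections=(), selected_figures=()):
    """Return the figure names to create.

    registry maps a section (topic) name to a list of figure names.
    """
    if create_all:
        return [fig
                for section, figs in registry.items()
                if section not in ignore_sections
                for fig in figs]
    return [fig for figs in registry.values() for fig in figs if fig in selected_figures]
```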

save_data

save_data(folder_name=None, save_folder_location=None, use_custom_column_names=False, custom_column_names_dict=None, append_timestamp=True)

Save processed outputs to disk.

Output directory precedence:

1. The save_folder_location argument, resolved relative to cwd.
2. data_storage.save_location from the sensor config, already resolved to an absolute path at config load time.
3. The current working directory, as a last-resort fallback.
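The precedence rules above can be sketched with `pathlib`. The function and parameter names are hypothetical, and the sketch ignores `append_timestamp` and the custom column-name options. Joining `Path.cwd()` with an absolute path yields that absolute path, so an absolute `save_folder_location` is honoured while a relative one is resolved against cwd.

```python
from pathlib import Path


def resolve_save_dir(save_folder_location=None, config_save_location=None, folder_name="site"):
    """Pick the output directory following the documented precedence."""
    if save_folder_location is not None:
        # 1. Explicit argument; relative paths are resolved against cwd.
        base = Path.cwd() / Path(save_folder_location)
    elif config_save_location is not None:
        # 2. Sensor-config save_location (already absolute at load time).
        base = Path(config_save_location)
    else:
        # 3. Fallback: the current working directory.
        base = Path.cwd()
    return base / folder_name
```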

Parameters:

Name	Type	Description	Default
`folder_name`	`str`	Sub-folder name inside the save location (defaults to site name).	`None`
`save_folder_location`	`str or Path`	Explicit output directory; relative paths are cwd-relative.	`None`