neptoon.hub
crns_data_hub¶
Classes:
Functions:
- crns_data_frame
- flags_data_frame
- sensor_info
- validation
- quality_assessor
- correction_factory
- calibration_samples_data
- correction_builder
- validate_dataframe
- attach_nmdb_data
- add_quality_flags
- apply_quality_flags
- select_correction
- correct_neutrons
- smooth_data
- calibrate_station
- align_time_stamps
- aggregate_data_frame
- produce_soil_moisture_estimates
- mask_flagged_data
- prepare_static_values
- prepare_additional_columns
- create_figures
- save_data
CRNSDataHub ¶
CRNSDataHub(crns_data_frame, flags_data_frame=None, sensor_info=None, quality_assessor=None, validation=True, calibration_samples_data=None, data_storage=None)
The CRNSDataHub is used to manage the time series data throughout the processing steps. Some key features:
- It stores a DataFrame for a site
- As processing progresses, data can be added to the DataFrame and the shadow DataFrames are updated.
Raw data is checked against the RawDataSchema, which is a first line of defense against incorrectly formatted tables. If validation fails here, the data must be reformatted, either with one of the provided routines or manually, to match the standard.
Inputs to the CRNSDataHub.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| crns_data_frame | DataFrame | CRNS data in a dataframe format. It will be validated to ensure it has been formatted correctly. | required |
| configuration_manager | ConfigurationManager | A ConfigurationManager instance storing configuration YAML information, by default None | required |
| quality_assessor | SaQC | SaQC object which is used for quality assessment. Used for the creation of flags to define poor data. | None |
| validation | bool | Toggle for whether to have continued validation of data tables during processing (see data_management>data_validation_tables.py for examples of tables being validated). These checks ensure data is correctly formatted for internal processing. | True |
| calibration_samples_data | DataFrame | The sample data taken during the calibration campaign. | None |
| data_storage | DataStorageConfig | Data storage configuration from the sensor config. When provided, save_data() uses save_location as the default output directory instead of cwd(). | None |
validate_dataframe ¶
Validates the dataframe against a pandera schema. See data_validation_table.py for schemas.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| schema | str | The name of the schema to use for the check. | required |
attach_nmdb_data ¶
attach_nmdb_data(station='JUNG', new_column_name=str(INCOMING_NEUTRON_INTENSITY), resolution='60', nmdb_table='revori', reference_value=None)
Utilises the NMDBDataAttacher class to attach NMDB incoming intensity data to the crns_data_frame. Data is collected from www.NMDB.eu.
See NMDBDataAttacher documentation for more information.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| station | str | The station to collect data from, by default "JUNG" | 'JUNG' |
| new_column_name | str | The name of the column where data will be written to, by default "incoming_neutron_intensity" | str(INCOMING_NEUTRON_INTENSITY) |
| resolution | str | The resolution in minutes, by default "60" | '60' |
| nmdb_table | str | The table to pull data from, by default "revori" | 'revori' |
| reference_value | int | The reference value of the neutron monitor. If left as None, the value from the first data point in the time series is used. | None |
Report
Neutron monitoring data was attached from NMDB.eu. The station used was {station} at a resolution of {resolution} minutes. The data table used was {nmdb_table}.
add_quality_flags ¶
Adds quality checks to be run on the dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| custom_flags | QualityAssessmentFlagBuilder | A user can build a QualityAssessmentFlagBuilder with checks and attach it as a whole, by default None | None |
| add_check | Check | A user can add individual Checks, or a list of Checks. These are then added to the QualityAssessmentFlagBuilder, by default None | None |
apply_quality_flags ¶
Flags data based on quality assessment. A user can supply a custom-built QualityAssessmentFlagBuilder object, flag using the config file (if supplied), or choose a standard flagging routine.
Everything is off by default so a user must choose.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| custom_flags | QualityAssessmentFlagBuilder | A custom-built set of flags, by default None | required |
| flags_from_config | bool | Whether to conduct QA using the supplied configuration, by default False | required |
| flags_default | str | A string representing a default version of flagging, by default None | required |
select_correction ¶
Method to select corrections to be applied to data.
Individual corrections can be applied using a CorrectionType and CorrectionTheory. If a user assigns a CorrectionType without a CorrectionTheory, then the default correction for that CorrectionType is applied.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| correction_type | CorrectionType | A CorrectionType, by default "empty" | 'empty' |
| correction_theory | CorrectionTheory | A CorrectionTheory, by default None | None |
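The fallback rule described above (an explicit theory wins, otherwise the default for the chosen type is used) can be sketched as follows. This is only an illustration of the lookup logic: the two enums, their members, and the DEFAULT_THEORY mapping are hypothetical stand-ins and do not reflect the actual members of CorrectionType or CorrectionTheory.

```python
from enum import Enum
from typing import Optional


class CorrectionTypeSketch(Enum):
    """Hypothetical stand-in for CorrectionType; members are placeholders."""
    PRESSURE = "pressure"
    HUMIDITY = "humidity"


class CorrectionTheorySketch(Enum):
    """Hypothetical stand-in for CorrectionTheory; members are placeholders."""
    THEORY_A = "theory_a"
    THEORY_B = "theory_b"


# Assumed mapping from each correction type to its default theory.
DEFAULT_THEORY = {
    CorrectionTypeSketch.PRESSURE: CorrectionTheorySketch.THEORY_A,
    CorrectionTypeSketch.HUMIDITY: CorrectionTheorySketch.THEORY_B,
}


def resolve_theory(
    correction_type: CorrectionTypeSketch,
    correction_theory: Optional[CorrectionTheorySketch] = None,
) -> CorrectionTheorySketch:
    """Return the explicit theory if given, else the default for the type."""
    if correction_theory is not None:
        return correction_theory
    return DEFAULT_THEORY[correction_type]


default_choice = resolve_theory(CorrectionTypeSketch.PRESSURE)
explicit_choice = resolve_theory(
    CorrectionTypeSketch.PRESSURE, CorrectionTheorySketch.THEORY_B
)
```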
correct_neutrons ¶
Create correction factors as well as the corrected epithermal neutrons column.
smooth_data ¶
smooth_data(column_to_smooth, smooth_method='rolling_mean', window=12, min_proportion_good_data=0.7, poly_order=4, auto_update_final_col=True)
Applies a smoothing method to a series of data in the crns_data_frame using the SmoothData class.
A column_to_smooth argument must be supplied and should be written using the "str(ColumnInfo.Name.COLUMN)" format. The two most likely to be used are:
- str(ColumnInfo.Name.SOIL_MOISTURE)
- str(ColumnInfo.Name.EPI_NEUTRONS)
If parameters are left as None, it uses defaults from SmoothData (i.e., rolling_mean, window size == 12).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| column_to_smooth | str(VALUE) | The column in the crns_data_frame that needs to be smoothed. Automatically | required |
Report
Data smoothing was done on {column_to_smooth}. This was done using {smooth_method} with a window of {window}.
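As a rough illustration of what a rolling-mean smoother with a fixed window does, the sketch below uses only the standard library. It is not the SmoothData implementation. Two points are assumptions: the window is trailing (it ends at the current sample), and min_proportion_good_data is read as the minimum share of non-missing values a window needs before a mean is returned.

```python
import math


def rolling_mean(values, window=12, min_proportion_good_data=0.7):
    """Trailing rolling mean that tolerates some missing (NaN) values."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet to fill a whole window.
            smoothed.append(math.nan)
            continue
        chunk = values[i + 1 - window : i + 1]
        good = [v for v in chunk if not math.isnan(v)]
        if len(good) / window >= min_proportion_good_data:
            smoothed.append(sum(good) / len(good))
        else:
            # Too many gaps in this window to trust the mean.
            smoothed.append(math.nan)
    return smoothed


counts = [100.0, 102.0, math.nan, 98.0, 101.0, 99.0]
result = rolling_mean(counts, window=3, min_proportion_good_data=0.6)
```

With a window of 3 and a threshold of 0.6, a window containing a single gap still yields a mean of its two valid values, while the first two positions stay empty because no full window exists yet.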
calibrate_station ¶
Calibrate the sensor
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | CalibrationConfiguration | Config file which contains all the required info for calibration, by default None | None |
Raises:
| Type | Description |
|---|---|
| ValueError | When no calibration data provided |
Report
Calibration was undertaken. The N0 number was calculated as {n0}, using the {config.neutron_conversion_method} method. From the samples, the average dry soil bulk density is {avg_dry_soil_bulk_density}, the average soil organic carbon is {avg_soil_organic_carbon}, and the average lattice water content is {avg_lattice_water}.
align_time_stamps ¶
Aligns timestamps to occur on the hour. E.g., 01:00 not 01:05.
Uses the TimeStampAligner class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | method to use for shifting, defaults to shifting to nearest hour, by default "time" | required |
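The idea of moving a timestamp such as 01:05 onto 01:00 can be sketched with the standard library alone. This is not the TimeStampAligner implementation: rounding to the nearest hour with the half-hour mark rounding up is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def align_to_nearest_hour(timestamp: datetime) -> datetime:
    """Round a timestamp to the nearest whole hour (30 min rounds up)."""
    floored = timestamp.replace(minute=0, second=0, microsecond=0)
    if timestamp - floored >= timedelta(minutes=30):
        return floored + timedelta(hours=1)
    return floored


raw = [datetime(2024, 1, 1, 1, 5), datetime(2024, 1, 1, 1, 55)]
aligned = [align_to_nearest_hour(t) for t in raw]
```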
aggregate_data_frame ¶
Aggregate a crns data frame to a new resolution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_resolution | str | Desired output resolution (e.g., '1h' or '1day') | required |
| max_na_fraction | float | fraction of acceptable nan values in aggregation period, by default 0.3 | 0.3 |
| aggregate_method | str | description, by default "bagg" | 'bagg' |
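To make the role of max_na_fraction concrete, the sketch below aggregates sub-hourly values to hourly means and discards any hour in which too many values are missing. It is a simplified stand-in, not the library's aggregation: it uses a plain mean rather than the "bagg" method, fixes the output resolution at one hour, and the function name is hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def aggregate_hourly(records, max_na_fraction=0.3):
    """Average (timestamp, value) records per hour, masking sparse hours."""
    buckets = defaultdict(list)
    for timestamp, value in records:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)

    aggregated = {}
    for hour, values in sorted(buckets.items()):
        good = [v for v in values if not math.isnan(v)]
        na_fraction = 1 - len(good) / len(values)
        if good and na_fraction <= max_na_fraction:
            aggregated[hour] = sum(good) / len(good)
        else:
            # Too many missing values in this period to report a mean.
            aggregated[hour] = math.nan
    return aggregated


records = [
    (datetime(2024, 1, 1, 0, 0), 100.0),
    (datetime(2024, 1, 1, 0, 15), 104.0),
    (datetime(2024, 1, 1, 0, 30), 96.0),
    (datetime(2024, 1, 1, 0, 45), 100.0),
    (datetime(2024, 1, 1, 1, 0), 90.0),
    (datetime(2024, 1, 1, 1, 15), math.nan),
    (datetime(2024, 1, 1, 1, 30), math.nan),
    (datetime(2024, 1, 1, 1, 45), 92.0),
]
hourly = aggregate_hourly(records)
```

In this example the first hour is complete and is averaged, while the second hour has half of its values missing, which exceeds the 0.3 limit, so it is returned as NaN.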
produce_soil_moisture_estimates ¶
produce_soil_moisture_estimates(n0=None, conversion_theory='desilets_etal_2010', dry_soil_bulk_density=None, lattice_water=None, soil_organic_carbon=None, koehli_parameters='Mar21_mcnp_drf')
Produces SM estimates with the NeutronsToSM class. If values for n0, dry_soil_bulk_density, lattice_water, or soil_organic_carbon are not supplied, the values are taken from the internal sensor_info class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n0 | float | n0 calibration term, by default None | None |
| dry_soil_bulk_density | float | given in g/cm3, by default None | None |
| lattice_water | float | given as decimal percent e.g., 0.01, by default None | None |
| soil_organic_carbon | float | Given as decimal percent, e.g., 0.001, by default None | None |
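For orientation, the default conversion_theory name refers to the Desilets et al. (2010) relation between the neutron count ratio N/N0 and soil water content. The sketch below shows that relation with its published shape parameters. How the NeutronsToSM class combines it with the soil properties is not stated here, so the treatment of lattice water and soil organic carbon (subtracted as water-equivalent fractions before multiplying by bulk density) is an assumption based on common practice, and the function name is hypothetical.

```python
def desilets_soil_moisture(
    neutrons: float,
    n0: float,
    dry_soil_bulk_density: float = 1.0,
    lattice_water: float = 0.0,
    soil_organic_carbon: float = 0.0,
) -> float:
    """Volumetric soil moisture (cm3/cm3) from corrected neutron counts."""
    # Shape parameters published in Desilets et al. (2010).
    a0, a1, a2 = 0.0808, 0.372, 0.115
    # Total gravimetric water equivalent seen by the sensor.
    gravimetric = a0 / (neutrons / n0 - a1) - a2
    # Assumption: remove water held in the mineral lattice and organic matter,
    # then convert from gravimetric to volumetric using the bulk density.
    pore_water = gravimetric - lattice_water - soil_organic_carbon
    return pore_water * dry_soil_bulk_density


theta = desilets_soil_moisture(
    neutrons=1500.0,
    n0=2000.0,
    dry_soil_bulk_density=1.4,
    lattice_water=0.01,
    soil_organic_carbon=0.001,
)
```

Lower neutron counts relative to N0 give higher soil moisture, which is the behaviour the calibration term n0 anchors.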
Report
Soil moisture was estimated using an n0 of {default_params[n0]}, a bulk density of {default_params[dry_soil_bulk_density]}, a lattice water content of {default_params[lattice_water]}, and a soil organic carbon content of
mask_flagged_data ¶
Returns a pd.DataFrame() where flagged data has been replaced with np.nan values
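The masking step can be pictured with plain Python containers. This sketch is not the library code: it represents the data and the flags as dictionaries of equal-length lists, uses True to mean "flagged" (the real flags frame may use different labels), and the column name is illustrative.

```python
import math


def mask_flagged(data, flagged):
    """Return a copy of `data` with flagged entries replaced by NaN."""
    masked = {}
    for column, values in data.items():
        # Columns without flag information are passed through unchanged.
        column_flags = flagged.get(column, [False] * len(values))
        masked[column] = [
            math.nan if is_flagged else value
            for value, is_flagged in zip(values, column_flags)
        ]
    return masked


data = {"epithermal_neutrons": [1500.0, 20.0, 1490.0]}
flagged = {"epithermal_neutrons": [False, True, False]}
masked = mask_flagged(data, flagged)
```

The original values are left untouched, so the unmasked data remains available alongside the masked copy.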
prepare_static_values ¶
Attaches the static values from the SensorInfo Pydantic model as columns of values in the crns_data_frame.
This method:
1. Converts the Pydantic model to a dictionary
2. Checks if each key already exists in the DataFrame
3. Skips None values
4. Adds the remaining values as new columns
The method preserves existing column values if they are already present in the DataFrame to avoid accidental overwrites.
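The four steps above can be sketched without pandas or Pydantic. In this illustration a dataclass stands in for the SensorInfo model and a dictionary of lists stands in for the DataFrame. The field names, values, and function name are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SensorInfoSketch:
    """Stand-in for the SensorInfo model; fields are illustrative only."""
    site_name: str = "demo_site"
    elevation: Optional[float] = 120.0
    n0: Optional[float] = None
    latitude: Optional[float] = 51.3


def attach_static_values(columns, sensor_info, n_rows):
    """Add static sensor values as constant columns, keeping existing ones."""
    static = asdict(sensor_info)  # 1. model -> dictionary
    for key, value in static.items():
        if key in columns:  # 2. existing columns are preserved
            continue
        if value is None:  # 3. unset values are skipped
            continue
        columns[key] = [value] * n_rows  # 4. add as a constant column
    return columns


columns = {"epithermal_neutrons": [1500.0, 1510.0], "latitude": [50.0, 50.0]}
columns = attach_static_values(columns, SensorInfoSketch(), n_rows=2)
```

Here latitude keeps its existing values, n0 is not added because it is None, and site_name and elevation become constant columns.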
prepare_additional_columns ¶
Prepares and adds additional columns required for processing.
Such as:
- absolute humidity
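The page does not say how absolute humidity is derived, so the following is only one common way to compute it, assuming air temperature in degrees Celsius and relative humidity in percent are available. It uses a Magnus-type approximation for saturation vapour pressure and the ideal gas law for water vapour. The function name and inputs are assumptions.

```python
import math


def absolute_humidity(air_temperature_c: float, relative_humidity_pct: float) -> float:
    """Absolute humidity in g/m^3 from air temperature and relative humidity."""
    # Saturation vapour pressure in hPa (Magnus-type approximation).
    saturation_vp = 6.112 * math.exp(
        17.67 * air_temperature_c / (air_temperature_c + 243.5)
    )
    actual_vp = saturation_vp * relative_humidity_pct / 100.0
    # Ideal gas law for water vapour, rho = e / (R_v * T).
    # The factor 216.7 folds in R_v = 461.5 J/(kg K) and the hPa and g conversions.
    return 216.7 * actual_vp / (air_temperature_c + 273.15)


ah = absolute_humidity(20.0, 50.0)
```

At 20 degrees Celsius and 50 % relative humidity this gives a value between 8 and 9 g/m^3.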
create_figures ¶
Handles creating the figures using the FigureHandler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| create_all | bool | Default to create all figures in the FigureHandler._register, by default True | True |
| ignore_sections | list | Ignore a whole topic section of figure names, by default [] | [] |
| selected_figures | list | A list of the figures to be created if not using create_all. See FigureHandler._figure_registry for the names of possible figures, by default [] | [] |
| show_figures | bool | Turn to False to not show Figures in the kernel, by default True | False |
save_data ¶
save_data(folder_name=None, save_folder_location=None, use_custom_column_names=False, custom_column_names_dict=None, append_timestamp=True)
Save processed outputs to disk.
Output directory precedence:
1. save_folder_location argument: resolved relative to cwd.
2. data_storage.save_location from the sensor config: already resolved to an absolute path at config load time.
3. Current working directory as a last-resort fallback.
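The precedence above can be expressed as a small helper. This is a sketch of the decision order only, not the body of save_data(): the function name and the config_save_location parameter (standing in for data_storage.save_location) are hypothetical.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_save_directory(
    save_folder_location: Optional[Union[str, Path]] = None,
    config_save_location: Optional[Path] = None,
) -> Path:
    """Pick the output directory following the documented precedence."""
    if save_folder_location is not None:
        # 1. Explicit argument; a relative path is taken relative to cwd,
        #    while an absolute path replaces cwd entirely.
        return Path.cwd() / save_folder_location
    if config_save_location is not None:
        # 2. Location from the sensor config, assumed to be absolute already.
        return Path(config_save_location)
    # 3. Last-resort fallback.
    return Path.cwd()


explicit = resolve_save_directory("outputs", Path("/data/site"))
from_config = resolve_save_directory(None, Path("/data/site"))
fallback = resolve_save_directory()
```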
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder_name | str | Sub-folder name inside the save location (defaults to site name). | None |
| save_folder_location | str or Path | Explicit output directory; relative paths are cwd-relative. | None |