CRNSDataHub Class¶

The CRNSDataHub class serves as a central management system for Cosmic-Ray Neutron Sensor (CRNS) data processing within the neptoon package. It coordinates various processing steps and maintains data integrity throughout the workflow.

Class Overview¶

The CRNSDataHub class manages:

- CRNS data storage and manipulation
- Quality assessment of data
- Application of corrections to neutron counts
- Conversion of neutron counts to soil moisture estimates
- Data validation at various processing stages

Key Attributes¶

crns_data_frame: pandas DataFrame containing the CRNS time series data
flags_data_frame: pandas DataFrame containing quality flags for the data (created)
site_information: SiteInformation object containing metadata about the CRNS site
quality_assessor: DataQualityAssessor object for performing quality checks
correction_factory: CorrectionFactory object for creating correction instances
correction_builder: CorrectionBuilder object for managing multiple corrections

Main Methods¶

__init__¶

def __init__(self, crns_data_frame: pd.DataFrame, flags_data_frame: pd.DataFrame = None, 
             configuration_manager: ConfigurationManager = None, 
             quality_assessor: DataQualityAssessor = None, validation: bool = True, 
             site_information: SiteInformation = None, process_with_config: bool = False):

Initializes the CRNSDataHub with the provided data and configuration.

validate_dataframe¶

def validate_dataframe(self, schema: str):

Validates the data frame against a specified schema to ensure data integrity.
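
As a rough illustration of what schema validation involves, the sketch below checks a list of records against a hypothetical schema of required column names and expected types. The schema contents, the column names, and the `validate_records` helper are assumptions made for this example and use plain Python rather than pandas; neptoon's actual schemas and validation machinery may differ.

```python
from datetime import datetime

# Hypothetical schema: column name -> expected Python type(s).
EXAMPLE_SCHEMA = {
    "date_time": datetime,
    "epithermal_neutrons": (int, float),
    "air_pressure": (int, float),
}


def validate_records(records, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for row_number, row in enumerate(records):
        for column, expected_type in schema.items():
            if column not in row:
                problems.append(f"row {row_number}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                problems.append(
                    f"row {row_number}: '{column}' has type "
                    f"{type(row[column]).__name__}"
                )
    return problems


good_row = {
    "date_time": datetime(2024, 1, 1, 0, 0),
    "epithermal_neutrons": 812,
    "air_pressure": 1003.2,
}
# The timestamp is a string and air_pressure is missing.
bad_row = {"date_time": "2024-01-01", "epithermal_neutrons": 812}

print(validate_records([good_row], EXAMPLE_SCHEMA))
print(validate_records([bad_row], EXAMPLE_SCHEMA))
```

Collecting every problem before reporting, rather than stopping at the first, tends to make a failed validation easier to fix in one pass.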

update_site_information¶

def update_site_information(self, new_site_information: SiteInformation):

Updates the site information and reinitializes the correction factory.

attach_nmdb_data¶

def attach_nmdb_data(self, station="JUNG", new_column_name="incoming_neutron_intensity", 
                     resolution="60", nmdb_table="revori"):

Attaches incoming neutron intensity data from NMDB to the CRNS data frame.
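
Conceptually, attaching reference intensity means aligning an hourly monitor series with the sensor's own timestamps. The sketch below shows only that alignment step in plain Python. It does not contact NMDB, and the sample values, the `attach_reference_intensity` helper, and the rule of truncating each timestamp to its hour are assumptions for illustration.

```python
from datetime import datetime


def attach_reference_intensity(rows, hourly_reference,
                               new_column_name="incoming_neutron_intensity"):
    """Copy each row and add the reference value for the row's hour.

    Rows whose hour has no reference value get None.
    """
    attached = []
    for row in rows:
        hour = row["date_time"].replace(minute=0, second=0, microsecond=0)
        new_row = dict(row)  # leave the input rows untouched
        new_row[new_column_name] = hourly_reference.get(hour)
        attached.append(new_row)
    return attached


# Made-up hourly reference values, keyed by the start of each hour.
hourly_reference = {
    datetime(2024, 1, 1, 0): 158.2,
    datetime(2024, 1, 1, 1): 157.9,
}
rows = [
    {"date_time": datetime(2024, 1, 1, 0, 15), "epithermal_neutrons": 810},
    {"date_time": datetime(2024, 1, 1, 1, 45), "epithermal_neutrons": 798},
    {"date_time": datetime(2024, 1, 1, 2, 5), "epithermal_neutrons": 805},
]

for row in attach_reference_intensity(rows, hourly_reference):
    print(row["date_time"], row["incoming_neutron_intensity"])
```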

apply_quality_flags¶

def apply_quality_flags(self, custom_flags: QualityAssessmentFlagBuilder = None, 
                        flags_from_config: bool = False, flags_default: str = None):

Applies quality flags to the data based on specified criteria.
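
One way to picture flagging is a range check whose result is stored next to the data instead of overwriting it, in the same spirit as keeping a separate flags_data_frame. The sketch below is a minimal stand-in: the `flag_out_of_range` helper, the flag labels, and the thresholds are hypothetical and are not the flags neptoon produces.

```python
def flag_out_of_range(values, lower, upper):
    """Return one flag per value: 'UNFLAGGED' inside [lower, upper], else 'BAD'.

    Missing values (None) are flagged as 'BAD' as well.
    """
    flags = []
    for value in values:
        if value is None or value < lower or value > upper:
            flags.append("BAD")
        else:
            flags.append("UNFLAGGED")
    return flags


relative_humidity = [45.0, 101.5, None, 0.0, -3.0]
flags = flag_out_of_range(relative_humidity, lower=0.0, upper=100.0)
print(flags)
```

Because the flags live in their own list, the original measurements stay available for inspection or for re-flagging with different thresholds.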

select_correction¶

def select_correction(self, correction_type: CorrectionType = "empty", 
                      correction_theory: CorrectionTheory = None, 
                      use_all_default_corrections=False):

Selects and adds corrections to be applied to the neutron count data.

correct_neutrons¶

def correct_neutrons(self, correct_flagged_values_too=False):

Applies selected corrections to the neutron count data.
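
To make the idea of a correction concrete, the sketch below computes an exponential pressure correction factor of the form exp(beta * (P - P_ref)) and multiplies raw counts by their factors. This is a generic illustration, not neptoon's implementation: the helper names, the default `beta` (an illustrative value of the order reported in the CRNS literature), and the choice to return None for flagged rows are all assumptions.

```python
import math


def pressure_correction_factor(pressure_hpa, reference_pressure_hpa, beta=0.0076):
    """Exponential pressure factor: exp(beta * (P - P_ref)), beta in 1/hPa."""
    return math.exp(beta * (pressure_hpa - reference_pressure_hpa))


def apply_corrections(raw_counts, factors_per_row, flagged,
                      correct_flagged_values_too=False):
    """Multiply each raw count by all of its correction factors.

    Flagged rows are returned as None unless correct_flagged_values_too is True
    (an assumed behaviour for this sketch).
    """
    corrected = []
    for count, factors, is_flagged in zip(raw_counts, factors_per_row, flagged):
        if is_flagged and not correct_flagged_values_too:
            corrected.append(None)
            continue
        corrected.append(count * math.prod(factors))
    return corrected


raw_counts = [800.0, 820.0, 790.0]
pressures = [1013.0, 1020.0, 1005.0]
flagged = [False, False, True]

# One list of factors per row; here only the pressure factor is used.
factors = [[pressure_correction_factor(p, 1013.0)] for p in pressures]
corrected = apply_corrections(raw_counts, factors, flagged)
print(corrected)
```

Keeping the factors as a list per row means further corrections (for example humidity or incoming intensity) can be appended without changing `apply_corrections`.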

smooth_data¶

def smooth_data(self, column_to_smooth: str, smooth_method: Literal["rolling_mean", "savitsky_golay"] = "rolling_mean", 
                window: Optional[Union[int, str]] = 12, poly_order: int = 4, auto_update_final_col: bool = True):

Applies smoothing to a specified data column.
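
For intuition about the "rolling_mean" option, here is a minimal trailing rolling mean in plain Python. Whether neptoon uses a trailing or centred window, and how it treats the edges, is not stated above, so the trailing window and the None padding are assumptions of this sketch; time-based string windows are not covered.

```python
def rolling_mean(values, window=12):
    """Trailing rolling mean; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for index in range(len(values)):
        if index + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            chunk = values[index + 1 - window:index + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


counts = [10, 12, 11, 13, 12, 14]
print(rolling_mean(counts, window=3))
# [None, None, 11.0, 12.0, 12.0, 13.0]
```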

produce_soil_moisture_estimates¶

def produce_soil_moisture_estimates(self, n0: float = None, dry_soil_bulk_density: float = None, 
                                    lattice_water: float = None, soil_organic_carbon: float = None):

Calculates soil moisture estimates based on corrected neutron counts and site parameters.
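
The role of n0, bulk density, and lattice water can be seen in a Desilets-type calibration curve, sketched below with the standard coefficients from Desilets et al. (2010). This is an illustration under stated assumptions, not necessarily the conversion neptoon applies: the function name is hypothetical, and soil organic carbon is passed as an already-converted water equivalent (`water_equiv_soil_organic_carbon`), which is a simplification of the soil_organic_carbon argument above.

```python
def neutrons_to_volumetric_soil_moisture(
    corrected_neutrons,
    n0,
    dry_soil_bulk_density,
    lattice_water=0.0,
    water_equiv_soil_organic_carbon=0.0,
    a0=0.0808,
    a1=0.372,
    a2=0.115,
):
    """Desilets-type conversion from corrected counts to volumetric soil moisture.

    Returns cm^3/cm^3, assuming bulk density in g/cm^3 and water density of 1 g/cm^3.
    """
    ratio = corrected_neutrons / n0
    if ratio <= a1:
        raise ValueError("neutron ratio is outside the valid range of the curve")
    gravimetric = a0 / (ratio - a1) - a2
    # Remove water that is bound in minerals and organic matter.
    gravimetric -= lattice_water + water_equiv_soil_organic_carbon
    return gravimetric * dry_soil_bulk_density


# Fewer neutrons relative to n0 means wetter soil.
wet = neutrons_to_volumetric_soil_moisture(
    700.0, n0=1100.0, dry_soil_bulk_density=1.4, lattice_water=0.02
)
dry = neutrons_to_volumetric_soil_moisture(
    950.0, n0=1100.0, dry_soil_bulk_density=1.4, lattice_water=0.02
)
print(round(wet, 3), round(dry, 3))
```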

save_data¶

def save_data(
    self,
    folder_name: str | None = None,
    save_folder_location: str | Path | None = None,
    use_custom_column_names: bool = False,
    custom_column_names_dict: dict | None = None,
    append_timestamp: bool = True,
):

Saves the processed data, flags, figures, and (optionally) the PDF report to a folder.

folder_name defaults to sensor_info.name.
save_folder_location defaults to data_storage.save_location from the sensor config (when the hub was built from config), and otherwise to the current working directory. See Saving Data for the full precedence rules.
append_timestamp appends _YYYYMMDDhhmmss to the folder name so repeated runs do not overwrite each other.
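
The defaults described above can be sketched as a small path-resolution helper. It mirrors only the rules stated here (explicit argument, then the configured save location, then the current working directory); the full precedence rules are in Saving Data. The `resolve_save_folder` function, its `config_save_location` argument, and the `now` argument used to fix the timestamp are hypothetical names for this example.

```python
from datetime import datetime
from pathlib import Path


def resolve_save_folder(
    sensor_name,
    folder_name=None,
    save_folder_location=None,
    config_save_location=None,
    append_timestamp=True,
    now=None,
):
    """Build the output folder path from the documented defaults."""
    name = folder_name if folder_name is not None else sensor_name
    if append_timestamp:
        # _YYYYMMDDhhmmss suffix so repeated runs get distinct folders.
        stamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
        name = f"{name}_{stamp}"
    if save_folder_location is not None:
        base = Path(save_folder_location)
    elif config_save_location is not None:
        base = Path(config_save_location)
    else:
        base = Path.cwd()
    return base / name


fixed_time = datetime(2024, 5, 17, 9, 30, 0)
print(resolve_save_folder("my_sensor", now=fixed_time).name)
# my_sensor_20240517093000
```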

Usage Example¶

import pandas as pd
from neptoon.data_management import CRNSDataHub, SiteInformation

# Assume we have a pandas DataFrame 'crns_df' with CRNS data
crns_df = pd.read_csv('crns_data.csv')

# Create a SiteInformation object
site_info = SiteInformation(
    latitude=52.3676,
    longitude=4.9041,
    elevation=1,
    # ... remaining site parameters
)

# Initialize CRNSDataHub
data_hub = CRNSDataHub(crns_data_frame=crns_df, site_information=site_info)

# Perform data processing steps
data_hub.validate_dataframe(schema="initial_check")
data_hub.attach_nmdb_data()
data_hub.apply_quality_flags()
data_hub.select_correction(correction_type="pressure")
data_hub.correct_neutrons()
data_hub.produce_soil_moisture_estimates()

# Save processed data
data_hub.save_data(folder_name='processed_crns_data', save_folder_location='/path/to/output')

Notes¶

The CRNSDataHub is designed to be flexible: quality flags, corrections, smoothing, and soil moisture parameters can each be customized through the method arguments shown above.
Initialize all required data and metadata (for example, site_information) before running the processing steps.
Validation checks run at several stages of the pipeline to maintain data integrity.