Quality Assessment in neptoon¶
Overview¶
The quality assessment (QA) system in neptoon provides tools to identify and flag bad data in CRNS time series. Built on top of the SaQC framework, neptoon's QA system lets users apply standardized quality checks to their data, helping to ensure that soil moisture estimates are based on reliable measurements.
Key Components¶
The quality assessment system in neptoon is built from several interconnected parts:
- QualityCheck: This is the fundamental building block that defines what you want to check (like neutron counts), how you want to check it (like range validation), and the specific thresholds or criteria to use.
- QualityAssessmentFlagBuilder: Collects multiple quality checks into a set.
- DataQualityAssessor: Applies quality checks to data and manages the SaQC integration.
- QAMethod: This is a set of pre-defined checking methods you can choose from, such as range checks (is the value between X and Y?), spike detection, or constant value detection.
- QATarget: This defines which data columns in your CRNS dataset you want to check, such as raw neutron counts, air pressure, or humidity measurements.
Getting Started¶
Creating Quality Checks¶
Quality checks are created using the QualityCheck class, which requires three key pieces of information:
- Target: What data column to check (using the QATarget enum)
- Method: What type of check to perform (using the QAMethod enum)
- Parameters: Configuration parameters specific to the method (provided as a dictionary)
from neptoon.quality_control import QualityCheck, QAMethod, QATarget

# Create a range check for neutron counts
neutron_range_check = QualityCheck(
    target=QATarget.RAW_EPI_NEUTRONS,
    method=QAMethod.RANGE_CHECK,
    parameters={
        "min": 500,   # Minimum acceptable value
        "max": 2000,  # Maximum acceptable value
    },
)

# Create a spike detection check for air pressure
pressure_spike_check = QualityCheck(
    target=QATarget.AIR_PRESSURE,
    method=QAMethod.SPIKE_OFFSET,  # Flags offsets relative to the previous value
    parameters={
        "window": "24h",
        "threshold": (0.2, -0.2),  # Must be a tuple of positive and negative spike thresholds
    },
)
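To make the `min` and `max` parameters concrete, here is a minimal plain-Python sketch of what a range check does conceptually. It is not neptoon's or SaQC's implementation: the function name `flag_out_of_range` and the sample counts are hypothetical, and treating both bounds as inclusive is an assumption.

```python
def flag_out_of_range(values, minimum, maximum):
    """Return one boolean per value: True where the value lies outside [minimum, maximum]."""
    return [not (minimum <= v <= maximum) for v in values]


# Hypothetical hourly neutron counts, checked against the 500-2000 range used above
counts = [480, 750, 1200, 2100, 1999]
flags = flag_out_of_range(counts, minimum=500, maximum=2000)
print(flags)
```

Only the first and fourth values fall outside the range, so only those two would be flagged.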
Building a Set of Quality Checks¶
Multiple quality checks can be combined using the QualityAssessmentFlagBuilder:
from neptoon.quality_control import QualityAssessmentFlagBuilder
# Create a flag builder
flag_builder = QualityAssessmentFlagBuilder()
# Add multiple checks
flag_builder.add_check(neutron_range_check)
flag_builder.add_check(pressure_spike_check)
# Alternatively, add multiple checks at once
flag_builder.add_check(neutron_range_check, pressure_spike_check)
Applying Quality Checks to Data¶
Once you have defined your quality checks, you can apply them to your data. The example below does this through a CRNSDataHub rather than calling the DataQualityAssessor directly:
from neptoon.quality_control import DataQualityAssessor
from neptoon.hub import CRNSDataHub
# Assuming you have a CRNSDataHub with data
data_hub = CRNSDataHub(crns_data_frame=your_data_frame)
# Add quality checks
data_hub.add_quality_flags(custom_flags=flag_builder)
# Apply the quality checks
data_hub.apply_quality_flags()
# The flagged data is now masked in the data_hub.crns_data_frame
# The flags themselves are in data_hub.flags_data_frame
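As a rough illustration of what "masked" means here, the sketch below replaces flagged values with NaN while keeping the flags in a parallel list. This is an assumption about the general idea, not a description of neptoon's internals: the helper `mask_flagged`, the sample values, and the `"BAD"` label are all hypothetical.

```python
import math


def mask_flagged(values, flags, unflagged_label="UNFLAGGED"):
    """Replace every value whose flag differs from unflagged_label with NaN."""
    return [
        value if flag == unflagged_label else math.nan
        for value, flag in zip(values, flags)
    ]


# Hypothetical values and their flags
values = [750.0, 2100.0, 1200.0]
flags = ["UNFLAGGED", "BAD", "UNFLAGGED"]
masked = mask_flagged(values, flags)
print(masked)
```

Keeping the flags separate from the masked values means the reason a value was removed remains available for later inspection.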
Quality Check Methods¶
neptoon provides several quality check methods through the QAMethod enum:
| Method | Description | Common Parameters |
|---|---|---|
| RANGE_CHECK | Flags values outside a specified range | min, max |
| SPIKE_OFFSET | Flags values offset from the previous value by more than a relative amount | threshold_relative, window |
| SPIKE_UNILOF | Detects spikes using the univariate Local Outlier Factor algorithm | periods_in_calculation, threshold |
| CONSTANT | Flags periods where values remain constant | - |
| ABOVE_N0 | Flags neutron counts above a factor of the N0 calibration value | N0, percent_maximum |
| BELOW_N0_FACTOR | Flags neutron counts below a factor of the N0 calibration value | N0, percent_minimum |
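For intuition about constant-value detection, the following plain-Python sketch flags runs of identical consecutive readings. It is only an illustration of the idea, not the algorithm behind `CONSTANT`: the function `flag_constant_runs`, its `min_run` parameter, and the pressure readings are hypothetical.

```python
def flag_constant_runs(values, min_run=3):
    """Flag every value that belongs to a run of at least min_run identical consecutive values."""
    flags = [False] * len(values)
    start = 0  # index where the current run begins
    for i in range(1, len(values) + 1):
        # A run ends at the end of the list or when the value changes
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


# Hypothetical pressure readings: the first three are stuck at one value
readings = [1013.2, 1013.2, 1013.2, 1012.9, 1012.7, 1012.7]
constant_flags = flag_constant_runs(readings)
print(constant_flags)
```

The final two equal readings are not flagged because the run is shorter than `min_run`; a stuck sensor typically produces much longer runs.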
Finding Required Parameters¶
To discover what parameters are required for a specific quality check method, use the WhatParamsDoINeed utility:
from neptoon.quality_control import WhatParamsDoINeed, QAMethod
# Display parameter information for a method
WhatParamsDoINeed(QAMethod.RANGE_CHECK)
This will print detailed information about required and optional parameters for the specified method.
Quality Check Targets¶
The following data columns can be targeted for quality checks via the QATarget enum:
| Target | Description |
|---|---|
| RAW_EPI_NEUTRONS | Raw epithermal neutron counts |
| CORRECTED_EPI_NEUTRONS | Corrected epithermal neutron counts |
| RELATIVE_HUMIDITY | Air relative humidity |
| AIR_PRESSURE | Atmospheric pressure |
| TEMPERATURE | Air temperature |
| SOIL_MOISTURE | Calculated soil moisture |
| CUSTOM | User-defined column (requires specifying column_name in parameters) |
Advanced Usage¶
Custom Quality Check Targets¶
To quality-check a column not covered by the standard QATarget enum:
from neptoon.quality_control import QualityCheck, QAMethod, QATarget

custom_check = QualityCheck(
    target=QATarget.CUSTOM,
    method=QAMethod.RANGE_CHECK,
    parameters={
        "column_name": "your_custom_column_name",  # Specify the column name
        "min": 0,
        "max": 100,
    },
)
Changing the SaQC Flagging Scheme¶
neptoon uses SaQC's "simple" flagging scheme by default, but you can change this:
from neptoon.quality_control import DataQualityAssessor

quality_assessor = DataQualityAssessor(data_frame=your_data_frame)
quality_assessor.change_saqc_scheme("dmp")  # Other options: "float", "positional", "annotated-float"
Direct Access to Flags¶
You can access the generated flags directly:
# Get the flags dataframe
flags_df = data_hub.flags_data_frame
# Values are "UNFLAGGED" or specific flags (depends on the SaQC scheme)
print(flags_df.head())
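Beyond printing the first rows, it is often useful to know how much of a column was flagged. The sketch below counts labels in any iterable of flag values, which could be a single column of the flags dataframe. The helper `summarise_flags` and the `"BAD"` label are hypothetical; the actual labels depend on the SaQC scheme in use.

```python
from collections import Counter


def summarise_flags(labels, unflagged_label="UNFLAGGED"):
    """Count each flag label and return (counts, fraction of values that are flagged)."""
    labels = list(labels)
    counts = Counter(labels)
    flagged = len(labels) - counts[unflagged_label]
    fraction = flagged / len(labels) if labels else 0.0
    return counts, fraction


# Hypothetical flags for one column
column_flags = ["UNFLAGGED", "BAD", "UNFLAGGED", "UNFLAGGED"]
counts, fraction = summarise_flags(column_flags)
print(counts, fraction)
```

A large flagged fraction is usually a sign that the thresholds are too strict for the site, or that the sensor has a genuine problem.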
Recommendations for CRNS Data¶
For CRNS data processing, we recommend implementing the following quality checks as a starting point in your workflow:
1. Spike Detection on Raw Neutron Counts¶
Identify anomalous spikes in your raw neutron count data:
QualityCheck(
    target=QATarget.RAW_EPI_NEUTRONS,
    method=QAMethod.SPIKE_OFFSET,
    parameters={
        "window": "24h",
        "threshold": (0.2, -0.2),  # Must be a tuple of positive and negative spike thresholds
    },
)
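The role of the threshold tuple can be illustrated with a simplified relative-change test in plain Python. This sketch is an assumption about the general idea only: it ignores the `window` parameter entirely, and the function `flag_relative_jumps` and the sample counts are hypothetical rather than part of neptoon or SaQC.

```python
def flag_relative_jumps(values, threshold=(0.2, -0.2)):
    """Flag values whose relative change from the last accepted value exceeds
    the positive threshold or falls below the negative threshold."""
    upper, lower = threshold
    flags = []
    reference = None  # last value accepted as plausible (assumed non-zero)
    for value in values:
        if reference is None:
            # The first value has nothing to be compared against
            flags.append(False)
            reference = value
            continue
        change = (value - reference) / reference
        is_spike = change > upper or change < lower
        flags.append(is_spike)
        if not is_spike:
            reference = value  # only unflagged values update the reference
    return flags


# Hypothetical neutron counts with one upward spike
spike_flags = flag_relative_jumps([1000, 1020, 1300, 1010, 990])
print(spike_flags)
```

Comparing against the last accepted value, rather than the immediately preceding one, avoids flagging the return to normal after a spike. A real level shift would keep being flagged in this sketch, which is one reason a time window is useful in practice.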
2. Meteorological Variable Validation¶
Ensure meteorological variables are within physically reasonable ranges:
# Air pressure range check
QualityCheck(
    target=QATarget.AIR_PRESSURE,
    method=QAMethod.RANGE_CHECK,
    parameters={"min": 800, "max": 1100},
)

# Relative humidity range check
QualityCheck(
    target=QATarget.RELATIVE_HUMIDITY,
    method=QAMethod.RANGE_CHECK,
    parameters={"min": 0, "max": 100},  # Percentage (0-100%)
)

# Temperature checks....
# further processing here
At this point you would correct your neutron counts and calibrate your sensor to obtain an N0 value.
3. Calibration-Based Checks (Apply After Calibration)¶
After performing calibration and determining your N0 value, add these checks based on the N0:
qa_flags_2 = QualityAssessmentFlagBuilder()
qa_flags_2.add_check(
    QualityCheck(
        target=QATarget.CORRECTED_EPI_NEUTRONS,
        method=QAMethod.ABOVE_N0,
        parameters={
            "percent_maximum": 1.075,
            "N0": data_hub.sensor_info.N0,
        },
    ),
    QualityCheck(
        target=QATarget.CORRECTED_EPI_NEUTRONS,
        method=QAMethod.BELOW_N0_FACTOR,
        parameters={
            "N0": data_hub.sensor_info.N0,
            "percent_minimum": 0.3,
        },
    ),
)
data_hub.add_quality_flags(custom_flags=qa_flags_2)
data_hub.apply_quality_flags()
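As a worked example of the two N0-based checks, the sketch below turns the factors into absolute count limits. It assumes the factors are simple multipliers of N0, which matches the table descriptions but is not confirmed against neptoon's implementation. The helper names and the N0 value of 1000 are hypothetical.

```python
def n0_bounds(n0, percent_maximum=1.075, percent_minimum=0.3):
    """Return the (lower, upper) count limits derived from N0."""
    return n0 * percent_minimum, n0 * percent_maximum


def flag_against_n0(counts, n0, percent_maximum=1.075, percent_minimum=0.3):
    """Flag counts that fall below the lower limit or above the upper limit."""
    lower, upper = n0_bounds(n0, percent_maximum, percent_minimum)
    return [count < lower or count > upper for count in counts]


n0 = 1000.0  # hypothetical calibration value
lower, upper = n0_bounds(n0)
n0_flags = flag_against_n0([250.0, 640.0, 1100.0], n0)
print(lower, upper, n0_flags)
```

With these factors, corrected counts are expected to stay between roughly 300 and 1075 for an N0 of 1000, so the first and last sample values would be flagged.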
The parameter values presented here are general guidelines. You should adjust these values based on your:
- Site-specific environmental conditions
- Sensor model and characteristics
- Elevation and local atmospheric patterns
- Expected soil moisture range
These quality checks should be applied at appropriate stages in your processing pipeline: spike detection on raw data, meteorological variable validation before corrections, and N0-based checks after calibration has been performed.