Overview¶
The sensor configuration file tells neptoon about the sensor being processed. The sections in this file are: config, sensor_info, raw_data_parse_options, time_series_data, input_data_qa, soil_moisture_qa, calibration, data_storage and figures. Below you will find an example file that you can use as a starting point for your own sensor, a quick reference guide, and a more detailed reference for each option.
Some of the inputs can be left blank initially, and neptoon will calculate the values during processing (e.g., N0 when calibrating). When processing is complete, the output folder will include a file called sensor_config.yaml. This is the config file used as input, with any additionally calculated information filled in.
File Structure¶
config: sensor
sensor_info:
  name: Cunnesdorf_test_site
  country: Germany
  identifier: A102
  install_date: 2016-10-21
  latitude: 51.369597
  longitude: 12.557120
  elevation: 113
  time_zone: +1
  site_cutoff_rigidity: 2.94
  avg_lattice_water: 0.0043
  avg_soil_organic_carbon: 0.0184
  avg_dry_soil_bulk_density: 1.6
  N0: 1100
  beta_coefficient:
  mean_pressure:
  above_ground_biomass:
##################################################################
# The below section is optional and is only necessary #
# when you are processing data which has been taken from #
# a logger directly. E.g., An SD card from your CRNS. #
# #
# If you already have your data in a single csv file set #
# parse_raw_data to False and go to the next section. #
##################################################################
raw_data_parse_options:
  parse_raw_data: True
  data_location: ../example_data/CRNS-station_data-Hydroinnova-A.zip
  column_names:
  prefix: ""
  suffix: ""
  encoding: "cp850"
  skip_lines: 0
  separator: ","
  decimal: "."
  skip_initial_space: True
  parser_kw: # leave blank
    strip_left: True
    digit_first: True
  starts_with: ""
  multi_header: False
  strip_names: True
  remove_prefix: "//"
##################################################################
# This section provides information to neptoon about your #
# input data. Things like where to find it (path_to_data) #
# and what the key columns are named. When multiple columns #
# are present (often the case for redundancy) they will be #
# merged into one. In this case add them as a list as shown #
# below and state the merge method (priority recommended). #
##################################################################
time_series_data:
  path_to_data: # blank if parsing raw data
  key_column_info:
    epithermal_neutron_columns:
      - N2Cts
    thermal_neutron_columns:
      - N1Cts
    neutron_count_units: absolute_count
    pressure_columns:
      - P4_mb # first priority goes here
      - P3_mb
      - P1_mb
    pressure_units: hectopascals
    pressure_merge_method: priority
    temperature_columns:
      - T1_C
      - T2_C
    temperature_units: celcius
    temperature_merge_method: priority
    relative_humidity_columns:
      - RH1
    relative_humidity_units: percent
    relative_humidity_merge_method: priority
    date_time_columns:
      - Date Time(UTC)
    date_time_format: "%Y/%m/%d %H:%M:%S"
    initial_time_zone: utc
    convert_time_zone_to: utc
    temporal_diff_col_name: none # Leave as none if you don't have a diff col
input_data_qa:
  air_relative_humidity:
    flag_range:
      min: 0
      max: 100
  air_pressure:
    flag_range:
      min: 600
      max: 1300
soil_moisture_qa:
  soil_moisture:
    flag_range:
      min: 0
      max: 1
calibration:
  calibrate: False
  data_format: custom
  location: /path/to/location
  key_column_names:
    profile_id_column: PROF
    sample_depth: DETPH_AVG
    radial_distance_from_sensor: LOC_rad
data_storage:
  save_location:
  append_yaml_hash_to_folder_name: False
  create_report: True
figures:
  create_figures: True
  make_all_figures: True
  custom_list:
    - nmdb_incoming_radiation
Configuration Quick Reference Guide¶
Sensor Information¶
This config section provides information on the individual sensor being processed. Some of this information is crucial for data processing, such as elevation or latitude. Other entries are used to organise the data outputs, such as adding the name to save folders.
It is always better to fill this in as completely as you can.
If left blank, the beta_coefficient can be calculated automatically from elevation and latitude.
The same applies to N0, but this requires the calibration section to be filled in correctly (otherwise you'll have to guess). Generally speaking, when calibrating it's also possible to automatically generate the values for avg_lattice_water and avg_soil_organic_carbon, as long as they are available in your calibration dataset.
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| name | Yes | string | Cunnesdorf_test_site | Site identifier used for file naming and metadata |
| country | No | string | DEU | Country where sensor is located |
| identifier | No | string | A102 | Unique sensor identifier code |
| install_date | Yes | string | 2016-10-21 | Date sensor was installed (YYYY-MM-DD) |
| latitude | Yes | float | 51.369597 | Site latitude in decimal degrees |
| longitude | Yes | float | 12.557120 | Site longitude in decimal degrees |
| elevation | Yes | float | 113 | Site elevation in meters |
| time_zone | Yes | string | +1 | Time zone offset from UTC |
| site_cutoff_rigidity | No | float | 2.94 | Geomagnetic cutoff rigidity in GV |
| avg_lattice_water | No | float | 0.0043 | Average lattice water content as decimal (e.g., 0.0043 = 0.43%) |
| avg_soil_organic_carbon | No | float | 0.0184 | Soil organic carbon content as decimal (e.g., 0.0184 = 1.84%) |
| avg_dry_soil_bulk_density | No | float | 1.6 | Dry soil bulk density in g/cm³ |
| N0 | No | float | 1100 | Calibration parameter for neutron-to-soil moisture conversion |
| beta_coefficient | No | float | - | Site-specific coefficient for pressure correction |
| mean_pressure | No | float | - | Reference atmospheric pressure for corrections |
Raw Data Parse Options¶
Raw data parsing is the step that takes your raw data files (e.g., those found on the SD card in the logger) and converts them into a single csv file. No data manipulation is done at this stage. In its simplest form it will take a list of .txt files and order them by date. When things get more complicated (e.g., only files with a certain prefix contain CRNS data), other settings are required.
Is this step needed?
You can skip this step if your data is already available as a single csv file. Simply set parse_raw_data to False and move on to time series data below.
Requirements
For this section the Required column will change if you select True for parse_raw_data.
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| parse_raw_data | Yes | boolean | True | Toggle for raw data parsing functionality. When False, this entire section is ignored |
| data_location | No | Path | "data/CRNS-station_data.zip" | Path to raw data files/directory. Supports folders, zip, or tar archives |
| column_names | No | List[str] | ["date", "time", "counts"] | Expected column names in order. If not provided, will attempt auto-detection |
| prefix | No | string | "CRNS_" | Filter raw files by filename prefix |
| suffix | No | string | ".dat" or ".txt" | Filter raw files by filename suffix |
| encoding | No | string | "cp850" | File encoding format. Common alternatives: utf-8, ascii |
| skip_lines | No | integer | 2 | Number of header/metadata lines to skip before data |
| separator | No | string | "," | Column delimiter character (e.g., comma, tab, semicolon) |
| decimal | No | string | "." | Decimal point character |
| skip_initial_space | No | boolean | True | Remove leading whitespace in data fields |
| parser_kw | No | object | - | Advanced parser configuration |
| ├─ strip_left | No | boolean | True | Remove leading whitespace in fields |
| ├─ digit_first | No | boolean | True | Expect numeric data at start of line |
| starts_with | No | string | "#" | Required prefix for header lines |
| multi_header | No | boolean | False | Support for multi-line header formats |
| strip_names | No | boolean | True | Remove whitespace from column names |
| remove_prefix | No | string | "//" | Remove lines that start with this |
Additional Information
- Paths in data_location follow the unified path policy: absolute, ~/ home-relative, or relative to the config file's directory.
- When column_names is not provided, the parser attempts to detect headers from the first file.
- For compressed data, both .zip and .tar formats are automatically detected and extracted.
Time Series Data¶
The time series data section describes how neptoon prepares your CRNS time series data for processing. It assumes your data is at least in a datetime-ordered csv format. You state where that data is with the path_to_data setting, and neptoon will read it in and begin preparations. If you ran the "Raw Data Parse Options" stage above, path_to_data is not needed, but the remaining settings are!
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| path_to_data | No | string | - | Path to pre-processed data (leave blank if parsing raw data) |
Key Column Information (Time Series Data)¶
Here we define settings to prepare the data for processing. For example in neptoon we standardise all neutron counts to counts per hour (cph). So if your data is in another format, state it here and neptoon will take care of the conversion.
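As a rough illustration of this unit standardisation, the sketch below converts each of the three neutron_count_units values to counts per hour. The function name to_counts_per_hour and the interval_seconds argument are hypothetical and are not part of neptoon's API; neptoon's own conversion may differ in detail.

```python
def to_counts_per_hour(value, units, interval_seconds=None):
    """Convert a neutron count to counts per hour (cph).

    Hypothetical helper for illustration; not neptoon's implementation.
    """
    if units == "counts_per_hour":
        return float(value)
    if units == "counts_per_second":
        return value * 3600.0
    if units == "absolute_count":
        if not interval_seconds:
            raise ValueError("absolute counts need the counting interval")
        # Scale the raw count by the number of intervals in one hour
        return value * 3600.0 / interval_seconds
    raise ValueError(f"unknown units: {units}")


# 250 counts recorded over a 15-minute (900 s) interval
cph = to_counts_per_hour(250, "absolute_count", interval_seconds=900)
```

Absolute counts only become a rate once the counting interval is known, which is why the sketch asks for it explicitly.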
Other things include if you have multiple columns of certain data readings (e.g., multiple pressure sensors). State the names in a list under the specific column section and state how you wish to merge them into a single column. priority means it will use the first value in the list and gap fill with the next if missing. average means it will take the mean.
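The two merge behaviours described above can be sketched in plain Python. This is a minimal illustration under the assumption that missing readings are represented as None; the function names are hypothetical and this is not neptoon's implementation.

```python
def merge_priority(columns):
    """Take the first non-missing value per time step, in list order."""
    merged = []
    for row in zip(*columns):
        merged.append(next((v for v in row if v is not None), None))
    return merged


def merge_mean(columns):
    """Average the non-missing values per time step."""
    merged = []
    for row in zip(*columns):
        present = [v for v in row if v is not None]
        merged.append(sum(present) / len(present) if present else None)
    return merged


# Three pressure sensors, highest priority first; None marks a gap
p4 = [1001.0, None, None]
p3 = [1003.0, 1002.0, None]
p1 = [1005.0, 1004.0, 1000.0]

by_priority = merge_priority([p4, p3, p1])  # gaps in p4 filled from p3, then p1
by_mean = merge_mean([p4, p3, p1])          # mean of whatever is available
```

With priority merging, the order of the list matters; with averaging it does not.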
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| epithermal_neutron_columns | Yes | list | [N2Cts] | Columns containing epithermal neutron counts |
| thermal_neutron_columns | No | list | [N1Cts] | Columns containing thermal neutron counts |
| neutron_count_units | Yes | string | absolute_count or counts_per_hour or counts_per_second | Units for neutron measurements |
| pressure_columns | Yes | list | [P4_mb, P3_mb, P1_mb] | Pressure columns in priority order |
| pressure_units | Yes | string | hectopascals | Units for pressure measurements |
| pressure_merge_method | Yes | string | priority | How to handle multiple pressure columns |
| temperature_columns | Yes | list | [T1_C, T2_C] | Temperature measurement columns |
| temperature_units | Yes | string | celcius | Units for temperature measurements |
| temperature_merge_method | Yes | string | priority | How to handle multiple temperature columns |
| relative_humidity_columns | Yes | list | [RH1] | Relative humidity measurement columns |
| relative_humidity_units | Yes | string | percent | Units for humidity measurements |
| date_time_columns | Yes | list | [Date Time(UTC)] | Columns containing date/time data |
| date_time_format | Yes | string | "%Y/%m/%d %H:%M:%S" | Format string for parsing dates |
Time Formats
DateTime format strings must be enclosed in quotes (e.g., "%Y/%m/%d %H:%M:%S") to comply with YAML syntax.
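To check a format string before putting it in the config, you can try it against one timestamp copied from your data. The sketch below uses Python's standard datetime module; the sample timestamp is made up for illustration.

```python
from datetime import datetime

date_time_format = "%Y/%m/%d %H:%M:%S"

# A timestamp as it might appear in the "Date Time(UTC)" column
parsed = datetime.strptime("2016/10/21 13:45:00", date_time_format)
```

If the format does not match the text, strptime raises a ValueError, which makes mismatches easy to spot early.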
Quality Assessment Settings¶
We include some simple options for quality assessment in neptoon using SaQC as the back-end. More information about how to do this is provided further below on this page (and examples are shown in the example config above).
We do not plan to expand this further to avoid scope creep. Neptoon is designed to process CRNS data. We provide some options to QA data used directly in this process. To QA any additional co-located sensors, we would recommend using a system designed for QA specifically (e.g., SaQC).
Calibration¶
Calibration finds your N0 term. For this we need sample data acquired from the site. When this is available, the following section tells neptoon where the data is and what format it is in.
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| calibrate | Yes | boolean | True | Toggle for whether calibration will be done |
| location | No | string | home_dir/example_data/FSCD001_calibration.csv | Location of the calibration data |
| date_time_format | No | string | "%d.%m.%Y %H:%M" | DateTime format of the calibration data |
Key Column Names (Calibration)¶
These values are required if calibrate is set to true in the above section.
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| date_time | No | string | "DateTime_utc" | Name of the column with DateTime |
| profile_id | No | string | "Profile_ID" | Name of the column with profile ID |
| sample_depth | No | string | "Profile_Depth_cm" | Name of the column with sample depth values |
| radial_distance_from_sensor | No | string | "Distance_to_CRNS_m" | Name of the column with distance of the sample from the sensor (m) |
| bulk_density_of_sample | No | string | "DryBulkDensity_g_cm3" | Name of the column with bulk density of the samples |
| gravimetric_soil_moisture | No | string | "SoilMoisture_g_g" | Name of the column with gravimetric soil moisture values |
| soil_organic_carbon | No | string | "SoilOrganicCarbon_g_g" | Name of the column with soil organic carbon values |
| lattice_water | No | string | "LatticeWater_g_g" | Name of the column with lattice water values |
Data Storage¶
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| save_location | No | string | ~/outputs, /data/outputs, or ./output | Directory for saving outputs. Accepts absolute, home-relative (~/), or relative paths (resolved against the config file's directory). If blank, outputs go to the current working directory. Can be overridden at run-time with --save-location. |
| append_timestamp_to_folder_name | No | boolean | True | Whether to append a timestamp to the output folder name. Useful when experimenting to avoid overwriting data. |
| create_report | No | boolean | true | Whether to create a detailed report of your data outputs during the processing run and save it into the output folder |
Figures¶
Figures are tightly coupled to the create_report feature above. Neptoon will produce some useful figures helping to describe your data for quick visual checks post processing. These can be turned off if not required. Otherwise the figures are saved into a folder in the output folder, and included in the report if this is turned on.
| Parameter | Required | Type | Example | Description |
|---|---|---|---|---|
| create_figures | Yes | boolean | True | Generate visualization figures |
| make_all_figures | No | boolean | True | Generate all available figure types in figure registry |
| custom_list | No | list | [nmdb_incoming_radiation] | List of specific figures to generate |
Paths in config files¶
All path fields (data_location, path_to_data, calibration.location, save_location) follow the same four-rule policy:
| Input | Resolved to |
|---|---|
| blank / null | None; triggers any applicable fallback (e.g. cwd in save_data) |
| ~/foo | User home directory |
| /abs/path | As-is (normalised) |
| ./rel or rel | Config file's own directory |
This means you can use relative paths freely — neptoon always knows which directory to anchor them to.
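The four rules can be sketched with Python's pathlib. This is a minimal illustration of the policy as described in the table, not neptoon's actual code; the function name resolve_config_path and the example paths are hypothetical.

```python
from pathlib import Path


def resolve_config_path(value, config_file):
    """Resolve a path field using the four rules in the table above.

    Hypothetical helper; neptoon's own function may differ.
    """
    if value is None or str(value).strip() == "":
        return None  # blank / null: caller applies any fallback
    path = Path(value).expanduser()  # ~/foo -> user home directory
    if path.is_absolute():
        return path  # /abs/path is used as-is
    # ./rel or rel: anchor to the directory holding the config file
    return Path(config_file).parent / path


config_file = "/projects/site_a/sensor_config.yaml"
relative = resolve_config_path("data/raw.zip", config_file)
absolute = resolve_config_path("/data/outputs", config_file)
blank = resolve_config_path(None, config_file)
```

Because relative paths are anchored to the config file rather than the current working directory, the same config gives the same result wherever the command is run from.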
Deprecated: working_directory:
An older top-level working_directory: key was previously used to override the resolution basis for relative paths. It is now deprecated and will be removed in a future release. Use relative paths instead — they already resolve against the config file's directory automatically.
Detailed Configuration Reference¶
Below are more details on some of the features of the sensor config file.
Sensor Information (sensor_info)¶
This section contains essential metadata about your Cosmic-Ray Neutron Sensor (CRNS) station and site characteristics. This information is crucial for accurate soil moisture estimation and data organization.
name¶
Description
A unique identifier for the monitoring station that will be used in file naming and outputs.
Specification
- Type: string
- Required: Yes
- Example:
"Cunnesdorf_test_site"
Technical Details
- Should be URL-safe (avoid special characters)
- Used as default folder name for outputs
- No spaces recommended (use underscores)
identifier¶
Description
The unique hardware identifier for the CRNS unit. This can be an additional way to identify the site.
Specification
- Type: string
- Required: No
- Example:
"A102"
install_date¶
Description
The date when the CRNS was installed at the monitoring site.
Specification
- Type: Date string
- Format: YYYY-MM-DD
- Required: Yes
- Example:
"2016-10-21"
Technical Details
- Used as cutoff for data processing
- Single-digit months/days require leading zeros
- Must be in format YYYY-MM-DD to be registered as a date
latitude and longitude¶
Description
Geographic coordinates of the CRNS installation location.
Specification
- Type: float
- Required: Yes
- Range: -90 to 90 (latitude), -180 to 180 (longitude)
- Example:
51.369597, 12.557120
Technical Details
- Decimal degrees format
elevation¶
Description
Height above sea level of the CRNS installation site.
Specification
- Type: float
- Required: Yes
- Units: meters above sea level
- Example:
113
Technical Details
- Used in atmospheric pressure corrections
- Important for neutron flux calculations
site_cutoff_rigidity¶
Description
The geomagnetic cutoff rigidity at the installation site, which affects cosmic ray flux.
Specification
- Type: float
- Required: No
- Units: GV (gigavolts)
- Example:
2.94
Technical Details
- Affects incoming neutron corrections
- Location-dependent parameter
- www.crnslab.org provides methods to calculate this with latitude and longitude values
- If not supplied in config it will use a lookup table to find the value with lat and lon
avg_lattice_water¶
Description
The average lattice water content in soil minerals at the monitoring site.
Specification
- Type: float
- Required: No
- Units: g/g (decimal percentage)
- Example:
0.0043
Technical Details
- Represented as decimal (0.0043 = 0.43%)
- Site-specific constant
- Used in soil moisture conversion
- If not supplied defaults to 0
- Can be automatically calculated if calibration sample data is available and includes lattice water content.
avg_soil_organic_carbon¶
Description
The average soil organic carbon content at the monitoring site.
Specification
- Type: float
- Required: No
- Units: g/g (decimal percentage)
- Example:
0.0184
Technical Details
- Represented as decimal (0.0184 = 1.84%)
- If not supplied defaults to 0
- Can be automatically calculated if calibration sample data is provided with this in it (a site average is used)
- Used in soil moisture conversion equations
avg_dry_soil_bulk_density¶
Description
The average dry soil bulk density across the CRNS footprint. This parameter is essential for converting gravimetric to volumetric soil moisture content.
Specification
- Type: float
- Required: No
- Units: g/cm³
- Example:
1.6
Technical Details
- Important for use in converting neutrons to soil moisture (particularly converting gravimetric to volumetric soil moisture)
- Influences effective measurement depth
- Can be automatically calculated if calibration sample data is provided with this data in it (a site average is used)
N0¶
Description
Site-specific calibration parameter that converts corrected neutron counts to soil moisture. This parameter is crucial for the accuracy of soil moisture measurements.
Specification
- Type: float
- Required: No
- Example:
1100
Technical Details
- Determined through field calibration
- Can be calibrated with soil sampling data if this option turned on
- If no calibration data is available, you will have to guess it, although this will introduce a bias into your data.
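To show where N0 enters the conversion, the sketch below uses the widely cited Desilets et al. (2010) relationship with its standard coefficients. Treat it as an illustration only: neptoon may apply a different or extended conversion, the function name is hypothetical, and soil organic carbon is left out for brevity.

```python
def neutrons_to_soil_moisture(n_corrected, n0, bulk_density=1.6, lattice_water=0.0):
    """Volumetric soil moisture (cm3/cm3) from corrected neutron counts.

    Illustrative only: Desilets et al. (2010) form with its standard
    coefficients; soil organic carbon is ignored here for brevity.
    """
    a0, a1, a2 = 0.0808, 0.372, 0.115
    gravimetric = a0 / (n_corrected / n0 - a1) - a2 - lattice_water
    return gravimetric * bulk_density


theta = neutrons_to_soil_moisture(770, n0=1100, bulk_density=1.6, lattice_water=0.0043)

# The same count rate with a different N0 gives a different soil moisture,
# which is why a guessed N0 biases the whole series
theta_other_n0 = neutrons_to_soil_moisture(770, n0=1200, bulk_density=1.6, lattice_water=0.0043)
```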
beta_coefficient¶
Description
Site-specific coefficient used in the atmospheric pressure correction of neutron count rates.
Specification
- Type: float
- Required: No
- Units: hPa⁻¹
- Example:
0.0076
Technical Details
- Used in pressure correction equations
- Location and elevation dependent
- Affects neutron count normalization
- Will be automatically calculated in neptoon if not provided using supplied elevation and latitude data
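For orientation, the pressure correction is commonly written as an exponential factor, f = exp(beta * (P - P_ref)). The sketch below assumes that form; the function name is hypothetical and neptoon's exact implementation is not shown on this page.

```python
import math


def pressure_correction_factor(pressure, mean_pressure, beta_coefficient):
    """Factor that scales raw counts to the reference pressure."""
    return math.exp(beta_coefficient * (pressure - mean_pressure))


# Example values from this page; units are hPa and 1/hPa
factor = pressure_correction_factor(1020.0, 1013.25, 0.0076)
corrected_counts = 1000 * factor
```

When the measured pressure is above the reference, fewer neutrons reach the sensor, so the factor is greater than 1 and the counts are scaled up.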
mean_pressure¶
Description
The long-term average atmospheric pressure at the monitoring site. Used as a reference pressure for neutron count corrections.
Specification
- Type: float
- Required: No
- Units: hPa (hectopascals)
- Example:
1013.25
Technical Details
- Used for pressure corrections
- Elevation dependent
- Will be automatically calculated in neptoon if not provided using elevation and lat/lon data
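If you want a quick plausibility check for a mean_pressure value, the standard-atmosphere barometric formula gives a rough estimate from elevation alone. This is an assumption for illustration; it ignores latitude and longitude and is not necessarily the method neptoon uses. The function name is hypothetical.

```python
def reference_pressure_from_elevation(elevation_m):
    """Approximate mean pressure (hPa) from elevation in metres.

    Uses the standard-atmosphere barometric formula; this is an
    assumption, not necessarily the method neptoon applies.
    """
    sea_level_pressure = 1013.25  # hPa
    return sea_level_pressure * (1 - 2.25577e-5 * elevation_m) ** 5.25588


# Elevation of the example site on this page
mean_pressure = reference_pressure_from_elevation(113)
```

For the example site at 113 m this gives a value a little below sea-level pressure, roughly 1000 hPa.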
time_zone¶
Description
The time zone offset from UTC for the monitoring site. Essential for proper temporal alignment of data.
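As an illustration of what such an offset means in practice, the sketch below turns an offset string such as "+1" or "-10" into a fixed-offset time zone with Python's standard library. How neptoon itself interprets the value is not shown here; parse_utc_offset is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def parse_utc_offset(time_zone):
    """Turn a string such as "+1" or "-10" into a fixed-offset tzinfo."""
    return timezone(timedelta(hours=int(time_zone)))


site_tz = parse_utc_offset("+1")
utc_time = datetime(2016, 10, 21, 12, 0, tzinfo=timezone.utc)
local_time = utc_time.astimezone(site_tz)  # noon UTC shown in site time
```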
Specification
- Type: string
- Required: Yes
- Format: ±H
- Example:
"+1" or "-10"
Raw Data Parse Options (raw_data_parse_options)¶
This section configures how neptoon reads and interprets raw data files from CRNS sensors. These settings are crucial for correctly importing data from various sensor manufacturers and file formats.
parse_raw_data¶
Description
Primary toggle that determines whether neptoon should process raw data files or expect pre-processed data.
Specification
- Type: boolean
- Required: Yes
- Example:
True
Technical Details
- Controls entire raw data processing pipeline
- Determines workflow path
data_location¶
Description
Path to the raw data files or archive. Supports individual files, directories, or compressed archives.
Specification
- Type: string (path)
- Required: Yes (if parse_raw_data is True)
- Example:
"data/CRNS-station_data.zip" or "../raw_data/"
Technical Details
- Supports absolute or relative paths
- Handles zip and tar archives automatically
- Recursive directory scanning
- Path resolution relative to config file
- Supported archive formats: .zip, .tar
column_names¶
Description
Explicit list of column names in the order they appear in the raw data files. Provides direct control over column identification and naming.
Specification
- Type: list[string]
- Required: No
- Example:
["date", "time", "counts"]
Technical Details
- Overrides automatic header detection
- Case-sensitive matching
- Maintains column order
prefix¶
Description
String pattern used to filter raw data files by their filename prefix.
Specification
- Type: string
- Required: No
- Example: "CRNS_"
Technical Details
- Case-sensitive matching
- Used in file selection phase
suffix¶
Description
String pattern used to filter raw data files by their filename suffix.
Specification
- Type: string
- Required: No
- Example: ".dat" or ".txt"
Technical Details
- Case-sensitive matching
- Include the dot for file extensions
- Applied after prefix filtering
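The file selection described for prefix and suffix can be sketched as two case-sensitive filters applied in sequence. This is a simplified illustration with made-up file names; select_raw_files is hypothetical, and sorting by name stands in for the date ordering neptoon performs.

```python
def select_raw_files(filenames, prefix="", suffix=""):
    """Keep files matching prefix, then suffix (both case-sensitive)."""
    by_prefix = [name for name in filenames if name.startswith(prefix)]
    by_suffix = [name for name in by_prefix if name.endswith(suffix)]
    return sorted(by_suffix)


files = [
    "CRNS_002.dat",
    "CRNS_001.dat",
    "crns_003.dat",    # lower-case prefix: not matched
    "CRNS_notes.txt",  # wrong suffix
    "meteo_001.dat",   # wrong prefix
]
selected = select_raw_files(files, prefix="CRNS_", suffix=".dat")
```

Leaving prefix and suffix as empty strings, as in the example config, keeps every file.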
skip_lines¶
Description
Number of lines to skip at the beginning of each data file.
Specification
- Type: integer
- Required: No
- Default:
0
- Example:
3
Technical Details
- Affects all files in batch
encoding¶
Description
Specifies the character encoding used in the raw data files. Critical for correct text interpretation, especially with international characters.
Specification
- Type: string
- Required: No
- Default:
"utf-8"
- Example:
"cp850"
Technical Details
- Common options:
  - "utf-8": Universal encoding
  - "cp850": DOS Western European (code page 850)
  - "ascii": 7-bit ASCII
  - "latin1": ISO-8859-1
separator¶
Description
Character used to separate columns in the raw data files. Must be explicitly defined to ensure correct data parsing.
Specification
- Type: string
- Required: Yes
- Example:
","
Technical Details
- Common separators:
- ",": CSV files
- "\t": Tab-separated
- ";": European CSV
- "|": Pipe-separated
- Must be in quotes
decimal¶
Description
Character used as decimal separator in numeric values.
Specification
- Type: string
- Required: No
- Default:
"."
- Example:
","
Technical Details
- Must be in quotes
skip_initial_space¶
Description
Controls whether leading whitespace in data fields should be removed during parsing.
Specification
- Type: boolean
- Required: No
- Default:
True
- Example:
True
parser_kw¶
Description
Additional parser keywords.
Specification
- Type: object
- Required: No
- Properties:
  - strip_left: boolean - Remove leading whitespace
  - digit_first: boolean - Expect numeric data at start
Technical Details
- Specialized parsing behavior
- Applied during data import
starts_with¶
Description
String pattern that identifies header lines in the data files.
Specification
- Type: string
- Required: No
- Default:
""
- Example:
"#"
Technical Details
- Used in header detection
- Case-sensitive matching
multi_header¶
Description
Indicates whether data files contain multiple header lines that need special processing.
Specification
- Type: boolean
- Required: No
- Default:
False
- Example:
False
strip_names¶
Description
Controls whether whitespace should be removed from column names during parsing.
Specification
- Type: boolean
- Required: No
- Default:
True
- Example:
True
Technical Details
- Applied to column headers
- Affects column name matching
remove_prefix¶
Description
String pattern to be removed from the beginning of column names.
Specification
- Type: string
- Required: No
- Example:
"//"
Technical Details
- Must be in quotes
- Used for cleanup of raw headers
Time Series Data (time_series_data)¶
This section defines how the data is formatted ready for use in neptoon. It presumes that the data has already been compiled into a single .csv file.
path_to_data¶
Description
The path to the .csv containing time series data
Specification
- Type: string
- Required: Yes (if no parsing done)
- Example:
/path/to/data.csv
Temporal Configuration (time_series_data.temporal)¶
input_resolution¶
Description
Specifies the time step of the input data, critical for proper temporal processing and aggregation.
Specification
- Type: string
- Required: Yes
- Format:
<number><unit>
- Example:
"15mins" or "1hour"
Technical Details
- Valid units:
- Minutes: "min", "minute", "minutes"
- Hours: "hour", "hours", "hr", "hrs"
- Days: "day", "days"
- Number must be positive integer
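The "<number><unit>" format can be illustrated with a small parser that returns a timedelta. This is a sketch under assumptions: parse_resolution is a hypothetical helper, and "mins" is accepted in addition to the listed units because the example above uses "15mins". neptoon's own parsing may accept a different set of spellings.

```python
import re
from datetime import timedelta

# Unit spellings listed above, plus "mins" as used in the "15mins" example
UNIT_SECONDS = {
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600, "hr": 3600, "hrs": 3600,
    "day": 86400, "days": 86400,
}


def parse_resolution(text):
    """Parse "<number><unit>" into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", text.lower())
    if not match or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"invalid resolution: {text!r}")
    number = int(match.group(1))
    if number <= 0:
        raise ValueError("number must be a positive integer")
    return timedelta(seconds=number * UNIT_SECONDS[match.group(2)])


quarter_hour = parse_resolution("15mins")
one_hour = parse_resolution("1hour")
```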
output_resolution¶
Description
Desired time step for processed data output. Determines the temporal resolution of final results.
Specification
- Type: string
- Required: Yes
- Format:
<number><unit> or "None"
- Example:
"1hour"
Technical Details
- Must be greater than or equal to input_resolution
- Use "None" to maintain input resolution
- When it differs from input_resolution, the data will be aggregated
align_timestamps¶
Description
Controls whether timestamps should be aligned to regular intervals.
Specification
- Type: boolean
- Required: Yes
- Example:
true
Technical Details
- Ensures consistent temporal spacing
- Affects data aggregation methods
- If aggregation occurs this is ignored (already aligned)
alignment_method¶
Description
Specifies how timestamps should be aligned when processing data.
Specification
- Type: string
- Required: Yes
- Example:
"time","nshift"
Technical Details
- "time": Aligns to clock intervals
- "index": Maintains equal spacing
- See here for more details
Key Column Configuration¶
epithermal_neutron_columns¶
Description
Specifies which columns contain epithermal neutron count data, the primary measurement for soil moisture estimation.
Specification
- Type: list[string]
- Required: Yes
- Example:
[N2Cts]
Technical Details
- Must match column names exactly
thermal_neutron_columns¶
Description
Identifies columns containing thermal neutron count data, used for advanced corrections and quality control.
Specification
- Type: list[string]
- Required: No
- Example:
[N1Cts]
Technical Details
- Optional, recommended if available
neutron_count_units¶
Description
Specifies the units of the neutron count measurements.
Specification
- Type: string
- Required: Yes
- Options:
  - "absolute_count"
  - "counts_per_hour"
  - "counts_per_second"
- Example:
"absolute_count"
Technical Details
- Affects count rate calculations
- Critical for cross-site comparisons
- Must match sensor configuration
- Internally counts are converted into absolute counts (raw) and counts_per_hour (corrected)
pressure_columns¶
Description
List of columns containing atmospheric pressure measurements, in order of priority.
Specification
- Type: list[string]
- Required: Yes
- Example:
[P4_mb, P3_mb, P1_mb]
Technical Details
- Order determines priority in 'priority' merge method
- All must use same units
pressure_units¶
Description
Units of the pressure measurements in the specified columns.
Specification
- Type: string
- Required: Yes
- Options:
"hectopascals", "millibars"
- Example:
"hectopascals"
Technical Details
- Must be consistent across all pressure columns
- Standard is hectopascals
pressure_merge_method¶
Description
Method used to combine multiple pressure measurements when available.
Specification
- Type: string
- Required: Yes
- Options:
"priority", "mean"
- Example:
"priority"
Technical Details
- "priority": Uses highest priority available
- "mean": Averages all available values
- Handles missing data automatically
temperature_columns¶
Description
List of columns containing air temperature measurements, in order of priority.
Specification
- Type: list[string]
- Required: Yes
- Example:
[T1_C, T2_C]
Technical Details
- Order determines priority
- All must use same units
- Used in humidity corrections
temperature_units¶
Description
Units of the temperature measurements.
Specification
- Type: string
- Required: Yes
- Options:
"celcius", "kelvin", "fahrenheit"
- Example:
"celcius"
Technical Details
- Must be consistent across all temperature columns
temperature_merge_method¶
Description
Method used to combine multiple temperature measurements when available.
Specification
- Type: string
- Required: Yes
- Options:
"priority", "mean"
- Example:
"priority"
Technical Details
- Follows same logic as pressure_merge_method
date_time_columns¶
Description
Columns containing temporal information for measurements.
Specification
- Type: list[string]
- Required: Yes
- Example:
["Date Time(UTC)"]
Technical Details
- Must contain valid datetime information
- Used for all temporal alignment
- Critical for data processing
- Multiple columns can be merged e.g.,
['Date', 'Time']
date_time_format¶
Description
Format string specifying how datetime information is encoded.
Specification
- Type: string
- Required: Yes
- Format: Python datetime format string
- Example:
"%Y/%m/%d %H:%M:%S"
Technical Details
- Must be in quotes
- Follows Python strftime format
initial_time_zone¶
Description
Timezone of data. Most CRNS data is given in UTC, but if it's not we can handle that here.
Specification
- Type: string
- Required: Yes
- Example:
utc or Europe/Berlin
convert_time_zone_to¶
Description
Time zone to convert the data to. We strongly recommend leaving this as utc.
Specification
- Type: string
- Required: Yes
- Example:
utc
Quality Assessment Configuration¶
The Quality Assessment (QA) system in neptoon allows you to validate meteorological data used in soil moisture estimation. The system currently supports QA checks on three key meteorological variables and provides two different assessment methods.
Supported Variables¶
Quality assessment can be performed on the following meteorological variables:
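- air_relative_humidity
- air_pressure
- air_temperature
QA on soil moisture data follows the same style, under the soil_moisture_qa section:
soil_moisture_qa:
  soil_moisture:
    flag_range:
      min: 0
      max: 1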
Assessment Methods¶
1. Range Check (flag_range)¶
The range check method flags values that fall outside specified minimum and maximum thresholds.
Required Parameters¶
- min: Minimum acceptable value (in data units)
- max: Maximum acceptable value (in data units)
Example Configuration¶
input_data_qa:
  air_pressure:
    flag_range:
      min: 850 # hPa
      max: 1050 # hPa
  air_relative_humidity:
    flag_range:
      min: 0 # %
      max: 100 # %
  air_temperature:
    flag_range:
      min: -30 # °C
      max: 50 # °C
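The logic of a range check is simple enough to sketch in a few lines. neptoon delegates the actual flagging to SaQC, so the function below is only a hypothetical illustration of the idea; treating the limits themselves as acceptable is an assumption.

```python
def flag_range(values, minimum, maximum):
    """Return True for each value outside [minimum, maximum]."""
    return [not (minimum <= v <= maximum) for v in values]


# Air pressure readings in hPa checked against the limits above
air_pressure = [1002.5, 845.0, 1013.0, 1075.2]
flags = flag_range(air_pressure, minimum=850, maximum=1050)
```

Here the second and fourth readings fall outside 850 to 1050 hPa and are flagged.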
2. Univariate Local Outlier Factor (spike_uni_lof)¶
This method uses the Local Outlier Factor algorithm to detect anomalies in univariate time series data. More information on this here
Optional Parameters¶
- periods_in_calculation: Number of time steps included in LOF calculation
  - Default: 20
  - Units: time steps
- threshold: Threshold for flagging outliers
  - Default: 1.5
  - Units: decimal
- algorithm: Algorithm for calculating nearest neighbors
  - Default: "ball_tree"
  - Options: ["ball_tree", "kd_tree", "brute", "auto"]
Example Configuration¶
input_data_qa:
  air_temperature:
    spike_uni_lof:
      periods_in_calculation: 24 # Use 24 time steps
      threshold: 2.0 # More permissive threshold
      algorithm: "ball_tree" # Default algorithm
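For intuition about what a LOF score measures, here is a minimal pure-Python sketch of the textbook Local Outlier Factor applied to a short series of values. It is not the routine neptoon runs: the function name is hypothetical, the data are made up, and reading the neighbour count `k` as a rough analogue of periods_in_calculation is an assumption.

```python
def local_outlier_factors(values, k):
    """Compute LOF scores for a short 1-D series using absolute differences."""
    n = len(values)
    neighbours, k_distance = [], []
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        ranked = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        nearest = ranked[:k]
        neighbours.append([j for _, j in nearest])
        k_distance.append(nearest[-1][0])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], abs(values[i] - values[j])) for j in neighbours[i]]
        lrd.append(1.0 / max(sum(reach) / len(reach), 1e-12))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical air temperature readings in °C with one obvious spike.
air_temperature = [20.0, 20.1, 19.9, 20.2, 20.05, 35.0, 20.15, 19.95]

scores = local_outlier_factors(air_temperature, k=3)
flags = [score > 1.5 for score in scores]  # threshold as in the default above
print(flags)
```

Points in a dense cluster score close to 1, while an isolated spike scores far above it. A higher threshold therefore flags fewer points.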
Complete Example¶
Here's a complete example showing how to combine both methods:
input_data_qa:
  air_pressure:
    flag_range:
      min: 850
      max: 1050
    spike_uni_lof:
      periods_in_calculation: 12
      threshold: 1.8
  air_relative_humidity:
    flag_range:
      min: 0
      max: 100
    spike_uni_lof:
      periods_in_calculation: 6
      threshold: 1.3
  air_temperature:
    flag_range:
      min: -30
      max: 50
Best Practices¶
- Range Selection
  - Choose ranges based on physically possible values for your location
  - Consider seasonal variations when setting thresholds
- LOF Parameters
  - periods_in_calculation: Choose based on your data's temporal resolution
    - Hourly data: 24 periods = 1 day window
    - 15-min data: 96 periods = 1 day window
  - threshold: Start conservative (1.5) and adjust based on results
  - algorithm: Use default unless you have specific performance requirements
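The window arithmetic behind periods_in_calculation can be worked out for any temporal resolution. The helper below is a hypothetical convenience function, not part of neptoon.

```python
from datetime import timedelta


def periods_for_window(resolution, window):
    """Number of whole time steps of length `resolution` that fit in `window`."""
    return window // resolution


one_day = timedelta(days=1)
hourly = periods_for_window(timedelta(hours=1), one_day)
quarter_hourly = periods_for_window(timedelta(minutes=15), one_day)
print(hourly, quarter_hourly)
```

The same helper gives the value for other windows, for example a 12-hour window on 15-minute data.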
Notes¶
- QA configuration is optional but recommended
- Methods can be applied individually or in combination
- Configuration is applied during data processing via the CRNSDataHub
- Flagged data will be excluded from subsequent processing steps
Calibration (calibration)¶
calibrate¶
Description
Toggle for whether calibration will be done.
Specification
- Type: boolean
- Required: Yes
- Example:
True
data_format¶
Description
(WIP) automatic formatting for set styles.
Specification
- Type: string
- Required: No
- Example:
custom
location¶
Description
Location of the calibration data.
Specification
- Type: string
- Required: No
- Example:
home_dir/example_data/FSCD001_calibration.csv
date_time_format¶
Description
DateTime format of the calibration data.
Specification
- Type: string
- Required: No
- Example:
"%d.%m.%Y %H:%M"
Key Column Names (calibration.key_column_names)¶
date_time¶
Description
Name of the column with DateTime.
Specification
- Type: string
- Required: No
- Example:
"DateTime_utc"
profile_id¶
Description
Name of the column with profile ID.
Specification
- Type: string
- Required: No
- Example:
"Profile_ID"
sample_depth¶
Description
Name of the column with sample depth values.
Specification
- Type: string
- Required: No
- Example:
"Profile_Depth_cm"
radial_distance_from_sensor¶
Description
Name of the column with distance of the sample from the sensor (m).
Specification
- Type: string
- Required: No
- Example:
"Distance_to_CRNS_m"
bulk_density_of_sample¶
Description
Name of the column with bulk density of the samples.
Specification
- Type: string
- Required: No
- Example:
"DryBulkDensity_g_cm3"
gravimetric_soil_moisture¶
Description
Name of the column with gravimetric soil moisture values.
Specification
- Type: string
- Required: No
- Example:
"SoilMoisture_g_g"
soil_organic_carbon¶
Description
Name of the column with soil organic carbon values.
Specification
- Type: string
- Required: No
- Example:
"SoilOrganicCarbon_g_g"
lattice_water¶
Description
Name of the column with lattice water values.
Specification
- Type: string
- Required: No
- Example:
"LatticeWater_g_g"
Data Storage Options (data_storage)¶
save_location¶
Description
Directory for saving outputs (data, flags, figures, report).
Specification
- Type: string
- Required: No
- Example:
~/neptoon-outputs/station_A101 or /data/outputs
Technical Details
- Accepts absolute paths (/home/user/outputs) or home-relative paths (~/outputs); the ~ is expanded to the user's home directory.
- Relative paths are rejected at config load time with a clear error. Output directories are ambiguous when relative (config-file-relative vs. cwd-relative), so neptoon refuses to guess.
- Missing parent directories are created automatically.
- If left blank, outputs are written to the current working directory.
- Used by save_data() as the default output directory for both the ProcessWithConfig and DataHubFromConfig (step-by-step / notebook) workflows. Passing save_folder_location to save_data() explicitly still takes precedence.
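The path rules above can be summarised in a short sketch using `pathlib`. This illustrates the described behaviour and is not neptoon's actual validation code; the function name and error message are hypothetical.

```python
from pathlib import Path


def resolve_save_location(save_location):
    """Apply the documented rules to a save_location value (illustrative only)."""
    if not save_location:
        # Blank value: fall back to the current working directory.
        return Path.cwd()

    # Expand a leading ~ to the user's home directory.
    path = Path(save_location).expanduser()

    if not path.is_absolute():
        # Relative paths are ambiguous, so they are rejected.
        raise ValueError(
            f"save_location must be absolute or start with ~, got: {save_location!r}"
        )
    return path


print(resolve_save_location("/data/outputs"))
```

Creating missing parent directories could then be done with `path.mkdir(parents=True, exist_ok=True)`. It is left out here so that the sketch has no side effects.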
append_yaml_hash_to_folder_name¶
Description
(WIP) Add configuration hash to folder names.
Specification
- Type: boolean
- Required: No
- Example:
False
Technical Details
- Work In Progress - check back soon
create_report¶
Description
Whether to create the PDF report during the processing run. When enabled, the Magazine system is turned on, and information and figures are compiled into a report that is saved with the data.
Specification
- Type: boolean
- Required: Yes
- Example:
True
Figures Options (figures)¶
create_figures¶
Description
Generate visualization figures and store them automatically when the outputs are saved.
Specification
- Type: boolean
- Required: Yes
- Example:
True
Technical Details
- Figures are written to the output folder when the data is saved
make_all_figures¶
Description
Generate all available figure types in the figure registry.
Specification
- Type: boolean
- Required: Yes
- Example:
True
custom_list¶
Description
List of specific figures to generate when not generating all of them.
Specification
- Type: list
- Required: No
- Example:
[nmdb_incoming_radiation]
Technical Details
- If make_all_figures is true, this is ignored.
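The interaction between make_all_figures and custom_list can be sketched as a small selection function. This is an illustration of the documented precedence, not neptoon's code. The function name is hypothetical, and every registry entry except nmdb_incoming_radiation is a made-up placeholder.

```python
def select_figures(registry, make_all_figures, custom_list=None):
    """Return the figure names to generate (illustrative only)."""
    if make_all_figures:
        # custom_list is ignored when all figures are requested.
        return list(registry)

    requested = custom_list or []
    unknown = [name for name in requested if name not in registry]
    if unknown:
        raise ValueError(f"Unknown figure names: {unknown}")
    return list(requested)


# Placeholder registry; only the first name appears in this documentation.
registry = ["nmdb_incoming_radiation", "hypothetical_figure_a", "hypothetical_figure_b"]

everything = select_figures(registry, True, ["nmdb_incoming_radiation"])
only_custom = select_figures(registry, False, ["nmdb_incoming_radiation"])
print(everything, only_custom)
```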