Overview¶

The sensor configuration file tells neptoon about the sensor being processed. The sections in this file are: config, sensor_info, raw_data_parse_options, time_series_data, input_data_qa, soil_moisture_qa, calibration, data_storage and figures. Below is an example file which you can use as a starting point for your own sensor, followed by a quick start guide and a more detailed reference for each possibility.

Some of the inputs can be left blank initially, and neptoon will calculate the values during processing (e.g., N0 when calibrating). The output folder when processing is complete will include a file called sensor_config.yaml. This is the config file used as input with any additional calculated information included.

File Structure¶

config: sensor

sensor_info:
  name: Cunnesdorf_test_site
  country: Germany
  identifier: A102
  install_date: 2016-10-21
  latitude: 51.369597
  longitude: 12.557120
  elevation: 113
  time_zone: +1
  site_cutoff_rigidity: 2.94
  avg_lattice_water: 0.0043
  avg_soil_organic_carbon: 0.0184
  avg_dry_soil_bulk_density: 1.6
  N0: 1100
  beta_coefficient:
  mean_pressure:
  above_ground_biomass:

##################################################################
#    The below section is optional and is only necessary         #
#    when you are processing data which has been taken from      #
#    a logger directly. E.g., An SD card from your CRNS.         #
#                                                                #
#    If you already have your data in a single csv file set      #
#    parse_raw_data to False and go to the next section.         #
##################################################################

raw_data_parse_options:
  parse_raw_data: True
  data_location: ../example_data/CRNS-station_data-Hydroinnova-A.zip
  column_names:
  prefix: ""
  suffix: ""
  encoding: "cp850"
  skip_lines: 0
  separator: ","
  decimal: "."
  skip_initial_space: True
  parser_kw: # leave blank
    strip_left: True
    digit_first: True
  starts_with: ""
  multi_header: False
  strip_names: True
  remove_prefix: "//"

##################################################################
#    This section provides information to neptoon about your    #
#    input data. Things like where to find it (path_to_data)    #
#    and what the key columns are named. When multiple columns  #
#    are present (often the case for redundancy) they will be   #
#    merged into one. In this case add them as a list as shown  #
#    below and state the merge method (priority recommended).   #
##################################################################

time_series_data:
  path_to_data: # blank if parsing raw data
  key_column_info:
    epithermal_neutron_columns:
      - N2Cts
    thermal_neutron_columns:
      - N1Cts
    neutron_count_units: absolute_count
    pressure_columns:
      - P4_mb # first priority goes here
      - P3_mb
      - P1_mb
    pressure_units: hectopascals
    pressure_merge_method: priority
    temperature_columns:
      - T1_C
      - T2_C
    temperature_units: celcius
    temperature_merge_method: priority
    relative_humidity_columns:
      - RH1
    relative_humidity_units: percent
    relative_humidity_merge_method: priority
    date_time_columns:
      - Date Time(UTC)
    date_time_format: "%Y/%m/%d %H:%M:%S"
    initial_time_zone: utc
    convert_time_zone_to: utc
    temporal_diff_col_name: none # Leave as none if you don't have a diff col

input_data_qa:
  air_relative_humidity:
    flag_range:
      min: 0
      max: 100
  air_pressure:
    flag_range:
      min: 600
      max: 1300

soil_moisture_qa:
  soil_moisture:
    flag_range:
      min: 0
      max: 1

calibration:
  calibrate: False
  data_format: custom
  location: /path/to/location
  key_column_names:
    profile_id_column: PROF
    sample_depth: DETPH_AVG
    radial_distance_from_sensor: LOC_rad

data_storage:
  save_location:
  append_yaml_hash_to_folder_name: False
  create_report: True

figures:
  create_figures: True
  make_all_figures: True
  custom_list:
    - nmdb_incoming_radiation

Configuration Quick Reference Guide¶

Sensor Information¶

This config section provides information on the individual sensor being processed. Some of this information is crucial for data processing, such as elevation or latitude. Other values are used to organise the data outputs, for example by adding the name to save folders.

It is always better to fill this in as best you can.

The beta_coefficient can be calculated automatically from elevation and latitude if left blank.

The same applies to N0, but this requires the calibration section to be filled in correctly (otherwise you'll have to guess). When calibrating, neptoon can also generate the values for avg_lattice_water and avg_soil_organic_carbon automatically, as long as they are available in your calibration dataset.

Parameter	Required	Type	Example	Description
name	Yes	string	`Cunnesdorf_test_site`	Site identifier used for file naming and metadata
country	No	string	`DEU`	Country where sensor is located
identifier	No	string	`A102`	Unique sensor identifier code
install_date	Yes	string	`2016-10-21`	Date sensor was installed (YYYY-MM-DD)
latitude	Yes	float	`51.369597`	Site latitude in decimal degrees
longitude	Yes	float	`12.557120`	Site longitude in decimal degrees
elevation	Yes	float	`113`	Site elevation in meters
time_zone	Yes	string	`+1`	Time zone offset from UTC
site_cutoff_rigidity	No	float	`2.94`	Geomagnetic cutoff rigidity in GV
avg_lattice_water	No	float	`0.0043`	Average lattice water content as decimal (e.g., 0.0043 = 0.43%)
avg_soil_organic_carbon	No	float	`0.0184`	Soil organic carbon content as decimal (e.g., 0.0184 = 1.84%)
avg_dry_soil_bulk_density	No	float	`1.6`	Dry soil bulk density in g/cm³
N0	No	float	`1100`	Calibration parameter for neutron-to-soil moisture conversion
beta_coefficient	No	float	-	Site-specific coefficient for pressure correction
mean_pressure	No	float	-	Reference atmospheric pressure for corrections
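
As a rough illustration of the Required column above, the sketch below checks a sensor_info mapping for blank required fields. The helper name `missing_required` and the `REQUIRED_SENSOR_INFO` tuple are assumptions written for this example (the keys are copied from the table); this is not neptoon's own validation code.

```python
# Keys marked "Yes" in the Required column of the table above.
REQUIRED_SENSOR_INFO = (
    "name", "install_date", "latitude", "longitude", "elevation", "time_zone",
)


def missing_required(sensor_info: dict) -> list:
    """Return the required keys that are absent or left blank."""
    return [
        key for key in REQUIRED_SENSOR_INFO
        if sensor_info.get(key) in (None, "")
    ]


sensor_info = {
    "name": "Cunnesdorf_test_site",
    "install_date": "2016-10-21",
    "latitude": 51.369597,
    "longitude": 12.557120,
    "elevation": None,          # left blank by mistake
    "time_zone": "+1",
    "N0": None,                 # optional, may be filled in by calibration
}
print(missing_required(sensor_info))  # ['elevation']
```

Optional fields such as N0 are deliberately not part of the check, since they can be left blank.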

Raw Data Parse Options¶

Raw data parsing is the step that takes your raw data files (e.g., those found on the SD card in the logger) and converts them into a single csv file. No data manipulation is done at this stage. In its simplest form it takes a list of .txt files and orders them by date. When things get more complicated (e.g., only files with a certain prefix contain CRNS data), other settings are required.

Is this step needed?

You can skip this step if your data is already available as a single csv file. Simply set parse_raw_data to False and move on to time series data below.

Requirements

For this section the Required column will change if you select True for parse_raw_data.

Parameter	Required	Type	Example	Description
parse_raw_data	Yes	boolean	`True`	Toggle for raw data parsing functionality. When False, this entire section is ignored
data_location	No	Path	`"data/CRNS-station_data.zip"`	Path to raw data files/directory. Supports folders, zip, or tar archives
column_names	No	List[str]	`["date", "time", "counts"]`	Expected column names in order. If not provided, will attempt auto-detection
prefix	No	string	`"CRNS_"`	Filter raw files by filename prefix
suffix	No	string	`".dat"` or `.txt`	Filter raw files by filename suffix
encoding	No	string	`"cp850"`	File encoding format. Common alternatives: utf-8, ascii
skip_lines	No	integer	`2`	Number of header/metadata lines to skip before data
separator	No	string	`","`	Column delimiter character (e.g., comma, tab, semicolon)
decimal	No	string	`"."`	Decimal point character.
skip_initial_space	No	boolean	`True`	Remove leading whitespace in data fields
parser_kw	No	object	-	Advanced parser configuration
├─ strip_left	No	boolean	`True`	Remove leading whitespace in fields
├─ digit_first	No	boolean	`True`	Expect numeric data at start of line
starts_with	No	string	`"#"`	Required prefix for header lines
multi_header	No	boolean	`False`	Support for multi-line header formats
strip_names	No	boolean	`True`	Remove whitespace from column names
remove_prefix	No	string	`"//"`	Remove lines that start with this

Additional Information

Paths in data_location follow the unified path policy: absolute, ~/home-relative, or relative to the config file's directory.
When column_names is not provided, the parser attempts to detect headers from the first file.
For compressed data, both .zip and .tar formats are automatically detected and extracted.

Time Series Data¶

The time series data section covers how neptoon prepares your CRNS time series data for processing. It assumes your data is, at a minimum, a csv file ordered by datetime. You state where that data is with the path_to_data setting, and neptoon reads it in and begins preparations. If you ran the "Raw Data Parse Options" stage above, path_to_data is not needed, but the remaining settings still are.

Parameter	Required	Type	Example	Description
path_to_data	No	string	-	Path to pre-processed data (leave blank if parsing raw data)

Key Column Information (Time Series Data)¶

Here we define settings to prepare the data for processing. For example in neptoon we standardise all neutron counts to counts per hour (cph). So if your data is in another format, state it here and neptoon will take care of the conversion.

Other things include if you have multiple columns of certain data readings (e.g., multiple pressure sensors). State the names in a list under the specific column section and state how you wish to merge them into a single column. priority means it will use the first value in the list and gap fill with the next if missing. average means it will take the mean.
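
The two merge behaviours described above can be sketched in plain Python. This is only an illustration of the logic, assuming missing readings are represented as None; the function name `merge_columns` is hypothetical and neptoon's internal implementation may differ.

```python
from statistics import mean
from typing import Optional, Sequence


def merge_columns(rows: Sequence[Sequence[Optional[float]]],
                  method: str = "priority") -> list:
    """Merge several readings per time step into a single value.

    Each row holds the readings of one time step, ordered as in the
    config list. None marks a missing reading.
    """
    merged = []
    for row in rows:
        available = [value for value in row if value is not None]
        if not available:
            merged.append(None)           # nothing to merge at this step
        elif method == "priority":
            merged.append(available[0])   # first non-missing column wins
        elif method in ("mean", "average"):
            merged.append(mean(available))
        else:
            raise ValueError(f"unknown merge method: {method}")
    return merged


# Columns ordered as P4_mb, P3_mb, P1_mb
pressure_rows = [
    (1001.0, 1003.0, 1002.0),
    (None, 1004.0, 1000.0),
    (None, None, None),
]
print(merge_columns(pressure_rows, "priority"))  # [1001.0, 1004.0, None]
print(merge_columns(pressure_rows, "mean"))      # [1002.0, 1002.0, None]
```

With priority, the second row falls back to P3_mb because P4_mb is missing; with the averaging option, every available reading contributes.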

Parameter	Required	Type	Example	Description
epithermal_neutron_columns	Yes	list	`[N2Cts]`	Columns containing epithermal neutron counts
thermal_neutron_columns	No	list	`[N1Cts]`	Columns containing thermal neutron counts
neutron_count_units	Yes	string	`absolute_count` or `counts_per_hour` or `counts_per_second`	Units for neutron measurements
pressure_columns	Yes	list	`[P4_mb, P3_mb, P1_mb]`	Pressure columns in priority order
pressure_units	Yes	string	`hectopascals`	Units for pressure measurements
pressure_merge_method	Yes	string	`priority`	How to handle multiple pressure columns
temperature_columns	Yes	list	`[T1_C, T2_C]`	Temperature measurement columns
temperature_units	Yes	string	`celcius`	Units for temperature measurements
temperature_merge_method	Yes	string	`priority`	How to handle multiple temperature columns
relative_humidity_columns	Yes	list	`[RH1]`	Relative humidity measurement columns
relative_humidity_units	Yes	string	`percent`	Units for humidity measurements
date_time_columns	Yes	list	`[Date Time(UTC)]`	Columns containing date/time data
date_time_format	Yes	string	`"%Y/%m/%d %H:%M:%S"`	Format string for parsing dates

Time Formats

DateTime format strings must be enclosed in quotes (e.g., "%Y/%m/%d %H:%M:%S") to comply with YAML syntax.

Quality Assessment Settings¶

We include some simple options for quality assessment in neptoon using SaQC as the back-end. More information about how to do this is provided further below on this page (and examples are shown in the example config above).

We do not plan to expand this further to avoid scope creep. Neptoon is designed to process CRNS data. We provide some options to QA data used directly in this process. To QA any additional co-located sensors, we would recommend using a system designed for QA specifically (e.g., SaQC).

Calibration¶

Calibration finds your N0 term. For this we need sample data acquired from the site. When such data is available, the following section tells neptoon where the data is and what format it is in.

Parameter	Required	Type	Example	Description
calibrate	Yes	boolean	`True`	Toggle for whether calibration will be done
location	No	string	`home_dir/example_data/FSCD001_calibration.csv`	Location of the calibration data
date_time_format	No	string	`"%d.%m.%Y %H:%M"`	DateTime format of the calibration data

Key Column Names (Calibration)¶

These values are required if calibrate is set to true in the above section.

Parameter	Required	Type	Example	Description
date_time	No	string	`"DateTime_utc"`	Name of the column with DateTime
profile_id	No	string	`"Profile_ID"`	Name of the column with profile ID
sample_depth	No	string	`"Profile_Depth_cm"`	Name of the column with sample depth values
radial_distance_from_sensor	No	string	`"Distance_to_CRNS_m"`	Name of the column with distance of the sample from the sensor (m)
bulk_density_of_sample	No	string	`"DryBulkDensity_g_cm3"`	Name of the column with bulk density of the samples
gravimetric_soil_moisture	No	string	`"SoilMoisture_g_g"`	Name of the column with gravimetric soil moisture values
soil_organic_carbon	No	string	`"SoilOrganicCarbon_g_g"`	Name of the column with soil organic carbon values
lattice_water	No	string	`"LatticeWater_g_g"`	Name of the column with lattice water values

Data Storage¶

Parameter	Required	Type	Example	Description
save_location	No	string	`~/outputs`, `/data/outputs`, or `./output`	Directory for saving outputs. Accepts absolute, home-relative (`~/`), or relative paths (resolved against the config file's directory). If blank, outputs go to the current working directory. Can be overridden at run-time with `--save-location`.
append_timestamp_to_folder_name	No	boolean	`True`	Whether to append a timestamp to the output folder name. Useful when experimenting to avoid overwriting data.
create_report	No	boolean	`true`	Whether to create a detailed report of your data outputs during the processing run and save it into the output folder

Figures¶

Figures are tightly coupled to the create_report feature above. Neptoon will produce some useful figures helping to describe your data for quick visual checks post processing. These can be turned off if not required. Otherwise the figures are saved into a folder in the output folder, and included in the report if this is turned on.

Parameter	Required	Type	Example	Description
create_figures	Yes	boolean	`True`	Generate visualization figures
make_all_figures	No	boolean	`True`	Generate all available figure types in figure registry
custom_list	No	list	`[nmdb_incoming_radiation]`	List of specific figures to generate

Paths in config files¶

All path fields (data_location, path_to_data, calibration.location, save_location) follow the same four-rule policy:

Input	Resolved to
blank / `null`	`None` — triggers any applicable fallback (e.g. cwd in `save_data`)
`~/foo`	User home directory
`/abs/path`	As-is (normalised)
`./rel` or `rel`	Config file's own directory

This means you can use relative paths freely — neptoon always knows which directory to anchor them to.
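
The four rules can be sketched with pathlib as follows. This is a minimal illustration of the policy in the table, not neptoon's actual resolver; the function name `resolve_config_path` and the example config location are assumptions, and the sketch does not normalise absolute paths.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(value: Optional[str],
                        config_file: Path) -> Optional[Path]:
    """Hypothetical helper mirroring the four rules in the table above."""
    if value is None or not str(value).strip():
        return None                      # blank / null: caller applies a fallback
    path = Path(value).expanduser()      # "~/foo" -> inside the user home directory
    if path.is_absolute():
        return path                      # absolute paths are used as given
    # Relative paths are anchored to the directory holding the config file.
    return config_file.parent / path


config = Path("/projects/site_a/sensor_config.yaml")  # hypothetical location
print(resolve_config_path("./output", config))
print(resolve_config_path("~/outputs", config))
print(resolve_config_path("", config))
```

Anchoring relative paths to the config file rather than the current working directory means a config folder can be moved or shared without editing its paths.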

Deprecated: working_directory:

An older top-level working_directory: key was previously used to override the resolution basis for relative paths. It is now deprecated and will be removed in a future release. Use relative paths instead — they already resolve against the config file's directory automatically.

Detailed Configuration Reference¶

Below are more details on some of the features of the sensor config file.

Sensor Information (`sensor_info`)¶

This section contains essential metadata about your Cosmic-Ray Neutron Sensor (CRNS) station and site characteristics. This information is crucial for accurate soil moisture estimation and data organization.

`name`¶

Description
A unique identifier for the monitoring station that will be used in file naming and outputs.

Specification

Type: string
Required: Yes
Example: "Cunnesdorf_test_site"

Technical Details

Should be URL-safe (avoid special characters)
Used as default folder name for outputs
No spaces recommended (use underscores)

`identifier`¶

Description
The unique hardware identifier for the CRNS unit. This can be an additional way to identify the site.

Specification

Type: string
Required: No
Example: "A102"

`install_date`¶

Description
The date when the CRNS was installed at the monitoring site.

Specification

Type: Date string
Format: YYYY-MM-DD
Required: Yes
Example: "2016-10-21"

Technical Details

Used as cutoff for data processing
Single-digit months/days require leading zeros
Must be in format YYYY-MM-DD to be registered as a date

`latitude` and `longitude`¶

Description
Geographic coordinates of the CRNS installation location.

Specification

Type: float
Required: Yes
Range: -90 to 90 (latitude), -180 to 180 (longitude)
Example: 51.369597, 12.557120

Technical Details

Decimal degrees format

`elevation`¶

Description

Height above sea level of the CRNS installation site.

Specification

Type: float
Required: Yes
Units: meters above sea level
Example: 113

Technical Details

Used in atmospheric pressure corrections
Important for neutron flux calculations

`site_cutoff_rigidity`¶

Description
The geomagnetic cutoff rigidity at the installation site, which affects cosmic ray flux.

Specification

Type: float
Required: No
Units: GV (gigavolts)
Example: 2.94

Technical Details

Affects incoming neutron corrections
Location-dependent parameter
www.crnslab.org provides methods to calculate this with latitude and longitude values
If not supplied in config it will use a lookup table to find the value with lat and lon

`avg_lattice_water`¶

Description
The average lattice water content in soil minerals at the monitoring site.

Specification

Type: float
Required: No
Units: g/g (decimal fraction)
Example: 0.0043

Technical Details

Represented as decimal (0.0043 = 0.43%)
Site-specific constant
Used in soil moisture conversion
If not supplied defaults to 0
Can be automatically calculated if calibration sample data is available and includes lattice water content.

`avg_soil_organic_carbon`¶

Description
The average soil organic carbon content at the monitoring site.

Specification

Type: float
Required: No
Units: g/g (decimal fraction)
Example: 0.0184

Technical Details

Represented as decimal (0.0184 = 1.84%)
If not supplied defaults to 0
Can be automatically calculated if calibration sample data is provided with this in it (a site average is used)
Used in soil moisture conversion equations

`avg_dry_soil_bulk_density`¶

Description
The average dry soil bulk density across the CRNS footprint. This parameter is essential for converting gravimetric to volumetric soil moisture content.

Specification

Type: float
Required: No
Units: g/cm³
Example: 1.6

Technical Details

Important for use in converting neutrons to soil moisture (particularly converting gravimetric to volumetric soil moisture)
Influences effective measurement depth
Can be automatically calculated if calibration sample data is provided with this data in it (a site average is used)

`N0`¶

Description
Site-specific calibration parameter that converts corrected neutron counts to soil moisture. This parameter is crucial for the accuracy of soil moisture measurements.

Specification

Type: float
Required: No
Example: 1100

Technical Details

Determined through field calibration
Can be calibrated with soil sampling data if this option is turned on
If no calibration data is available, you will have to guess it, although this will introduce a bias into your data.

`beta_coefficient`¶

Description
Site-specific coefficient used in the atmospheric pressure correction of neutron count rates.

Specification

Type: float
Required: No
Units: hPa⁻¹
Example: 0.0076

Technical Details

Used in pressure correction equations
Location and elevation dependent
Affects neutron count normalization
Will be automatically calculated in neptoon if not provided using supplied elevation and latitude data

`mean_pressure`¶

Description
The long-term average atmospheric pressure at the monitoring site. Used as a reference pressure for neutron count corrections.

Specification

Type: float
Required: No
Units: hPa (hectopascals)
Example: 1013.25

Technical Details

Used for pressure corrections
Elevation dependent
Will be automatically calculated in neptoon if not provided using elevation and lat/lon data

`time_zone`¶

Description
The time zone offset from UTC for the monitoring site. Essential for proper temporal alignment of data.

Specification

Type: string
Required: Yes
Format: ±H
Example: "+1" or "-10"
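
To make the ±H format concrete, the sketch below turns such a string into a fixed UTC offset using only the standard library. The helper name `offset_to_timezone` is hypothetical, and whole-hour offsets are assumed, as in the examples above.

```python
from datetime import datetime, timedelta, timezone


def offset_to_timezone(offset: str) -> timezone:
    """Turn a "+1" / "-10" style string into a fixed-offset timezone."""
    hours = int(offset)  # int() accepts a leading sign, e.g. "+1"
    return timezone(timedelta(hours=hours))


local = datetime(2016, 10, 21, 12, 0, tzinfo=offset_to_timezone("+1"))
print(local.astimezone(timezone.utc))  # 2016-10-21 11:00:00+00:00
```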

Raw Data Parse Options (`raw_data_parse_options`)¶

This section configures how neptoon reads and interprets raw data files from CRNS sensors. These settings are crucial for correctly importing data from various sensor manufacturers and file formats.

`parse_raw_data`¶

Description
Primary toggle that determines whether neptoon should process raw data files or expect pre-processed data.

Specification

Type: boolean
Required: Yes
Example: True

Technical Details

Controls entire raw data processing pipeline
Determines workflow path

`data_location`¶

Description
Path to the raw data files or archive. Supports individual files, directories, or compressed archives.

Specification

Type: string (path)
Required: Yes (if parse_raw_data is True)
Example: "data/CRNS-station_data.zip" or "../raw_data/"

Technical Details

Supports absolute or relative paths
Handles zip and tar archives automatically
Recursive directory scanning
Path resolution relative to config file
Supported archive formats: .zip, .tar

`column_names`¶

Description
Explicit list of column names in the order they appear in the raw data files. Provides direct control over column identification and naming.

Specification

Type: list[string]
Required: No

Example:

column_names:
  - date_time
  - neutron_counts
  - pressure_hpa
  - temperature_c
  - humidity_percent

Technical Details

Overrides automatic header detection
Case-sensitive matching
Maintains column order

`prefix`¶

Description

String pattern used to filter raw data files by their filename prefix.

Specification

Type: string
Required: No
Example: "CRNS_"

Technical Details

Case-sensitive matching
Used in file selection phase

`suffix`¶

Description
String pattern used to filter raw data files by their filename suffix.

Specification

Type: string
Required: No
Example: ".dat" or ".txt"

Technical Details

Case-sensitive matching
Include the dot for file extensions
Applied after prefix filtering
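
The prefix and suffix filters can be pictured as two successive, case-sensitive string checks on each filename. The sketch below assumes that sorting by filename approximates ordering by date (true when the date is embedded as YYYY-MM-DD); `select_raw_files` and the filenames are made up for the example.

```python
def select_raw_files(filenames, prefix="", suffix=""):
    """Keep names matching prefix, then suffix (case-sensitive), sorted."""
    by_prefix = [name for name in filenames if name.startswith(prefix)]
    by_suffix = [name for name in by_prefix if name.endswith(suffix)]
    return sorted(by_suffix)


files = [
    "CRNS_2020-01-02.txt",
    "CRNS_2020-01-01.txt",
    "crns_notes.txt",   # rejected: prefix check is case-sensitive
    "CRNS_log.dat",     # rejected: wrong suffix
]
print(select_raw_files(files, prefix="CRNS_", suffix=".txt"))
# ['CRNS_2020-01-01.txt', 'CRNS_2020-01-02.txt']
```

An empty prefix or suffix matches every file, which corresponds to leaving the setting as "" in the config.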

`skip_lines`¶

Description
Number of lines to skip at the beginning of each data file.

Specification

Type: integer
Required: No
Default: 0
Example: 3

Technical Details

Affects all files in batch

`encoding`¶

Description
Specifies the character encoding used in the raw data files. Critical for correct text interpretation, especially with international characters.

Specification

Type: string
Required: No
Default: "utf-8"
Example: "cp850"

Technical Details

Common options:
- "utf-8": Universal encoding
- "cp850": Windows Western European
- "ascii": 7-bit ASCII
- "latin1": ISO-8859-1

`separator`¶

Description
Character used to separate columns in the raw data files. Must be explicitly defined to ensure correct data parsing.

Specification

Type: string
Required: Yes
Example: ","

Technical Details

Common separators:
- ",": CSV files
- "\t": Tab-separated
- ";": European CSV
- "|": Pipe-separated
Must be in quotes

`decimal`¶

Description
Character used as decimal separator in numeric values.

Specification

Type: string
Required: No
Default: "."
Example: ","

Technical Details

Must be in quotes
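
The separator and decimal settings interact when a file uses a comma as the decimal mark, because the column delimiter must then be something else. The following sketch shows the idea with the standard csv module; `parse_numeric_rows` is a hypothetical helper, not the parser neptoon uses.

```python
import csv
import io


def parse_numeric_rows(text, separator=",", decimal="."):
    """Split delimited text into rows and convert every field to float."""
    reader = csv.reader(
        io.StringIO(text), delimiter=separator, skipinitialspace=True
    )
    return [
        [float(field.replace(decimal, ".")) for field in row]
        for row in reader
        if row  # ignore empty lines
    ]


# "European" style: semicolon separator, comma as decimal mark
raw = "1013,2; 21,5\n1012,8; 21,7\n"
print(parse_numeric_rows(raw, separator=";", decimal=","))
# [[1013.2, 21.5], [1012.8, 21.7]]
```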

`skip_initial_space`¶

Description
Controls whether leading whitespace in data fields should be removed during parsing.

Specification

Type: boolean
Required: No
Default: True
Example: True

`parser_kw`¶

Description
Additional parser keywords.

Specification

Type: object
Required: No
Properties:
- strip_left: boolean - Remove leading whitespace
- digit_first: boolean - Expect numeric data at start

Technical Details

Specialized parsing behavior
Applied during data import

`starts_with`¶

Description
String pattern that identifies header lines in the data files.

Specification

Type: string
Required: No
Default: ""
Example: "#"

Technical Details

Used in header detection
Case-sensitive matching

`multi_header`¶

Description
Indicates whether data files contain multiple header lines that need special processing.

Specification

Type: boolean
Required: No
Default: False
Example: False

`strip_names`¶

Description
Controls whether whitespace should be removed from column names during parsing.

Specification

Type: boolean
Required: No
Default: True
Example: True

Technical Details

Applied to column headers
Affects column name matching

`remove_prefix`¶

Description
String pattern to be removed from the beginning of column names.

Specification

Type: string
Required: No
Example: "//"

Technical Details

Must be in quotes
Used for cleanup of raw headers

Time Series Data (`time_series_data`)¶

This section defines how data is formatted ready for use in neptoon. It presumes that the data has already been compiled into a single .csv file.

`path_to_data`¶

Description
The path to the .csv containing time series data

Specification

Type: string
Required: Yes (if no parsing done)
Example: /path/to/data.csv

Temporal Configuration (`time_series_data.temporal`)¶

`input_resolution`¶

Description
Specifies the time step of the input data, critical for proper temporal processing and aggregation.

Specification

Type: string
Required: Yes
Format: <number><unit>
Example: "15mins" or "1hour"

Technical Details

Valid units:
- Minutes: "min", "minute", "minutes"
- Hours: "hour", "hours", "hr", "hrs"
- Days: "day", "days"
Number must be positive integer
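
One way to read the <number><unit> format is as a positive integer followed by a unit name. The sketch below parses such strings into a timedelta. The regular expression is an assumption that covers the unit spellings listed above plus an optional trailing "s"; which spellings neptoon itself accepts is not confirmed here, and `parse_resolution` is a hypothetical name.

```python
import re
from datetime import timedelta

_UNIT_SECONDS = {"min": 60, "h": 3600, "day": 86400}
_PATTERN = re.compile(r"^(\d+)\s*(min(?:ute)?s?|h(?:ou)?rs?|days?)$")


def parse_resolution(text: str) -> timedelta:
    """Parse strings such as "15mins" or "1hour" into a timedelta."""
    match = _PATTERN.match(text.strip().lower())
    if match is None or int(match.group(1)) == 0:
        raise ValueError(f"not a valid resolution: {text!r}")
    number, unit = int(match.group(1)), match.group(2)
    if unit.startswith("min"):
        key = "min"
    elif unit.startswith("h"):
        key = "h"
    else:
        key = "day"
    return timedelta(seconds=number * _UNIT_SECONDS[key])


print(parse_resolution("15mins"))  # 0:15:00
print(parse_resolution("1hour"))   # 1:00:00
```

Converting both resolutions to timedelta also makes it easy to compare them, for example to check that an output resolution is not finer than the input resolution.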

`output_resolution`¶

Description
Desired time step for processed data output. Determines the temporal resolution of final results.

Specification

Type: string
Required: Yes
Format: <number><unit> or "None"
Example: "1hour"

Technical Details

Must be greater than or equal to input_resolution
Use "None" to maintain input resolution
When different from the input resolution, aggregation will occur

`align_timestamps`¶

Description
Controls whether timestamps should be aligned to regular intervals.

Specification

Type: boolean
Required: Yes
Example: true

Technical Details

Ensures consistent temporal spacing
Affects data aggregation methods
If aggregation occurs this is ignored (already aligned)

`alignment_method`¶

Description
Specifies how timestamps should be aligned when processing data.

Specification

Type: string
Required: Yes
Example: "time", "nshift"

Technical Details

"time": Aligns to clock intervals
"index": Maintains equal spacing
See here for more details

Key Column Configuration¶

`epithermal_neutron_columns`¶

Description
Specifies which columns contain epithermal neutron count data, the primary measurement for soil moisture estimation.

Specification

Type: list[string]
Required: Yes

Example:

epithermal_neutron_columns:
  - N2Cts
  - ModNeutrons

Technical Details

Must match column names exactly

`thermal_neutron_columns`¶

Description

Identifies columns containing thermal neutron count data, used for advanced corrections and quality control.

Specification

Type: list[string]
Required: No
Example:
```
thermal_neutron_columns:
  - N1Cts
```

Technical Details

Optional, recommended if available

`neutron_count_units`¶

Description
Specifies the units of the neutron count measurements.

Specification

Type: string
Required: Yes
Options:
- "absolute_count"
- "counts_per_hour"
- "counts_per_second"
Example: "absolute_count"

Technical Details

Affects count rate calculations
Critical for cross-site comparisons
Must match sensor configuration
Internally counts are converted into absolute counts (raw) and counts_per_hour (corrected)
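
The conversion to counts per hour is simple arithmetic once the logging interval is known. The sketch below assumes the interval is supplied in seconds; `to_counts_per_hour` and its `interval_seconds` argument are hypothetical names used for illustration only.

```python
def to_counts_per_hour(value: float, units: str,
                       interval_seconds: float = 3600.0) -> float:
    """Convert a neutron reading to counts per hour (cph)."""
    if units == "counts_per_hour":
        return float(value)
    if units == "counts_per_second":
        return value * 3600.0
    if units == "absolute_count":
        # Scale the counts recorded in one logging interval up to an hour.
        return value * 3600.0 / interval_seconds
    raise ValueError(f"unknown neutron_count_units: {units}")


# 250 counts recorded over a 15-minute (900 s) interval
print(to_counts_per_hour(250, "absolute_count", interval_seconds=900))  # 1000.0
```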

`pressure_columns`¶

Description
List of columns containing atmospheric pressure measurements, in order of priority.

Specification

Type: list[string]
Required: Yes

Example:

pressure_columns:
  - P4_mb  # Primary sensor
  - P3_mb  # Backup sensor
  - P1_mb  # Tertiary sensor

Technical Details

Order determines priority in 'priority' merge method
All must use same units

`pressure_units`¶

Description
Units of the pressure measurements in the specified columns.

Specification

Type: string
Required: Yes
Options: "hectopascals", "millibars"
Example: "hectopascals"

Technical Details

Must be consistent across all pressure columns
Standard is hectopascals

`pressure_merge_method`¶

Description
Method used to combine multiple pressure measurements when available.

Specification

Type: string
Required: Yes
Options: "priority", "mean"
Example: "priority"

Technical Details

"priority": Uses highest priority available
"mean": Averages all available values
Handles missing data automatically

`temperature_columns`¶

Description
List of columns containing air temperature measurements, in order of priority.

Specification

Type: list[string]
Required: Yes
Example:
```
temperature_columns:
  - T1_C
  - T2_C
```

Technical Details

Order determines priority
All must use same units
Used in humidity corrections

`temperature_units`¶

Description
Units of the temperature measurements.

Specification

Type: string
Required: Yes
Options: "celcius", "kelvin", "fahrenheit"
Example: "celcius"

Technical Details

Must be consistent across all temperature columns

`temperature_merge_method`¶

Description
Method used to combine multiple temperature measurements when available.

Specification

Type: string
Required: Yes
Options: "priority", "mean"
Example: "priority"

Technical Details

Follows same logic as pressure_merge_method

`date_time_columns`¶

Description
Columns containing temporal information for measurements.

Specification

Type: list[string]
Required: Yes
Example: ["Date Time(UTC)"]

Technical Details

Must contain valid datetime information
Used for all temporal alignment
Critical for data processing
Multiple columns can be merged e.g., ['Date', 'Time']

`date_time_format`¶

Description
Format string specifying how datetime information is encoded.

Specification

Type: string
Required: Yes
Format: Python datetime format string
Example: "%Y/%m/%d %H:%M:%S"

Technical Details

Must be in quotes
Follows Python strftime format
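
A format string can be checked against a sample timestamp with the standard library before running a full processing job. The sketch below uses datetime.strptime; the helper `parse_timestamp` and the sample values are invented for the example, and joining multiple columns with a single space is an assumption.

```python
from datetime import datetime


def parse_timestamp(parts, fmt):
    """Join one or more column values with a space and parse them."""
    return datetime.strptime(" ".join(parts), fmt)


# Single column, as in the example config
single = parse_timestamp(["2016/10/21 13:45:00"], "%Y/%m/%d %H:%M:%S")
print(single.isoformat())  # 2016-10-21T13:45:00

# Separate date and time columns merged before parsing
merged = parse_timestamp(["21.10.2016", "13:45"], "%d.%m.%Y %H:%M")
print(merged == single)    # True
```

If the format string does not match the data, strptime raises a ValueError, which is a quick way to spot a wrong date_time_format.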

`initial_time_zone`¶

Description
Timezone of data. Most CRNS data is given in UTC, but if it's not we can handle that here.

Specification

Type: string
Required: Yes
Example: utc or Europe/Berlin

`convert_time_zone_to`¶

Description
Timezone to convert the data to. We strongly recommend leaving this as utc.

Specification

Type: string
Required: Yes
Example: utc

Quality Assessment Configuration¶

The Quality Assessment (QA) system in neptoon allows you to validate meteorological data used in soil moisture estimation. The system currently supports QA checks on three key meteorological variables and provides two different assessment methods.

Supported Variables¶

Quality assessment can be performed on the following meteorological variables:

air_relative_humidity
air_pressure
air_temperature

QA on soil moisture data uses the same structure, placed under soil_moisture_qa:

soil_moisture_qa:
  soil_moisture:
    flag_range:
      min: 0
      max: 1

Assessment Methods¶

1. Range Check (`flag_range`)¶

The range check method flags values that fall outside specified minimum and maximum thresholds.

Required Parameters¶

min: Minimum acceptable value (in data units)
max: Maximum acceptable value (in data units)

Example Configuration¶

input_data_qa:
  air_pressure:
    flag_range:
      min: 850  # hPa
      max: 1050 # hPa

  air_relative_humidity:
    flag_range:
      min: 0    # %
      max: 100  # %

  air_temperature:
    flag_range:
      min: -30  # °C
      max: 50   # °C
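
The effect of a range check can be illustrated in a few lines of plain Python. This is not SaQC's API, only a sketch of the logic; it assumes the bounds are inclusive, and `flag_range` here is a stand-alone function written for the example.

```python
def flag_range(values, minimum, maximum):
    """Return True for every value outside [minimum, maximum]."""
    return [not (minimum <= value <= maximum) for value in values]


pressure = [1012.0, 0.0, 998.5, 1300.0]     # hPa, with two implausible values
flags = flag_range(pressure, 850, 1050)
print(flags)                                 # [False, True, False, True]

# Keep only the values that were not flagged
clean = [value for value, flagged in zip(pressure, flags) if not flagged]
print(clean)                                 # [1012.0, 998.5]
```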

2. Univariate Local Outlier Factor (`spike_uni_lof`)¶

This method uses the Local Outlier Factor algorithm to detect anomalies in univariate time series data. More information on this here

Optional Parameters¶

periods_in_calculation: Number of time steps included in LOF calculation
- Default: 20
- Units: time steps
threshold: Threshold for flagging outliers
- Default: 1.5
- Units: decimal
algorithm: Algorithm for calculating nearest neighbors
- Default: "ball_tree"
- Options: ["ball_tree", "kd_tree", "brute", "auto"]

Example Configuration¶

input_data_qa:
  air_temperature:
    spike_uni_lof:
      periods_in_calculation: 24  # Use 24 time steps
      threshold: 2.0             # More permissive threshold
      algorithm: "ball_tree"     # Default algorithm
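
For readers unfamiliar with LOF, the sketch below implements the textbook algorithm on a one-dimensional series using a brute-force neighbour search. It is an illustration only: the function name `local_outlier_factor` and the parameter `n_neighbours` are assumptions, the mapping of `n_neighbours` to `periods_in_calculation` is assumed rather than confirmed, and neptoon's own implementation may differ (for example in how it windows the time series).

```python
def local_outlier_factor(values, n_neighbours):
    """Return the LOF score of every value in a 1-D sequence (brute force)."""
    n = len(values)
    if not 1 <= n_neighbours < n:
        raise ValueError("n_neighbours must be between 1 and len(values) - 1")

    neighbours, k_distance = [], []
    for i, v in enumerate(values):
        # Rank all other points by distance; ties are broken by index.
        ranked = sorted((abs(v - w), j) for j, w in enumerate(values) if j != i)
        nearest = ranked[:n_neighbours]
        neighbours.append([j for _, j in nearest])
        k_distance.append(nearest[-1][0])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [
            max(k_distance[j], abs(values[i] - values[j])) for j in neighbours[i]
        ]
        mean_reach = sum(reach) / n_neighbours
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))

    # LOF: mean density of the neighbours relative to the point's own density.
    scores = []
    for i in range(n):
        mean_neighbour_lrd = sum(lrd[j] for j in neighbours[i]) / n_neighbours
        if lrd[i] == float("inf"):
            scores.append(1.0)  # point sits on duplicates: treat as an inlier
        else:
            scores.append(mean_neighbour_lrd / lrd[i])
    return scores


# Hypothetical air temperature series (°C) with one obvious spike.
air_temperature = [10.0, 11.0, 12.0, 13.0, 30.0, 14.0, 15.0, 16.0]
scores = local_outlier_factor(air_temperature, n_neighbours=3)
flags = [score > 1.5 for score in scores]  # threshold of 1.5, as in the default
print(flags)
```

Scores close to 1 indicate a point whose local density matches that of its neighbours; in this example only the 30.0 reading scores well above the threshold.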

Complete Example¶

Here's a complete example showing how to combine both methods:

input_data_qa:
  air_pressure:
    flag_range:
      min: 850
      max: 1050
    spike_uni_lof:
      periods_in_calculation: 12
      threshold: 1.8

  air_relative_humidity:
    flag_range:
      min: 0
      max: 100
    spike_uni_lof:
      periods_in_calculation: 6
      threshold: 1.3

  air_temperature:
    flag_range:
      min: -30
      max: 50

Best Practices¶

Range Selection
- Choose ranges based on physically possible values for your location
- Consider seasonal variations when setting thresholds
LOF Parameters
- periods_in_calculation: Choose based on your data's temporal resolution
  - Hourly data: 24 periods = 1 day window
  - 15-min data: 96 periods = 1 day window
- threshold: Start conservative (1.5) and adjust based on results
- algorithm: Use default unless you have specific performance requirements
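
The window sizes quoted above follow from simple arithmetic. A small helper such as the hypothetical `periods_for_window` below (not part of neptoon) can be used to work out `periods_in_calculation` for other resolutions.

```python
def periods_for_window(resolution_minutes, window_hours=24):
    """Number of time steps that span `window_hours` at the given resolution."""
    window_minutes = window_hours * 60
    if window_minutes % resolution_minutes != 0:
        raise ValueError("window is not a whole number of time steps")
    return window_minutes // resolution_minutes


print(periods_for_window(60))  # 24 (hourly data, 1 day window)
print(periods_for_window(15))  # 96 (15-min data, 1 day window)
```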

Notes¶

QA configuration is optional but recommended
Methods can be applied individually or in combination
Configuration is applied during data processing via the CRNSDataHub
Flagged data will be excluded from subsequent processing steps
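
As a sketch of what combining methods and excluding flagged data can look like, the example below merges the flags from two checks and masks any flagged value. The function `mask_flagged` and the flag lists are hypothetical and for illustration only; neptoon handles this internally through the CRNSDataHub.

```python
def mask_flagged(values, range_flags, spike_flags):
    """Replace a value with None if either QA check flagged it."""
    return [
        None if (out_of_range or spike) else value
        for value, out_of_range, spike in zip(values, range_flags, spike_flags)
    ]


# Hypothetical relative humidity readings (%) and flags from two checks.
humidity = [55.0, 104.0, 57.0, 12.0]
range_flags = [False, True, False, False]   # 104 % is outside 0-100
spike_flags = [False, False, False, True]   # 12 % flagged as a spike
print(mask_flagged(humidity, range_flags, spike_flags))  # [55.0, None, 57.0, None]
```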

Calibration (`calibration`)¶

`calibrate`¶

Description
Toggle for whether calibration will be done.

Specification

Type: boolean
Required: Yes
Example: True

`data_format`¶

Description
(WIP) automatic formatting for set styles.

Specification

Type: string
Required: No
Example: custom

`location`¶

Description
Location of the calibration data.

Specification

Type: string
Required: No
Example: home_dir/example_data/FSCD001_calibration.csv

`date_time_format`¶

Description
DateTime format of the calibration data.

Specification

Type: string
Required: No
Example: "%d.%m.%Y %H:%M"
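
The format string uses the standard strftime/strptime directives. Assuming the calibration file contains a timestamp such as the made-up one below, the example format can be checked quickly with Python's `datetime` module before running neptoon.

```python
from datetime import datetime

date_time_format = "%d.%m.%Y %H:%M"

# Hypothetical timestamp as it might appear in a calibration file.
sample_time = datetime.strptime("21.10.2016 14:30", date_time_format)
print(sample_time.isoformat())  # 2016-10-21T14:30:00
```

If the string does not match the format, `strptime` raises a `ValueError`, which is a quick way to spot a wrong `date_time_format`.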

Key Column Names (`calibration.key_column_names`)¶

`date_time`¶

Description
Name of the column with DateTime.

Specification

Type: string
Required: No
Example: "DateTime_utc"

`profile_id`¶

Description
Name of the column with profile ID.

Specification

Type: string
Required: No
Example: "Profile_ID"

`sample_depth`¶

Description
Name of the column with sample depth values.

Specification

Type: string
Required: No
Example: "Profile_Depth_cm"

`radial_distance_from_sensor`¶

Description
Name of the column with distance of the sample from the sensor (m).

Specification

Type: string
Required: No
Example: "Distance_to_CRNS_m"

`bulk_density_of_sample`¶

Description
Name of the column with bulk density of the samples.

Specification

Type: string
Required: No
Example: "DryBulkDensity_g_cm3"

`gravimetric_soil_moisture`¶

Description
Name of the column with gravimetric soil moisture values.

Specification

Type: string
Required: No
Example: "SoilMoisture_g_g"

`soil_organic_carbon`¶

Description
Name of the column with soil organic carbon values.

Specification

Type: string
Required: No
Example: "SoilOrganicCarbon_g_g"

`lattice_water`¶

Description
Name of the column with lattice water values.

Specification

Type: string
Required: No
Example: "LatticeWater_g_g"
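
To show how these settings relate a calibration file to the names neptoon expects, the sketch below reads a small CSV and renames its columns using a mapping in the style of `key_column_names`. The column names come from the examples above; the function `read_calibration_rows` and the two data rows are hypothetical and made up for illustration.

```python
import csv
import io

# Mapping in the style of calibration.key_column_names.
key_column_names = {
    "date_time": "DateTime_utc",
    "profile_id": "Profile_ID",
    "sample_depth": "Profile_Depth_cm",
    "radial_distance_from_sensor": "Distance_to_CRNS_m",
    "bulk_density_of_sample": "DryBulkDensity_g_cm3",
    "gravimetric_soil_moisture": "SoilMoisture_g_g",
    "soil_organic_carbon": "SoilOrganicCarbon_g_g",
    "lattice_water": "LatticeWater_g_g",
}

# Made-up calibration table, used only to exercise the mapping.
raw_csv = "\n".join([
    ",".join(key_column_names.values()),
    "21.10.2016 10:00,P1,5,10,1.45,0.21,0.018,0.004",
    "21.10.2016 10:00,P1,15,10,1.52,0.24,0.015,0.004",
])


def read_calibration_rows(text, column_map):
    """Read CSV text and return rows keyed by the config names."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in column_map.values() if col not in reader.fieldnames]
    if missing:
        raise KeyError(f"columns not found in calibration data: {missing}")
    return [{key: row[col] for key, col in column_map.items()} for row in reader]


rows = read_calibration_rows(raw_csv, key_column_names)
print(rows[0]["sample_depth"], rows[0]["gravimetric_soil_moisture"])  # 5 0.21
```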

Data Storage Options (`data_storage`)¶

`save_location`¶

Description
Directory for saving outputs (data, flags, figures, report).

Specification

Type: string
Required: No
Example: ~/neptoon-outputs/station_A101 or /data/outputs

Technical Details

Accepts absolute paths (/home/user/outputs) or home-relative paths (~/outputs); the ~ is expanded to the user's home directory.
Relative paths are rejected at config load time with a clear error. Output directories are ambiguous when relative (config-file-relative vs. cwd-relative), so neptoon refuses to guess.
Missing parent directories are created automatically.
If left blank, outputs are written to the current working directory.
Used by save_data() as the default output directory for both the ProcessWithConfig and DataHubFromConfig (step-by-step / notebook) workflows. Passing save_folder_location to save_data() explicitly still takes precedence.
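
The path rules above can be summarised in a few lines of Python. The function `resolve_save_location` is a hypothetical sketch of the described behaviour, not neptoon's actual validation code.

```python
from pathlib import Path


def resolve_save_location(save_location):
    """Apply the save_location rules: blank -> cwd, ~ expanded, relative rejected."""
    if not save_location:
        return Path.cwd()
    path = Path(save_location).expanduser()
    if not path.is_absolute():
        raise ValueError(
            f"save_location must be absolute or start with ~, got: {save_location}"
        )
    return path


print(resolve_save_location("~/neptoon-outputs/station_A101"))
print(resolve_save_location(""))  # falls back to the current working directory

# Missing parent directories could then be created with:
# resolve_save_location("~/neptoon-outputs/station_A101").mkdir(parents=True, exist_ok=True)
```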

`append_yaml_hash_to_folder_name`¶

Description
(WIP) Add configuration hash to folder names.

Specification

Type: boolean
Required: No
Example: False

Technical Details

Work In Progress - check back soon

`create_report`¶

Description
Whether to create the PDF report during the processing run. When enabled, the Magazine system is turned on, and information and figures are compiled into a report that is saved with the data.

Specification

Type: boolean
Required: Yes
Example: True

Figures Options (`figures`)¶

`create_figures`¶

Description
Generate visualization figures and store them automatically when the data is saved.

Specification

Type: boolean
Required: Yes
Example: True

Technical Details

Figures are written to the output folder when the data is saved.

`make_all_figures`¶

Description
Generate all available figure types in figure registry.

Specification

Type: boolean
Required: Yes
Example: True

`custom_list`¶

Description
List of specific figures to generate when make_all_figures is False.

Specification

Type: list
Required: No
Example: [nmdb_incoming_radiation]

Technical Details

If make_all_figures is true this is ignored.
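
The interaction between the three figure options can be sketched as follows. The function `select_figures` and the registry contents (apart from `nmdb_incoming_radiation`, taken from the example above) are hypothetical; the real figure registry lives inside neptoon.

```python
def select_figures(create_figures, make_all_figures, custom_list, registry):
    """Return the figure names to produce for a given figures configuration."""
    if not create_figures:
        return []
    if make_all_figures:
        return list(registry)  # custom_list is ignored in this case
    requested = custom_list or []
    unknown = [name for name in requested if name not in registry]
    if unknown:
        raise ValueError(f"unknown figure names: {unknown}")
    return list(requested)


# Hypothetical registry; only the first name appears in the documentation.
registry = ["nmdb_incoming_radiation", "hypothetical_figure_a", "hypothetical_figure_b"]

print(select_figures(True, True, ["nmdb_incoming_radiation"], registry))
print(select_figures(True, False, ["nmdb_incoming_radiation"], registry))
```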
