neptoon.data_prep

timestamp_alignment

TimeStampAligner

TimeStampAligner(data_frame)

Uses routines from SaQC to align the time stamps of the data to a common set. When the data is read in, it is added to an SaQC object, which is stored internally. The data can then be aligned and converted back to a pd.DataFrame.

Example

import pandas as pd
from neptoon.data_prep.timestamp_alignment import TimeStampAligner

data = {'value': [1, 2, 3, 4]}
index = pd.to_datetime(
    [
        "2021-01-01 00:04:00",
        "2021-01-01 01:10:00",
        "2021-01-01 02:05:00",
        "2021-01-01 02:58:00",
    ]
)
df = pd.DataFrame(data, index=index)

Initialize the TimeStampAligner

time_stamp_aligner = TimeStampAligner(df)

Align timestamps

time_stamp_aligner.align_timestamps(method='nshift', freq='1H')

Get the aligned dataframe

aligned_df = time_stamp_aligner.return_dataframe()
print(aligned_df)

Parameters:

Name Type Description Default
data_frame DataFrame

DataFrame containing time series data.

required

align_timestamps

align_timestamps(method='time')

Aligns the time stamps of the data held in the SaQC object. This is done automatically for all data columns. For more information on the values for method and freq, see:

https://rdm-software.pages.ufz.de/saqc/

Parameters:

Name Type Description Default
method str

Alignment method passed to SaQC, by default "time". With "nshift" (nearest shift), data is adjusted to the nearest time stamp without interpolation.

'time'
freq str

The frequency of time stamps wanted, by default "1Hour"

required
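
To picture what a nearest-shift alignment does, the sketch below reproduces the idea with plain pandas on the data from the class example. It is an illustration only, not SaQC's or neptoon's implementation: the hourly grid, the 30-minute tolerance, and the use of `DataFrame.reindex` are all assumptions made for this example.

```python
import pandas as pd

# Irregular time stamps, as in the TimeStampAligner example.
index = pd.to_datetime(
    [
        "2021-01-01 00:04:00",
        "2021-01-01 01:10:00",
        "2021-01-01 02:05:00",
        "2021-01-01 02:58:00",
    ]
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

# Assumed target grid: one time stamp per hour spanning the data.
grid = pd.date_range(
    "2021-01-01 00:00", "2021-01-01 03:00", freq=pd.Timedelta(hours=1)
)

# For each grid point take the nearest observation, but only if it lies
# within half an hour. Values are moved, never interpolated.
aligned = df.reindex(grid, method="nearest", tolerance=pd.Timedelta(minutes=30))
print(aligned)
```

Each observation here lies within ten minutes of a grid point, so every value is carried over unchanged to its nearest full hour.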

return_dataframe

return_dataframe()

Returns a pd.DataFrame from the SaQC object. Run this after alignment to return the aligned dataframe.

Returns:

Name Type Description
df DataFrame

DataFrame of time series data

TimeStampAggregator

TimeStampAggregator(data_frame, output_resolution, max_na_fraction)

Uses routines from SaQC to aggregate the data to a new sample rate. When the data is read in, it is added to an SaQC object, which is stored internally. The data can then be aggregated and converted back to a pd.DataFrame.

Parameters:

Name Type Description Default
data_frame DataFrame

DataFrame containing time series data.

required

ensure_output_res_is_str

ensure_output_res_is_str(output_resolution)

Ensures that the output_resolution input is either a str representation (e.g., '1h') or a datetime.timedelta. A datetime.timedelta is converted to a string automatically.

Parameters:

Name Type Description Default
output_resolution str | timedelta

The desired output temporal resolution

required

Returns:

Type Description
str

Output resolution as str

Raises:

Type Description
ValueError

If neither a str nor a datetime.timedelta is supplied
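
The behaviour described above can be sketched with the standard library alone. The function name `output_resolution_as_str` and the choice to express a timedelta in whole seconds are assumptions for illustration; the string format neptoon actually produces is not stated here.

```python
from datetime import timedelta


def output_resolution_as_str(output_resolution):
    """Hypothetical helper: return the output resolution as a string."""
    if isinstance(output_resolution, str):
        # Already a string representation such as '1h'; pass it through.
        return output_resolution
    if isinstance(output_resolution, timedelta):
        # Assumed format: whole seconds, e.g. one hour becomes "3600s".
        return f"{int(output_resolution.total_seconds())}s"
    raise ValueError(
        "output_resolution must be a str or datetime.timedelta, "
        f"got {type(output_resolution).__name__}"
    )


print(output_resolution_as_str("1h"))
print(output_resolution_as_str(timedelta(hours=1)))
```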

convert_na_fraction_to_int

convert_na_fraction_to_int(max_na_fraction)

Returns the maximum number of NaN values allowed in the aggregation window, converted from a decimal fraction into an absolute count.

Parameters:

Name Type Description Default
max_na_fraction float

Maximum fraction of NaN values allowed in an aggregation window, given as a decimal fraction

required

Returns:

Type Description
int

Maximum number of NaN values allowed in an aggregation window
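
As a worked example of the conversion, the sketch below derives the count from the number of samples that fit into one aggregation window. The helper `max_na_count`, its extra resolution arguments, and the rounding down are assumptions for illustration; the method documented above takes only max_na_fraction, and how it obtains the window size is not shown here.

```python
import math
from datetime import timedelta


def max_na_count(max_na_fraction, input_resolution, output_resolution):
    """Hypothetical helper: fraction of a window -> whole number of samples."""
    # Number of input samples falling into one output window.
    samples_per_window = output_resolution / input_resolution
    # Assumed rounding: round down so the limit is never exceeded.
    return math.floor(samples_per_window * max_na_fraction)


# 15-minute data aggregated to 1 hour gives 4 samples per window.
# Allowing 30 % missing values therefore permits 1 missing sample.
print(max_na_count(0.3, timedelta(minutes=15), timedelta(hours=1)))
```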

aggregate_data

aggregate_data(method='bagg')

Aggregates the data held in the SaQC object. This is done automatically for all data columns. For more information on the values for method and freq, see:

https://rdm-software.pages.ufz.de/saqc/

Parameters:

Name Type Description Default
method str

Aggregation method passed to SaQC, by default "bagg".

'bagg'
freq str

The frequency of time stamps wanted, by default "1Hour"

required
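
The idea of aggregating to a coarser sample rate while limiting missing values can be illustrated with plain pandas. This is a sketch under assumptions, not SaQC's or neptoon's implementation: the mean as aggregation function, the left-labelled hourly windows that pandas uses by default, and the limit `max_na_count = 1` are all chosen for the example.

```python
import numpy as np
import pandas as pd

# Two hours of 15-minute readings; three values in the second hour are missing.
index = pd.date_range(
    "2021-01-01 00:00", periods=8, freq=pd.Timedelta(minutes=15)
)
values = [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, np.nan, np.nan]
df = pd.DataFrame({"value": values}, index=index)

max_na_count = 1  # assumed limit: at most one missing value per window
window = pd.Timedelta(hours=1)

# Mean of the available values in each hourly window.
hourly_mean = df.resample(window).mean()

# Number of missing values in each hourly window.
na_per_window = df.isna().resample(window).sum()

# Keep a window's mean only if it has few enough missing values.
aggregated = hourly_mean.where(na_per_window <= max_na_count)
print(aggregated)
```

The first hour has no gaps, so its mean is kept; the second hour has three missing values out of four and is set to NaN rather than reported as the single remaining reading.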

return_dataframe

return_dataframe()

Returns a pd.DataFrame from the SaQC object. Run this after aggregation to return the aggregated dataframe.

Returns:

Name Type Description
df DataFrame

DataFrame of time series data