neptoon.data_prep

timestamp_alignment¶

Classes:

TimeStampAligner
TimeStampAggregator

Functions:

return_frequency_str
align_timestamps
return_dataframe
ensure_output_res_is_str
convert_na_fraction_to_int
aggregate_data
return_dataframe

TimeStampAligner ¶

TimeStampAligner(data_frame)

Uses routines from SaQC to align the time stamps of the data to a common set. When data is read in it is added to an SaQC object which is stored as an internal feature. Data can then be aligned and converted back to a pd.DataFrame.

Example

import pandas as pd
from neptoon.data_prep.timestamp_alignment import TimeStampAligner

data = {'value': [1, 2, 3, 4]}
index = pd.to_datetime(
    [
        "2021-01-01 00:04:00",
        "2021-01-01 01:10:00",
        "2021-01-01 02:05:00",
        "2021-01-01 02:58:00",
    ]
)
df = pd.DataFrame(data, index=index)

# Initialize the TimeStampAligner
time_stamp_aligner = TimeStampAligner(df)

# Align timestamps
time_stamp_aligner.align_timestamps(method='nshift', freq='1H')

# Get the aligned dataframe
aligned_df = time_stamp_aligner.return_dataframe()
print(aligned_df)

Parameters:

Name	Type	Description	Default
`data_frame`	`DataFrame`	DataFrame containing time series data.	required

align_timestamps ¶

align_timestamps(method='time')

Aligns the time stamps of the SaQC feature. This is done automatically for all data columns. For more information on the values for method and freq, see:

https://rdm-software.pages.ufz.de/saqc/

Parameters:

Name	Type	Description	Default
`method`	`str`	Alignment method passed to SaQC. For example, "nshift" (nearest shift) moves each value to the nearest time stamp without interpolation. By default "time".	`'time'`
`freq`	`str`	The frequency of time stamps wanted, by default "1Hour"	required
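To make the nearest shift idea concrete, the following is a minimal sketch using plain pandas rather than SaQC. It is not the routine TimeStampAligner calls internally; the hourly grid, the 30 minute tolerance, and the use of `reindex` are assumptions chosen for illustration.

```python
import pandas as pd

# Irregular time stamps, as in the TimeStampAligner example
raw_index = pd.to_datetime(
    [
        "2021-01-01 00:04:00",
        "2021-01-01 01:10:00",
        "2021-01-01 02:05:00",
        "2021-01-01 02:58:00",
    ]
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=raw_index)

# Regular hourly grid covering the data ("60min" is a pandas offset alias)
grid = pd.date_range("2021-01-01 00:00", "2021-01-01 03:00", freq="60min")

# Nearest shift: each grid point takes the closest observation that lies
# within half a period of it. Values are moved, never interpolated.
aligned = df.reindex(index=grid, method="nearest", tolerance=pd.Timedelta("30min"))
print(aligned)
```

Each observation here sits within 30 minutes of exactly one grid point, so every value is carried over unchanged onto the hourly grid. A grid point with no observation inside the tolerance would be filled with NaN.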

return_dataframe ¶

return_dataframe()

Returns a pd.DataFrame from the SaQC object. Run this after alignment to return the aligned dataframe.

Returns:

Name	Type	Description
`df`	`DataFrame`	DataFrame of time series data

TimeStampAggregator ¶

TimeStampAggregator(data_frame, output_resolution, max_na_fraction)

Uses routines from SaQC to aggregate the data to a new sample rate. When data is read in it is added to an SaQC object which is stored as an internal feature. Data can then be aggregated and converted back to a pd.DataFrame.

Parameters:

Name	Type	Description	Default
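`data_frame`	`DataFrame`	DataFrame containing time series data.	required
`output_resolution`	`str \| timedelta`	The desired output temporal resolution.	required
`max_na_fraction`	`float`	Decimal fraction of max NaN values allowed in an aggregation window.	required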

ensure_output_res_is_str ¶

ensure_output_res_is_str(output_resolution)

Ensures that the output_resolution input is either a str representation (e.g., '1h') or a datetime.timedelta. If a datetime.timedelta is supplied, it is converted to a string automatically.

Parameters:

Name	Type	Description	Default
`output_resolution`	`str \| timedelta`	The desired output temporal resolution	required

Returns:

Type	Description
`str`	Output resolution as str

Raises:

Type	Description
`ValueError`	If neither a str nor a datetime.timedelta is supplied
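The behaviour described above can be sketched as a small standalone function. This is an illustration only: `output_res_to_str` is a hypothetical name, and expressing the timedelta in whole seconds (e.g. "3600s") is an assumption, since the string format the library actually produces is not stated here.

```python
import datetime


def output_res_to_str(output_resolution):
    """Hypothetical stand-in for ensure_output_res_is_str."""
    if isinstance(output_resolution, str):
        # Already a string such as '1h'; pass it through unchanged
        return output_resolution
    if isinstance(output_resolution, datetime.timedelta):
        # Assumed format: whole seconds, e.g. 1 hour -> "3600s"
        return f"{int(output_resolution.total_seconds())}s"
    raise ValueError(
        "output_resolution must be a str or a datetime.timedelta, "
        f"got {type(output_resolution).__name__}"
    )


print(output_res_to_str("1h"))
print(output_res_to_str(datetime.timedelta(hours=1)))
```

Checking the type up front means a wrong input fails immediately with a clear message, rather than later inside the aggregation step.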

convert_na_fraction_to_int ¶

convert_na_fraction_to_int(max_na_fraction)

Returns the maximum number of NaN values allowed in the aggregation window, converted from a decimal fraction into an absolute count.

Parameters:

Name	Type	Description	Default
`max_na_fraction`	`float`	Maximum fraction of NaN values allowed in an aggregation window, given as a decimal	required

Returns:

Type	Description
`int`	Maximum number of NaN values allowed
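As a worked illustration of the conversion, the sketch below derives the count from the number of input samples that fall in one output window. The function name `max_na_count`, its explicit resolution arguments, and rounding down are all assumptions; the documented method takes only `max_na_fraction` and its rounding rule is not stated here.

```python
import datetime
import math


def max_na_count(max_na_fraction, input_resolution, output_resolution):
    """Hypothetical helper: fraction of a window -> absolute NaN count."""
    # Dividing two timedeltas gives the number of samples per window
    samples_per_window = output_resolution / input_resolution
    # Assumption: round down, so the limit is never exceeded
    return math.floor(samples_per_window * max_na_fraction)


# 15 minute data aggregated to 1 hour gives 4 samples per window
count = max_na_count(
    0.3,
    input_resolution=datetime.timedelta(minutes=15),
    output_resolution=datetime.timedelta(hours=1),
)
print(count)
```

With four samples per window and a fraction of 0.3, the product is 1.2, which rounds down to one permitted NaN value.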

aggregate_data ¶

aggregate_data(method='bagg')

Aggregates the data of the SaQC feature. This is done automatically for all data columns. For more information on the values for method and freq, see:

https://rdm-software.pages.ufz.de/saqc/

Parameters:

Name	Type	Description	Default
`method`	`str`	Aggregation method passed to SaQC, by default "bagg".	`'bagg'`
`freq`	`str`	The frequency of time stamps wanted, by default "1Hour"	required
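The combination of aggregation and a NaN limit can be sketched with plain pandas. This is not the SaQC routine used by TimeStampAggregator and does not reproduce how "bagg" labels its windows; the mean as the aggregation function, the hourly target, and `max_na = 1` are assumptions for illustration.

```python
import pandas as pd

nan = float("nan")

# 15 minute data with gaps, covering two hourly windows
index = pd.date_range("2021-01-01 00:00", periods=8, freq="15min")
df = pd.DataFrame(
    {"value": [1.0, 2.0, nan, 4.0, nan, nan, nan, 8.0]},
    index=index,
)

max_na = 1  # assumed limit: at most one missing value per hourly window

resampled = df["value"].resample("60min")
means = resampled.mean()  # NaN values are skipped when averaging
na_counts = resampled.apply(lambda s: s.isna().sum())

# Keep a window's mean only if it had few enough missing values
aggregated = means.where(na_counts <= max_na)
print(aggregated)
```

The first window has one missing value and keeps its mean of the three valid samples. The second window has three missing values, so its result is masked to NaN instead of reporting a mean built from a single sample.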

return_dataframe ¶

return_dataframe()

Returns a pd.DataFrame from the SaQC object. Run this after aggregation to return the aggregated dataframe.

Returns:

Name	Type	Description
`df`	`DataFrame`	DataFrame of time series data