neptoon.data_prep
timestamp_alignment
Classes:
- TimeStampAligner
- TimeStampAggregator
Functions:
- return_frequency_str
- align_timestamps
- return_dataframe
- ensure_output_res_is_str
- convert_na_fraction_to_int
- aggregate_data
- return_dataframe
TimeStampAligner
Uses SaQC routines to align the timestamps of the data to a common, regular set. The input data is wrapped in an SaQC object that is stored internally. The data can then be aligned and converted back to a pd.DataFrame.
Example

    import pandas as pd
    from neptoon.data_prep.timestamp_alignment import TimeStampAligner

    data = {'value': [1, 2, 3, 4]}
    index = pd.to_datetime(
        [
            "2021-01-01 00:04:00",
            "2021-01-01 01:10:00",
            "2021-01-01 02:05:00",
            "2021-01-01 02:58:00",
        ]
    )
    df = pd.DataFrame(data, index=index)

    # Initialize the TimeStampAligner
    time_stamp_aligner = TimeStampAligner(df)

    # Align timestamps
    time_stamp_aligner.align_timestamps(method='nshift', freq='1H')

    # Get the aligned dataframe
    aligned_df = time_stamp_aligner.return_dataframe()
    print(aligned_df)

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_frame | DataFrame | DataFrame containing time series data. | required |
align_timestamps
Aligns the timestamps of the data in the internal SaQC object. This is applied automatically to all data columns. For more information on the values for method and freq, see:
https://rdm-software.pages.ufz.de/saqc/
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | SaQC alignment method, by default "time". The "nshift" (nearest shift) method moves each value to the nearest timestamp without interpolation. | 'time' |
| freq | str | Target frequency of the aligned timestamps, e.g. "1H". | required |
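
To illustrate what a nearest-shift alignment such as method="nshift" does, here is a minimal standard-library sketch. It is not SaQC's implementation: the helper names `floor_to_grid` and `nearest_shift`, the half-interval tolerance, and the choice of grid start and end are all assumptions made for this illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def floor_to_grid(ts, freq):
    """Round a timestamp down to the previous multiple of freq."""
    return ts - ((ts - EPOCH) % freq)


def nearest_shift(samples, freq):
    """Map each regular grid point to the value of its nearest sample.

    samples: list of (datetime, value) pairs.
    A grid point with no sample within freq / 2 is set to None
    (assumed tolerance, chosen for this sketch).
    """
    times = [ts for ts, _ in samples]
    start = floor_to_grid(min(times), freq)
    end = floor_to_grid(max(times), freq)
    if end < max(times):
        end += freq  # extend the grid so it covers the last sample

    aligned = {}
    grid_point = start
    while grid_point <= end:
        candidates = [
            (abs(ts - grid_point), value)
            for ts, value in samples
            if abs(ts - grid_point) <= freq / 2
        ]
        # Take the closest sample as-is; values are never interpolated.
        aligned[grid_point] = (
            min(candidates, key=lambda c: c[0])[1] if candidates else None
        )
        grid_point += freq
    return aligned


samples = [
    (datetime(2021, 1, 1, 0, 4), 1),
    (datetime(2021, 1, 1, 1, 10), 2),
    (datetime(2021, 1, 1, 2, 5), 3),
    (datetime(2021, 1, 1, 2, 58), 4),
]
aligned = nearest_shift(samples, timedelta(hours=1))
for ts, value in aligned.items():
    print(ts, value)
```

With the four timestamps from the class example, the values 1 to 4 land on 00:00, 01:00, 02:00 and 03:00 respectively.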
return_dataframe
Returns a pd.DataFrame from the SaQC object. Run this after alignment to return the aligned dataframe.
Returns:
| Name | Type | Description |
|---|---|---|
| df | DataFrame | DataFrame of time series data |
TimeStampAggregator
Uses SaQC routines to aggregate the data to a new sample rate. The input data is wrapped in an SaQC object that is stored internally. The data can then be aggregated and converted back to a pd.DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_frame | DataFrame | DataFrame containing time series data. | required |
ensure_output_res_is_str
Ensures that output_resolution is either a string representation (e.g., '1h') or a datetime.timedelta. A datetime.timedelta is converted to a string automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_resolution | str \| timedelta | The desired output temporal resolution | required |
Returns:
| Type | Description |
|---|---|
| str | Output resolution as str |
Raises:
| Type | Description |
|---|---|
| ValueError | If output_resolution is neither a str nor a datetime.timedelta |
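
The check described above can be sketched with the standard library alone. The function name `resolution_to_str` is hypothetical, and the string format chosen for a timedelta (whole seconds followed by "s") is an assumption, since this page does not state which format neptoon produces.

```python
from datetime import timedelta


def resolution_to_str(output_resolution):
    """Return the resolution as a string, converting timedelta inputs."""
    if isinstance(output_resolution, str):
        # Strings are passed through unchanged.
        return output_resolution
    if isinstance(output_resolution, timedelta):
        # Assumed format: whole seconds followed by "s", e.g. "3600s".
        return f"{int(output_resolution.total_seconds())}s"
    raise ValueError(
        "output_resolution must be a str or a datetime.timedelta, "
        f"got {type(output_resolution).__name__}"
    )


print(resolution_to_str("1h"))                # 1h
print(resolution_to_str(timedelta(hours=1)))  # 3600s
```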
convert_na_fraction_to_int
Returns the maximum number of NA values allowed in the aggregation window, converted from a decimal fraction into an absolute count.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_na_fraction | float | Decimal fraction of max NaN values in aggregation window | required |
Returns:
| Type | Description |
|---|---|
| int | Maximum number of NaN values allowed in the aggregation window |
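
As a rough illustration of the conversion, the sketch below derives the number of samples per aggregation window from two resolutions and scales it by the fraction. The function name `max_na_count`, its extra resolution arguments, and the decision to round down are assumptions. The real method takes only max_na_fraction, and how it obtains the window size is not described on this page.

```python
import math
from datetime import timedelta


def max_na_count(max_na_fraction, input_resolution, output_resolution):
    """Convert a decimal NA fraction into an absolute count per window."""
    if not 0.0 <= max_na_fraction <= 1.0:
        raise ValueError("max_na_fraction must be between 0 and 1")
    # Dividing two timedeltas gives the number of samples per window.
    samples_per_window = output_resolution / input_resolution
    # Round before flooring to absorb floating-point noise
    # (rounding down is an assumption of this sketch).
    return math.floor(round(max_na_fraction * samples_per_window, 9))


# 15-minute data aggregated to 1 hour gives 4 samples per window.
print(max_na_count(0.5, timedelta(minutes=15), timedelta(hours=1)))  # 2
print(max_na_count(0.3, timedelta(minutes=15), timedelta(hours=1)))  # 1
```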
aggregate_data
Aggregates the data in the internal SaQC object. This is applied automatically to all data columns. For more information on the values for method and freq, see:
https://rdm-software.pages.ufz.de/saqc/
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | SaQC aggregation method, by default "bagg". | 'bagg' |
| freq | str | Target frequency of the aggregated timestamps, e.g. "1H". | required |
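
The idea of aggregating to a coarser frequency while limiting missing values can be sketched in plain Python. This is not SaQC's "bagg" implementation: the function name `aggregate_mean`, the use of the mean, labelling each window by its start time, and the `max_na` argument are all assumptions made for the illustration.

```python
from datetime import datetime, timedelta
from statistics import fmean

EPOCH = datetime(1970, 1, 1)


def aggregate_mean(samples, freq, max_na=0):
    """Average samples per window; windows with too many NAs become None.

    samples: list of (datetime, value) pairs, with None marking an NA.
    """
    windows = {}
    for ts, value in samples:
        # Label each window by its start (assumed labelling convention).
        window_start = ts - ((ts - EPOCH) % freq)
        windows.setdefault(window_start, []).append(value)

    result = {}
    for window_start in sorted(windows):
        values = windows[window_start]
        valid = [v for v in values if v is not None]
        n_missing = len(values) - len(valid)
        if valid and n_missing <= max_na:
            result[window_start] = fmean(valid)
        else:
            result[window_start] = None
    return result


base = datetime(2021, 1, 1)
values = [1.0, 2.0, 3.0, 4.0, 5.0, None, None, 8.0]
samples = [
    (base + i * timedelta(minutes=15), v) for i, v in enumerate(values)
]

strict = aggregate_mean(samples, timedelta(hours=1), max_na=1)
lenient = aggregate_mean(samples, timedelta(hours=1), max_na=2)
print(strict)
print(lenient)
```

In this example the second hour contains two missing values, so it is dropped when max_na=1 and averaged over the two remaining values when max_na=2.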
return_dataframe
Returns a pd.DataFrame from the SaQC object. Run this after aggregation to return the aggregated dataframe.
Returns:
| Name | Type | Description |
|---|---|---|
| df | DataFrame | DataFrame of time series data |