neptoon.external.nmdb_data_collection
Classes:
- NMDBDataAttacher
- DateTimeHandler
- NMDBConfig
- TermsDisplayManager
- CacheHandler
- DataFetcher
- DataManager
- NMDBDataHandler
Functions:
- fetch_nmdb_data
- new_column_name
- configure
- fetch_data
- attach_data
- return_data_frame
- convert_string_to_standard_date
- format_datetime_to_standard_string
- standardize_date_input
- start_date_wanted
- end_date_wanted
- cache_dir
- display_terms
- update_cache_file_path
- cache_file_path
- check_cache_file_exists
- read_cache
- write_cache
- delete_cache
- check_cache_range
- get_ymd_from_date
- create_nmdb_url
- fetch_data_http
- parse_http_data
- fetch_and_parse_http_data
- check_if_need_extra_data
- set_dates_for_nmdb_download
- combine_cache_and_new_data
- collect_nmdb_data
NMDBDataAttacher
This is the core class that a user interacts with to attach data from the NMDB.eu database to a DataFrame. It includes methods for configuring the NMDBConfig class, which other classes then use to fetch and parse data from the NMDB.eu API.
TODO - add validation steps to ensure dataframe is correct format
Initialisation parameters
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_frame | DataFrame | DataFrame to which the data will be attached. It must have a datetime index. | required |
| new_column_name | str | Name of the new column where the neutron count data is appended, by default "incoming_neutron_intensity". | str(INCOMING_NEUTRON_INTENSITY) |
fetch_data
Creates an NMDBDataHandler using the config, collects the data, and stores it in self.tmp_data.
attach_data
Attaches the data stored in self.tmp_data to self.data_frame. This occurs in place.
Raises:
| Type | Description |
|---|---|
| ValueError | Raised when the index of the data is not a datetime index. |
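The alignment that attach_data performs can be sketched with the standard library alone. This is a hypothetical illustration, not the neptoon implementation: a dict keyed by datetime stands in for a DataFrame with a datetime index, and the name attach_counts is invented here.

```python
from datetime import datetime
from typing import Dict


def attach_counts(
    rows: Dict[datetime, dict],
    counts: Dict[datetime, float],
    column: str = "incoming_neutron_intensity",
) -> None:
    """Hypothetical stand-in for attach_data: add `column` to each row in place."""
    # Mirror the documented ValueError: both indexes must hold datetimes.
    for index in (rows, counts):
        if not all(isinstance(ts, datetime) for ts in index):
            raise ValueError("Index of the data must be datetime")
    for ts, row in rows.items():
        # Timestamps without a fetched count get None (a missing value).
        row[column] = counts.get(ts)
```

In pandas, assigning a Series to a DataFrame column aligns on the index in the same way, so rows with no matching timestamp receive NaN. Whether the real method relies on that behaviour is not stated in this document.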
return_data_frame
Returns the DataFrame stored in the object.
Returns:
| Type | Description |
|---|---|
| DataFrame | The DataFrame. |
DateTimeHandler
Class that holds date standardization methods.
This class provides static methods for converting and standardizing dates to the common format (YYYY-mm-dd) used throughout the NMDB data collection process.
convert_string_to_standard_date (staticmethod)
Converts a string date to the standard format (YYYY-mm-dd).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date_str | str | The date string to convert. | required |
Returns:
| Type | Description |
|---|---|
| str or None | The standardized date string in format (YYYY-mm-dd), or None if the input date string is not a recognizable date format. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the input string cannot be parsed into a valid date. |
format_datetime_to_standard_string (staticmethod)
Converts a datetime or pd.Timestamp date to the standard format (YYYY-mm-dd).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date_datetime | datetime or Timestamp | The datetime object to convert. | required |
Returns:
| Type | Description |
|---|---|
| str or None | The standardized date string in format (YYYY-mm-dd), or None if the input is not a datetime.datetime or pandas.Timestamp instance. |
standardize_date_input (staticmethod)
Takes a date as input, checks its type, and converts it to the standard format (YYYY-mm-dd).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date_input | str or Timestamp or datetime | Input date to be converted. | required |
Returns:
| Type | Description |
|---|---|
| str | String of the date as YYYY-mm-dd. |
Raises:
| Type | Description |
|---|---|
| ValueError | Raised when the input is neither a str nor a datetime. |
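The three DateTimeHandler methods can be sketched as standalone functions named after the documented ones. This is an assumption-laden sketch, not the library code. The set of accepted string layouts in _FORMATS is invented for illustration. The docstring above describes both returning None and raising ValueError for unparseable strings; this sketch chooses to raise. pandas.Timestamp subclasses datetime.datetime, so a single isinstance check covers both types.

```python
from datetime import datetime
from typing import Optional

# Assumed input layouts; the documented method may accept a different set.
_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d-%m-%Y", "%Y-%m-%d %H:%M:%S")


def convert_string_to_standard_date(date_str: str) -> str:
    """Parse a date string in any of _FORMATS and return it as YYYY-mm-dd."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(date_str, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # layout did not match; try the next one
    raise ValueError(f"Unrecognised date string: {date_str!r}")


def format_datetime_to_standard_string(value: object) -> Optional[str]:
    """Return YYYY-mm-dd for datetime objects (pandas.Timestamp subclasses datetime)."""
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%d")
    return None


def standardize_date_input(date_input: object) -> str:
    """Dispatch on the input type, as the documented method describes."""
    if isinstance(date_input, str):
        return convert_string_to_standard_date(date_input)
    formatted = format_datetime_to_standard_string(date_input)
    if formatted is None:
        raise ValueError("date_input must be a str, datetime or pandas.Timestamp")
    return formatted
```

For example, standardize_date_input("2023/01/05") and standardize_date_input(datetime(2023, 1, 5, 14, 30)) both yield "2023-01-05".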
NMDBConfig
NMDBConfig(start_date_wanted, end_date_wanted, station='JUNG', reference_value=None, cache_dir=None, nmdb_table='revori', resolution='60', cache_exists=False, cache_start_date=None, cache_end_date=None, start_date_needed=None, end_date_needed=None, use_cache=True)
Configuration class for NMDB data retrieval and processing.
This class encapsulates configuration settings required for NMDB data retrieval, including date ranges for data collection, station identification, cache management, and data resolution settings. It ensures that all components of the NMDB data collection module use consistent and standardized configuration parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start_date_wanted | str | Start date for data retrieval, formatted as 'YYYY-MM-DD'. | required |
| end_date_wanted | str | End date for data retrieval, formatted as 'YYYY-MM-DD'. | required |
| station | str | NMDB station code from which data is retrieved. Defaults to 'JUNG'. | 'JUNG' |
| cache_dir | str or None | Path to the cache directory for storing retrieved data. If None, uses the default OS cache directory. | None |
| nmdb_table | str | Specific NMDB table to query data from. Defaults to 'revori'. | 'revori' |
| resolution | str | Resolution of the data in minutes. Defaults to '60'. | '60' |
| cache_exists | bool | Indicates whether cached data exists for the given parameters. Defaults to False. | False |
| cache_start_date | str or None | Start date of the available cached data, formatted as 'YYYY-MM-DD'. None if no cache exists. | None |
| cache_end_date | str or None | End date of the available cached data, formatted as 'YYYY-MM-DD'. None if no cache exists. | None |
| start_date_needed | str or None | Start date for which data needs to be fetched, considering cached data. None if all data is cached. | None |
| end_date_needed | str or None | End date for which data needs to be fetched, considering cached data. None if all data is cached. | None |
| use_cache | bool | Whether to use cached data. If False, the cache is ignored entirely. Defaults to True. | True |
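The constructor signature above maps naturally onto a dataclass. This sketch is a hypothetical mirror, not the real class. The name NMDBConfigSketch is invented, and the type of reference_value is assumed because the parameter table does not document it. It shows which fields the caller supplies and which are filled in later by the cache and download-planning steps.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NMDBConfigSketch:
    """Hypothetical mirror of the documented NMDBConfig signature and defaults."""

    start_date_wanted: str
    end_date_wanted: str
    station: str = "JUNG"
    reference_value: Optional[float] = None  # type assumed; not documented above
    cache_dir: Optional[str] = None
    nmdb_table: str = "revori"
    resolution: str = "60"
    # The fields below are updated by the cache handling and data management steps.
    cache_exists: bool = False
    cache_start_date: Optional[str] = None
    cache_end_date: Optional[str] = None
    start_date_needed: Optional[str] = None
    end_date_needed: Optional[str] = None
    use_cache: bool = True
```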
TermsDisplayManager
Manages display of NMDB station terms of use.
display_terms (classmethod)
Displays the terms of use for a station (once per session).
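"Once per session" can be implemented with class-level state shared by all callers. This is a minimal sketch under two assumptions: a session means one Python process, and the notice is tracked per station. The class name, the boolean return value, and the printed text are hypothetical. The real terms of use come from NMDB and are not reproduced here.

```python
from typing import ClassVar, Set


class TermsDisplaySketch:
    """Hypothetical sketch: show each station's terms at most once per process."""

    _shown: ClassVar[Set[str]] = set()

    @classmethod
    def display_terms(cls, station: str) -> bool:
        """Print the notice the first time a station is seen; return True if printed."""
        if station in cls._shown:
            return False
        cls._shown.add(station)
        # Placeholder wording, not the actual NMDB terms.
        print(f"Data for station {station} is subject to the NMDB terms of use.")
        return True
```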
CacheHandler
Class to handle cache management for downloaded NMDB data.
The cache handler manages file paths, file naming, storage and deletion of the cache. By default, the cache is stored in the standard operating system cache location.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | NMDBConfig | An instance of the NMDBConfig class containing configuration settings. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| config | NMDBConfig | Stores the configuration settings for NMDB data retrieval. |
| _cache_file_path | Path or None | The file path to the cache file, dynamically determined based on the NMDBConfig settings. |
Methods:
| Name | Description |
|---|---|
| update_cache_file_path | Updates the cache file path based on the current NMDBConfig settings. |
| check_cache_file_exists | Checks for the existence of the cache file and updates the configuration accordingly. |
| read_cache | Reads the cached NMDB data from the file and returns it as a DataFrame. |
| write_cache | Writes a DataFrame to the cache file location. |
| delete_cache | Deletes the cache file associated with the current NMDBConfig settings. |
| check_cache_range | Determines the range of dates available in the cache and updates the NMDBConfig settings. |
Examples:
>>> config = NMDBConfig(start_date_wanted='2023-01-01',
...                     end_date_wanted='2023-01-31', station='JUNG')
>>> cache_handler = CacheHandler(config)
>>> cache_handler.update_cache_file_path()
>>> print(cache_handler.cache_file_path)
update_cache_file_path
Updates the cache file path based on the current configuration.
check_cache_file_exists
Checks whether the cache file exists and sets the corresponding property (cache_exists) in the config.
Returns:
| Type | Description |
|---|---|
| None | |
read_cache
Reads the cached NMDB file and formats its index.
Returns:
| Name | Type | Description |
|---|---|---|
| df | DataFrame | DataFrame from the cache file. |
write_cache
Writes NMDB data to the cache location, using the cache_file_path attribute as the destination.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_df | DataFrame | The NMDB data to write to the cache. | required |
Returns:
| Type | Description |
|---|---|
| None | |
delete_cache
Deletes the cache file related to the current instance. For example, when downloading hourly data for JUNG, it deletes the file associated with hourly JUNG data from the cache.
Returns:
| Type | Description |
|---|---|
| None | |
check_cache_range
Finds the range of data already available in the cache and updates the config accordingly. It either records that no cache exists, or updates the start and end dates of the cache.
Returns:
| Type | Description |
|---|---|
| None | |
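The CacheHandler responsibilities (path naming, write, read, delete, range check) can be sketched with csv and pathlib. Several details are assumptions rather than the neptoon behaviour. The file-naming scheme is invented, and the cache is treated as a two-column CSV of ISO timestamps and counts. The functions return values instead of updating an NMDBConfig so that the sketch stays self-contained.

```python
import csv
from pathlib import Path
from typing import Dict, Optional, Tuple


def cache_file_path(cache_dir: Path, station: str, resolution: str, table: str) -> Path:
    # Hypothetical naming scheme: one file per station/resolution/table combination.
    return cache_dir / f"nmdb_{station}_res_{resolution}_table_{table}.csv"


def write_cache(path: Path, data: Dict[str, float]) -> None:
    """Write 'YYYY-MM-DD HH:MM:SS' -> counts pairs as a two-column CSV."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["datetime", "count"])
        for ts in sorted(data):
            writer.writerow([ts, data[ts]])


def read_cache(path: Path) -> Dict[str, float]:
    with path.open(newline="") as fh:
        return {row["datetime"]: float(row["count"]) for row in csv.DictReader(fh)}


def delete_cache(path: Path) -> None:
    path.unlink(missing_ok=True)  # no error if the file is already gone


def check_cache_range(path: Path) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (cache_exists, first_date, last_date) with dates as YYYY-mm-dd."""
    if not path.exists():
        return False, None, None
    timestamps = sorted(read_cache(path))
    if not timestamps:
        return False, None, None
    # ISO timestamps sort chronologically; the first 10 characters are the date.
    return True, timestamps[0][:10], timestamps[-1][:10]
```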
DataFetcher
Class to handle sending external requests and fetching data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | NMDBConfig | An instance of the NMDBConfig configuration class. | required |
Methods:
| Name | Description |
|---|---|
| get_ymd_from_date | Static method that parses a date into separate values for year, month and day. |
| create_nmdb_url | Creates the URL used to request data from NMDB.eu, based on values in the configuration. |
| fetch_data_http | Uses the created URL to request data from NMDB.eu and returns the text of the response. |
| parse_http_data | Parses the text returned by fetch_data_http() into a standard format: a pd.DataFrame with a datetime.datetime index. |
get_ymd_from_date (staticmethod)
Parses a given date into year, month, and day.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date | datetime or str | The date to be parsed. | required |
Returns:
| Type | Description |
|---|---|
| tuple | Tuple containing the year, month, and day of the date. |
create_nmdb_url
Creates the URL for obtaining the data over HTTP.
Returns:
| Name | Type | Description |
|---|---|---|
| url | str | URL as a string. |
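The following sketch shows how get_ymd_from_date and create_nmdb_url could fit together. The endpoint and every query-parameter name below are assumptions modelled on NMDB NEST-style query strings, not values taken from this document. Check the NMDB.eu documentation before relying on them. urlencode handles the percent-encoding, including the brackets in "stations[]".

```python
from datetime import datetime
from typing import Tuple, Union
from urllib.parse import urlencode

# Assumed endpoint; verify against https://www.nmdb.eu before use.
BASE_URL = "https://www.nmdb.eu/nest/draw_graph.php"


def get_ymd_from_date(date: Union[datetime, str]) -> Tuple[int, int, int]:
    """Split a datetime or 'YYYY-MM-DD' string into (year, month, day)."""
    if isinstance(date, str):
        date = datetime.strptime(date, "%Y-%m-%d")
    return date.year, date.month, date.day


def create_nmdb_url(start: str, end: str, station: str = "JUNG",
                    table: str = "revori", resolution: str = "60") -> str:
    """Build a request URL; the parameter names are illustrative assumptions."""
    sy, sm, sd = get_ymd_from_date(start)
    ey, em, ed = get_ymd_from_date(end)
    params = {
        "stations[]": station,
        "tabchoice": table,
        "tresolution": resolution,
        "output": "ascii",
        "start_year": sy, "start_month": sm, "start_day": sd,
        "end_year": ey, "end_month": em, "end_day": ed,
    }
    return f"{BASE_URL}?{urlencode(params)}"
```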
fetch_data_http
Fetches the data from NMDB.eu over HTTP and returns the response text.
Returns:
| Name | Type | Description |
|---|---|---|
| Text | str | The text of the HTTP response. |
parse_http_data
Parses the HTTP response data into a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_data | str | The raw text collected from NMDB.eu. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame with a DateTime index and counts per second. |
Raises:
| Type | Description |
|---|---|
| ValueError | Raised if the requested date is not available at the specified NMDB station, indicated by a specific error message in the raw data. |
| RequestException | Raised if there's an issue parsing the HTTP response into a DataFrame, such as an incorrect format or network-related errors during the fetch. |
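A standard-library sketch of the parsing step follows. It assumes that data rows look like "YYYY-MM-DD HH:MM:SS;counts", that missing values appear as "null", and that an unavailable date range is signalled by a marker string. All three are assumptions, and NO_DATA_MARKER is a hypothetical placeholder for the actual NMDB error text. A datetime-keyed dict stands in for the documented DataFrame.

```python
import math
from datetime import datetime
from typing import Dict

# Hypothetical marker text; the real NMDB error message is not given in this document.
NO_DATA_MARKER = "No data available"


def parse_http_data(raw_data: str) -> Dict[datetime, float]:
    """Parse assumed 'YYYY-MM-DD HH:MM:SS;counts' lines into a datetime-keyed dict."""
    if NO_DATA_MARKER in raw_data:
        raise ValueError("Requested dates are not available at this station")
    counts: Dict[datetime, float] = {}
    for line in raw_data.splitlines():
        parts = line.split(";")
        if len(parts) != 2:
            continue  # headers, comments and blank lines
        try:
            ts = datetime.strptime(parts[0].strip(), "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # first field is not a timestamp, so not a data row
        value = parts[1].strip()
        # Assumption: missing values appear as 'null' or an empty field.
        counts[ts] = math.nan if value.lower() in ("", "null") else float(value)
    if not counts:
        raise ValueError("No data rows found in the NMDB response")
    return counts
```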
Examples:
Assuming an instance nmdb_data_handler of a class that includes this method:
fetch_and_parse_http_data
Fetches raw NMDB data via HTTP and parses it into a pandas DataFrame.
This method combines fetching NMDB data from the designated HTTP source with parsing that raw data into a structured DataFrame. It uses fetch_data_http to retrieve the data and parse_http_data to transform it into a usable format.
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame containing the NMDB data, indexed by datetime with counts per second. |
Raises:
Refer to the documentation for fetch_data_http() and parse_http_data().
Examples:
Assuming an instance nmdb_data_handler of a class that includes this method:
DataManager
Manages the integration of cached and newly fetched NMDB data.
This class is responsible for determining the necessity of fetching new NMDB data based on the existing cache and the desired date range. It also handles the combination of cached data with newly fetched data to provide a complete dataset for analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | NMDBConfig | Configuration settings for NMDB data retrieval. | required |
| cache_handler | CacheHandler | Handles operations related to caching of NMDB data. | required |
| data_fetcher | DataFetcher | Responsible for fetching new data from NMDB. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| need_data_before_cache | bool or None | Indicates if data before the cached range is needed. |
| need_data_after_cache | bool or None | Indicates if data after the cached range is needed. |
Methods:
| Name | Description |
|---|---|
| check_if_need_extra_data | Evaluates the need for fetching data outside the current cache range. |
| set_dates_for_nmdb_download | Updates the configuration with the date ranges that need to be fetched. |
| combine_cache_and_new_data | Merges newly fetched data with existing cached data, ensuring no duplication. |
Initializes the DataManager with the given configuration, cache handler, and data fetcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | NMDBConfig | Configuration settings for NMDB data retrieval. | required |
| cache_handler | CacheHandler | Handles operations related to caching of NMDB data. | required |
| data_fetcher | DataFetcher | Responsible for fetching new data from NMDB. | required |
check_if_need_extra_data
Updates the configuration instance with boolean values stating whether data must be downloaded from before or after the cached date range to cover the desired dates.
Returns:
| Type | Description |
|---|---|
| None | |
set_dates_for_nmdb_download
Updates the configuration instance with the download range for NMDB data, based on the desired date range and the data available in the cache.
Returns:
| Type | Description |
|---|---|
| None | |
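The decision made by check_if_need_extra_data and set_dates_for_nmdb_download can be sketched as one pure function over YYYY-mm-dd strings. This is a hypothetical illustration, not the library code. DownloadPlan and plan_download are invented names, and the result is returned rather than written into NMDBConfig. One design choice is also assumed: the sketch fetches a single contiguous span that adjoins the cache, so the merged data has no gap. The cost is that the cached middle is re-fetched when both edges are missing.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DownloadPlan:
    """Hypothetical result holder; the documented DataManager updates NMDBConfig instead."""
    need_data_before_cache: Optional[bool]
    need_data_after_cache: Optional[bool]
    start_date_needed: Optional[str]
    end_date_needed: Optional[str]


def plan_download(start_wanted: str, end_wanted: str,
                  cache_start: Optional[str], cache_end: Optional[str]) -> DownloadPlan:
    """Decide which dates (YYYY-mm-dd) must be fetched given the cached range."""
    if cache_start is None or cache_end is None:
        # No cache: fetch the whole wanted range.
        return DownloadPlan(None, None, start_wanted, end_wanted)
    sw, ew = date.fromisoformat(start_wanted), date.fromisoformat(end_wanted)
    cs, ce = date.fromisoformat(cache_start), date.fromisoformat(cache_end)
    before, after = sw < cs, ew > ce
    if not (before or after):
        return DownloadPlan(False, False, None, None)  # fully covered by the cache
    # One contiguous span adjoining the cache, so the combined data has no gap.
    start = sw if before else ce + timedelta(days=1)
    end = ew if after else cs - timedelta(days=1)
    return DownloadPlan(before, after, start.isoformat(), end.isoformat())
```

For example, with January 2023 cached and 2023-01-01 to 2023-02-15 wanted, only 2023-02-01 to 2023-02-15 needs downloading.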
combine_cache_and_new_data
Combines cached and newly downloaded NMDB data into a single DataFrame, ensuring data continuity and no duplication.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df_cache | DataFrame | The DataFrame containing cached data. | required |
| df_download | DataFrame | The DataFrame containing newly downloaded data. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | The combined DataFrame, sorted by datetime. |
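The merge-deduplicate-sort step can be illustrated with datetime-keyed dicts standing in for the two DataFrames. The function name combine_series is hypothetical. The rule that the downloaded value wins on overlapping timestamps is an assumption, since the source only says that duplication is avoided.

```python
from datetime import datetime
from typing import Dict


def combine_series(cache: Dict[datetime, float],
                   download: Dict[datetime, float]) -> Dict[datetime, float]:
    """Merge two datetime-keyed mappings without duplicates, sorted by time."""
    merged = dict(cache)
    # Assumption: on overlapping timestamps the newly downloaded value wins.
    merged.update(download)
    return dict(sorted(merged.items()))
```

With pandas, the same idea is commonly expressed as pd.concat of the two frames, then dropping repeated index entries with df[~df.index.duplicated(keep="last")], then sort_index(). Whether neptoon uses exactly this chain is not stated here.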
NMDBDataHandler
Handles the retrieval and management of NMDB data.
This class integrates the CacheHandler, DataFetcher, and DataManager to manage NMDB data. It ensures that data is fetched from the NMDB source only when necessary, preferring cached data to minimize network requests. The class handles cases where new data needs to be fetched, either because it is not present in the cache or because only partial data is available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | NMDBConfig | Configuration settings for NMDB data retrieval, including desired date ranges, station information, and caching preferences. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| config | NMDBConfig | Stores the provided NMDB configuration settings. |
| cache_handler | CacheHandler | Manages caching operations for NMDB data. |
| data_fetcher | DataFetcher | Responsible for fetching new NMDB data when required. |
| data_manager | DataManager | Determines the need for and manages the retrieval of new data based on cache status and configuration settings. |
Methods:
| Name | Description |
|---|---|
| collect_nmdb_data | Retrieves NMDB data, prioritizing cached data and fetching new data as needed. Returns a DataFrame containing the relevant NMDB data. |
Examples:
>>> config = NMDBConfig(start_date_wanted='2023-01-01',
...                     end_date_wanted='2023-01-31', station='JUNG')
>>> nmdb_handler = NMDBDataHandler(config)
>>> nmdb_data = nmdb_handler.collect_nmdb_data()
>>> print(nmdb_data.head())
Initializes the NMDBDataHandler with the given NMDBConfig instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | NMDBConfig | Configuration settings for NMDB data retrieval. | required |
collect_nmdb_data
Collects NMDB data based on the specified configuration, using cached data when available and fetching new data as necessary.
Returns:
| Type | Description |
|---|---|
| DataFrame | A DataFrame containing NMDB data for the requested range. This may be a combination of cached and newly fetched data, or solely from one source, depending on availability. |
Examples:
Assuming config has been defined and passed to NMDBDataHandler:
Note: This example assumes that nmdb_handler has been instantiated with a valid NMDBConfig.
fetch_nmdb_data
Returns a DataFrame of data from nmdb.eu.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start_date | str | Start date of desired data, format "YYYY-MM-DD". | required |
| end_date | str | End date of desired data, format "YYYY-MM-DD". | required |
| station | str | Desired station as a string, as available from https://www.nmdb.eu, e.g. "JUNG" or "OULU". | required |
| resolution | int | The desired resolution in minutes. | required |
| nmdb_table | str | The table to collect from nmdb.eu, by default "revori". | 'revori' |
Returns:
| Type | Description |
|---|---|
| DataFrame | Datetime-indexed DataFrame. |