
neptoon.external

nmdb_data_collection


NMDBDataAttacher

NMDBDataAttacher(data_frame, new_column_name=str(INCOMING_NEUTRON_INTENSITY))

This is the main class users interact with to attach data from the NMDB.eu database to a DataFrame. It includes methods for configuring the NMDBConfig class, which the other classes then use to fetch and parse data from the NMDB.eu API.

TODO: add validation steps to ensure the DataFrame is in the correct format.

Initialisation parameters

Parameters:

Name Type Description Default
data_frame DataFrame

DataFrame which requires data to be attached. It must have a datetime index.

required
new_column_name str

Name of the new column where neutron count data is appended, by default "incoming_neutron_intensity".

str(INCOMING_NEUTRON_INTENSITY)

fetch_data

fetch_data()

Creates an NMDBDataHandler from the config, collects the data and stores it in self.tmp_data.

attach_data

attach_data()

Attaches the data stored in self.tmp_data to self.data_frame. This happens in place.

Raises:

Type Description
ValueError

Raised when the index of the data is not a datetime index.
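The alignment that attach_data performs can be sketched without pandas. The example below is an assumption-laden stand-in, not neptoon's implementation: a dict keyed by datetime plays the role of the datetime-indexed DataFrame, the helper name attach_column is hypothetical, and timestamps without an NMDB value receive None, much as a pandas reindex would leave NaN.

```python
from datetime import datetime


def attach_column(rows, nmdb_values, column_name="incoming_neutron_intensity"):
    """Attach NMDB values to rows keyed by timestamp (hypothetical helper).

    rows: dict mapping datetime -> dict of column values, standing in for a
        datetime-indexed DataFrame. It is modified in place.
    nmdb_values: dict mapping datetime -> counts per second.
    """
    # Mirror the documented check: every index entry must be a datetime.
    if not all(isinstance(ts, datetime) for ts in rows):
        raise ValueError("Index must contain datetime values")
    for ts, row in rows.items():
        # Rows with no matching NMDB timestamp get None (like NaN after a reindex).
        row[column_name] = nmdb_values.get(ts)


rows = {datetime(2023, 1, 1, 0): {"counts": 1500}, datetime(2023, 1, 1, 1): {"counts": 1480}}
attach_column(rows, {datetime(2023, 1, 1, 0): 158.2})
print(rows[datetime(2023, 1, 1, 1)])  # {'counts': 1480, 'incoming_neutron_intensity': None}
```

With pandas, the equivalent step is typically a reindex of the NMDB series onto the target DataFrame's index before assigning the new column.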

return_data_frame

return_data_frame()

Returns the DataFrame stored in the object.

Returns:

Type Description
DataFrame

The DataFrame

DateTimeHandler

Class that holds date standardization methods.

This class provides static methods for converting and standardizing date formats to a common format (YYYY-mm-dd) used throughout the NMDB data collection process.

convert_string_to_standard_date staticmethod

convert_string_to_standard_date(date_str)

Converts a string date to the standard format (YYYY-mm-dd).

Parameters:

Name Type Description Default
date_str str

The date string to convert.

required

Returns:

Type Description
str or None

The standardized date string in format (YYYY-mm-dd), or None if the input date string is not a recognizable date format.

Raises:

Type Description
ValueError

If the input string cannot be parsed into a valid date.
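One way to implement this conversion is to try a short list of candidate formats in turn. The sketch below is hypothetical: the accepted formats are assumptions rather than the ones neptoon supports, and because the description above mentions both a None return and a ValueError, the sketch picks the ValueError behaviour.

```python
from datetime import datetime

# Hypothetical set of accepted input formats; the real method may accept others.
_CANDIDATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y", "%Y-%m-%d %H:%M:%S")


def convert_string_to_standard_date(date_str: str) -> str:
    """Return date_str as 'YYYY-mm-dd', raising ValueError if no format matches."""
    for fmt in _CANDIDATE_FORMATS:
        try:
            return datetime.strptime(date_str, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"Unrecognised date string: {date_str!r}")


print(convert_string_to_standard_date("31.01.2023"))  # 2023-01-31
```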

format_datetime_to_standard_string staticmethod

format_datetime_to_standard_string(date_datetime)

Converts a datetime or pd.Timestamp date to the standard format (YYYY-mm-dd)

Parameters:

Name Type Description Default
date_datetime datetime or Timestamp

The datetime object to convert.

required

Returns:

Type Description
str or None

The standardized date string in format (YYYY-mm-dd), or None if the input is not a datetime.datetime or pandas.Timestamp instance.

standardize_date_input staticmethod

standardize_date_input(date_input)

Takes a date as input, checks type, and converts it to the standard format (YYYY-mm-dd)

Parameters:

Name Type Description Default
date_input str or Timestamp or datetime

Input date to be converted

required

Returns:

Type Description
str

String of the date as YYYY-mm-dd

Raises:

Type Description
ValueError

Raised when the input is not a str, Timestamp or datetime.
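The type dispatch described for standardize_date_input might look like the following sketch. It relies on the fact that pandas.Timestamp is a subclass of datetime.datetime, so a single isinstance check covers both; accepting only ISO-style strings (via date.fromisoformat) is a simplifying assumption, not neptoon's behaviour.

```python
from datetime import date, datetime


def standardize_date_input(date_input) -> str:
    """Return date_input as a 'YYYY-mm-dd' string (sketch)."""
    # pandas.Timestamp subclasses datetime.datetime, so this branch covers it too.
    if isinstance(date_input, datetime):
        return date_input.strftime("%Y-%m-%d")
    if isinstance(date_input, str):
        # Simplification: only ISO-style strings are accepted; the time part is dropped.
        return date.fromisoformat(date_input[:10]).isoformat()
    raise ValueError(
        f"Expected str, Timestamp or datetime, got {type(date_input).__name__}"
    )


print(standardize_date_input(datetime(2023, 1, 5, 13, 45)))  # 2023-01-05
```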

NMDBConfig

NMDBConfig(start_date_wanted, end_date_wanted, station='JUNG', reference_value=None, cache_dir=None, nmdb_table='revori', resolution='60', cache_exists=False, cache_start_date=None, cache_end_date=None, start_date_needed=None, end_date_needed=None, use_cache=True)

Configuration class for NMDB data retrieval and processing.

This class encapsulates configuration settings required for NMDB data retrieval, including date ranges for data collection, station identification, cache management, and data resolution settings. It ensures that all components of the NMDB data collection module use consistent and standardized configuration parameters.

Parameters:

Name Type Description Default
start_date_wanted str

Start date for data retrieval, formatted as 'YYYY-MM-DD'.

required
end_date_wanted str

End date for data retrieval, formatted as 'YYYY-MM-DD'.

required
station str

NMDB station code from which data is retrieved. Defaults to 'JUNG'.

'JUNG'
cache_dir str or None

Path to the cache directory for storing retrieved data. If None, uses the default OS cache directory.

None
nmdb_table str

Specific NMDB table to query data from. Defaults to 'revori'.

'revori'
resolution str

Resolution of the data in minutes. Defaults to '60'.

'60'
cache_exists bool

Indicates whether cached data exists for the given parameters. Defaults to False.

False
cache_start_date str or None

Start date of the available cached data, formatted as 'YYYY-MM-DD'. None if no cache exists.

None
cache_end_date str or None

End date of the available cached data, formatted as 'YYYY-MM-DD'. None if no cache exists.

None
start_date_needed str or None

Start date for which data needs to be fetched, considering cached data. None if all data is cached.

None
end_date_needed str or None

End date for which data needs to be fetched, considering cached data. None if all data is cached.

None
use_cache bool

Whether to use cached data. If False, the cache is ignored entirely. Defaults to True.

True

TermsDisplayManager

Manages display of NMDB station terms of use

display_terms classmethod

display_terms(station)

Display terms of use for a station (once per session)

CacheHandler

CacheHandler(config)

Class to handle cache management for downloaded NMDB data.

The cache handler manages file paths, file naming, storage and deletion of the cache. By default, the cache is stored in the operating system's standard cache location.

Parameters:

Name Type Description Default
config NMDBConfig

An instance of the NMDBConfig class containing configuration settings

required

Attributes:

Name Type Description
config NMDBConfig

Stores the configuration settings for NMDB data retrieval.

_cache_file_path Path or None

The file path to the cache file, dynamically determined based on the NMDBConfig settings.

Methods:

Name Description
update_cache_file_path

Updates the cache file path based on the current NMDBConfig settings.

check_cache_file_exists

Checks for the existence of the cache file and updates the configuration accordingly.

read_cache

Reads the cached NMDB data from the file and returns it as a DataFrame.

write_cache

Writes a DataFrame to the cache file location.

delete_cache

Deletes the cache file associated with the current NMDBConfig settings.

check_cache_range

Determines the range of dates available in the cache and updates the NMDBConfig settings.

Examples:

>>> config = NMDBConfig(start_date_wanted='2023-01-01',
...                     end_date_wanted='2023-01-31', station='JUNG')
>>> cache_handler = CacheHandler(config)
>>> cache_handler.update_cache_file_path()
>>> print(cache_handler.cache_file_path)

update_cache_file_path

update_cache_file_path()

Update the cache file path based on the current configuration.

check_cache_file_exists

check_cache_file_exists()

Checks whether the cache file exists and records the result in the config.

Returns:

Type Description
None

read_cache

read_cache()

Reads the cached NMDB file and formats its index.

Returns:

Name Type Description
df DataFrame

DataFrame from the cache file

write_cache

write_cache(cache_df)

Write NMDB data to the cache location using the cache_file_path attribute as a name.

Parameters:

Name Type Description Default
cache_df DataFrame

The NMDB data to write to the cache.

required

Returns:

Type Description
None

delete_cache

delete_cache()

Deletes the cache file associated with the current instance. For example, if the instance is configured for hourly JUNG data, it deletes the hourly JUNG file from the cache.

Returns:

Type Description
None

check_cache_range

check_cache_range()

Finds the range of data already available in the cache and updates the config accordingly: it either records that no cache exists, or sets the start and end dates of the cached data.

Returns:

Type Description
None
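The cache life cycle described for CacheHandler (build a path, write, read back the available range, delete) can be sketched with the standard library. The file-naming scheme, the CSV layout and the function names here are assumptions made for illustration; neptoon's actual cache format may differ.

```python
import csv
import tempfile
from pathlib import Path
from typing import Iterable, Optional, Tuple


def cache_file_path(cache_dir, station: str, resolution: int, nmdb_table: str) -> Path:
    # Hypothetical naming scheme: one file per station/table/resolution combination.
    return Path(cache_dir) / f"nmdb_{station}_{nmdb_table}_{resolution}min.csv"


def write_cache(path: Path, rows: Iterable[Tuple[str, float]]) -> None:
    """Write (ISO datetime string, counts per second) rows to a CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["datetime", "counts"])
        writer.writerows(rows)


def check_cache_range(path: Path) -> Optional[Tuple[str, str]]:
    """Return (first_date, last_date) held in the cache, or None if there is none."""
    if not path.exists():
        return None
    with path.open(newline="") as fh:
        # ISO dates are fixed-width, so string order equals chronological order.
        dates = [row["datetime"][:10] for row in csv.DictReader(fh)]
    return (min(dates), max(dates)) if dates else None


def delete_cache(path: Path) -> None:
    path.unlink(missing_ok=True)  # no error if the file is already gone


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = cache_file_path(tmp, "JUNG", 60, "revori")
        write_cache(path, [("2023-01-01T00:00:00", 157.9), ("2023-01-31T23:00:00", 160.2)])
        print(check_cache_range(path))  # ('2023-01-01', '2023-01-31')
        delete_cache(path)
```

Keying the file name on station, table and resolution mirrors the delete_cache description above, where hourly JUNG data maps to its own cache file.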

DataFetcher

DataFetcher(config)

Class to handle sending external requests and fetching data.

Parameters:

Name Type Description Default
config NMDBConfig

An instance of NMDBConfig.

required

Methods:

Name Description
get_ymd_from_date

Static method that parses a date into separate year, month and day values.

create_nmdb_url

Creates the URL used to request data from NMDB.eu, based on values in the configuration.

fetch_data_http

Uses the created URL to request data from NMDB.eu and returns the text of the response.

parse_http_data

Parses the text returned by fetch_data_http() into a standard format: a pd.DataFrame with a datetime.datetime index.

get_ymd_from_date staticmethod

get_ymd_from_date(date)

Parses a given date into year, month, and day.

Parameters:

Name Type Description Default
date datetime or str

The date to be parsed.

required

Returns:

Type Description
tuple

Tuple containing the year, month, and day of the date.
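A minimal sketch of this parsing, assuming string input is already in the standard YYYY-MM-DD format and that the tuple holds integers (the type of the returned values is not stated above).

```python
from datetime import datetime
from typing import Tuple, Union


def get_ymd_from_date(date: Union[datetime, str]) -> Tuple[int, int, int]:
    """Split a datetime or 'YYYY-MM-DD' string into (year, month, day)."""
    if isinstance(date, str):
        # Assumes the string already uses the standard YYYY-MM-DD format.
        date = datetime.strptime(date, "%Y-%m-%d")
    return date.year, date.month, date.day


print(get_ymd_from_date("2023-01-31"))  # (2023, 1, 31)
```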

create_nmdb_url

create_nmdb_url()

Creates the URL for obtaining the data using HTTP

Returns:

Name Type Description
url str

URL as a string
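Building the URL is mostly a matter of encoding configuration values as query parameters. In the sketch below, the endpoint and every parameter name are assumptions modelled on NMDB's NEST web interface rather than values taken from neptoon, so check them against the neptoon source before reuse; the point is the use of urlencode to assemble the query safely.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed endpoint and query-parameter names, modelled on NMDB's NEST web
# interface; verify them against the neptoon source before relying on them.
BASE_URL = "https://www.nmdb.eu/nest/draw_graph.php"


def create_nmdb_url(station: str, nmdb_table: str, resolution: int,
                    start: date, end: date) -> str:
    """Build a NEST-style query URL from configuration values (sketch)."""
    params = [
        ("wget", 1),
        ("stations[]", station),
        ("tabchoice", nmdb_table),
        ("tresolution", resolution),
        ("date_choice", "bydate"),
        ("start_year", start.year),
        ("start_month", start.month),
        ("start_day", start.day),
        ("end_year", end.year),
        ("end_month", end.month),
        ("end_day", end.day),
        ("output", "ascii"),
    ]
    # urlencode converts the ints to strings and percent-encodes "stations[]".
    return f"{BASE_URL}?{urlencode(params)}"


print(create_nmdb_url("JUNG", "revori", 60, date(2023, 1, 1), date(2023, 1, 31)))
```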

fetch_data_http

fetch_data_http()

Fetches the data using http from NMDB.eu and processes it

Returns:

Name Type Description
Text str

The text of the HTTP response from NMDB.eu.

parse_http_data

parse_http_data(raw_data)

Parse the HTTP response data into a dataframe

Parameters:

Name Type Description Default
raw_data str

The raw text file collected from NMDB.eu

required

Returns:

Type Description
DataFrame

A DataFrame indexed by datetime, containing counts per second.

Raises:

Type Description
ValueError

Raised if the requested date is not available at the specified NMDB station, indicated by a specific error message in the raw data.

RequestException

Raised if there's an issue parsing the HTTP response into a DataFrame, such as an incorrect format or network-related errors during the fetch.

Examples:

Assuming an instance nmdb_data_handler of a class that includes this method:

>>> raw_data = nmdb_data_handler.fetch_data_http()
>>> df = nmdb_data_handler.parse_http_data(raw_data)
>>> print(df.head())
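The parsing step can be sketched as below. Both the line layout ('YYYY-MM-DD HH:MM:SS;value') and the 'no data' error marker are assumptions about NMDB's ASCII output, not details taken from neptoon, and the sketch returns plain (datetime, float) pairs instead of a DataFrame to stay dependency-free.

```python
from datetime import datetime
from typing import List, Tuple


def parse_http_data(raw_data: str) -> List[Tuple[datetime, float]]:
    """Parse 'YYYY-MM-DD HH:MM:SS;value' lines into (datetime, float) pairs."""
    # Hypothetical error marker; the real NMDB message may be worded differently.
    if "no data" in raw_data.lower():
        raise ValueError("Requested dates are not available at this station")
    records = []
    for line in raw_data.splitlines():
        stamp, sep, value = line.partition(";")
        if not sep:
            continue  # skip HTML tags, blank lines and other non-data text
        try:
            when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")
            records.append((when, float(value)))
        except ValueError:
            continue  # e.g. a header such as "start_date_time;JUNG"
    return records


raw = "<pre>\nstart_date_time;JUNG\n2023-01-01 00:00:00;158.123\n2023-01-01 01:00:00; 157.984\n</pre>"
print(parse_http_data(raw)[0])  # (datetime.datetime(2023, 1, 1, 0, 0), 158.123)
```

Turning the pairs into the documented DataFrame would then be a call such as pd.DataFrame(records, columns=["datetime", "counts"]).set_index("datetime"), where the column names are illustrative.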

fetch_and_parse_http_data

fetch_and_parse_http_data()

Fetches raw NMDB data via HTTP and parses it into a pandas DataFrame.

This method combines the functionalities of fetching NMDB data from the designated HTTP source and subsequently parsing that raw data into a structured DataFrame. It leverages fetch_data_http to retrieve the data and parse_http_data to transform it into a usable format.

Returns:

Type Description
DataFrame

A DataFrame containing the NMDB data, indexed by datetime with counts per second.

Raises:

See fetch_data_http() and parse_http_data() for the exceptions that may be raised.

Examples:

Assuming an instance nmdb_data_handler of a class that includes this method:

>>> df = nmdb_data_handler.fetch_and_parse_http_data()
>>> print(df.head())

DataManager

DataManager(config, cache_handler, data_fetcher)

Manages the integration of cached and newly fetched NMDB data.

This class is responsible for determining the necessity of fetching new NMDB data based on the existing cache and the desired date range. It also handles the combination of cached data with newly fetched data to provide a complete dataset for analysis.

Parameters:

Name Type Description Default
config NMDBConfig

Configuration settings for NMDB data retrieval.

required
cache_handler CacheHandler

Handles operations related to caching of NMDB data.

required
data_fetcher DataFetcher

Responsible for fetching new data from NMDB.

required

Attributes:

Name Type Description
need_data_before_cache bool or None

Indicates if data before the cached range is needed.

need_data_after_cache bool or None

Indicates if data after the cached range is needed.

Methods:

Name Description
check_if_need_extra_data

Evaluates the need for fetching data outside the current cache range.

set_dates_for_nmdb_download

Updates the configuration with the date ranges that need to be fetched.

combine_cache_and_new_data

Merges newly fetched data with existing cached data, ensuring no duplication.

Initializes the DataManager with the given configuration, cache handler, and data fetcher.


check_if_need_extra_data

check_if_need_extra_data()

Updates the configuration instance with boolean values stating whether data must be downloaded before or after the cached range to cover the desired dates.

Returns:

Type Description
None

set_dates_for_nmdb_download

set_dates_for_nmdb_download()

Updates the configuration instance with the date range to download from NMDB, based on the desired dates and the data already available in the cache.

Returns:

Type Description
None
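The decisions made by check_if_need_extra_data and set_dates_for_nmdb_download can be sketched as a single function. This is a minimal sketch under one stated assumption: since NMDBConfig holds one start_date_needed/end_date_needed pair, it always returns one contiguous window, chosen so that the cache stays gap-free after merging. The function name dates_to_download is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def dates_to_download(wanted_start: date, wanted_end: date,
                      cache_start: Optional[date] = None,
                      cache_end: Optional[date] = None) -> Optional[Tuple[date, date]]:
    """Return (start_needed, end_needed), or None when the cache covers the request."""
    if cache_start is None or cache_end is None:
        return wanted_start, wanted_end  # no cache yet: fetch everything
    need_before = wanted_start < cache_start
    need_after = wanted_end > cache_end
    if need_before and need_after:
        return wanted_start, wanted_end  # cache sits inside the request
    if need_before:
        # Stop the day before the cache begins; this also fills any gap.
        return wanted_start, cache_start - timedelta(days=1)
    if need_after:
        # Start the day after the cache ends; this also fills any gap.
        return cache_end + timedelta(days=1), wanted_end
    return None
```

Fetching the whole wanted range when data is missing on both sides trades a little redundant download for a simpler, single-request design.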

combine_cache_and_new_data

combine_cache_and_new_data(df_cache, df_download)

Combines cached and newly downloaded NMDB data into a single DataFrame, ensuring data continuity and no duplication.

Parameters:

Name Type Description Default
df_cache DataFrame

The DataFrame containing cached data.

required
df_download DataFrame

The DataFrame containing newly downloaded data.

required

Returns:

Type Description
DataFrame

The combined DataFrame, sorted by datetime.
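A minimal dictionary-based sketch of the merge. Which copy wins when a timestamp appears in both sources is not stated above; the sketch assumes the newly downloaded value overrides the cached one.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def combine_cache_and_new_data(cache: Dict[datetime, float],
                               download: Dict[datetime, float]) -> List[Tuple[datetime, float]]:
    """Merge {timestamp: value} mappings into one list sorted by timestamp."""
    # Unpacking download second means its values override duplicate timestamps.
    merged = {**cache, **download}
    # Keys are unique after the merge, so sorting by timestamp alone is enough.
    return sorted(merged.items(), key=lambda item: item[0])


cache = {datetime(2023, 1, 1, 0): 157.9, datetime(2023, 1, 1, 1): 158.0}
download = {datetime(2023, 1, 1, 1): 158.2, datetime(2023, 1, 1, 2): 158.4}
print(len(combine_cache_and_new_data(cache, download)))  # 3
```

In pandas, a comparable result is usually obtained with pd.concat([df_cache, df_download]), then keeping df[~df.index.duplicated(keep="last")] and calling sort_index().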

NMDBDataHandler

NMDBDataHandler(config)

Handles the retrieval and management of NMDB data.

This class integrates the CacheHandler, DataFetcher, and DataManager to manage NMDB data. It ensures that data is fetched from the NMDB source only when necessary, preferring cached data to minimize network requests. The class handles cases where new data needs to be fetched either because it's not present in the cache or only partial data is available.

Parameters:

Name Type Description Default
config NMDBConfig

Configuration settings for NMDB data retrieval, including desired date ranges, station information, and caching preferences.

required

Attributes:

Name Type Description
config NMDBConfig

Stores the provided NMDB configuration settings.

cache_handler CacheHandler

Manages caching operations for NMDB data.

data_fetcher DataFetcher

Responsible for fetching new NMDB data when required.

data_manager DataManager

Determines the need for and manages the retrieval of new data based on cache status and configuration settings.

Methods:

Name Description
collect_nmdb_data

Retrieves NMDB data, prioritizing cached data and fetching new data as needed. Returns a DataFrame containing the relevant NMDB data.

Examples:

>>> config = NMDBConfig(start_date_wanted='2023-01-01',
...                     end_date_wanted='2023-01-31', station='JUNG')
>>> nmdb_handler = NMDBDataHandler(config)
>>> nmdb_data = nmdb_handler.collect_nmdb_data()
>>> print(nmdb_data.head())

Initializes the NMDBDataHandler with the given NMDBConfig instance.


collect_nmdb_data

collect_nmdb_data()

Collects NMDB data based on the specified configuration, using cached data when available and fetching new data as necessary.

Returns:

Type Description
DataFrame

A DataFrame containing NMDB data for the requested range. This may be a combination of cached and newly fetched data, or solely from one source, depending on availability.

Examples:

Assuming config has been defined and passed to NMDBDataHandler:

>>> nmdb_data = nmdb_handler.collect_nmdb_data()
>>> print(nmdb_data.head())

Note: This example assumes that nmdb_handler has been instantiated with a valid NMDBConfig.
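The cache-first flow of collect_nmdb_data can be illustrated with in-memory stand-ins. In this hypothetical sketch the cache is a dict keyed by date, fetch is any callable taking (start, end) and returning a dict, and daily resolution is assumed for brevity; only the span between the first and last missing day is requested, so the network is touched only when the cache is incomplete.

```python
from datetime import date, timedelta
from typing import Callable, Dict


def collect_nmdb_data(wanted_start: date, wanted_end: date,
                      cache: Dict[date, float],
                      fetch: Callable[[date, date], Dict[date, float]]) -> Dict[date, float]:
    """Return data for the wanted range, fetching only what the cache lacks."""
    # Every day in the requested range (daily resolution assumed).
    span = (wanted_end - wanted_start).days + 1
    wanted_days = [wanted_start + timedelta(days=i) for i in range(span)]
    missing = [d for d in wanted_days if d not in cache]
    if missing:
        # One contiguous request from the first to the last missing day;
        # the result is merged into the cache, overriding any overlap.
        cache.update(fetch(missing[0], missing[-1]))
    # Return only the requested range, in date order.
    return {d: cache[d] for d in wanted_days if d in cache}
```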

fetch_nmdb_data

fetch_nmdb_data(start_date, end_date, station, resolution, nmdb_table='revori')

Returns a DataFrame of data from NMDB.eu.

https://www.nmdb.eu

Parameters:

Name Type Description Default
start_date str

Start date of desired data, format "YYYY-MM-DD"

required
end_date str

End date of desired data, format "YYYY-MM-DD"

required
station str

Desired station as string, as available from https://www.nmdb.eu. E.g., "JUNG" or "OULU"

required
resolution int

The desired resolution in minutes

required
nmdb_table str

The table to collect from nmdb.eu, by default "revori"

'revori'

Returns:

Type Description
DataFrame

Datetime-indexed DataFrame.