
Documentation for aalibrary

Modules:

config: Used for storing environment-specific settings such as database URIs and such.
ices_ship_names: This file contains the code to parse through the ICES API found here: https://vocab.ices.dk/?ref=315
ingestion: This file contains functions used to ingest Active Acoustics data into GCP from various sources such as AWS buckets and Azure Data Lake.
conversion: This file is used to store conversion functions for the AALibrary.
metadata: This file contains functions that have to do with metadata.
queries: This script contains classes that have SQL queries used for interaction…

config

Used for storing environment-specific settings such as database URIs and such.

ices_ship_names

This file contains the code to parse through the ICES API found here: https://vocab.ices.dk/?ref=315, specifically the SHIPC platform code, which refers to ship names.

Functions:

get_all_ices_ship_codes_and_names: Gets all of the ICES ship codes and their corresponding names in a dictionary.
get_all_ices_ship_names: Gets all of the ICES ship names, optionally normalized to our standards.
get_all_ship_info: Gets all of the ship info from the ICES vocabulary API.
get_ices_code_from_ship_name: Gets the ICES Code for a ship given a ship's name.

get_all_ices_ship_codes_and_names(normalize_ship_names=False)

Gets all of the ICES ship codes and their corresponding names in a dictionary format. The keys are the ICES codes, and the ship names are the values.

Parameters:

normalize_ship_names (bool): Whether or not to format the ship name according to our own standards. Defaults to False.

Returns:

dict: A dict with all of the ICES ships. The keys are the ICES codes, and the ship names are the values.

Source code in src\aalibrary\ices_ship_names.py
def get_all_ices_ship_codes_and_names(
    normalize_ship_names: bool = False,
) -> dict:
    """Gets all of the ices ship codes and their corresponding names in a
    dictionary format. The keys are the ICES code, and the name is the value.

    Args:
        normalize_ship_names (bool, optional): Whether or not to format the
            ship name according to our own standards. Defaults to False.

    Returns:
        dict: A dict with all of the ICES ships. The keys are the ICES code,
            and the name is the value.
    """

    all_ship_info = get_all_ship_info()
    all_ship_codes_and_names = {}
    for ship_info in all_ship_info:
        all_ship_codes_and_names[ship_info["key"]] = ship_info["description"]

    if normalize_ship_names:
        all_ship_codes_and_names = {
            code: normalize_ship_name(name)
            for code, name in all_ship_codes_and_names.items()
        }

    return all_ship_codes_and_names
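
Example usage (a minimal sketch; the ICES code shown is illustrative, and the call performs a live request against the ICES vocabulary API):

from aalibrary.ices_ship_names import get_all_ices_ship_codes_and_names

# Fetch the full mapping of ICES codes to ship names.
codes_to_names = get_all_ices_ship_codes_and_names(normalize_ship_names=False)
# Look up a ship name by its ICES code ("03RL" is a hypothetical code).
print(codes_to_names.get("03RL", "unknown code"))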

get_all_ices_ship_names(normalize_ship_names=False)

Gets all of the ICES ship names. You can normalize them to our standards if you wish.

Parameters:

normalize_ship_names (bool): Whether or not to format the ship name according to our own standards. Defaults to False.

Returns:

List: A list containing strings of all of the ship names.

Source code in src\aalibrary\ices_ship_names.py
def get_all_ices_ship_names(normalize_ship_names: bool = False) -> List:
    """Gets all of the ICES ship names. You can normalize them to our standards
    if you wish.

    Args:
        normalize_ship_names (bool, optional): Whether or not to format the
            ship name according to our own standards. Defaults to False.

    Returns:
        List: A list containing strings of all of the ship names.
    """

    all_ship_info = get_all_ship_info()
    all_ship_names = []
    for ship_info in all_ship_info:
        # Here `ship_info` is a dict
        all_ship_names.append(ship_info["description"])
    if normalize_ship_names:
        all_ship_names = [
            normalize_ship_name(ship_name=ship_name)
            for ship_name in all_ship_names
        ]

    return all_ship_names
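
Example usage (a minimal sketch; the output depends on the live ICES vocabulary):

from aalibrary.ices_ship_names import get_all_ices_ship_names

ship_names = get_all_ices_ship_names(normalize_ship_names=True)
print(f"Found {len(ship_names)} ships; first three: {ship_names[:3]}")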

get_all_ship_info()

Gets all of the ship info from the following URL: https://vocab.ices.dk/services/api/Code/7f9a91e1-fb57-464a-8eb0-697e4b0235b5

Returns:

List: A list with dicts of all the ships, including the name, ICES code, UUIDs, and other fields.

Source code in src\aalibrary\ices_ship_names.py
def get_all_ship_info() -> List:
    """Gets all of the ship's info from the following URL:
    https:/vocab.ices.dk/services/api/Code/7f9a91e1-fb57-464a-8eb0-697e4b0235b5


    Returns:
        List: A list with dicts of all the ships, including name, ices code,
            uuids and other fields.
    """

    response = requests.get(
        url=(
            "https://vocab.ices.dk/services/api/Code/"
            "7f9a91e1-fb57-464a-8eb0-697e4b0235b5"
        ),
        timeout=10
    )
    all_ship_info = response.json()

    return all_ship_info
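
Example usage (a sketch; as the calling code above shows, each entry is a dict whose "key" field holds the ICES code and whose "description" field holds the ship name):

from aalibrary.ices_ship_names import get_all_ship_info

all_ship_info = get_all_ship_info()
first_ship = all_ship_info[0]
# "key" is the ICES code; "description" is the ship name.
print(first_ship["key"], first_ship["description"])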

get_ices_code_from_ship_name(ship_name='', is_normalized=False)

Gets the ICES Code for a ship given a ship's name.

Parameters:

ship_name (str): The ship name string. Defaults to "".
is_normalized (bool): Whether or not the ship name is already normalized according to aalibrary standards. Defaults to False.

Returns:

str: The ICES Code if one has been found. Empty string if it has not.

Source code in src\aalibrary\ices_ship_names.py
def get_ices_code_from_ship_name(
    ship_name: str = "", is_normalized: bool = False
) -> str:
    """Gets the ICES Code for a ship given a ship's name.

    Args:
        ship_name (str, optional): The ship name string. Defaults to "".
        is_normalized (bool, optional): Whether or not the ship name is already
            normalized according to aalibrary standards. Defaults to False.

    Returns:
        str: The ICES Code if one has been found. Empty string if it has not.
    """

    # Get all of the ship codes and names.
    all_codes_and_names = get_all_ices_ship_codes_and_names(
        normalize_ship_names=is_normalized
    )
    # Reverse it to make the ship names the keys.
    all_codes_and_names = {v: k for k, v in all_codes_and_names.items()}
    valid_ices_ship_names = list(all_codes_and_names.keys())
    # Try to find the correct ICES code based on the ship name.
    try:
        return all_codes_and_names[ship_name]
    except KeyError:
        # The ship name does not exactly match any in the ICES DB;
        # suggest close matches using difflib's get_close_matches.
        spell_check_list = get_close_matches(
            ship_name, valid_ices_ship_names, n=3, cutoff=0.6
        )
        if len(spell_check_list) > 0:
            print(
                f"This `ship_name` {ship_name} does not"
                " exist in the ICES database. Did you mean one of the"
                f" following?\n{spell_check_list}"
            )
        else:
            print(
                f"This `ship_name` {ship_name} does not"
                " exist in the ICES database. A close match could not be "
                "found."
            )
        return ""

ingestion

This file contains functions used to ingest Active Acoustics data into GCP from various sources such as AWS buckets and Azure Data Lake.

Functions:

download_file_from_azure_directory: Downloads a single file from an Azure directory using the DataLakeDirectoryClient.
download_netcdf_file: ENTRYPOINT FOR END-USERS. Downloads a netcdf file from the GCP storage bucket.
download_raw_file: ENTRYPOINT FOR END-USERS. Downloads a raw and idx file from NCEI.
download_raw_file_from_azure: ENTRYPOINT FOR END-USERS. Downloads raw, idx, and bot files from the Azure Data Lake (OMAO).
download_raw_file_from_ncei: ENTRYPOINT FOR END-USERS. Downloads a raw, idx, and bot file from NCEI.
download_specific_file_from_azure: Creates a DataLakeFileClient and downloads a specific file from container_name.
download_survey_from_ncei: Downloads an entire survey from NCEI to a local directory while maintaining folder structure.
find_and_upload_survey_metadata_from_s3: Finds the metadata associated with a particular survey in s3, then uploads those files to the correct GCP location.
find_data_source_for_file: Finds the data source of a given filename by checking all possible data sources.

download_file_from_azure_directory(directory_client, file_system='testcontainer', download_directory='./', file_path='')

Downloads a single file from an Azure directory using the DataLakeDirectoryClient. Useful for numerous operations, as authentication is only required once for the creation of each DataLakeDirectoryClient.

Parameters:

directory_client (DataLakeDirectoryClient): The DataLakeDirectoryClient that will be used to connect to, and download from, an Azure file system in the data lake. Required.
file_system (str): The file system (container) you wish to download your file from. Defaults to "testcontainer" for testing purposes.
download_directory (str): The local directory you want to download to. Defaults to "./".
file_path (str): The file path you want to download. Defaults to "".
Source code in src\aalibrary\ingestion.py
def download_file_from_azure_directory(
    directory_client: DataLakeDirectoryClient,
    file_system: str = "testcontainer",
    download_directory: str = "./",
    file_path: str = "",
):
    """Downloads a single file from an azure directory using the
    DataLakeDirectoryClient. Useful for numerous operations, as authentication
    is only required once for the creation of each DataLakeDirectoryClient.

    Args:
        directory_client (DataLakeDirectoryClient): The
            DataLakeDirectoryClient that will be used to connect to, and
            download from, an Azure file system in the data lake.
        file_system (str): The file system (container) you wish to download
            your file from. Defaults to "testcontainer" for testing purposes.
        download_directory (str): The local directory you want to download to.
            Defaults to "./".
        file_path (str): The file path you want to download.
    """

    # User-error-checking
    check_for_assertion_errors(
        data_lake_directory_client=directory_client,
        file_download_directory=download_directory,
    )

    file_client = directory_client.get_file_client(
        file_path=file_path, file_system=file_system
    )

    download_directory = os.path.normpath(download_directory)
    file_name = os.path.normpath(file_path).split(os.path.sep)[-1]

    with open(
        file=os.sep.join([download_directory, file_name]), mode="wb"
    ) as local_file:
        download = file_client.download_file()
        local_file.write(download.readall())
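
A usage sketch, assuming get_data_lake_directory_client is importable from aalibrary.ingestion (its unqualified use in download_raw_file_from_azure below suggests it is); the config path, container, and file path are illustrative:

from aalibrary.ingestion import (
    download_file_from_azure_directory,
    get_data_lake_directory_client,
)

# The config file needs a [DEFAULT] section with azure_connection_string.
directory_client = get_data_lake_directory_client(
    config_file_path="config.ini"
)
download_file_from_azure_directory(
    directory_client=directory_client,
    file_system="testcontainer",
    download_directory="./",
    file_path="Reuben_Lasker/RL_1601/EK_60/1601RL-D20160107-T074016.raw",
)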

download_netcdf_file(raw_file_name='', file_type='netcdf', ship_name='', survey_name='', echosounder='', data_source='', file_download_directory='', gcp_bucket=None, debug=False)

ENTRYPOINT FOR END-USERS
Downloads a netcdf file from the GCP storage bucket for use on your workstation. Works as follows:

1. Checks if the exact netcdf exists in GCP.
   a. If it doesn't exist, prompts the user to download it first.
   b. If it exists, downloads it to the file_download_directory.

Parameters:

raw_file_name (str): The raw file name (includes extension). Defaults to "".
file_type (str): The file type (do not include the dot "."). Defaults to "netcdf".
ship_name (str): The ship name associated with this survey. Defaults to "".
survey_name (str): The survey name/identifier. Defaults to "".
echosounder (str): The echosounder used to gather the data. Defaults to "".
data_source (str): The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "".
file_download_directory (str): The local directory you want to store your file in. Defaults to "".
gcp_bucket (storage.Client.bucket): The GCP bucket object used to download the file. Defaults to None.
debug (bool): Whether or not to print debug statements. Defaults to False.
Source code in src\aalibrary\ingestion.py
def download_netcdf_file(
    raw_file_name: str = "",
    file_type: str = "netcdf",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    file_download_directory: str = "",
    gcp_bucket: storage.Client.bucket = None,
    debug: bool = False,
):
    """ENTRYPOINT FOR END-USERS
    Downloads a netcdf file from GCP storage bucket for use on your
    workstation.
    Works as follows:
        1. Checks if the exact netcdf exists in gcp.
            a. If it doesn't exist, prompts user to download it first.
            b. If it exists, downloads to the `file_download_directory`.

    Args:
        raw_file_name (str, optional): The raw file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "netcdf".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "".
        file_download_directory (str, optional): The local directory you want
            to store your file in. Defaults to "".
        gcp_bucket (storage.Client.bucket, optional): The GCP bucket object
            used to download the file. Defaults to None.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """

    _, s3_resource, _ = utils.cloud_utils.create_s3_objs()

    rf = RawFile(
        file_name=raw_file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        data_source=data_source,
        file_download_directory=file_download_directory,
        gcp_bucket=gcp_bucket,
        debug=debug,
        s3_resource=s3_resource,
    )

    if rf.netcdf_file_exists_in_gcp:
        print(
            (
                f"NETCDF FILE LOCATED IN GCP"
                f": `{rf.netcdf_gcp_storage_bucket_location}`\nDOWNLOADING..."
            )
        )
        utils.cloud_utils.download_file_from_gcp(
            gcp_bucket=gcp_bucket,
            blob_file_path=rf.netcdf_gcp_storage_bucket_location,
            local_file_path=rf.netcdf_file_download_path,
            debug=debug,
        )
        print(
            f"FILE `{raw_file_name}` DOWNLOADED "
            f"TO `{rf.netcdf_file_download_path}`"
        )
        return
    else:
        logging.error(
            "NETCDF FILE `%s` DOES NOT EXIST IN GCP AT THE LOCATION: `%s`.",
            raw_file_name,
            rf.netcdf_gcp_storage_bucket_location,
        )
        logging.error(
            "PLEASE CONVERT AND UPLOAD THE RAW FILE FIRST VIA"
            " `download_raw_file`."
        )
        raise FileNotFoundError
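
Example usage (a sketch; the identifiers are illustrative, and the gcp_bucket is built with the same setup_gcp_storage_objs helper the source above uses, assuming utils is importable from the aalibrary package):

from aalibrary import ingestion, utils

_, _, gcp_bucket = utils.cloud_utils.setup_gcp_storage_objs()
ingestion.download_netcdf_file(
    raw_file_name="1601RL-D20160107-T074016.raw",
    file_type="netcdf",
    ship_name="Reuben_Lasker",
    survey_name="RL_1601",
    echosounder="EK_60",
    data_source="NCEI",
    file_download_directory=".",
    gcp_bucket=gcp_bucket,
)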

download_raw_file(file_name='', file_type='raw', ship_name='', survey_name='', echosounder='', data_source='', file_download_directory='.', gcp_bucket=None, debug=False)

ENTRYPOINT FOR END-USERS
Downloads a raw and idx file from NCEI for use on your workstation. Works as follows:

1. Checks if the raw file exists in GCP.
   a. If it exists, checks whether a netcdf version also exists and lets the user know.
      i. If force_download_from_ncei is True, downloads the raw and idx file from NCEI instead.
   b. If it doesn't exist, downloads the .raw from NCEI and uploads it to GCP for caching, then downloads the .idx from NCEI and uploads it to GCP for caching.

Parameters:

file_name (str): The file name (includes extension). Defaults to "".
file_type (str): The file type (do not include the dot "."). Defaults to "raw".
ship_name (str): The ship name associated with this survey. Defaults to "".
survey_name (str): The survey name/identifier. Defaults to "".
echosounder (str): The echosounder used to gather the data. Defaults to "".
data_source (str): The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "".
file_download_directory (str): The local file directory you want to store your file in. Defaults to the current directory (".").
gcp_bucket (storage.Bucket): The GCP bucket object used to download the file. Defaults to None.
debug (bool): Whether or not to print debug statements. Defaults to False.
Source code in src\aalibrary\ingestion.py
def download_raw_file(
    file_name: str = "",
    file_type: str = "raw",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    file_download_directory: str = ".",
    gcp_bucket: storage.Bucket = None,
    debug: bool = False,
):
    """ENTRYPOINT FOR END-USERS
    Downloads a raw and idx file from NCEI for use on your workstation.
    Works as follows:
        1. Checks if raw file exists in GCP.
            a. If it exists,
                checks whether a netcdf version also exists
                and lets the user know.
                i. If `force_download_from_ncei` is True
                    downloads the raw and idx file from NCEI instead.
            b. If it doesn't exist,
                downloads .raw from NCEI and uploads to GCP for caching
                downloads .idx from NCEI and uploads to GCP for caching

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "raw".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "".
        file_download_directory (str, optional): The local file directory you
            want to store your file in. Defaults to current directory.
            Defaults to ".".
        gcp_bucket (storage.Client.bucket, optional): The GCP bucket object
            used to download the file. Defaults to None.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """

    if gcp_bucket is None:
        _, _, gcp_bucket = utils.cloud_utils.setup_gcp_storage_objs()
    _, s3_resource, _ = utils.cloud_utils.create_s3_objs()

    rf = RawFile(
        file_name=file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        data_source=data_source,
        file_download_directory=file_download_directory,
        debug=debug,
        gcp_bucket=gcp_bucket,
        s3_resource=s3_resource,
    )

    if rf.raw_file_exists_in_gcp:
        # Inform user if file exists in GCP.
        print(
            f"INFO: FILE `{rf.raw_file_name}` ALREADY EXISTS IN"
            " GOOGLE STORAGE BUCKET."
        )
        # Here we download the raw file from GCP. We also check for a netcdf
        # version and let the user know.
        print("CHECKING FOR NETCDF VERSION...")
        if rf.netcdf_file_exists_in_gcp:
            # Inform the user if a netcdf version exists in cache.
            print(
                (
                    f"FILE `{rf.raw_file_name}` EXISTS AS A NETCDF ALREADY."
                    " PLEASE DOWNLOAD THE NETCDF VERSION IF NEEDED."
                )
            )
        else:
            print(
                (
                    f"FILE `{rf.raw_file_name}` DOES NOT EXIST AS NETCDF."
                    " CONSIDER RUNNING A CONVERSION FUNCTION"
                )
            )

        # Here we download the raw from GCP.
        print(
            (
                f"DOWNLOADING FILE `{rf.raw_file_name}` FROM GCP TO"
                f" `{rf.raw_file_download_path}`"
            )
        )
        utils.cloud_utils.download_file_from_gcp(
            gcp_bucket=rf.gcp_bucket,
            blob_file_path=rf.raw_gcp_storage_bucket_location,
            local_file_path=rf.raw_file_download_path,
            debug=rf.debug,
        )
        print("DOWNLOADED.")

    elif rf.raw_file_exists_in_ncei and (
        not rf.raw_file_exists_in_gcp
    ):  # File does not exist in gcp and needs to be downloaded from NCEI
        download_raw_file_from_ncei(
            file_name=rf.raw_file_name,
            file_type="raw",
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            data_source=rf.data_source,
            file_download_directory=rf.file_download_directory,
            upload_to_gcp=True,
            debug=rf.debug,
        )

    # Checking to make sure the idx exists in GCP...
    if rf.idx_file_exists_in_gcp:
        print("CORRESPONDING IDX FILE FOUND IN GCP. DOWNLOADING...")
        # Here we download the idx from GCP.
        print(
            (
                f"DOWNLOADING FILE `{rf.idx_file_name}` FROM GCP TO "
                f"`{rf.idx_file_download_path}`"
            )
        )
        utils.cloud_utils.download_file_from_gcp(
            gcp_bucket=rf.gcp_bucket,
            blob_file_path=rf.idx_gcp_storage_bucket_location,
            local_file_path=rf.idx_file_download_path,
            debug=rf.debug,
        )
        print("DOWNLOADED.")
    elif rf.idx_file_exists_in_ncei and (not rf.idx_file_exists_in_gcp):
        print(
            (
                "CORRESPONDING IDX FILE NOT FOUND IN GCP."
                " DOWNLOADING FROM NCEI AND UPLOADING TO GCP..."
            )
        )
        # Safely download and upload the idx file.
        download_single_file_from_aws(
            file_url=rf.idx_file_ncei_url,
            download_location=rf.idx_file_download_path,
        )
        # Upload to GCP at the correct storage bucket location.
        upload_file_to_gcp_storage_bucket(
            file_name=rf.idx_file_name,
            file_type="idx",
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            file_location=rf.idx_file_download_path,
            gcp_bucket=rf.gcp_bucket,
            data_source=rf.data_source,
            debug=rf.debug,
        )

    # Checking to make sure the bot exists in GCP...
    if rf.bot_file_exists_in_gcp:
        print("CORRESPONDING BOT FILE FOUND IN GCP. DOWNLOADING...")
        # Here we download the bot from GCP.
        print(
            (
                f"DOWNLOADING FILE `{rf.bot_file_name}` FROM GCP"
                f" TO `{rf.bot_file_download_path}`"
            )
        )
        utils.cloud_utils.download_file_from_gcp(
            gcp_bucket=rf.gcp_bucket,
            blob_file_path=rf.bot_gcp_storage_bucket_location,
            local_file_path=rf.bot_file_download_path,
            debug=rf.debug,
        )
        print("DOWNLOADED.")
    elif rf.bot_file_exists_in_ncei and (not rf.bot_file_exists_in_gcp):
        print(
            (
                "CORRESPONDING BOT FILE NOT FOUND IN GCP. TRYING TO "
                "DOWNLOAD FROM NCEI AND UPLOADING TO GCP..."
            )
        )
        # Safely download and upload the bot file.
        download_single_file_from_aws(
            file_url=rf.bot_file_ncei_url,
            download_location=rf.bot_file_download_path,
        )
        # Upload to GCP at the correct storage bucket location.
        upload_file_to_gcp_storage_bucket(
            file_name=rf.bot_file_name,
            file_type="bot",
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            file_location=rf.bot_file_download_path,
            gcp_bucket=rf.gcp_bucket,
            data_source=rf.data_source,
            debug=rf.debug,
        )

    return
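
Example usage (a sketch with illustrative identifiers; the function creates its own GCP bucket object when gcp_bucket is None):

from aalibrary import ingestion

ingestion.download_raw_file(
    file_name="1601RL-D20160107-T074016.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL_1601",
    echosounder="EK_60",
    data_source="NCEI",
    file_download_directory=".",
)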

download_raw_file_from_azure(file_name='', file_type='raw', ship_name='', survey_name='', echosounder='', data_source='OMAO', file_download_directory='.', config_file_path='', upload_to_gcp=False, debug=False)

ENTRYPOINT FOR END-USERS
Downloads a raw file (along with its corresponding idx and bot files) from the Azure Data Lake (OMAO), optionally uploading the files to GCP.

Parameters:

file_name (str): The file name (includes extension). Defaults to "".
file_type (str): The file type (do not include the dot "."). Defaults to "raw".
ship_name (str): The ship name associated with this survey. Defaults to "".
survey_name (str): The survey name/identifier. Defaults to "".
echosounder (str): The echosounder used to gather the data. Defaults to "".
data_source (str): The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "OMAO".
file_download_directory (str): The local directory you want to store your file in. Defaults to the current directory (".").
config_file_path (str): The location of the config file. Needs a [DEFAULT] section with an azure_connection_string variable defined. Defaults to "".
upload_to_gcp (bool): Whether or not you want to upload to GCP. Defaults to False.
debug (bool): Whether or not to print debug statements. Defaults to False.
Source code in src\aalibrary\ingestion.py
def download_raw_file_from_azure(
    file_name: str = "",
    file_type: str = "raw",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "OMAO",
    file_download_directory: str = ".",
    config_file_path: str = "",
    upload_to_gcp: bool = False,
    debug: bool = False,
):
    """ENTRYPOINT FOR END-USERS

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "raw".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "OMAO".
        file_download_directory (str, optional): The local directory you want
            to store your file in. Defaults to current directory. Defaults
            to ".".
        config_file_path (str, optional): The location of the config file.
            Needs a `[DEFAULT]` section with an `azure_connection_string`
            variable defined. Defaults to "".
        upload_to_gcp (bool, optional): Whether or not you want to upload to
            GCP. Defaults to False.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """
    # Create gcp bucket objects
    _, _, gcp_bucket = utils.cloud_utils.setup_gcp_storage_objs()
    try:
        _, s3_resource, _ = utils.cloud_utils.create_s3_objs()
    except Exception as e:
        logging.error("CANNOT ESTABLISH CONNECTION TO S3 BUCKET..\n %s", e)
        raise

    rf = RawFile(
        file_name=file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        data_source=data_source,
        file_download_directory=file_download_directory,
        is_metadata=False,
        upload_to_gcp=upload_to_gcp,
        debug=debug,
        gcp_bucket=gcp_bucket,
        s3_resource=s3_resource,
    )

    # Location of temporary file in sandbox environment.
    # https://contracttest4.blob.core.windows.net/testcontainer/Reuben_Lasker/RL_1601/EK_60/1601RL-D20160107-T074016.bot

    # Create Azure Directory Client
    azure_datalake_directory_client = get_data_lake_directory_client(
        config_file_path=config_file_path
    )

    # TODO: check to see if you want to download from gcp instead.

    # TODO: add if statement to check if the file exists in azure or not.
    print(f"DOWNLOADING FILE {rf.raw_file_name} FROM OMAO")
    download_file_from_azure_directory(
        directory_client=azure_datalake_directory_client,
        download_directory=rf.file_download_directory,
        file_path=rf.raw_omao_file_path,
    )

    # Force download the idx file.
    print(f"DOWNLOADING IDX FILE {rf.idx_file_name} FROM OMAO")
    download_file_from_azure_directory(
        directory_client=azure_datalake_directory_client,
        download_directory=rf.file_download_directory,
        file_path=rf.idx_omao_file_path,
    )

    # Force download the bot file.
    print(f"DOWNLOADING BOT FILE {rf.bot_file_name} FROM OMAO")
    download_file_from_azure_directory(
        directory_client=azure_datalake_directory_client,
        download_directory=rf.file_download_directory,
        file_path=rf.bot_omao_file_path,
    )

    if upload_to_gcp:
        if rf.raw_file_exists_in_gcp:
            print(
                (
                    "INFO: RAW FILE ALREADY EXISTS IN GCP AT "
                    f"`{rf.raw_gcp_storage_bucket_location}`"
                )
            )
        else:
            # TODO: try out a background process if possible -- file might
            # have a lock. only async options, otherwise subprocess gsutil to
            # upload it.
            # Upload raw to GCP at the correct storage bucket location.
            upload_file_to_gcp_storage_bucket(
                file_name=file_name,
                file_type=file_type,
                ship_name=ship_name,
                survey_name=survey_name,
                echosounder=echosounder,
                file_location=rf.raw_file_download_path,
                gcp_bucket=gcp_bucket,
                data_source=data_source,
                debug=debug,
            )
            # Upload the metadata file as well.
            metadata.create_and_upload_metadata_df_for_raw(
                rf=rf,
                debug=debug,
            )

        if rf.idx_file_exists_in_gcp:
            print(
                (
                    "INFO: IDX FILE ALREADY EXISTS IN GCP AT "
                    f"`{rf.idx_gcp_storage_bucket_location}`"
                )
            )
        else:
            # Upload idx to GCP at the correct storage bucket location.
            upload_file_to_gcp_storage_bucket(
                file_name=rf.idx_file_name,
                file_type=file_type,
                ship_name=ship_name,
                survey_name=survey_name,
                echosounder=echosounder,
                file_location=rf.idx_file_download_path,
                gcp_bucket=gcp_bucket,
                data_source=data_source,
                debug=debug,
            )

        if rf.bot_file_exists_in_gcp:
            print(
                (
                    "INFO: BOT FILE ALREADY EXISTS IN GCP AT"
                    f" `{rf.bot_gcp_storage_bucket_location}`"
                )
            )
        else:
            # Upload bot to GCP at the correct storage bucket location.
            upload_file_to_gcp_storage_bucket(
                file_name=rf.bot_file_name,
                file_type=file_type,
                ship_name=ship_name,
                survey_name=survey_name,
                echosounder=echosounder,
                file_location=rf.bot_file_download_path,
                gcp_bucket=gcp_bucket,
                data_source=data_source,
                debug=debug,
            )

        return
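
Example usage (a sketch; the identifiers are illustrative, and the config file must define azure_connection_string under its [DEFAULT] section):

from aalibrary import ingestion

ingestion.download_raw_file_from_azure(
    file_name="1601RL-D20160107-T074016.raw",
    ship_name="Reuben_Lasker",
    survey_name="RL_1601",
    echosounder="EK_60",
    data_source="OMAO",
    file_download_directory=".",
    config_file_path="config.ini",
    upload_to_gcp=True,
)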

download_raw_file_from_ncei(file_name='', file_type='raw', ship_name='', survey_name='', echosounder='', data_source='NCEI', file_download_directory='.', upload_to_gcp=False, debug=False)

ENTRYPOINT FOR END-USERS
Downloads a raw, idx, and bot file from NCEI. If upload_to_gcp is enabled, the downloaded files will also be uploaded to the GCP storage bucket if they do not already exist there.

Parameters:

file_name (str): The file name (includes extension). Defaults to "".
file_type (str): The file type (do not include the dot "."). Defaults to "raw".
ship_name (str): The ship name associated with this survey. Defaults to "".
survey_name (str): The survey name/identifier. Defaults to "".
echosounder (str): The echosounder used to gather the data. Defaults to "".
data_source (str): The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "NCEI".
file_download_directory (str): The local file directory you want to store your file in. Defaults to the current directory (".").
upload_to_gcp (bool): Whether or not you want to upload to GCP. Defaults to False.
debug (bool): Whether or not to print debug statements. Defaults to False.
Source code in src\aalibrary\ingestion.py
def download_raw_file_from_ncei(
    file_name: str = "",
    file_type: str = "raw",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "NCEI",
    file_download_directory: str = ".",
    upload_to_gcp: bool = False,
    debug: bool = False,
):
    """ENTRYPOINT FOR END-USERS
    Downloads a raw, idx, and bot file from NCEI. If `upload_to_gcp` is
    enabled, the downloaded files will also upload to the GCP storage bucket
    if they do not exist.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "raw".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "NCEI".
        file_download_directory (str, optional): The local file directory you
            want to store your file in. Defaults to current directory.
            Defaults to ".".
        upload_to_gcp (bool, optional): Whether or not you want to upload to
            GCP. Defaults to False.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """
    _, _, gcp_bucket = utils.cloud_utils.setup_gcp_storage_objs()
    try:
        _, s3_resource, _ = utils.cloud_utils.create_s3_objs()
    except Exception as e:
        logging.error("CANNOT ESTABLISH CONNECTION TO S3 BUCKET..\n%s", e)
        raise

    rf = RawFile(
        file_name=file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        data_source=data_source,
        file_download_directory=file_download_directory,
        upload_to_gcp=upload_to_gcp,
        debug=debug,
        gcp_bucket=gcp_bucket,
        s3_resource=s3_resource,
    )

    if rf.raw_file_exists_in_ncei:
        download_single_file_from_aws(
            file_url=rf.raw_file_ncei_url,
            download_location=rf.raw_file_download_path,
        )
    if rf.idx_file_exists_in_ncei:
        # Force download the idx file.
        download_single_file_from_aws(
            file_url=rf.idx_file_ncei_url,
            download_location=rf.idx_file_download_path,
        )
    if rf.bot_file_exists_in_ncei:
        # Force download the bot file.
        download_single_file_from_aws(
            file_url=rf.bot_file_ncei_url,
            download_location=rf.bot_file_download_path,
        )

    if upload_to_gcp:
        if rf.raw_file_exists_in_gcp:
            print(
                (
                    "INFO: RAW FILE ALREADY EXISTS IN GCP AT "
                    f"`{rf.raw_gcp_storage_bucket_location}`"
                )
            )
        else:
            # TODO: try out a background process if possible -- file might
            # have a lock. only async options, otherwise subprocess gsutil to
            # upload it.

            # Upload raw to GCP at the correct storage bucket location.
            upload_file_to_gcp_storage_bucket(
                file_name=rf.file_name,
                file_type="raw",
                ship_name=rf.ship_name,
                survey_name=rf.survey_name,
                echosounder=rf.echosounder,
                file_location=rf.raw_file_download_path,
                gcp_bucket=rf.gcp_bucket,
                data_source=rf.data_source,
                debug=rf.debug,
            )
            # Upload the metadata file as well.
            metadata.create_and_upload_metadata_df_for_raw(
                rf=rf,
                debug=rf.debug,
            )

        if rf.idx_file_exists_in_gcp:
            print(
                (
                    "INFO: IDX FILE ALREADY EXISTS IN GCP AT "
                    f"`{rf.idx_gcp_storage_bucket_location}`"
                )
            )
        elif rf.idx_file_exists_in_ncei and (not rf.idx_file_exists_in_gcp):
            # Upload idx to GCP at the correct storage bucket location.
            upload_file_to_gcp_storage_bucket(
                file_name=rf.idx_file_name,
                file_type="idx",
                ship_name=rf.ship_name,
                survey_name=rf.survey_name,
                echosounder=echosounder,
                file_location=rf.idx_file_download_path,
                gcp_bucket=rf.gcp_bucket,
                data_source=rf.data_source,
                is_metadata=False,
                debug=rf.debug,
            )

        if rf.bot_file_exists_in_gcp:
            print(
                (
                    "INFO: BOT FILE ALREADY EXISTS IN GCP AT "
                    f"`{rf.bot_gcp_storage_bucket_location}`"
                )
            )
        elif rf.bot_file_exists_in_ncei and (not rf.bot_file_exists_in_gcp):
            # Upload bot to GCP at the correct storage bucket location.
            upload_file_to_gcp_storage_bucket(
                file_name=rf.bot_file_name,
                file_type="bot",
                ship_name=rf.ship_name,
                survey_name=rf.survey_name,
                echosounder=rf.echosounder,
                file_location=rf.bot_file_download_path,
                gcp_bucket=rf.gcp_bucket,
                data_source=rf.data_source,
                is_metadata=False,
                debug=rf.debug,
            )

        return
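
Example usage (a sketch with illustrative identifiers; with upload_to_gcp=True the raw, idx, and bot files are cached in the GCP bucket after download):

from aalibrary import ingestion

ingestion.download_raw_file_from_ncei(
    file_name="1601RL-D20160107-T074016.raw",
    ship_name="Reuben_Lasker",
    survey_name="RL_1601",
    echosounder="EK_60",
    data_source="NCEI",
    file_download_directory=".",
    upload_to_gcp=True,
)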

download_specific_file_from_azure(config_file_path='', container_name='testcontainer', file_path_in_container='')

Creates a DataLakeFileClient and downloads a specific file from container_name.

Parameters:

config_file_path (str): The location of the config file. Needs a [DEFAULT] section with an azure_connection_string variable defined. Defaults to "".
container_name (str): The container within Azure Data Lake you are trying to access. Defaults to "testcontainer".
file_path_in_container (str): The file path of the file you would like downloaded. Defaults to "".
Source code in src\aalibrary\ingestion.py
def download_specific_file_from_azure(
    config_file_path: str = "",
    container_name: str = "testcontainer",
    file_path_in_container: str = "",
):
    """Creates a DataLakeFileClient and downloads a specific file from
    `container_name`.

    Args:
        config_file_path (str, optional): The location of the config file.
            Needs a `[DEFAULT]` section with an `azure_connection_string`
            variable defined. Defaults to "".
        container_name (str, optional): The container within Azure Data Lake
            you are trying to access. Defaults to "testcontainer".
        file_path_in_container (str, optional): The file path of the file you
            would like downloaded. Defaults to "".
    """

    conf = configparser.ConfigParser()
    conf.read(config_file_path)

    file = DataLakeFileClient.from_connection_string(
        conf["DEFAULT"]["azure_connection_string"],
        file_system_name=container_name,
        file_path=file_path_in_container,
    )

    file_name = file_path_in_container.split("/")[-1]

    with open(f"./{file_name}", "wb") as my_file:
        download = file.download_file()
        download.readinto(my_file)

download_survey_from_ncei(ship_name='', survey_name='', download_directory='', max_limit=None, debug=False)

Downloads an entire survey from NCEI to a local directory while maintaining folder structure.

Parameters:

ship_name (str): The ship name. Defaults to "".
survey_name (str): The name of the survey you would like to download. Defaults to "".
download_directory (str): The directory to which the files will be downloaded. Creates a directory in the cwd if not specified. Defaults to "". NOTE: The directory specified will have the ship_name/survey_name folders created within it.
max_limit (int): The maximum number of random files to download. Defaults to None, which includes all files.
debug (bool): Whether or not you want to print debug statements. Defaults to False.
Source code in src\aalibrary\ingestion.py
def download_survey_from_ncei(
    ship_name: str = "",
    survey_name: str = "",
    download_directory: str = "",
    max_limit: int = None,
    debug: bool = False,
):
    """Downloads an entire survey from NCEI to a local directory while
    maintaining folder structure.

    Args:
        ship_name (str, optional): The ship name. Defaults to "".
        survey_name (str, optional): The name of the survey you would like to
            download. Defaults to "".
        download_directory (str, optional): The directory to which the files
            will be downloaded. Creates a directory in the cwd if not
            specified. Defaults to "".
            NOTE: The directory specified will have the `ship_name/survey_name`
            folders created within it.
        max_limit (int, optional): The maximum number of random files to
            download.
            Defaults to include all files.
        debug (bool, optional): Whether or not you want to print debug
            statements. Defaults to False.
    """

    # User-error-checking
    # Normalize ship name to NCEI format
    if ship_name:
        ship_name = utils.ncei_utils.get_closest_ncei_formatted_ship_name(
            ship_name
        )

    if download_directory == "":
        # Create a directory in the cwd
        download_directory = os.sep.join(
            [os.path.normpath("./"), f"{ship_name}", f"{survey_name}"]
        )
    else:
        download_directory = os.sep.join(
            [
                os.path.normpath(download_directory),
                f"{ship_name}",
                f"{survey_name}",
            ]
        )
    # normalize the path
    download_directory = os.path.normpath(download_directory)

    # Create the directory if it doesn't exist.
    if not os.path.isdir(download_directory):
        os.makedirs(download_directory, exist_ok=True)
        print("CREATED DOWNLOAD DIRECTORY.")

    if debug:
        print(f"FORMATTED DOWNLOAD DIRECTORY: {download_directory}")

    # Get all s3 objects for the survey
    print(f"GETTING ALL S3 OBJECTS FOR SURVEY {survey_name}...", end="")
    _, s3_resource, _ = utils.cloud_utils.create_s3_objs()
    s3_objects = cloud_utils.list_all_objects_in_s3_bucket_location(
        prefix=f"data/raw/{ship_name}/{survey_name}/",
        s3_resource=s3_resource,
        return_full_paths=True,
    )
    print(f"FOUND {len(s3_objects)} FILES.")

    # Set the max limit if not specified or if greater than the number of
    # files.
    if max_limit is None or max_limit > len(s3_objects):
        max_limit = len(s3_objects)

    # Create all the subdirectories first
    print("CREATING SUBDIRECTORIES...", end="")
    subdirs = set()
    # Get the subfolders from object keys
    for s3_object in s3_objects:
        # Skip folders
        if s3_object.endswith("/"):
            continue
        # Get the subfolder structure from the object key
        subfolder_key = os.sep.join(
            s3_object.replace(
                f"data/raw/{ship_name}/{survey_name}/", ""
            ).split("/")[:-1]
        )
        subdirs.add(subfolder_key)
    for subdir in subdirs:
        os.makedirs(os.sep.join([download_directory, subdir]), exist_ok=True)
    print("SUBDIRECTORIES CREATED.")

    for _, object_key in enumerate(
        tqdm(s3_objects[:max_limit], desc="Downloading")
    ):
        # file_name = object_key.split("/")[-1]
        local_object_path = object_key.replace(
            f"data/raw/{ship_name}/{survey_name}/", ""
        )
        download_location = os.path.normpath(
            os.sep.join([download_directory, local_object_path])
        )
        download_single_file_from_aws(
            file_url=object_key, download_location=download_location
        )
    print(f"DOWNLOAD COMPLETE {os.path.abspath(download_directory)}.")

find_and_upload_survey_metadata_from_s3(ship_name='', survey_name='', gcp_bucket=None, debug=False)

Finds the metadata that is associated with a particular survey in s3, then uploads all of those files into the correct gcp location.

Parameters:

ship_name (str): The ship name associated with this survey. Defaults to "".
survey_name (str): The survey name/identifier. Defaults to "".
gcp_bucket (storage.Client.bucket): The GCP bucket object used to download the file. Defaults to None.
debug (bool): Whether or not to print debug statements. Defaults to False.
Source code in src\aalibrary\ingestion.py
def find_and_upload_survey_metadata_from_s3(
    ship_name: str = "",
    survey_name: str = "",
    gcp_bucket: storage.Client.bucket = None,
    debug: bool = False,
):
    """Finds the metadata that is associated with a particular survey in s3,
    then uploads all of those files into the correct gcp location.

    Args:
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        gcp_bucket (storage.Client.bucket, optional): The GCP bucket object
            used to download the file. Defaults to None.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """

    metadata_location_in_s3 = f"data/raw/{ship_name}/{survey_name}/metadata/"

    try:
        _, _, s3_bucket = utils.cloud_utils.create_s3_objs()
    except Exception as e:
        logging.error("CANNOT ESTABLISH CONNECTION TO S3 BUCKET..\n%s", e)
        raise

    num_metadata_objects = cloud_utils.count_objects_in_s3_bucket_location(
        prefix=metadata_location_in_s3, bucket=s3_bucket
    )

    if debug:
        logging.debug(
            "%d num_metadata_objects FOUND IN S3 FOR %s - %s",
            num_metadata_objects,
            ship_name,
            survey_name,
        )

    if num_metadata_objects >= 1:
        # Get object keys
        s3_objects = cloud_utils.list_all_objects_in_s3_bucket_location(
            prefix=metadata_location_in_s3, s3_resource=s3_bucket
        )
        # Download and upload each object
        for full_path, file_name in s3_objects:
            # Get the correct full file download location
            file_download_directory = os.sep.join(
                [os.path.normpath("./"), file_name]
            )
            # Download from aws
            download_single_file_from_aws(
                file_url=full_path, download_location=file_download_directory
            )
            # Upload to gcp
            upload_file_to_gcp_storage_bucket(
                file_name=file_name,
                ship_name=ship_name,
                survey_name=survey_name,
                file_location=file_download_directory,
                gcp_bucket=gcp_bucket,
                data_source="NCEI",
                is_metadata=False,
                is_survey_metadata=True,
                debug=debug,
            )
            # Remove local file (it's temporary)
            os.remove(file_download_directory)

find_data_source_for_file()

Finds the data source of a given filename by checking all possible data sources.

Source code in src\aalibrary\ingestion.py
def find_data_source_for_file():
    """Finds the data source of a given filename by checking all possible data
    sources."""

conversion

This file is used to store conversion functions for the AALibrary.

Functions:

convert_local_raw_to_ices_netcdf: ENTRYPOINT FOR END-USERS. Converts a local raw file into an ICES netcdf using echopype.
convert_local_raw_to_netcdf: ENTRYPOINT FOR END-USERS. Converts a local raw file into netcdf using echopype.
convert_raw_to_netcdf: ENTRYPOINT FOR END-USERS. Converts a file from raw to netcdf, then uploads it to GCP storage for caching.
convert_raw_to_netcdf_ices: ENTRYPOINT FOR END-USERS

convert_local_raw_to_ices_netcdf(raw_file_location='', netcdf_file_download_directory='', echosounder='', delete_raw_after=False)

ENTRYPOINT FOR END-USERS Converts a local (on your computer) file from raw into netcdf using echopype.

Parameters:

raw_file_location (str): The location of the raw file. Defaults to "".
netcdf_file_download_directory (str): The location you want to download your netcdf file to. Defaults to "".
echosounder (str): The echosounder used. Can be one of ["EK80", "EK60"]. Defaults to "".
delete_raw_after (bool): Whether or not to delete the raw file after conversion is complete. Defaults to False.
Source code in src\aalibrary\conversion.py
def convert_local_raw_to_ices_netcdf(
    raw_file_location: str = "",
    netcdf_file_download_directory: str = "",
    echosounder: str = "",
    delete_raw_after: bool = False,
):
    """ENTRYPOINT FOR END-USERS
    Converts a local (on your computer) file from raw into netcdf using
    echopype.

    Args:
        raw_file_location (str, optional): The location of the raw file.
            Defaults to "".
        netcdf_file_download_directory (str, optional): The location you want
            to download your netcdf file to. Defaults to "".
        echosounder (str, optional): The echosounder used. Can be one of
            ["EK80", "EK60"]. Defaults to "".
        delete_raw_after (bool, optional): Whether or not to delete the raw
            file after conversion is complete. Defaults to False.
    """

    netcdf_file_download_directory = os.sep.join(
        [os.path.normpath(netcdf_file_download_directory)]
    )
    print(f"netcdf_file_download_directory {netcdf_file_download_directory}")

    # Create the download directory (path) if it doesn't exist
    if not os.path.exists(netcdf_file_download_directory):
        os.makedirs(netcdf_file_download_directory)

    # Make sure the echosounder specified matches the raw file data.
    if echosounder.lower() == "ek80":
        assert sonar_checker.is_EK80(
            raw_file=raw_file_location, storage_options={}
        ), (
            f"THE ECHOSOUNDER SPECIFIED `{echosounder}` DOES NOT MATCH THE "
            "ECHOSOUNDER FOUND WITHIN THE RAW FILE."
        )
    elif echosounder.lower() == "ek60":
        assert sonar_checker.is_EK60(
            raw_file=raw_file_location, storage_options={}
        ), (
            f"THE ECHOSOUNDER SPECIFIED `{echosounder}` DOES NOT MATCH THE "
            "ECHOSOUNDER FOUND WITHIN THE RAW FILE."
        )
    else:
        print(
            f"THE ECHOSOUNDER SPECIFIED `{echosounder}` IS NOT SUPPORTED FOR "
            "ICES NETCDF CONVERSION. PLEASE USE `EK80` OR `EK60`."
        )

    try:
        print("CONVERTING RAW TO NETCDF...")
        raw_file_echopype = open_raw(
            raw_file=raw_file_location, sonar_model=echosounder
        )
        if echosounder.lower() == "ek80":
            echopype_ek80_raw_to_ices_netcdf(
                echodata=raw_file_echopype,
                export_file=netcdf_file_download_directory,
            )
        elif echosounder.lower() == "ek60":
            echopype_ek60_raw_to_ices_netcdf(
                echodata=raw_file_echopype,
                export_file=netcdf_file_download_directory,
            )
        print("CONVERTED.")
        if delete_raw_after:
            try:
                print("DELETING RAW FILE...")
                os.remove(raw_file_location)
                print("DELETED.")
            except Exception as e:
                print(e)
                print(
                    "THE RAW FILE COULD NOT BE DELETED DUE TO THE ERROR ABOVE."
                )
    except Exception as e:
        logging.error(
            "COULD NOT CONVERT `%s` DUE TO ERROR %s", raw_file_location, e
        )
        raise e

convert_local_raw_to_netcdf(raw_file_location='', netcdf_file_download_directory='', echosounder='', overwrite=False, delete_raw_after=False)

ENTRYPOINT FOR END-USERS Converts a local (on your computer) file from raw into netcdf using echopype.

Parameters:

raw_file_location (str): The location of the raw file. Defaults to "".
netcdf_file_download_directory (str): The location you want to download your netcdf file to. Defaults to "".
echosounder (str): The echosounder used. Can be one of ["EK80", "EK60", "AZFP6", "AZFP", "AD2CP", "ER60"]. Defaults to "".
overwrite (bool): Whether or not to overwrite the netcdf file. Defaults to False.
delete_raw_after (bool): Whether or not to delete the raw file after conversion is complete. Defaults to False.
Source code in src\aalibrary\conversion.py
def convert_local_raw_to_netcdf(
    raw_file_location: str = "",
    netcdf_file_download_directory: str = "",
    echosounder: str = "",
    overwrite: bool = False,
    delete_raw_after: bool = False,
):
    """ENTRYPOINT FOR END-USERS
    Converts a local (on your computer) file from raw into netcdf using
    echopype.

    Args:
        raw_file_location (str, optional): The location of the raw file.
            Defaults to "".
        netcdf_file_download_directory (str, optional): The location you want
            to download your netcdf file to. Defaults to "".
        echosounder (str, optional): The echosounder used. Can be one of
            ["EK80", "EK60", "AZFP6", "AZFP", "AD2CP", "ER60"]. Defaults
            to "".
        overwrite (bool, optional): Whether or not to overwrite the netcdf
            file. Defaults to False.
        delete_raw_after (bool, optional): Whether or not to delete the raw
            file after conversion is complete. Defaults to False.
    """

    netcdf_file_download_directory = os.sep.join(
        [os.path.normpath(netcdf_file_download_directory)]
    )
    print(f"netcdf_file_download_directory {netcdf_file_download_directory}")

    # Create the download directory (path) if it doesn't exist
    if not os.path.exists(netcdf_file_download_directory):
        os.makedirs(netcdf_file_download_directory)

    # Make sure the echosounder specified matches the raw file data.
    # Map each supported echosounder onto its echopype sonar checker.
    echosounder_checkers = {
        "ek80": sonar_checker.is_EK80,
        "ek60": sonar_checker.is_EK60,
        "azfp6": sonar_checker.is_AZFP6,
        "azfp": sonar_checker.is_AZFP,
        "ad2cp": sonar_checker.is_AD2CP,
        "er60": sonar_checker.is_ER60,
    }
    checker = echosounder_checkers.get(echosounder.lower())
    if checker is not None:
        assert checker(raw_file=raw_file_location, storage_options={}), (
            f"THE ECHOSOUNDER SPECIFIED `{echosounder}` DOES NOT MATCH THE "
            "ECHOSOUNDER FOUND WITHIN THE RAW FILE."
        )

    try:
        print("CONVERTING RAW TO NETCDF...")
        raw_file_echopype = open_raw(
            raw_file=raw_file_location, sonar_model=echosounder
        )
        raw_file_echopype.to_netcdf(
            save_path=netcdf_file_download_directory, overwrite=overwrite
        )
        print("CONVERTED.")
        if delete_raw_after:
            try:
                print("DELETING RAW FILE...")
                os.remove(raw_file_location)
                print("DELETED.")
            except Exception as e:
                print(e)
                print(
                    "THE RAW FILE COULD NOT BE DELETED DUE TO THE ERROR ABOVE."
                )
    except Exception as e:
        logging.error(
            "COULD NOT CONVERT `%s` DUE TO ERROR %s", raw_file_location, e
        )
        raise e
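
A minimal usage sketch (the file name and directory below are hypothetical; assumes the .raw file already exists locally):

from aalibrary.conversion import convert_local_raw_to_netcdf

# Convert a local EK60 raw file into netcdf inside ./netcdf_output.
convert_local_raw_to_netcdf(
    raw_file_location="example-D20240101-T000000.raw",
    netcdf_file_download_directory="./netcdf_output",
    echosounder="EK60",
    overwrite=False,
    delete_raw_after=False,
)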

convert_raw_to_netcdf(file_name='', file_type='raw', ship_name='', survey_name='', echosounder='', data_source='', file_download_directory='', overwrite=False, delete_raw_after=False, gcp_bucket=None, is_metadata=False, debug=False)

ENTRYPOINT FOR END-USERS Converts a file from raw to netcdf, then uploads the result to GCP storage for caching.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
file_type str

The file type (do not include the dot "."). Defaults to "raw".

'raw'
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
data_source str

The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "".

''
file_download_directory str

The local directory you want to store your file in. Defaults to "".

''
overwrite bool

Whether or not to overwrite the netcdf file. Defaults to False.

False
delete_raw_after bool

Whether or not to delete the raw file after conversion is complete. Defaults to False.

False
gcp_bucket bucket

The GCP bucket object used to download the file. Defaults to None.

None
is_metadata bool

Whether or not the file is a metadata file. Necessary since files that are considered metadata (metadata json, or readmes) are stored in a separate directory. Defaults to False.

False
debug bool

Whether or not to print debug statements. Defaults to False.

False
Source code in src\aalibrary\conversion.py
def convert_raw_to_netcdf(
    file_name: str = "",
    file_type: str = "raw",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    file_download_directory: str = "",
    overwrite: bool = False,
    delete_raw_after: bool = False,
    gcp_bucket: storage.Client.bucket = None,
    is_metadata: bool = False,
    debug: bool = False,
):
    """ENTRYPOINT FOR END-USERS
    This function allows one to convert a file from raw to netcdf. Then uploads
    the file to GCP storage for caching.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "raw".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "".
        file_download_directory (str, optional): The local directory you want
            to store your file in. Defaults to "".
        overwrite (bool, optional): Whether or not to overwrite the netcdf
            file. Defaults to False.
        delete_raw_after (bool, optional): Whether or not to delete the raw
            file after conversion is complete. Defaults to False.
        gcp_bucket (storage.Client.bucket, optional): The GCP bucket object
            used to download the file. Defaults to None.
        is_metadata (bool, optional): Whether or not the file is a metadata
            file. Necessary since files that are considered metadata (metadata
            json, or readmes) are stored in a separate directory. Defaults to
            False.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """
    # TODO: Implement an 'upload' param default to True.
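    # NOTE: any gcp_bucket passed in by the caller is replaced below with a
    # freshly created bucket object.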

    _, _, gcp_bucket = utils.cloud_utils.setup_gcp_storage_objs()
    _, s3_resource, _ = utils.cloud_utils.create_s3_objs()

    rf = RawFile(
        file_name=file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        data_source=data_source,
        file_download_directory=file_download_directory,
        overwrite=overwrite,
        gcp_bucket=gcp_bucket,
        is_metadata=is_metadata,
        debug=debug,
        s3_resource=s3_resource,
        s3_bucket_name="noaa-wcsd-pds",
    )

    # Here we check for a netcdf version of the raw file on GCP
    print("CHECKING FOR NETCDF VERSION ON GCP...")
    if rf.netcdf_file_exists_in_gcp:
        # A netcdf version exists in the cache, so download it directly.
        download_netcdf_file(
            raw_file_name=rf.netcdf_file_name,
            file_type="netcdf",
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            data_source=rf.data_source,
            file_download_directory=rf.file_download_directory,
            gcp_bucket=gcp_bucket,
            debug=rf.debug,
        )
    else:
        logging.info(
            "FILE `%s` DOES NOT EXIST AS NETCDF. DOWNLOADING/CONVERTING/"
            "UPLOADING RAW...",
            rf.raw_file_name,
        )

        # Download the raw file.
        # This function should take care of checking whether the raw file
        # exists in any of the data sources, and fetching it.
        download_raw_file(
            file_name=rf.file_name,
            file_type=rf.file_type,
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            data_source=rf.data_source,
            file_download_directory=rf.file_download_directory,
            debug=rf.debug,
        )

        # Convert the raw file to netcdf.
        convert_local_raw_to_netcdf(
            raw_file_location=rf.raw_file_download_path,
            netcdf_file_download_directory=rf.file_download_directory,
            echosounder=rf.echosounder,
            overwrite=overwrite,
            delete_raw_after=delete_raw_after,
        )

        # Upload the netcdf to the correct location for parsing.
        upload_file_to_gcp_storage_bucket(
            file_name=rf.netcdf_file_name,
            file_type="netcdf",
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            file_location=rf.netcdf_file_download_path,
            gcp_bucket=gcp_bucket,
            data_source=rf.data_source,
            is_metadata=False,
            debug=rf.debug,
        )
        # Upload the metadata entry associated with this netcdf file.
        metadata.create_and_upload_metadata_df_for_netcdf(
            rf=rf,
            debug=debug,
        )
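
A minimal usage sketch (the ship, survey, and file names below are hypothetical; assumes credentials for the GCP project and read access to the NCEI noaa-wcsd-pds bucket):

from aalibrary.conversion import convert_raw_to_netcdf

# Fetch a raw file from NCEI, convert it to netcdf, and cache the result
# in the GCP storage bucket.
convert_raw_to_netcdf(
    file_name="example-D20240101-T000000.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
    data_source="NCEI",
    file_download_directory=".",
    debug=False,
)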

convert_raw_to_netcdf_ices(file_name='', file_type='raw', ship_name='', survey_name='', echosounder='', data_source='', file_download_directory='', overwrite=False, delete_raw_after=False, gcp_bucket=None, is_metadata=False, debug=False)

ENTRYPOINT FOR END-USERS Converts a file from raw to netcdf following the ICES convention, then uploads the result to GCP storage for caching.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
file_type str

The file type (do not include the dot "."). Defaults to "raw".

'raw'
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
data_source str

The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "".

''
file_download_directory str

The local directory you want to store your file in. Defaults to "".

''
overwrite bool

Whether or not to overwrite the netcdf file. Defaults to False.

False
delete_raw_after bool

Whether or not to delete the raw file after conversion is complete. Defaults to False.

False
gcp_bucket bucket

The GCP bucket object used to download the file. Defaults to None.

None
is_metadata bool

Whether or not the file is a metadata file. Necessary since files that are considered metadata (metadata json, or readmes) are stored in a separate directory. Defaults to False.

False
debug bool

Whether or not to print debug statements. Defaults to False.

False
Source code in src\aalibrary\conversion.py
def convert_raw_to_netcdf_ices(
    file_name: str = "",
    file_type: str = "raw",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    file_download_directory: str = "",
    overwrite: bool = False,
    delete_raw_after: bool = False,
    gcp_bucket: storage.Client.bucket = None,
    is_metadata: bool = False,
    debug: bool = False,
):
    """ENTRYPOINT FOR END-USERS
    This function allows one to convert a file from raw to netcdf. Then uploads
    the file to GCP storage for caching.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "raw".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "".
        file_download_directory (str, optional): The local directory you want
            to store your file in. Defaults to "".
        overwrite (bool, optional): Whether or not to overwrite the netcdf
            file. Defaults to False.
        delete_raw_after (bool, optional): Whether or not to delete the raw
            file after conversion is complete. Defaults to False.
        gcp_bucket (storage.Client.bucket, optional): The GCP bucket object
            used to download the file. Defaults to None.
        is_metadata (bool, optional): Whether or not the file is a metadata
            file. Necessary since files that are considered metadata (metadata
            json, or readmes) are stored in a separate directory. Defaults to
            False.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """

    _, _, gcp_bucket = utils.cloud_utils.setup_gcp_storage_objs()
    _, s3_resource, _ = utils.cloud_utils.create_s3_objs()

    rf = RawFile(
        file_name=file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        data_source=data_source,
        file_download_directory=file_download_directory,
        overwrite=overwrite,
        gcp_bucket=gcp_bucket,
        is_metadata=is_metadata,
        debug=debug,
        s3_resource=s3_resource,
        s3_bucket_name="noaa-wcsd-pds",
    )

    # Here we check for a netcdf version of the raw file on GCP
    print("CHECKING FOR NETCDF VERSION ON GCP...")
    if rf.netcdf_file_exists_in_gcp:
        # A netcdf version exists in the cache, so download it directly.
        download_netcdf_file(
            raw_file_name=rf.netcdf_file_name,
            file_type="netcdf",
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            data_source=rf.data_source,
            file_download_directory=rf.file_download_directory,
            gcp_bucket=gcp_bucket,
            debug=rf.debug,
        )
    else:
        logging.info(
            "FILE `%s` DOES NOT EXIST AS NETCDF. DOWNLOADING/CONVERTING/"
            "UPLOADING RAW...",
            rf.raw_file_name,
        )

        # Download the raw file.
        # This function should take care of checking whether the raw file
        # exists in any of the data sources, and fetching it.
        download_raw_file(
            file_name=rf.file_name,
            file_type=rf.file_type,
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            data_source=rf.data_source,
            file_download_directory=rf.file_download_directory,
            debug=rf.debug,
        )

        # Convert the raw file to netcdf.
        convert_local_raw_to_ices_netcdf(
            raw_file_location=rf.raw_file_download_path,
            netcdf_file_download_directory=rf.file_download_directory,
            echosounder=rf.echosounder,
            delete_raw_after=delete_raw_after,
        )

        # Upload the netcdf to the correct location for parsing.
        upload_file_to_gcp_storage_bucket(
            file_name=rf.netcdf_file_name,
            file_type="netcdf",
            ship_name=rf.ship_name,
            survey_name=rf.survey_name,
            echosounder=rf.echosounder,
            file_location=rf.netcdf_file_download_path,
            gcp_bucket=gcp_bucket,
            data_source=rf.data_source,
            is_metadata=False,
            debug=rf.debug,
        )
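
Usage mirrors convert_raw_to_netcdf above; a minimal sketch (names hypothetical):

from aalibrary.conversion import convert_raw_to_netcdf_ices

# Same arguments as convert_raw_to_netcdf; the conversion step instead
# produces an ICES netcdf.
convert_raw_to_netcdf_ices(
    file_name="example-D20240101-T000000.raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
    data_source="NCEI",
    file_download_directory=".",
)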

metadata

This file contains functions that have to do with metadata.

Functions:

Name Description
create_and_upload_metadata_df_for_netcdf

Creates a metadata file with appropriate information for netcdf files.

create_and_upload_metadata_df_for_raw

Creates a metadata file with appropriate information. Then uploads it

create_metadata_json_for_netcdf_files

Creates a JSON object containing metadata for the current user.

create_metadata_json_for_raw_files

Creates a JSON object containing metadata for the current user.

get_current_gcp_user_email

Gets the current gcloud user's email.

get_metadata_in_df_format

Retrieves the metadata associated with all objects in GCP in DataFrame

upload_ncei_metadata_df_to_bigquery

Finds the metadata obtained from a survey on NCEI, and uploads it to the

create_and_upload_metadata_df_for_netcdf(rf=None, debug=False)

Creates a metadata file with appropriate information for netcdf files. Then uploads it to the correct table in GCP.

Parameters:

Name Type Description Default
rf RawFile

The RawFile object associated with this file. Defaults to None.

None
debug bool

Whether or not to print debug statements. Defaults to False.

False
Source code in src\aalibrary\metadata.py
def create_and_upload_metadata_df_for_netcdf(
    rf: RawFile = None,
    debug: bool = False,
):
    """Creates a metadata file with appropriate information for netcdf files.
    Then uploads it to the correct table in GCP.

    Args:
        rf (RawFile, optional): The RawFile object associated with this file.
            Defaults to None.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """

    metadata_df = create_metadata_json_for_netcdf_files(
        rf=rf,
        debug=debug,
    )

    # Upload to GCP BigQuery
    metadata_df.to_gbq(
        destination_table="metadata.aalibrary_netcdf_metadata",
        project_id="ggn-nmfs-aa-dev-1",
        if_exists="append",
    )

    return
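
A minimal sketch, assuming rf is a previously constructed RawFile object (see convert_raw_to_netcdf above for the constructor arguments used there):

from aalibrary import metadata

# Build the metadata row for rf's netcdf file and append it to the
# metadata.aalibrary_netcdf_metadata table in BigQuery.
metadata.create_and_upload_metadata_df_for_netcdf(rf=rf, debug=True)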

create_and_upload_metadata_df_for_raw(rf=None, debug=False)

Creates a metadata file with appropriate information. Then uploads it to the correct table in GCP. Used for .raw files.

Parameters:

Name Type Description Default
rf RawFile

The RawFile object associated with this file. Defaults to None.

None
debug bool

Whether or not to print debug statements. Defaults to False.

False
Source code in src\aalibrary\metadata.py
def create_and_upload_metadata_df_for_raw(
    rf: RawFile = None,
    debug: bool = False,
):
    """Creates a metadata file with appropriate information. Then uploads it
    to the correct table in GCP. Used for .raw files.

    Args:
        rf (RawFile, optional): The RawFile object associated with this file.
            Defaults to None.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.
    """

    # Create the metadata file to be uploaded.
    metadata_df = create_metadata_json_for_raw_files(
        rf=rf,
        debug=debug,
    )

    # Upload to GCP BigQuery
    metadata_df.to_gbq(
        destination_table="metadata.aalibrary_file_metadata",
        project_id="ggn-nmfs-aa-dev-1",
        if_exists="append",
    )

    return

create_metadata_json_for_netcdf_files(rf=None, debug=False)

Creates a JSON object containing metadata for the current user and file, and returns it as a DataFrame.

Parameters:

Name Type Description Default
rf RawFile

The RawFile object associated with this file. Defaults to None.

None
debug bool

Whether or not to print out the metadata json. Defaults to False.

False

Returns:

Type Description
DataFrame

pd.DataFrame: The metadata dataframe for the aalibrary_netcdf_metadata database table.

Source code in src\aalibrary\metadata.py
def create_metadata_json_for_netcdf_files(
    rf: RawFile = None,
    debug: bool = False,
) -> pd.DataFrame:
    """Creates a JSON object containing metadata for the current user.

    Args:
        rf (RawFile, optional): The RawFile object associated with this file.
            Defaults to None.
        debug (bool, optional): Whether or not to print out the metadata json.
            Defaults to False.

    Returns:
        pd.DataFrame: The metadata dataframe for the
            `aalibrary_netcdf_metadata` database table.
    """

    # Get the current user's email
    email = get_current_gcp_user_email()

    # get the survey datetime.
    file_datetime = datetime.strptime(
        rf.get_file_datetime_str(), "%Y-%m-%d %H:%M:%S"
    )

    # calculate the deletion datetime
    curr_datetime = datetime.now()
    deletion_datetime = curr_datetime + timedelta(days=90)
    deletion_datetime = deletion_datetime.strftime("%Y-%m-%d %H:%M:%S")

    metadata_json = {
        "FILE_NAME": rf.netcdf_file_name,
        "DATE_CREATED": datetime.now(timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S"
        ),
        "UPLOADED_BY": email,
        "ECHOPYPE_VERSION": echopype.__version__,
        "PYTHON_VERSION": sys.version.split(" ")[0],
        "NUMPY_VERSION": np.version.version,
        # maybe just add in echopype's reqs.
        # pip lock file - for current environment
        "NCEI_CRUISE_ID": rf.survey_name,
        "GCP_URI": rf.netcdf_gcp_storage_bucket_location,
        "FILE_DATETIME": file_datetime,
        "DELETION_DATETIME": deletion_datetime,
        "ICES_CODE": rf.ices_code,
    }

    aalibrary_metadata_df = pd.json_normalize(metadata_json)
    # make sure data types are conserved before upload to BigQuery.
    aalibrary_metadata_df["DATE_CREATED"] = pd.to_datetime(
        aalibrary_metadata_df["DATE_CREATED"], format="%Y-%m-%d %H:%M:%S"
    )
    aalibrary_metadata_df["FILE_DATETIME"] = pd.to_datetime(
        aalibrary_metadata_df["FILE_DATETIME"], format="%Y-%m-%d %H:%M:%S"
    )
    aalibrary_metadata_df["DELETION_DATETIME"] = pd.to_datetime(
        aalibrary_metadata_df["DELETION_DATETIME"], format="%Y-%m-%d %H:%M:%S"
    )

    if debug:
        print(aalibrary_metadata_df)
        logging.debug(aalibrary_metadata_df)

    return aalibrary_metadata_df

create_metadata_json_for_raw_files(rf=None, debug=False)

Creates a JSON object containing metadata for the current user and file, and returns it as a DataFrame.

Parameters:

Name Type Description Default
rf RawFile

The RawFile object associated with this file. Defaults to None.

None
debug bool

Whether or not to print out the metadata json. Defaults to False.

False

Returns:

Type Description
DataFrame

pd.DataFrame: The metadata dataframe for the aalibrary_file_metadata database table.

Source code in src\aalibrary\metadata.py
def create_metadata_json_for_raw_files(
    rf: RawFile = None,
    debug: bool = False,
) -> pd.DataFrame:
    """Creates a JSON object containing metadata for the current user.

    Args:
        rf (RawFile, optional): The RawFile object associated with this file.
            Defaults to None.
        debug (bool, optional): Whether or not to print out the metadata json.
            Defaults to False.

    Returns:
        pd.DataFrame: The metadata dataframe for the `aalibrary_file_metadata`
            database table.
    """
    # Get the current user's email
    email = get_current_gcp_user_email()

    # get the survey datetime.
    file_datetime = datetime.strptime(
        rf.get_file_datetime_str(), "%Y-%m-%d %H:%M:%S"
    )

    # calculate the deletion datetime
    curr_datetime = datetime.now()
    deletion_datetime = curr_datetime + timedelta(days=90)
    deletion_datetime = deletion_datetime.strftime("%Y-%m-%d %H:%M:%S")

    metadata_json = {
        "FILE_NAME": rf.raw_file_name,
        "DATE_CREATED": datetime.now(timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S"
        ),
        "UPLOADED_BY": email,
        "ECHOPYPE_VERSION": echopype.__version__,
        "PYTHON_VERSION": sys.version.split(" ")[0],
        "NUMPY_VERSION": np.version.version,
        # maybe just add in echopype's reqs.
        # pip lock file - for current environment
        "NCEI_CRUISE_ID": rf.survey_name,
        "NCEI_URI": rf.raw_file_s3_object_key,
        "GCP_URI": rf.raw_gcp_storage_bucket_location,
        "FILE_DATETIME": file_datetime,
        "DELETION_DATETIME": deletion_datetime,
        "ICES_CODE": rf.ices_code,
    }

    aalibrary_metadata_df = pd.json_normalize(metadata_json)
    # make sure data types are conserved before upload to BigQuery.
    aalibrary_metadata_df["DATE_CREATED"] = pd.to_datetime(
        aalibrary_metadata_df["DATE_CREATED"], format="%Y-%m-%d %H:%M:%S"
    )
    aalibrary_metadata_df["FILE_DATETIME"] = pd.to_datetime(
        aalibrary_metadata_df["FILE_DATETIME"], format="%Y-%m-%d %H:%M:%S"
    )
    aalibrary_metadata_df["DELETION_DATETIME"] = pd.to_datetime(
        aalibrary_metadata_df["DELETION_DATETIME"], format="%Y-%m-%d %H:%M:%S"
    )

    if debug:
        print(aalibrary_metadata_df)
        logging.debug(aalibrary_metadata_df)

    return aalibrary_metadata_df

get_current_gcp_user_email()

Gets the current gcloud user's email.

Returns:

Name Type Description
str str

A string containing the current gcloud user's email.

Source code in src\aalibrary\metadata.py
def get_current_gcp_user_email() -> str:
    """Gets the current gcloud user's email.

    Returns:
        str: A string containing the current gcloud user's email.
    """

    # Gets the current gcloud user's email
    get_curr_user_email_cmd = ["gcloud", "config", "get-value", "account"]
    if platform.system() == "Windows":
        email = subprocess.run(
            get_curr_user_email_cmd,
            shell=True,
            capture_output=True,
            text=True,
            check=False,
        ).stdout
    else:
        email = subprocess.run(
            get_curr_user_email_cmd,
            capture_output=True,
            text=True,
            check=False,
        ).stdout
    email = email.replace("\n", "")
    return email
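
A minimal sketch; the function shells out to gcloud, so the Google Cloud SDK must be installed and authenticated:

from aalibrary.metadata import get_current_gcp_user_email

email = get_current_gcp_user_email()
print(email)  # e.g. "jane.doe@noaa.gov" (hypothetical)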

get_metadata_in_df_format()

Retrieves the metadata associated with all objects in GCP in DataFrame format.

Source code in src\aalibrary\metadata.py
def get_metadata_in_df_format():
    """Retrieves the metadata associated with all objects in GCP in DataFrame
    format."""

upload_ncei_metadata_df_to_bigquery(ship_name='', survey_name='', download_location='', s3_bucket=None)

Finds the metadata obtained from a survey on NCEI, and uploads it to the ncei_cruise_metadata database table in BigQuery. Also handles extra database entries that are needed, such as uploading to the ncei_instrument_metadata table when necessary.

Parameters:

Name Type Description Default
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
download_location str

The local download location for the file. Defaults to "".

''
s3_bucket resource

The bucket resource object. Defaults to None.

None
Source code in src\aalibrary\metadata.py
def upload_ncei_metadata_df_to_bigquery(
    ship_name: str = "",
    survey_name: str = "",
    download_location: str = "",
    s3_bucket: boto3.resource = None,
):
    """Finds the metadata obtained from a survey on NCEI, and uploads it to the
    `ncei_cruise_metadata` database table in bigquery. Also handles for extra
    database entries that are needed, such as uploading to the
    `ncei_instrument_metadata` when necessary.

    Args:
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        download_location (str, optional): The local download location for the
            file. Defaults to "".
        s3_bucket (boto3.resource, optional): The bucket resource object.
            Defaults to None.
    """

    # This var can either be a string with the file's location, or None.
    metadata_file_exists = check_if_tugboat_metadata_json_exists_in_survey(
        ship_name=ship_name, survey_name=survey_name, s3_bucket=s3_bucket
    )

    if metadata_file_exists:
        # TODO: Download all metadata files to local for download? Even
        # calibration files?
        # Handle for main metadata file for upload to BigQuery.
        s3_bucket.download_file(metadata_file_exists, download_location)
        # Subroutine to parse this file and upload to gcp.
        _parse_and_upload_ncei_survey_level_metadata(
            survey_name=survey_name, file_location=download_location
        )
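
A minimal sketch, assuming anonymous read access to the public noaa-wcsd-pds bucket (the ship and survey names are hypothetical):

import boto3
from botocore import UNSIGNED
from botocore.config import Config

from aalibrary.metadata import upload_ncei_metadata_df_to_bigquery

# Anonymous S3 resource for the public NCEI bucket.
s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
bucket = s3.Bucket("noaa-wcsd-pds")

upload_ncei_metadata_df_to_bigquery(
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    download_location="./ncei_metadata.json",
    s3_bucket=bucket,
)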

queries

This script contains classes that have SQL queries used for interaction with the metadata database in BigQuery.

Classes:

Name Description
MetadataQueries

This class contains queries related to the upload, alteration, and

MetadataQueries dataclass

This class contains queries related to the upload, alteration, and retrieval of metadata from our BigQuery instance.

Source code in src\aalibrary\queries.py
@dataclass
class MetadataQueries:
    """This class contains queries related to the upload, alteration, and
    retrieval of metadata from our BigQuery instance.
    """

    get_all_aalibrary_metadata_records: str = """
    SELECT * FROM `ggn-nmfs-aa-dev-1.metadata.aalibrary_file_metadata`"""

    # TODO for mike ryan
    get_all_possible_ship_names_from_database: str = """
    SELECT ship_name from `ggn-nmfs-aa-dev-1.metadata.aalibrary_file_metadata`
    """

    def get_all_surveys_associated_with_a_ship_name(self, ship_name: str = ""):
        get_all_surveys_associated_with_a_ship_name_query: str = """"""
        return get_all_surveys_associated_with_a_ship_name_query

    def get_all_echosounders_used_in_a_survey(self, survey: str = ""): ...

    def get_all_netcdf_files_in_database(self): ...
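
A minimal sketch of running one of these queries with the BigQuery client library, assuming access to the ggn-nmfs-aa-dev-1 project:

from google.cloud import bigquery

from aalibrary.queries import MetadataQueries

client = bigquery.Client(project="ggn-nmfs-aa-dev-1")
# Pull every aalibrary file metadata record into a DataFrame.
df = client.query(
    MetadataQueries.get_all_aalibrary_metadata_records
).to_dataframe()
print(df.head())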